Random Walks - Python Bindings #1516
Codecov Report
@@            Coverage Diff             @@
##           branch-0.19    #1516      +/-   ##
===============================================
- Coverage        60.54%   59.53%   -1.01%
===============================================
  Files               70       72       +2
  Lines             3153     3188      +35
===============================================
- Hits              1909     1898      -11
- Misses            1244     1290      +46
afender left a comment
Please add Random Walk to:
- the main __init__.py
- the .rst file
- the list of algorithms in the readme
        directed=False,
        max_depth=None
    ):
        """
your description text does not match the function API
Yes, I can either make changes to match this API's description or modify the description
rlratzel left a comment
The cython code looks fine, but I just had some comments/suggestions about the API and py tests.
    Use weight parameter if weights need to be considered
    (currently not supported)
I wonder if this last sentence should just be removed until we support weights, since it's a little confusing now (i.e., is that referring to a weight parameter to this function, or is that referring to just a weighted graph, etc.)
This should be removed. There is no weight parameter
    seeds_offsets: cudf.Series
        Series containing the starting offset in the returned edge list
        for each vertex in start_vertices.
    """
Only if there's time for this: a lot of our other docstrings also include examples, and it might be nice to have an example for this too. It might be especially useful since the output type is somewhat unique.
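To illustrate how the two outputs might relate, here is a minimal sketch that uses plain Python lists in place of cudf objects. The src/dst column names, the sample values, and the split_paths helper are assumptions made for illustration, not output from the actual function; it only shows how an offsets list could slice a flat edge list into one path per start vertex.

```python
# Hypothetical data: a flat edge list holding two walks back to back.
src = [0, 1, 2, 5, 6]
dst = [1, 2, 3, 6, 7]
start_vertices = [0, 5]
# Assumed meaning: offsets[i] is where the walk for start_vertices[i] begins.
offsets = [0, 3]


def split_paths(src, dst, offsets):
    """Slice the flat edge list into one list of (src, dst) edges per seed."""
    bounds = list(offsets) + [len(src)]
    return [
        list(zip(src[lo:hi], dst[lo:hi]))
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]


paths = split_paths(src, dst, offsets)
for seed, path in zip(start_vertices, paths):
    # Each walk is expected to start at its seed vertex.
    assert path[0][0] == seed
```

A docstring example along these lines would let readers see at a glance that the edge list is flat and that the offsets mark where each walk starts.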
        Series containing the starting offset in the returned edge list
        for each vertex in start_vertices.
    """
    if max_depth is None:
If you just remove the default value of =None in the function def, Python will do this check for you:
    def random_walks(
        G,
        start_vertices,
        max_depth
    ):
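A small sketch of that behavior, using a stub body in place of the real implementation (which is not shown here): leaving out a required argument raises TypeError before the function body runs.

```python
def random_walks(G, start_vertices, max_depth):
    # Stub body for illustration only; the real function does the actual work.
    return (G, start_vertices, max_depth)


try:
    random_walks("graph", [0, 1])  # max_depth deliberately left out
except TypeError as err:
    error_message = str(err)
else:
    error_message = None
```

Note that a caller could still pass max_depth=None explicitly, so a value check may still be worth keeping if None must be rejected.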
    next_path_idx = 0
    offsets = [0]

    df = cudf.DataFrame()
Our convention is that if a NetworkX graph is passed in, we return pandas dataframes (i.e., we return types that are "native" to the input type). If there's no time for this, it might have to be a FIXME.
You may be able to just call this utility as is done here (you would have to test with an Nx input to be sure though).
Ok. And Brad asked to remove networkX
    from rmm._lib.device_buffer cimport DeviceBuffer
    from cudf.core.buffer import Buffer
    from cython.operator cimport dereference as deref
    def random_walks(input_graph, start_vertices, max_depth):
minor: since these don't get style checks, we try to manually conform to a Python style (I think?), meaning there would be 2 blank lines between the imports and the def.
    # =============================================================================
    DIRECTED_GRAPH_OPTIONS = [False, True]
    WEIGHTED_GRAPH_OPTIONS = [False, True]
    DATASETS = [pytest.param(d) for d in utils.DATASETS]
I've been adding ids to make it easier to see what dataset is being run in the event of a failure:
    - DATASETS = [pytest.param(d) for d in utils.DATASETS]
    + DATASETS = [pytest.param(d) for d in utils.DATASETS,
    +             ids=[f"dataset={d.as_posix()}" for d in utils.DATASETS]]
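The suggested lines, as they appear here, place ids= inside the list comprehension, which Python would not parse. In pytest, ids is a keyword of pytest.mark.parametrize, while pytest.param takes a singular id. A minimal sketch of the per-parameter form, assuming utils.DATASETS is a list of pathlib path objects (the paths and the test name below are placeholders, not files from the repository):

```python
from pathlib import PurePosixPath

import pytest

# Placeholder paths standing in for utils.DATASETS (assumed to be path objects).
DATASET_PATHS = [
    PurePosixPath("datasets/a.csv"),
    PurePosixPath("datasets/b.csv"),
]

# pytest.param accepts a singular `id` for each parameter set.
DATASETS = [
    pytest.param(d, id=f"dataset={d.as_posix()}") for d in DATASET_PATHS
]


@pytest.mark.parametrize("graph_file", DATASETS)
def test_dataset_is_csv(graph_file):
    # Hypothetical test body; the id shows up in pytest's failure report.
    assert graph_file.suffix == ".csv"
```

With ids attached this way, a failing case is reported with the dataset path in its test name instead of an opaque index.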
    # =============================================================================


    def prepare_test():
If you make this a setup function, pytest will automatically call it for you before each test, as done here.
        max_depth
    ):
        """Test calls random_walks an invalid type"""
        prepare_test()
You can remove this line if you change the above function to a setup function.
        graph_file,
        directed
    ):
        max_depth = random.randint(2, 10)
If the test fails, will the user know what randomly chosen max_depth was used to get the results? This may need to be printed somewhere too so devs can reproduce the error if necessary.
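One possible way to keep the randomness reproducible, sketched with the standard library only: draw max_depth from a generator seeded with a known value and put both values in the assertion message. The names SEED and choose_max_depth, and the stand-in walk, are hypothetical.

```python
import random

SEED = 42  # hypothetical fixed seed; could also be logged or parametrized


def choose_max_depth(seed, low=2, high=10):
    """Draw max_depth from a generator seeded with a known value."""
    rng = random.Random(seed)
    return rng.randint(low, high)


def test_walk_length_is_bounded():
    max_depth = choose_max_depth(SEED)
    walk = list(range(max_depth))  # stand-in for a real random walk result
    # The message carries the values needed to reproduce a failure.
    assert len(walk) <= max_depth, f"seed={SEED}, max_depth={max_depth}"
```

Since pytest prints the assertion message on failure, the seed and depth appear in the report without relying on captured print output.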
    if i == offsets[offsets_idx]:
        if df['src'].iloc[i] != seeds[offsets_idx]:
            invalid_seeds += 1
            print(
This can be a FIXME, but tests probably shouldn't rely on print statements to show failures (i.e., they should use specific assertions). I see that this allows you to check every path instead of stopping on the first failure, so we may need to rethink how the test works if we want both assertions instead of prints and a test that does not stop on the first failure.
I borrowed the idea from test_BFS. I can fix this and fail the test when the first assertion is not met. I liked this approach because the test does not crash at the first failure and I get to see the other mismatches to find a pattern when debugging.
I agree that can be nice when you want to see everything. A FIXME will let us revisit this later in a way that can give us both assertions and the ability to see other mismatches if you'd rather do that (which means they'd be individual tests that inspect individual paths, so it would require some thought...).
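One pattern that could keep both properties is to collect every mismatch and assert once at the end, so the failure message lists all of them. A minimal sketch with plain lists standing in for the dataframe; the helper name and sample data are hypothetical:

```python
def find_invalid_seeds(src, offsets, seeds):
    """Return (path_index, expected, actual) for each path with a wrong start."""
    mismatches = []
    for idx, (offset, seed) in enumerate(zip(offsets, seeds)):
        if src[offset] != seed:
            mismatches.append((idx, seed, src[offset]))
    return mismatches


def test_paths_start_at_seeds():
    src = [0, 1, 2, 5, 6]  # hypothetical flat 'src' column
    offsets = [0, 3]
    seeds = [0, 5]
    mismatches = find_invalid_seeds(src, offsets, seeds)
    # A single assertion that still reports every mismatch found.
    assert not mismatches, f"invalid seeds: {mismatches}"
```

This is a different trade-off from one test per path: it stays a single test, but the report still shows the full set of mismatches for pattern hunting.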
rerun tests
@gpucibot merge
Python bindings for random walks
closes #1488
check the rendering after the PR is merged to make sure everything renders as expected