Skip to content

Random Walks - Python Bindings#1516

Merged
rapids-bot[bot] merged 4 commits intorapidsai:branch-0.19from
jnke2016:fea-Python-RW
Apr 7, 2021
Merged

Random Walks - Python Bindings#1516
rapids-bot[bot] merged 4 commits intorapidsai:branch-0.19from
jnke2016:fea-Python-RW

Conversation

@jnke2016
Copy link
Contributor

@jnke2016 jnke2016 commented Apr 7, 2021

Python bindings for random walks
closes #1488
check the rendering after the PR is merged to make sure everything render as expected

@jnke2016 jnke2016 requested a review from a team as a code owner April 7, 2021 01:40
@codecov-io
Copy link

Codecov Report

Merging #1516 (59fca7d) into branch-0.19 (b442f3b) will decrease coverage by 1.00%.
The diff coverage is 0.00%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #1516      +/-   ##
===============================================
- Coverage        60.54%   59.53%   -1.01%     
===============================================
  Files               70       72       +2     
  Lines             3153     3188      +35     
===============================================
- Hits              1909     1898      -11     
- Misses            1244     1290      +46     
Impacted Files Coverage Δ
python/cugraph/sampling/__init__.py 0.00% <0.00%> (ø)
python/cugraph/sampling/random_walks.py 0.00% <0.00%> (ø)
python/cugraph/utilities/utils.py 69.23% <0.00%> (-3.08%) ⬇️
python/cugraph/structure/number_map.py 63.82% <0.00%> (-2.04%) ⬇️
python/cugraph/_version.py 44.40% <0.00%> (-0.40%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b442f3b...59fca7d. Read the comment docs.

@BradReesWork BradReesWork added this to the 0.19 milestone Apr 7, 2021
@BradReesWork BradReesWork added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 7, 2021
Copy link
Member

@BradReesWork BradReesWork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a tests

Copy link
Member

@afender afender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add Random Walk to:

  • the main __init__.py
  • the .rst file
  • the list of algorithms in the readme

@jnke2016 jnke2016 requested a review from a team as a code owner April 7, 2021 18:15
Copy link
Member

@afender afender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jnke2016 !

directed=False,
max_depth=None
):
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your description text does not match the function API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can either make changes to match this API's description or modify the description

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cython code looks fine, but I just had some comments/suggestions about the API and py tests.

Comment on lines +35 to +36
Use weight parameter if weights need to be considered
(currently not supported)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this last sentence should just be removed until we support weights, since it's a little confusing now (ie. is that referring to a weight parameter to this function, or is that referring to just a weighted graph, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed. There is no weight parameter

seeds_offsets: cudf.Series
Series containing the starting offset in the returned edge list
for each vertex in start_vertices.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if there's time for this: a lot of our other docstrings also include examples, and it might be nice to have an example for this too. It might especially useful since the output type is somewhat unique.

Series containing the starting offset in the returned edge list
for each vertex in start_vertices.
"""
if max_depth is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you just remove the default value of =Nonein the function def, python will do this check for you:

def random_walks(
    G,
    start_vertices,
    max_depth
):

next_path_idx = 0
offsets = [0]

df = cudf.DataFrame()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our convention is that if a NetworkX graph is passed in, we return pandas dataframes (ie. we return types that are "native" to the input type). If there's no time for this, it might have to be a FIXME.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be able to just call this utility as is done here (you would have test with a Nx input to be sure though).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. And Brad asked to remove networkX

from rmm._lib.device_buffer cimport DeviceBuffer
from cudf.core.buffer import Buffer
from cython.operator cimport dereference as deref
def random_walks(input_graph, start_vertices, max_depth):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: since these don't get style checks, we try to manually conform to a Python style (I think?), meaning there would be 2 blank lines between the imports and the def.

# =============================================================================
DIRECTED_GRAPH_OPTIONS = [False, True]
WEIGHTED_GRAPH_OPTIONS = [False, True]
DATASETS = [pytest.param(d) for d in utils.DATASETS]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been adding ids to make it easier to see what dataset is being run in the event of a failure:

Suggested change
DATASETS = [pytest.param(d) for d in utils.DATASETS]
DATASETS = [pytest.param(d) for d in utils.DATASETS,
ids=[f"dataset={d.as_posix()}" for d in utils.DATASETS]]

# =============================================================================


def prepare_test():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make this a setup function, pytest will automatically call it for you before each test, as done here.

max_depth
):
"""Test calls random_walks an invalid type"""
prepare_test()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this line if you change the above function to a setup function.

graph_file,
directed
):
max_depth = random.randint(2, 10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test fails, will the user know what randomly chosen max_depth was used to get the results? This may need to be printed somewhere too so devs can reproduce the error if necessary.

if i == offsets[offsets_idx]:
if df['src'].iloc[i] != seeds[offsets_idx]:
invalid_seeds += 1
print(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a FIXME, but tests probably shouldn't rely on print statements to show failures (ie. they should use specific assertions). I see that this allows you to check every path instead of stopping on the first failure, so we may need to rethink how the test works if we want both assertions instead of prints, and having it not stop on the first failure.

Copy link
Contributor Author

@jnke2016 jnke2016 Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I borrowed the idea from test_BFS. I can fix and Fail the test when the first assertion is not met. I liked this approach because the test does not crash at the first failure and I get to see the other mismatches to find a pattern when debugging.

Copy link
Contributor

@rlratzel rlratzel Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that can be nice when you want to see everything. A FIXME will let us revisit this later in a way that can give us both assertions and the ability to see other mismatches if you'd rather do that (which means they'd be individual tests that inspect individual paths, so it would require some thought...).

@BradReesWork
Copy link
Member

rerun tests

@BradReesWork
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 63e69fc into rapidsai:branch-0.19 Apr 7, 2021
@jnke2016 jnke2016 deleted the fea-Python-RW branch September 24, 2022 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] RandomWalk - Python Bindings

6 participants