[WIP] Optimize Sampling for graph_store #2283
Merged
rapids-bot[bot] merged 2 commits into rapidsai:branch-22.06 on May 19, 2022
BradReesWork approved these changes on May 18, 2022
wangxiaoyunNV approved these changes on May 18, 2022
Codecov Report

| Coverage Diff | branch-22.06 | #2283 | +/- |
|---|---|---|---|
| Coverage | 70.82% | 63.80% | -7.03% |
| Files | 170 | 100 | -70 |
| Lines | 11036 | 4481 | -6555 |
| Hits | 7816 | 2859 | -4957 |
| Misses | 3220 | 1622 | -1598 |
BradReesWork approved these changes on May 19, 2022
@gpucibot merge
nv-rliu pushed a commit to nv-rliu/cugraph-gnn that referenced this pull request on Jul 15, 2025
This PR speeds up the sampling function for `graph_store` by about 3x (17.38 s to 5.73 s total in `sample_neighbors`) by removing the host-side code and running the sampling end-to-end on the GPU.

More **importantly**, this change makes the actual sampling in `batched_ego_graphs` the bottleneck: previously only `32%` of the time was spent in the core sampling code, while now `98.5%` of the time is spent there.
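As a quick arithmetic check against the line-profiler output below, the snippet recomputes the speedup from the two totals and sums the `% Time` of the `extract_subgraph` and `batched_ego_graphs` lines in each run. Reading "core sampling code" as those two calls is an interpretation (the sums happen to match the quoted 32% and 98.5%), not something stated explicitly here. The numbers also show that `batched_ego_graphs` itself takes about 5.5 s in both runs, so the saving comes from the removed per-seed loop.

```python
# Totals and per-line "% Time" values copied from the profiles below.
before_total_s = 17.3772  # sample_neighbors before this PR
after_total_s = 5.73069   # sample_neighbors after this PR

speedup = before_total_s / after_total_s
time_saved_s = before_total_s - after_total_s

# "% Time" of extract_subgraph + batched_ego_graphs in each profile.
before_core_pct = 0.7 + 31.5
after_core_pct = 2.5 + 96.0

print(f"speedup: {speedup:.2f}x, saved: {time_saved_s:.2f} s")
print(f"core share: {before_core_pct:.1f}% -> {after_core_pct:.1f}%")
```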
See the benchmarks below:
### Before PR
```text
Timer unit: 1e-06 s
Total time: 17.3772 s
File: /home/nfs/vjawa/dgl/cugraph/python/cugraph/cugraph/gnn/graph_store.py
Function: sample_neighbors at line 73
Line # Hits Time Per Hit % Time Line Contents
==============================================================
73 def sample_neighbors(self,
74 nodes,
75 fanout=-1,
76 edge_dir='in',
77 prob=None,
78 replace=False):
79 """
80 Sample neighboring edges of the given nodes and return the subgraph.
81
82 Parameters
83 ----------
84 nodes : array (single dimension)
85 Node IDs to sample neighbors from.
86 fanout : int
87 The number of edges to be sampled for each node on each edge type.
88 edge_dir : str {"in" or "out"}
89 Determines whether to sample inbound or outbound edges.
90 Can take either in for inbound edges or out for outbound edges.
91 prob : str
92 Feature name used as the (unnormalized) probabilities associated
93 with each neighboring edge of a node. Each feature must be a
94 scalar. The features must be non-negative floats, and the sum of
95 the features of inbound/outbound edges for every node must be
96 positive (though they don't have to sum up to one). Otherwise,
97 the result will be undefined. If not specified, sample uniformly.
98 replace : bool
99 If True, sample with replacement.
100
101 Returns
102 -------
103 CuPy array
104 The sampled arrays for bipartite graph.
105 """
106 1 18.0 18.0 0.0 num_nodes = len(nodes)
107 1 7833.0 7833.0 0.0 current_seeds = nodes.reindex(index=np.arange(0, num_nodes))
108 2 129790.0 64895.0 0.7 _g = self.__G.extract_subgraph(create_using=cugraph.Graph,
109 1 1.0 1.0 0.0 allow_multi_edges=True)
110 2 5467307.0 2733653.5 31.5 ego_edge_list, seeds_offsets = batched_ego_graphs(_g,
111 1 1.0 1.0 0.0 current_seeds,
112 1 0.0 0.0 0.0 radius=1)
113 1 123.0 123.0 0.0 all_parents = cupy.ndarray(0)
114 1 12.0 12.0 0.0 all_children = cupy.ndarray(0)
115 # filter and get a certain size neighborhood
116 1001 1143.0 1.1 0.0 for i in range(1, len(seeds_offsets)):
117 1000 262330.0 262.3 1.5 pos0 = seeds_offsets.values_host[i-1]
118 1000 211487.0 211.5 1.2 pos1 = seeds_offsets.values_host[i]
119 1000 335515.0 335.5 1.9 edge_list = ego_edge_list[pos0:pos1]
120 # get randomness fanout
121 1000 6202089.0 6202.1 35.7 filtered_list = edge_list[edge_list['dst'] == current_seeds[i-1]]
122
123 # get sampled_list
124 1000 14097.0 14.1 0.1 if len(filtered_list) > fanout:
125 1654 19502.0 11.8 0.1 sampled_indices = random.sample(
126 827 192781.0 233.1 1.1 filtered_list.index.to_arrow().to_pylist(), fanout)
127 827 4080293.0 4933.8 23.5 filtered_list = filtered_list.reindex(index=sampled_indices)
128
129 1000 146293.0 146.3 0.8 children = cupy.asarray(filtered_list['src'])
130 1000 126122.0 126.1 0.7 parents = cupy.asarray(filtered_list['dst'])
131 1000 105440.0 105.4 0.6 all_parents = cupy.append(all_parents, parents)
132 1000 74987.0 75.0 0.4 all_children = cupy.append(all_children, children)
133 1 1.0 1.0 0.0 return all_parents, all_children
```
### After PR
```text
Timer unit: 1e-06 s
Total time: 5.73069 s
File: /datasets/vjawa/miniconda3/envs/cugraph_dev/lib/python3.8/site-packages/cugraph-22.6.0a0+86.gd9ec8f718.dirty-py3.8-linux-x86_64.egg/cugraph/gnn/graph_store.py
Function: sample_neighbors at line 73
Line # Hits Time Per Hit % Time Line Contents
==============================================================
73 def sample_neighbors(self,
74 nodes,
75 fanout=-1,
76 edge_dir='in',
77 prob=None,
78 replace=False):
79 """
80 Sample neighboring edges of the given nodes and return the subgraph.
81
82 Parameters
83 ----------
84 nodes : array (single dimension)
85 Node IDs to sample neighbors from.
86 fanout : int
87 The number of edges to be sampled for each node on each edge type.
88 edge_dir : str {"in" or "out"}
89 Determines whether to sample inbound or outbound edges.
90 Can take either in for inbound edges or out for outbound edges.
91 prob : str
92 Feature name used as the (unnormalized) probabilities associated
93 with each neighboring edge of a node. Each feature must be a
94 scalar. The features must be non-negative floats, and the sum of
95 the features of inbound/outbound edges for every node must be
96 positive (though they don't have to sum up to one). Otherwise,
97 the result will be undefined. If not specified, sample uniformly.
98 replace : bool
99 If True, sample with replacement.
100
101 Returns
102 -------
103 CuPy array
104 The sampled arrays for bipartite graph.
105 """
106 1 20.0 20.0 0.0 num_nodes = len(nodes)
107 1 7681.0 7681.0 0.1 current_seeds = nodes.reindex(index=np.arange(0, num_nodes))
108 2 143943.0 71971.5 2.5 _g = self.__G.extract_subgraph(create_using=cugraph.Graph,
109 1 0.0 0.0 0.0 allow_multi_edges=True)
110 2 5500286.0 2750143.0 96.0 ego_edge_list, seeds_offsets = batched_ego_graphs(_g,
111 1 1.0 1.0 0.0 current_seeds,
112 1 0.0 0.0 0.0 radius=1)
113 # filter and get a certain size neighborhood
114
115 # Step 1
116 # Get Filtered List of ego_edge_list corresponding to current_seeds
117 # We filter by creating a series of destination nodes
118 # corresponding to the offsets and filtering non-matching values
119
120 1 719.0 719.0 0.0 seeds_offsets_s = cudf.Series(seeds_offsets).values
121 1 174.0 174.0 0.0 offset_lens = seeds_offsets_s[1:] - seeds_offsets_s[0:-1]
122 1 4042.0 4042.0 0.1 dst_seeds = current_seeds.repeat(offset_lens)
123 1 637.0 637.0 0.0 dst_seeds.index = ego_edge_list.index
124 1 5196.0 5196.0 0.1 filtered_list = ego_edge_list[ego_edge_list["dst"] == dst_seeds]
125
126 # Step 2
127 # Sample Fan Out
128 # for each dst take maximum of fanout samples
129 2 67247.0 33623.5 1.2 filtered_list = sample_groups(filtered_list,
130 1 1.0 1.0 0.0 by="dst",
131 1 1.0 1.0 0.0 n_samples=fanout)
132
133 1 744.0 744.0 0.0 return filtered_list['src'].values, filtered_list['dst'].values
```
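The two steps in the new code can be sketched on the CPU with pandas, as a stand-in for cuDF (which mirrors the pandas API for `Series.repeat`, boolean masking, `DataFrame.sample` and `groupby().cumcount()`; treat that as an assumption to verify). The helper names `filter_to_seed_edges` and `sample_per_group` are hypothetical. Step 1 follows the profiled code: repeat each seed over its slice of `ego_edge_list` and keep rows whose `dst` matches. For Step 2, the body of cugraph's `sample_groups` is not shown in this PR, so the sketch swaps in a common technique (shuffle once, rank rows within each `dst` group, keep the first `fanout`); it is not necessarily how `sample_groups` works.

```python
import numpy as np
import pandas as pd


def filter_to_seed_edges(ego_edge_list, seeds, seeds_offsets):
    """Step 1: keep only edges whose 'dst' is the seed owning that ego graph.

    ego_edge_list: DataFrame with 'src' and 'dst', ego graphs concatenated
        in seed order.
    seeds: Series of seed node IDs, one per ego graph.
    seeds_offsets: len(seeds) + 1 offsets, with offsets[0] == 0 and
        offsets[-1] == len(ego_edge_list).
    """
    offsets = np.asarray(seeds_offsets)
    # Number of edges in each seed's ego graph.
    offset_lens = offsets[1:] - offsets[:-1]
    # Repeat each seed ID over its slice so it lines up row by row.
    dst_seeds = seeds.repeat(offset_lens)
    dst_seeds.index = ego_edge_list.index
    # A single vectorized comparison replaces the per-seed Python loop.
    return ego_edge_list[ego_edge_list["dst"] == dst_seeds]


def sample_per_group(df, by, n_samples, seed=None):
    """Step 2: keep at most n_samples randomly chosen rows per value of `by`."""
    if n_samples < 0:
        # Assumed convention (as in DGL): a negative fanout keeps every neighbor.
        return df
    # Shuffle once, number the rows of each group in shuffled order,
    # and keep the first n_samples rows of every group.
    shuffled = df.sample(frac=1, random_state=seed)
    rank = shuffled.groupby(by).cumcount()
    return shuffled[rank < n_samples]


# Toy input: the ego graphs of seeds 7 and 9, concatenated.
ego_edge_list = pd.DataFrame({
    "src": [1, 2, 3, 7, 4, 5, 6, 8],
    "dst": [7, 7, 7, 1, 9, 9, 9, 9],
})
seeds = pd.Series([7, 9])
seeds_offsets = np.array([0, 4, 8])

filtered = filter_to_seed_edges(ego_edge_list, seeds, seeds_offsets)
sampled = sample_per_group(filtered, by="dst", n_samples=2, seed=0)
src, dst = sampled["src"].to_numpy(), sampled["dst"].to_numpy()
print(sorted(zip(dst.tolist(), src.tolist())))
```

The edge `7 -> 1` belongs to seed 7's ego graph but does not point into the seed, so Step 1 drops it, matching the inbound (`edge_dir='in'`) filtering in the profiled code. This sketch samples without replacement and uniformly; the `replace` and `prob` options from the docstring are not covered.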
## Todo:
- [ ] Add Unit Tests
Authors:
- Vibhu Jawa (https://github.com/VibhuJawa)
Approvers:
- Brad Rees (https://github.com/BradReesWork)
- Xiaoyun Wang (https://github.com/wangxiaoyunNV)
URL: rapidsai/cugraph#2283