Commit b74e22a

Use Datasets API to Update Notebook Examples (#2440)

Addresses issue #2364. All of the SG notebook examples have been updated to use the newly added Datasets API. Previously, Graph objects were created by specifying a path to the `.csv` file, calling `cuDF` to read in the file, and then converting the edge list to a graph. Now, a dataset object is imported and can create graphs by calling the `get_graph()` method. Comments and headings have also been updated for continuity.

Authors:
- Ralph Liu (https://github.com/oorliu)

Approvers:
- Rick Ratzel (https://github.com/rlratzel)

URL: #2440

1 parent 5c7303c commit b74e22a
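The before/after pattern this commit applies across the notebooks can be sketched in plain Python. This is a mock, not the real library: the class below only imitates the calling convention of `cugraph.experimental.datasets` (`get_edgelist`/`get_graph`, with `fetch=True` downloading the `.csv` on first use in the real library); no GPU, cuDF, or download logic is involved, and the edge data is made up.

```python
# Mock sketch of the Datasets API shape the notebooks migrate to.
# The real objects return cudf.DataFrame / cugraph.Graph on the GPU;
# this stand-in only mirrors the method names and call pattern.

class MockDataset:
    def __init__(self, edges):
        self._edges = edges  # list of (src, dst) pairs

    def get_edgelist(self, fetch=False):
        # Real version: reads (and with fetch=True, downloads) the CSV via cuDF.
        return [{"src": s, "dst": d} for s, d in self._edges]

    def get_graph(self, fetch=False):
        # Real version: builds a cugraph.Graph from the edgelist.
        return {"edgelist": self.get_edgelist(fetch=fetch)}

karate = MockDataset([(1, 2), (1, 3), (2, 3)])

# Old pattern per notebook: cudf.read_csv(...) then Graph.from_cudf_edgelist(...).
# New pattern in this commit: one call on the imported dataset object.
G = karate.get_graph(fetch=True)
```

The point of the change is visible in the last line: each notebook replaces a three-step read/convert/build sequence with a single `get_graph()` call.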

27 files changed (+727 −1188 lines)

notebooks/algorithms/centrality/Betweenness.ipynb

Lines changed: 12 additions & 44 deletions
@@ -12,7 +12,8 @@
 "| --------------|------------|------------------|-----------------|----------------|\n",
 "| Brad Rees | 04/24/2019 | created | 0.15 | GV100, CUDA 11.0\n",
 "| Brad Rees | 08/16/2020 | tested / updated | 21.10 nightly | RTX 3090 CUDA 11.4\n",
-"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
+"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5\n",
+"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
 ]
 },
 {

@@ -111,7 +112,10 @@
 "source": [
 "# Import needed libraries\n",
 "import cugraph\n",
-"import cudf"
+"import cudf\n",
+"\n",
+"# Import a built-in dataset\n",
+"from cugraph.experimental.datasets import karate"
 ]
 },
 {

@@ -124,42 +128,6 @@
 "import networkx as nx"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Some Prep"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"# Define the path to the test data \n",
-"datafile='../../data/karate-data.csv'"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Read in the data - GPU\n",
-"cuGraph depends on cuDF for data loading and the initial Dataframe creation\n",
-"\n",
-"The data file contains an edge list, which represents the connection of a vertex to another. The `source` to `destination` pairs is in what is known as Coordinate Format (COO). In this test case, the data is just two columns. However a third, `weight`, column is also possible"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},

@@ -173,9 +141,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# create a Graph using the source (src) and destination (dst) vertex pairs from the Dataframe \n",
-"G = cugraph.Graph()\n",
-"G.from_cudf_edgelist(gdf, source='src', destination='dst')"
+"# Create a graph using the imported Dataset object\n",
+"G = karate.get_graph(fetch=True)"
 ]
 },
 {

@@ -256,6 +223,7 @@
 "outputs": [],
 "source": [
 "# Read the data, this also created a NetworkX Graph \n",
+"datafile=\"../../data/karate-data.csv\"\n",
 "file = open(datafile, 'rb')\n",
 "Gnx = nx.read_edgelist(file)"
 ]

@@ -321,7 +289,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3.8.13 ('cugraph_dev')",
+"display_name": "Python 3.9.7 ('base')",
 "language": "python",
 "name": "python3"
 },

@@ -335,11 +303,11 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.13"
+"version": "3.9.7"
 },
 "vscode": {
 "interpreter": {
-"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
+"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
 }
 }
 },

notebooks/algorithms/centrality/Katz.ipynb

Lines changed: 14 additions & 39 deletions
@@ -12,7 +12,8 @@
 "| --------------|------------|------------------|-----------------|----------------|\n",
 "| Brad Rees | 10/15/2019 | created | 0.14 | GV100, CUDA 10.2\n",
 "| Brad Rees | 08/16/2020 | tested / updated | 0.15.1 nightly | RTX 3090 CUDA 11.4\n",
-"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
+"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5\n",
+"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
 ]
 },
 {

@@ -40,9 +41,9 @@
 " this value is 0.0f, cuGraph will use the default value which is 0.00001. \n",
 " Setting too small a tolerance can lead to non-convergence due to numerical \n",
 " roundoff. Usually values between 0.01 and 0.00001 are acceptable.\n",
-" nstart:cuDataFrame, GPU Dataframe containing the initial guess for katz centrality. \n",
+" nstart: cuDataFrame, GPU Dataframe containing the initial guess for katz centrality. \n",
 " Default is None\n",
-" normalized:bool, If True normalize the resulting katz centrality values. \n",
+" normalized: bool, If True normalize the resulting katz centrality values. \n",
 " Default is True\n",
 "\n",
 "Returns:\n",

@@ -106,7 +107,10 @@
 "source": [
 "# Import rapids libraries\n",
 "import cugraph\n",
-"import cudf"
+"import cudf\n",
+"\n",
+"# Import a built-in dataset\n",
+"from cugraph.experimental.datasets import karate"
 ]
 },
 {

@@ -140,35 +144,6 @@
 "tol = 0.00001 # tolerance"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"# Define the path to the test data \n",
-"datafile='../../data/karate-data.csv'"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Read in the data - GPU\n",
-"cuGraph depends on cuDF for data loading and the initial Dataframe creation\n",
-"\n",
-"The data file contains an edge list, which represents the connection of a vertex to another. The `source` to `destination` pairs is in what is known as Coordinate Format (COO). In this test case, the data is just two columns. However a third, `weight`, column is also possible"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},

@@ -182,9 +157,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# create a Graph using the source (src) and destination (dst) vertex pairs from the Dataframe \n",
-"G = cugraph.Graph()\n",
-"G.from_cudf_edgelist(gdf, source='src', destination='dst')"
+"# Create a graph using the imported Dataset object\n",
+"G = karate.get_graph(fetch=True)"
 ]
 },
 {

@@ -275,6 +249,7 @@
 "outputs": [],
 "source": [
 "# Read the data, this also created a NetworkX Graph \n",
+"datafile = \"../../data/karate-data.csv\"\n",
 "file = open(datafile, 'rb')\n",
 "Gnx = nx.read_edgelist(file)"
 ]

@@ -348,7 +323,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3.8.13 ('cugraph_dev')",
+"display_name": "Python 3.9.7 ('base')",
 "language": "python",
 "name": "python3"
 },

@@ -362,11 +337,11 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.13"
+"version": "3.9.7"
 },
 "vscode": {
 "interpreter": {
-"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
+"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
 }
 }
 },

notebooks/algorithms/community/ECG.ipynb

Lines changed: 12 additions & 25 deletions
@@ -13,6 +13,7 @@
 "| | 08/16/2020 | updated | 0.15 | GV100, CUDA 10.2 |\n",
 "| | 08/05/2021 | tested/updated | 21.10 nightly | RTX 3090 CUDA 11.4 |\n",
 "| Don Acosta | 07/20/2022 | tested/updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
+"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
 "\n",
 "## Introduction\n",
 "\n",

@@ -101,34 +102,17 @@
 "source": [
 "# Import needed libraries\n",
 "import cugraph\n",
-"import cudf"
+"import cudf\n",
+"\n",
+"# Import a built-in dataset\n",
+"from cugraph.experimental.datasets import karate"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Read data using cuDF"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"# Test file \n",
-"datafile='../../data/karate-data.csv'"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"# read the data using cuDF\n",
-"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
+"## Create an Edgelist"
 ]
 },
 {

@@ -137,6 +121,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"# You can also just get the edgelist\n",
+"gdf = karate.get_edgelist(fetch=True)\n",
+"\n",
 "# The algorithm also requires that there are vertex weights. Just use 1.0 \n",
 "gdf[\"data\"] = 1.0"
 ]

@@ -232,7 +219,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3.8.13 ('cugraph_dev')",
+"display_name": "Python 3.9.7 ('base')",
 "language": "python",
 "name": "python3"
 },

@@ -246,11 +233,11 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.13"
+"version": "3.9.7"
 },
 "vscode": {
 "interpreter": {
-"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
+"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
 }
 }
 },
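Unlike the centrality notebooks, ECG (and Louvain below) keep a DataFrame step, because those algorithms expect edge weights: the diff fetches the edgelist and attaches a constant weight column before building the graph. A minimal stand-in for that prep step, using a plain dict of columns in place of the `cudf.DataFrame` returned by `karate.get_edgelist(fetch=True)` (the column name `data` comes from the diff; the edge values here are illustrative):

```python
# Stand-in for the ECG/Louvain prep cell: take an edgelist in COO form
# (src/dst columns), then add a constant 1.0 weight column, mirroring
# the notebooks' gdf["data"] = 1.0 step.

gdf = {"src": [1, 1, 2], "dst": [2, 3, 3]}  # hypothetical karate edges

# The algorithm also requires edge weights; the notebooks just use 1.0.
gdf["data"] = [1.0] * len(gdf["src"])
```

In the real notebooks the same assignment broadcasts the scalar `1.0` across the cuDF column; the intent is identical.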

notebooks/algorithms/community/Louvain.ipynb

Lines changed: 12 additions & 25 deletions
@@ -15,6 +15,7 @@
 "| | 08/16/2020 | updated | 0.14 | GV100, CUDA 10.2 |\n",
 "| | 08/05/2021 | tested / updated | 21.10 nightly | RTX 3090 CUDA 11.4 |\n",
 "| Don Acosta | 07/11/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
+"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
 "\n",
 "\n",
 "\n",

@@ -140,34 +141,17 @@
 "source": [
 "# Import needed libraries\n",
 "import cugraph\n",
-"import cudf"
+"import cudf\n",
+"\n",
+"# Import a built-in dataset\n",
+"from cugraph.experimental.datasets import karate"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Read data using cuDF"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"# Test file \n",
-"datafile='../../data//karate-data.csv'"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"# read the data using cuDF\n",
-"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
+"## Create an Edgelist"
 ]
 },
 {

@@ -176,6 +160,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"# You can also just get the edgelist\n",
+"gdf = karate.get_edgelist(fetch=True)\n",
+"\n",
 "# The algorithm also requires that there are vertex weights. Just use 1.0 \n",
 "gdf[\"data\"] = 1.0"
 ]

@@ -323,7 +310,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3.8.13 ('cugraph_dev')",
+"display_name": "Python 3.9.7 ('base')",
 "language": "python",
 "name": "python3"
 },

@@ -337,11 +324,11 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.13"
+"version": "3.9.7"
 },
 "vscode": {
 "interpreter": {
-"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
+"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
 }
 }
 },
