Closed
@@ -47,6 +47,50 @@
"import time"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ****** Do not change the code below! It verifies that the notebook is being run correctly! ******\n",
"\n",
"def verify_and_print_times(pandas_time, modin_time):\n",
"    if modin_time < pandas_time:\n",
"        print(f\"Modin was {pandas_time / modin_time:.2f}X faster than stock Pandas!\")\n",
"        return\n",
"    print(\n",
"        f\"Oops, stock Pandas appears to be {modin_time / pandas_time:.2f}X faster than Modin in this case. \"\n",
"        \"This is unlikely but could happen sometimes on certain machines/environments/datasets. \"\n",
"        \"One of the most probable reasons is the excessive amount of partitions being assigned to a single worker. \"\n",
"        \"You may visit Modin's optimization guide in order to learn more about such cases and how to fix them: \"\n",
"        \"\\nhttps://modin.readthedocs.io/en/latest/usage_guide/optimization_notes/index.html\\n\\n\"\n",
"        \"But first, verify that you're using the latest Modin version, also, try to use different executions, \"\n",
"        \"for basic usage we recommend non-experimental 'PandasOnRay'\\n\"\n",
"        \"Current execution is:\"\n",
"    )\n",
"    try:\n",
"        import modin.config as cfg\n",
Contributor

We can use get_current_execution to simplify the code.
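
A minimal sketch of what this suggestion might look like, assuming `get_current_execution` can be imported from `modin.utils` in the installed Modin version (older releases may not expose it, hence the fallback). The wrapper name `describe_execution` is hypothetical and not part of this PR.

```python
def describe_execution() -> str:
    """Return Modin's current execution name, or a placeholder if unavailable."""
    try:
        # Assumption: the installed Modin exposes this helper in modin.utils.
        from modin.utils import get_current_execution

        return get_current_execution()
    except (ImportError, AttributeError):
        # Modin is missing, too old to provide the helper, or has no engine installed.
        return "unknown"


print(f"\tExecution: {describe_execution()}")
```

With a helper like this, the nested `StorageFormat`/`Backend` fallback in the cell could collapse into a single call, at the cost of losing the separate "Is experimental" line unless it is printed some other way.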

"\n",
"        try:\n",
"            storage_format = cfg.StorageFormat.get()\n",
"        except AttributeError:\n",
"            # for modin versions < 0.12\n",
"            storage_format = cfg.Backend.get()\n",
"        print(\n",
"            f\"\\tExecution: {storage_format}On{cfg.Engine.get()}\\n\"\n",
"            f\"\\tIs experimental: {cfg.IsExperimental.get()}\\n\"\n",
"            f\"\\tCores to use by Modin (check that Modin uses all cores on your machine): {cfg.CpuCount.get()}\\n\"\n",
"            f\"\\tIs in debug mode (debug mode may perform slower): {cfg.IsDebug.get()}\"\n",
"        )\n",
"    except ImportError:\n",
"        # for modin versions < 0.8.2\n",
"        pass\n",
"    import modin\n",
"\n",
"    print(f\"\\tModin version: {modin.__version__}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -58,7 +102,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"We will also be importing **stock Pandas as pd** and **Modin as mpd to show differentiation**. You can see importing Modin is simple and **does not require any additional steps.**"
+"We will also be importing **stock Pandas as pandas** and **Modin as pd to show differentiation**. You can see importing Modin is simple and **does not require any additional steps.**"
]
},
{
@@ -612,6 +656,7 @@
"modin_time = time.time() - t1\n",
"\n",
"print(\"Pandas Time(seconds):\",pandas_time,\"\\nModin Time(seconds):\",modin_time)\n",
"verify_and_print_times(pandas_time, modin_time)\n",
Contributor

I don't see a return result from this function for read_csv.

Contributor Author

Where do you expect to see the return result? `verify_and_print_times` returns nothing (`None`); its only effect is to print the timing results to the screen.
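
To illustrate the point, here is a simplified stand-in for `verify_and_print_times` (shortened messages, no Modin imports) together with made-up timings. It is only a sketch showing that the call yields `None` and that everything it produces goes to standard output.

```python
import contextlib
import io


def verify_and_print_times(pandas_time, modin_time):
    # Simplified stand-in for the PR's function: it only prints and returns None.
    if modin_time < pandas_time:
        print(f"Modin was {pandas_time / modin_time:.2f}X faster than stock Pandas!")
        return
    print(
        f"Oops, stock Pandas appears to be {modin_time / pandas_time:.2f}X "
        "faster than Modin in this case."
    )


# Capture stdout so the printed text can be inspected separately from the return value.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = verify_and_print_times(pandas_time=4.0, modin_time=2.0)  # illustrative timings

captured = buffer.getvalue()
print(repr(result))      # None
print(captured, end="")  # Modin was 2.00X faster than stock Pandas!
```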

Contributor

I don't see any print from this function in the notebook (when opening ... -> View file).

Contributor Author

Oh, that's because I haven't saved the new output of the notebook's cells. Rerunning the notebook and saving the new output would probably change the execution times that were printed before, since I would likely be running it on a different version of Modin and on a different machine. @praveenkk123, can I just rerun the notebook on an arbitrary machine and update the execution times? Would that cause any legal problems?
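
One way to spot cells whose output was never saved is to scan the notebook's JSON directly. The sketch below assumes the usual notebook layout (a top-level `cells` list whose entries carry `cell_type`, `source`, and `outputs`, as seen in the diff above); the helper name `cells_missing_output` and the in-memory example notebook are hypothetical.

```python
def cells_missing_output(notebook: dict, needle: str) -> list:
    """Return indices of code cells that mention `needle` but have no saved outputs."""
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a list of lines or a single string; join handles both.
        source = "".join(cell.get("source", []))
        if needle in source and not cell.get("outputs"):
            missing.append(index)
    return missing


# Tiny in-memory notebook standing in for the real file (illustrative data only).
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Some text"]},
        {
            "cell_type": "code",
            "outputs": [],
            "source": ["verify_and_print_times(pandas_time, modin_time)"],
        },
        {
            "cell_type": "code",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["..."]}],
            "source": ["verify_and_print_times(pandas_time, modin_time)"],
        },
    ]
}

print(cells_missing_output(example, "verify_and_print_times"))  # [1]
```

For a real notebook, the dictionary could be obtained with `json.load` on the opened `.ipynb` file.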

"outputDict={\"Pandas\":pandas_time,\"Modin\":modin_time}\n",
"plotter(outputDict)"
]
@@ -693,7 +738,7 @@
}
],
"source": [
-"print(\"Modin was {}X faster than stock Pandas!\".format(round(pandas_time/modin_time, 2)))"
+"verify_and_print_times(pandas_time, modin_time)"
]
},
{
@@ -852,7 +897,7 @@
}
],
"source": [
-"print(\"Modin was {}X faster than stock Pandas!\".format(round(pandas_time/modin_time, 2)))"
+"verify_and_print_times(pandas_time, modin_time)"
]
},
{
@@ -1000,7 +1045,7 @@
}
],
"source": [
-"print(\"Modin was {}X faster than stock Pandas!\".format(round(pandas_time/modin_time, 2)))"
+"verify_and_print_times(pandas_time, modin_time)"
]
},
{