[MODIN]: Add time results verifier for getting started notebook #843
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -47,6 +47,50 @@ | |
| "import time" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# ****** Do not change the code below! It verifies that the notebook is being run correctly! ******\n", | ||
| "\n", | ||
| "def verify_and_print_times(pandas_time, modin_time):\n", | ||
| " if modin_time < pandas_time:\n", | ||
| " print(f\"Modin was {pandas_time / modin_time:.2f}X faster than stock Pandas!\")\n", | ||
| " return\n", | ||
| " print(\n", | ||
| " f\"Oops, stock Pandas appears to be {modin_time / pandas_time:.2f}X faster than Modin in this case. \"\n", | ||
| " \"This is unlikely but could happen sometimes on certain machines/environments/datasets. \"\n", | ||
| " \"One of the most probable reasons is the excessive amount of partitions being assigned to a single worker. \"\n", | ||
| " \"You may visit Modin's optimization guide in order to learn more about such cases and how to fix them: \"\n", | ||
| " \"\\nhttps://modin.readthedocs.io/en/latest/usage_guide/optimization_notes/index.html\\n\\n\"\n", | ||
| " \"But first, verify that you're using the latest Modin version, also, try to use different executions, \"\n", | ||
| " \"for basic usage we recommend non-experimental 'PandasOnRay'\\n\"\n", | ||
| " \"Current execution is:\"\n", | ||
| " )\n", | ||
| " try:\n", | ||
| " import modin.config as cfg\n", | ||
| "\n", | ||
| " try:\n", | ||
| " storage_format = cfg.StorageFormat.get()\n", | ||
| " except AttributeError:\n", | ||
| " # for modin versions < 0.12\n", | ||
| " storage_format = cfg.Backend.get()\n", | ||
| " print(\n", | ||
| " f\"\\tExecution: {storage_format}On{cfg.Engine.get()}\\n\"\n", | ||
| " f\"\\tIs experimental: {cfg.IsExperimental.get()}\\n\"\n", | ||
| " f\"\\tCores to use by Modin (check that Modin uses all cores on your machine): {cfg.CpuCount.get()}\\n\"\n", | ||
| " f\"\\tIs in debug mode (debug mode may perform slower): {cfg.IsDebug.get()}\"\n", | ||
| " )\n", | ||
| " except ImportError:\n", | ||
| " # for modin versions < 0.8.2\n", | ||
| " pass\n", | ||
| " import modin\n", | ||
| "\n", | ||
| " print(f\"\\tModin version: {modin.__version__}\") " | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
|
|
@@ -58,7 +102,7 @@ | |
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "We will also be importing **stock Pandas as pd** and **Modin as mpd to show differentiation**. You can see importing Modin is simple and **does not require any additional steps.**" | ||
| "We will also be importing **stock Pandas as pandas** and **Modin as pd to show differentiation**. You can see importing Modin is simple and **does not require any additional steps.**" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -612,6 +656,7 @@ | |
| "modin_time = time.time() - t1\n", | ||
| "\n", | ||
| "print(\"Pandas Time(seconds):\",pandas_time,\"\\nModin Time(seconds):\",modin_time)\n", | ||
| "verify_and_print_times(pandas_time, modin_time)\n", | ||
|
Contributor
I don't see a return result from this function for read_csv.
Contributor
Author
Where do you expect to see the return result?
Contributor
I don't see any print from this function in the notebook (when opening
Contributor
Author
Oh, that's because I haven't saved the new output of the notebook's cells. Rerunning the notebook and saving the new output would probably change the execution times that were printed before, since I would probably be running the notebook on a different version of Modin and on a different machine. @praveenkk123, can I just rerun the notebook on an arbitrary machine and update the execution times? Wouldn't this cause any legal problems?
| "outputDict={\"Pandas\":pandas_time,\"Modin\":modin_time}\n", | ||
| "plotter(outputDict)" | ||
| ] | ||
|
|
@@ -693,7 +738,7 @@ | |
| } | ||
| ], | ||
| "source": [ | ||
| "print(\"Modin was {}X faster than stock Pandas!\".format(round(pandas_time/modin_time, 2)))" | ||
| "verify_and_print_times(pandas_time, modin_time)" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -852,7 +897,7 @@ | |
| } | ||
| ], | ||
| "source": [ | ||
| "print(\"Modin was {}X faster than stock Pandas!\".format(round(pandas_time/modin_time, 2)))" | ||
| "verify_and_print_times(pandas_time, modin_time)" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -1000,7 +1045,7 @@ | |
| } | ||
| ], | ||
| "source": [ | ||
| "print(\"Modin was {}X faster than stock Pandas!\".format(round(pandas_time/modin_time, 2)))" | ||
| "verify_and_print_times(pandas_time, modin_time)" | ||
| ] | ||
| }, | ||
| { | ||
|
|
We can use `get_current_execution` to simplify the code.
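
A minimal sketch of how the `get_current_execution` suggestion might look, assuming the helper is importable from `modin.utils` in the installed Modin version (that import path is an assumption about the release in use, not something this PR confirms). The names `describe_execution` and `speedup_message` are hypothetical and not part of the PR; the message is returned rather than printed only to make the logic easy to check.

```python
def describe_execution():
    """Return the current Modin execution name, or a placeholder if unavailable."""
    try:
        # Assumption: this helper exists in the installed Modin version.
        from modin.utils import get_current_execution

        return get_current_execution()
    except ImportError:
        # Modin (or one of its engines) is missing, or too old to provide the helper.
        return "unknown"


def speedup_message(pandas_time, modin_time):
    """Build the timing comparison message for the two measured durations."""
    if modin_time < pandas_time:
        return f"Modin was {pandas_time / modin_time:.2f}X faster than stock Pandas!"
    return (
        f"Stock Pandas appears to be {modin_time / pandas_time:.2f}X faster than Modin. "
        f"Current execution: {describe_execution()}"
    )


# Example with made-up timings (not measured results).
print(speedup_message(pandas_time=2.0, modin_time=0.5))
```

With a single call replacing the nested `try`/`except` around `StorageFormat` and `Backend`, the fallback for older Modin versions collapses to one `ImportError` branch, at the cost of not reporting the execution at all on versions that lack the helper.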