|
162 | 162 | "</div>\n" |
163 | 163 | ], |
164 | 164 | "text/plain": [ |
165 | | - "RayContext(dashboard_url='127.0.0.1:8265', python_version='3.8.13', ray_version='2.2.0', ray_commit='b6af0887ee5f2e460202133791ad941a41f15beb', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937', 'metrics_export_port': 62563, 'gcs_address': '127.0.0.1:62593', 'address': '127.0.0.1:62593', 'dashboard_agent_listen_port': 52365, 'node_id': '4e9cbd35b4e72abbf50b1b6201b666cb5ce50f1aab8c5753b21f2283'})" |
| 165 | + "RayContext(dashboard_url='127.0.0.1:8265', python_version='3.8.13', ray_version='2.2.0', ray_commit='b6af0887ee5f2e460202133791ad941a41f15beb', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566', 'metrics_export_port': 61622, 'gcs_address': '127.0.0.1:54042', 'address': '127.0.0.1:54042', 'dashboard_agent_listen_port': 52365, 'node_id': 'b5a66792d6abaa014788485a35caaa837573e16d451ea93aef504f8d'})" |
166 | 166 | ] |
167 | 167 | }, |
168 | 168 | "execution_count": 3, |
|
187 | 187 | "cell_type": "markdown", |
188 | 188 | "metadata": {}, |
189 | 189 | "source": [ |
190 | | - "## Fetching Cluster Information\n", |
| 190 | + "### Fetching Cluster Information\n", |
191 | 191 | "\n", |
192 | 192 | "Many methods return information:\n", |
193 | 193 | "\n", |
|
214 | 214 | "text": [ |
215 | 215 | "\n", |
216 | 216 | "ray.get_gpu_ids(): []\n", |
217 | | - "ray.nodes(): [{'NodeID': '4e9cbd35b4e72abbf50b1b6201b666cb5ce50f1aab8c5753b21f2283', 'Alive': True, 'NodeManagerAddress': '127.0.0.1', 'NodeManagerHostname': 'Juless-MacBook-Pro-16', 'NodeManagerPort': 60973, 'ObjectManagerPort': 60972, 'ObjectStoreSocketName': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937/sockets/plasma_store', 'RayletSocketName': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937/sockets/raylet', 'MetricsExportPort': 62563, 'NodeName': '127.0.0.1', 'alive': True, 'Resources': {'node:127.0.0.1': 1.0, 'CPU': 4.0, 'memory': 49407454413.0, 'object_store_memory': 2147483648.0}}]\n", |
218 | | - "ray.cluster_resources(): {'CPU': 4.0, 'node:127.0.0.1': 1.0, 'memory': 49407454413.0, 'object_store_memory': 2147483648.0}\n", |
219 | | - "ray.available_resources(): {'memory': 49407454413.0, 'object_store_memory': 2147483648.0, 'CPU': 4.0, 'node:127.0.0.1': 1.0}\n", |
| 217 | + "ray.nodes(): [{'NodeID': 'b5a66792d6abaa014788485a35caaa837573e16d451ea93aef504f8d', 'Alive': True, 'NodeManagerAddress': '127.0.0.1', 'NodeManagerHostname': 'Juless-MacBook-Pro-16', 'NodeManagerPort': 50600, 'ObjectManagerPort': 50599, 'ObjectStoreSocketName': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566/sockets/plasma_store', 'RayletSocketName': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566/sockets/raylet', 'MetricsExportPort': 61622, 'NodeName': '127.0.0.1', 'alive': True, 'Resources': {'node:127.0.0.1': 1.0, 'CPU': 4.0, 'object_store_memory': 2147483648.0, 'memory': 49037236634.0}}]\n", |
| 218 | + "ray.cluster_resources(): {'memory': 49037236634.0, 'object_store_memory': 2147483648.0, 'node:127.0.0.1': 1.0, 'CPU': 4.0}\n", |
| 219 | + "ray.available_resources(): {'memory': 49037236634.0, 'CPU': 4.0, 'object_store_memory': 2147483648.0, 'node:127.0.0.1': 1.0}\n", |
220 | 220 | "\n" |
221 | 221 | ] |
222 | 222 | } |
|
283 | 283 | "cell_type": "markdown", |
284 | 284 | "metadata": {}, |
285 | 285 | "source": [ |
286 | | - "### @ray.method()\n", |
| 286 | + "## @ray.method()\n", |
287 | 287 | "\n", |
288 | 288 | "Related to `@ray.remote()`, [@ray.method()](https://ray.readthedocs.io/en/latest/package-ref.html#ray.method) allows you to specify the number of return values for a method in a task or an actor, by passing the `num_returns` keyword argument. None of the other `@ray.remote()` keyword arguments are allowed. Here is an example:" |
289 | 289 | ] |
|
297 | 297 | "name": "stdout", |
298 | 298 | "output_type": "stream", |
299 | 299 | "text": [ |
300 | | - "(LIONEL MESSIE, 5, 12.100000000000001)\n" |
| 300 | + "(LIONEL MESSIE, 8, 12.100000000000001)\n" |
301 | 301 | ] |
302 | 302 | } |
303 | 303 | ], |
|
333 | 333 | "name": "stdout", |
334 | 334 | "output_type": "stream", |
335 | 335 | "text": [ |
336 | | - "(LIONEL MESSIE, 9, 12.100000000000001)\n" |
| 336 | + "(LIONEL MESSIE, 5, 12.100000000000001)\n" |
337 | 337 | ] |
338 | 338 | } |
339 | 339 | ], |
|
369 | 369 | "name": "stdout", |
370 | 370 | "output_type": "stream", |
371 | 371 | "text": [ |
372 | | - "(LIONEL MESSIE, 10, 12.100000000000001)\n" |
| 372 | + "(LIONEL MESSIE, 7, 12.100000000000001)\n" |
373 | 373 | ] |
374 | 374 | } |
375 | 375 | ], |
|
394 | 394 | "cell_type": "markdown", |
395 | 395 | "metadata": {}, |
396 | 396 | "source": [ |
397 | | - "# Tips and Tricks for first-time users\n", |
| 397 | + "## Tips and Tricks for first-time users\n", |
398 | 398 | "Because Ray's core APIs are simple and flexible, first-time users can trip over certain API calls in Ray's usage patterns. These short tips & tricks will insure you against unexpected results. Below we briefly explore a handful of API calls and their best practices."
399 | 399 | ] |
400 | 400 | }, |
|
438 | 438 | "name": "stdout", |
439 | 439 | "output_type": "stream", |
440 | 440 | "text": [ |
441 | | - "CPU times: user 54.1 ms, sys: 24.9 ms, total: 79 ms\n", |
442 | | - "Wall time: 5.1 s\n" |
| 441 | + "CPU times: user 45 ms, sys: 21.5 ms, total: 66.5 ms\n", |
| 442 | + "Wall time: 5.09 s\n" |
443 | 443 | ] |
444 | 444 | }, |
445 | 445 | { |
|
486 | 486 | "name": "stdout", |
487 | 487 | "output_type": "stream", |
488 | 488 | "text": [ |
489 | | - "CPU times: user 19.6 ms, sys: 11.4 ms, total: 31 ms\n", |
490 | | - "Wall time: 2.31 s\n" |
| 489 | + "CPU times: user 15 ms, sys: 9.86 ms, total: 24.8 ms\n", |
| 490 | + "Wall time: 2.26 s\n" |
491 | 491 | ] |
492 | 492 | }, |
493 | 493 | { |
|
554 | 554 | "name": "stdout", |
555 | 555 | "output_type": "stream", |
556 | 556 | "text": [ |
557 | | - "CPU times: user 144 ms, sys: 195 ms, total: 339 ms\n", |
| 557 | + "CPU times: user 136 ms, sys: 175 ms, total: 311 ms\n", |
558 | 558 | "Wall time: 12.9 s\n" |
559 | 559 | ] |
560 | 560 | }, |
|
603 | 603 | "name": "stdout", |
604 | 604 | "output_type": "stream", |
605 | 605 | "text": [ |
606 | | - "CPU times: user 7.23 s, sys: 2.32 s, total: 9.55 s\n", |
607 | | - "Wall time: 10.8 s\n" |
| 606 | + "CPU times: user 7.22 s, sys: 3.33 s, total: 10.5 s\n", |
| 607 | + "Wall time: 11.9 s\n" |
608 | 608 | ] |
609 | 609 | }, |
610 | 610 | { |
|
638 | 638 | "One way to mitigate is to make the remote tasks \"larger\" in order to amortize invocation overhead. This is achieved by aggregating tasks into bigger chunks of 1000.\n" |
639 | 639 | ] |
640 | 640 | }, |
| 641 | + { |
| 642 | + "cell_type": "markdown", |
| 643 | + "metadata": {}, |
| 644 | + "source": [ |
| 645 | + "#### Bad Usage\n", |
| 646 | + "Avoid launching many tiny tasks, as the per-task scheduling overhead may make them slower than serial execution."
| 647 | + ] |
| 648 | + }, |
641 | 649 | { |
642 | 650 | "cell_type": "code", |
643 | 651 | "execution_count": 16, |
|
658 | 666 | "name": "stdout", |
659 | 667 | "output_type": "stream", |
660 | 668 | "text": [ |
661 | | - "CPU times: user 204 ms, sys: 26.7 ms, total: 230 ms\n", |
662 | | - "Wall time: 3.92 s\n" |
| 669 | + "CPU times: user 223 ms, sys: 33.3 ms, total: 257 ms\n", |
| 670 | + "Wall time: 3.95 s\n" |
663 | 671 | ] |
664 | 672 | } |
665 | 673 | ], |
|
674 | 682 | "cell_type": "markdown", |
675 | 683 | "metadata": {}, |
676 | 684 | "source": [ |
677 | | - "A huge difference in execution time, almost **4X** faster!" |
| 685 | + "A huge difference in execution time, almost **4X** faster!\n", |
| 686 | + "\n", |
| 687 | + "#### Good Usage\n", |
| 688 | + "Break or restructure many small tasks into batches or chunks of larger Ray remote tasks, as demonstrated above.\n",
| 689 | + "\n", |
| 690 | + "#### Takeaway Tip 2:\n",
| 691 | + "Where possible, strive to batch tiny Ray tasks into larger chunks to reap the benefits of distributing them."
678 | 692 | ] |
679 | 693 | }, |
680 | 699 | { |
681 | 700 | "cell_type": "markdown", |
682 | 701 | "metadata": {}, |
|
732 | 751 | }, |
733 | 752 | { |
734 | 753 | "cell_type": "code", |
735 | | - "execution_count": 31, |
| 754 | + "execution_count": 20, |
736 | 755 | "metadata": {}, |
737 | 756 | "outputs": [], |
738 | 757 | "source": [ |
|
780 | 799 | }, |
781 | 800 | { |
782 | 801 | "cell_type": "code", |
783 | | - "execution_count": 32, |
| 802 | + "execution_count": 21, |
784 | 803 | "metadata": {}, |
785 | 804 | "outputs": [ |
786 | 805 | { |
787 | 806 | "name": "stdout", |
788 | 807 | "output_type": "stream", |
789 | 808 | "text": [ |
790 | | - "Duration: 9.13 seconds and predictions: [0, 0, 1, 1, 2, 3]\n", |
791 | | - "CPU times: user 55.2 ms, sys: 30.5 ms, total: 85.7 ms\n", |
792 | | - "Wall time: 9.13 s\n" |
| 809 | + "Duration: 8.96 seconds and predictions: [0, 0, 1, 1, 2, 3]\n", |
| 810 | + "CPU times: user 61.2 ms, sys: 33.2 ms, total: 94.4 ms\n", |
| 811 | + "Wall time: 8.96 s\n" |
793 | 812 | ] |
794 | 813 | } |
795 | 814 | ], |
|
802 | 821 | "print(f\"Duration: {round(time.time() - start, 2)} seconds and predictions: {predictions}\")" |
803 | 822 | ] |
804 | 823 | }, |
| 824 | + { |
| 825 | + "cell_type": "markdown", |
| 826 | + "metadata": {}, |
| 827 | + "source": [ |
| 828 | + "#### Bad Usage\n", |
| 829 | + "Waiting for a large number of tasks to finish by calling `ray.get()` on all of them before processing\n",
| 830 | + "any of the results."
| 831 | + ] |
| 832 | + }, |
805 | 833 | { |
806 | 834 | "cell_type": "markdown", |
807 | 835 | "metadata": {}, |
|
811 | 839 | }, |
812 | 840 | { |
813 | 841 | "cell_type": "code", |
814 | | - "execution_count": 33, |
| 842 | + "execution_count": 22, |
815 | 843 | "metadata": {}, |
816 | 844 | "outputs": [ |
817 | 845 | { |
818 | 846 | "name": "stdout", |
819 | 847 | "output_type": "stream", |
820 | 848 | "text": [ |
821 | | - "Duration: 6.37 seconds and predictions: [0, 1, 3, 1, 0, 2]\n", |
822 | | - "CPU times: user 40.9 ms, sys: 22.9 ms, total: 63.8 ms\n", |
823 | | - "Wall time: 6.37 s\n" |
| 849 | + "Duration: 6.88 seconds and predictions: [0, 1, 0, 1, 2, 3]\n", |
| 850 | + "CPU times: user 50.5 ms, sys: 28.3 ms, total: 78.9 ms\n", |
| 851 | + "Wall time: 6.88 s\n" |
824 | 852 | ] |
825 | 853 | } |
826 | 854 | ], |
|
846 | 874 | "**Notice**: You see some incremental difference. However, for compute-intensive workloads with many tasks, over time this difference will be an order of magnitude."
847 | 875 | ] |
848 | 876 | }, |
| 877 | + { |
| 878 | + "cell_type": "markdown", |
| 879 | + "metadata": {}, |
| 880 | + "source": [ |
| 881 | + "#### Good Usage:\n", |
| 882 | + "For a large number of tasks in flight, use `ray.get()` and `ray.wait()` to implement pipelined execution, processing\n",
| 883 | + "tasks as they finish. \n",
| 884 | + "\n", |
| 885 | + "#### Takeaway Tip 3: \n", |
| 886 | + "Use pipelined execution with `ray.get()` and `ray.wait()` to process results as Ray tasks finish."
| 887 | + ] |
| 888 | + }, |
| 889 | + { |
| 890 | + "cell_type": "markdown", |
| 891 | + "metadata": {}, |
| 892 | + "source": [ |
| 893 | + "#### Exercise for **Tip 3**:\n", |
| 894 | + " * Extend or add more images of sizes: 1024, 2048, ...\n", |
| 895 | + " * Increase the number of returns to 2 in the `ray.wait()` call\n",
| 896 | + " * Process the images\n", |
| 897 | + " \n", |
| 898 | + " \n", |
| 899 | + " Is there a difference in processing time between serial and pipelining?" |
| 900 | + ] |
| 901 | + }, |
849 | 902 | { |
850 | 903 | "cell_type": "markdown", |
851 | 904 | "metadata": {}, |
|
875 | 928 | "name": "stdout", |
876 | 929 | "output_type": "stream", |
877 | 930 | "text": [ |
878 | | - " results = 125005622.08 and duration = 0.703 sec\n" |
| 931 | + " results = 124995931.27 and duration = 0.729 sec\n" |
879 | 932 | ] |
880 | 933 | } |
881 | 934 | ], |
|
889 | 942 | "print(f\" results = {results:.2f} and duration = {time.time() - start:.3f} sec\")" |
890 | 943 | ] |
891 | 944 | }, |
| 945 | + { |
| 946 | + "cell_type": "markdown", |
| 947 | + "metadata": {}, |
| 948 | + "source": [ |
| 949 | + "#### Bad Usage\n", |
| 950 | + "Avoid sending the same large object to multiple Ray remote tasks. This creates multiple copies of the same\n",
| 951 | + "object in the Ray distributed object store. Storing, fetching, and copying identical objects can degrade performance over time."
| 952 | + ] |
| 953 | + }, |
892 | 954 | { |
893 | 955 | "cell_type": "markdown", |
894 | 956 | "metadata": {}, |
|
906 | 968 | "name": "stdout", |
907 | 969 | "output_type": "stream", |
908 | 970 | "text": [ |
909 | | - " results = 124977503.45 and duration = 0.330 sec\n" |
| 971 | + " results = 124998578.35 and duration = 0.418 sec\n" |
910 | 972 | ] |
911 | 973 | } |
912 | 974 | ], |
|
924 | 986 | "cell_type": "markdown", |
925 | 987 | "metadata": {}, |
926 | 988 | "source": [ |
927 | | - "### Exercise\n", |
| 989 | + "#### Good Usage\n", |
| 990 | + "Place the large object into Ray's distributed object store with `ray.put()` and send only the object ref to the Ray remote task.\n",
928 | 991 | "\n", |
929 | | - "For **Tip 3**:\n", |
930 | | - " * Extend or add more images of sizes: 1024, 2048, ...\n", |
931 | | - " * Increase the number of returns to 2 from the `ray.wait`()`\n", |
932 | | - " * Process the images\n", |
933 | | - " \n", |
934 | | - " \n", |
935 | | - " Is there a difference in processing time between serial and pipelining?" |
| 992 | + "#### Takeaway Tip 4:\n", |
| 993 | + "Avoid sending the same large object to multiple Ray remote tasks. Instead, put it into the object store once and send only the object ref."
936 | 994 | ] |
937 | 995 | }, |
938 | 996 | { |
939 | 997 | "cell_type": "code", |
940 | | - "execution_count": null, |
| 998 | + "execution_count": 26, |
941 | 999 | "metadata": {}, |
942 | 1000 | "outputs": [], |
943 | 1001 | "source": [ |
|
950 | 1008 | "source": [ |
951 | 1009 | "### Summary\n", |
952 | 1010 | "\n", |
953 | | - "In this short tutorial, we got a short glimpse at the Ray Core APIs. By no means it was comprehensive, but we touched on some methods we \n", |
954 | | - "have seen in the previous lessons; however, here with those methods, we explored additional arguments to the `.remote()` call such as number of return\n", |
| 1011 | + "In this short tutorial, we got a glimpse of the Ray Core APIs. By no means was it comprehensive, but we touched upon some methods we \n",
| 1012 | + "have seen in the previous lessons. With those methods, we explored additional arguments to the `.remote()` call, such as the number of return\n",
955 | 1013 | "values, as well as how to supply runtime environments and dependencies for your Ray cluster during the `ray.init()` call. Note that some arguments to `ray.init()` \n",
956 | | - "can also be supplied to `ray.remote()` decorator, such as num_cpus, num_gpus, runtime_env, etc. \n", |
| 1014 | + "can also be supplied to `ray.remote()` decorator, such as `num_cpus`, `num_gpus`, `runtime_env`, etc. \n", |
957 | 1015 | "\n", |
958 | 1016 | "More importantly, we walked through some tips and tricks that many developers new to Ray can easily stumble upon. Although the examples were short and simple,\n", |
959 | | - "the idea and cautionary tales are important part of the learning process." |
| 1017 | + "the lessons behind the cautionary tales are important part of the learning process." |
960 | 1018 | ] |
961 | 1019 | }, |
962 | 1020 | { |
|