|
162 | 162 | "</div>\n" |
163 | 163 | ], |
164 | 164 | "text/plain": [ |
165 | | - "RayContext(dashboard_url='127.0.0.1:8265', python_version='3.8.13', ray_version='2.2.0', ray_commit='b6af0887ee5f2e460202133791ad941a41f15beb', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937', 'metrics_export_port': 62563, 'gcs_address': '127.0.0.1:62593', 'address': '127.0.0.1:62593', 'dashboard_agent_listen_port': 52365, 'node_id': '4e9cbd35b4e72abbf50b1b6201b666cb5ce50f1aab8c5753b21f2283'})" |
| 165 | + "RayContext(dashboard_url='127.0.0.1:8265', python_version='3.8.13', ray_version='2.2.0', ray_commit='b6af0887ee5f2e460202133791ad941a41f15beb', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566', 'metrics_export_port': 61622, 'gcs_address': '127.0.0.1:54042', 'address': '127.0.0.1:54042', 'dashboard_agent_listen_port': 52365, 'node_id': 'b5a66792d6abaa014788485a35caaa837573e16d451ea93aef504f8d'})" |
166 | 166 | ] |
167 | 167 | }, |
168 | 168 | "execution_count": 3, |
|
187 | 187 | "cell_type": "markdown", |
188 | 188 | "metadata": {}, |
189 | 189 | "source": [ |
190 | | - "## Fetching Cluster Information\n", |
| 190 | + "### Fetching Cluster Information\n", |
191 | 191 | "\n", |
192 | 192 | "Many methods return information:\n", |
193 | 193 | "\n", |
|
214 | 214 | "text": [ |
215 | 215 | "\n", |
216 | 216 | "ray.get_gpu_ids(): []\n", |
217 | | - "ray.nodes(): [{'NodeID': '4e9cbd35b4e72abbf50b1b6201b666cb5ce50f1aab8c5753b21f2283', 'Alive': True, 'NodeManagerAddress': '127.0.0.1', 'NodeManagerHostname': 'Juless-MacBook-Pro-16', 'NodeManagerPort': 60973, 'ObjectManagerPort': 60972, 'ObjectStoreSocketName': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937/sockets/plasma_store', 'RayletSocketName': '/tmp/ray/session_2022-12-31_16-01-17_526291_25937/sockets/raylet', 'MetricsExportPort': 62563, 'NodeName': '127.0.0.1', 'alive': True, 'Resources': {'node:127.0.0.1': 1.0, 'CPU': 4.0, 'memory': 49407454413.0, 'object_store_memory': 2147483648.0}}]\n", |
218 | | - "ray.cluster_resources(): {'CPU': 4.0, 'node:127.0.0.1': 1.0, 'memory': 49407454413.0, 'object_store_memory': 2147483648.0}\n", |
219 | | - "ray.available_resources(): {'memory': 49407454413.0, 'object_store_memory': 2147483648.0, 'CPU': 4.0, 'node:127.0.0.1': 1.0}\n", |
| 217 | + "ray.nodes(): [{'NodeID': 'b5a66792d6abaa014788485a35caaa837573e16d451ea93aef504f8d', 'Alive': True, 'NodeManagerAddress': '127.0.0.1', 'NodeManagerHostname': 'Juless-MacBook-Pro-16', 'NodeManagerPort': 50600, 'ObjectManagerPort': 50599, 'ObjectStoreSocketName': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566/sockets/plasma_store', 'RayletSocketName': '/tmp/ray/session_2023-01-01_08-43-09_914236_61566/sockets/raylet', 'MetricsExportPort': 61622, 'NodeName': '127.0.0.1', 'alive': True, 'Resources': {'node:127.0.0.1': 1.0, 'CPU': 4.0, 'object_store_memory': 2147483648.0, 'memory': 49037236634.0}}]\n", |
| 218 | + "ray.cluster_resources(): {'memory': 49037236634.0, 'object_store_memory': 2147483648.0, 'node:127.0.0.1': 1.0, 'CPU': 4.0}\n", |
| 219 | + "ray.available_resources(): {'memory': 49037236634.0, 'CPU': 4.0, 'object_store_memory': 2147483648.0, 'node:127.0.0.1': 1.0}\n", |
220 | 220 | "\n" |
221 | 221 | ] |
222 | 222 | } |
|
283 | 283 | "cell_type": "markdown", |
284 | 284 | "metadata": {}, |
285 | 285 | "source": [ |
286 | | - "### @ray.method()\n", |
| 286 | + "## @ray.method()\n", |
287 | 287 | "\n", |
288 | 288 | "Related to `@ray.remote()`, [@ray.method()](https://ray.readthedocs.io/en/latest/package-ref.html#ray.method) allows you to specify the number of return values for a method in a task or an actor, by passing the `num_returns` keyword argument. None of the other `@ray.remote()` keyword arguments are allowed. Here is an example:" |
289 | 289 | ] |
|
297 | 297 | "name": "stdout", |
298 | 298 | "output_type": "stream", |
299 | 299 | "text": [ |
300 | | - "(LIONEL MESSIE, 5, 12.100000000000001)\n" |
| 300 | + "(LIONEL MESSIE, 8, 12.100000000000001)\n" |
301 | 301 | ] |
302 | 302 | } |
303 | 303 | ], |
|
333 | 333 | "name": "stdout", |
334 | 334 | "output_type": "stream", |
335 | 335 | "text": [ |
336 | | - "(LIONEL MESSIE, 9, 12.100000000000001)\n" |
| 336 | + "(LIONEL MESSIE, 5, 12.100000000000001)\n" |
337 | 337 | ] |
338 | 338 | } |
339 | 339 | ], |
|
369 | 369 | "name": "stdout", |
370 | 370 | "output_type": "stream", |
371 | 371 | "text": [ |
372 | | - "(LIONEL MESSIE, 10, 12.100000000000001)\n" |
| 372 | + "(LIONEL MESSIE, 7, 12.100000000000001)\n" |
373 | 373 | ] |
374 | 374 | } |
375 | 375 | ], |
|
394 | 394 | "cell_type": "markdown", |
395 | 395 | "metadata": {}, |
396 | 396 | "source": [ |
397 | | - "# Tips and Tricks for first-time users\n", |
| 397 | + "## Tips and Tricks for first-time users\n", |
398 | 398 | "Because Ray's core APIs are simple and flexible, first-time users can trip over certain API calls in Ray's usage patterns. These short tips & tricks will insure you against unexpected results. Below we briefly explore a handful of API calls and their best practices."
399 | 399 | ] |
400 | 400 | }, |
|
438 | 438 | "name": "stdout", |
439 | 439 | "output_type": "stream", |
440 | 440 | "text": [ |
441 | | - "CPU times: user 54.1 ms, sys: 24.9 ms, total: 79 ms\n", |
442 | | - "Wall time: 5.1 s\n" |
| 441 | + "CPU times: user 45 ms, sys: 21.5 ms, total: 66.5 ms\n", |
| 442 | + "Wall time: 5.09 s\n" |
443 | 443 | ] |
444 | 444 | }, |
445 | 445 | { |
|
486 | 486 | "name": "stdout", |
487 | 487 | "output_type": "stream", |
488 | 488 | "text": [ |
489 | | - "CPU times: user 19.6 ms, sys: 11.4 ms, total: 31 ms\n", |
490 | | - "Wall time: 2.31 s\n" |
| 489 | + "CPU times: user 15 ms, sys: 9.86 ms, total: 24.8 ms\n", |
| 490 | + "Wall time: 2.26 s\n" |
491 | 491 | ] |
492 | 492 | }, |
493 | 493 | { |
|
554 | 554 | "name": "stdout", |
555 | 555 | "output_type": "stream", |
556 | 556 | "text": [ |
557 | | - "CPU times: user 144 ms, sys: 195 ms, total: 339 ms\n", |
| 557 | + "CPU times: user 136 ms, sys: 175 ms, total: 311 ms\n", |
558 | 558 | "Wall time: 12.9 s\n" |
559 | 559 | ] |
560 | 560 | }, |
|
603 | 603 | "name": "stdout", |
604 | 604 | "output_type": "stream", |
605 | 605 | "text": [ |
606 | | - "CPU times: user 7.23 s, sys: 2.32 s, total: 9.55 s\n", |
607 | | - "Wall time: 10.8 s\n" |
| 606 | + "CPU times: user 7.22 s, sys: 3.33 s, total: 10.5 s\n", |
| 607 | + "Wall time: 11.9 s\n" |
608 | 608 | ] |
609 | 609 | }, |
610 | 610 | { |
|
638 | 638 | "One way to mitigate is to make the remote tasks \"larger\" in order to amortize invocation overhead. This is achieved by aggregating tasks into bigger chunks of 1000.\n" |
639 | 639 | ] |
640 | 640 | }, |
| 641 | + { |
| 642 | + "cell_type": "markdown", |
| 643 | + "metadata": {}, |
| 644 | + "source": [ |
| 645 | + "#### Bad Usage\n", |
| 646 | + "Avoid launching many tiny tasks, as the per-task scheduling overhead may make them slower than serial execution."
| 647 | + ] |
| 648 | + }, |
641 | 649 | { |
642 | 650 | "cell_type": "code", |
643 | 651 | "execution_count": 16, |
|
658 | 666 | "name": "stdout", |
659 | 667 | "output_type": "stream", |
660 | 668 | "text": [ |
661 | | - "CPU times: user 204 ms, sys: 26.7 ms, total: 230 ms\n", |
662 | | - "Wall time: 3.92 s\n" |
| 669 | + "CPU times: user 223 ms, sys: 33.3 ms, total: 257 ms\n", |
| 670 | + "Wall time: 3.95 s\n" |
663 | 671 | ] |
664 | 672 | } |
665 | 673 | ], |
|
674 | 682 | "cell_type": "markdown", |
675 | 683 | "metadata": {}, |
676 | 684 | "source": [ |
677 | | - "A huge difference in execution time, almost **4X** faster!" |
| 685 | + "A huge difference in execution time, almost **4X** faster!\n", |
| 686 | + "\n", |
| 687 | + "#### Good Usage\n", |
| 688 | + "Break or restructure many small tasks into batches or chunks of larger Ray remote tasks, as demonstrated above.\n",
| 689 | + "\n", |
| 690 | + "#### Takeaway Tip 2:\n",
| 691 | + "Where possible, strive to batch tiny Ray tasks into larger chunks to reap the benefits of distributing them."
678 | 692 | ] |
679 | 693 | }, |
680 | 699 | { |
681 | 700 | "cell_type": "markdown", |
682 | 701 | "metadata": {}, |
|
732 | 751 | }, |
733 | 752 | { |
734 | 753 | "cell_type": "code", |
735 | | - "execution_count": 31, |
| 754 | + "execution_count": 20, |
736 | 755 | "metadata": {}, |
737 | 756 | "outputs": [], |
738 | 757 | "source": [ |
|
780 | 799 | }, |
781 | 800 | { |
782 | 801 | "cell_type": "code", |
783 | | - "execution_count": 32, |
| 802 | + "execution_count": 21, |
784 | 803 | "metadata": {}, |
785 | 804 | "outputs": [ |
786 | 805 | { |
787 | 806 | "name": "stdout", |
788 | 807 | "output_type": "stream", |
789 | 808 | "text": [ |
790 | | - "Duration: 9.13 seconds and predictions: [0, 0, 1, 1, 2, 3]\n", |
791 | | - "CPU times: user 55.2 ms, sys: 30.5 ms, total: 85.7 ms\n", |
792 | | - "Wall time: 9.13 s\n" |
| 809 | + "Duration: 8.96 seconds and predictions: [0, 0, 1, 1, 2, 3]\n", |
| 810 | + "CPU times: user 61.2 ms, sys: 33.2 ms, total: 94.4 ms\n", |
| 811 | + "Wall time: 8.96 s\n" |
793 | 812 | ] |
794 | 813 | } |
795 | 814 | ], |
|
802 | 821 | "print(f\"Duration: {round(time.time() - start, 2)} seconds and predictions: {predictions}\")" |
803 | 822 | ] |
804 | 823 | }, |
| 824 | + { |
| 825 | + "cell_type": "markdown", |
| 826 | + "metadata": {}, |
| 827 | + "source": [ |
| 828 | + "#### Bad Usage\n", |
| 829 | + "Waiting for a large number of tasks to finish by calling `ray.get()` on all of them before processing\n",
| 830 | + "any of the results."
| 831 | + ] |
| 832 | + }, |
805 | 833 | { |
806 | 834 | "cell_type": "markdown", |
807 | 835 | "metadata": {}, |
|
811 | 839 | }, |
812 | 840 | { |
813 | 841 | "cell_type": "code", |
814 | | - "execution_count": 33, |
| 842 | + "execution_count": 22, |
815 | 843 | "metadata": {}, |
816 | 844 | "outputs": [ |
817 | 845 | { |
818 | 846 | "name": "stdout", |
819 | 847 | "output_type": "stream", |
820 | 848 | "text": [ |
821 | | - "Duration: 6.37 seconds and predictions: [0, 1, 3, 1, 0, 2]\n", |
822 | | - "CPU times: user 40.9 ms, sys: 22.9 ms, total: 63.8 ms\n", |
823 | | - "Wall time: 6.37 s\n" |
| 849 | + "Duration: 6.88 seconds and predictions: [0, 1, 0, 1, 2, 3]\n", |
| 850 | + "CPU times: user 50.5 ms, sys: 28.3 ms, total: 78.9 ms\n", |
| 851 | + "Wall time: 6.88 s\n" |
824 | 852 | ] |
825 | 853 | } |
826 | 854 | ], |
|
846 | 874 | "**Notice**: You see some incremental difference. However, for compute-intensive workloads with many tasks, over time this difference will be an order of magnitude."
847 | 875 | ] |
848 | 876 | }, |
| 877 | + { |
| 878 | + "cell_type": "markdown", |
| 879 | + "metadata": {}, |
| 880 | + "source": [ |
| 881 | + "#### Good Usage:\n", |
| 882 | + "For a large number of tasks in flight, use `ray.get()` and `ray.wait()` to implement pipelined execution, processing\n",
| 883 | + "tasks as they finish. \n",
| 884 | + "\n", |
| 885 | + "#### Takeaway Tip 3: \n", |
| 886 | + "Use pipelined execution with `ray.get()` and `ray.wait()` to process results as Ray tasks finish."
| 887 | + ] |
| 888 | + }, |
| 889 | + { |
| 890 | + "cell_type": "markdown", |
| 891 | + "metadata": {}, |
| 892 | + "source": [ |
| 893 | + "#### Exercise for **Tip 3**:\n", |
| 894 | + " * Extend or add more images of sizes: 1024, 2048, ...\n", |
| 895 | + " * Increase the number of returns to 2 in the `ray.wait()` call\n",
| 896 | + " * Process the images\n", |
| 897 | + " \n", |
| 898 | + " \n", |
| 899 | + " Is there a difference in processing time between serial and pipelining?" |
| 900 | + ] |
| 901 | + }, |
849 | 902 | { |
850 | 903 | "cell_type": "markdown", |
851 | 904 | "metadata": {}, |
|
875 | 928 | "name": "stdout", |
876 | 929 | "output_type": "stream", |
877 | 930 | "text": [ |
878 | | - " results = 125005622.08 and duration = 0.703 sec\n" |
| 931 | + " results = 124995931.27 and duration = 0.729 sec\n" |
879 | 932 | ] |
880 | 933 | } |
881 | 934 | ], |
|
889 | 942 | "print(f\" results = {results:.2f} and duration = {time.time() - start:.3f} sec\")" |
890 | 943 | ] |
891 | 944 | }, |
| 945 | + { |
| 946 | + "cell_type": "markdown", |
| 947 | + "metadata": {}, |
| 948 | + "source": [ |
| 949 | + "#### Bad Usage\n", |
| 950 | + "Avoid sending the same large object to multiple Ray remote tasks. This creates multiple copies of the same\n",
| 951 | + "object in the Ray distributed object store. Storing, fetching, and copying identical objects can degrade performance over time."
| 952 | + ] |
| 953 | + }, |
892 | 954 | { |
893 | 955 | "cell_type": "markdown", |
894 | 956 | "metadata": {}, |
|
906 | 968 | "name": "stdout", |
907 | 969 | "output_type": "stream", |
908 | 970 | "text": [ |
909 | | - " results = 124977503.45 and duration = 0.330 sec\n" |
| 971 | + " results = 124998578.35 and duration = 0.418 sec\n" |
910 | 972 | ] |
911 | 973 | } |
912 | 974 | ], |
|
924 | 986 | "cell_type": "markdown", |
925 | 987 | "metadata": {}, |
926 | 988 | "source": [ |
927 | | - "### Exercise\n", |
| 989 | + "#### Good Usage\n", |
| 990 | + "Place the large object into Ray's distributed object store with `ray.put()` and send only the object ref to the Ray remote task.\n",
928 | 991 | "\n", |
929 | | - "For **Tip 3**:\n", |
930 | | - " * Extend or add more images of sizes: 1024, 2048, ...\n", |
931 | | - " * Increase the number of returns to 2 from the `ray.wait`()`\n", |
932 | | - " * Process the images\n", |
933 | | - " \n", |
934 | | - " \n", |
935 | | - " Is there a difference in processing time between serial and pipelining?" |
| 992 | + "#### Takeaway Tip 4:\n", |
| 993 | + "Avoid sending the same large object to multiple Ray remote tasks. Instead, put it into the object store once and send only the object ref."
936 | 994 | ] |
937 | 995 | }, |
938 | 996 | { |
939 | 997 | "cell_type": "code", |
940 | | - "execution_count": null, |
| 998 | + "execution_count": 26, |
941 | 999 | "metadata": {}, |
942 | 1000 | "outputs": [], |
943 | 1001 | "source": [ |
|
950 | 1008 | "source": [ |
951 | 1009 | "### Summary\n", |
952 | 1010 | "\n", |
953 | | - "In this short tutorial, we got a short glimpse at the Ray Core APIs. By no means it was comprehensive, but we touched on some methods we \n", |
954 | | - "have seen in the previous lessons; however, here with those methods, we explored additional arguments to the `.remote()` call such as number of return\n", |
| 1011 | + "In this short tutorial, we got a glimpse of the Ray Core APIs. By no means was it comprehensive, but we touched upon some methods we \n",
| 1012 | + "have seen in the previous lessons. With those methods, we explored additional arguments to the `.remote()` call, such as the number of return\n",
955 | 1013 | "values, as well as how to supply runtime environments and dependencies for your Ray cluster during the `ray.init()` call. Note that some arguments to `ray.init()` \n",
956 | | - "can also be supplied to `ray.remote()` decorator, such as num_cpus, num_gpus, runtime_env, etc. \n", |
| 1014 | + "can also be supplied to `ray.remote()` decorator, such as `num_cpus`, `num_gpus`, `runtime_env`, etc. \n", |
957 | 1015 | "\n", |
958 | 1016 | "More importantly, we walked through some tips and tricks that many developers new to Ray can easily stumble upon. Although the examples were short and simple,\n", |
959 | | - "the idea and cautionary tales are important part of the learning process." |
| 1017 | + "the lessons behind the cautionary tales are important part of the learning process." |
960 | 1018 | ] |
961 | 1019 | }, |
962 | 1020 | { |
|