Commit 38b1f88

authored
fixed typo in oneAPI essentials jupyter notebooks (#931)
1 parent 493ceb9 commit 38b1f88

File tree

9 files changed

+67
-61
lines changed


DirectProgramming/DPC++/Jupyter/oneapi-essentials-training/01_oneAPI_Intro/oneAPI_Intro.ipynb

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -169,7 +169,7 @@
169169
"metadata": {},
170170
"source": [
171171
"## What is Data Parallel C++\n",
172-
"__Data Parallel C++ (DPC++)__ is oneAPI's implementation of SYCL compiler. It takes advantage of modern C++ productivity benefits and familiar constructs, and incorporates the __SYCL*__ standard for data parallelism and heterogeneous programming. DPC++ is a __single source__ language where host code and __heterogeneous accelerator kernels__ can be mixed in same source files. A DPC++ program is invoked on the host computer and offloads the computation to an accelerator. Programmers use familiar C++ and library constructs with added functionliaties like a __queue__ for work targeting, __buffer__ for data management, and __parallel_for__ for parallelism to direct which parts of the computation and data should be offloaded."
172+
"__Data Parallel C++ (DPC++)__ is oneAPI's implementation of SYCL compiler. It takes advantage of modern C++ productivity benefits and familiar constructs, and incorporates the __SYCL*__ standard for data parallelism and heterogeneous programming. DPC++ is a __single source__ language where host code and __heterogeneous accelerator kernels__ can be mixed in same source files. A DPC++ program is invoked on the host computer and offloads the computation to an accelerator. Programmers use familiar C++ and library constructs with added functionalities like a __queue__ for work targeting, __buffer__ for data management, and __parallel_for__ for parallelism to direct which parts of the computation and data should be offloaded."
173173
]
174174
},
175175
{
@@ -185,7 +185,7 @@
185185
"metadata": {},
186186
"source": [
187187
"## HPC Single Node Workflow with oneAPI \n",
188-
"Accelerated code can be written in either a kernel (SYCL) or __directive based style__. Developers can use the __Intel® DPC++ Compatibility tool__ to perform a one-time migration from __CUDA__ to __SYCL__. Existing __Fortran__ applications can use a __directive style based on OpenMP__. Existing __C++__ applications can choose either the __Kernel style__ or the __directive based style option__ and existing __OpenCL__ applications can remain in the OpenCL language or migrate to Data Parallel C++.\n",
188+
"Accelerated code can be written in either a kernel (SYCL) or __directive-based style__. Developers can use the __Intel® DPC++ Compatibility tool__ to perform a one-time migration from __CUDA__ to __SYCL__. Existing __Fortran__ applications can use a __directive-based style in OpenMP__. Existing __C++__ applications can choose either the __Kernel style__ or the __directive-based style option__ and existing __OpenCL__ applications can remain in the OpenCL language or migrate to Data Parallel C++.\n",
189189
"\n",
190190
"__Intel® Advisor__ is recommended to __Optimize__ the design for __vectorization and memory__ (CPU and GPU) and __Identify__ loops that are candidates for __offload__ and project the __performance on target accelerators.__\n",
191191
"\n",
@@ -285,7 +285,7 @@
285285
" \n",
286286
"#### Compiling and Running on Intel® DevCloud:\n",
287287
" \n",
288-
"For this training, we have written a script (q) to aid developers in developing projects on DevCloud. This script submits the `run.sh` script to a gpu node on DevCloud for execution, waits for the job to complete and prints out the output/errors. We will be using this command to run on DevCloud: `./q run.sh`\n",
288+
"For this training, we have written a script (q) to aid developers in developing projects on DevCloud. This script submits the `run.sh` script to a GPU node on DevCloud for execution, waits for the job to complete and prints out the output/errors. We will be using this command to run on DevCloud: `./q run.sh`\n",
289289
"\n",
290290
"\n",
291291
"\n",

DirectProgramming/DPC++/Jupyter/oneapi-essentials-training/02_DPCPP_Program_Structure/DPCPP_Program_Structure.ipynb

Lines changed: 5 additions & 6 deletions
@@ -43,7 +43,7 @@
4343
"metadata": {},
4444
"source": [
4545
"## What is Data Parallel C++ and SYCL?\n",
46-
"__Data Parallel C++ (DPC++)__ is oneAPI's implementation of SYCL. It is based on modern C++ productivity benefits and familiar constructs and incorporates the __SYCL__ standard for data parallelism and heterogeneous programming. SYCL is a __single source__ where __host code__ and __heterogeneous accelerator kernels__ can be mixed in same source files. A SYCL program is invoked on the host computer and offloads the computation to an accelerator. Programmers use familiar C++ and library constructs with added functionliaties like a __queue__ for work targeting, __buffer__ for data management, and __parallel_for__ for parallelism to direct which parts of the computation and data should be offloaded."
46+
"__Data Parallel C++ (DPC++)__ is oneAPI's implementation of SYCL. It is based on modern C++ productivity benefits and familiar constructs and incorporates the __SYCL__ standard for data parallelism and heterogeneous programming. SYCL is a __single source__ where __host code__ and __heterogeneous accelerator kernels__ can be mixed in same source files. A SYCL program is invoked on the host computer and offloads the computation to an accelerator. Programmers use familiar C++ and library constructs with added functionalities like a __queue__ for work targeting, __buffer__ for data management, and __parallel_for__ for parallelism to direct which parts of the computation and data should be offloaded."
4747
]
4848
},
4949
{
@@ -156,7 +156,7 @@
156156
"metadata": {},
157157
"source": [
158158
"## Kernel\n",
159-
"The __kernel__ class encapsulates methods and data for executing code on the device when a command group is instantiated. Kernel object is not explicitly constructed by the user and is is constructed when a kernel dispatch function, such as __parallel_for__, is called \n",
159+
"The __kernel__ class encapsulates methods and data for executing code on the device when a command group is instantiated. Kernel object is not explicitly constructed by the user and is constructed when a kernel dispatch function, such as __parallel_for__, is called \n",
160160
" ```cpp\n",
161161
" q.submit([&](handler& h) {\n",
162162
"  h.parallel_for(range<1>(N), [=](id<1> i) {\n",
@@ -199,7 +199,7 @@
199199
"## SYCL Language and Runtime\n",
200200
"SYCL language and runtime consists of a set of C++ classes, templates, and libraries.\n",
201201
"\n",
202-
" __Application scope__ and __command group scope__ :\n",
202+
" __Application scope__ and __command group scope__:\n",
203203
" * Code that executes on the host\n",
204204
" * The full capabilities of C++ are available at application and command group scope \n",
205205
"\n",
@@ -247,7 +247,7 @@
247247
"});\n",
248248
"\n",
249249
"```\n",
250-
"The above example is good if all you need is the __index (id)__, but if you need the __range__ value in your kernel code, then you can use __item__ class instead of __id__ class , which you can use to query for the __range__ as shown below. __item__ class represents an __individual instance__ of a kernel function, exposes additional functions to query properties of the execution range\n",
250+
"The above example is good if all you need is the __index (id)__, but if you need the __range__ value in your kernel code, then you can use __item__ class instead of __id__ class, which you can use to query for the __range__ as shown below. __item__ class represents an __individual instance__ of a kernel function, exposes additional functions to query properties of the execution range\n",
251251
"\n",
252252
"\n",
253253
"```cpp\n",
@@ -871,7 +871,6 @@
871871
{
872872
"cell_type": "markdown",
873873
"metadata": {
874-
"jp-MarkdownHeadingCollapsed": true,
875874
"tags": []
876875
},
877876
"source": [
@@ -883,7 +882,7 @@
883882
"- Create a new second `vector2` and initialize to value 20.\n",
884883
"- Create sycl buffers for the above second vector\n",
885884
"- In the kernel code, create a second accessor for the second vector buffer\n",
886-
"- Modify the vector increment to vector add, byt adding `vector2` to `vector1`\n",
885+
"- Modify the vector increment to vector add, by adding `vector2` to `vector1`\n",
887886
"\n",
888887
"1. Edit the code cell below by following the steps and then click run ▶ to save the code to a file.\n",
889888
"2. Next run ▶ the cell in the __Build and Run__ section below the code to compile and execute the code."

DirectProgramming/DPC++/Jupyter/oneapi-essentials-training/03_DPCPP_Unified_Shared_Memory/Unified_Shared_Memory.ipynb

Lines changed: 5 additions & 5 deletions
@@ -49,7 +49,7 @@
4949
"cell_type": "markdown",
5050
"metadata": {},
5151
"source": [
52-
"Unified Shared Memory (USM) is a pointer based memory management in SYCL. USM is a\n",
52+
"Unified Shared Memory (USM) is a pointer-based memory management in SYCL. USM is a\n",
5353
"__pointer-based approach__ that should be familiar to C and C++ programmers who use malloc\n",
5454
"or new to allocate data. USM __simplifies development__ for the programmer when __porting existing\n",
5555
"C/C++ code__ to SYCL."
@@ -115,7 +115,7 @@
115115
"metadata": {},
116116
"source": [
117117
"__USM Initialization__:\n",
118-
"The initialization below shows example of shared allocation using `malloc_shared`, the \"q\" queue parameter provides information about the device that memory is accessable.\n",
118+
"The initialization below shows example of shared allocation using `malloc_shared`, the \"q\" queue parameter provides information about the device that memory is accessible.\n",
119119
"```cpp\n",
120120
"int *data = malloc_shared<int>(N, q);\n",
121121
" ^ ^\n",
@@ -320,11 +320,11 @@
320320
"cell_type": "markdown",
321321
"metadata": {},
322322
"source": [
323-
"When using unified shared memory, dependences between tasks must be specified using events since tasks execute asynchronously and mulitple tasks can execute simultaneously. \n",
323+
"When using unified shared memory, dependences between tasks must be specified using events since tasks execute asynchronously and multiple tasks can execute simultaneously. \n",
324324
"\n",
325325
"Programmers may either explicitly <code>wait</code> on event objects or use the <code>depends_on</code> method inside a command group to specify a list of events that must complete before a task may begin.\n",
326326
"\n",
327-
"In the example below, the two kernel tasks are updating the same `data` array, these two kernels can execute simultanously and may cause undesired result. The first task must be complete before the second can begin, the next section will show different ways the data dependency can be resolved.\n",
327+
"In the example below, the two kernel tasks are updating the same `data` array, these two kernels can execute simultaneously and may cause undesired result. The first task must be complete before the second can begin, the next section will show different ways the data dependency can be resolved.\n",
328328
"```cpp\n",
329329
" q.parallel_for(range<1>(N), [=](id<1> i) { data[i] += 2; });\n",
330330
"\n",
@@ -475,7 +475,7 @@
475475
"cell_type": "markdown",
476476
"metadata": {},
477477
"source": [
478-
"The code below uses USM and has three kernels that are submitted to device. The first two kernels modify two different memeory objects and the third one has a dependency on the first two. There is no data dependency between the three queue submissions, so the code can be fixed to get desired output of 25.\n",
478+
"The code below uses USM and has three kernels that are submitted to device. The first two kernels modify two different memory objects and the third one has a dependency on the first two. There is no data dependency between the three queue submissions, so the code can be fixed to get desired output of 25.\n",
479479
"\n",
480480
"- Implementing **depends_on()** method gets the best performance\n",
481481
"- Using **in_order** queue property or **wait()** will get results but not the most efficient\n",

DirectProgramming/DPC++/Jupyter/oneapi-essentials-training/04_DPCPP_Sub_Groups/Sub_Groups.ipynb

Lines changed: 12 additions & 12 deletions
@@ -38,7 +38,7 @@
3838
"metadata": {},
3939
"source": [
4040
"- Understand advantages of using Subgroups in SYCL\n",
41-
"- Take advantage of Subgroup collectives in ND-Range kernel implementation\n",
41+
"- Take advantage of Subgroup algorithms for performance and productivity\n",
4242
"- Use Subgroup Shuffle operations to avoid explicit memory operations"
4343
]
4444
},
@@ -158,7 +158,7 @@
158158
"cell_type": "markdown",
159159
"metadata": {},
160160
"source": [
161-
"Once you have the subgroup handle, you can query for more information about the subgroup, do shuffle operations or use collective functions."
161+
"Once you have the subgroup handle, you can query for more information about the subgroup, do shuffle operations or use group algorithm."
162162
]
163163
},
164164
{
@@ -280,7 +280,7 @@
280280
"cell_type": "markdown",
281281
"metadata": {},
282282
"source": [
283-
"For tuning applications for performance, sub-group size may have to be set a specific value. For example Intel(R) GPU supports sub-groups sizes of 8, 16 and 32; by default the compiler implimentation will pick optimal sub-group size, but it can also be forced to use a specific value.\n",
283+
"For tuning applications for performance, sub-group size may have to be set a specific value. For example, Intel(R) GPU supports sub-groups sizes of 8, 16 and 32; by default the compiler implementation will pick optimal sub-group size, but it can also be forced to use a specific value.\n",
284284
"\n",
285285
"The supported sub-group sizes for a GPU can be queried from device information as shown below:\n",
286286
"\n",
@@ -405,7 +405,7 @@
405405
"Providing these implementations as library functions instead __increases developer productivity__ and gives implementations the ability to __generate highly optimized \n",
406406
"code__ for individual target devices.\n",
407407
"\n",
408-
"Below are some of the group algorithms available for sub-groups, they include useful fuctionalities to perform shuffles, reductions, scans and votes:\n",
408+
"Below are some of the group algorithms available for sub-groups, they include useful functionalities to perform shuffles, reductions, scans and votes:\n",
409409
"\n",
410410
"- select_by_group\n",
411411
"- shift_group_left\n",
@@ -470,7 +470,7 @@
470470
"cell_type": "markdown",
471471
"metadata": {},
472472
"source": [
473-
"The code below uses subgroup shuffle to swap items in a subgroup. You can try other shuffle operations or change the fixed constant in the shuffle function to express some common commuinication patterns using `permute_group_by_xor`.\n",
473+
"The code below uses subgroup shuffle to swap items in a subgroup. You can try other shuffle operations or change the fixed constant in the shuffle function to express some common communication patterns using `permute_group_by_xor`.\n",
474474
"\n",
475475
"The SYCL code below demonstrates sub-group shuffle operations, the code shows how `permute_group_by_xor` can be used to swap adjacent elements in sub-group, and also you can change the code to reverse the order of element in sub-group using a different mask.\n",
476476
"\n",
@@ -561,15 +561,15 @@
561561
" h.parallel_for(nd_range<1>(N,B), [=](nd_item<1> item){\n",
562562
"      auto sg = item.get_sub_group();\n",
563563
"      auto i = item.get_global_id(0);\n",
564-
"      /* Reduction Collective on Sub-group */\n",
564+
"      /* Reduction algorithm on Sub-group */\n",
565565
"      int result = reduce_over_group(sg, data[i], plus<>());\n",
566566
"      //int result = reduce_over_group(sg, data[i], maximum<>());\n",
567567
"      //int result = reduce_over_group(sg, data[i], minimum<>());\n",
568568
" });\n",
569569
"\n",
570570
"```\n",
571571
"\n",
572-
"The SYCL code below demonstrates sub-group collectives: Inspect code, you can change the operator \"_plus_\" to \"_maximum_\" or \"_minimum_\" and check output:\n",
572+
"The SYCL code below demonstrates sub-group algorithm: Inspect code, you can change the operator \"_plus_\" to \"_maximum_\" or \"_minimum_\" and check output:\n",
573573
"\n",
574574
"1. Inspect the code cell below and click run ▶ to save the code to file.\n",
575575
"\n",
@@ -608,7 +608,7 @@
608608
" auto sg = item.get_sub_group();\n",
609609
" auto i = item.get_global_id(0);\n",
610610
"\n",
611-
" //# Add all elements in sub_group using sub_group collectives\n",
611+
" //# Add all elements in sub_group using sub_group algorithm\n",
612612
" int result = reduce_over_group(sg, data[i], plus<>());\n",
613613
"\n",
614614
" //# write sub_group sum in first location for each sub_group\n",
@@ -655,7 +655,7 @@
655655
"cell_type": "markdown",
656656
"metadata": {},
657657
"source": [
658-
"The code below uses subgroup collectives `group_broadcast` function, this enables one work-item in a group to share the value of a variable with all other work-items in the group.\n",
658+
"The code below uses subgroup algorithm `group_broadcast` function, this enables one work-item in a group to share the value of a variable with all other work-items in the group.\n",
659659
"\n",
660660
"The SYCL code below demonstrates sub-group broadcast function: Inspect code, there are no modifications necessary:\n",
661661
"\n",
@@ -742,7 +742,7 @@
742742
"“vote” functions) enable work-items to compare the result of a Boolean\n",
743743
"condition across their group.\n",
744744
"\n",
745-
"The SYCL code below demonstrates sub-group collectives `any_of_group`, `all_of_group` and `none_of_group` functions: Inspect code, there are no modifications necessary:\n",
745+
"The SYCL code below demonstrates sub-group algorithms `any_of_group`, `all_of_group` and `none_of_group` functions: Inspect code, there are no modifications necessary:\n",
746746
"\n",
747747
"1. Inspect the code cell below and click run ▶ to save the code to file.\n",
748748
"\n",
@@ -839,10 +839,10 @@
839839
"cell_type": "markdown",
840840
"metadata": {},
841841
"source": [
842-
"Complete the coding excercise below using Sub-Group concepts:\n",
842+
"Complete the coding exercise below using Sub-Group concepts:\n",
843843
"- The code has an array `data` of size `N=1024` elements initialized\n",
844844
"- We will offload kernel task to compute the sum of all items in each sub-group and save in new array `sg_data`\n",
845-
"- We will set a the sub-group size to `S=32`, which will make the `sg_data` array of size `N/S`\n",
845+
"- We will set the sub-group size to `S=32`, which will make the `sg_data` array of size `N/S`\n",
846846
"- Create USM shared allocation for `data` and `sg_data`\n",
847847
"- Create a nd-range kernel task with fixed sub-group size of `S`\n",
848848
"- In the kernel task, compute the sub-group sum using `reduce_over_group` function\n",
