Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
cmake_minimum_required (VERSION 3.0)
project (ISO3DFD_DPCPP)
if(WIN32)
set(CMAKE_CXX_COMPILER "dpcpp-cl")
else()
set(CMAKE_CXX_COMPILER "dpcpp")
endif()
add_subdirectory (src)
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Copyright 2020 Intel Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
145 changes: 145 additions & 0 deletions DirectProgramming/DPC++/StructuredGrids/iso3dfd_dpcpp/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
# `ISO3DFD DPC++` Sample

The ISO3DFD sample refers to Three-Dimensional Finite-Difference Wave Propagation in Isotropic Media. It is a three-dimensional stencil to simulate a wave propagating in a 3D isotropic medium and shows some of the more common challenges when targeting SYCL devices (GPU/CPU) in more complex applications.

For comprehensive instructions regarding DPC++ Programming, go to https://software.intel.com/en-us/oneapi-programming-guide and search based on relevant terms noted in the comments.

| Optimized for | Description
|:--- |:---
| OS | Linux* Ubuntu* 18.04; Windows 10
| Hardware | Skylake with GEN9 or newer
| Software | Intel® oneAPI DPC++ Compiler beta
| What you will learn | How to offload the computation to GPU using Intel DPC++ compiler
| Time to complete | 15 minutes

Performance number tabulation [if applicable -- **NO for beta**]

| iso3dfd sample | Performance data
|:--- |:---
| Scalar baseline -O2 | 1.0
| SYCL | 2x speedup


## Purpose

ISO3DFD is a finite-difference stencil kernel for solving the 3D acoustic isotropic wave equation, which can be used as a proxy for propagating a seismic wave. Kernels in this sample are implemented as 16th order in space, with symmetric coefficients, and 2nd order in time, without boundary conditions. Using Data Parallel C++, the sample can explicitly run on the GPU and/or CPU to propagate a seismic wave, which is a compute-intensive task.
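
As a rough illustration of the scheme described above, here is a minimal serial sketch of one time step in one dimension rather than three. The function name `StepWave1D`, the meaning assigned to `vel` (a per-point factor such as velocity squared times `dt` squared), and the choice to leave edge points untouched are assumptions made for this sketch; it is not the sample's kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One time step of a 1D analogue of the stencil (illustrative only).
// `prev` holds the wavefield at time t. `next` holds time t-1 on entry and
// time t+1 on return. `coeff` holds radius+1 symmetric stencil weights:
// coeff[0] for the centre point, coeff[r] for both neighbours at distance r.
// `vel` is assumed to hold a per-point factor (e.g. velocity^2 * dt^2).
void StepWave1D(std::vector<float> &next, const std::vector<float> &prev,
                const std::vector<float> &vel,
                const std::vector<float> &coeff) {
  assert(!coeff.empty());
  assert(next.size() == prev.size() && vel.size() == prev.size());
  const std::size_t radius = coeff.size() - 1;
  const std::size_t n = prev.size();
  // Points closer than `radius` to either end are skipped, so no boundary
  // condition is applied.
  for (std::size_t i = radius; i + radius < n; ++i) {
    float laplacian = coeff[0] * prev[i];
    for (std::size_t r = 1; r <= radius; ++r) {
      laplacian += coeff[r] * (prev[i + r] + prev[i - r]);
    }
    // Second order in time: combine the two previous time levels.
    next[i] = 2.0f * prev[i] - next[i] + laplacian * vel[i];
  }
}
```

With a radius of 8, `coeff` would hold nine weights, which corresponds to the 16th-order spatial stencil mentioned above. The 3D version applies the same neighbour sum along each of the three axes.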

The code first attempts to execute on an available GPU and falls back to the system's CPU if a compatible GPU is not detected. By default, the output prints the name of the device where the DPC++ code ran, along with the grid computation metrics (flops and effective throughput). To validate results, a serial version of the application is run on the CPU and its results are compared to those of the DPC++ version.
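
The validation step can be pictured as a point-by-point comparison of the two wavefields. The sketch below is a simplified stand-in, not the sample's `WithinEpsilon` function: the name `FieldsMatch`, the x-fastest memory layout, and the decision to compare only points at least `radius` away from every face are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when every interior point of `output` is within `delta` of
// the matching point in `reference`. Layout is assumed to be x-fastest:
// index = x + dim_x * (y + dim_y * z).
bool FieldsMatch(const std::vector<float> &output,
                 const std::vector<float> &reference, std::size_t dim_x,
                 std::size_t dim_y, std::size_t dim_z, std::size_t radius,
                 float delta) {
  const std::size_t total = dim_x * dim_y * dim_z;
  if (output.size() != total || reference.size() != total) return false;
  for (std::size_t z = radius; z + radius < dim_z; ++z) {
    for (std::size_t y = radius; y + radius < dim_y; ++y) {
      for (std::size_t x = radius; x + radius < dim_x; ++x) {
        const std::size_t idx = x + dim_x * (y + dim_y * z);
        const float diff = std::fabs(output[idx] - reference[idx]);
        // Written as a negated <= so that a NaN difference also fails.
        if (!(diff <= delta)) return false;
      }
    }
  }
  return true;
}
```

A tolerance is used instead of exact equality because the device and the host may order floating-point operations differently.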


## Key Implementation Details

The basic DPC++ implementation explained in the code includes the use of the following:
* DPC++ local buffers and accessors (declare local memory buffers and accessors to be accessed and managed by each DPC++ workgroup)
* Code for Shared Local Memory (SLM) optimizations
* DPC++ kernels (including the `parallel_for` function and `nd_range<3>` objects)
* DPC++ queues (including custom device selector and exception handlers)
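
To give a feel for how a custom device selector works, the following sketch mimics the rating logic in plain C++ without the SYCL runtime: each candidate gets a score, and the highest non-negative score wins. `RateDevice` and `PickDevice` are hypothetical names, devices are represented only by their name strings, and the tie-breaking rule (first candidate wins) is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Scores a device name: a name containing `pattern` is strongly preferred,
// any other device is still acceptable with a low score.
int RateDevice(const std::string &name, const std::string &pattern) {
  return (name.find(pattern) != std::string::npos) ? 100 : 1;
}

// Returns the index of the highest-rated name, or -1 when no candidate has
// a non-negative score (the case in which a real runtime would report an
// error instead of selecting a device).
int PickDevice(const std::vector<std::string> &names,
               const std::string &pattern) {
  int best_index = -1;
  int best_score = -1;
  for (std::size_t i = 0; i < names.size(); ++i) {
    const int score = RateDevice(names[i], pattern);
    if (score > best_score) {
      best_score = score;
      best_index = static_cast<int>(i);
    }
  }
  return best_index;
}
```

Because non-matching devices still receive a positive score, asking for a GPU on a machine without one yields a CPU instead of a failure, which matches the fallback behavior described in the Purpose section.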


## License

This code sample is licensed under MIT license.


## Building the `ISO3DFD` Program for CPU and GPU

### Include Files
The include folder is located at `%ONEAPI_ROOT%\dev-utilities\latest\include` on your development system.

### Running Samples In DevCloud
If running a sample in the Intel DevCloud, remember that you must specify the compute node (CPU, GPU) as well as whether to run in batch or interactive mode. For more information, see the Intel® oneAPI Base Toolkit Get Started Guide (https://devcloud.intel.com/oneapi/get-started/base-toolkit/).

### On a Linux* System
Perform the following steps:
1. Build the program using the following `cmake` commands.
```
$ mkdir build
$ cd build
$ cmake ..
$ make -j
```

> Note: by default, the executable is built with the kernel that uses global memory directly. You can build the kernel that uses shared local memory (SLM) buffers with the following:
```
cmake -DSHARED_KERNEL=1 ..
make -j
```

2. Run the program:
```
make run
```
> Note: for selecting CPU as a SYCL device use `make run_cpu`

3. Clean the program using:
```
make clean
```

### On a Windows* System Using Visual Studio* Version 2017 or Newer
```
* Build the program using VS2017 or VS2019
Right click on the solution file and open using either VS2017 or VS2019 IDE.
Right click on the project in Solution explorer and select Rebuild.
From top menu select Debug -> Start without Debugging.

* Build the program using MSBuild
Open "x64 Native Tools Command Prompt for VS2017" or "x64 Native Tools Command Prompt for VS2019"
Run - MSBuild iso3dfd_dpcpp.sln /t:Rebuild /p:Configuration="Release"
```

## Running the Sample
```
make run
```

### Application Parameters
You can modify the ISO3DFD parameters from the command line.
* Configurable Application Parameters

Usage: src/iso3dfd.exe n1 n2 n3 b1 b2 b3 Iterations [omp|sycl] [gpu|cpu]

n1 n2 n3 : Grid sizes for the stencil
b1 b2 b3 OR : cache block sizes for cpu openmp version.
b1 b2 : Thread block sizes in X and Y dimension for SYCL version.
and b3 : size of slice of work in Z dimension for SYCL version.
Iterations : No. of timesteps.
[omp|sycl] : Optional: Run the OpenMP or the SYCL variant. Default is to use both for validation.
[gpu|cpu] : Optional: Device to run the SYCL version. Default is to use the GPU if available; otherwise fall back to the CPU.
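
The positional layout of the usage line can be sketched as a small parser. This is an illustrative reading of the usage text, not the sample's own argument handling: `RunConfig`, `ParsePositive`, and `ParseArgs` are hypothetical names, and rejecting zero or non-numeric values is an assumption of this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the values named in the usage text.
struct RunConfig {
  std::size_t n1 = 0, n2 = 0, n3 = 0;  // grid sizes
  std::size_t b1 = 0, b2 = 0, b3 = 0;  // block sizes
  std::size_t iterations = 0;          // number of time steps
  bool run_omp = true;                 // default: run both variants
  bool run_sycl = true;
  bool prefer_gpu = true;              // default: GPU if available
};

// Parses a positive decimal integer; rejects signs and trailing text.
bool ParsePositive(const std::string &text, std::size_t &value) {
  if (text.empty()) return false;
  for (unsigned char c : text) {
    if (!std::isdigit(c)) return false;
  }
  try {
    value = static_cast<std::size_t>(std::stoull(text));
  } catch (const std::out_of_range &) {
    return false;
  }
  return value > 0;
}

// `args` holds the command-line arguments without the program name.
bool ParseArgs(const std::vector<std::string> &args, RunConfig &cfg) {
  if (args.size() < 7 || args.size() > 9) return false;
  std::size_t *targets[] = {&cfg.n1, &cfg.n2, &cfg.n3, &cfg.b1,
                            &cfg.b2, &cfg.b3, &cfg.iterations};
  for (std::size_t i = 0; i < 7; ++i) {
    if (!ParsePositive(args[i], *targets[i])) return false;
  }
  if (args.size() > 7) {
    if (args[7] == "omp") {
      cfg.run_sycl = false;
    } else if (args[7] == "sycl") {
      cfg.run_omp = false;
    } else {
      return false;
    }
  }
  if (args.size() > 8) {
    if (args[8] == "gpu") {
      cfg.prefer_gpu = true;
    } else if (args[8] == "cpu") {
      cfg.prefer_gpu = false;
    } else {
      return false;
    }
  }
  return true;
}
```

Under this reading, an invocation such as `src/iso3dfd.exe 256 256 256 32 8 64 10 sycl gpu` would run only the SYCL variant on a GPU; the block sizes in that command are example values chosen for illustration.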

### Example of Output
```
Grid Sizes: 256 256 256
Memory Usage: 230 MB
***** Running C++ Serial variant *****
Initializing ...
--------------------------------------
time : 2.92984 secs
throughput : 57.2632 Mpts/s
flops : 3.49306 GFlops
bytes : 0.687159 GBytes/s

--------------------------------------

--------------------------------------
***** Running SYCL variant *****
Initializing ...
Running on Intel(R) Gen9
The Device Max Work Group Size is : 256
The Device Max EUCount is : 48
The blockSize x is : 32
The blockSize y is : 8
Using Global Memory Kernel
--------------------------------------
time : 0.597494 secs
throughput : 280.793 Mpts/s
flops : 17.1284 GFlops
bytes : 3.36952 GBytes/s

--------------------------------------

--------------------------------------
Final wavefields from SYCL device and CPU are equivalent: Success
--------------------------------------
```
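
The metrics in the output above are related by simple arithmetic. The sketch below reproduces them under assumptions inferred from the printed numbers rather than taken from the source: 10 iterations over the full 256 x 256 x 256 grid, 61 floating-point operations per point, and 12 bytes moved per point. `Stats` and `ComputeStats` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Derived metrics, in the units used by the sample output.
struct Stats {
  double mpoints_per_s;  // millions of grid points updated per second
  double gflops;         // billions of floating-point operations per second
  double gbytes_per_s;   // billions of bytes moved per second
};

Stats ComputeStats(double seconds, std::size_t n1, std::size_t n2,
                   std::size_t n3, unsigned int iterations) {
  assert(seconds > 0.0);
  // Both constants are inferred from the example output, not from the code.
  constexpr double kFlopsPerPoint = 61.0;
  constexpr double kBytesPerPoint = 12.0;
  const double points = static_cast<double>(n1) * static_cast<double>(n2) *
                        static_cast<double>(n3) * iterations;
  const double mpoints = points / seconds / 1.0e6;
  return {mpoints, mpoints * kFlopsPerPoint / 1.0e3,
          mpoints * kBytesPerPoint / 1.0e3};
}
```

Calling `ComputeStats(2.92984, 256, 256, 256, 10)` gives roughly 57.26 Mpts/s, 3.493 GFlops, and 0.687 GBytes/s, in line with the serial run shown above. The same formula applied to the SYCL time of 0.597494 seconds gives about 280.8 Mpts/s.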

Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
//==============================================================
// Copyright © 2020 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================

#ifndef DEVICESELECTOR_HPP
#define DEVICESELECTOR_HPP

#include <cstring>
#include <iostream>
#include <string>
#include "CL/sycl.hpp"

// This is the class provided to SYCL runtime by the application to decide
// on which device to run, or whether to run at all.
// When selecting a device, SYCL runtime first takes (1) a selector provided by
// the program or a default one and (2) the set of all available devices. Then
// it passes each device to the '()' operator of the selector. Device, for
// which '()' returned the highest number, is selected. If a negative number
// was returned for all devices, then the selection process will cause an
// exception.
class MyDeviceSelector : public sycl::device_selector {
 public:
  MyDeviceSelector(const std::string &p) : pattern(p) {
    // std::cout << "Looking for \"" << p << "\" devices" << std::endl;
  }

  // This is the function which gives a "rating" to devices.
  virtual int operator()(const sycl::device &device) const override {
    // The template parameter to device.get_info can be a variety of properties
    // defined by the SYCL spec's sycl::info:: enum. Properties may have
    // different types. Here we query name which is a string.
    const std::string name = device.get_info<sycl::info::device::name>();
    // std::cout << "Trying device: " << name << "..." << std::endl;
    // std::cout << " Vendor: " <<
    // device.get_info<sycl::info::device::vendor>() << std::endl;

    // Device with pattern in the name is prioritized:
    return (name.find(pattern) != std::string::npos) ? 100 : 1;
  }

 private:
  std::string pattern;
};

#endif
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
//==============================================================
// Copyright © 2020 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================

#include <CL/sycl.hpp>
using namespace sycl;

#include <chrono>
#include <cmath>
#include <cstring>
#include <ctime>
#include <fstream>
/*
* Parameters to define coefficients
* kHalfLength: Radius of the stencil
* Sample source code is tested for kHalfLength=8 resulting in
* 16th order Stencil finite difference kernel
*/
constexpr float dt = 0.002f;
constexpr float dxyz = 50.0f;
constexpr unsigned int kHalfLength = 8;

/*
* Padding to test and eliminate shared local memory bank conflicts for
* the shared local memory(slm) version of the kernel executing on GPU
*/
constexpr unsigned int kPad = 0;

bool Iso3dfdDevice(sycl::queue &q, float *ptr_next, float *ptr_prev,
                   float *ptr_vel, float *ptr_coeff, size_t n1, size_t n2,
                   size_t n3, size_t n1_block, size_t n2_block,
                   size_t n3_block, size_t end_z, unsigned int num_iterations);

void PrintTargetInfo(sycl::queue &q, unsigned int dim_x, unsigned int dim_y);

void Usage(const std::string &program_name);

void PrintStats(double time, size_t n1, size_t n2, size_t n3,
                unsigned int num_iterations);

bool WithinEpsilon(float *output, float *reference, const size_t dim_x,
                   const size_t dim_y, const size_t dim_z,
                   const unsigned int radius, const int zadjust,
                   const float delta);

bool CheckGridDimension(size_t n1, size_t n2, size_t n3, unsigned int dim_x,
                        unsigned int dim_y, unsigned int block_z);

bool CheckBlockDimension(sycl::queue &q, unsigned int dim_x, unsigned int dim_y);
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 15
VisualStudioVersion = 15.0.27130.2010
MinimumVisualStudioVersion = 10.0.40219.1
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "iso3dfd_dpcpp", "iso3dfd_dpcpp.vcxproj", "{07DA0A96-CA76-4446-9586-99A145B9A9C8}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|x64 = Debug|x64
Release|x64 = Release|x64
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{07DA0A96-CA76-4446-9586-99A145B9A9C8}.Debug|x64.ActiveCfg = Debug|x64
{07DA0A96-CA76-4446-9586-99A145B9A9C8}.Debug|x64.Build.0 = Debug|x64
{07DA0A96-CA76-4446-9586-99A145B9A9C8}.Release|x64.ActiveCfg = Release|x64
{07DA0A96-CA76-4446-9586-99A145B9A9C8}.Release|x64.Build.0 = Release|x64
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {3F515120-AC09-42A9-97D3-A26B1251EC9D}
EndGlobalSection
EndGlobal