@@ -18,36 +18,35 @@ https://www.boost.org/doc/libs/1_66_0/more/getting_started/unix-variants.html#ea
 
 
 # Build
-Change target ID in source code, example below is gfx908
-https://github.com/asroy/modular_convolution/blob/aafb5eb18781f1ac9e06a17c3e53d968dd53dcc0/composable_kernel/include/utility/config.amd.hpp.in#L16-L23
-
 Add path of Boost
 ```
  export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
 ```
 
 ```
 mkdir build && cd build
+```
 
-# need to manually set target ID, example below is gfx908
+cmake command. You need to specify the target ID; the example below is for gfx908
+```
 cmake \
--D CMAKE_BUILD_TYPE=Release \
--D DEVICE_BACKEND=AMD \
--D CMAKE_CXX_FLAGS ="-O3 --amdgpu-target=gfx908 -mllvm --amdgpu-spill-vgpr-to-agpr=0 -gline-tables-only -save-temps=$CWD" \
--D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
--D CMAKE_PREFIX_PATH=/opt/rocm \
--D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
+-D CMAKE_BUILD_TYPE=Release \
+-D CMAKE_CXX_FLAGS="-DCK_AMD_GPU_GFX908 -O3 --amdgpu-target=gfx908 -mllvm --amdgpu-spill-vgpr-to-agpr=0 -gline-tables-only -save-temps=$PWD" \
+-D HIP_ONLINE_COMPILER_FLAGS="-DCK_AMD_GPU_GFX908" \
+-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
+-D CMAKE_PREFIX_PATH=/opt/rocm \
+-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
 ..
 ```
 
 Build drivers: \
-`` conv_driver_v2 `` is (offline compilation) driver for forward convolution, \
-`` conv_bwd_data_driver_v2 `` is (offline compilation) driver for backward-data convolution \
-`` conv_driver_v2_olc `` is (online compilation) driver for forward convolution
+`` conv_fwd_driver_offline `` is (offline compilation) driver for forward convolution, \
+`` conv_bwd_driver_offline `` is (offline compilation) driver for backward-data convolution \
+`` conv_fwd_driver_online `` is (online compilation) driver for forward convolution
 ```
-make -j conv_driver_v2
-make -j conv_bwd_data_driver_v2
-make -j conv_driver_v2_olc
+make -j conv_fwd_driver_offline
+make -j conv_bwd_driver_offline
+make -j conv_fwd_driver_online
 ```
 
 # Run
@@ -60,18 +59,18 @@ Build drivers: \
 * log: 0 = no log; 1 = do log
 * repeat: number of time kernel being launched
 ```
-########################### layout algo verify init log repeat N__ K___ C___ Y X Hi_ Wi__ Strides Dilations LeftPads RightPads
-./conv_driver_v2 0 6 0 3 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
-./conv_driver_v2 0 6 0 3 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
-./conv_driver_v2 1 9 0 3 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
-./conv_driver_v2 1 9 0 3 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
-./conv_bwd_data_driver_v2 1 1 0 3 0 1 256 256 1024 3 3 14 14 1 1 1 1 1 1 1 1
+######################################################## layout algo verify init log repeat N__ K___ C___ Y X Hi_ Wi__ Strides Dilations LeftPads RightPads
+./host/driver_offline/conv_fwd_driver_offline 0 4 0 0 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
+./host/driver_offline/conv_fwd_driver_offline 0 4 0 0 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
+./host/driver_offline/conv_fwd_driver_offline 1 5 0 0 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
+./host/driver_offline/conv_fwd_driver_offline 1 5 0 0 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
+./host/driver_offline/conv_bwd_driver_offline 1 5 0 0 0 1 256 256 1024 3 3 14 14 1 1 1 1 1 1 1 1
 ```
 
 # Result
 Forward convoltuion, FP16, NCHW
 ```
-./conv_driver_v2 0 6 0 3 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
+./host/driver_offline/conv_fwd_driver_offline 0 4 0 0 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
 
 layout: 0
 in: dim 4, lengths {128, 192, 71, 71}, strides {967872, 5041, 71, 1}
@@ -93,7 +92,7 @@ Average time : 1.4155 ms, 103.686 TFlop/s
 
 Forward convoltuion, FP16, NCHW
 ```
-./conv_driver_v2 0 6 0 3 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
+./host/driver_offline/conv_fwd_driver_offline 0 4 0 0 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
 
  layout: 0
 in: dim 4, lengths {256, 256, 14, 14}, strides {50176, 196, 14, 1}
@@ -115,7 +114,7 @@ Average time : 2.21357 ms, 106.959 TFlop/s
 
  Forward convolution, FP16, NHWC
  ```
-./conv_driver_v2 1 9 0 3 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
+./host/driver_offline/conv_fwd_driver_offline 1 5 0 0 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
 
  layout: 1
 in: dim 4, lengths {128, 71, 71, 192}, strides {967872, 13632, 192, 1}
@@ -137,7 +136,7 @@ Average time : 1.12014 ms, 131.025 TFlop/s
 
  Forward convolution, FP16, NHWC
  ```
-./conv_driver_v2 1 9 0 3 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
+./host/driver_offline/conv_fwd_driver_offline 1 5 0 0 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
 
  layout: 1
 in: dim 4, lengths {256, 14, 14, 256}, strides {50176, 3584, 256, 1}
@@ -159,7 +158,7 @@ Average time : 1.86877 ms, 126.693 TFlop/s
 
  Backward data convolution, FP16, NHWC
  ```
-./conv_bwd_data_driver_v2 1 1 0 3 0 1 256 256 1024 3 3 14 14 1 1 1 1 1 1 1 1
+./host/driver_offline/conv_bwd_driver_offline 1 1 0 3 0 1 256 256 1024 3 3 14 14 1 1 1 1 1 1 1 1
 
  layout: 1
 in: dim 4, lengths {256, 14, 14, 1024}, strides {200704, 14336, 1024, 1}
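The `in:` lines printed by the drivers list each tensor's lengths and strides in the order selected by `layout` (in the results, `layout: 0` prints N, C, H, W and `layout: 1` prints N, H, W, C). Assuming the tensors are densely packed with the last listed dimension contiguous, the strides follow from the lengths alone. A minimal POSIX shell sketch of that check; `packed_strides` is a hypothetical helper, not part of the repository:

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository.
# Prints the strides of a densely packed 4-D tensor whose lengths are given
# outermost first (L0 L1 L2 L3). The last dimension is contiguous (stride 1),
# and the outermost length L0 does not affect any stride.
packed_strides() {
    echo "$(( $2 * $3 * $4 )) $(( $3 * $4 )) $4 1"
}

packed_strides 128 192 71 71   # layout 0 (NCHW) input of the first forward config
packed_strides 128 71 71 192   # layout 1 (NHWC) input of the same config
```

These print `967872 5041 71 1` and `967872 13632 192 1`, matching the strides reported for the `layout: 0` and `layout: 1` inputs of that configuration.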
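The `Average time` and `TFlop/s` figures can be cross-checked from the driver arguments (N, K, C, Y, X, Hi, Wi, strides, dilations, left pads, right pads). The sketch below assumes the standard convolution output-size formula and counts each multiply-accumulate as two FLOPs, i.e. 2 * N * K * C * Y * X * Ho * Wo. The helper names are hypothetical, and `awk` is used only for the floating-point division:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the repository: recompute the output size,
# FLOP count and TFlop/s of a forward convolution from the driver arguments.

# Output length along one spatial dimension (standard convolution formula):
#   (len + left_pad + right_pad - ((filter - 1) * dilation + 1)) / stride + 1
conv_out_len() {
    len=$1 filter=$2 stride=$3 dilation=$4 lpad=$5 rpad=$6
    echo $(( ($len + $lpad + $rpad - (($filter - 1) * $dilation + 1)) / $stride + 1 ))
}

# FLOPs of a forward convolution, counting each multiply-accumulate as 2 FLOPs:
#   2 * N * K * C * Y * X * Ho * Wo
conv_fwd_flops() {
    echo $(( 2 * $1 * $2 * $3 * $4 * $5 * $6 * $7 ))
}

# TFlop/s from a FLOP count and an average time in milliseconds
tflops() {
    awk -v f="$1" -v ms="$2" 'BEGIN { printf "%.3f\n", f / (ms * 1e-3) / 1e12 }'
}

# First forward config: N=128 K=256 C=192 Y=X=3 Hi=Wi=71,
# strides 2 2, dilations 1 1, left pads 1 1, right pads 1 1
ho=$(conv_out_len 71 3 2 1 1 1)
wo=$(conv_out_len 71 3 2 1 1 1)
flops=$(conv_fwd_flops 128 256 192 3 3 "$ho" "$wo")
echo "Ho=$ho Wo=$wo FLOPs=$flops TFlop/s=$(tflops "$flops" 1.4155)"
```

This prints `Ho=36 Wo=36 FLOPs=146767085568 TFlop/s=103.686`, the same 103.686 TFlop/s that appears next to 1.4155 ms in the diff. Because the printed times are rounded, other configurations may differ in the last digit (the 14x14 NCHW case gives 106.958 against the printed 106.959).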