Commit ccbee24

committed
Clarify a bunch of things
1 parent 008fd4c commit ccbee24


1 file changed: +20 -15 lines changed

README.md

Lines changed: 20 additions & 15 deletions
@@ -20,6 +20,10 @@ This is due Sunday, September 13 at midnight.
 from scratch. This algorithm is widely used, and will be important for
 accelerating your path tracer project.
 
+Your stream compaction implementations in this project will simply remove `0`s
+from an array of `int`s. In the path tracer, you will remove terminated paths
+from an array of rays.
+
 In addition to being useful for your path tracer, this project is meant to
 reorient your algorithmic thinking to the way of the GPU. On GPUs, many
 algorithms can benefit from massive parallelism and, in particular, data
@@ -68,6 +72,8 @@ important for debugging performance bottlenecks in your program.
 
 ## Part 1: CPU Scan & Stream Compaction
 
+This stream compaction method will remove `0`s from an array of `int`s.
+
 In `stream_compaction/cpu.cu`, implement:
 
 * `StreamCompaction::CPU::scan`: compute an exclusive prefix sum.
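
As an illustration of what the Part 1 hunk above describes (an exclusive prefix sum, and removing `0`s from an array of `int`s), here is a minimal CPU sketch. The function names `exclusiveScan` and `compactNonZero` and the `std::vector` interface are assumptions made for this example; they are not the signatures declared in `stream_compaction/cpu.cu`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: exclusive prefix sum.
// out[i] holds the sum of in[0..i-1], so out[0] is always 0.
std::vector<int> exclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of everything strictly before index i
        running += in[i];
    }
    return out;
}

// Hypothetical helper: stream compaction without scan.
// Copies every non-zero element to the output, preserving order.
std::vector<int> compactNonZero(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int v : in) {
        if (v != 0) {
            out.push_back(v);
        }
    }
    assert(out.size() <= in.size());  // compaction never grows the array
    return out;
}
```

For the input `{3, 1, 7, 0, 4}`, the scan sketch yields `{0, 3, 4, 11, 11}`, and compacting `{1, 0, 2, 0, 0, 3}` yields `{1, 2, 3}`.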
@@ -86,17 +92,20 @@ These implementations should only be a few lines long.
 In `stream_compaction/naive.cu`, implement `StreamCompaction::Naive::scan`
 
 This uses the "Naive" algorithm from GPU Gems 3, Section 39.2.1. We haven't yet
-taught shared memory, but you **shouldn't use it yet**. Example 39-1 uses
+taught shared memory, and you **shouldn't use it yet**. Example 39-1 uses
 shared memory, but is limited to operating on very small arrays! Instead, write
 this using global memory only. As a result of this, you will have to do
 `ilog2ceil(n)` separate kernel invocations.
 
 Beware of errors in Example 39-1 in the book; both the pseudocode and the CUDA
-code in the online version of this chapter are known to have a few small errors
+code in the online version of Chapter 39 are known to have a few small errors
 (in superscripting, missing braces, bad indentation, etc.)
 
-Make sure your implementation works on non-power-of-two sized arrays (see
-`ilog2ceil`).
+Since the parallel scan algorithm operates on a binary tree structure, it works
+best with arrays with power-of-two length. Make sure your implementation works
+on non-power-of-two sized arrays (see `ilog2ceil`). This requires extra memory
+- your intermediate array sizes will need to be rounded to the next power of
+two.
 
 
 ## Part 3: Work-Efficient GPU Scan & Stream Compaction
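
The naive scan hunk above (Part 2) mentions `ilog2ceil(n)` separate kernel invocations. The sequential C++ sketch below mimics that pass structure on the CPU: each iteration of the outer loop stands in for one kernel launch, and the inner loop stands in for the threads. It is not CUDA code, and `ilog2ceilSketch` and `naiveScanSketch` are hypothetical names for this example; the project's own `ilog2ceil` may be defined differently.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for ilog2ceil: smallest d such that 2^d >= n.
int ilog2ceilSketch(int n) {
    assert(n > 0);
    int d = 0;
    while ((1 << d) < n) {
        ++d;
    }
    return d;
}

// CPU simulation of the naive scan's pass structure (exclusive result).
std::vector<int> naiveScanSketch(const std::vector<int>& in) {
    const int n = static_cast<int>(in.size());
    if (n == 0) {
        return {};
    }
    // Shift right by one so the inclusive passes below give an exclusive scan.
    std::vector<int> a(n, 0), b(n, 0);
    for (int k = 1; k < n; ++k) {
        a[k] = in[k - 1];
    }
    const int passes = ilog2ceilSketch(n);
    for (int d = 1; d <= passes; ++d) {
        const int offset = 1 << (d - 1);
        // On the GPU, this inner loop would be one kernel launch reading
        // from one global-memory buffer and writing to the other.
        for (int k = 0; k < n; ++k) {
            b[k] = (k >= offset) ? a[k - offset] + a[k] : a[k];
        }
        std::swap(a, b);  // ping-pong the two buffers between passes
    }
    return a;
}
```

With five elements the sketch runs `ilog2ceilSketch(5) == 3` passes, so the input `{3, 1, 7, 0, 4}` produces `{0, 3, 4, 11, 11}`. Two buffers are used because a pass that updated the array in place would read values already overwritten in the same pass.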
@@ -106,20 +115,16 @@ Make sure your implementation works on non-power-of-two sized arrays (see
 In `stream_compaction/efficient.cu`, implement
 `StreamCompaction::Efficient::scan`
 
-This uses the "Naive" algorithm from GPU Gems 3, Section 39.2.1. We haven't yet
-taught shared memory, but you **shouldn't use it yet**. Example 39-1 uses
-shared memory, but is limited to operating on very small arrays! Instead, write
-this using global memory only. As a result of this, you will have to do
-`ilog2ceil(n)` separate kernel invocations.
-
-Beware of errors in Example 39-2 in the book; both the pseudocode and the CUDA
-code in the online version of this chapter are known to have a few small errors
-(in superscripting, missing braces, bad indentation, etc.)
+All of the text in Part 2 applies.
 
-Make sure your implementation works on non-power-of-two sized arrays (see
-`ilog2ceil`).
+* This uses the "Work-Efficient" algorithm from GPU Gems 3, Section 39.2.2.
+* Beware of errors in Example 39-2.
+* Test non-power-of-two sized arrays.
 
 ### 3.2. Stream Compaction
+
+This stream compaction method will remove `0`s from an array of `int`s.
+
 In `stream_compaction/efficient.cu`, implement
 `StreamCompaction::Efficient::compact`
 
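
To make the Part 3 hunk above more concrete, here is a sequential C++ sketch of a work-efficient exclusive scan (up-sweep followed by down-sweep) on a buffer rounded up to the next power of two, and of stream compaction built on top of it. It simulates the pass structure on the CPU and is not CUDA code. The names `workEfficientScanSketch` and `compactWithScanSketch` are hypothetical and do not reflect the signatures in `stream_compaction/efficient.cu`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU simulation of an up-sweep / down-sweep exclusive scan.
std::vector<int> workEfficientScanSketch(const std::vector<int>& in) {
    const std::size_t n = in.size();
    if (n == 0) {
        return {};
    }
    std::size_t padded = 1;
    while (padded < n) {
        padded <<= 1;                       // round up to the next power of two
    }
    assert((padded & (padded - 1)) == 0);   // padded is a power of two
    std::vector<int> x(padded, 0);          // extra slots hold the identity, 0
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = in[i];
    }
    // Up-sweep (reduce): each stride would be one kernel launch on the GPU.
    for (std::size_t stride = 1; stride < padded; stride <<= 1) {
        for (std::size_t k = 0; k < padded; k += 2 * stride) {
            x[k + 2 * stride - 1] += x[k + stride - 1];
        }
    }
    // Down-sweep: clear the root, then push partial sums down the tree.
    x[padded - 1] = 0;
    for (std::size_t stride = padded / 2; stride >= 1; stride >>= 1) {
        for (std::size_t k = 0; k < padded; k += 2 * stride) {
            const int left = x[k + stride - 1];
            x[k + stride - 1] = x[k + 2 * stride - 1];
            x[k + 2 * stride - 1] += left;
        }
    }
    x.resize(n);                            // drop the padding
    return x;
}

// Compaction via scan: flag non-zero elements, scan the flags to get
// output indices, then scatter the kept elements.
std::vector<int> compactWithScanSketch(const std::vector<int>& in) {
    const std::size_t n = in.size();
    if (n == 0) {
        return {};
    }
    std::vector<int> keep(n);
    for (std::size_t i = 0; i < n; ++i) {
        keep[i] = (in[i] != 0) ? 1 : 0;
    }
    const std::vector<int> idx = workEfficientScanSketch(keep);
    const std::size_t count = static_cast<std::size_t>(idx[n - 1] + keep[n - 1]);
    std::vector<int> out(count);
    for (std::size_t i = 0; i < n; ++i) {
        if (keep[i]) {
            out[idx[i]] = in[i];
        }
    }
    return out;
}
```

For the five-element input `{3, 1, 7, 0, 4}` the buffer is padded to eight slots and the scan returns `{0, 3, 4, 11, 11}`; compacting `{1, 0, 2, 0, 0, 3}` returns `{1, 2, 3}`. Padding with `0` is safe here because `0` is the identity for addition, so the extra slots do not change any prefix sum in the first `n` positions.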