@@ -20,6 +20,10 @@ This is due Sunday, September 13 at midnight.
from scratch. This algorithm is widely used, and will be important for
accelerating your path tracer project.

+ Your stream compaction implementations in this project will simply remove `0`s
+ from an array of `int`s. In the path tracer, you will remove terminated paths
+ from an array of rays.
+
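As a reference point for what "remove `0`s from an array of `int`s" means, here is a minimal sequential sketch in C++. The function name `compactNonZero` is hypothetical and not part of the project's API; it only illustrates the expected input/output behavior, not the scan-based GPU approach.

```cpp
#include <vector>

// Hypothetical helper (not part of the project's API): returns a copy of `in`
// with every 0 removed, preserving the order of the remaining elements.
std::vector<int> compactNonZero(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());  // upper bound: nothing is removed
    for (int v : in) {
        if (v != 0) {
            out.push_back(v);
        }
    }
    return out;
}
```

In the path tracer the same idea would apply with a different predicate ("is this path still alive?") in place of `v != 0`.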
In addition to being useful for your path tracer, this project is meant to
reorient your algorithmic thinking to the way of the GPU. On GPUs, many
algorithms can benefit from massive parallelism and, in particular, data
@@ -68,6 +72,8 @@ important for debugging performance bottlenecks in your program.

## Part 1: CPU Scan & Stream Compaction

+ This stream compaction method will remove `0`s from an array of `int`s.
+
In `stream_compaction/cpu.cu`, implement:

* `StreamCompaction::CPU::scan`: compute an exclusive prefix sum.
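For illustration, an exclusive prefix sum writes to each output slot the sum of all inputs strictly before it, so the first output is always 0. The sketch below assumes a `std::vector`-based signature with the hypothetical name `exclusiveScan`; the project's actual `StreamCompaction::CPU::scan` signature may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical signature for illustration only.
// out[i] = in[0] + in[1] + ... + in[i - 1], with out[0] = 0.
std::vector<int> exclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;  // sum of all elements seen so far
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;  // write the sum BEFORE adding in[i]
        running += in[i];
    }
    return out;
}
```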
@@ -86,17 +92,20 @@ These implementations should only be a few lines long.
In `stream_compaction/naive.cu`, implement `StreamCompaction::Naive::scan`

This uses the "Naive" algorithm from GPU Gems 3, Section 39.2.1. We haven't yet
- taught shared memory, but you **shouldn't use it yet**. Example 39-1 uses
+ taught shared memory, and you **shouldn't use it yet**. Example 39-1 uses
shared memory, but is limited to operating on very small arrays! Instead, write
this using global memory only. As a result of this, you will have to do
`ilog2ceil(n)` separate kernel invocations.

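To make the "`ilog2ceil(n)` separate kernel invocations" structure concrete, here is a CPU simulation in C++ of the naive scan, offered as a sketch rather than the expected solution. Each iteration of the outer loop stands in for one kernel launch, and the inner loop stands in for the threads of that launch. The names `ilog2ceilSketch` and `naiveInclusiveScan` are hypothetical, and the behavior assumed for `ilog2ceil` (smallest `d` with `2^d >= n`) is an assumption about the project's helper. Two buffers are swapped between passes so that no pass reads a value it has already overwritten.

```cpp
#include <utility>
#include <vector>

// Assumed behavior of the project's `ilog2ceil`: smallest d such that 2^d >= n.
int ilog2ceilSketch(int n) {
    int d = 0;
    while ((1 << d) < n) {
        ++d;
    }
    return d;
}

// CPU simulation of the naive scan. Produces an INCLUSIVE scan; shifting the
// result right by one element (and writing 0 first) gives the exclusive scan.
std::vector<int> naiveInclusiveScan(std::vector<int> data) {
    const int n = static_cast<int>(data.size());
    std::vector<int> next(data.size());
    const int passes = ilog2ceilSketch(n);
    for (int d = 1; d <= passes; ++d) {      // one "kernel launch" per pass
        const int offset = 1 << (d - 1);
        for (int k = 0; k < n; ++k) {        // one "thread" per element
            next[k] = (k >= offset) ? data[k - offset] + data[k] : data[k];
        }
        std::swap(data, next);               // ping-pong the two buffers
    }
    return data;
}
```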
Beware of errors in Example 39-1 in the book; both the pseudocode and the CUDA
- code in the online version of this chapter are known to have a few small errors
+ code in the online version of Chapter 39 are known to have a few small errors
(in superscripting, missing braces, bad indentation, etc.)

- Make sure your implementation works on non-power-of-two sized arrays (see
- `ilog2ceil`).
+ Since the parallel scan algorithm operates on a binary tree structure, it works
+ best with arrays with power-of-two length. Make sure your implementation works
+ on non-power-of-two sized arrays (see `ilog2ceil`). This requires extra memory
+ - your intermediate array sizes will need to be rounded to the next power of
+ two.

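One way to read the rounding requirement above: copy the input into a buffer whose length is the next power of two and fill the tail with zeros, which do not change any prefix sum. The C++ sketch below assumes this zero-padding strategy; `nextPowerOfTwo` and `padToPowerOfTwo` are hypothetical names, not project functions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Smallest power of two that is >= n (returns 1 for n == 0 or n == 1).
std::size_t nextPowerOfTwo(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p *= 2;
    }
    return p;
}

// Copies `in` into a zero-filled buffer of power-of-two length.
// Only the first in.size() results of a scan over this buffer are meaningful.
std::vector<int> padToPowerOfTwo(const std::vector<int>& in) {
    std::vector<int> out(nextPowerOfTwo(in.size()), 0);
    std::copy(in.begin(), in.end(), out.begin());
    return out;
}
```

On the GPU the same effect would typically be achieved by allocating the rounded-up size on the device and zeroing the tail before the first pass.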

## Part 3: Work-Efficient GPU Scan & Stream Compaction
@@ -106,20 +115,16 @@ Make sure your implementation works on non-power-of-two sized arrays (see
In `stream_compaction/efficient.cu`, implement
`StreamCompaction::Efficient::scan`

- This uses the "Naive" algorithm from GPU Gems 3, Section 39.2.1. We haven't yet
- taught shared memory, but you **shouldn't use it yet**. Example 39-1 uses
- shared memory, but is limited to operating on very small arrays! Instead, write
- this using global memory only. As a result of this, you will have to do
- `ilog2ceil(n)` separate kernel invocations.
-
- Beware of errors in Example 39-2 in the book; both the pseudocode and the CUDA
- code in the online version of this chapter are known to have a few small errors
- (in superscripting, missing braces, bad indentation, etc.)
+ All of the text in Part 2 applies.

- Make sure your implementation works on non-power-of-two sized arrays (see
- `ilog2ceil`).
+ * This uses the "Work-Efficient" algorithm from GPU Gems 3, Section 39.2.2.
+ * Beware of errors in Example 39-2.
+ * Test non-power-of-two sized arrays.

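The work-efficient algorithm referenced above consists of an up-sweep (reduction) phase followed by a down-sweep phase over an implicit binary tree. The following is a sequential C++ simulation under the assumption that the array length is already a power of two (for example, after zero-padding); it is a sketch for checking your understanding, not the expected CUDA solution. The name `workEfficientExclusiveScan` is hypothetical. On the GPU, each iteration of an outer loop would be one kernel launch and the inner loop would be distributed across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place exclusive scan. Precondition: data.size() is a power of two.
void workEfficientExclusiveScan(std::vector<int>& data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);

    // Up-sweep: build partial sums up the tree; data[n - 1] ends as the total.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t k = 0; k < n; k += 2 * stride) {
            data[k + 2 * stride - 1] += data[k + stride - 1];
        }
    }

    // Down-sweep: clear the root, then push prefix sums back down the tree.
    data[n - 1] = 0;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (std::size_t k = 0; k < n; k += 2 * stride) {
            const int left = data[k + stride - 1];
            data[k + stride - 1] = data[k + 2 * stride - 1];  // left child gets parent's value
            data[k + 2 * stride - 1] += left;                 // right child gets parent + left
        }
    }
}
```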
### 3.2. Stream Compaction
+
+ This stream compaction method will remove `0`s from an array of `int`s.
+
In `stream_compaction/efficient.cu`, implement
`StreamCompaction::Efficient::compact`
