@@ -85,11 +85,16 @@ These implementations should only be a few lines long.
8585
8686In ` stream_compaction/naive.cu ` , implement ` StreamCompaction::Naive::scan `
8787
88- This uses the "Naive" algorithm from GPU Gems 3, Section 39.2.1. However, note
89- that they use shared memory in Example 39-1; don't do that yet. Instead, write
88+ This uses the "Naive" algorithm from GPU Gems 3, Section 39.2.1. We haven't yet
89+ taught shared memory, so you **shouldn't use it yet**. Example 39-1 uses
90+ shared memory, but is limited to operating on very small arrays! Instead, write
9091this using global memory only. As a result of this, you will have to do
9192` ilog2ceil(n) ` separate kernel invocations.
9293
94+ Beware of errors in Example 39-1 in the book; the pseudocode (Example 2) is
95+ probably correct, but the CUDA code has a few small errors (missing braces, bad
96+ indentation, etc.)
97+
9398Make sure your implementation works on non-power-of-two sized arrays (see
9499` ilog2ceil ` ).
95100
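The pass structure described above can be sketched on the CPU before writing any kernels. This is a minimal illustration, not the assignment's reference solution: `ilog2ceilSketch`, `naiveScanPass`, and `naiveExclusiveScan` are hypothetical names, each call to `naiveScanPass` stands in for one kernel invocation over global memory, and the final shift to an exclusive scan is an assumption about the expected output.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's ilog2ceil helper:
// the smallest d such that (1 << d) >= n, for n >= 1.
int ilog2ceilSketch(int n) {
    int d = 0;
    while ((1 << d) < n) {
        ++d;
    }
    return d;
}

// One pass of the naive scan. On the GPU, the loop body would be a kernel
// with one thread per index k, reading `in` and writing `out`.
void naiveScanPass(int n, int offset,
                   const std::vector<int>& in, std::vector<int>& out) {
    for (int k = 0; k < n; ++k) {
        out[k] = (k >= offset) ? in[k - offset] + in[k] : in[k];
    }
}

// Runs ilog2ceil(n) passes over two ping-pong buffers, then shifts the
// inclusive result right by one to get an exclusive scan (an assumption).
std::vector<int> naiveExclusiveScan(const std::vector<int>& input) {
    const int n = static_cast<int>(input.size());
    if (n == 0) {
        return {};
    }
    std::vector<int> a(input);
    std::vector<int> b(n);
    const int passes = ilog2ceilSketch(n);
    for (int d = 1; d <= passes; ++d) {
        naiveScanPass(n, 1 << (d - 1), a, b);
        std::swap(a, b);  // the output of this pass feeds the next one
    }
    std::vector<int> result(n, 0);
    for (int k = 1; k < n; ++k) {
        result[k] = a[k - 1];
    }
    return result;
}
```

Because every pass guards on `k >= offset` and loops only to `n`, the same code handles non-power-of-two lengths without padding. Two buffers are used because a pass that reads and writes the same array in parallel would race.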
@@ -101,9 +106,18 @@ Make sure your implementation works on non-power-of-two sized arrays (see
101106In ` stream_compaction/efficient.cu ` , implement
102107` StreamCompaction::Efficient::scan `
103108
104- This is equivalent to the "Work-Efficient Parallel Scan" from the slides and
105- * GPU Gems 3* section 39.2.2. Instead of using shared memory as in Example 39-2,
106- use global memory only. This will again require multiple kernel invocations.
109+ This uses the "Work-Efficient Parallel Scan" algorithm from GPU Gems 3,
110+ Section 39.2.2. We haven't yet taught shared memory, so you
111+ **shouldn't use it yet**. Example 39-2 uses shared memory; instead, write
112+ this using global memory only. As a result of this, you will again need
113+ multiple kernel invocations.
114+
115+ Beware of errors in Example 39-2 in the book; the pseudocode (Examples 3/4) is
116+ probably correct, but the CUDA code has a few small errors (missing braces, bad
117+ indentation, etc.)
118+
119+ Make sure your implementation works on non-power-of-two sized arrays (see
120+ ` ilog2ceil ` ).
107121
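For `StreamCompaction::Efficient::scan`, one way to handle non-power-of-two sizes is to pad the working buffer with zeros up to `1 << ilog2ceil(n)` and run the up-sweep and down-sweep on the padded length. The CPU sketch below illustrates that idea under stated assumptions: `ilog2ceilSketch` and `efficientExclusiveScan` are hypothetical names, each iteration of the outer `d` loops stands in for one kernel invocation on global memory, and the padding strategy is one option rather than a required approach.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the project's ilog2ceil helper:
// the smallest d such that (1 << d) >= n, for n >= 1.
int ilog2ceilSketch(int n) {
    int d = 0;
    while ((1 << d) < n) {
        ++d;
    }
    return d;
}

// Work-efficient exclusive scan (up-sweep, then down-sweep) on a buffer
// padded with zeros to the next power of two.
std::vector<int> efficientExclusiveScan(const std::vector<int>& input) {
    const int n = static_cast<int>(input.size());
    if (n == 0) {
        return {};
    }
    const int levels = ilog2ceilSketch(n);
    const int padded = 1 << levels;
    std::vector<int> data(padded, 0);
    for (int i = 0; i < n; ++i) {
        data[i] = input[i];
    }

    // Up-sweep (reduce): each level d would be one kernel launch.
    for (int d = 0; d < levels; ++d) {
        const int stride = 1 << (d + 1);
        for (int k = 0; k < padded; k += stride) {
            data[k + stride - 1] += data[k + stride / 2 - 1];
        }
    }

    // Down-sweep: clear the root, then push partial sums back down the tree.
    data[padded - 1] = 0;
    for (int d = levels - 1; d >= 0; --d) {
        const int stride = 1 << (d + 1);
        for (int k = 0; k < padded; k += stride) {
            const int left = data[k + stride / 2 - 1];
            data[k + stride / 2 - 1] = data[k + stride - 1];
            data[k + stride - 1] += left;
        }
    }

    data.resize(n);  // drop the padding
    return data;
}
```

In this sketch the scan takes `2 * ilog2ceil(n)` passes in total, one per level of each sweep. The zero padding does not change any prefix sum because zero is the identity for addition.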
108122### 3.2. Stream Compaction
109123In ` stream_compaction/efficient.cu ` , implement
@@ -117,13 +131,6 @@ In `stream_compaction/common.cu`, implement these for use in `compact`:
117131* ` StreamCompaction::Common::kernMapToBoolean `
118132* ` StreamCompaction::Common::kernScatter `
119133
120- Beware of errors in Example 39-2 in the book; the pseudocode (Examples 3/4) is
121- correct, but the CUDA code has a few errors (missing braces, bad indentation,
122- etc.)
123-
124- Make sure your implementation works on non-power-of-two sized arrays (see
125- ` ilog2ceil ` ).
126-
127134
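The two helpers listed for `compact` fit together as map, scan, scatter. The CPU sketch below shows that data flow, assuming compaction means dropping zero-valued elements (this excerpt does not say so explicitly). All function names here are hypothetical; `mapToBooleanSketch` and `scatterSketch` only mirror the roles suggested by `kernMapToBoolean` and `kernScatter`, and the sequential scan is a placeholder for whichever GPU scan is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Role of kernMapToBoolean (assumed): 1 for elements to keep, 0 otherwise.
std::vector<int> mapToBooleanSketch(const std::vector<int>& in) {
    std::vector<int> bools(in.size());
    for (std::size_t k = 0; k < in.size(); ++k) {
        bools[k] = (in[k] != 0) ? 1 : 0;
    }
    return bools;
}

// Sequential exclusive scan, standing in for the GPU scan.
std::vector<int> exclusiveScanSketch(const std::vector<int>& in) {
    std::vector<int> out(in.size(), 0);
    for (std::size_t k = 1; k < in.size(); ++k) {
        out[k] = out[k - 1] + in[k - 1];
    }
    return out;
}

// Role of kernScatter (assumed): each kept element is written to the
// output slot given by the scanned booleans.
std::vector<int> scatterSketch(const std::vector<int>& in,
                               const std::vector<int>& bools,
                               const std::vector<int>& indices) {
    if (in.empty()) {
        return {};
    }
    // Last exclusive index plus last flag gives the number of kept elements.
    const std::size_t count =
        static_cast<std::size_t>(indices.back() + bools.back());
    std::vector<int> out(count);
    for (std::size_t k = 0; k < in.size(); ++k) {
        if (bools[k] != 0) {
            out[static_cast<std::size_t>(indices[k])] = in[k];
        }
    }
    return out;
}

std::vector<int> compactSketch(const std::vector<int>& in) {
    const std::vector<int> bools = mapToBooleanSketch(in);
    const std::vector<int> indices = exclusiveScanSketch(bools);
    return scatterSketch(in, bools, indices);
}
```

On the GPU, the map and scatter loops each become one thread per element, and no two threads write the same output slot because the scanned indices are distinct for kept elements.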
128135## Part 4: Using Thrust's Implementation
129136