| 1 | +# Command Protocol |
| 2 | + |
| 3 | +This document describes how to drive the live `tt_um_crockpotveggies_neuron` wrapper through the TinyTapeout pins. |
| 4 | + |
| 5 | +The current protocol is a request/ready interface at the wrapper boundary. The wrapper synchronizes the external pins, de-duplicates the input request, decodes a `tt_cmd_t`, and forwards that command to the programmable neuron core. |
| 6 | + |
| 7 | +## 1. Pin Map |
| 8 | + |
| 9 | +### Inputs |
| 10 | + |
| 11 | +- `ui_in[7:0]`: primary command payload byte |
| 12 | +- `uio_in[0]`: input request / command valid |
| 13 | +- `uio_in[1]`: output acknowledge |
| 14 | +- `uio_in[7:2]`: sideband payload |
| 15 | + |
| 16 | +### Outputs |
| 17 | + |
| 18 | +- `uo_out[7:0]`: held output beat from the neuron core |
| 19 | +- `uio_out[0]`: wrapper ready for the next command |
| 20 | +- `uio_out[1]`: output valid |
| 21 | +- `uio_out[7:2]`: always `0` |
| 22 | +- `uio_oe[1:0] = 2'b11` (bits 0 and 1 driven as outputs), `uio_oe[7:2] = 6'b000000` (bits 7:2 left as inputs)
| 23 | + |
| 24 | +## 2. Input Handshake Rules |
| 25 | + |
| 26 | +The live wrapper accepts exactly one command per assertion of `uio_in[0]`. |
| 27 | + |
| 28 | +Host-side sequence: |
| 29 | + |
| 30 | +1. Drive `ui_in` and `uio_in[7:2]` with the desired command payload. |
| 31 | +2. Assert `uio_in[0]`. |
| 32 | +3. Hold the payload stable until `uio_out[0]` is observed high. |
| 33 | +4. Deassert `uio_in[0]`. |
| 34 | +5. Only then begin the next command. |
| 35 | + |
| 36 | +Important behavior: |
| 37 | + |
| 38 | +- The wrapper internally synchronizes the incoming pins before latching a command. |
| 39 | +- `in_req_seen` blocks duplicate acceptance while `uio_in[0]` remains high. |
| 40 | +- Holding `uio_in[0]` high after acceptance will not enqueue repeated commands. |
| 41 | +- A new command requires a fresh low-to-high transition of `uio_in[0]`. |
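The host-side sequence above can be sketched as follows. This is a minimal illustration, not the project's driver: the `pins` object, its `write()` and `ready()` methods, `FakePins`, and `max_polls` are all hypothetical names introduced here. `FakePins` models only the documented de-duplication rule (one acceptance per low-to-high transition of `uio_in[0]`) and assumes the wrapper is always ready.

```python
class FakePins:
    """Hypothetical stand-in for a real pin driver (not part of the project)."""

    def __init__(self):
        self.accepted = []      # (ui_in, sideband) pairs the fake has latched
        self._req_seen = False  # plays the role described for in_req_seen

    def write(self, ui_in, uio_in):
        req = uio_in & 0x01
        # Accept only on a fresh low-to-high transition of uio_in[0].
        if req and not self._req_seen:
            self.accepted.append((ui_in & 0xFF, (uio_in >> 2) & 0x3F))
        self._req_seen = bool(req)

    def ready(self):
        # Assumption: the fake is always ready; real hardware may stall.
        return True


def send_command(pins, ui_in, sideband, max_polls=1000):
    """Drive one command through the request/ready handshake."""
    uio_hi = (sideband & 0x3F) << 2      # sideband lives in uio_in[7:2]
    pins.write(ui_in, uio_hi)            # 1. drive payload, request low
    pins.write(ui_in, uio_hi | 0x01)     # 2. assert uio_in[0]
    for _ in range(max_polls):           # 3. hold until uio_out[0] is seen high
        if pins.ready():
            break
    else:
        pins.write(ui_in, uio_hi)
        raise TimeoutError("wrapper never reported ready")
    pins.write(ui_in, uio_hi)            # 4. deassert uio_in[0]


pins = FakePins()
send_command(pins, 0xC5, 0x2A)
send_command(pins, 0x41, 0x00)
print(pins.accepted)
```

Note that this sketch always writes `uio_in[1]` (output acknowledge) as `0`; a real driver would need to merge the acknowledge bit with the request bit when both handshakes are active.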
| 42 | + |
| 43 | +## 3. Output Handshake Rules |
| 44 | + |
| 45 | +The core emits one held output beat at a time. |
| 46 | + |
| 47 | +Host-side sequence: |
| 48 | + |
| 49 | +1. Wait for `uio_out[1] = 1`. |
| 50 | +2. Read `uo_out[7:0]`. |
| 51 | +3. Assert `uio_in[1]` to acknowledge the beat. |
| 52 | + |
| 53 | +The core keeps `uo_out[7:0]` stable until it sees `uio_in[1]` asserted through the synchronized frontend.
| 54 | + |
| 55 | +## 4. Command Classes |
| 56 | + |
| 57 | +`tt_event_decode.sv` always projects the raw pin payload into all `tt_cmd_t` fields. The active meaning depends on `cmd.kind = ui_in[7:6]`. |
| 58 | + |
| 59 | +### `CMD_CSR = 2'b00` |
| 60 | + |
| 61 | +Writes one 8-bit CSR value. |
| 62 | + |
| 63 | +Encoding: |
| 64 | + |
| 65 | +- `ui_in[5:2] = csr_addr` |
| 66 | +- `ui_in[1:0] = data[1:0]` |
| 67 | +- `uio_in[7:2] = data[7:2]` |
| 68 | + |
| 69 | +Reconstructed byte: |
| 70 | + |
| 71 | +- `cmd.data = {uio_in[7:2], ui_in[1:0]}` |
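As an illustration of the split described above, the following sketch packs a CSR write into the two pin groups and reconstructs the byte the way `cmd.data` is described. The function names `encode_csr` and `decode_data` are hypothetical helpers introduced here, not part of the project.

```python
CMD_CSR = 0b00  # ui_in[7:6]


def encode_csr(csr_addr, data):
    """Return (ui_in, sideband), where sideband is the value for uio_in[7:2]."""
    if not 0 <= csr_addr <= 0xF:
        raise ValueError("csr_addr must fit in 4 bits")
    if not 0 <= data <= 0xFF:
        raise ValueError("data must fit in 8 bits")
    ui_in = (CMD_CSR << 6) | (csr_addr << 2) | (data & 0x3)  # data[1:0]
    sideband = data >> 2                                     # data[7:2]
    return ui_in, sideband


def decode_data(ui_in, sideband):
    """Rebuild cmd.data = {uio_in[7:2], ui_in[1:0]}."""
    return ((sideband & 0x3F) << 2) | (ui_in & 0x3)


ui_in, sideband = encode_csr(0x3, 0xA5)
print(hex(ui_in), hex(sideband), hex(decode_data(ui_in, sideband)))
# prints: 0xd 0x29 0xa5
```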
| 72 | + |
| 73 | +### `CMD_WEIGHT = 2'b01` |
| 74 | + |
| 75 | +Writes one host-supplied ternary weight. |
| 76 | + |
| 77 | +Encoding: |
| 78 | + |
| 79 | +- `ui_in[5:2] = synapse_id` |
| 80 | +- `ui_in[1:0] = weight_code` |
| 81 | + |
| 82 | +Weight codes: |
| 83 | + |
| 84 | +- `2'b00` => `0` |
| 85 | +- `2'b01` => `+1` |
| 86 | +- `2'b11` => `-1` |
| 87 | +- `2'b10` => treated as `0` |
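The weight encoding can be sketched on the host side as below. `encode_weight` and `decode_weight_code` are hypothetical helper names; the code table is taken from the list above. The sideband bits `uio_in[7:2]` are not listed for this command, so the sketch returns only the `ui_in` byte and leaves the sideband to the caller.

```python
CMD_WEIGHT = 0b01  # ui_in[7:6]

# Ternary weight -> 2-bit weight_code, per the table above.
WEIGHT_CODE = {0: 0b00, +1: 0b01, -1: 0b11}


def encode_weight(synapse_id, weight):
    """Return the ui_in byte for one CMD_WEIGHT write."""
    if not 0 <= synapse_id <= 0xF:
        raise ValueError("synapse_id must fit in 4 bits")
    if weight not in WEIGHT_CODE:
        raise ValueError("weight must be -1, 0, or +1")
    return (CMD_WEIGHT << 6) | (synapse_id << 2) | WEIGHT_CODE[weight]


def decode_weight_code(code):
    """Map a 2-bit code back to a weight; 2'b10 is treated as 0."""
    return {0b00: 0, 0b01: 1, 0b11: -1}.get(code & 0x3, 0)


print(hex(encode_weight(5, -1)))  # prints: 0x57
```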
| 88 | + |
| 89 | +### `CMD_UCODE = 2'b10` |
| 90 | + |
| 91 | +Streams one microcode byte into the 16x16 microcode store. |
| 92 | + |
| 93 | +Encoding: |
| 94 | + |
| 95 | +- `ui_in[1:0] = data[1:0]` |
| 96 | +- `uio_in[7:2] = data[7:2]` |
| 97 | + |
| 98 | +The destination byte address comes from `ucode_ptr_r`. After acceptance: |
| 99 | + |
| 100 | +- `ucode_prog_we` pulses for one cycle |
| 101 | +- `ucode_prog_addr = ucode_ptr_r` |
| 102 | +- `ucode_prog_data = cmd.data` |
| 103 | +- `ucode_ptr_r` auto-increments |
| 104 | + |
| 105 | +### `CMD_EVENT = 2'b11` |
| 106 | + |
| 107 | +Queues one inbound event in the 2-entry FIFO. |
| 108 | + |
| 109 | +Encoding: |
| 110 | + |
| 111 | +- `ui_in[5:2] = sid` |
| 112 | +- `ui_in[1:0] = tag` |
| 113 | +- `uio_in[7:2] = event_time` |
| 114 | + |
| 115 | +Queued event payload: |
| 116 | + |
| 117 | +- `sid[3:0]` |
| 118 | +- `tag[1:0]` |
| 119 | +- `event_time[5:0]` |
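A minimal sketch of the event encoding, assuming the field widths listed above. The helper names `encode_event` and `decode_event` are hypothetical.

```python
CMD_EVENT = 0b11  # ui_in[7:6]


def encode_event(sid, tag, event_time):
    """Return (ui_in, sideband), where sideband is the value for uio_in[7:2]."""
    if not 0 <= sid <= 0xF:
        raise ValueError("sid must fit in 4 bits")
    if not 0 <= tag <= 0x3:
        raise ValueError("tag must fit in 2 bits")
    if not 0 <= event_time <= 0x3F:
        raise ValueError("event_time must fit in 6 bits")
    ui_in = (CMD_EVENT << 6) | (sid << 2) | tag
    return ui_in, event_time


def decode_event(ui_in, sideband):
    """Recover the queued payload fields from the raw pin values."""
    return {
        "sid": (ui_in >> 2) & 0xF,
        "tag": ui_in & 0x3,
        "event_time": sideband & 0x3F,
    }


print(encode_event(1, 1, 0x2A))  # prints: (197, 42)
```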
| 120 | + |
| 121 | +## 5. `cmd_ready` Behavior |
| 122 | + |
| 123 | +The neuron core computes readiness from the decoded command class. |
| 124 | + |
| 125 | +### Non-event commands |
| 126 | + |
| 127 | +`CMD_CSR`, `CMD_WEIGHT`, and `CMD_UCODE` are accepted whenever: |
| 128 | + |
| 129 | +- `ena = 1` |
| 130 | +- `rst_n = 1` |
| 131 | +- no event is currently in flight (`busy_r = 0`) |
| 132 | + |
| 133 | +They do not depend on FIFO occupancy or a held output beat, but they are intentionally blocked while the core is mid-event so an in-flight event sees a stable program image and weight configuration. |
| 134 | + |
| 135 | +### Event commands |
| 136 | + |
| 137 | +`CMD_EVENT` is accepted only when: |
| 138 | + |
| 139 | +- `ena = 1` |
| 140 | +- `rst_n = 1` |
| 141 | +- no event is currently in flight (`busy_r = 0`) |
| 142 | +- no output beat is currently held |
| 143 | +- the registered FIFO level is not `2` |
| 144 | + |
| 145 | +The live implementation uses a state-only fullness check (`fifo_level != 2`) to avoid a combinational loop through FIFO ready/pop logic, so a full FIFO will not accept a same-cycle replacement push. |
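The two readiness rules can be summarized as a small behavioral model. This is a sketch of the conditions listed above, not a transcription of the RTL: the function and its parameter names are illustrative, and `have_out` is assumed to correspond to the held-output-beat state.

```python
CMD_CSR, CMD_WEIGHT, CMD_UCODE, CMD_EVENT = 0b00, 0b01, 0b10, 0b11


def cmd_ready(kind, ena, rst_n, busy, have_out, fifo_level):
    """Model of whether the core would accept a command of the given kind."""
    # Conditions shared by every command class.
    base = bool(ena) and bool(rst_n) and not busy
    if kind == CMD_EVENT:
        # Events also need no held output beat and a non-full FIFO.
        # The fullness check uses the registered level only.
        return base and not have_out and fifo_level != 2
    # CSR, WEIGHT and UCODE ignore FIFO occupancy and the held beat.
    return base


# A full FIFO blocks events but not CSR writes.
print(cmd_ready(CMD_EVENT, 1, 1, False, False, 2))  # prints: False
print(cmd_ready(CMD_CSR, 1, 1, False, True, 2))     # prints: True
```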
| 146 | + |
| 147 | +## 6. CSR Map |
| 148 | + |
| 149 | +The core uses a compact wrapper-owned CSR bank. |
| 150 | + |
| 151 | +### `0x0` `CSR_CTRL` |
| 152 | + |
| 153 | +Pulse bits: |
| 154 | + |
| 155 | +- `bit0`: soft runtime reset |
| 156 | +- `bit1`: clear held output beat |
| 157 | +- `bit2`: clear event FIFO |
| 158 | + |
| 159 | +### `0x1` `CSR_UCODE_PTR` |
| 160 | + |
| 161 | +- `cmd.data[4:0]` sets the byte pointer used by `CMD_UCODE` |
| 162 | + |
| 163 | +### `0x2` `CSR_UCODE_LEN` |
| 164 | + |
| 165 | +- `cmd.data[3:0]` sets the last active microcode step index |
| 166 | +- `0` means one active instruction |
| 167 | +- `15` means sixteen active instructions |
| 168 | + |
| 169 | +### `0x3` `CSR_VEC_BASE_01` |
| 170 | + |
| 171 | +- `cmd.data[3:0]` => vector base for `tag 0` |
| 172 | +- `cmd.data[7:4]` => vector base for `tag 1` |
| 173 | + |
| 174 | +### `0x4` `CSR_VEC_BASE_23` |
| 175 | + |
| 176 | +- `cmd.data[3:0]` => vector base for `tag 2` |
| 177 | +- `cmd.data[7:4]` => vector base for `tag 3` |
| 178 | + |
| 179 | +### `0x5` `CSR_INIT_VI` |
| 180 | + |
| 181 | +- `cmd.data[3:0]` => reset/init value for `R0 = V` |
| 182 | +- `cmd.data[7:4]` => reset/init value for `R1 = I` |
| 183 | + |
| 184 | +### `0x6` `CSR_INIT_TR` |
| 185 | + |
| 186 | +- `cmd.data[3:0]` => reset/init value for `R2 = TH` |
| 187 | +- `cmd.data[7:4]` => reset/init value for `R3 = R` |
| 188 | + |
| 189 | +### `0x7` `CSR_INIT_T01` |
| 190 | + |
| 191 | +- `cmd.data[3:0]` => reset/init value for `R4 = T0` |
| 192 | +- `cmd.data[7:4]` => reset/init value for `R5 = T1` |
| 193 | + |
| 194 | +### `0x8` `CSR_INIT_WAUX` |
| 195 | + |
| 196 | +- `cmd.data[3:0]` => reset/init value for `R6 = W` |
| 197 | +- `cmd.data[7:4]` => reset/init value for `R7 = AUX` |
| 198 | + |
| 199 | +This only changes the RF reset image for `W` and `AUX`. |
| 200 | + |
| 201 | +It does not preload the persistent 16-entry synapse weight bank. Use `CMD_WEIGHT` for that. |
| 202 | + |
| 203 | +## 7. FIFO Semantics |
| 204 | + |
| 205 | +The ingress FIFO is two entries deep and stores only event commands. |
| 206 | + |
| 207 | +Properties: |
| 208 | + |
| 209 | +- in-order delivery |
| 210 | +- simultaneous push and pop supported |
| 211 | +- `level` is `0`, `1`, or `2` |
| 212 | +- `out_valid` mirrors whether slot 0 is occupied |
| 213 | +- `clear` empties both entries on the next clock edge
| 214 | + |
| 215 | +The neuron core pops automatically whenever: |
| 216 | + |
| 217 | +- `ena = 1` |
| 218 | +- `rst_n = 1` |
| 219 | +- `have_out_r = 0` |
| 220 | +- `out_valid = 1` |
| 221 | + |
| 222 | +There is no separate "run" command for service once an event has entered the FIFO. |
| 223 | + |
| 224 | +## 8. Output Beat Encoding |
| 225 | + |
| 226 | +`uo_out[7:0]` is generated by the `EMIT` micro-op and stored in `neuron_state`. |
| 227 | + |
| 228 | +Layout: |
| 229 | + |
| 230 | +- `uo_out[7] = 1` valid marker |
| 231 | +- `uo_out[6:5] = emitted tag literal` |
| 232 | +- `uo_out[4:1] = last_sid` |
| 233 | +- `uo_out[0] = spike_flag` |
| 234 | + |
| 235 | +Only the first `EMIT` encountered during one event service pass is kept. |
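On the host side, a beat read from `uo_out[7:0]` can be unpacked according to the layout above. `decode_beat` is a hypothetical helper name used only for this sketch.

```python
def decode_beat(uo_out):
    """Split one output beat into its documented fields."""
    return {
        "valid": bool(uo_out & 0x80),  # uo_out[7]
        "tag": (uo_out >> 5) & 0x3,    # uo_out[6:5]
        "sid": (uo_out >> 1) & 0xF,    # uo_out[4:1], last_sid
        "spike": bool(uo_out & 0x01),  # uo_out[0], spike_flag
    }


print(decode_beat(0xAB))
# prints: {'valid': True, 'tag': 1, 'sid': 5, 'spike': True}
```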
| 236 | + |
| 237 | +## 9. Practical Programming Sequence |
| 238 | + |
| 239 | +A typical host-side setup looks like this: |
| 240 | + |
| 241 | +1. Program `CSR_UCODE_PTR` if you want to start writing microcode somewhere other than byte `0`. |
| 242 | +2. Stream microcode bytes with `CMD_UCODE`. |
| 243 | +3. Program `CSR_UCODE_LEN`. |
| 244 | +4. Program `CSR_VEC_BASE_01` and `CSR_VEC_BASE_23`. |
| 245 | +5. Program any desired initial RF values with the `CSR_INIT_*` registers. |
| 246 | +6. Program weights with `CMD_WEIGHT`. |
| 247 | +7. Send events with `CMD_EVENT`. |
| 248 | + |
| 249 | +Because non-event commands are blocked while `busy_r = 1`, the clean operating model is: |
| 250 | + |
| 251 | +- program while idle |
| 252 | +- then enqueue events |
| 253 | +- then optionally perform more programming only after the current event retires |
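The setup order above can be sketched as a function that builds the list of raw pin payloads to send while the core is idle. Everything here is illustrative: `pack` and `build_program` are hypothetical helpers, the microcode bytes in the example are placeholders rather than real instructions, and the optional `CSR_INIT_*` step is omitted. Each entry is a `(ui_in, sideband)` pair, where `sideband` is the value for `uio_in[7:2]`. For `CMD_UCODE`, `ui_in[5:2]` has no listed meaning, so the sketch writes `0` there.

```python
CMD_CSR, CMD_WEIGHT, CMD_UCODE = 0b00, 0b01, 0b10
CSR_UCODE_PTR, CSR_UCODE_LEN = 0x1, 0x2
CSR_VEC_BASE_01, CSR_VEC_BASE_23 = 0x3, 0x4
WEIGHT_CODE = {0: 0b00, +1: 0b01, -1: 0b11}


def pack(kind, field, data):
    """kind -> ui_in[7:6], field -> ui_in[5:2], data -> {uio_in[7:2], ui_in[1:0]}."""
    ui_in = ((kind & 0x3) << 6) | ((field & 0xF) << 2) | (data & 0x3)
    return ui_in, (data >> 2) & 0x3F


def build_program(ucode_bytes, last_step, vec_base_01, vec_base_23, weights):
    """Return the (ui_in, sideband) payloads in the documented setup order."""
    cmds = [pack(CMD_CSR, CSR_UCODE_PTR, 0)]              # start at byte 0
    cmds += [pack(CMD_UCODE, 0, b) for b in ucode_bytes]  # pointer auto-increments
    cmds.append(pack(CMD_CSR, CSR_UCODE_LEN, last_step))
    cmds.append(pack(CMD_CSR, CSR_VEC_BASE_01, vec_base_01))
    cmds.append(pack(CMD_CSR, CSR_VEC_BASE_23, vec_base_23))
    cmds += [pack(CMD_WEIGHT, syn, WEIGHT_CODE[w])
             for syn, w in sorted(weights.items())]
    return cmds


# Placeholder microcode bytes; real instruction encodings are not shown here.
for ui_in, sideband in build_program([0x12, 0x34], 0, 0x10, 0x32, {0: +1, 1: -1}):
    print(f"ui_in=0x{ui_in:02X} uio_in[7:2]=0x{sideband:02X}")
```

Events would follow only after all of these payloads have been accepted, which matches the "program while idle, then enqueue events" model.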
| 254 | + |
| 255 | +For a complete worked example, including how to map several core instances into a fully connected layer, see `docs/layer_examples.md`. |