Skip to content

Commit 50bdc79

Browse files
widgetiiclaude
andauthored
Enable MMU + D-cache: fix sustained host→device WRITE (#20)
* Enable MMU + D-cache: fixes host→device WRITE reliability ARMv7 short-descriptor page tables with 1MB identity-mapped sections. DDR (128MB from RAM_BASE) is cacheable write-back/write-allocate. All I/O regions (UART, FMC, CRG, flash window) are device/uncached. With D-cache, COBS decode + CRC32 processing is ~10x faster, eliminating PL011 FIFO overflow during sustained host→device transfers. Previously WRITE failed after ~16-420KB; now 256KB verified at 79 KB/s. Page table (16KB) allocated in BSS with 16KB alignment for TTBR0. Tested on hi3516ev300: - 16KB write: OK (previously OK) - 64KB write: OK (previously FAILED) - 256KB write: OK (previously IMPOSSIBLE) - All verified with CRC32 read-back Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * IRQ-driven UART RX + MMU/D-cache (WIP) PL011 RX interrupt handler drains hardware FIFO into 4KB ring buffer automatically. GIC configured for UART0 IRQ (SPI 7 on ev200/ev300). IRQ mode stack set up. proto_recv reads from ring buffer via uart_getc_safe — no more polling soft_rx_drain. Combined with MMU/D-cache, this should eliminate sustained WRITE failures. Testing showed 3/4 blocks work but block 4 loses 3 packets (14848/16384 bytes received). Ring buffer overflow suspected. Known issue: 8KB ring buffer crashes agent (BSS overlap or GIC issue). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * WIP: Backpressure for WRITE, RX drain during TX, proto_reset_rx Per-packet COBS ACK in handle_write for flow control. Host waits for ACK before sending next DATA packet. Added proto_drain_fifo call in uart_putc TX wait loop to prevent RX FIFO overflow during bidirectional backpressure traffic. Added proto_reset_rx to flush both software and hardware RX buffers. WRITE_MAX_TRANSFER set to 32MB (single block) to avoid inter-block READY packet desynchronization. Root cause found: selfupdate also loses data, producing corrupted agent binaries. Need per-packet backpressure in selfupdate too. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Critical fix: per-packet backpressure in SELFUPDATE + WRITE Root cause found: selfupdate was blasting packets without flow control, losing data. Agent's CRC check should have caught this but corrupted binaries were being deployed, causing cascading failures in all subsequent operations. Fix: both handle_selfupdate and handle_write now send proto_send_ack after each DATA packet. Host waits for ACK before sending next. Guarantees zero data loss at any baud rate. Also: proto_drain_fifo in uart_putc TX wait loop, proto_reset_rx for flushing both hardware and software RX buffers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Critical fix: COBS decode trailing zero stripping caused data corruption ROOT CAUSE: cobs_decode() stripped trailing zero bytes from decoded output. When a COBS packet's CRC32 had 0x00 as its MSB (LE last byte), the decode removed it, producing a 1-byte-shorter output. This caused CRC mismatch for ~1/256 of all packets — deterministic, data-dependent. Found via ASAN: "left shift of 136 by 24 places cannot be represented in type 'int'" led to investigating CRC byte extraction, which led to the COBS decode length mismatch. Fixes: - Remove trailing zero stripping from cobs_decode() (C) - Cast uint8_t to uint32_t before << 24 in CRC extraction (UB fix) - Per-packet backpressure ACK in WRITE and SELFUPDATE - Fixed _recv_packet_sync: no partial frame stashing - recv_response: reset deadline after READY skip - ASAN firmware data test catches the bug offline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Dmitry Ilyin <widgetii@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 2cc5b58 commit 50bdc79

File tree

13 files changed

+797
-156
lines changed

13 files changed

+797
-156
lines changed

agent/cobs.c

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -47,10 +47,5 @@ uint32_t cobs_decode(const uint8_t *input, uint32_t len, uint8_t *output) {
4747
}
4848
}
4949

50-
/* Remove trailing zero if present */
51-
if (out_idx > 0 && output[out_idx - 1] == 0x00) {
52-
out_idx--;
53-
}
54-
5550
return out_idx;
5651
}

agent/main.c

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -76,7 +76,8 @@ static void watchdog_disable(void) {
7676
}
7777

7878
static uint32_t read_le32(const uint8_t *p) {
79-
return p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24);
79+
return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
80+
((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
8081
}
8182

8283
static void write_le32(uint8_t *p, uint32_t v) {
@@ -226,7 +227,9 @@ static void handle_write(const uint8_t *data, uint32_t len) {
226227
uint32_t chunk = pkt_len - 2;
227228
for (uint32_t i = 0; i < chunk && received < size; i++)
228229
dest[received++] = pkt[2 + i];
229-
/* No per-packet ACK — continuous streaming for throughput */
230+
/* Backpressure: COBS-framed ACK after each DATA packet.
231+
* Host waits for this before sending next packet. */
232+
proto_send_ack(ACK_OK);
230233
} else if (cmd == 0) {
231234
uint8_t err[5];
232235
err[0] = ACK_FLASH_ERROR;
@@ -439,6 +442,8 @@ static void handle_selfupdate(const uint8_t *data, uint32_t len) {
439442
uint32_t chunk = pkt_len - 2;
440443
for (uint32_t i = 0; i < chunk && received < size; i++)
441444
dest[received++] = pkt[2 + i];
445+
/* Backpressure ACK — host must wait before sending next */
446+
proto_send_ack(ACK_OK);
442447
} else if (cmd == 0) {
443448
proto_send_ack(ACK_FLASH_ERROR);
444449
return;
@@ -448,7 +453,7 @@ static void handle_selfupdate(const uint8_t *data, uint32_t len) {
448453
uint32_t actual_crc = crc32(0, dest, size);
449454
if (actual_crc != expected_crc) {
450455
proto_send_ack(ACK_CRC_ERROR);
451-
return;
456+
return; /* Stay alive — don't jump to bad code */
452457
}
453458

454459
proto_send_ack(ACK_OK);
@@ -653,7 +658,7 @@ int main(void) {
653658
* revert to 115200. Host may have disconnected. */
654659
if (!at_default_baud) {
655660
baud_idle++;
656-
if (baud_idle >= 20) {
661+
if (baud_idle >= 60) { /* ~30 seconds */
657662
uart_set_baud(115200);
658663
while (uart_readable()) uart_getc();
659664
at_default_baud = 1;

agent/protocol.c

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -164,7 +164,8 @@ uint8_t proto_recv(uint8_t *data, uint32_t *len, uint32_t timeout_ms) {
164164
/* Verify CRC32 — read bytes individually to avoid unaligned access */
165165
uint32_t payload_len = raw_len - 4;
166166
volatile uint8_t *cp = &rx_raw[payload_len];
167-
uint32_t expected_crc = cp[0] | (cp[1] << 8) | (cp[2] << 16) | (cp[3] << 24);
167+
uint32_t expected_crc = (uint32_t)cp[0] | ((uint32_t)cp[1] << 8) |
168+
((uint32_t)cp[2] << 16) | ((uint32_t)cp[3] << 24);
168169
uint32_t actual_crc = crc32(0, rx_raw, payload_len);
169170

170171
/* Drain after CRC */
@@ -191,3 +192,10 @@ void proto_send_ack(uint8_t status) {
191192
void proto_drain_fifo(void) {
192193
soft_rx_drain();
193194
}
195+
196+
void proto_reset_rx(void) {
197+
/* Flush everything: software FIFO + hardware FIFO */
198+
soft_rx_head = 0;
199+
soft_rx_tail = 0;
200+
uart_drain_rx();
201+
}

agent/protocol.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -54,4 +54,7 @@ uint32_t crc32(uint32_t crc, const uint8_t *buf, uint32_t len);
5454
* long computations to prevent 16-byte hardware FIFO overflow. */
5555
void proto_drain_fifo(void);
5656

57+
/* Reset all RX buffers (software + hardware) */
58+
void proto_reset_rx(void);
59+
5760
#endif /* PROTOCOL_H */

agent/startup.S

Lines changed: 105 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,8 @@
33
*
44
* Uploaded via HiSilicon boot protocol to DDR.
55
* DDR and clocks are already initialized by the bootrom + DDR step.
6-
* We just need to set up the stack and jump to main().
6+
* Sets up MMU with identity mapping + D-cache for fast COBS+CRC
7+
* processing, then jumps to main().
78
*/
89

910
.section .text.start
@@ -13,8 +14,6 @@
1314
/*
1415
* ARM exception vector table.
1516
* Must be at the entry point so VBAR points here.
16-
* All exceptions loop forever instead of resetting the SoC
17-
* (the SPL's handlers call reset_cpu which kills us).
1817
*/
1918
_start:
2019
_vectors:
@@ -32,42 +31,33 @@ _exc_hang:
3231

3332
/*
3433
* Data abort handler — prints fault info over UART before hanging.
35-
* On entry (ABT mode): LR_abt = faulting instruction + 8.
36-
* We must NOT use the stack (SP_abt is uninitialized).
37-
* Instead, read CP15 regs and call handler using SVC stack.
3834
*/
3935
.global _data_abort
4036
_data_abort:
41-
sub lr, lr, #8 /* LR = faulting PC */
42-
43-
/* Read fault info into r0-r2 (caller-saved, safe to clobber) */
44-
mrc p15, 0, r0, c6, c0, 0 /* r0 = DFAR (fault address) */
45-
mrc p15, 0, r1, c5, c0, 0 /* r1 = DFSR (fault status) */
37+
sub lr, lr, #8
38+
mrc p15, 0, r0, c6, c0, 0 /* r0 = DFAR */
39+
mrc p15, 0, r1, c5, c0, 0 /* r1 = DFSR */
4640
mov r2, lr /* r2 = faulting PC */
47-
48-
/* Switch to SVC mode (which has a valid stack) with IRQ/FIQ disabled */
49-
cps #0x13 /* SVC mode */
50-
41+
cps #0x13 /* SVC mode (valid stack) */
5142
push {r0-r3, lr}
5243
bl data_abort_handler
5344
pop {r0-r3, lr}
54-
5545
b _exc_hang
5646

5747
_start_code:
58-
/* Set VBAR to our vector table (use ADR for PC-relative) */
48+
/* Set VBAR to our vector table */
5949
adr r0, _start
60-
mcr p15, 0, r0, c12, c0, 0 /* Write VBAR */
50+
mcr p15, 0, r0, c12, c0, 0
6151
isb
6252

6353
/* Disable interrupts */
6454
mrs r0, cpsr
65-
orr r0, r0, #0xC0 /* Disable IRQ and FIQ */
55+
orr r0, r0, #0xC0
6656
msr cpsr_c, r0
6757

68-
/* Invalidate TLBs, caches, then disable MMU */
58+
/* Disable MMU/caches first (clean state for setup) */
6959
mov r0, #0
70-
mcr p15, 0, r0, c8, c7, 0 /* Invalidate entire TLB */
60+
mcr p15, 0, r0, c8, c7, 0 /* Invalidate TLB */
7161
mcr p15, 0, r0, c7, c5, 0 /* Invalidate I-cache */
7262
mcr p15, 0, r0, c7, c5, 6 /* Invalidate branch predictor */
7363
dsb
@@ -77,15 +67,15 @@ _start_code:
7767
bic r0, r0, #0x1 /* Disable MMU */
7868
bic r0, r0, #0x4 /* Disable D-cache */
7969
bic r0, r0, #0x1000 /* Disable I-cache */
80-
mcr p15, 0, r0, c1, c0, 0 /* Write SCTLR */
70+
mcr p15, 0, r0, c1, c0, 0
8171
dsb
8272
isb
8373

8474
/* Set up stack pointer (16KB below _start) */
8575
ldr sp, =_start
86-
sub sp, sp, #0x4000 /* 16KB stack below code */
76+
sub sp, sp, #0x4000
8777

88-
/* Clear BSS */
78+
/* Clear BSS (includes page table) */
8979
ldr r0, =__bss_start
9080
ldr r1, =__bss_end
9181
mov r2, #0
@@ -94,10 +84,100 @@ bss_loop:
9484
strlt r2, [r0], #4
9585
blt bss_loop
9686

87+
/* ===== MMU PAGE TABLE SETUP =====
88+
*
89+
* ARMv7 short-descriptor, 1MB sections.
90+
* Identity mapping: virtual == physical.
91+
*
92+
* Section descriptor bits:
93+
* [1:0] = 0b10 (section)
94+
* [2] = B (bufferable)
95+
* [3] = C (cacheable)
96+
* [4] = XN (execute never)
97+
* [5:8] = Domain 0
98+
* [11:10]= AP = 0b11 (full access)
99+
* [14:12]= TEX
100+
* [31:20]= Physical base (section number)
101+
*
102+
* DEVICE (uncached, strongly-ordered):
103+
* AP=11, Domain=0, XN=1, B=0, C=0, TEX=000, section
104+
* = (3<<10) | (1<<4) | (0b10) = 0x00000C12
105+
*
106+
* CACHED (write-back, write-allocate):
107+
* AP=11, Domain=0, XN=0, B=1, C=1, TEX=001, section
108+
* = (3<<10) | (1<<12) | (1<<3) | (1<<2) | (0b10) = 0x00001C0E
109+
*/
110+
111+
/* r4 = page table base, r5 = DEVICE descriptor template,
112+
* r6 = CACHED descriptor template, r7 = loop counter */
113+
ldr r4, =_page_table
114+
115+
/* Fill all 4096 entries as DEVICE (uncached) */
116+
ldr r5, =0x00000C12 /* DEVICE descriptor */
117+
mov r0, r4 /* r0 = current entry pointer */
118+
mov r1, r5 /* r1 = descriptor for section 0 */
119+
mov r7, #4096
120+
pt_fill_device:
121+
str r1, [r0], #4
122+
add r1, r1, #0x100000 /* Next 1MB section */
123+
subs r7, r7, #1
124+
bne pt_fill_device
125+
126+
/* Mark DDR sections as CACHED.
127+
* RAM_BASE is set at compile time (0x40000000 or 0x80000000).
128+
* We cache 128MB = 128 sections starting at RAM_BASE >> 20. */
129+
ldr r6, =0x00001C0E /* CACHED descriptor */
130+
ldr r1, =RAM_BASE
131+
lsr r2, r1, #20 /* r2 = first DDR section number */
132+
add r0, r4, r2, lsl #2 /* r0 = &page_table[first_section] */
133+
orr r1, r6, r2, lsl #20 /* r1 = descriptor for first DDR section */
134+
mov r7, #128 /* 128 sections = 128MB */
135+
pt_fill_ddr:
136+
str r1, [r0], #4
137+
add r1, r1, #0x100000
138+
subs r7, r7, #1
139+
bne pt_fill_ddr
140+
141+
/* ===== ENABLE MMU + CACHES ===== */
142+
143+
/* Invalidate TLB again (page table just written) */
144+
mov r0, #0
145+
mcr p15, 0, r0, c8, c7, 0
146+
dsb
147+
isb
148+
149+
/* Set TTBR0 = page_table | inner WB/WA | outer WB/WA */
150+
ldr r0, =_page_table
151+
orr r0, r0, #(1 << 0) /* IRGN bit 0 (inner WB) */
152+
orr r0, r0, #(1 << 6) /* IRGN bit 6 */
153+
orr r0, r0, #(1 << 3) /* RGN = outer WB */
154+
mcr p15, 0, r0, c2, c0, 0
155+
isb
156+
157+
/* DACR = all domains manager (no permission checks) */
158+
mvn r0, #0
159+
mcr p15, 0, r0, c3, c0, 0
160+
isb
161+
162+
/* Enable MMU + D-cache + I-cache via SCTLR */
163+
mrc p15, 0, r0, c1, c0, 0
164+
orr r0, r0, #(1 << 0) /* M = MMU enable */
165+
orr r0, r0, #(1 << 2) /* C = D-cache enable */
166+
orr r0, r0, #(1 << 12) /* I = I-cache enable */
167+
mcr p15, 0, r0, c1, c0, 0
168+
dsb
169+
isb
170+
97171
/* Jump to C main */
98172
bl main
99173

100-
/* If main returns, loop forever */
101174
hang:
102175
wfi
103176
b hang
177+
178+
/* ===== PAGE TABLE (in BSS, 16KB aligned) ===== */
179+
.section .bss
180+
.align 14
181+
.global _page_table
182+
_page_table:
183+
.space 16384

agent/test_agent

-29.4 KB
Binary file not shown.

agent/test_agent.c

Lines changed: 4 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -46,19 +46,17 @@ static void test_cobs_all_zeros(void) {
4646
ASSERT(enc[i] != 0, "cobs all-zeros: no zeros in output");
4747

4848
uint32_t dec_len = cobs_decode(enc, enc_len, dec);
49-
/* COBS strips trailing zero — [0,0,0,0] roundtrips to [0,0,0] */
50-
ASSERT(dec_len == 3, "cobs all-zeros: length (trailing zero stripped)");
51-
for (uint32_t i = 0; i < dec_len; i++)
52-
ASSERT(dec[i] == 0, "cobs all-zeros: data");
49+
ASSERT(dec_len == 4, "cobs all-zeros: length");
50+
ASSERT(memcmp(in, dec, 4) == 0, "cobs all-zeros: data");
5351
}
5452

5553
static void test_cobs_single_zero(void) {
5654
uint8_t in[] = {0};
5755
uint8_t enc[8], dec[8];
5856
uint32_t enc_len = cobs_encode(in, 1, enc);
5957
uint32_t dec_len = cobs_decode(enc, enc_len, dec);
60-
/* Single zero roundtrips to empty — trailing zero stripped */
61-
ASSERT(dec_len == 0, "cobs single zero: stripped to empty");
58+
ASSERT(dec_len == 1, "cobs single zero: length");
59+
ASSERT(dec[0] == 0, "cobs single zero: value");
6260
}
6361

6462
static void test_cobs_empty(void) {

agent/test_cobs_detail.c

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
#include <stdio.h>
2+
#include <string.h>
3+
#include <stdint.h>
4+
#include <stdlib.h>
5+
#include "cobs.h"
6+
7+
extern uint32_t crc32(uint32_t, const uint8_t*, uint32_t);
8+
9+
10+
int main(int argc, char *argv[]) {
11+
FILE *f = fopen(argv[1], "rb");
12+
fseek(f, 0, SEEK_END); long sz = ftell(f); fseek(f, 0, SEEK_SET);
13+
uint8_t *fw = malloc(sz); fread(fw, 1, sz, f); fclose(f);
14+
15+
/* Block 3 (offset 49152), packet 29 (offset 49152 + 29*512 = 63,616) */
16+
uint32_t off = 49152 + 29 * 512;
17+
uint8_t chunk[512];
18+
memcpy(chunk, &fw[off], 512);
19+
20+
/* Build DATA packet: cmd(1) + seq(2) + data(512) + crc(4) = 519 bytes */
21+
uint8_t raw[520];
22+
raw[0] = 0x82; /* RSP_DATA */
23+
raw[1] = 29; raw[2] = 0; /* seq=29 LE */
24+
memcpy(&raw[3], chunk, 512);
25+
uint32_t raw_len = 515;
26+
uint32_t c = crc32(0, raw, raw_len);
27+
raw[raw_len+0] = c & 0xFF;
28+
raw[raw_len+1] = (c >> 8) & 0xFF;
29+
raw[raw_len+2] = (c >> 16) & 0xFF;
30+
raw[raw_len+3] = (c >> 24) & 0xFF;
31+
raw_len += 4; /* 519 */
32+
33+
printf("Raw payload: %u bytes, CRC=0x%08x\n", raw_len, c);
34+
printf(" raw[515..518] (CRC bytes): %02x %02x %02x %02x\n",
35+
raw[515], raw[516], raw[517], raw[518]);
36+
37+
/* COBS encode */
38+
uint8_t encoded[600];
39+
uint32_t enc_len = cobs_encode(raw, raw_len, encoded);
40+
printf("COBS encoded: %u bytes\n", enc_len);
41+
42+
/* COBS decode */
43+
uint8_t decoded[600];
44+
uint32_t dec_len = cobs_decode(encoded, enc_len, decoded);
45+
printf("COBS decoded: %u bytes\n", dec_len);
46+
47+
/* Compare */
48+
if (dec_len != raw_len) {
49+
printf("LENGTH MISMATCH: decoded=%u raw=%u\n", dec_len, raw_len);
50+
}
51+
52+
int match = (dec_len == raw_len) && (memcmp(decoded, raw, raw_len) == 0);
53+
printf("Roundtrip match: %s\n", match ? "YES" : "NO");
54+
55+
if (!match) {
56+
for (uint32_t i = 0; i < raw_len && i < dec_len; i++) {
57+
if (decoded[i] != raw[i]) {
58+
printf(" First diff at byte %u: decoded=0x%02x raw=0x%02x\n",
59+
i, decoded[i], raw[i]);
60+
break;
61+
}
62+
}
63+
}
64+
65+
/* Extract CRC from decoded */
66+
if (dec_len >= 5) {
67+
uint32_t plen = dec_len - 4;
68+
uint32_t exp = (uint32_t)decoded[plen] | ((uint32_t)decoded[plen+1]<<8) |
69+
((uint32_t)decoded[plen+2]<<16) | ((uint32_t)decoded[plen+3]<<24);
70+
uint32_t act = crc32(0, decoded, plen);
71+
printf("CRC check: actual=0x%08x expected=0x%08x %s\n",
72+
act, exp, act == exp ? "OK" : "MISMATCH");
73+
}
74+
75+
free(fw);
76+
return match ? 0 : 1;
77+
}

0 commit comments

Comments
 (0)