wolfCrypt on TI C2000 C28x (LAUNCHXL-F28P55X) #10724
Draft
dgarske wants to merge 4 commits into
Pull request overview
This PR adds and CI-guards a bare-metal wolfCrypt port for TI C2000 C28x targets where CHAR_BIT == 16, introducing gated fixes so hashing, DRBG, ML-DSA verify, and SP-math ECC work correctly when a C “byte” is wider than 8 bits.
Changes:
- Introduces WOLFSSL_NO_OCTET_BYTE detection and uses octet-wise load/store paths to avoid invalid byte/word aliasing on CHAR_BIT != 8 targets (SHA-256/512 family, SHA-3/SHAKE, Base64 CT decode, DRBG helpers, rotate helpers).
- Adds a "smallest memory" ML-DSA verify mode that streams z per polynomial to reduce pinned RAM in wc_MlDsaKey.
- Adds TI C2000 compile-only guard scripts plus a GitHub Actions workflow that downloads the TI CGT and compiles a scoped subset.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| wolfssl/wolfcrypt/wc_port.h | Makes atomic arg type selection robust for 16-bit int by also checking UINT_MAX. |
| wolfssl/wolfcrypt/wc_mldsa.h | Adds WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM struct layout variant for reduced verify RAM. |
| wolfssl/wolfcrypt/types.h | Adds WOLFSSL_NO_OCTET_BYTE auto-detection; adjusts WC_16BIT_CPU 64-bit availability behavior. |
| wolfssl/wolfcrypt/sp_int.h | Adds support for unsigned char being 16-bit (no native 8-bit type). |
| wolfssl/wolfcrypt/settings.h | Requires explicit opt-in for SP math on 16-bit-int CPUs via WOLFSSL_SP_ALLOW_16BIT_CPU. |
| wolfssl/wolfcrypt/dilithium.h | Adds smallest-mem verify gating and defaults slow Montgomery reduction macros on WC_16BIT_CPU. |
| wolfcrypt/test/test.c | Switches large-digest constants from C strings to byte[] to avoid CHAR_BIT!=8 pitfalls. |
| wolfcrypt/src/wc_port.c | Fixes init-state static assert to use CHAR_BIT instead of hardcoded 8. |
| wolfcrypt/src/wc_mldsa.c | Adds octet-masking for packed bytes and fixes integer-promotion/sign issues on 16-bit int; adds streaming z verify path. |
| wolfcrypt/src/sha512.c | Adds octet-wise word load/store and corrects length carry/length placement for CHAR_BIT!=8. |
| wolfcrypt/src/sha3.c | Forces bytewise Keccak absorb/squeeze for WOLFSSL_NO_OCTET_BYTE and adds squeeze helper. |
| wolfcrypt/src/sha256.c | Adds octet-wise word load/store and corrects length carry/length placement for CHAR_BIT!=8. |
| wolfcrypt/src/random.c | Fixes DRBG serialization/addition helpers for non-8-bit “byte” targets. |
| wolfcrypt/src/misc.c | Fixes rotate helpers to use CHAR_BIT-based bit width when needed. |
| wolfcrypt/src/coding.c | Ensures Base64 CT decode returns 0xFF for invalid chars even when byte is wider than 8 bits. |
| wolfcrypt/benchmark/benchmark.c | Adds static buffers for WOLFSSL_NO_MALLOC benchmarking and adjusts frees/allocations accordingly. |
| scripts/ti-c2000/user_settings.h | Adds minimal CI-only config for cl2000 compile-guard. |
| scripts/ti-c2000/compile.sh | Adds compile-only script to build a scoped source set with TI cl2000. |
| .github/workflows/ti-c2000-compile.yml | Adds CI workflow to download/cache TI CGT and run the compile-only guard. |
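The misc.c row above refers to a CHAR_BIT-based bit width for the rotate helpers. A minimal sketch of that idea follows; DEMO_BITS and demo_rotl32 are hypothetical names used only for illustration and are not the wolfCrypt helpers.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical macro: bit width of a type, valid for any CHAR_BIT.
 * With CHAR_BIT == 16 and sizeof(uint32_t) == 2 this is still 32,
 * whereas 8 * sizeof(uint32_t) would wrongly give 16. */
#define DEMO_BITS(type) (CHAR_BIT * sizeof(type))

/* Rotate a 32-bit word left by n bits (n is reduced modulo 32). */
static uint32_t demo_rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    if (n == 0) {
        return x;               /* avoid shifting by the full width */
    }
    return (uint32_t)((x << n) | (x >> (DEMO_BITS(uint32_t) - n)));
}
```

On an 8-bit-byte host both forms of the width expression agree, which is why a change of this kind can be a no-op there.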
… hashes, DRBG
Enables wolfCrypt on toolchains where a C byte/char is wider than 8 bits (e.g. TI C2000 C28x, CHAR_BIT == 16), all gated on WOLFSSL_WIDE_BYTE and a no-op on 8-bit-byte targets (the default fast paths are left exactly as-is):
- types.h: auto-set WOLFSSL_WIDE_BYTE for CHAR_BIT != 8 / known TI C2000 toolchains (and define CHAR_BIT = 16 when <limits.h> is absent); wc_port.h/.c widen the atomic init-state bitfield + CHAR_BIT static assert for 16-bit int.
- settings.h + sp_int.h: allow SP math on a 16-bit-int CPU via WOLFSSL_SP_ALLOW_16BIT_CPU, and detect a 16-bit char in the SP smallest-type selection.
- misc.c/misc.h: shared big-endian octet<->word helpers (WordsFromBytesBE32/64, BytesFromWordsBE32/64) for WOLFSSL_WIDE_BYTE, where a word cannot be aliased as an octet stream. They are CHAR_BIT-generic, cl2000-safe (loads accumulate with <<= 8, since (word)octet << 24 is miscompiled as a 16-bit shift), in-place safe for the SHA schedule, and store by octet count for partial digests. misc.c rotate width uses CHAR_BIT.
- coding.c: mask the constant-time base64 result to an octet.
- sha256.c/sha512.c: use the shared helpers for the schedule load and digest store, plus a CHAR_BIT*sizeof length carry; sha3.c: octet-wise Keccak squeeze.
- random.c: Hash-DRBG length + reseed-counter serialization via the shared helpers (and an octet-masked carry) under WOLFSSL_WIDE_BYTE; default builds keep the word-aliasing path unchanged.
WOLFSSL_WIDE_BYTE replaces the earlier WOLFSSL_NO_OCTET_BYTE working name.
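The octet<->word conversion described in this commit can be sketched roughly as follows. This is an illustration only: demo_load_be32, demo_store_be32 and DEMO_OCTET are hypothetical names, and the signatures are assumptions rather than those of WordsFromBytesBE32 or BytesFromWordsBE32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical octet mask: keep only the low 8 bits of a cell. */
#define DEMO_OCTET(x) ((uint32_t)(x) & 0xFFu)

/* Load one big-endian 32-bit word from four cells, one octet per cell.
 * The word is built by repeated "<<= 8" rather than by shifting each
 * octet to its final position. */
static uint32_t demo_load_be32(const unsigned char *in)
{
    uint32_t w = 0;
    size_t i;

    for (i = 0; i < 4; i++) {
        w <<= 8;
        w |= DEMO_OCTET(in[i]);
    }
    return w;
}

/* Store the first n octets (n <= 4) of w, most significant first.
 * Counting octets instead of words allows a partial (truncated) store. */
static void demo_store_be32(unsigned char *out, uint32_t w, size_t n)
{
    size_t i;

    assert(n <= 4);
    for (i = 0; i < n; i++) {
        out[i] = (unsigned char)((w >> (24 - 8 * i)) & 0xFFu);
    }
}
```

Neither function aliases the word as a byte stream, so the result does not depend on how many octets the target packs into one addressable cell.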
…EST_MEM
ML-DSA-87 keygen/sign/verify on a 16-bit byte/int CPU (TI C28x), gated and a no-op on normal targets:
- Encode/decode integer-promotion fixes: a byte/word16 field promotes to *unsigned* int where int is 16-bit, so '2 - field' was unsigned and a negative coefficient zero-extended into sword32 (e.g. -1 -> 0x0000FFFF); cast the unpacked field to sword32 (eta-2/eta-4/t0 decode). Bit-packers relied on (byte) truncating to 8 bits; mask with MLDSA_OCT() and cast the <<MLDSA_D shift to sword32 (eta-2/t0/t1/gamma1 encode).
- dilithium.h: shift-based Montgomery reduction on WC_16BIT_CPU (cl2000 miscompiles the multiply form).
- New WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM: stream the signature z vector one polynomial at a time instead of pinning the whole l-vector, cutting the ML-DSA-87 verify key by ~6 KB (with WOLFSSL_MLDSA_ASSIGN_KEY, ~10.7 KB total verify RAM on the C28x).
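The promotion problem in the first bullet can be reproduced on an ordinary host by modelling 16-bit unsigned arithmetic with uint16_t. The sketch below assumes an eta = 2 style decode where the coefficient is 2 - field; the function names are hypothetical and do not appear in wolfCrypt.

```c
#include <stdint.h>

/* Models the failure: where unsigned char and int are both 16 bits,
 * the field promotes to unsigned int, so "2 - field" wraps in 16-bit
 * unsigned arithmetic. The uint16_t cast reproduces that wrap here. */
static int32_t demo_eta2_wrapped(unsigned char field)
{
    uint16_t wrapped = (uint16_t)(2u - (unsigned)(field & 0xFFu));

    return (int32_t)wrapped;    /* zero-extends: -1 becomes 0x0000FFFF */
}

/* The fix described above: widen the field to a signed 32-bit type
 * before subtracting, so the result is negative where it should be. */
static int32_t demo_eta2_fixed(unsigned char field)
{
    return (int32_t)2 - (int32_t)(field & 0xFFu);
}
```

For field = 3 the modelled broken path yields 65535 while the fixed path yields -1; for fields 0 through 2 the two agree, which may be why a bug like this only shows up on negative coefficients.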
…mpile CI
- test.c: store the SHA/SHAKE large_digest KAT vectors as brace-init byte arrays (clean octets) instead of "\x.." string literals, which a signed-16-bit-char toolchain (cl2000) would sign-extend.
- benchmark.c: WOLFSSL_NO_MALLOC mode uses static plain/cipher buffers and skips the key/iv XMALLOC/XFREE (gated; default build unchanged).
- scripts/ti-c2000/ + .github/workflows/ti-c2000-compile.yml: a hardware-free cl2000 compile-only CI guard for the CHAR_BIT!=8 wolfCrypt subset.
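As a rough illustration of the test.c change, the sketch below writes the same four octets both as a string literal and as a brace-initialized array, and compares them octet by octet with an explicit mask. The vector and the names demo_kat_str, demo_kat_bytes and demo_octets_equal are made up for this example and are not taken from the wolfCrypt test suite.

```c
#include <stddef.h>

/* Hypothetical 4-octet expected value, written both ways. With a signed
 * char type, the string form can hold negative cells for values >= 0x80. */
static const char          demo_kat_str[]   = "\xde\xad\xbe\xef";
static const unsigned char demo_kat_bytes[] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Compare n octets, masking each cell to its low 8 bits so that
 * sign-extended or wider-than-octet cells cannot affect the result.
 * Returns 1 on match, 0 otherwise. */
static int demo_octets_equal(const unsigned char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (((unsigned)a[i] & 0xFFu) != ((unsigned)b[i] & 0xFFu)) {
            return 0;
        }
    }
    return 1;
}
```

The brace-initialized form avoids the question entirely, since every cell is an unsigned value in the range 0x00 to 0xFF by construction.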
…it CPUs
The TI cl2000 (C2000 C28x) compiler miscompiles the 32x32->32 low multiply used for the q^-1 step of mldsa_mont_red(), but compiles the 32x64->64 widening multiply correctly. This was verified on a TMS320F28P550SJ, where the ML-DSA-87 verify KAT fails (res=0) with the low multiply.
Compute the q^-1 product through the 64-bit path (MLDSA_MUL_QINV_WIDE64): correct on any conforming compiler and, on the C28x, ~4% faster than the shift-based reduction (305 vs 317 ms/op for ML-DSA-87 verify). dilithium.h auto-selects it for WC_16BIT_CPU and leaves the q multiply enabled (it compiles correctly); a user can still force the shift form with MLDSA_MUL_QINV_SLOW / MLDSA_MUL_Q_SLOW.
Validated on hardware for keygen+sign+verify (round-trip res=1). No effect on 8-bit/>=32-bit-int builds.
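To make the q^-1 step concrete, here is a minimal sketch of a Montgomery reduction for the ML-DSA modulus q = 8380417 with R = 2^32, where the q^-1 product is formed in a 64-bit type and only its low 32 bits are kept. It is one plausible reading of the approach described above, not the code of mldsa_mont_red(); the names are hypothetical, and 58728449 is q^-1 mod 2^32.

```c
#include <stdint.h>

#define DEMO_Q     INT64_C(8380417)      /* ML-DSA modulus q */
#define DEMO_QINV  UINT32_C(58728449)    /* q^-1 mod 2^32 */
#define DEMO_R     INT64_C(4294967296)   /* R = 2^32 */

/* Returns r with r * 2^32 == a (mod q), for |a| up to about 2^31 * q. */
static int32_t demo_mont_red(int64_t a)
{
    /* q^-1 step: form the product in 64 bits, then keep the low 32 bits. */
    uint64_t wide = (uint64_t)(uint32_t)a * DEMO_QINV;
    /* Assumption: converting uint32_t to int32_t wraps (two's complement),
     * as it does on mainstream compilers. */
    int32_t t = (int32_t)(uint32_t)wide;

    /* a - t*q is an exact multiple of 2^32, so the division is exact. */
    return (int32_t)((a - (int64_t)t * DEMO_Q) / DEMO_R);
}
```

The division by 2^32 is used instead of a right shift only to sidestep the implementation-defined shift of negative values; a production routine would likely use the shift.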
wolfCrypt: support TI C2000 C28x (CHAR_BIT == 16) targets
What
Enables wolfCrypt on toolchains where a C byte/char is wider than 8 bits - specifically the TI C2000 C28x DSP, where CHAR_BIT == 16 (the smallest addressable unit is a 16-bit cell, int/short are 16-bit, long is 32-bit). Validated on a TI LAUNCHXL-F28P55X (TMS320F28P550SJ) at 150 MHz: SHA-256/384/512 (plus SHA-512/224 and SHA-512/256), SHA-3, SHAKE128/256, ML-DSA-87 (verify, keygen, sign), and ECDSA + ECDH P-256 all pass on hardware. wolfcrypt_test passes on x86-64 with no regression.

Why it's non-trivial
On a 16-bit-char target, a word32 occupies two 16-bit cells (two octets packed per cell), sizeof(word32) == 2, and a byte[] holds one octet per cell. So the common idioms - aliasing a word as a byte stream ((byte*)&w, XMEMCPY + ByteReverseWords), sizeof as a byte count, (byte)x to truncate to an octet, and 8 * sizeof(x) for a bit width - are all wrong.

There are also two cl2000 codegen quirks. First, (word32)octet << 24 is miscompiled as a 16-bit shift (the fix accumulates with <<= 8). Second, the 32x32->32 q^-1 multiply in the ML-DSA Montgomery reduction is miscompiled (split-testing on hardware pinned it to that one multiply; the 64-bit widening multiply compiles correctly, so the fix computes the q^-1 product through the 64-bit path, which is also ~4% faster than the shift-based form).

Changes (4 commits, all gated / no-op on 8-bit-byte targets)
- types.h auto-detects WOLFSSL_WIDE_BYTE (CHAR_BIT != 8 / TI C2000 toolchains), guarantees CHAR_BIT is defined, and adds the shared WC_OCTET() octet mask; wc_port.{h,c} widen the atomic init-state bitfield for 16-bit int; settings.h + sp_int.h allow SP math on a 16-bit-int CPU (WOLFSSL_SP_ALLOW_16BIT_CPU, 16-bit-char SP type detection); misc.c rotate bit-width via CHAR_BIT; coding.c base64 octet mask; sha256/sha512 octet-wise big-endian word I/O + CHAR_BIT*sizeof length carry; sha3.c octet-wise Keccak squeeze; random.c octet-portable Hash-DRBG length/counter serialization.
- ML-DSA encode/decode integer-promotion fixes (a byte/word16 field promotes to unsigned 16-bit int, so 2 - field was unsigned and a negative coefficient zero-extended into sword32; cast the field to sword32); encode octet masks. Adds WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM, which streams the signature's z vector one polynomial at a time instead of pinning the whole l-vector - cutting the ML-DSA-87 verify key by ~6 KB (with WOLFSSL_MLDSA_ASSIGN_KEY, ~10.7 KB total verify RAM).
- SHA/SHAKE large_digest KAT vectors stored as byte arrays (a "\x.." string is sign-extended by a signed-16-bit-char compiler); WOLFSSL_NO_MALLOC benchmark buffers; and a hardware-free cl2000 compile-only CI guard (scripts/ti-c2000/ + .github/workflows/ti-c2000-compile.yml).
- Computes the q^-1 product in mldsa_mont_red() through the 32x64->64 widening multiply (MLDSA_MUL_QINV_WIDE64, auto-enabled for WC_16BIT_CPU) instead of the 32x32->32 low multiply that cl2000 miscompiles; correct on any conforming compiler and ~4% faster than the shift-based form on the C28x.

Algorithms validated on hardware (TI F28P55x @ 150 MHz)
SHA-256; SHA-384; SHA-512; SHA-512/224; SHA-512/256; SHA3-224/256/384/512; SHAKE128; SHAKE256; HMAC/Hash wrappers; SHA-256 Hash-DRBG; ML-DSA-87 verify, key generation and signing; ECDSA P-256 sign and verify; ECDH P-256 key agreement. (wolfcrypt_test MEMORY/mutex/full-ML-DSA report config-expected results on this bare-metal, verify-only, no-WOLFSSL_MEMORY build.)

Benchmarks (TI F28P55x @ 150 MHz, generic C)
- SHAKE vs a reference C implementation (cycles for 1 KB): SHAKE128 ~824 k (ref 1,195,069); SHAKE256 ~1.01 M (ref 1,360,788) - roughly 26-31% fewer cycles.
- ML-DSA-87 verify RAM: ~10.7 KB total (struct ~8.7 KB + stack <2 KB, zero heap) with WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM + WOLFSSL_MLDSA_ASSIGN_KEY, down from ~22 KB.
- The ~305 ms/op verify figure reflects two optimizations measured on hardware: the 64-bit-widened Montgomery q^-1 multiply above (this PR; 317 -> 305 ms/op) and the companion example running the Keccak permutation and the ML-DSA NTTs from RAM (example PR; 354 -> 317 ms/op).

Notes
Every change is behind WOLFSSL_WIDE_BYTE / WC_16BIT_CPU / WC_SHA3_BYTEWISE / WOLFSSL_SP_ALLOW_16BIT_CPU / WOLFSSL_MLDSA_*, or is an idempotent octet mask (WC_OCTET), so 8-bit-byte builds are functionally unchanged (CHAR_BIT == 8 makes the CHAR_BIT-based expressions byte-for-byte identical to the originals). The bare-metal board example (BSP, linker, KATs, harness) is the companion PR wolfSSL/wolfssl-examples#576, under embedded/ti-c2000-f28p55x/ - not in this PR. There is no public C28x instruction-set simulator, so the CI is compile-only; on-target KATs run on a hardware-in-the-loop runner.

Test
- ./configure --enable-all and --enable-dilithium --enable-experimental; wolfcrypt_test (incl. ECC, ML-DSA) passes.
- make the wolfssl-examples embedded/ti-c2000-f28p55x (default verify+test, SIGN=1, ECC=1); all KATs + round-trips pass on the F28P55x.
- CGT_ROOT=... scripts/ti-c2000/compile.sh.