Commit 19cbf69

bkataru and claude committed
feat(trial): add BF16 to quant harness; fix igllama json_mode crash
- trial_quant.zig: add 2B-BF16 and 9B-BF16 to QUANT_MODELS; covers Q4/Q5/Q6/Q8/BF16 precision curve for both weight classes
- trial_quant.zig: change response_format from json_object to text; igllama grammar sampler (llama_sampler_init_grammar) crashes during generation for 2B+ vocab sizes — system prompt alone is sufficient for constrained JSON output at 2B+
- trial_quant.zig: remove --mlock spawn flag (optimization, not required)
- igllama v0.3.10: fix streaming handler use-after-free in json_mode; loadGrammar + defer free → direct JSON_GRAMMAR comptime const
- LICENSE: add MIT license file
- CHANGELOG.md: v0.2.7 entry
- CLAUDE.md: version 0.2.7, roadmap items 14-15
- showcase.smd: document igllama json_mode crash finding, update engineering requirements, expand trial task suite to T01-T17, update framework version to v0.2.7/v0.3.10

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 252eae9 commit 19cbf69

6 files changed, 68 additions and 754 deletions

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -2,6 +2,17 @@

 All notable changes to the **powerglide** project will be documented in this file.

+## [0.2.7] - 2026-03-04
+
+### Added
+- **BF16 trials in `trial_quant.zig`** — `2B-BF16` and `9B-BF16` added to `QUANT_MODELS`; harness now covers the full Q4/Q5/Q6/Q8/BF16 precision curve for both weight classes
+
+### Fixed
+- **igllama v0.3.10 — streaming json_mode use-after-free** — streaming handler called `loadGrammar(allocator, "json")` then `defer allocator.free(gs)` inside the if-block, freeing the grammar string while the sampler still held a pointer. Replaced with direct `JSON_GRAMMAR` comptime constant (matching the non-streaming handler); no allocation, no lifetime issue. igllama no longer crashes when `response_format: {"type":"json_object"}` is sent on a streaming endpoint.
+
+### Changed
+- **CLAUDE.md** — version `0.2.7`; roadmap item 14–15 updated
+
 ## [0.2.6] - 2026-03-04

 ### Added
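
The lifetime problem described in the 0.2.7 "Fixed" entry can be sketched in C. This is only an illustration of the general pattern, not igllama's Zig code: `sampler_t`, `sampler_init`, `make_json_sampler`, `sampler_grammar_length`, and the placeholder grammar text are all hypothetical names, and the `load_grammar` and `generate` calls in the comment are likewise made up. The one assumption taken from the changelog is that the sampler keeps a pointer to the grammar string instead of copying it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a grammar sampler that borrows its grammar text. */
typedef struct {
    const char *grammar; /* borrowed pointer: the sampler neither owns nor copies it */
} sampler_t;

/* Placeholder grammar with static storage duration (not the real JSON grammar). */
static const char JSON_GRAMMAR_TEXT[] = "root ::= object";

static sampler_t sampler_init(const char *grammar) {
    assert(grammar != NULL);
    sampler_t s = { grammar };
    return s;
}

/* Buggy shape, shown only as a comment because running it is undefined behaviour:
 *
 *     char *gs = load_grammar("json");   // heap allocation
 *     sampler_t s = sampler_init(gs);
 *     free(gs);                          // plays the role of the deferred free
 *     generate(&s);                      // s.grammar now dangles
 */

/* Fixed shape: point the sampler at a constant that lives for the whole program,
 * so there is no allocation to free and no lifetime to track. */
static sampler_t make_json_sampler(void) {
    return sampler_init(JSON_GRAMMAR_TEXT);
}

/* Stand-in for generation: reads the grammar through the borrowed pointer. */
static size_t sampler_grammar_length(const sampler_t *s) {
    return strlen(s->grammar);
}
```

A caller would obtain the sampler with `make_json_sampler()` and can use it for as long as the program runs, which mirrors the changelog's "no allocation, no lifetime issue" description of the constant-based fix.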

CLAUDE.md

Lines changed: 6 additions & 7 deletions
@@ -55,11 +55,11 @@ Start with: `igllama api <model> --port <N> --no-think --max-tokens 512 --thread
 - `OpenAIClient.json_mode = true` → forces `response_format: {"type":"json_object"}` for constrained output
 - Doctor scans `:8090–8099` automatically
 - Trial harness: `zig build trial` — runs T01–T17 × all 4 endpoints
-- Quant harness: `zig build trial-quant` — runs T01–T13 × Q4/Q5/Q6/Q8 on 2B and 9B (sequential on :8090)
+- Quant harness: `zig build trial-quant` — runs T01–T13 × Q4/Q5/Q6/Q8/BF16 on 2B and 9B (sequential on :8090)

 ## Current Version

-`0.2.6` — 195/195 tests passing, 0 leaks.
+`0.2.7` — 195/195 tests passing, 0 leaks.

 ## Roadmap

@@ -73,9 +73,8 @@ Start with: `igllama api <model> --port <N> --no-think --max-tokens 512 --thread
 8. ✅ igllama integration — local Qwen3.5 agents, json_mode, port scanning
 9. ✅ Session summary output on `powerglide run` completion
 10. ✅ Showcase page — dogfooding case studies: full Qwen3.5 lineup (0.8B/2B/4B/9B)
-11. ✅ Zig trial harness (`examples/trial.zig`) — T01–T13 × 4 weight classes at Q4/Q8
-12. ✅ igllama json_mode patch — GBNF grammar constraint via `response_format` (v0.3.9)
+11. ✅ Zig trial harness (`examples/trial.zig`) — T01–T17 × 4 weight classes at Q4/Q8
+12. ✅ igllama json_mode fix — streaming handler use-after-free (v0.3.10)
 13. ✅ Security: grep/glob tools use direct argv (no shell interpolation of user input)
-14. ✅ BF16 analysis — confirmed capacity-limited, not precision-limited; removed BF16 harness
-15. ✅ Expanded trial tasks T14–T17 (code gen, JSON round-trip, error recovery, multi-source)
-16. ✅ Quantization sensitivity harness (`examples/trial_quant.zig`) — Q4/Q5/Q6/Q8 on 2B+9B
+14. ✅ Quantization sensitivity harness (`examples/trial_quant.zig`) — Q4/Q5/Q6/Q8/BF16 on 2B+9B
+15. ✅ BF16 precision trials added to quant harness — full precision curve documented in showcase
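
The size of the quant harness matrix follows from the description above: tasks T01 through T13, five precisions (Q4/Q5/Q6/Q8/BF16), two weight classes (2B and 9B). The C sketch below enumerates that matrix under two assumptions that the diff does not confirm: model labels follow the `<size>-<quant>` shape of the `2B-BF16` and `9B-BF16` entries named in the changelog (labels such as `2B-Q4` are inferred, not quoted), and every task runs exactly once per model. The helper names `model_label` and `total_runs` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NUM_SIZES 2
#define NUM_QUANTS 5
#define NUM_TASKS 13 /* T01 through T13 */

static const char *const SIZES[NUM_SIZES] = { "2B", "9B" };
static const char *const QUANTS[NUM_QUANTS] = { "Q4", "Q5", "Q6", "Q8", "BF16" };

/* Writes the label for one model (for example "9B-BF16") into out.
 * Returns 0 on success, -1 on a bad index or a buffer that is too small. */
static int model_label(int size_idx, int quant_idx, char *out, size_t out_len) {
    assert(out != NULL);
    if (size_idx < 0 || size_idx >= NUM_SIZES) return -1;
    if (quant_idx < 0 || quant_idx >= NUM_QUANTS) return -1;
    int n = snprintf(out, out_len, "%s-%s", SIZES[size_idx], QUANTS[quant_idx]);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Total task runs, assuming each task runs once per model. */
static int total_runs(void) {
    return NUM_TASKS * NUM_SIZES * NUM_QUANTS;
}
```

Under these assumptions the harness covers 2 × 5 = 10 models and 13 × 10 = 130 task runs, of which the two BF16 models added in this commit account for 26.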

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 bkataru
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
