Commit 70b08f9

bkataru and claude committed
feat: v0.3.0 — 4B quant curve, bench harness, T01–T17 in trial_quant
New harnesses:
- examples/bench.zig (zig build bench): tokens/sec benchmark across Q4/Q8/BF16 per weight class; uses igllama v0.3.10 usage.completion_tokens for accurate counts
- trial_quant.zig extended: T01–T17 (was T01–T13); 16 models (was 12) with full 4B Q4/Q5/Q6/Q8/BF16 curve alongside 2B and 9B

Key findings documented in showcase:
- 4B saturated at Q4 — all quants pass 13/17; 4B-Q4 (2.6 GB) is optimal
- Speed cliff at 4B-Q8: 0.1 tok/s from swap thrashing (≤6 GB free RAM)
- 0.8B-Q8: 3.4 tok/s; 2B-Q4: 2.9 tok/s; 4B-Q4: 1.3 tok/s

igllama fix upstreamed: usage.completion_tokens no longer hardcoded 0 (PR #82, v0.3.10)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent d971abd commit 70b08f9

9 files changed, +506 -21 lines changed


CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -2,6 +2,26 @@
 
 All notable changes to the **powerglide** project will be documented in this file.
 
+## [0.3.0] - 2026-03-05
+
+### Added
+- **4B quant curve completed** — downloaded Qwen3.5-4B-Q4_K_M/Q5_K_M/Q6_K GGUFs; full Q4→BF16 precision curve measured: 4B saturates at Q4 (13/17), Q4 is optimal (2.6 GB vs 7.9 GB for BF16)
+- **T01–T17 in `trial_quant.zig`** — harness extended from T01–T13 to T01–T17, adding code generation, JSON round-trip, error recovery, and multi-source synthesis tasks across all quantization variants
+- **`examples/bench.zig`** — throughput benchmark: measures tokens/second via igllama `usage.completion_tokens` across Q4/Q8/BF16 for each weight class; reports tok/s, file size, and RAM (RSS); `zig build bench`
+- **`zig build bench` step** — bench harness added to build.zig
+- **igllama v0.3.10 patch** — `usage.completion_tokens` in non-streaming responses now returns real counts (was hardcoded 0); fix upstreamed as igllama PR #82, released as v0.3.10
+
+### Changed
+- **`trial_quant.zig` QUANT_MODELS** — 12 → 16 models: added 4B-Q4/Q5/Q6/Q8 to complete the 4B quant curve alongside 4B-BF16
+- **`build.zig` step description** — updated to reflect T01–T17 and all four weight classes
+- **Showcase** — quantization sensitivity table expanded to include full 4B curve; speed benchmark section added with measured tok/s and RAM data; key finding documented (RAM cliff at 4B-Q8 on ≤6 GB systems)
+- **CLAUDE.md** — version 0.2.9 → 0.3.0; roadmap items 20–23 added; bench harness documented
+- **`src/main.zig` VERSION** — `"0.2.9"` → `"0.3.0"`; test assertion updated
+- **`build.zig.zon`** — version `"0.2.9"` → `"0.3.0"`
+
+### Fixed
+- **bench.zig token counting** — initial implementation used content-length/4 estimate (igllama returned `completion_tokens:0`); updated to prefer API counts with fallback; igllama v0.3.10 upstream fix makes API counts accurate
+
 ## [0.2.9] - 2026-03-05
 
 ### Added
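
The "Fixed" entry above describes a token-counting rule: prefer the count reported by the API and fall back to a content-length/4 estimate when the server reports 0. The source of `examples/bench.zig` is not part of this excerpt, so the following is only a minimal sketch of that rule under stated assumptions. The function names `completionTokens` and `tokensPerSecond`, and the sample numbers in `main`, are hypothetical and are not taken from the harness.

```zig
const std = @import("std");

/// Hypothetical helper: token count for one completion.
/// Prefers the server-reported `usage.completion_tokens`; if the server
/// reports 0 (as igllama did before v0.3.10), falls back to a rough
/// estimate of one token per 4 bytes of response content.
fn completionTokens(api_count: u64, content_len: usize) u64 {
    if (api_count > 0) return api_count;
    return content_len / 4;
}

/// Hypothetical helper: throughput in tokens per second.
/// Returns 0 when no time has elapsed, to avoid dividing by zero.
fn tokensPerSecond(tokens: u64, elapsed_ns: u64) f64 {
    if (elapsed_ns == 0) return 0.0;
    const ns_per_s: f64 = 1e9;
    const t: f64 = @floatFromInt(tokens);
    const ns: f64 = @floatFromInt(elapsed_ns);
    return t * ns_per_s / ns;
}

pub fn main() void {
    // Illustrative values only: the server reported 0 tokens for a
    // 2048-byte response that took 4 seconds to generate.
    const tokens = completionTokens(0, 2048);
    const rate = tokensPerSecond(tokens, 4 * std.time.ns_per_s);
    std.debug.print("{d} tokens, {d:.1} tok/s\n", .{ tokens, rate });
}
```

With the upstream fix in place the fallback branch should rarely run, but keeping it means the benchmark still produces a usable estimate against older igllama builds.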

CLAUDE.md

Lines changed: 7 additions & 2 deletions
@@ -55,11 +55,12 @@ Start with: `igllama api <model> --port <N> --no-think --max-tokens 512 --thread
 - `OpenAIClient.json_mode = true` → forces `response_format: {"type":"json_object"}` for constrained output
 - Doctor scans `:8090–8099` automatically
 - Trial harness: `zig build trial` — runs T01–T17 × all 4 endpoints
-- Quant harness: `zig build trial-quant` — runs T01–T13 × 12 models: 0.8B-BF16 | 2B (Q4/Q5/Q6/Q8/BF16) | 4B-BF16 | 9B (Q4/Q5/Q6/Q8/BF16), sequential on :8090
+- Quant harness: `zig build trial-quant` — runs T01–T17 × 16 models: 0.8B-BF16 | 2B (Q4/Q5/Q6/Q8/BF16) | 4B (Q4/Q5/Q6/Q8/BF16) | 9B (Q4/Q5/Q6/Q8/BF16), sequential on :8090
+- Bench harness: `zig build bench` — tokens/sec throughput benchmark, accurate via igllama v0.3.10 `usage.completion_tokens`
 
 ## Current Version
 
-`0.2.9` — 195/195 tests passing, 0 leaks.
+`0.3.0` — 195/195 tests passing, 0 leaks.
 
 ## Roadmap
 
@@ -82,3 +83,7 @@ Start with: `igllama api <model> --port <N> --no-think --max-tokens 512 --thread
 17. ✅ /security-review pass — MCP input validation hardened, OOM guard on readLine, JSON injection in listAsJson fixed
 18. ✅ 0.8B-BF16 added to quant harness — all four weight classes now have BF16 coverage; 4B-BF16 confirmed 13/13
 19. ✅ MCP server hardened — type assertion panic fixed, OOM guard on stdin buffer, error logging filtered
+20. ✅ 4B quant curve completed — Q4/Q5/Q6 GGUFs downloaded, full Q4→BF16 curve measured; 4B saturated at Q4
+21. ✅ T01–T17 extended to trial_quant.zig — all 17 agentic tasks now in quantization sensitivity harness
+22. ✅ Throughput benchmark (`examples/bench.zig`) — tokens/sec × RAM measurement across Q4/Q8/BF16 per weight class; igllama v0.3.10 usage.completion_tokens fix integrated
+23. ✅ igllama v0.3.10 — populate usage.completion_tokens in non-streaming responses (patched upstream, PR #82)
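
The quant harness line above lists 16 models: one 0.8B variant plus five quantization levels for each of 2B, 4B, and 9B. As a quick check of that arithmetic, here is a small sketch that encodes the matrix and counts it. The type and constant names (`Quant`, `WeightClass`, `weight_classes`) are hypothetical; the real `QUANT_MODELS` table in `trial_quant.zig` is not shown in this diff and may be laid out differently.

```zig
const std = @import("std");

// Hypothetical encoding of the model matrix described in CLAUDE.md.
const Quant = enum { Q4, Q5, Q6, Q8, BF16 };

const WeightClass = struct {
    name: []const u8,
    quants: []const Quant,
};

const all_quants = [_]Quant{ .Q4, .Q5, .Q6, .Q8, .BF16 };
const bf16_only = [_]Quant{.BF16};

const weight_classes = [_]WeightClass{
    .{ .name = "0.8B", .quants = &bf16_only },
    .{ .name = "2B", .quants = &all_quants },
    .{ .name = "4B", .quants = &all_quants },
    .{ .name = "9B", .quants = &all_quants },
};

/// Number of tasks per model, T01 through T17.
const task_count: usize = 17;

/// Total number of (weight class, quantization) pairs in the matrix.
fn modelCount() usize {
    var n: usize = 0;
    for (weight_classes) |wc| n += wc.quants.len;
    return n;
}

pub fn main() void {
    const models = modelCount();
    std.debug.print("{d} models, {d} task runs\n", .{ models, models * task_count });
}
```

Counting this way gives 1 + 5 + 5 + 5 = 16 models, and 16 × 17 = 272 task runs for one full sequential pass.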

build.zig

Lines changed: 20 additions & 1 deletion
@@ -95,8 +95,27 @@ pub fn build(b: *std.Build) void {
         }),
     });
     b.installArtifact(trial_quant_exe);
-    const trial_quant_step = b.step("trial-quant", "Run the igllama quantization sensitivity harness (Q4/Q5/Q6/Q8 on 2B and 9B)");
+    const trial_quant_step = b.step("trial-quant", "Run the igllama quantization sensitivity harness (T01-T17 x Q4/Q5/Q6/Q8/BF16 across all 4 weight classes)");
     const trial_quant_cmd = b.addRunArtifact(trial_quant_exe);
     trial_quant_cmd.step.dependOn(b.getInstallStep());
     trial_quant_step.dependOn(&trial_quant_cmd.step);
+
+    // Throughput benchmark (examples/bench.zig)
+    const bench_exe = b.addExecutable(.{
+        .name = "bench",
+        .root_module = b.createModule(.{
+            .root_source_file = b.path("examples/bench.zig"),
+            .target = target,
+            .optimize = optimize,
+            .link_libc = true,
+            .imports = &.{
+                .{ .name = "powerglide", .module = mod },
+            },
+        }),
+    });
+    b.installArtifact(bench_exe);
+    const bench_step = b.step("bench", "Run the igllama throughput benchmark (tokens/sec across Q4/Q8/BF16 x all weight classes)");
+    const bench_cmd = b.addRunArtifact(bench_exe);
+    bench_cmd.step.dependOn(b.getInstallStep());
+    bench_step.dependOn(&bench_cmd.step);
 }

build.zig.zon

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
     .name = .powerglide,
     // This is a [Semantic Version](https://semver.org/).
     // In a future version of Zig it will be used for package deduplication.
-    .version = "0.2.9",
+    .version = "0.3.0",
     // Together with name, this represents a globally unique package
     // identifier. This field is generated by the Zig toolchain when the
     // package is first created, and then *never changes*. This allows
