Commit 70b08f9
feat: v0.3.0 — 4B quant curve, bench harness, T01–T17 in trial_quant
New harnesses:
- examples/bench.zig (zig build bench): tokens/sec benchmark across Q4/Q8/BF16
per weight class; uses igllama v0.3.10 usage.completion_tokens for accurate counts
- trial_quant.zig extended: T01–T17 (was T01–T13); 16 models (was 12) with full
4B Q4/Q5/Q6/Q8/BF16 curve alongside 2B and 9B
Key findings documented in showcase:
- 4B saturated at Q4 — all quants pass 13/17; 4B-Q4 (2.6 GB) is optimal
- Speed cliff at 4B-Q8: 0.1 tok/s from swap thrashing (≤6 GB free RAM)
- 0.8B-Q8: 3.4 tok/s; 2B-Q4: 2.9 tok/s; 4B-Q4: 1.3 tok/s
igllama fix upstreamed: usage.completion_tokens no longer hardcoded 0 (PR #82, v0.3.10)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent d971abd commit 70b08f9
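
The tokens/sec figures in the commit message depend on an accurate `usage.completion_tokens` count. A minimal sketch of that calculation follows, assuming an OpenAI-style JSON response body. The function name `tokens_per_second`, the sample body, and the 20-second timing are hypothetical and are not taken from examples/bench.zig.

```python
import json


def tokens_per_second(response_body: str, elapsed_seconds: float) -> float:
    """Return generation throughput from an OpenAI-style completion response.

    Assumption: the server reports generated tokens in usage.completion_tokens.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    usage = json.loads(response_body).get("usage", {})
    # A missing field is treated as 0, which yields a throughput of 0.
    completion_tokens = usage.get("completion_tokens", 0)
    return completion_tokens / elapsed_seconds


# Hypothetical response body; only usage.completion_tokens is read.
sample = '{"usage": {"prompt_tokens": 12, "completion_tokens": 26}}'
print(f"{tokens_per_second(sample, 20.0):.1f} tok/s")  # prints "1.3 tok/s"
```

With a server that always reports `completion_tokens` as 0, as the commit message says igllama did before v0.3.10, this calculation returns 0 tok/s regardless of elapsed time, which is why the upstream fix matters for the benchmark.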
File tree
9 files changed, +506 -21 lines changed
- docs/showcase
- examples
- src
- website/content