Releases: bkataru/powerglide
v0.3.2: 9B 17/17 Measured, Security Fixes, igllama v0.3.11
Highlights
9B Achieves 17/17 at All Quantizations
The full 9B T01-T17 trial is complete. Every quantization variant (Q4, Q5, Q6, Q8, BF16) passes all 17 agentic tasks: code generation, JSON round-trip, error recovery, multi-source synthesis, everything. 9B is the gold standard for local agentic tool use.
| 9B Variant | Score | Turns | Time |
|---|---|---|---|
| 9B-Q4 | 17/17 | 38 | 9127s |
| 9B-Q5 | 17/17 | 49 | 14239s |
| 9B-Q6 | 17/17 | 39 | 8642s |
| 9B-Q8 | 17/17 | 39 | 17324s |
| 9B-BF16 | 17/17 | 43 | 15722s |
Security and Correctness Fixes
- OAIResponse use-after-free: `send()` now dupes all string fields before returning (previously returned dangling pointers into the freed JSON parse tree)
- Config save malformed JSON: `save()` rewritten using `std.fmt` instead of broken manual string concatenation
- getToolCalls page_allocator leak: replaced leaking ArrayList with proper allocator-based allocation
- Auth header memory leak: `Bearer {key}` string now freed after the HTTP request
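The `save()` fix reflects a general hazard: JSON built by pasting strings together breaks as soon as a value contains a quote or a backslash. The sketch below is a Python analogue of that failure mode, not powerglide's Zig code; the `model` and `api_key` field names are hypothetical.

```python
import json

def save_config_manual(cfg: dict) -> str:
    # Naive concatenation: values are pasted in without any escaping.
    parts = [f'"{k}": "{v}"' for k, v in cfg.items()]
    return "{" + ", ".join(parts) + "}"

def save_config_serialized(cfg: dict) -> str:
    # Let the serializer handle quoting and escaping.
    return json.dumps(cfg, indent=2)

def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Hypothetical config with a quote and a backslash in its values.
cfg = {"model": 'qwen "9B"', "api_key": "abc\\def"}
```

With this input, the concatenated output fails to parse, while the serialized output round-trips back to the same dictionary.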
igllama v0.3.11
- Strip residual `</think>` tokens when `--no-think` is active (both streaming and non-streaming paths)
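The two paths differ in one respect: when streaming, the tag can arrive split across chunks. The following Python sketch shows one way to handle both cases. It is an illustration of the idea, not igllama's implementation, and the class and function names are invented here.

```python
THINK_CLOSE = "</think>"

def strip_think(text: str) -> str:
    """Non-streaming path: drop every residual closing tag."""
    return text.replace(THINK_CLOSE, "")

class StreamingThinkStripper:
    """Streaming path: hold back any tail that could still turn into the tag."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> str:
        buf = (self._pending + chunk).replace(THINK_CLOSE, "")
        # Longest suffix of buf that is a proper prefix of the tag.
        keep = 0
        for n in range(min(len(buf), len(THINK_CLOSE) - 1), 0, -1):
            if THINK_CLOSE.startswith(buf[-n:]):
                keep = n
                break
        cut = len(buf) - keep
        self._pending = buf[cut:]
        return buf[:cut]

    def flush(self) -> str:
        """Call at end of stream to emit whatever was held back."""
        out, self._pending = self._pending, ""
        return out
```

Feeding `"Hello </thi"` and then `"nk> world"` emits `"Hello "` followed by `" world"`; a held-back fragment that never completes the tag is released by `flush()`.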
Verified
- 195/195 tests pass, 0 leaks
- GH Pages rebuilt and deployed
v0.3.1: Context Sensitivity Harness + Trial Filter + 4B T01-T17 Measured
What's New
Context Length Sensitivity Harness
New `examples/ctx_sensitivity.zig` measures 2B-Q6 accuracy across ctx-size 512/1024/2048/4096 × T01-T17. Run with `zig build ctx`.
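The sweep is a Cartesian product of context sizes and tasks, 4 × 17 = 68 runs. A minimal Python sketch of that shape follows; `run_task` is a hypothetical callback standing in for whatever the Zig harness does per run.

```python
from itertools import product

CTX_SIZES = [512, 1024, 2048, 4096]
TASKS = [f"T{i:02d}" for i in range(1, 18)]  # T01..T17

def sweep(run_task):
    """Run every (ctx_size, task) pair and count passes per context size.

    run_task(ctx_size, task) -> bool is a hypothetical callback.
    """
    passed = {ctx: 0 for ctx in CTX_SIZES}
    for ctx, task in product(CTX_SIZES, TASKS):
        if run_task(ctx, task):
            passed[ctx] += 1
    return passed
```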
Trial Quantization Filter
trial-quant now accepts optional model names as CLI arguments:
./zig-out/bin/trial-quant 9B-Q4 9B-Q5 # only these two
./zig-out/bin/trial-quant # all 16 models
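The filter semantics above (no arguments means everything, otherwise only the named models) can be sketched in Python as follows. The model list is an illustrative subset, and rejecting unknown names is an assumption; the release notes do not say how `trial-quant` treats them.

```python
# Illustrative subset; the real harness covers 16 models.
ALL_MODELS = ["2B-Q4", "2B-BF16", "4B-Q4", "9B-Q4", "9B-Q5", "9B-Q6"]

def select_models(args, all_models=ALL_MODELS):
    """Return the models to run, preserving the harness's own order."""
    if not args:
        return list(all_models)  # no filter: run everything
    unknown = [a for a in args if a not in all_models]
    if unknown:
        # Assumed behaviour: fail loudly rather than silently skip.
        raise ValueError(f"unknown model(s): {', '.join(unknown)}")
    wanted = set(args)
    return [m for m in all_models if m in wanted]
```

In a script, `select_models(sys.argv[1:])` would mirror the two invocations shown above.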
4B-Q4 T01-T17 Fully Measured
- Result: 15/17, 63 turns, ~9050s
- T04 (multi-step write) and T16 (Zig compile recovery) fail via turn exhaustion at 1.3 tok/s
- These are characteristic failure modes for 4B at Q4, not correctness failures
Verified
- 195/195 tests pass, 0 leaks
- 9B T01-T17 re-run in progress (background); showcase will be updated when complete
v0.3.0: 4B Quant Curve, Throughput Benchmark, Extended Task Suite
What's New
New: Throughput Benchmark (examples/bench.zig)
zig build bench
Measures tokens/second for each Qwen3.5 model across Q4/Q8/BF16 precision levels. Reports tok/s, file size, and RAM (RSS). Requires igllama v0.3.10+ for accurate `usage.completion_tokens`.
Measured results (CPU-only, 4 threads):
| Model | tok/s | RAM |
|---|---|---|
| 0.8B-Q8 | 3.4 | 0.8 GB |
| 0.8B-BF16 | 2.9 | 1.5 GB |
| 2B-Q4 | 2.9 | 1.3 GB |
| 2B-Q8 | 2.6 | 1.9 GB |
| 2B-BF16 | 1.9 | 3.6 GB |
| 4B-Q4 | 1.3 | 2.7 GB |
| 4B-Q8 | 0.1 | ~4 GB (swap!) |
Key finding: RAM is the hard limit. Models that exceed physical RAM fall off a cliff (4B-Q8: 0.1 tok/s from swap thrashing). 4B-Q4 is the practical ceiling on systems with ≤6 GB free RAM.
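To make the RAM ceiling concrete, the sketch below picks the largest model from the table above that fits a given amount of free memory. It is a back-of-the-envelope helper, not part of powerglide; the 0.5 GB headroom is an arbitrary assumption, and the "~4 GB" entry is approximated as 4.0.

```python
# (model, tok/s, resident RAM in GB), copied from the table above.
MEASURED = [
    ("0.8B-Q8", 3.4, 0.8),
    ("0.8B-BF16", 2.9, 1.5),
    ("2B-Q4", 2.9, 1.3),
    ("2B-Q8", 2.6, 1.9),
    ("2B-BF16", 1.9, 3.6),
    ("4B-Q4", 1.3, 2.7),
    ("4B-Q8", 0.1, 4.0),  # table says "~4 GB"; 4.0 is an approximation
]

def models_that_fit(free_gb, headroom_gb=0.5):
    """Models whose measured RSS plus headroom stays within free RAM."""
    return [name for name, _, ram in MEASURED if ram + headroom_gb <= free_gb]

def largest_that_fits(free_gb, headroom_gb=0.5):
    """Largest-footprint model that still fits, or None if nothing does."""
    fitting = [(ram, name) for name, _, ram in MEASURED
               if ram + headroom_gb <= free_gb]
    return max(fitting)[1] if fitting else None
```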
Extended: 4B Full Quant Curve
Added Qwen3.5-4B-Q4_K_M, Q5_K_M, and Q6_K to `trial_quant.zig`. All three pass 13/17, so 4B is saturated at Q4. 4B-Q4 (2.6 GB) is the recommended production config: full accuracy, minimum file size.
Extended: T01-T17 in trial_quant.zig
The quantization sensitivity harness now runs all 17 tasks (was T01-T13), adding code generation with `zig fmt` validation (T14), JSON round-trip (T15), error recovery (T16), and multi-source synthesis (T17) across all 16 quantization variants.
`trial_quant.zig` now covers: 16 models × 17 tasks = 272 test cases per full run.
igllama v0.3.10 Upstream Fix
Patched usage.completion_tokens in igllama's non-streaming /v1/chat/completions handler β was hardcoded 0, now returns real counts. Fix upstreamed as igllama PR #82, released as igllama v0.3.10.
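A throughput calculation that prefers the server-reported count and falls back to an estimate might look like the Python sketch below. The response shape is the standard OpenAI-style chat completion; the four-characters-per-token fallback is an assumption made for illustration, not necessarily what `bench` does.

```python
def completion_tokens(response: dict) -> int:
    """Use the reported count when present, else a rough estimate."""
    reported = response.get("usage", {}).get("completion_tokens", 0)
    if reported > 0:
        return reported
    # Assumed fallback: roughly 4 characters per token.
    text = response["choices"][0]["message"]["content"]
    return max(1, len(text) // 4)

def tokens_per_second(response: dict, elapsed_s: float) -> float:
    return completion_tokens(response) / elapsed_s
```

Against a pre-v0.3.10 server, `usage.completion_tokens` is always 0, so every measurement would take the estimate branch.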
Upgrade Notes
- Requires igllama v0.3.10+ for accurate `bench` token counts (fallback estimate still works with older builds)
- Download 4B quant GGUFs:
igllama pull unsloth/Qwen3.5-4B-GGUF -f Qwen3.5-4B-Q4_K_M.gguf
Stats
- 195/195 tests pass
- 3 harness executables: `trial`, `trial-quant`, `bench`
- 16 models × 17 tasks in quantization harness
v0.2.7: BF16 precision trials, igllama grammar fix, LICENSE
What's new in v0.2.7
Added
- BF16 in `trial_quant.zig`: `2B-BF16` and `9B-BF16` added to `QUANT_MODELS`; harness now covers the full Q4/Q5/Q6/Q8/BF16 precision curve
- LICENSE: MIT license file added
Fixed
- igllama v0.3.10, streaming json_mode use-after-free: the streaming handler freed the grammar string while the sampler held a pointer to it; replaced with a direct `JSON_GRAMMAR` comptime const (matches the non-streaming handler)
- trial_quant.zig: changed `response_format` from `json_object` to `text`; the grammar sampler in the bundled llama.cpp crashes during generation for 2B+ model vocabularies; the system prompt JSON constraint is sufficient for 4B+
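With `response_format` set to `text`, nothing stops a model from wrapping its JSON in stray prose, so a consumer may want a tolerant parser. The Python sketch below shows one such approach using the standard library's `raw_decode`; it is an assumption about how a caller could cope, not a description of how `trial_quant.zig` parses replies.

```python
import json

def extract_json_object(text: str):
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and
            # ignores whatever trails it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None
```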
Changed
- showcase.smd: documents igllama json_mode crash finding, expanded trial task suite to T01-T17, updated framework version
Build
zig build trial-quant # Q4/Q5/Q6/Q8/BF16 sensitivity trial for 2B + 9B
zig build trial # T01-T17 across all 4 weight classes
zig build test # 195/195 tests
v0.2.2: Session summary, igllama port scan, json_mode, Showcase
What's new in v0.2.2
Features
- Session summary output: `powerglide run` now emits a structured completion block with steps, elapsed time, agent/model, and the `<POWERGLIDE_DONE>` or `<POWERGLIDE_ERROR>` terminal signal
- igllama port scan: `powerglide doctor` scans `:8090-8099` and reports all running igllama instances simultaneously
- `json_mode` on `OpenAIClient`: sets `response_format: {"type":"json_object"}` to force constrained JSON output from igllama and other local endpoints
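The port scan in the list above amounts to trying a TCP connection on each port in the range. A minimal Python sketch follows. It only detects that something is listening; how `powerglide doctor` confirms that the listener is actually igllama is not described here, so that step is left out.

```python
import socket

def scan_ports(host="127.0.0.1", ports=range(8090, 8100), timeout=0.2):
    """Return the ports in `ports` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```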
New Showcase page
Live at bkataru.github.io/powerglide/showcase: four case studies documenting powerglide dogfooding with Qwen3.5 0.8B and 4B models via igllama, including the honest tool calling triage and performance table.
Bug fix
- `Loop step count increments` test now uses an isolated `/tmp` session file; previously it picked up the real `.powerglide/session.json` from dogfooding runs
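The underlying pattern is general: a test that reads persistent state should point at a fresh temporary location so earlier runs cannot leak in. A Python sketch of the idea follows; the `step` field and the `increment_step` helper are hypothetical stand-ins for powerglide's session handling.

```python
import json
import os
import tempfile

def increment_step(session_path: str) -> int:
    """Load the session file (or start at 0), bump the step count, save it."""
    state = {"step": 0}
    if os.path.exists(session_path):
        with open(session_path) as f:
            state = json.load(f)
    state["step"] = state.get("step", 0) + 1
    with open(session_path, "w") as f:
        json.dump(state, f)
    return state["step"]

def test_step_count_increments():
    # A fresh temporary directory guarantees no pre-existing session file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "session.json")
        assert increment_step(path) == 1
        assert increment_step(path) == 2
```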
Stats
- 195/195 tests passing, 0 memory leaks
- Fully local stack: powerglide + igllama + Qwen3.5-4B, no API keys required
v0.2.1: 195 tests, bug fixes
v0.2.1
Test Coverage Expansion
- 195/195 tests passing (up from 170)
- New test modules: SSE parser, HTTP response, persistence manager
- Root module now covers all submodules via `refAllDecls`
Bug Fixes (uncovered by expanded coverage)
- `stream.zig`: unmanaged ArrayList API fixes (Zig 0.15.2 compliance)
- `terminal/pool.zig`: `sessions.size` → `sessions.count()`
- `terminal/session.zig`: array literal syntax fix, orphaned test code removed
195/195 tests, 0 leaked.
v0.2.0: MCP Integration
What's New in v0.2.0
MCP Integration
- MCP Server: `powerglide mcp` starts powerglide as a JSON-RPC 2.0 MCP server over stdin/stdout, exposing all registered tools to any MCP-compatible client
- MCP Client: connect to external MCP servers; their tools become first-class powerglide tools prefixed as `mcp_{server}_{tool}`
- Tool Bridge: transparent `McpTool → Tool` conversion for seamless integration
- Config support: `mcp_servers` array in `~/.config/powerglide/config.json`
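Two pieces of the list above are easy to illustrate: the `mcp_{server}_{tool}` naming scheme and the JSON-RPC 2.0 envelope that MCP messages travel in. The Python sketch below shows both. The server name `files` and tool name `read` are made up, and the helper names are not powerglide's.

```python
import json

def prefixed_name(server: str, tool: str) -> str:
    """Build the namespaced tool name following the mcp_{server}_{tool} scheme."""
    return f"mcp_{server}_{tool}"

def jsonrpc_request(req_id: int, method: str, params=None) -> str:
    """Serialize a JSON-RPC 2.0 request as a single line."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# `tools/list` is the MCP method a client uses to discover a server's tools.
request_line = jsonrpc_request(1, "tools/list")
```

One consequence of an underscore-joined scheme is that a server or tool name containing `_` cannot be split back unambiguously from the prefixed name alone, so a bridge would need to keep its own mapping.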
Fixes
- Stdin API fix for Zig 0.15.2 (`posix.read` byte-by-byte pattern)
- Favicon 404 on GitHub Pages resolved
- Homepage title deduplication
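The byte-by-byte read pattern mentioned in the Stdin fix can be shown with a Python analogue built on `os.read`. Reading one byte at a time is slow, but it never consumes bytes belonging to the next message, which matters for a newline-delimited protocol over stdin. This is a sketch of the pattern only, not the Zig code.

```python
import os

def read_line(fd: int) -> bytes:
    """Read up to the next newline, one byte per read call.

    Returns the line without its newline; at EOF, returns what was read.
    """
    buf = bytearray()
    while True:
        b = os.read(fd, 1)
        if not b or b == b"\n":
            break
        buf += b
    return bytes(buf)
```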
138/138 tests passing.