Releases: bkataru/powerglide

v0.3.2 — 9B 17/17 Measured, Security Fixes, igllama v0.3.11

06 Mar 10:13

Highlights

9B Achieves 17/17 at All Quantizations

The full 9B T01-T17 trial is complete. Every quantization variant (Q4, Q5, Q6, Q8, BF16) passes all 17 agentic tasks — code generation, JSON round-trip, error recovery, multi-source synthesis, everything. 9B is the gold standard for local agentic tool use.

Variant    Score  Turns  Time (s)
9B-Q4      17/17  38     9127
9B-Q5      17/17  49     14239
9B-Q6      17/17  39     8642
9B-Q8      17/17  39     17324
9B-BF16    17/17  43     15722

Security and Correctness Fixes

  • OAIResponse use-after-free β€” send() now dupes all string fields before returning (previously returned dangling pointers into freed JSON parse tree)
  • Config save malformed JSON β€” save() rewrote using std.fmt instead of broken manual string concatenation
  • getToolCalls page_allocator leak β€” replaced leaking ArrayList with proper allocator-based allocation
  • Auth header memory leak β€” Bearer {key} string now freed after HTTP request

igllama v0.3.11

  • Strip residual </think> tokens when --no-think is active (both streaming and non-streaming paths)

Verified

  • 195/195 tests pass, 0 leaks
  • GH Pages rebuilt and deployed

v0.3.1 — Context Sensitivity Harness + Trial Filter + 4B T01-T17 Measured

05 Mar 12:30

What's New

Context Length Sensitivity Harness

New examples/ctx_sensitivity.zig measures 2B-Q6 accuracy across ctx-size 512/1024/2048/4096 × T01-T17. Run with zig build ctx.

Trial Quantization Filter

trial-quant now accepts optional model names as CLI arguments:

./zig-out/bin/trial-quant 9B-Q4 9B-Q5     # only these two
./zig-out/bin/trial-quant                  # all 16 models

4B-Q4 T01-T17 Fully Measured

  • Result: 15/17, 63 turns, ~9050s
  • T04 (multi-step write) and T16 (Zig compile recovery) fail via turn exhaustion at 1.3 tok/s
  • These are characteristic failure modes for 4B at Q4 β€” not correctness failures

Verified

  • 195/195 tests pass, 0 leaks
  • 9B T01-T17 re-run in progress (background); showcase will be updated when complete

v0.3.0 — 4B Quant Curve, Throughput Benchmark, Extended Task Suite

05 Mar 08:41

What's New

New: Throughput Benchmark (examples/bench.zig)

zig build bench

Measures tokens/second for each Qwen3.5 model across Q4/Q8/BF16 precision levels. Reports tok/s, file size, and RAM (RSS). Requires igllama v0.3.10+ for accurate usage.completion_tokens.

Measured results (CPU-only, 4 threads):

Model      tok/s  RAM
0.8B-Q8    3.4    0.8 GB
0.8B-BF16  2.9    1.5 GB
2B-Q4      2.9    1.3 GB
2B-Q8      2.6    1.9 GB
2B-BF16    1.9    3.6 GB
4B-Q4      1.3    2.7 GB
4B-Q8      0.1    ~4 GB (swap!)

Key finding: RAM is the hard limit. Models that exceed physical RAM fall off a cliff (4B-Q8: 0.1 tok/s from swap thrashing). 4B-Q4 is the practical ceiling on systems with ≤6 GB free RAM.

Extended: 4B Full Quant Curve

Added Qwen3.5-4B-Q4_K_M, Q5_K_M, and Q6_K to trial_quant.zig. All three pass 13/17 — 4B is saturated at Q4. 4B-Q4 (2.6 GB) is the recommended production config: it matches the higher quants' accuracy at the smallest file size.

Extended: T01–T17 in trial_quant.zig

The quantization sensitivity harness now runs all 17 tasks (was T01–T13), adding code generation with zig fmt validation (T14), JSON round-trip (T15), error recovery (T16), and multi-source synthesis (T17) across all 16 quantization variants.

trial_quant.zig now covers: 16 models × 17 tasks = 272 test cases per full run.

igllama v0.3.10 Upstream Fix

Patched usage.completion_tokens in igllama's non-streaming /v1/chat/completions handler — it was hardcoded to 0 and now returns real counts. The fix was upstreamed as igllama PR #82 and released as igllama v0.3.10.
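
For reference, completion_tokens is the standard OpenAI-style usage field on the non-streaming response body. A hedged sketch of the patched shape (all values illustrative):

```json
{
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "..." } }
  ],
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 57,
    "total_tokens": 185
  }
}
```

Before the fix, completion_tokens was always 0, so bench could only estimate throughput on older builds.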

Upgrade Notes

  • Requires igllama v0.3.10+ for accurate bench token counts (fallback estimate still works with older builds)
  • Download 4B quant GGUFs: igllama pull unsloth/Qwen3.5-4B-GGUF -f Qwen3.5-4B-Q4_K_M.gguf

Stats

  • 195/195 tests pass
  • 3 harness executables: trial, trial-quant, bench
  • 16 models Γ— 17 tasks in quantization harness

v0.2.7 — BF16 precision trials, igllama grammar fix, LICENSE

04 Mar 18:23

What's new in v0.2.7

Added

  • BF16 in trial_quant.zig β€” 2B-BF16 and 9B-BF16 added to QUANT_MODELS; harness now covers the full Q4/Q5/Q6/Q8/BF16 precision curve
  • LICENSE β€” MIT license file added

Fixed

  • igllama v0.3.10 β€” streaming json_mode use-after-free β€” streaming handler freed the grammar string while the sampler held a pointer to it; replaced with direct JSON_GRAMMAR comptime const (matches non-streaming handler)
  • trial_quant.zig β€” changed response_format from json_object to text; the grammar sampler in the bundled llama.cpp crashes during generation for 2B+ model vocabularies; system prompt JSON constraint is sufficient for 4B+

Changed

  • showcase.smd β€” documents igllama json_mode crash finding, expanded trial task suite to T01–T17, updated framework version

Build

zig build trial-quant   # Q4/Q5/Q6/Q8/BF16 sensitivity trial for 2B + 9B
zig build trial         # T01-T17 across all 4 weight classes
zig build test          # 195/195 tests

v0.2.2 — Session summary, igllama port scan, json_mode, Showcase

04 Mar 09:45

What's new in v0.2.2

Features

  • Session summary output β€” powerglide run now emits a structured completion block with steps, elapsed time, agent/model, and the <POWERGLIDE_DONE> or <POWERGLIDE_ERROR> terminal signal
  • igllama port scan β€” powerglide doctor scans :8090–8099 and reports all running igllama instances simultaneously
  • json_mode on OpenAIClient β€” sets response_format: {"type":"json_object"} to force constrained JSON output from igllama and other local endpoints

New Showcase page

Live at bkataru.github.io/powerglide/showcase — four case studies documenting powerglide dogfooding with Qwen3.5 0.8B and 4B models via igllama, including the honest tool calling triage and performance table.

Bug fix

  • Loop step count increments test now uses an isolated /tmp session file; previously picked up real .powerglide/session.json from dogfooding runs

Stats

  • 195/195 tests passing, 0 memory leaks
  • Fully local stack: powerglide + igllama + Qwen3.5-4B, no API keys required

v0.2.1 — 195 tests, bug fixes

04 Mar 09:10

v0.2.1

Test Coverage Expansion

  • 195/195 tests passing (up from 170)
  • New test modules: SSE parser, HTTP response, persistence manager
  • Root module now covers all submodules via refAllDecls

Bug Fixes (uncovered by expanded coverage)

  • stream.zig: unmanaged ArrayList API fixes (Zig 0.15.2 compliance)
  • terminal/pool.zig: sessions.size β†’ sessions.count()
  • terminal/session.zig: array literal syntax fix, orphaned test code removed

195/195 tests passing, 0 leaks.

v0.2.0 — MCP Integration

04 Mar 08:36

What's New in v0.2.0

MCP Integration

  • MCP Server β€” powerglide mcp starts powerglide as a JSON-RPC 2.0 MCP server over stdin/stdout, exposing all registered tools to any MCP-compatible client
  • MCP Client β€” connect to external MCP servers; their tools become first-class powerglide tools prefixed as mcp_{server}_{tool}
  • Tool Bridge β€” transparent McpTool β†’ Tool conversion for seamless integration
  • Config support β€” mcp_servers array in ~/.config/powerglide/config.json

Fixes

  • Stdin API fix for Zig 0.15.2 (posix.read byte-by-byte pattern)
  • Favicon 404 on GitHub Pages resolved
  • Homepage title deduplication

138/138 tests passing.