A tiny execution-plan compiler for AI runtimes.
IRPlanner is not a full TVM or XLA. It compiles a minimal .irplan IR description of LLM hot-path operators into a plan.json execution plan, covering shape inference, memory planning, kernel lowering, and budget propagation.
Execution plan definition authority: whoever defines the graph → tensor → shape → memory → kernel → plan chain controls what the runtime can execute.
```
input x: f16[1, S, D]
w = param f16[D]
n = rmsnorm(x, w, eps=1e-5)
r = rope(n, base=10000, dim=64)
s = silu(r)
out = output(s)
```
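To make the statement syntax concrete, here is a minimal sketch of how a hand-written parser might handle one op line of the form `name = op(arg, ..., key=value)`. The grammar is inferred from the example above and is an assumption, not the project's actual parser; `OpStmt`, `is_ident`, and `parse_op_stmt` are hypothetical names. Declaration lines such as `input x: ...` and `w = param ...` are deliberately out of scope and return `None`.

```rust
/// One parsed op statement, e.g. `n = rmsnorm(x, w, eps=1e-5)`.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct OpStmt {
    name: String,
    op: String,
    args: Vec<String>,            // positional operands
    attrs: Vec<(String, String)>, // key=value attributes, kept as raw text
}

/// True for non-empty ASCII identifiers made of letters, digits and `_`.
fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Parses `name = op(arg, ..., key=value, ...)`.
/// Returns None for any line that does not have this form.
fn parse_op_stmt(line: &str) -> Option<OpStmt> {
    // Split at the first `=`: left is the result name, right is the call.
    let (lhs, rhs) = line.split_once('=')?;
    let name = lhs.trim();
    let rhs = rhs.trim();
    let open = rhs.find('(')?;
    let op = rhs[..open].trim();
    // Text between the first `(` and the trailing `)`.
    let inner = rhs.strip_suffix(')')?.get(open + 1..)?;
    if !is_ident(name) || !is_ident(op) {
        return None;
    }
    let mut args = Vec::new();
    let mut attrs = Vec::new();
    if !inner.trim().is_empty() {
        for part in inner.split(',') {
            let part = part.trim();
            match part.split_once('=') {
                Some((k, v)) if is_ident(k.trim()) && !v.trim().is_empty() => {
                    attrs.push((k.trim().to_string(), v.trim().to_string()));
                }
                Some(_) => return None,
                None if is_ident(part) => args.push(part.to_string()),
                None => return None,
            }
        }
    }
    Some(OpStmt { name: name.to_string(), op: op.to_string(), args, attrs })
}
```

Under these assumptions, `parse_op_stmt("r = rope(n, base=10000, dim=64)")` yields the name `r`, the op `rope`, one positional argument `n`, and the attributes `base` and `dim` as raw strings. Typing the attribute values is left to a later pass.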
```
.irplan source
  ↓ Parser      → AST
  ↓ pass_shape  → shape inference (symbolic dims)
  ↓ pass_budget → token/memory constraint propagation
  ↓ pass_memory → buffer reuse planning
  ↓ lowering    → op → kernel mapping
  ↓ plan emit   → plan.json
```
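The shape and memory stages of the pipeline above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's pass code: `Dim`, `Shape`, `infer_shape`, and `buffer_bytes` are hypothetical names, and the shape rule assumes that rmsnorm, rope, silu, and output all return a tensor shaped like their first operand.

```rust
use std::collections::HashMap;

/// A dimension is either a known constant or a named symbol such as `S`.
#[derive(Debug, Clone, PartialEq)]
enum Dim {
    Const(u64),
    Sym(String),
}

type Shape = Vec<Dim>;

/// Shape rule for the ops in the example program (assumption: each one
/// preserves the shape of its first operand). Unknown ops yield None.
fn infer_shape(op: &str, input: &Shape) -> Option<Shape> {
    match op {
        "rmsnorm" | "rope" | "silu" | "output" => Some(input.clone()),
        _ => None,
    }
}

/// Resolves symbolic dims against concrete bindings and returns the buffer
/// size in bytes. Returns None if a symbol is unbound or the size overflows.
fn buffer_bytes(shape: &Shape, elem_bytes: u64, bind: &HashMap<String, u64>) -> Option<u64> {
    let mut total = elem_bytes;
    for d in shape {
        let n = match d {
            Dim::Const(n) => *n,
            Dim::Sym(s) => *bind.get(s)?,
        };
        total = total.checked_mul(n)?;
    }
    Some(total)
}
```

As a worked example, binding S = 4096 and D = 4096 (hypothetical values) gives an f16[1, S, D] buffer of 1 × 4096 × 4096 × 2 = 33,554,432 bytes. That equals the peak_bytes figure in the sample plan.json in this README, which would be consistent with a single buffer reused across the three ops, though the README does not state which bindings produced it.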
```bash
cargo run -- --input examples/hotpath.irplan --output plan.json
```

```json
{
  "version": "0.1.0",
  "ops": [
    {"id": "n", "op": "rmsnorm", "kernel": "cuda_rmsnorm_v1"},
    {"id": "r", "op": "rope", "kernel": "cuda_rope_v1"},
    {"id": "s", "op": "silu", "kernel": "cuda_silu_v1"}
  ],
  "memory_plan": {
    "peak_bytes": 33554432,
    "reuse_buffers": true
  }
}
```

| Layer | Choice |
|---|---|
| Language | Rust |
| Parser | Hand-written |
| Serialization | serde + serde_json |
| CLI | clap |
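
The `ops` array in the sample plan.json pairs each op with a kernel name. A minimal sketch of that lowering step in plain Rust might look like the following. `PlannedOp`, `kernel_for`, and `lower` are hypothetical names, and the lookup table contains only the three kernels that appear in the sample. Serialization through serde is left out so the sketch stays free of external crates.

```rust
/// One lowered entry, mirroring the fields of an `ops` item in the sample plan.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct PlannedOp {
    id: String,
    op: String,
    kernel: &'static str,
}

/// Maps an op name to a kernel name. The three entries are the ones shown
/// in the sample plan.json; any other op is treated as unsupported.
fn kernel_for(op: &str) -> Option<&'static str> {
    match op {
        "rmsnorm" => Some("cuda_rmsnorm_v1"),
        "rope" => Some("cuda_rope_v1"),
        "silu" => Some("cuda_silu_v1"),
        _ => None,
    }
}

/// Lowers (id, op) pairs in program order, failing on the first unknown op.
fn lower(ops: &[(&str, &str)]) -> Result<Vec<PlannedOp>, String> {
    ops.iter()
        .map(|&(id, op)| {
            kernel_for(op)
                .map(|kernel| PlannedOp { id: id.to_string(), op: op.to_string(), kernel })
                .ok_or_else(|| format!("no kernel for op `{op}` (id `{id}`)"))
        })
        .collect()
}
```

The sample plan has no entry for `out = output(s)`, so this sketch assumes the caller drops `output` statements before lowering. Returning an error for unknown ops, instead of skipping them, keeps a plan from silently omitting work the runtime cannot execute.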
```
apeinx-ir/
├── crates/apxir-core/src/   # AST, parser, IR, dtype, shape, passes
├── crates/apxir-cli/src/    # CLI entry point
├── examples/                # .irplan source files
└── tests/
```
MLIR is a reusable, extensible compiler infrastructure, but don't touch it for v0.1: the complexity will bury you. A hand-written parser and pass pipeline is the right start.
IRPlanner → (plan.json) → LLMSched → (trace.jsonl) → Trace2Train
TBD