tinyorca

tinyorca is a minimal implementation of an Orca-style LLM serving engine.

It focuses on iteration-level scheduling and selective batching for mixed prefill and decode workloads.

Demo: Static Batch vs. Iteration-Level Scheduling

Both demos below use the same setup:

max_batch_size=2
5 concurrent requests
2 requests (req-0, req-2) intentionally much shorter than the others

Baseline Engine

In the baseline, the first admitted batch is effectively pinned until its slowest request completes. Even if one request finishes early, its slot sits idle instead of being handed to a waiting request, so later requests keep waiting.

tinyorca

In tinyorca, scheduling happens at iteration granularity instead of request granularity.

When a short request (e.g. "Hi") finishes, its slot can be reused on the next iteration, so waiting requests can join without waiting for the longest request in the current batch to finish. This keeps each iteration close to the maximum batch size, which improves throughput.
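The difference between the two policies can be sketched with a toy simulation. This is not tinyorca's scheduler: it assumes every request simply needs a fixed number of iterations, ignores the cost difference between prefill and decode, and uses made-up lengths (`[2, 8, 2, 8, 8]`) that mirror the demo setup, with req-0 and req-2 short. The function names are hypothetical.

```python
from collections import deque


def simulate_static(lengths, max_batch_size):
    """Static batching: each batch is held until its longest request finishes."""
    step = 0
    start_steps = []
    for i in range(0, len(lengths), max_batch_size):
        batch = lengths[i:i + max_batch_size]
        start_steps.extend([step] * len(batch))  # whole batch admitted together
        step += max(batch)                       # slowest request pins the batch
    return step, start_steps


def simulate_iteration_level(lengths, max_batch_size):
    """Iteration-level scheduling: free slots are refilled before every iteration."""
    waiting = deque(range(len(lengths)))
    running = {}                                 # request index -> iterations left
    start_steps = [None] * len(lengths)
    step = 0
    while waiting or running:
        # Admit waiting requests into any free slots.
        while waiting and len(running) < max_batch_size:
            idx = waiting.popleft()
            running[idx] = lengths[idx]
            start_steps[idx] = step
        # Run one iteration for every running request and retire finished ones.
        for idx in list(running):
            running[idx] -= 1
            if running[idx] == 0:
                del running[idx]
        step += 1
    return step, start_steps


# Hypothetical per-request iteration counts; req-0 and req-2 are the short ones.
lengths = [2, 8, 2, 8, 8]
print(simulate_static(lengths, max_batch_size=2))           # (24, [0, 0, 8, 8, 16])
print(simulate_iteration_level(lengths, max_batch_size=2))  # (16, [0, 0, 2, 4, 8])
```

In this toy model, req-2 is admitted at step 2 instead of step 8, and the whole workload finishes in 16 iterations rather than 24.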

Deep dive

For a deeper walkthrough of the paper and this implementation, see: Understanding Orca through tinyorca

Run

uv venv
uv sync
uv run python -m tinyorca.example

Example

from tinyorca import OrcaConfig, OrcaServe, SamplingConfig

serve = OrcaServe(
    OrcaConfig(
        model="Qwen/Qwen3-0.6B",
        max_batch_size=2,
        sampling=SamplingConfig(max_new_tokens=32),
    )
)

for event in serve.generate(["Hello", "Hi."]):
    print(event.request.request_id, event.token_id)
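Because requests are scheduled per iteration, the events from `generate` presumably arrive interleaved across requests rather than one request at a time. A minimal sketch of grouping them by request follows. It relies only on the two attributes used above (`event.request.request_id` and `event.token_id`); `FakeRequest`, `FakeEvent`, and `collect_tokens` are hypothetical stand-ins so the snippet runs without loading a model, and the string request IDs are an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for the request object attached to an event."""
    request_id: str


@dataclass
class FakeEvent:
    """Hypothetical stand-in exposing the same attributes as the events above."""
    request: FakeRequest
    token_id: int


def collect_tokens(events):
    """Group streamed token IDs by request ID, preserving arrival order."""
    tokens = defaultdict(list)
    for event in events:
        tokens[event.request.request_id].append(event.token_id)
    return dict(tokens)


# Interleaved events, as two concurrently running requests might produce them.
events = [
    FakeEvent(FakeRequest("req-0"), 11),
    FakeEvent(FakeRequest("req-1"), 21),
    FakeEvent(FakeRequest("req-0"), 12),
]
grouped = collect_tokens(events)
print(grouped)
```

With the real engine, the same helper could be called as `collect_tokens(serve.generate(["Hello", "Hi."]))`, assuming the events keep the shape shown in the example.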

Benchmark

uv run python -m bench

By default, the benchmark runs two synthetic workloads:

equal_size: 8 requests of (128, 128)
short_long_mix: interleaved short (32, 32) and long (512, 128) requests

To run just one workload:

uv run python -m bench --workload short_long_mix
