junuxyz/tinyorca

tinyorca

tinyorca is a minimal implementation of an Orca-style LLM serving engine.

It focuses on iteration-level scheduling and selective batching for mixed prefill and decode workloads.
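As a rough illustration of selective batching as described in the Orca paper: token-wise operations (such as linear layers) run once over the tokens of all requests merged into one flat batch, while attention runs separately per request because each request has a different sequence length. The sketch below is a toy in plain Python, not tinyorca's code. `token_wise_op`, `per_request_attention`, and `selective_batch_step` are hypothetical names, the arithmetic stands in for real tensor operations, and the KV cache is ignored.

```python
def token_wise_op(tokens):
    # Stand-in for Linear/LayerNorm: each token is transformed independently,
    # so tokens from different requests can share one flat batch.
    return [2 * t for t in tokens]


def per_request_attention(tokens):
    # Stand-in for causal attention: each position depends only on earlier
    # positions of the SAME request (modeled here as a running sum).
    out, acc = [], 0
    for t in tokens:
        acc += t
        out.append(acc)
    return out


def selective_batch_step(requests):
    """Run one toy layer over requests of different lengths."""
    lengths = [len(r) for r in requests]
    # Merge: flatten all tokens of all requests into one batch.
    flat = token_wise_op([t for r in requests for t in r])
    # Split: hand each request's slice to attention individually.
    outputs, start = [], 0
    for n in lengths:
        outputs.append(per_request_attention(flat[start:start + n]))
        start += n
    return outputs


# A 3-token prefill request batched together with a 1-token decode request.
print(selective_batch_step([[1, 2, 3], [4]]))  # → [[2, 6, 12], [8]]
```

The point of the merge/split pattern is that a prefill request (many tokens) and a decode request (one token) can share the same iteration without padding either to a common length.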

Demo: Static Batch vs. Iteration-Level Scheduling

Both demos below use the same setup:

  • max_batch_size=2
  • 5 concurrent requests
  • 2 requests (req-0, req-2) that are intentionally much shorter than the others

Baseline Engine

[Figure: baseline engine demo]

In the baseline, the first admitted batch is effectively pinned until its slowest request completes. Even if one request finishes early, its vacant slot is not refilled right away, so later requests keep waiting.
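The pinning behavior can be sketched with a toy step counter. This is a minimal simulation under stated assumptions, not the baseline engine's code: `simulate_static` is a hypothetical helper, each request is reduced to the number of iterations it needs, and the lengths (2 for the short requests, 8 for the others) are made up for illustration.

```python
from collections import deque


def simulate_static(lengths, max_batch_size):
    """Return (total_steps, finish_step_per_request) under static batching."""
    waiting = deque(lengths.items())
    step = 0
    finished = {}
    while waiting:
        # Admit up to max_batch_size requests in arrival order.
        size = min(max_batch_size, len(waiting))
        batch = [waiting.popleft() for _ in range(size)]
        for name, n in batch:
            finished[name] = step + n
        # The batch occupies its slots until the slowest member is done.
        step += max(n for _, n in batch)
    return step, finished


# Hypothetical lengths: req-0 and req-2 are short, as in the demo setup.
lengths = {"req-0": 2, "req-1": 8, "req-2": 2, "req-3": 8, "req-4": 8}
total_steps, finished = simulate_static(lengths, max_batch_size=2)
print(total_steps)  # → 24
print(finished)
# → {'req-0': 2, 'req-1': 8, 'req-2': 10, 'req-3': 16, 'req-4': 24}
```

In this toy run, req-0 is done after 2 steps, but req-2 cannot start until step 8, when req-1 finishes and the whole batch is released.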

tinyorca

[Figure: tinyorca demo]

In tinyorca, scheduling happens at iteration granularity instead of request granularity.

When a short request (e.g. "Hi") finishes, its slot can be reused on the next iteration, so waiting requests can join without waiting for the longest request in the current batch to finish. This keeps each step at or near the maximum batch size, which improves throughput.
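Slot reuse at iteration granularity can be sketched with a toy step counter. This is a minimal simulation under stated assumptions, not tinyorca's scheduler: `simulate_iteration_level` is a hypothetical helper, each request is reduced to the number of iterations it needs (at least 1), and the lengths (2 for the short requests, 8 for the others) are made up for illustration.

```python
from collections import deque


def simulate_iteration_level(lengths, max_batch_size):
    """Return (total_steps, finish_step_per_request) with per-iteration admission."""
    waiting = deque(lengths.items())
    running = {}  # request name -> remaining iterations
    step = 0
    finished = {}
    while waiting or running:
        # Before every iteration, fill any free slots from the waiting queue.
        while waiting and len(running) < max_batch_size:
            name, n = waiting.popleft()
            running[name] = n
        # Run one iteration for every request currently in the batch.
        step += 1
        for name in list(running):
            running[name] -= 1
            if running[name] == 0:
                finished[name] = step
                del running[name]  # the slot is free for the next iteration
    return step, finished


# Hypothetical lengths: req-0 and req-2 are short, as in the demo setup.
lengths = {"req-0": 2, "req-1": 8, "req-2": 2, "req-3": 8, "req-4": 8}
total_steps, finished = simulate_iteration_level(lengths, max_batch_size=2)
print(total_steps)  # → 16
print(finished)
# → {'req-0': 2, 'req-2': 4, 'req-1': 8, 'req-3': 12, 'req-4': 16}
```

In this toy run, req-2 takes over req-0's slot at step 2 and finishes at step 4, while req-1 is still running.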

Deep dive

For a deeper walkthrough of the paper and this implementation, see: Understanding Orca through tinyorca

Run

uv venv
uv sync
uv run python -m tinyorca.example

Example

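from tinyorca import OrcaConfig, OrcaServe, SamplingConfig

# Configure the engine: model, batch limit, and sampling settings.
serve = OrcaServe(
    OrcaConfig(
        model="Qwen/Qwen3-0.6B",
        max_batch_size=2,
        sampling=SamplingConfig(max_new_tokens=32),
    )
)

# Iterate over the events yielded by generate() and print each
# event's request ID and token ID.
for event in serve.generate(["Hello", "Hi."]):
    print(event.request.request_id, event.token_id)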

Benchmark

uv run python -m bench

By default, the benchmark runs two synthetic workloads:

  • equal_size: 8 requests of (128, 128)
  • short_long_mix: interleaved short (32, 32) and long (512, 128) requests
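To make the two workload shapes concrete, here is a small sketch of how such request lists could be built. It is not the benchmark's code. It assumes each pair means (prompt length, output length) in tokens and that the mix starts with a short request; neither is stated above. The helper names and the `num_pairs` parameter are hypothetical, since the size of the mixed workload is not given.

```python
def equal_size(num_requests=8, shape=(128, 128)):
    # Every request has the same (assumed) prompt and output length.
    return [shape] * num_requests


def short_long_mix(num_pairs, short=(32, 32), long=(512, 128)):
    # Alternate short and long requests: short, long, short, long, ...
    workload = []
    for _ in range(num_pairs):
        workload.extend([short, long])
    return workload


print(len(equal_size()))   # → 8
print(short_long_mix(2))   # → [(32, 32), (512, 128), (32, 32), (512, 128)]
```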

To run just one workload:

uv run python -m bench --workload short_long_mix
