hyperfastapi

A drop-in FastAPI-compatible web framework with a Rust core (PyO3 + hyper): same Python API, 5× single-process and 12×+ multi-process throughput vs FastAPI on uvicorn.

CI · Python 3.10–3.13 · Rust 1.80+ · License: MIT · Conformance: 514/514

from hyperfastapi import FastAPI

app = FastAPI()

@app.get("/")
def hello() -> dict:
    return {"hello": "world"}

# Run via the built-in Rust hyper server (no uvicorn needed):
if __name__ == "__main__":
    app.run_native(host="127.0.0.1", port=8000, workers=4)

pip install hyperfastapi
python app.py     # 188,000 requests/sec on a 4-core box

Why hyperfastapi?

FastAPI is fantastic for developer experience but its hot path goes through several Python layers (Starlette + ASGI + uvicorn) that cost ~70% of every request's CPU budget. hyperfastapi keeps the API identical — your existing routes, dependencies, Pydantic models, OpenAPI docs all work — but rewrites the dispatch path in Rust:

  • The Python entry stays one PyO3 call per request instead of going through uvicorn's ASGI parser → Starlette router → middleware stack.
  • The JSON encoder is native Rust (ryu/itoa + manual escape) — skipping json.dumps for the common dict/list/scalar payloads.
  • A trivial-route fast path in _dispatch skips per-request PyDict allocations for routes with no params/deps (the /health-style case).
  • The optional run_native() mode boots a Rust hyper HTTP/1.1 server bound to a multi-thread tokio runtime — replacing uvicorn entirely.

You keep all of FastAPI's ergonomics. You get most of actix-web's throughput.
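To make "common dict/list/scalar payloads" concrete, here is a minimal Python sketch of the kind of eligibility check a native encoder needs before it can skip json.dumps. The function name `is_plain_json` and the depth limit are hypothetical; the real check lives in the Rust core and may differ.

```python
def is_plain_json(value, _depth=0):
    """Return True if value is built only from dict/list/scalar types.

    Hypothetical helper for illustration; not part of hyperfastapi.
    """
    if _depth > 64:  # arbitrary guard against pathological nesting
        return False
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_plain_json(item, _depth + 1) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_plain_json(item, _depth + 1)
            for key, item in value.items()
        )
    return False  # anything else would fall back to a slower encoder


print(is_plain_json({"hello": "world", "n": [1, 2.5, None, True]}))
print(is_plain_json({"when": object()}))
```

A payload that fails a check like this would simply take the regular encoding path, so the fast path never changes what a handler is allowed to return.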


Performance

Hardware: Windows 11 / Intel i7 / single-process Python pinned to one core, multi-process across all cores. Load generator: bombardier at concurrency=100, 5 seconds per scenario.

Single-process throughput (1 Python interpreter)

| Scenario | FastAPI + uvicorn (RPS) | hyperfastapi + hyper (RPS) | Speedup |
| --- | ---: | ---: | ---: |
| GET /async | 9,455 | 122,435 | 12.9 × |
| GET /with-middleware | 3,489 | 106,702 | 30.6 × |
| GET /plain | 3,692 | 103,175 | 27.9 × |
| GET /with-query | 3,366 | 38,960 | 11.6 × |
| POST /post-validated | 2,790 | 33,348 | 11.9 × |
| GET /with-chain | 2,026 | 28,213 | 13.9 × |

Multi-process throughput (4 Python procs)

| Scenario | FastAPI + uvicorn, workers=4 (RPS) | hyperfastapi + hyper, 4 procs (RPS) | Speedup |
| --- | ---: | ---: | ---: |
| GET /plain | 21,518 | 249,391 | 11.6 × |
| GET /with-middleware | 19,053 | 247,792 | 13.0 × |
| GET /async | 66,099 | 229,734 | 3.5 × |
| GET /with-query | 17,466 | 99,415 | 5.7 × |
| POST /post-validated | 13,187 | 91,696 | 7.0 × |
| GET /with-chain | 8,646 | 76,854 | 8.9 × |

Three of the six scenarios (/plain, /with-middleware and /async) cross 100,000 RPS on a 4-process Windows machine, and /with-query comes within 1% of that mark at 99,415. /async reaches about 230k.
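The speedup column is simply the ratio of the two RPS columns. A small sketch that recomputes it from the multi-process figures (numbers copied from the table above; the names `MULTIPROC` and `speedup` are only illustrative):

```python
# (scenario, FastAPI + uvicorn RPS, hyperfastapi + hyper RPS),
# copied from the multi-process table.
MULTIPROC = [
    ("GET /plain", 21_518, 249_391),
    ("GET /with-middleware", 19_053, 247_792),
    ("GET /async", 66_099, 229_734),
    ("GET /with-query", 17_466, 99_415),
    ("POST /post-validated", 13_187, 91_696),
    ("GET /with-chain", 8_646, 76_854),
]


def speedup(baseline_rps, candidate_rps):
    """Throughput ratio: how many times faster the candidate is."""
    return candidate_rps / baseline_rps


for name, base, fast in MULTIPROC:
    print(f"{name:<22} {speedup(base, fast):5.1f} x")
```

Running the same calculation on your own bombardier output is a quick way to compare your hardware against these figures.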

Speedup chart

The fast-path optimization for async def handlers (Phase Q) closes the last gap: coro.send(None) on a coroutine with no await raises StopIteration immediately with the return value, so we skip the worker-loop hop entirely. /async-io (with a real await) still takes the slower event-loop path.
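The coroutine behavior this relies on is plain Python and easy to check in isolation. The sketch below is not hyperfastapi's dispatch code; `run_without_loop`, `suspend_once` and the two handlers are hypothetical names used only to show what `coro.send(None)` does in each case.

```python
import types


@types.coroutine
def suspend_once():
    # A bare yield suspends the awaiting coroutine, standing in for real I/O.
    yield


async def no_await_handler():
    return {"ok": True}


async def awaiting_handler():
    await suspend_once()
    return {"ok": True}


def run_without_loop(coro):
    """Step the coroutine once and return (finished, value_or_coroutine)."""
    try:
        coro.send(None)
    except StopIteration as stop:
        return True, stop.value  # finished without ever suspending
    return False, coro  # suspended: this one needs a real event loop


done, value = run_without_loop(no_await_handler())
print(done, value)  # True {'ok': True}

done, pending = run_without_loop(awaiting_handler())
print(done)  # False
pending.close()  # discard the suspended coroutine cleanly
```

The second case is the one that still has to be handed to an event loop, which matches the /async-io note above.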

WebSocket throughput

WebSocket echo round-trip throughput (Rust client, 64-byte payload):

| Connections | hyperfastapi (run_native), msg/s | FastAPI + uvicorn, msg/s | Speedup | Max latency (hyper / uvicorn) |
| ---: | ---: | ---: | ---: | --- |
| 1 | 20,588 | 15,233 | 1.35 × | 0.19 ms / 0.32 ms |
| 4 | 29,451 | 19,913 | 1.48 × | 0.41 ms / 0.54 ms |
| 8 | 34,377 | 16,108 | 2.13 × | 0.82 ms / 3.06 ms |
| 16 | 44,734 | 13,964 | 3.20 × | 0.90 ms / 6.54 ms |
| 32 | 40,875 | 22,281 | 1.83 × | 2.48 ms / 4.55 ms |
| 64 | 41,805 | 18,218 | 2.30 × | 5.10 ms / 32.69 ms |

uvicorn's WebSocket throughput peaks at about 22k msg/s (32 connections) and is erratic under concurrency (14k at 16 connections, 18k at 64). hyperfastapi scales up to 16 connections and holds above 40k msg/s from there on, with max latency about 6× lower at 64 connections (5.10 ms vs 32.69 ms).

Reproduce these numbers: see Benchmarking below. Each run prints raw RPS so you can verify on your own hardware.


Features

  • Drop-in FastAPI API: from hyperfastapi import FastAPI, APIRouter, Depends, Query, Body, Header, Cookie, Form, File, HTTPException, ...
  • Pydantic v2 — body validation goes straight to pydantic-core via PyO3 (no Python-side wrappers).
  • Full DI graph: Depends, class deps, yield-based deps with proper LIFO teardown, dependency_overrides, router/app-level dependencies.
  • OpenAPI 3.1: /openapi.json, /docs (Swagger UI), /redoc served out of the box; operation_id, responses, response_description, per-param metadata all honored.
  • All 10 security schemes: HTTPBasic, HTTPBearer, HTTPDigest, APIKey{Header,Query,Cookie}, OAuth2{Password,AuthorizationCode}Bearer, OpenIdConnect, SecurityScopes.
  • WebSockets: @app.websocket("/ws") via Starlette's WebSocket wrapper.
  • Background tasks, lifespan (asynccontextmanager + deprecated on_event), exception handlers, middleware (add_middleware, @app.middleware("http")).
  • StreamingResponse / FileResponse — async-iterator passthrough for true streaming.
  • StaticFiles mounting + Jinja2 templates.
  • Two runtimes — uvicorn (full ASGI compat) or app.run_native() (Rust hyper, max throughput).
  • abi3 wheels: one wheel per platform (Linux, macOS, Windows) covers Python 3.10 and later.
  • 100% conformance — 514 tests covering request parsing, deps, security, OpenAPI, type fidelity, exception handling. Run them yourself with pytest tests/conformance.
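"LIFO teardown" for yield-based dependencies means the dependency set up last is torn down first. The sketch below illustrates that ordering with the standard library's `contextlib.ExitStack`; it is a stand-in for the concept, not hyperfastapi's implementation, and `db_session`, `transaction` and `handle_request` are made-up names.

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def db_session():
    events.append("open db")
    try:
        yield "db"
    finally:
        events.append("close db")


@contextmanager
def transaction(db):
    # Depends on db_session, so it is set up second and torn down first.
    events.append("begin tx")
    try:
        yield f"tx on {db}"
    finally:
        events.append("end tx")


def handle_request():
    with ExitStack() as stack:
        db = stack.enter_context(db_session())
        tx = stack.enter_context(transaction(db))
        events.append(f"handler ran with {tx}")
    # Leaving the with-block unwinds the stack in reverse order.
    return events


for step in handle_request():
    print(step)
```

The printed order is open db, begin tx, handler, end tx, close db, which is the same guarantee FastAPI users expect from nested `yield` dependencies.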

Install

From PyPI (release wheels)

pip install hyperfastapi

Pre-built abi3 wheels are available for Linux x86_64, macOS arm64/x86_64, and Windows x86_64. One wheel works on Python 3.10, 3.11, 3.12, and 3.13.

From source

You need a Rust toolchain (rustup) and Python 3.10+. The build uses maturin.

Linux

# 1. Install Rust if you don't have it
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# 2. Build + install
git clone https://github.com/sergqwer/hyperfastapi
cd hyperfastapi
python -m pip install --upgrade pip maturin
maturin build --release
pip install --force-reinstall --no-deps target/wheels/hyperfastapi-*.whl

macOS

# 1. Install Rust + Python 3
brew install rustup-init python@3.12
rustup-init -y
source $HOME/.cargo/env

# 2. Build + install
git clone https://github.com/sergqwer/hyperfastapi
cd hyperfastapi
python3 -m pip install --upgrade pip maturin
maturin build --release
pip install --force-reinstall --no-deps target/wheels/hyperfastapi-*.whl

For Apple Silicon, the wheel is named *macosx_11_0_arm64.whl. For Intel Macs it's *macosx_10_12_x86_64.whl.

Windows (PowerShell)

# 1. Install Rust (rustup-init.exe from https://rustup.rs/)
#    Choose default toolchain: stable-x86_64-pc-windows-msvc

# 2. Build + install
git clone https://github.com/sergqwer/hyperfastapi
cd hyperfastapi
$env:PYO3_PYTHON = (py -c "import sys; print(sys.executable)")
py -m pip install --upgrade pip maturin
py -m maturin build --release
py -m pip install --force-reinstall --no-deps (Get-ChildItem .\target\wheels\hyperfastapi-*.whl).FullName

If you get error: Microsoft Visual C++ 14.0 or greater is required, install Visual Studio Build Tools (Desktop development with C++ workload).

Verify

python -c "from hyperfastapi import FastAPI; print(FastAPI.__module__)"
# → hyperfastapi.applications

Quickstart

Basic app

from hyperfastapi import FastAPI, Depends, Header, HTTPException
from pydantic import BaseModel
from typing import Annotated

app = FastAPI(title="Demo")

class Item(BaseModel):
    name: str
    price: float
    qty: int = 1

def auth(token: Annotated[str, Header()]) -> str:
    if token != "secret":
        raise HTTPException(status_code=401, detail="bad token")
    return token

@app.get("/items/{item_id}")
def read_item(item_id: int, q: str | None = None) -> dict:
    return {"item_id": item_id, "q": q}

@app.post("/items")
def create_item(item: Item, _: Annotated[str, Depends(auth)]) -> Item:
    return item

Run via uvicorn (full ASGI compat)

uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4

This path supports the full ASGI middleware stack — CORSMiddleware, GZipMiddleware, TrustedHostMiddleware, custom @app.middleware("http"), etc.

Run via the built-in Rust hyper server (max throughput)

if __name__ == "__main__":
    app.run_native(host="0.0.0.0", port=8000, workers=4)

Or from the CLI:

python -c "from app import app; app.run_native(host='0.0.0.0', port=8000, workers=4)"

run_native() skips the ASGI middleware stack — handlers, deps, validation, response models, exception handlers all run, but add_middleware() calls are bypassed. Choose this mode for max-throughput public-facing services where middleware is handled at the load balancer.

Run multiple Python processes (recommended for production)

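Python's GIL lets only one thread run Python code at a time, so a single interpreter's throughput is capped regardless of CPU count (about 122k RPS on the simplest route in the single-process table above). To scale beyond that, run multiple Python processes behind a TCP load balancer or use SO_REUSEPORT (Linux):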

# Linux: 4 procs sharing the same port via SO_REUSEPORT
python -c "from app import app; app.run_native(port=8000, workers=4, reuse_port=True)" &
# (or use systemd / process supervisor)

# Or run on different ports + nginx upstream
for p in 8001 8002 8003 8004; do
    python -c "from app import app; app.run_native(port=$p)" &
done
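For readers unfamiliar with SO_REUSEPORT, the sketch below shows the socket option itself using only the standard library: two listening sockets bound to the same port, with the kernel distributing incoming connections between them. How `reuse_port=True` is wired up inside run_native() is not shown in this README, so treat this as an illustration of the mechanism, with `bind_shared` as a hypothetical helper.

```python
import socket


def bind_shared(port, host="127.0.0.1"):
    """Bind a listening TCP socket with SO_REUSEPORT set.

    Raises OSError on platforms that do not expose the option.
    """
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set on every socket before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


if hasattr(socket, "SO_REUSEPORT"):
    first = bind_shared(0)  # port 0: let the OS pick a free port
    port = first.getsockname()[1]
    second = bind_shared(port)  # second listener on the same port
    print("both listening on", port)
    first.close()
    second.close()
```

In a real deployment each listener would live in its own Python process, which is what lets the workers sidestep a shared GIL.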

Protocol support

run_native() speaks every modern HTTP flavor from a single call:

| Protocol | Transport | Status | How |
| --- | --- | --- | --- |
| HTTP/1.0 | TCP | ✅ Supported | Default; Connection: close |
| HTTP/1.1 + keep-alive | TCP | ✅ Supported | Default |
| HTTP/2 cleartext (h2c) | TCP | ✅ Supported | Auto-detected from client preface |
| HTTPS (TLS 1.2/1.3) | TLS / TCP | ✅ Supported | tls_cert=, tls_key= |
| HTTP/2 + TLS | TLS / TCP | ✅ Supported | ALPN-negotiated (h2 / http/1.1) |
| HTTP/3 (QUIC) | UDP | ✅ Supported | http3=True (requires TLS) |

# HTTP/1.1 + h2c plaintext (no TLS)
app.run_native(host="0.0.0.0", port=8000)

# HTTPS = HTTP/1.1 + HTTP/2 over TLS (ALPN)
app.run_native(host="0.0.0.0", port=8443,
               tls_cert="/etc/cert.pem", tls_key="/etc/key.pem")

# Full stack: HTTP/1.1 + HTTP/2 + HTTP/3 (QUIC) over the same port
app.run_native(host="0.0.0.0", port=8443,
               tls_cert="/etc/cert.pem", tls_key="/etc/key.pem",
               http3=True)

When http3=True, HTTPS responses include alt-svc: h3=":<port>"; ma=86400 so HTTP/3-aware clients automatically upgrade.
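As a small illustration of the header format quoted above, the sketch below builds and parses a single-entry Alt-Svc value. Both helpers are hypothetical and handle only the simple `h3=":<port>"; ma=<seconds>` shape shown here, not the full Alt-Svc grammar.

```python
def alt_svc_value(port, max_age=86400):
    """Build an Alt-Svc value advertising HTTP/3 on the given port."""
    return f'h3=":{port}"; ma={max_age}'


def parse_alt_svc(value):
    """Parse a single-entry Alt-Svc value into (protocol, authority, max_age)."""
    entry, _, params = value.partition(";")
    protocol, _, authority = entry.strip().partition("=")
    max_age = None
    for param in params.split(";"):
        key, _, raw = param.strip().partition("=")
        if key == "ma":
            max_age = int(raw)
    return protocol, authority.strip('"'), max_age


header = alt_svc_value(8443)
print(header)  # h3=":8443"; ma=86400
print(parse_alt_svc(header))  # ('h3', ':8443', 86400)
```

A client that sees this value on an HTTPS response may try QUIC on the advertised port for the next 86400 seconds (one day).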

See docs/protocols.md for the protocol cheat sheet plus end-to-end smoke-test instructions.
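"Auto-detected from client preface" in the table refers to the fixed 24-byte sequence every HTTP/2 client sends first (RFC 9113, section 3.4). A minimal sketch of that kind of sniffing follows; `sniff_protocol` is a hypothetical function, and the server's actual detection happens in Rust.

```python
# The HTTP/2 client connection preface: 24 bytes, fixed by the spec.
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def sniff_protocol(first_bytes):
    """Classify a cleartext connection from the first bytes received."""
    if first_bytes.startswith(H2_PREFACE):
        return "h2c"
    if H2_PREFACE.startswith(first_bytes):
        # Only a prefix of the preface has arrived so far; keep reading.
        return "need-more-data"
    return "http/1.x"


print(sniff_protocol(H2_PREFACE + b"\x00\x00\x00\x04"))
print(sniff_protocol(b"GET / HTTP/1.1\r\n"))
print(sniff_protocol(b"PRI * HT"))
```

Because the preface is deliberately not a valid HTTP/1.x request line, one listener can serve both protocols on the same plaintext port.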

Compatibility

hyperfastapi can be aliased as fastapi so that the existing FastAPI test suite runs against it unchanged. To use it as a drop-in replacement in an existing codebase:

import sys
import hyperfastapi
sys.modules.setdefault("fastapi", hyperfastapi)
# ... now `from fastapi import FastAPI` resolves to hyperfastapi.FastAPI

Or set HYPERFASTAPI_AS_FASTAPI=1 and the patched tests/conftest.py does it automatically.
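The aliasing trick works because `import` consults `sys.modules` before searching the filesystem. The sketch below demonstrates the mechanism with a standard-library module standing in for hyperfastapi, plus an environment-variable gate similar in spirit to the one described above. `alias_module` and the alias name `fast_json_demo` are hypothetical, and the real conftest.py may be written differently.

```python
import importlib
import os
import sys


def alias_module(alias, target, env_flag=None):
    """Register module `target` under the name `alias` in sys.modules.

    Returns whatever `import alias` will resolve to afterwards.
    """
    if env_flag is not None and os.environ.get(env_flag) != "1":
        return sys.modules.get(alias)  # gate closed: change nothing
    module = importlib.import_module(target)
    # setdefault leaves an already-imported `alias` untouched, so the
    # alias must be installed before anything imports the real package.
    return sys.modules.setdefault(alias, module)


# Demo with the stdlib json module standing in for hyperfastapi.
alias_module("fast_json_demo", "json")

import fast_json_demo  # resolved from sys.modules, not from disk

print(fast_json_demo.dumps({"hello": "world"}))
```

The setdefault detail matters in practice: if the real fastapi has already been imported somewhere, the alias silently does nothing, so install it as early as possible (for example at the top of conftest.py).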

What requires uvicorn

  • ASGI middleware (CORSMiddleware, GZipMiddleware, custom @app.middleware). When using run_native(), these are no-ops. Use uvicorn if your app needs them.

Architecture

                    ┌────────────────────────────────────────┐
                    │             User code (Python)          │
                    │   @app.get / @app.post / Depends / ...  │
                    └─────────────────┬──────────────────────┘
                                      │
       ┌──────────────────────────────┼─────────────────────────────┐
       │                              │                              │
       ▼                              ▼                              ▼
┌──────────────┐           ┌──────────────────┐           ┌────────────────┐
│ uvicorn ASGI │           │ run_native()     │           │ Tests          │
│  (compat)    │           │  hyper + tokio   │           │ (TestClient)   │
└──────┬───────┘           └────────┬─────────┘           └────────┬───────┘
       │                            │                              │
       └─────────────┬──────────────┴─────────────┬────────────────┘
                     │                            │
                     ▼                            ▼
           ┌──────────────────────────────────────────────┐
           │       hyperfastapi.applications.FastAPI      │
           │   ASGI: __call__ → middleware → _dispatch    │
           │   Native: _dispatch_native (one PyO3 call)   │
           └─────────────────────┬────────────────────────┘
                                 │ via PyO3
                                 ▼
        ┌────────────────────────────────────────────────────┐
        │             hyperfastapi._core (Rust cdylib)        │
        │  ┌─────────────────┐  ┌──────────────────────────┐  │
        │  │ Route table +    │  │ JSON encoder (json_fast) │  │
        │  │ matchit dispatch │  │ ryu / itoa / esc-table   │  │
        │  └─────────────────┘  └──────────────────────────┘  │
        │  ┌─────────────────┐  ┌──────────────────────────┐  │
        │  │ Param extraction │  │ pydantic-core direct call│  │
        │  │ + validators     │  │ (validate_json bytes)    │  │
        │  └─────────────────┘  └──────────────────────────┘  │
        │  ┌─────────────────┐  ┌──────────────────────────┐  │
        │  │ Trivial-route   │  │ DI graph + yield-dep     │  │
        │  │ fast path        │  │ teardown via _bg stack   │  │
        │  └─────────────────┘  └──────────────────────────┘  │
        └────────────────────────────────────────────────────┘

Highlights:

  • Decorator-time route compilation (compile_route_plan) walks inspect.signature → builds a flat plan of (name, source, type, default, validators) entries. Dispatch never re-introspects.
  • Side-channel via _bg (_current_tasks / _current_request / _current_yield_gens) for per-request state that doesn't fit in the (status, headers, body) tuple.
  • Persistent worker loop for async coroutines submitted from sync dispatch — avoids per-request thread spawn (~50µs/req vs ~500µs).
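The first highlight, decorator-time route compilation, can be sketched in pure Python: introspect the handler once, store a flat plan, and have dispatch walk the plan without touching `inspect` again. This is a simplified illustration under stated assumptions, not the project's compile_route_plan: `ParamPlan`, `compile_plan` and `dispatch` are hypothetical names, validators are omitted, and only path and query sources are modeled.

```python
import inspect
from typing import Any, NamedTuple


class ParamPlan(NamedTuple):
    name: str
    source: str  # "path" or "query" in this sketch
    type: Any
    default: Any


def compile_plan(path, handler):
    """Introspect once, at decoration time, and return a flat plan."""
    path_names = {
        seg[1:-1]
        for seg in path.split("/")
        if seg.startswith("{") and seg.endswith("}")
    }
    plan = []
    for name, param in inspect.signature(handler).parameters.items():
        source = "path" if name in path_names else "query"
        plan.append(ParamPlan(name, source, param.annotation, param.default))
    return tuple(plan)


def dispatch(plan, handler, path_params, query_params):
    """Per-request work: walk the precomputed plan, no introspection."""
    kwargs = {}
    for entry in plan:
        bag = path_params if entry.source == "path" else query_params
        raw = bag.get(entry.name)
        if raw is None:
            if entry.default is inspect.Parameter.empty:
                raise ValueError(f"missing parameter: {entry.name}")
            kwargs[entry.name] = entry.default
        else:
            kwargs[entry.name] = entry.type(raw)  # e.g. int("42")
    return handler(**kwargs)


def read_item(item_id: int, q: str = "none"):
    return {"item_id": item_id, "q": q}


PLAN = compile_plan("/items/{item_id}", read_item)
print(dispatch(PLAN, read_item, {"item_id": "42"}, {}))
```

The point of the split is that `inspect.signature` is comparatively expensive, so paying for it once per route rather than once per request keeps it off the hot path.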

See docs/architecture.md (TBD) for the full breakdown.
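Until that document exists, the persistent worker loop from the highlights can be illustrated with the standard library: one event loop runs forever in a background thread, and synchronous code submits coroutines to it instead of spawning a thread per request. `WorkerLoop` is a hypothetical class written for this sketch, not the project's implementation.

```python
import asyncio
import threading


class WorkerLoop:
    """One long-lived event loop in a daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self._loop.run_forever, daemon=True
        )
        self._thread.start()

    def run(self, coro, timeout=None):
        """Submit a coroutine from sync code and block for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def handler():
    await asyncio.sleep(0.01)  # stands in for real async I/O
    return {"ok": True}


worker = WorkerLoop()
print(worker.run(handler()))  # {'ok': True}
worker.close()
```

The thread and loop are created once, so each request pays only for the cross-thread handoff rather than for thread creation.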


Benchmarking

The full benchmark suite lives in tests/perf/ and uses bombardier.

# Cross-backend HTTP comparison (vanilla fastapi+uvicorn vs hyperfastapi+hyper)
HYPERFASTAPI_AS_FASTAPI=1 python tests/perf/compare_backends.py --duration 5

# Multi-process aggregate (4 separate Python procs)
HYPERFASTAPI_AS_FASTAPI=1 python tests/perf/bench_hyper_multiproc.py --workers 4 --duration 5

# WebSocket throughput (Rust client) — bypasses Python asyncio.gather pathology
cargo build --release -p ws-bench
./target/release/ws-bench --url ws://127.0.0.1:8765/echo --connections 16 --messages 1000

# Render charts
python docs/perf/render_charts.py
python docs/perf/render_ws_chart.py

Results land in docs/perf/results.json + docs/perf/multiproc.json; charts in docs/img/.


Conformance

# Run the same FastAPI test suite against hyperfastapi
HYPERFASTAPI_AS_FASTAPI=1 PYTHONPATH=tests python -m pytest tests/conformance -q

Expected output:

514 passed in 1.5s

Coverage by area (514 total):

  • Request params (path/query/header/cookie/body/form/file) — 112
  • Responses (JSONResponse, HTMLResponse, StreamingResponse, FileResponse, status_code, response_class, response_model) — 72
  • Dependencies (Depends, class deps, yield deps, overrides, router-level) — 47
  • Security (10 schemes + scopes + misuse) — 80
  • OpenAPI / Swagger UI / ReDoc — 50
  • WebSockets — 6
  • Exceptions / middleware / background tasks / lifespan — 40
  • StaticFiles / templating / encoders — 25
  • Type fidelity (JSON booleans, Unicode, status code semantics) — 35
  • Routing / mount / include_router / trailing slash — 47

Contributing

PRs welcome. Please run before opening one:

cargo fmt --all
cargo clippy --workspace --all-targets -- -A warnings
HYPERFASTAPI_AS_FASTAPI=1 PYTHONPATH=tests python -m pytest tests/conformance -q

Or install the optional pre-commit hook so cargo fmt runs automatically on every commit:

pip install pre-commit
pre-commit install

CI runs the full matrix on every PR (Linux/macOS/Windows × Python 3.10–3.13).


License

MIT — see LICENSE.

This project depends on PyO3, hyper, tokio, pydantic-core, and the upstream FastAPI Python API surface (MIT-licensed). Many thanks to those projects.
