Tags: Leeaandrob/neurogrid
v0.3.0 Distributed Inference - Multi-GPU Pipeline Parallelism

Major Features:
- Coordinator/worker architecture for distributed inference
- P2P weight distribution via libp2p
- Remote layer execution across multiple GPUs/machines
- Mistral 7B model support with chat templates
- Llama 2 13B benchmarks on a distributed setup

Infrastructure:
- Network notifee for automatic worker detection
- `--skip-weight-transfer` flag for pre-loaded workers
- `--bootstrap` flag for explicit peer connection
- Generic HuggingFace model download (`make download REPO=org/model`)

Performance:
- Llama 2 7B: ~5.2 tokens/sec (single RTX 4090)
- Llama 2 13B: ~3.1 tokens/sec (distributed RTX 4090 + GH200)

Testing:
- 1311 lines of distributed inference E2E tests
- 688 lines of model loader tests
- Full router and scheduler test coverage
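The coordinator/worker pipeline-parallel split above can be illustrated with a small sketch. This is a toy, not NeuroGrid's actual code: the `Worker` class, `partition_layers`, and `run_pipeline` names are hypothetical, and the "layers" are plain callables standing in for transformer layers on remote GPUs. It shows the core idea: the coordinator assigns each worker a contiguous slice of the model's layers, then passes activations through each stage in order.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    """Stand-in for a remote GPU worker holding a contiguous slice of layers."""
    name: str
    layers: list  # each layer is a callable: hidden_state -> hidden_state

    def forward(self, hidden):
        # In a real deployment this would execute on the worker's GPU.
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden

def partition_layers(layers, num_workers):
    """Split layers into contiguous chunks, giving any remainder to the earliest stages."""
    base, extra = divmod(len(layers), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(layers[start:start + size])
        start += size
    return chunks

def run_pipeline(workers, hidden):
    """Coordinator: route activations through each worker's stage in order."""
    for worker in workers:
        hidden = worker.forward(hidden)
    return hidden

# Toy "model": 32 identical layers that each add 1, split across two workers.
layers = [(lambda h: h + 1) for _ in range(32)]
stages = partition_layers(layers, 2)
workers = [Worker(name, stage) for name, stage in zip(["gpu0", "gpu1"], stages)]
print(run_pipeline(workers, 0))  # 32 (each of the 32 layers adds 1)
```

In the real system the per-stage `forward` call would be a remote execution request to a worker discovered over libp2p, rather than an in-process function call.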