QubitEngine is a cloud-native, polyglot quantum simulation and execution framework engineered for latency-critical research, quantum machine learning (QML), and computational finance. It mitigates the von Neumann bottleneck inherent in massive state-vector simulations through a decoupled five-layer architecture, zero-copy IPC, and dynamic hardware dispatch.

QubitEngine operates as a distributed service mesh rather than a monolithic script. Coordination and scheduling are handled by a Go-based orchestration layer, while the execution kernel is heavily optimized C++20.
```
┌──────────────────────────────────────────────────────────────────────────────┐
│                          Client / Application Layer                          │
│  ┌────────────────┐ ┌───────────────────┐ ┌────────────────┐ ┌────────────┐  │
│  │ Rust CLI (TUI) │ │ Python (PyBind11) │ │ Web Dashboard  │ │ Domain APIs│  │
│  │   (cli-rs)     │ │ + Torch Quantum   │ │ (React/WASM)   │ │ (Fin/Phys) │  │
│  └───────┬────────┘ └─────────┬─────────┘ └───────┬────────┘ └─────┬──────┘  │
└──────────┼────────────────────┼───────────────────┼────────────────┼─────────┘
           │                    │                   │                │
           │  gRPC / Protobuf   │                   │  gRPC-Web      │
           ▼                    ▼                   ▼                ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                            Go Orchestration Mesh                             │
│   ┌───────────────┐      ┌────────────────┐                                  │
│   │ Job Scheduler │─────▶│  Result Cache  │                                  │
│   └───────┬───────┘      └────────────────┘                                  │
│           │                                                                  │
│   ┌───────▼───────┐      ┌────────────────┐                                  │
│   │  Redis Queue  │─────▶│    Registry    │                                  │
│   └───────┬───────┘      └────────────────┘                                  │
└───────────┼──────────────────────────────────────────────────────────────────┘
            │
            │  POSIX Shared Memory (Zero-Copy IPC)
            ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                             C++20 Quantum Kernel                             │
│  ┌────────────────┐ ┌───────────────────┐ ┌───────────────────────────────┐  │
│  │   QuantumJIT   │ │ Q. Differentiator │ │ IQuantumBackend (Polymorphic) │  │
│  │ (O3 Optimizer) │ │  (Adjoint / PSR)  │ │ ┌────┐ ┌─────┐ ┌───┐ ┌───┐    │  │
│  └────────────────┘ └───────────────────┘ │ │CUDA│ │Metal│ │MPS│ │AVX│    │  │
│                                           │ └────┘ └─────┘ └───┘ └───┘    │  │
└───────────────────────────────────────────┴───────────────────────────────┴──┘
```
To bypass the severe serialization penalties of moving large state vectors between runtimes, QubitEngine relies on three mechanisms:

- **POSIX Shared Memory**: The Go sidecars, Python processes, and C++ backend map raw simulation arrays directly from OS paging memory via `shm_descriptor`.
- **Single-Precision SIMD**: Downgrading the state vector to `std::complex<float>` halves memory bandwidth saturation, directly accelerating full-vector broadcast operations (e.g., $H^{\otimes n}$) against the von Neumann bottleneck.
- **NumPy Anchoring**: `pybind11::buffer_info` directly exposes the C++ memory allocator lifecycle to Python, eliminating deep copies during iterative algorithms like VQE.
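As a rough illustration of the zero-copy path, the sketch below shares a single-precision state vector between two attachments of one shared-memory segment, using Python's `multiprocessing.shared_memory` in place of the raw POSIX `shm_open`/`mmap` calls. The segment name `qe_state` and the sizes are illustrative, not part of the actual API.

```python
import numpy as np
from multiprocessing import shared_memory

n_qubits = 4
dim = 2 ** n_qubits  # 16 amplitudes

# Producer: allocate the segment and write |0...0> as complex64
# (single precision, per the bandwidth note above: 8 bytes per amplitude).
shm = shared_memory.SharedMemory(create=True, size=dim * 8, name="qe_state")
state = np.ndarray(dim, dtype=np.complex64, buffer=shm.buf)
state[:] = 0
state[0] = 1.0

# Consumer: attach to the same segment by name. No serialization and no
# copy -- both ndarrays alias the same physical pages.
attached = shared_memory.SharedMemory(name="qe_state")
view = np.ndarray(dim, dtype=np.complex64, buffer=attached.buf)
amplitude = complex(view[0])  # reads the producer's write directly

# Release the ndarray views before closing, then unlink the segment.
del view, state
attached.close()
shm.close()
shm.unlink()
```

Both processes see the same pages, so a gate applied by the C++ kernel is immediately visible to Python without a marshal/unmarshal round trip.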
The `QuantumRegister` dynamically dispatches intermediate representations to the best-suited `IQuantumBackend` implementation based on topology and hardware availability:
- **CUDA**: Multi-GPU sharded execution utilizing NCCL for cluster-scale state vector distribution.
- **Apple Metal**: Asynchronous command queues (`MetalContext`) allowing concurrent CPU execution while GPU shaders crunch gate linear algebra.
- **Matrix Product State (MPS)**: SVD-truncated tensor network backend capable of simulating >50 qubits for weakly entangled states.
- **Stabilizer**: Highly optimized Clifford-pure simulation backend for error correction evaluation.
- **AVX2/CPU**: Thread-safe execution via OpenMP with fused SIMD intrinsics.
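To make the MPS bullet concrete, here is a minimal NumPy sketch of the SVD truncation it relies on: Schmidt-decompose a weakly entangled two-qubit state across its middle bond and discard Schmidt coefficients below a cutoff. The real backend operates on n-qubit tensor trains; the state, cutoff, and names here are illustrative only.

```python
import numpy as np

eps = 0.01  # small entangling amplitude
# |psi> proportional to |00> + eps*|11>, normalized
psi = np.array([1.0, 0.0, 0.0, eps], dtype=np.complex64)
psi /= np.linalg.norm(psi)

# Reshape into a 2x2 matrix over the (qubit 0 | qubit 1) bipartition
u, s, vh = np.linalg.svd(psi.reshape(2, 2))

# Truncate the bond: keep only Schmidt values above the cutoff
cutoff = 0.1
keep = s > cutoff
chi = int(keep.sum())  # truncated bond dimension (1 for this state)
approx = (u[:, keep] * s[keep]) @ vh[keep, :]

# Fidelity of the rank-chi approximation against the exact state
fidelity = abs(np.vdot(psi, approx.reshape(4))) ** 2
```

Because the discarded Schmidt weight scales with the entanglement across the cut, weakly entangled circuits keep `chi` small, which is what makes >50-qubit simulation tractable on this backend.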
The QuantumJIT compiler executes intermediate representation (`CircuitIR`) transformations on a background thread. The O3 optimization tier automatically applies:

- Adjacent inverse cancellation ($U U^\dagger = I$).
- Aggressive $2 \times 2$ and $4 \times 4$ unitary matrix fusions.
- Linear mapping reordering to optimize memory access patterns for specific hardware topologies.
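A toy version of the first two passes, assuming nothing about the real `CircuitIR`: fold a run of adjacent single-qubit gates into one $2 \times 2$ matrix, and drop the run entirely when it fuses to the identity.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=np.complex64) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)]).astype(np.complex64)

def fuse_single_qubit_run(gates, atol=1e-6):
    """Multiply a run of 2x2 unitaries into one matrix; drop the run
    if it fused to the identity (adjacent inverse cancellation)."""
    fused = np.eye(2, dtype=np.complex64)
    for g in gates:
        fused = g @ fused  # later gates multiply on the left
    if np.allclose(fused, np.eye(2), atol=atol):
        return []          # whole run optimized away
    return [fused]

# H then H cancels: the run disappears from the circuit
assert fuse_single_qubit_run([H, H]) == []

# H then T fuses into a single gate instead of two
fused = fuse_single_qubit_run([H, T])
assert len(fused) == 1
```

The production pass would additionally track qubit indices and fuse two-qubit ($4 \times 4$) blocks, but the cancellation/fusion principle is the same.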
QubitEngine natively supports integration with deep learning frameworks (e.g., PyTorch via `torch_quantum.py`) through two gradient calculation methods:

- **Adjoint Differentiation**: Optimal for deep variational circuits. Operates with $O(1)$ forward passes and $O(L)$ backward passes by unwinding the recorded circuit tape in reverse.
- **Parameter-Shift Rule (PSR)**: Provides exact analytical gradients for hardware backends without relying on finite-difference approximations, natively distributed across MPI ranks.
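The PSR claim above can be checked numerically with a minimal sketch (pure NumPy, no QubitEngine APIs): for a single $RY(\theta)$ rotation measured in $Z$, the shifted-evaluation gradient reproduces the analytic derivative exactly, with no step-size error.

```python
import numpy as np

Z = np.diag([1.0, -1.0])

def ry(theta):
    """Single-qubit Y rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def expval_z(theta):
    """f(theta) = <0| RY(theta)^dag Z RY(theta) |0> = cos(theta)."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    return float(psi.conj() @ Z @ psi)

def psr_grad(theta):
    """Parameter-shift gradient: exact, not a finite difference."""
    return (expval_z(theta + np.pi / 2) - expval_z(theta - np.pi / 2)) / 2

theta = 0.7
assert np.isclose(expval_z(theta), np.cos(theta))
assert np.isclose(psr_grad(theta), -np.sin(theta))  # analytic derivative
```

The shifts are full circuit evaluations, which is why PSR also works on physical hardware backends where backpropagation through the state is impossible.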
QubitEngine is designed for Kubernetes deployments using standard Horizontal Pod Autoscaling (HPA) governed by custom metrics.
The Go Scheduler exposes a `:2112/metrics` endpoint exporting Redis queue depth to Prometheus. The K8s HPA scales the bare-metal backend pods (`engine-deployment.yaml`) based on active simulation congestion, isolating the lightweight application layer from the heavy compute nodes.

For state vectors exceeding single-node RAM limitations, deployment via `mpi-cluster.yaml` provides distributed scaling.
```bash
# Deploy full stack to Kubernetes
helm install qubit-engine ./deploy/helm/qubit-engine -f values.yaml

# Local development (Docker Compose)
docker-compose up --build
```
The Python SDK acts as a direct wrapper around the C++ bindings with built-in adapters for Qiskit portability.
```bash
pip install qubit_engine
```
```python
from qubit_engine import QuantumRegister, CudaBackend
from qubit_engine.adapters import QiskitAdapter
from qiskit import QuantumCircuit

# Standard Qiskit definition
qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)

# Zero-copy dispatch to QubitEngine CUDA backend
backend = CudaBackend()
reg = QuantumRegister(2, backend)

# JIT compilation and execution
adapter = QiskitAdapter()
qubit_engine_circuit = adapter.convert(qc)
reg.execute(qubit_engine_circuit)
print(reg.state_vector())
```

Distributed under the MIT License. See `LICENSE` for more information.