Home

QubitEngine: Distributed Quantum Execution Framework

QubitEngine is a cloud-native, polyglot quantum simulation and execution framework built for latency-critical research, quantum machine learning (QML), and computational finance. It mitigates the von Neumann bottleneck inherent in large state-vector simulations through a decoupled five-layer architecture, zero-copy IPC, and dynamic hardware dispatch.

📋 System Architecture

QubitEngine operates as a distributed service mesh rather than a monolithic script. Coordination and scheduling are handled by a Go-based orchestration layer, while the execution kernel runs heavily optimized C++20.

Polyglot Execution Mesh

┌─────────────────────────────────────────────────────────────────────────────┐
│                            Client / Application Layer                       │
│  ┌────────────────┐ ┌──────────────────┐ ┌────────────────┐ ┌────────────┐  │
│  │ Rust CLI (TUI) │ │ Python (PyBind11)│ │ Web Dashboard  │ │ Domain APIs│  │
│  │   (cli-rs)     │ │ + Torch Quantum  │ │  (React/WASM)  │ │ (Fin/Phys) │  │
│  └───────┬────────┘ └────────┬─────────┘ └───────┬────────┘ └──────┬─────┘  │
└──────────┼───────────────────┼───────────────────┼─────────────────┼────────┘
           │                   │                   │                 │
           │  gRPC / Protobuf  │                   │ gRPC-Web        │
           ▼                   ▼                   ▼                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                             Go Orchestration Mesh                           │
│                     ┌───────────────┐   ┌────────────────┐                  │
│                     │ Job Scheduler │◄─►│ Result Cache   │                  │
│                     └───────┬───────┘   └────────────────┘                  │
│                             │                                               │
│                     ┌───────▼───────┐   ┌────────────────┐                  │
│                     │ Redis Queue   │◄─►│   Registry     │                  │
│                     └───────┬───────┘   └────────────────┘                  │
└─────────────────────────────┼───────────────────────────────────────────────┘
                              │
                              │  POSIX Shared Memory (Zero-Copy IPC)
                              ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           C++20 Quantum Kernel                              │
│  ┌────────────────┐ ┌──────────────────┐ ┌───────────────────────────────┐  │
│  │  QuantumJIT    │ │ Q. Differentiator│ │  IQuantumBackend (Polymorphic)│  │
│  │ (O3 Optimizer) │ │ (Adjoint / PSR)  │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌───┐ │  │
│  └────────────────┘ └──────────────────┘ │ │CUDA │ │Metal│ │ MPS │ │AVX│ │  │
│                                          │ └─────┘ └─────┘ └─────┘ └───┘ │  │
│                                          └───────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
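The request path in the diagram (scheduler, result cache, queue, kernel worker) can be illustrated with a small in-process Python sketch. It is not the Go implementation. queue.Queue stands in for the Redis queue, a dict stands in for the result cache, and the class and method names (Scheduler, submit, drain, result) are hypothetical. Jobs are keyed by a SHA-256 hash of the canonicalized circuit, so a resubmitted circuit is served from the cache instead of being re-executed.

```python
import hashlib
import json
import queue
import threading


class Scheduler:
    """Hypothetical in-process stand-in for the Go scheduler, result cache, and queue."""

    def __init__(self, execute):
        self._execute = execute        # callable: circuit dict -> result (plays the C++ kernel)
        self._queue = queue.Queue()    # stands in for the Redis job queue
        self._cache = {}               # stands in for the result cache
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    @staticmethod
    def job_key(circuit):
        # Canonical JSON (sorted keys) so equivalent dicts hash to the same key.
        payload = json.dumps(circuit, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def submit(self, circuit):
        """Return (key, cache_hit). On a miss, the job is enqueued for the worker."""
        key = self.job_key(circuit)
        with self._lock:
            if key in self._cache:
                return key, True
        self._queue.put((key, circuit))
        return key, False

    def _worker(self):
        # Single consumer: pull jobs, execute them, and publish results to the cache.
        while True:
            key, circuit = self._queue.get()
            try:
                result = self._execute(circuit)
                with self._lock:
                    self._cache[key] = result
            finally:
                self._queue.task_done()

    def drain(self):
        """Block until every queued job has been processed."""
        self._queue.join()

    def result(self, key):
        with self._lock:
            return self._cache.get(key)
```

After drain(), submitting the same circuit again (even with its keys in a different order) returns cache_hit=True without calling execute a second time. Two identical submissions that arrive before the first finishes would both run; a production scheduler would also deduplicate in-flight jobs.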

⚡ Core Subsystems

1. Memory Wall Mitigation & Zero-Copy IPC

To bypass the severe serialization penalties of moving $2^n$ state-vector amplitudes across process boundaries, QubitEngine implements zero-copy inter-process communication:

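POSIX Shared Memory: The Go sidecars, Python processes, and C++ backend all map the same shared-memory segment (via shm_descriptor) into their address spaces, so raw simulation arrays are accessed in place rather than serialized between processes.
Single-Precision SIMD: Storing amplitudes as std::complex<float> (8 bytes) rather than std::complex<double> (16 bytes) halves both the state vector's memory footprint and the bandwidth consumed per sweep. This directly accelerates bandwidth-bound full-vector operations such as $H^{\otimes n}$, at the cost of reduced numerical precision.
NumPy Anchoring: pybind11::buffer_info exposes the C++-owned state vector to Python through the buffer protocol, so NumPy arrays view that memory directly while holding a reference that keeps the C++ allocation alive. This eliminates deep copies during iterative algorithms such as VQE.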
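Shared memory and single precision can both be illustrated with a minimal Python sketch that uses only the standard library's multiprocessing.shared_memory (backed by shm_open on POSIX systems) and NumPy. It is not QubitEngine's shm_descriptor mechanism, and the function name is hypothetical. For brevity both handles live in one process here. In practice the peer would attach by name from a separate process.

```python
import numpy as np
from multiprocessing import shared_memory


def shared_state_roundtrip(n_qubits=10):
    """Place a complex64 state vector in POSIX shared memory and attach a second view.

    Both handles map the same pages, so a write through one is visible through
    the other without any copy or serialization.
    """
    dim = 2 ** n_qubits
    itemsize = np.dtype(np.complex64).itemsize  # 8 bytes: two float32 components
    owner = shared_memory.SharedMemory(create=True, size=dim * itemsize)
    peer = shared_memory.SharedMemory(name=owner.name)  # what another process would do
    try:
        state = np.ndarray((dim,), dtype=np.complex64, buffer=owner.buf)
        view = np.ndarray((dim,), dtype=np.complex64, buffer=peer.buf)
        state[:] = 0
        state[0] = 1.0   # initialize |0...0> through the owner's mapping
        view[1] = 0.5j   # write through the peer's mapping
        observed = (complex(view[0]), complex(state[1]), state.nbytes)
        del state, view  # release buffer exports before closing the segments
    finally:
        peer.close()
        owner.close()
        owner.unlink()
    return observed
```

For 10 qubits the vector occupies 1024 × 8 = 8192 bytes. The same vector in complex128 would take twice that.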

2. Hardware-Agnostic Backend Polymorphism

QuantumRegister dispatches the intermediate representation at runtime to the best-suited IQuantumBackend implementation, based on hardware topology and availability:

CUDA: Multi-GPU sharded execution utilizing NCCL for cluster-scale state vector distribution.
Apple Metal: Asynchronous command queues (MetalContext) let the CPU keep working while GPU shaders apply gate matrices to the state vector.
Matrix Product State (MPS): SVD-truncated tensor network backend capable of simulating >50 qubits for weakly entangled states.
Stabilizer: Highly optimized backend for Clifford-only circuits, used to evaluate error-correction schemes.
AVX2/CPU: Thread-safe execution via OpenMP with fused SIMD intrinsics.
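The dispatch decision can be sketched as a plain selection function. This hypothetical Python illustration is not the QuantumRegister logic. The profile fields, the 30-qubit dense limit, and the order of the checks are assumptions chosen only to mirror the backend descriptions above.

```python
from dataclasses import dataclass, field

# Standard Clifford gate names; the string vocabulary itself is an assumption.
CLIFFORD_GATES = frozenset({"i", "x", "y", "z", "h", "s", "sdg", "cx", "cz", "swap"})


@dataclass
class CircuitProfile:
    n_qubits: int
    gates: list = field(default_factory=list)  # gate names in program order
    weakly_entangled: bool = False             # e.g. an MPS bond-dimension estimate stays small


def select_backend(profile, has_cuda=False, has_metal=False, dense_qubit_limit=30):
    """Pick a backend name using illustrative heuristics (not QubitEngine's actual policy)."""
    if set(profile.gates) <= CLIFFORD_GATES:
        return "stabilizer"          # Clifford-only circuits simulate in polynomial time
    if profile.n_qubits > dense_qubit_limit:
        if profile.weakly_entangled:
            return "mps"             # tensor network stays compact at low entanglement
        if has_cuda:
            return "cuda"            # shard the dense vector across GPUs
        raise ValueError("state vector too large for a single dense CPU/Metal backend")
    if has_cuda:
        return "cuda"
    if has_metal:
        return "metal"
    return "cpu"                     # AVX2/OpenMP fallback
```

Checking the Clifford case first reflects the fact that a stabilizer simulation is cheaper than any dense representation whenever it applies.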

3. JIT Compiler (`QuantumJIT`)

QuantumJIT runs intermediate representation (CircuitIR) transformations on a background thread. The O3 optimization tier automatically applies:

Adjacent inverse cancellation ($U U^\dagger = I$).
Aggressive fusion of adjacent gates into single $2 \times 2$ (one-qubit) and $4 \times 4$ (two-qubit) unitaries.
Reordering of the linear qubit-to-memory mapping to improve memory access patterns on the target hardware topology.
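The first two O3 rewrites can be sketched for single-qubit gates as a NumPy peephole pass. This simplified illustration is not QuantumJIT's CircuitIR pass. The (qubit, matrix) tuple representation and the function name are hypothetical, only list-adjacent gates are examined, and $4 \times 4$ two-qubit fusion is omitted.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])


def fuse_single_qubit(ops, atol=1e-12):
    """Peephole pass over (qubit, 2x2 unitary) pairs in program order.

    A gate acting on the same qubit as the current tail of the output list is
    multiplied into it; a product equal to the identity (U U^dagger = I) is
    dropped, which may expose a new fusible tail for the next gate.
    """
    out = []
    for qubit, u in ops:
        if out and out[-1][0] == qubit:
            fused = u @ out[-1][1]  # later gate acts after the earlier one
            if np.allclose(fused, np.eye(2), atol=atol):
                out.pop()           # adjacent inverses cancel
            else:
                out[-1] = (qubit, fused)
        else:
            out.append((qubit, u))
    return out
```

For example, [T on q0, H on q1, H on q1, T† on q0] collapses to an empty list. The H pair cancels first, which leaves T and T† adjacent.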

4. Differentiable Quantum Computing (`QuantumDifferentiator`)

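Integrates natively with deep-learning frameworks (e.g., PyTorch via torch_quantum.py) and offers two gradient methods:

Adjoint Differentiation: Well suited to deep variational circuits on simulators, since it needs access to the full state vector. It performs a single forward pass and a single reverse sweep that unwinds the recorded circuit tape. That amounts to $O(L)$ gate applications for an $L$-gate circuit, regardless of the number of parameters.
Parameter-Shift Rule (PSR): Computes exact analytic gradients (up to shot noise on hardware) from two shifted circuit evaluations per parameter, without finite-difference approximations. These evaluations are distributed natively across MPI ranks, making PSR the method used for hardware backends.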
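Both methods can be checked against each other on a toy one-qubit circuit. This NumPy sketch is not the QuantumDifferentiator API, and the function names are hypothetical. The circuit is a list of Pauli rotations $R_P(\theta) = e^{-i\theta P/2}$, the gate family for which the $\pm\pi/2$ shift rule is exact, and the observable defaults to $Z$. The adjoint routine follows the reverse sweep described above: it undoes one gate at a time and accumulates $2\,\mathrm{Re}\langle\lambda|\partial_\theta U_k|\psi\rangle$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rot(pauli, theta):
    """R_P(theta) = exp(-i*theta*P/2) = cos(theta/2) I - i sin(theta/2) P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * pauli


def forward(circuit, thetas):
    """Apply the rotations in order to |0>."""
    psi = np.array([1, 0], dtype=complex)
    for pauli, theta in zip(circuit, thetas):
        psi = rot(pauli, theta) @ psi
    return psi


def expval(circuit, thetas, obs=Z):
    psi = forward(circuit, thetas)
    return float(np.real(np.vdot(psi, obs @ psi)))


def grad_param_shift(circuit, thetas, obs=Z):
    """Two shifted evaluations per parameter: (f(t + pi/2) - f(t - pi/2)) / 2."""
    grads = np.zeros(len(thetas))
    for k in range(len(thetas)):
        plus, minus = list(thetas), list(thetas)
        plus[k] += np.pi / 2
        minus[k] -= np.pi / 2
        grads[k] = (expval(circuit, plus, obs) - expval(circuit, minus, obs)) / 2
    return grads


def grad_adjoint(circuit, thetas, obs=Z):
    """One forward pass, then one reverse sweep over the gate tape."""
    psi = forward(circuit, thetas)
    lam = obs @ psi                          # |lambda> = O|psi> (O is Hermitian)
    grads = np.zeros(len(thetas))
    for k in reversed(range(len(thetas))):
        u = rot(circuit[k], thetas[k])
        psi = u.conj().T @ psi               # undo gate k: state before gate k
        mu = -0.5j * circuit[k] @ (u @ psi)  # dU_k/dtheta_k = -i (P_k/2) U_k
        grads[k] = 2 * np.real(np.vdot(lam, mu))
        lam = u.conj().T @ lam               # move lambda back past gate k
    return grads
```

For a single $R_Y(\theta)$ on $|0\rangle$, $\langle Z\rangle = \cos\theta$, so both routines should return $-\sin\theta$. The parameter-shift version costs $2P$ full circuit evaluations for $P$ parameters, while the adjoint version walks the tape once.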

🌐 Cloud-Native Deployment

QubitEngine is designed for Kubernetes deployments using standard Horizontal Pod Autoscaling (HPA) governed by custom metrics.

Autoscaling Mesh

The Go scheduler exposes a Prometheus endpoint at :2112/metrics that publishes Redis queue depth. The Kubernetes HPA scales the compute-heavy backend pods (engine-deployment.yaml) on this congestion metric, so compute capacity tracks the simulation backlog while the lightweight application layer stays isolated from the heavy compute nodes.
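Kubernetes documents the HPA's core rule as desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)], with no action when the ratio falls within a tolerance (0.1 by default). The following minimal Python sketch of that arithmetic assumes the Redis queue depth is consumed as a per-pod average. The per-pod target and the replica bounds are hypothetical values, and the sketch ignores stabilization windows and scaling policies.

```python
import math


def hpa_desired_replicas(current_replicas, queue_depth, target_per_pod,
                         min_replicas=1, max_replicas=20, tolerance=0.1):
    """Apply the HPA scaling rule to a queue-depth metric averaged per pod.

    desired = ceil(current * currentMetric / targetMetric), skipped when the
    ratio is within the tolerance band, then clamped to [min, max].
    """
    if current_replicas < 1:
        raise ValueError("sketch assumes at least one running replica")
    per_pod = queue_depth / current_replicas
    ratio = per_pod / target_per_pod
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no scaling action
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With 4 pods, 100 queued jobs, and a target of 10 jobs per pod, the ratio is 2.5 and the sketch asks for 10 pods. With 42 queued jobs the ratio is 1.05, inside the tolerance band, so the replica count stays at 4.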

Multi-Node Cluster Setup (MPI)

For state vectors that exceed a single node's RAM, deploying with mpi-cluster.yaml distributes the simulation across multiple nodes.
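Whether a run needs the MPI deployment comes down to simple arithmetic. A dense $n$-qubit state vector holds $2^n$ amplitudes, so at 8 bytes per std::complex<float> amplitude a 34-qubit state occupies $2^{34} \times 8 = 2^{37}$ bytes (128 GiB). The Python sketch below estimates this size and assigns amplitudes to ranks by their most significant index bits. That partitioning is a common scheme for distributed state vectors and is assumed here, not taken from mpi-cluster.yaml. The helper names are hypothetical, and the estimate ignores MPI communication buffers.

```python
def statevector_bytes(n_qubits, bytes_per_amp=8):
    """Dense state-vector size: 2^n amplitudes (8 B for complex64, 16 B for complex128)."""
    return (2 ** n_qubits) * bytes_per_amp


def min_ranks(n_qubits, ram_per_rank, bytes_per_amp=8):
    """Smallest power-of-two rank count whose slice fits in ram_per_rank bytes."""
    total = statevector_bytes(n_qubits, bytes_per_amp)
    ranks = 1
    while total // ranks > ram_per_rank:
        ranks *= 2
    return ranks


def owner_rank(index, n_qubits, n_ranks):
    """With 2^k ranks, the top k bits of a basis-state index select its owning rank."""
    if n_ranks < 1 or n_ranks & (n_ranks - 1):
        raise ValueError("n_ranks must be a power of two")
    k = n_ranks.bit_length() - 1
    return index >> (n_qubits - k)
```

Splitting on the high bits keeps gates on the low-order qubits entirely rank-local. Gates on the top $k$ qubits require amplitude exchange between ranks.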

# Deploy full stack to Kubernetes
helm install qubit-engine ./deploy/helm/qubit-engine -f values.yaml

# Local development (Docker Compose)
docker-compose up --build

🚀 Quick Start (Python SDK)

The Python SDK acts as a direct wrapper around the C++ bindings with built-in adapters for Qiskit portability.

pip install qubit_engine

from qubit_engine import QuantumRegister, CudaBackend
from qubit_engine.adapters import QiskitAdapter
from qiskit import QuantumCircuit

# Standard Qiskit definition
qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)

# Zero-copy dispatch to QubitEngine CUDA backend
backend = CudaBackend()
reg = QuantumRegister(2, backend)

# JIT compilation and execution
adapter = QiskitAdapter()
qubit_engine_circuit = adapter.convert(qc)
reg.execute(qubit_engine_circuit)

print(reg.state_vector())
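To sanity-check the printed result, the expected Bell state can be computed independently with NumPy. This is a reference calculation, not part of the SDK. It assumes Qiskit's little-endian convention (qubit 0 is the least significant bit of the basis index), although for this circuit the answer $(|00\rangle + |11\rangle)/\sqrt{2}$ is the same under either ordering.

```python
import numpy as np

# Reference matrices in Qiskit's little-endian ordering: basis index = q0 + 2*q1.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
H_q0 = np.kron(I2, H)               # kron puts q1 on the left (more significant)
CX_q0_q1 = np.array([[1, 0, 0, 0],  # control q0, target q1: swaps |01> and |11>
                     [0, 0, 0, 1],  # (basis index 1 <-> index 3)
                     [0, 0, 1, 0],
                     [0, 1, 0, 0]])

psi = np.zeros(4, dtype=complex)
psi[0] = 1.0                        # |00>
expected = CX_q0_q1 @ (H_q0 @ psi)  # (|00> + |11>) / sqrt(2)
```

If state_vector() returns a dense NumPy-compatible array in that ordering (an assumption, since its return type is not specified here), np.allclose(reg.state_vector(), expected) should hold.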

📜 License

Distributed under the MIT License. See LICENSE for more information.
