Hybrid Mamba-2 + Transformer 2.94B LLM (Nemotron-H style) — Korean 3B model pretrained from scratch on 7× NVIDIA B200 GPUs with SFT + DPO alignment
Updated Mar 26, 2026 · Python
Hybrid SSM-Attention language model on Apple Silicon with MLX — interleaving Mamba-2 and Transformer for efficient inference
A 390M-parameter Mamba2 + Differential Attention hybrid language model
VerDee: Vertical Deep Network on Mamba-2, with staged LoRA (shallow/mid/deep), early-exit routing, and domain experts. Piloted on a 370M-parameter model; fits on a 16 GB GPU. Research/experiment repo.
GLACIER: Mamba with infinite memory. This project integrates the Mamba SSM with ICE-Lite, a virtual memory engine, to solve context rot. By adding persistent, time-aware memory, GLACIER gives Mamba the long-term recall of a Transformer while retaining its $O(N)$ speed. Apache 2.0 licensed, by Dopove.
Build Your Own Mamba — From Math to Metal
A simple, minimalistic, and explainable JAX implementation of Mamba 2 & Mamba 3