LFM-Coder: High-Performance RLVR for Small Language Models

Fine-tune LLMs to enhance coding capabilities using Reinforcement Learning from Verifiable Rewards (RLVR) with Group Relative Policy Optimization (GRPO). Includes a blazing-fast Python sandbox for safely running model-generated code.

Results

A model trained from this repository using only 1,000 examples from the OpenCoder dataset achieved a 49.1% improvement in coding performance on the MBPP benchmark while maintaining general capabilities.

✨ Try out the trained model, explore the metrics during training, or analyze the training artifacts.

Why LFM-Coder?

Small language models (SLMs) are the key to fast, local coding agents, but they often struggle with complex programming tasks. Liquid AI's LFM2.5-1.2B-Instruct is exceptionally fast and efficient, but not optimized for coding out of the box.

LFM-Coder bridges this gap using RLVR. It trains lightweight LoRA adapters (~22M parameters) with Hugging Face TRL and gives the model a sandboxed execution environment, so it learns from verifiable feedback: whether its generated code actually runs and passes tests. This approach significantly enhances coding performance while preserving the model's small footprint and general capabilities.
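
For readers new to GRPO, the core idea is that each completion's reward is compared against the other completions sampled for the same prompt. The sketch below illustrates that group-relative advantage, assuming binary pass/fail rewards. The function name, the epsilon value, and the use of the population standard deviation are illustrative assumptions, not taken from this repository or from TRL; implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group (assumption)
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two passed the tests, two failed.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)
```

Completions that beat the group average get a positive advantage and the rest get a negative one. When every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.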

Key Innovations and Optimizations

This repository goes beyond basic fine-tuning by implementing a production-grade RLVR environment and training pipeline:

🚀 High-Performance Sandbox

Dual-Engine Architecture: Automatically routes each snippet to either a fast Rust-based Python interpreter (Monty) or a full-featured Docker/Podman container.
Massive Concurrency: Threaded execution across all CPU cores for both engines, enabling high-throughput reward computation essential for GRPO.
Smart Dependency Management: Packages are installed dynamically based on code requirements. Local caching means subsequent runs skip reinstallation and can run without network access.
Enterprise-Grade Isolation: Configurable resource guards (CPU/memory), execution timeouts, and network isolation to ensure secure execution of model-generated code.
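
To make the concurrency and timeout ideas above concrete, here is a minimal standard-library sketch. It is not the repository's implementation: `run_snippet` and `run_batch` are hypothetical names, and a plain subprocess offers only process separation and a wall-clock timeout, not the CPU/memory guards or network isolation described above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def run_snippet(code: str, timeout_s: float) -> dict:
    """Run one snippet in a separate Python process with a wall-clock timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores env vars and user site)
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return {"ok": proc.returncode == 0, "stdout": proc.stdout, "timed_out": False}
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running
        return {"ok": False, "stdout": "", "timed_out": True}


def run_batch(snippets: list[str], timeout_s: float = 1.0,
              max_workers: Optional[int] = None) -> list[dict]:
    """Run snippets concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda code: run_snippet(code, timeout_s), snippets))


results = run_batch(["print(1 + 1)", "raise ValueError('boom')", "while True: pass"])
for r in results:
    print(r)
```

Threads are sufficient here because each worker spends its time waiting on a child process rather than executing Python bytecode itself.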

⚡ Training and Evaluation Efficiency

Asynchronous Pipelining: Overlaps GPU completion generation with CPU-based code verification to maximize hardware utilization and minimize idle time.
Optimized RLVR Pipeline: Leverages QLoRA (4-bit) and Liger kernels to enable advanced GRPO training on consumer hardware (8GB VRAM).
Fault-Tolerant Workflows: Robust state management with automatic resumption for both training and evaluation cycles.
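
The asynchronous pipelining idea can be sketched as a bounded producer/consumer queue: one thread produces batches of completions while another verifies the previous batch. This is an illustration only. `fake_generate` and `fake_verify` are stand-ins invented for this example, and how the repository actually schedules the overlap is not shown here.

```python
import queue
import threading


def fake_generate(step: int) -> list[str]:
    """Stand-in for GPU generation: returns a batch of code completions."""
    return [f"result = {step} + {i}" for i in range(4)]


def fake_verify(code: str) -> float:
    """Stand-in for sandbox verification: reward 1.0 if the snippet executes.

    Uses in-process exec for brevity; real model output belongs in a sandbox.
    """
    try:
        exec(code, {})
        return 1.0
    except Exception:
        return 0.0


def pipeline(num_steps: int) -> list[list[float]]:
    batches: queue.Queue = queue.Queue(maxsize=2)  # bounded so generation cannot run far ahead
    rewards: list[list[float]] = []

    def producer() -> None:
        for step in range(num_steps):
            batches.put(fake_generate(step))
        batches.put(None)  # sentinel: no more batches

    def consumer() -> None:
        while (batch := batches.get()) is not None:
            rewards.append([fake_verify(code) for code in batch])

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rewards


all_rewards = pipeline(num_steps=3)
print(all_rewards)
```

The bounded queue is the design choice that matters: it lets the two stages overlap while applying backpressure if verification falls behind generation.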

📊 Data Quality and Integrity

Benchmark Sanitization: Identifies and repairs incorrect test cases in standard benchmarks (HumanEvalPlus/MBPPPlus) to ensure rigorous evaluation.
Automated Validation: Verifies all training examples against provided solutions to guarantee data quality before RLVR begins.
Granular Metrics: Heuristic-driven extraction that calculates per-test-case pass rates and provides detailed logs for model weakness analysis.
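
A per-test-case pass rate, as opposed to an all-or-nothing score, can be sketched as follows. The function name is hypothetical and the tests are MBPP-style assert statements. The example uses in-process `exec` purely for illustration; untrusted model output should be run in the sandbox instead.

```python
def per_test_pass_rate(solution: str, tests: list[str]) -> tuple[float, list[bool]]:
    """Run each assert-style test separately against the solution code."""
    outcomes = []
    for test in tests:
        namespace: dict = {}  # fresh namespace so tests cannot affect each other
        try:
            exec(solution, namespace)  # define the candidate function(s)
            exec(test, namespace)      # a failing assert raises AssertionError
            outcomes.append(True)
        except Exception:
            outcomes.append(False)
    rate = sum(outcomes) / len(tests) if tests else 0.0
    return rate, outcomes


# A deliberately buggy solution: it forgets to negate negative inputs.
solution = "def absolute(x):\n    return x\n"
tests = [
    "assert absolute(3) == 3",
    "assert absolute(0) == 0",
    "assert absolute(-2) == 2",
    "assert absolute(-7) == 7",
]
rate, outcomes = per_test_pass_rate(solution, tests)
print(rate, outcomes)
```

A fractional score like this gives partial credit to nearly-correct solutions, and the per-test outcomes show which kinds of inputs the model gets wrong.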

Getting Started: Training

1. Requirements

Hardware: Single GPU with 8GB VRAM (e.g., RTX 4060).
Tooling: uv installed.

2. Setup

git clone https://github.com/rparkr/lfm-coder.git && cd lfm-coder
export HF_TOKEN="your-hf-token"

3. Configuration

Update training_config.toml with your model_id and output_dir.

4. Run Training

# Dry run to verify configuration
uv run lfm-coder --dry-run

# Start full training
uv run lfm-coder

Using the Python Sandbox

You can use the high-performance sandbox in your own projects for safe execution of LLM-generated code.

Installation

uv add lfm-coder  # or pip install lfm-coder

Basic Usage

The Sandbox class automatically routes code between Monty (fast) and Docker (full support).

from lfm_coder.sandbox import Sandbox

sandbox = Sandbox()

# Batch execution (parallel)
results = sandbox.run(["1+1", "import math; math.sqrt(16)", "print('Hello')"])
for r in results:
    print(f"Stdout: {r.stdout} | Result: {r.result}")

Advanced: Automatic Fallback

code = """
import httpx  # Requires Docker fallback
r = httpx.get('https://example.com')
print(r.status_code)
"""
result = sandbox.run(code)

Note

The Docker sandbox requires either Podman (recommended) or Docker to be installed and running.

Project Roadmap and Stats

🗺️ Status

Dual Sandboxes: MontySandbox + DockerSandbox with auto-routing.
Data Pipeline: Automated sampling, verification, and repair of benchmarks.
RLVR Training: GRPO integration with TRL and GPU optimizations.
Evaluation: Scoring module with GPU/CPU pipelining.
Ollama support: Fix chat template in fine-tuned GGUF model for multi-turn chat.

📊 Training Performance Metrics

Metric	Monty Sandbox (Rust)	Docker Sandbox (Container)
Execution Count	18,556 (77.3%)	5,444 (22.7%)
Avg. Execution Time	1.01 ms	2,577 ms
Median Execution Time	0.4 ms	2,240 ms
Success Rate	69.8%	35.8%
Throughput	~1,000 exec/sec	~0.4 exec/sec

Monty execution is roughly 2,500x faster than the Docker fallback on average (about 5,600x at the median), providing the throughput required for efficient RLVR training.
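
The speedup can be derived directly from the timing figures in the table above. A quick check, using only those numbers:

```python
# Figures copied from the table above (milliseconds per execution).
monty = {"mean_ms": 1.01, "median_ms": 0.4}
docker = {"mean_ms": 2577.0, "median_ms": 2240.0}

mean_speedup = docker["mean_ms"] / monty["mean_ms"]
median_speedup = docker["median_ms"] / monty["median_ms"]

# Single-worker throughput implied by the mean time (executions per second).
monty_throughput = 1000.0 / monty["mean_ms"]
docker_throughput = 1000.0 / docker["mean_ms"]

print(f"mean speedup:   {mean_speedup:,.0f}x")
print(f"median speedup: {median_speedup:,.0f}x")
print(f"throughput:     {monty_throughput:,.0f} vs {docker_throughput:.2f} exec/sec")
```

The implied throughput values line up with the table's ~1,000 and ~0.4 exec/sec rows, which suggests those figures are per-worker rates derived from the mean execution time.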

Acknowledgments

pydantic-monty for the lightning-fast Python sandbox.
TRL and trackio for the RL framework and monitoring.
Evalplus for the benchmark datasets.
OpenCoder-LLM for training data.
Liquid AI for the LFM2.5 model and GRPO guidance.

License

Code: MIT license.
Model Weights: LFM license (Commercial restriction for >$10M revenue orgs).
