rparkr/lfm-coder


LFM-Coder: High-Performance RLVR for Small Language Models

Python 3.13+ | License: MIT | uv | Ruff

Fine-tune LLMs to enhance coding capabilities using Reinforcement Learning from Verifiable Rewards (RLVR) with Group Relative Policy Optimization (GRPO). Includes a blazing-fast Python sandbox for safely running model-generated code.

Results

A model trained with this repository using only 1,000 examples from the OpenCoder dataset achieved a 49.1% improvement in coding performance on the MBPP benchmark while maintaining its general capabilities:

[Figure: benchmark results showing the change in model performance after fine-tuning]

✨ Try out the trained model, explore the metrics during training, or analyze the training artifacts.

Why LFM-Coder?

Small language models (SLMs) are the key to fast, local coding agents, but they often struggle with complex programming tasks. Liquid AI's LFM2.5-1.2B-Instruct is exceptionally fast and efficient, but not optimized for coding out of the box.

LFM-Coder bridges this gap using RLVR. It trains lightweight LoRA adapters (~22M parameters) with Hugging Face TRL, and the model learns from real-time, verifiable feedback: its generated code is executed in a high-fidelity sandbox and checked against test cases. This approach significantly enhances coding performance while preserving the model's small footprint and general capabilities.
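To make the GRPO idea concrete, here is a minimal sketch of how verifiable rewards turn into group-relative advantages in the standard GRPO formulation: several completions are sampled for the same prompt, each is scored by running tests, and each reward is normalized against its own group instead of a learned critic. The reward function `verifiable_reward` (fraction of tests passed) is a hypothetical choice for illustration; the repository's actual reward design and TRL settings (e.g., reward scaling) may differ.

```python
from statistics import mean, pstdev


def verifiable_reward(tests_passed: int, tests_total: int) -> float:
    """Hypothetical reward: fraction of unit tests a completion passes (0.0 to 1.0)."""
    return tests_passed / tests_total if tests_total else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    All rewards come from completions sampled for the SAME prompt, so the
    group mean acts as the baseline and no value function (critic) is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, each scored by running 4 unit tests.
rewards = [verifiable_reward(passed, 4) for passed in (4, 2, 0, 2)]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 2) for a in advantages])
```

Completions that beat their siblings get a positive advantage and are reinforced; if every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.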

Key Innovations and Optimizations

This repository goes beyond basic fine-tuning by implementing a production-grade RLVR environment and training pipeline:

🚀 High-Performance Sandbox

  • Dual-Engine Architecture: Routes code to a blazing-fast, Rust-based Python interpreter (Monty) and automatically falls back to full-featured Docker/Podman containers when Monty cannot run it.
  • Massive Concurrency: Threaded execution across all CPU cores for both engines, enabling high-throughput reward computation essential for GRPO.
  • Smart Dependency Management: Packages are installed dynamically based on code requirements. Local caching makes subsequent runs load almost instantly and lets them run without network access.
  • Enterprise-Grade Isolation: Configurable resource guards (CPU/memory), execution timeouts, and network isolation to ensure secure execution of model-generated code.

⚑ Training and Evaluation Efficiency

  • Asynchronous Pipelining: Overlaps GPU completion generation with CPU-based code verification to maximize hardware utilization and minimize idle time.
  • Optimized RLVR Pipeline: Leverages QLoRA (4-bit) and Liger kernels to enable advanced GRPO training on consumer hardware (8GB VRAM).
  • Fault-Tolerant Workflows: Robust state management with automatic resumption for both training and evaluation cycles.
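The asynchronous pipelining idea can be sketched as a producer/consumer overlap: while batch i is verified on CPU worker threads, batch i+1 is generated. In this minimal sketch, `generate_batch` and `verify_batch` are hypothetical stand-ins (a real generation call would run on the GPU and release the GIL); how the repository schedules this overlap relative to policy updates is not described here.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def generate_batch(step: int) -> list[str]:
    """Stand-in for GPU generation (hypothetical): returns code completions."""
    return [f"x = {step} + {i}" for i in range(4)]


def verify_batch(completions: list[str]) -> list[float]:
    """Stand-in for CPU verification (hypothetical): reward 1.0 if the code compiles."""
    rewards = []
    for code in completions:
        try:
            compile(code, "<completion>", "exec")
            rewards.append(1.0)
        except SyntaxError:
            rewards.append(0.0)
    return rewards


def pipelined_rewards(num_steps: int) -> list[list[float]]:
    """Verify batch i on worker threads while batch i+1 is being generated."""
    results: list[list[float]] = []
    with ThreadPoolExecutor() as pool:
        pending: Future | None = None
        for step in range(num_steps):
            batch = generate_batch(step)  # generation overlaps previous verification
            if pending is not None:
                results.append(pending.result())  # collect the previous batch's rewards
            pending = pool.submit(verify_batch, batch)
        if pending is not None:
            results.append(pending.result())  # drain the final batch
    return results


print(pipelined_rewards(3))
```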

📊 Data Quality and Integrity

  • Benchmark Sanitization: Identifies and repairs incorrect test cases in standard benchmarks (HumanEvalPlus/MBPPPlus) to ensure rigorous evaluation.
  • Automated Validation: Verifies all training examples against provided solutions to guarantee data quality before RLVR begins.
  • Granular Metrics: Heuristic-driven extraction that calculates per-test-case pass rates and provides detailed logs for model weakness analysis.
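Per-test-case pass rates and solution validation can be sketched together, assuming tests are plain `assert` statements (as in MBPP-style datasets). Each assert runs separately, so one failing case does not hide the others, and an example is treated as valid only when its reference solution passes every test. The function names here are hypothetical, and the in-process `exec()` is for illustration only; model-generated code belongs in the sandbox.

```python
import ast


def split_asserts(test_code: str) -> list[str]:
    """Split a test block into individual top-level `assert` statements."""
    tree = ast.parse(test_code)
    return [ast.unparse(node) for node in tree.body if isinstance(node, ast.Assert)]


def per_test_pass_rate(solution: str, test_code: str) -> float:
    """Fraction of assert statements that pass against `solution`.

    WARNING: exec() runs code in this process; use a sandbox for untrusted code.
    """
    tests = split_asserts(test_code)
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        namespace: dict = {}  # fresh namespace so tests cannot affect each other
        try:
            exec(solution, namespace)
            exec(test, namespace)
            passed += 1
        except Exception:
            pass  # assertion failure or runtime error counts as a failed test
    return passed / len(tests)


def is_valid_example(solution: str, test_code: str) -> bool:
    """Keep a training example only if its reference solution passes all tests."""
    return per_test_pass_rate(solution, test_code) == 1.0


solution = "def add(a, b):\n    return a + b\n"
tests = "assert add(1, 2) == 3\nassert add(-1, 1) == 0\nassert add(2, 2) == 5\n"
print(per_test_pass_rate(solution, tests))
print(is_valid_example(solution, tests))
```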

Getting Started: Training

1. Requirements

  • Hardware: Single GPU with 8GB VRAM (e.g., RTX 4060).
  • Tooling: uv installed.

2. Setup

git clone https://github.com/rparkr/lfm-coder.git && cd lfm-coder
export HF_TOKEN="your-hf-token"

3. Configuration

Update training_config.toml with your model_id and output_dir.

4. Run Training

# Dry run to verify configuration
uv run lfm-coder --dry-run

# Start full training
uv run lfm-coder

Using the Python Sandbox

You can use the high-performance sandbox in your own projects for safe execution of LLM-generated code.

Installation

uv add lfm-coder  # or pip install lfm-coder

Basic Usage

The Sandbox class automatically routes code between Monty (fast) and Docker (full support).

from lfm_coder.sandbox import Sandbox

sandbox = Sandbox()

# Batch execution (parallel)
results = sandbox.run(["1+1", "import math; math.sqrt(16)", "print('Hello')"])
for r in results:
    print(f"Stdout: {r.stdout} | Result: {r.result}")

Advanced: Automatic Fallback

code = """
import httpx  # Requires Docker fallback
r = httpx.get('https://example.com')
print(r.status_code)
"""
result = sandbox.run(code)

Note: The Docker sandbox requires either Podman (recommended) or Docker to be installed and running.

Project Roadmap and Stats

πŸ—ΊοΈ Status

  • Dual Sandboxes: MontySandbox + DockerSandbox with auto-routing.
  • Data Pipeline: Automated sampling, verification, and repair of benchmarks.
  • RLVR Training: GRPO integration with TRL and GPU optimizations.
  • Evaluation: Scoring module with GPU/CPU pipelining.
  • Ollama support: Fix chat template in fine-tuned GGUF model for multi-turn chat.

📊 Training Performance Metrics

Metric                   | Monty Sandbox (Rust) | Docker Sandbox (Container)
------------------------ | -------------------- | --------------------------
Execution count          | 18,556 (77.3%)       | 5,444 (22.7%)
Avg. time per execution  | 1.01 ms              | 2,577 ms
Median time per execution| 0.4 ms               | 2,240 ms
Success rate             | 69.8%                | 35.8%
Throughput               | ~1,000 exec/sec      | ~0.4 exec/sec

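Based on these timings, Monty executes roughly 2,500x faster than the Docker fallback on average (1.01 ms vs. 2,577 ms) and about 5,600x faster at the median (0.4 ms vs. 2,240 ms), providing the massive throughput required for efficient RLVR training.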
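The speedups and throughputs in the table can be checked with a few lines of arithmetic. The table's throughput row matches the reciprocal of the average time for a single sequential worker, which suggests it was computed that way; this is an inference, not something the table states.

```python
# Timings from the training metrics table (milliseconds per execution).
monty_avg_ms, docker_avg_ms = 1.01, 2_577
monty_median_ms, docker_median_ms = 0.4, 2_240

mean_speedup = docker_avg_ms / monty_avg_ms  # ~2,551x
median_speedup = docker_median_ms / monty_median_ms  # ~5,600x

# Sequential throughput of one worker is roughly 1 / latency.
monty_per_sec = 1_000 / monty_avg_ms  # ~990 exec/sec
docker_per_sec = 1_000 / docker_avg_ms  # ~0.39 exec/sec

print(f"speedup: {mean_speedup:,.0f}x (mean), {median_speedup:,.0f}x (median)")
print(f"throughput: {monty_per_sec:,.0f} vs {docker_per_sec:.2f} exec/sec")
```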

Acknowledgments

License

Code: MIT license.
Model Weights: LFM license (commercial use is restricted for organizations with more than $10M in revenue).
