
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards [CVPR 2026 (Findings) 🔥]

Project Page · HuggingFace · arXiv

Abstract

EvoLMM couples a Proposer and Solver built on the same vision-language backbone and trains them end-to-end with continuous, self-consistency rewards. The Proposer generates image-grounded questions while the Solver answers them; both are optimized via KL-regularized REINFORCE with adaptive baselines and lightweight LoRA adapters. The framework needs only raw images (no labels or external reward models) and delivers ~2–3% absolute gains on multimodal math/diagram reasoning benchmarks over the Qwen2.5-VL baseline.

*Figure: overview of the EvoLMM Proposer–Solver architecture.*
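
As a rough illustration of the continuous-reward idea (not the paper's exact formulation), a self-consistency reward can score how strongly several Solver samples agree; the sketch below is hypothetical, with `gamma` playing a role analogous to the `--solver_soft_gamma` flag used in training.

```python
# Hypothetical sketch of a continuous self-consistency reward: the Solver
# answers the same question several times and is rewarded by the (softened)
# fraction of samples that agree. Not the paper's exact definition.
from collections import Counter

def self_consistency_reward(answers, gamma=0.7):
    top_count = Counter(answers).most_common(1)[0][1]
    return (top_count / len(answers)) ** gamma  # continuous score in (0, 1]
```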

Repository layout

  • src/train.py: core training loop, LoRA setup, adaptive KL, checkpoints, and logging.
  • src/train.sh: example hyperparameters for Qwen2.5-VL-7B with LoRA.
  • Evaluation/lmms-eval: evaluation harness (based on lmms-eval) with a ready-made script.
  • inference.py: inference script using the LoRA checkpoints.

Setup

  1. Install Python dependencies:
    pip install -r requirements.txt
  2. (Optional) Set cache paths/tokens, e.g.:
    export HF_HOME=/workspace/cache
    export HF_TOKEN=<your_hf_token>

Data Preparation

Training needs only images (no annotations). By default the loader scans images/train, recursing through all first-level subfolders. Expected layout:

```
images/
  train/
    split1/          # any subfolder names are accepted
      img_001.jpg
      ...
    split2/
      ...
```

  • Use --data_dir /path/to/images/train to point to your root.
  • To restrict to certain subfolders, pass --include_subfolders=split1,split2.
  • Corrupted images are skipped; sampling is deterministic given --seed.
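
A minimal sketch of this scanning behaviour (the real loader lives in src/train.py; the function name and extension set here are illustrative):

```python
# Hypothetical sketch of the image-scanning logic; mirrors --data_dir,
# --include_subfolders, and --seed, but is not the repository's actual code.
import random
from pathlib import Path
from PIL import Image

IMG_EXTS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed extension set

def scan_images(data_dir, include_subfolders=None, seed=0):
    root = Path(data_dir)
    paths = [p for p in root.rglob("*") if p.suffix.lower() in IMG_EXTS]
    if include_subfolders:  # keep only the listed first-level subfolders
        keep = set(include_subfolders)
        paths = [p for p in paths if p.relative_to(root).parts[0] in keep]
    valid = []
    for p in sorted(paths):
        try:
            Image.open(p).verify()  # corrupted images are skipped
            valid.append(p)
        except Exception:
            continue
    random.Random(seed).shuffle(valid)  # deterministic given --seed
    return valid
```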

Training

Baseline LoRA recipe (from src/train.sh) for Qwen2.5-VL-7B:

```bash
python src/train.py \
  --data_dir /path/to/images/train \
  --solver_model Qwen/Qwen2.5-VL-7B-Instruct \
  --proposer_model Qwen/Qwen2.5-VL-7B-Instruct \
  --use_lora_solver --use_lora_proposer \
  --lora_targets q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj,mm_projector \
  --lora_r 16 --lora_alpha 32 --lora_dropout 0.05 \
  --num_solver_samples 5 --proposer_update_freq 5 --total_steps 16180 \
  --kl_target 0.020 --kl_adapt_rate 0.10 \
  --solver_soft_gamma 0.7 \
  --wandb_mode online --wandb_project sqlmm_main --wandb_run_name exp1 \
  --clear_cache_every 10
```
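
The core update this recipe configures (REINFORCE with an adaptive baseline plus a KL penalty against a frozen reference policy) can be sketched as a toy function; this is illustrative only, the actual implementation is in src/train.py:

```python
# Toy sketch of KL-regularized REINFORCE with an EMA ("adaptive") baseline;
# illustrative only — see src/train.py for the real update.
import torch

def reinforce_kl_step(logp, ref_logp, reward, baseline,
                      kl_coef=0.02, momentum=0.9):
    # logp / ref_logp: per-token log-probs of the sampled answer under the
    # trained policy and the frozen reference; reward: scalar in [0, 1].
    advantage = reward - baseline                # adaptive baseline
    pg_loss = -advantage * logp.sum()            # REINFORCE term
    kl_pen = kl_coef * (logp - ref_logp).sum()   # sampled KL penalty
    new_baseline = momentum * baseline + (1 - momentum) * reward
    return pg_loss + kl_pen, new_baseline
```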

Notes:

  • Set --device, --dtype, and --device_map for your hardware (defaults use CUDA if available).
  • Checkpoints and per-iteration logs land in runs/<run_name>/.
  • Adaptive resume is supported: keep --wandb_run_name fixed and checkpoints under runs/ to auto-restore weights/optimizers/RNG.
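
For intuition about the `--kl_target`/`--kl_adapt_rate` pair, the adaptive KL term can be read as a proportional controller on the penalty coefficient; a hypothetical sketch (the actual logic is in src/train.py):

```python
# Hypothetical proportional controller consistent with --kl_target and
# --kl_adapt_rate; not the repository's actual implementation.
def update_kl_coef(kl_coef, observed_kl, kl_target=0.020, adapt_rate=0.10):
    error = (observed_kl - kl_target) / kl_target
    error = max(min(error, 1.0), -1.0)           # clamp for stability
    return max(kl_coef * (1.0 + adapt_rate * error), 1e-6)
```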

Inference

Run inference with the released LoRA checkpoints from Hugging Face, or substitute your own:

```bash
python inference.py
```
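
Under the hood this typically amounts to loading the base model and attaching a LoRA adapter; a minimal sketch assuming the Transformers and PEFT APIs (the adapter path is a placeholder, and inference.py may differ in detail):

```python
# Minimal sketch of LoRA-adapter inference with Transformers + PEFT; the
# adapter path is a placeholder and inference.py may differ in detail.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

BASE = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, "/path/to/runs/exp1/step_xxxxx/solver")
processor = AutoProcessor.from_pretrained(BASE)  # prepares image+text inputs for model.generate
```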

Evaluation

The evaluation harness in Evaluation/lmms-eval mirrors the training backbone. Example to evaluate a LoRA checkpoint on ChartQA:

```bash
cd Evaluation/lmms-eval
pip install -e .
export HF_HOME=/workspace/cache
export HF_TOKEN=<your_hf_token>

accelerate launch --num_processes=8 --main_process_port=12346 -m lmms_eval \
  --model qwen2_5_vl_our \
  --model_args=pretrained=Qwen/Qwen2.5-VL-7B-Instruct,base_model=Qwen/Qwen2.5-VL-7B-Instruct,lora_path=/path/to/runs/exp1/step_xxxxx/solver,max_pixels=12845056,interleave_visuals=False \
  --tasks chartqa \
  --batch_size 1 \
  --output_path /workspace/lmms-eval/eval_results/exp1
```

Replace lora_path with the checkpoint directory you want to test. Additional tasks (MathVista, MathVision, etc.) are supported via --tasks.

Results (Qwen2.5-VL-7B, zero labels)

| Model | ChartQA | MathVista | MathVision | MathVerse | InfoGraphic-VQA (val) | AI2D | ScienceQA | MMMU (val) |
|---|---|---|---|---|---|---|---|---|
| Qwen2.5-VL-7B (baseline) | 84.00 | 68.46 | 23.91 | 43.78 | 80.44 | 82.61 | 88.30 | 51.11 |
| Qwen2.5-VL-7B + Discrete reward | 84.62 | 68.88 | 22.52 | 42.10 | 80.52 | 82.18 | 87.98 | 50.84 |
| Qwen2.5-VL-7B + Continuous reward (EvoLMM) | 86.70 | 70.52 | 24.81 | 44.88 | 81.06 | 83.41 | 89.50 | 52.01 |

Scaling behaviour of our EvoLMM self-evolving framework across model sizes in the Qwen2.5-VL family

| Model | ChartQA | MathVista | MathVision | MathVerse | InfoGraphic-VQA (val) | AI2D | ScienceQA | MMMU (val) |
|---|---|---|---|---|---|---|---|---|
| Qwen2.5-VL-7B (Base) | 84.00 | 68.20 | 23.91 | 43.78 | 80.44 | 82.61 | 88.30 | 51.11 |
| Qwen2.5-VL-7B (EvoLMM) | 86.70 | 70.52 | 24.81 | 44.88 | 81.06 | 83.41 | 89.50 | 52.01 |
| Qwen2.5-VL-72B (Base) | 88.20 | 73.93 | 36.92 | 54.09 | 85.97 | 87.34 | 93.36 | 65.86 |
| Qwen2.5-VL-72B (EvoLMM) | 91.04 | 76.44 | 38.31 | 55.45 | 86.63 | 88.19 | 94.63 | 67.02 |

For additional ablations (LoRA vs. QLoRA/full fine-tuning) and other backbones (InternVL3-8B, Gemma-3-12B, Llama-3.2-11B-Vision), see the arXiv paper.

📜 Citation

```bibtex
@misc{thawakar2025evolmmselfevolvinglargemultimodal,
      title={EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards},
      author={Omkar Thawakar and Shravan Venkatraman and Ritesh Thawkar and Abdelrahman Shaker and Hisham Cholakkal and Rao Muhammad Anwer and Salman Khan and Fahad Khan},
      year={2025},
      eprint={2511.16672},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.16672},
}
```
