
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning

Official evaluation toolkit for MMMG: the Massive Multi-discipline Multi-tier Knowledge-Image Generation benchmark.


✨ Overview

*(Teaser figure)*

MMMG is a large-scale benchmark designed to assess text-to-image (T2I) models on their ability to generate faithful and visually readable images based on knowledge-intensive prompts, spanning multiple academic disciplines and educational levels.

MMMG-Score is computed as:

MMMG-Score = Knowledge Fidelity (1 - GED) × Visual Readability (SAM2.1)

Where:

  • GED: normalized Graph Edit Distance between the concept graph extracted from the generated image and the ground-truth concept graph; lower is better, so 1 − GED measures knowledge fidelity.
  • SAM2.1: visual readability score derived from SAM 2.1 segmentation accuracy.
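Given the two sub-scores above, the final score is a single product. A minimal sketch (the function name `mmmg_score` is illustrative, not part of the toolkit):

```python
def mmmg_score(ged: float, readability: float) -> float:
    """Combine the two MMMG sub-scores.

    ged:         normalized Graph Edit Distance in [0, 1] (lower is better).
    readability: SAM2.1-based visual readability in [0, 1] (higher is better).
    """
    knowledge_fidelity = 1.0 - ged
    return knowledge_fidelity * readability

# Example: GED of 0.2 and readability of 0.9 give a score of about 0.72.
score = mmmg_score(0.2, 0.9)
```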

📬 News

  • 2025.11.29 We have benchmarked Nano Banana Pro, which currently tops the leaderboard.
  • 2025.9.19 Our work has been accepted by NeurIPS 2025!
  • 2025.6.10 The repository has been updated.

♻️ Installation

git clone https://github.com/MMMGBench/MMMG.git
cd MMMG
conda env create -f environment.yaml
conda activate mmmg

📊 Dataset Preparation

Place your generated images under the following structure:

/data/
 ├─ preschool/
 ├─ primaryschool/
 ├─ secondaryschool/
 ├─ highschool/
 ├─ undergraduate/
 └─ PhD/

Each tier folder contains the model-generated images for that tier, named <prompt_key>.png.
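Before running the evaluation, it can be useful to sanity-check that the directory layout matches the structure above. A small sketch (the helper `check_layout` is illustrative, not part of the toolkit):

```python
from pathlib import Path

# Expected education-tier folders, as listed in the README.
TIERS = ["preschool", "primaryschool", "secondaryschool",
         "highschool", "undergraduate", "PhD"]

def check_layout(root):
    """Return {tier: [prompt_key, ...]} for each tier folder found,
    printing a warning for any expected folder that is missing."""
    root = Path(root)
    found = {}
    for tier in TIERS:
        folder = root / tier
        if not folder.is_dir():
            print(f"warning: missing tier folder {folder}")
            continue
        # <prompt_key>.png -> prompt_key
        found[tier] = sorted(p.stem for p in folder.glob("*.png"))
    return found
```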


💡 Run Evaluation

We use the Azure OpenAI service for knowledge-integrity evaluation. If you use a different API interface (e.g., the standard OpenAI API), adjust the client setup in:

mmmg_eval/step1_knowledge_integrity.py

Insert your API keys into:

mmmg_eval/utils/gpt_api_pool.py
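As a rough illustration of what a key pool might look like, here is a hypothetical sketch; the actual structure of `mmmg_eval/utils/gpt_api_pool.py` may differ, and the field names and `next_credentials` helper below are assumptions:

```python
import itertools

# Placeholder credentials -- replace with your own Azure OpenAI keys/endpoints.
API_KEYS = [
    {"api_key": "YOUR_KEY_1", "endpoint": "https://YOUR-RESOURCE-1.openai.azure.com"},
    {"api_key": "YOUR_KEY_2", "endpoint": "https://YOUR-RESOURCE-2.openai.azure.com"},
]

_key_cycle = itertools.cycle(API_KEYS)

def next_credentials():
    """Round-robin over the configured keys to spread request load."""
    return next(_key_cycle)
```

A round-robin pool like this helps stay under per-key rate limits when evaluating many prompts.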

Example: Evaluate GPT-4o Generations

python evaluate.py \
  --img_dir ./data/GPT-4o \
  --output_dir ./output \
  --sam2_ckpt /YOUR/PATH/TO/sam2/checkpoints/sam2.1_hiera_large.pt \
  --t2i_method GPT-4o \
  --api_name o3 \
  --hf_cache ./data/MMMG

Arguments

  • --img_dir: Path to generated images (organized by education tier).
  • --output_dir: Where evaluation logs and scores will be saved.
  • --sam2_ckpt: Path to the pretrained SAM2.1 checkpoint.
  • --t2i_method: Name of the T2I model under evaluation.
  • --api_name: LLM backend (e.g., gpt-4, gpt-4o, o3).
  • --hf_cache: Path to HuggingFace cache for loading ground-truth graphs.
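To evaluate several T2I models in one go, the same flags can be assembled programmatically. A hedged sketch (the model names and the `build_cmd` helper are illustrative; only `evaluate.py` and its flags come from the example above):

```python
import subprocess

def build_cmd(method: str) -> list:
    """Assemble the evaluate.py command line for one T2I method."""
    return [
        "python", "evaluate.py",
        "--img_dir", f"./data/{method}",
        "--output_dir", "./output",
        "--sam2_ckpt", "/YOUR/PATH/TO/sam2/checkpoints/sam2.1_hiera_large.pt",
        "--t2i_method", method,
        "--api_name", "o3",
        "--hf_cache", "./data/MMMG",
    ]

# Example usage (uncomment to run):
# for m in ["GPT-4o", "FLUX"]:
#     subprocess.run(build_cmd(m), check=True)
```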

📅 Citation

If you find MMMG helpful in your research, please consider citing our paper:

@inproceedings{luo2025mmmg,
  title={{MMMG}: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning},
  author={Luo, Yuxuan and Yuan, Yuhui and Chen, Junwen and Cai, Haonan and Yue, Ziyi and Yang, Yuwei and Daha, Fatima Zohra and Li, Ji and Lian, Zhouhui},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2025}
}

About

MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster]
