- SII and Fudan University
- Shanghai
- Homepage: https://codegoat24.github.io/
- https://scholar.google.com.hk/citations?user=FQeuWTYAAAAJ&hl=zh-CN
Stars
[CVPR 2026🔥] Enhancing Spatial Understanding in Image Generation via Reward Modeling
Official repo for "GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization"
Offline implementation of UniREditBench: A Unified Reasoning-based Image Editing Benchmark.
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster]
Rethinking Semantic-level Building Change Detection: Ensemble Learning and Dynamic Interaction
Official implementation of RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
[CVPR 2026] Fine-Grained GRPO for Precise Preference Alignment in Flow Models
[NeurIPS 2025] Fractional Langevin Dynamics for Combinatorial Optimization via Polynomial-Time Escape
[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
[ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
A curated collection of papers, datasets, and resources on Scientific Datasets and Large Language Models (LLMs)
Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
[ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache
Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
[ICLR 2026] Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing"
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Official implementation of UnifiedReward & UnifiedReward-Think
Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"
From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning - CVPR 2025
[ICCV 2025] Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
We introduce ADAM, An emboDied causal Agent in Minecraft, that can autonomously navigate the open world, perceive multimodal contexts, learn causal world knowledge, and tackle complex tasks through…
