CaraJ7

Organizations

@MME-Benchmarks


Implement search image generation similar to Nano-banana-pro / Seedream / FLUX.

Python 76 1 Updated Feb 3, 2026

[CVPR 2026] The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation"

Python 105 1 Updated Feb 28, 2026

RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards.

Python 318 26 Updated Dec 29, 2025

Official Repository for Paper: DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

17 Updated Dec 7, 2025

The first Interleaved framework for textual reasoning within the visual generation process

158 1 Updated Nov 21, 2025

Are Video Models Ready as Zero-shot Reasoners?

Python 84 4 Updated Nov 24, 2025

ULMEvalKit: One-Stop Eval ToolKit for Image Generation

Python 56 2 Updated Dec 17, 2025

A huge collection of SVG logos

SVG 6,681 753 Updated Jul 21, 2025

Qwen-Image text-to-image LoRA trainer

Python 716 63 Updated Dec 16, 2025

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…

21,125 2,174 Updated Dec 12, 2025

Echo-4o: Harnessing Proprietary Models’ Synthetic Images for Improved Image Generation

Jupyter Notebook 503 28 Updated Dec 9, 2025

A curated gallery and toolkit designed to provide inspiration for scientific illustrations, project sites, and visual storytelling in research.

977 28 Updated Feb 10, 2026

CLIP+MLP Aesthetic Score Predictor

Python 1,265 113 Updated Jul 1, 2024

Open-source unified multimodal model

Python 5,710 504 Updated Oct 27, 2025

[CVPR 2025 (Oral)] Open implementation of "RandAR"

Python 207 6 Updated Jul 14, 2025

Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)

Python 3,175 245 Updated Sep 12, 2025

[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Python 101 5 Updated Sep 19, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,346 41 Updated Feb 3, 2026

Jodi: Unification of Visual Generation and Understanding via Joint Modeling

Python 90 2 Updated Jun 19, 2025

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

25 Updated Dec 21, 2025

Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning

Python 237 8 Updated May 30, 2025

Witness the aha moment of VLM with less than $3.

Python 4,036 287 Updated May 19, 2025
Python 4,578 448 Updated Sep 14, 2025

Awesome Unified Multimodal Models

1,131 36 Updated Feb 6, 2026

GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities

Python 305 8 Updated May 3, 2025

[ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”

Python 175 4 Updated Feb 7, 2026

[ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Python 217 12 Updated Nov 5, 2025

[NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Python 430 24 Updated Sep 18, 2025

[ACM Computing Surveys] The collection of awesome papers on alignment of diffusion models.

406 17 Updated Feb 6, 2026

MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.

Python 408 21 Updated Aug 26, 2025