Tsinghua University SIGS; Meituan
Zhangquan Chen, Manyuan Zhang, Xinlei Yu, Xufang Luo, Mingze Sun, Zihao Pan,
Yan Feng, Peng Pei, Xunliang Cai, Ruqi Huang
Though recent advances in vision–language models (VLMs) have achieved remarkable progress across a wide range of multimodal tasks, understanding 3D spatial relationships from limited views remains a significant challenge. Previous reasoning methods typically rely on pure text (e.g., topological cognitive maps) or on 2D visual cues; however, their limited representational capacity hinders performance on tasks that require 3D spatial imagination. To address this limitation, we propose 3DThinker, a framework that effectively exploits the rich geometric information embedded in images while reasoning, as humans do. Our framework is the first to enable 3D mentaling during reasoning without any 3D prior input, and it does not rely on explicitly labeled 3D data for training. Specifically, training proceeds in two stages. First, we perform supervised training to align the 3D latent generated by the VLM while reasoning with that of a 3D foundation model (e.g., VGGT). Then, we optimize the entire reasoning trajectory solely from outcome signals, thereby refining the underlying 3D mentaling. Extensive experiments across multiple benchmarks show that 3DThinker consistently outperforms strong baselines and offers a new perspective toward unifying 3D representations into multimodal reasoning.
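As a rough illustration of the stage-1 objective (a minimal sketch, not the repo's implementation; latent layout and the exact loss are assumptions), aligning the VLM's 3D latent to a frozen VGGT latent can be as simple as a mean-squared error:

```python
# Minimal sketch of the stage-1 alignment idea: pull the VLM's 3D latent
# toward a frozen 3D foundation-model (e.g., VGGT) latent with an MSE loss.
# Plain-Python stand-in; real training uses tensors and backpropagation.

def alignment_loss(vlm_latent, vggt_latent):
    """Mean-squared error between two equal-length latent vectors."""
    assert len(vlm_latent) == len(vggt_latent)
    n = len(vlm_latent)
    return sum((a - b) ** 2 for a, b in zip(vlm_latent, vggt_latent)) / n

print(alignment_loss([1.0, 2.0], [0.0, 2.0]))  # 0.5
```

Minimizing this loss teaches the VLM to produce latents that a 3D foundation model would produce, without needing explicit 3D labels.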
[25/11/18] We release the code for 3DThinker, both stage 1 and stage 2. See Training for usage.
[25/11/25] We fix some bugs in the 3D latent assignment and release the evaluation results of 3DThinker-Qwen2.5-VL-3B.
[25/12/05] We replace the full data with an example case for rechecking.
[26/02/10] We release the data and model for training on MindCube_Train and testing on MindCube-Tiny. See One Case for details.
git clone https://github.com/zhangquanchen/3DThinker.git
cd 3DThinker
conda create -n 3DThinker-stage1 python=3.10 -y && conda activate 3DThinker-stage1
pip install -r envs/requirements_stage1.txt
conda create -n 3DThinker-stage2 python=3.10 -y && conda activate 3DThinker-stage2
bash 3dthinker/stage2/setup.sh
If the installed trl version conflicts with our repository, replace it with the local copy by running:
cp -rf 3dthinker/stage2/package/trl /home/tiger/anaconda3/envs/3DThinker-stage2/lib/python3.10/site-packages/
- Remark: You can refer to envs/requirements_stage2.txt to configure the environment.
Follow LLaMA-Factory for environment setup; it is used for SFT training and weight merging.
conda create -n SFT python=3.10 -y && conda activate SFT
cd SFT/env
pip install -e ".[torch,metrics]" --no-build-isolation
- Remark: You can refer to envs/requirements_sft.txt to configure the environment.
You should first download the data from here, which includes MindCube_train_raw_qa_qwen_sft.json and images. The idx.jsonl under the data folder is the data with idx. Then, following the two steps below, you will get data_output3d_begin_10k_resized.jsonl for training.
- VGGT feature extraction:
Download the VGGT-1B weight and place it under models.
python preprocessing/feature/extract_vggt_feature.py
After doing this, you will get data/feature_vggt (VGGT features) and data/resized_images (images resized for training).
- CoT data generation and filtering:
## produce chain-of-thought data
python preprocessing/produce_cot.py
## remove non-compliant data, e.g., w/o </output>
python preprocessing/clean.py
## filter useless data
python preprocessing/remove.py
## match VGGT indices
python preprocessing/jsonl_add_idx.py
After these steps, you will get data_output3d_begin_10k_resized.jsonl under the data folder.
- Remark:
example.jsonl is an example dataset for training.
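The clean/filter/index steps above can be sketched roughly as follows. This is a minimal sketch: the field name `response` and index-by-enumeration matching are assumptions, not the repo's actual schema or scripts.

```python
import json

# Rough sketch of the cleaning and index-matching preprocessing steps.
# ASSUMPTIONS: records carry a "response" field, and VGGT features are
# matched by enumeration order; preprocessing/clean.py and
# preprocessing/jsonl_add_idx.py may differ in detail.

def clean_and_index(jsonl_lines):
    kept = []
    for line in jsonl_lines:
        rec = json.loads(line)
        # drop non-compliant records, e.g., those missing </output>
        if "</output>" not in rec.get("response", ""):
            continue
        rec["idx"] = len(kept)  # index used to look up the VGGT feature
        kept.append(rec)
    return kept

raw = [
    json.dumps({"response": "<output>table</output>"}),
    json.dumps({"response": "<output>chair"}),  # missing </output>
    json.dumps({"response": "<output>lamp</output>"}),
]
cleaned = clean_and_index(raw)
print([r["idx"] for r in cleaned])  # [0, 1]
```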
Once you finish data preprocessing, you can start training 3DThinker!
- Supervised Training
Prepare your base model under models (e.g., Qwen2.5-VL-3B).
conda activate 3DThinker-stage1
cd 3dthinker/stage1 && sh train.sh
- Reinforced Training
conda activate 3DThinker-stage2
cd 3dthinker/stage2 && bash run_scripts/train.sh
## merge the weight
conda activate SFT && llamafactory-cli export merge.yaml
You can follow the official code of MindCube for evaluation. Specifically, you need to organize the benchmark according to its requirements, and then run the following scripts.
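The merge step consumes a LLaMA-Factory export config. The repo's actual merge.yaml is not shown here; a typical export config (field names follow LLaMA-Factory's example configs, all paths are placeholders) looks something like:

```yaml
### model (placeholder paths; adjust to your own checkpoints)
model_name_or_path: models/Qwen2.5-VL-3B
adapter_name_or_path: saves/3dthinker-stage2
template: qwen2_vl
finetuning_type: lora

### export
export_dir: models/3DThinker-merged
export_size: 5
export_legacy_format: false
```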
cd eval
sh eval_mindcube.sh
sh get_result.sh
For other base models, you can use eval_xxx.py for evaluation.
Other benchmarks include Ego3D-Bench, VSI-Bench, SPBench, CV-Bench, SPAR-Bench, ViewSpatial-Bench, and MMSI-Bench.
A case for training on MindCube_Train and testing on MindCube-Tiny is listed below:
- Training data, which is for MindCube training.
- Model, which is trained after stage 1 on Qwen2.5-VL-3B. Note that the model in Tab. 2 is trained on different training data.
This repo also benefits from Mirage, trl, transformers, VLM-R1, MindCube, and VGGT.
Thanks for their wonderful work.
If you find 3DThinker helpful for your work, please cite:
@article{chen2025think,
title={Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views},
author={Chen, Zhangquan and Zhang, Manyuan and Yu, Xinlei and Luo, Xufang and Sun, Mingze and Pan, Zihao and Feng, Yan and Pei, Peng and Cai, Xunliang and Huang, Ruqi},
journal={arXiv preprint arXiv:2510.18632},
year={2025}
}
