Shuai Yuan,1
Yantai Yang,1, 2
Xiaotian Yang,1
Xupeng Zhang,1
Zhonghao Zhao,1
Lingming Zhang,
Zhipeng Zhang1 ✉
1AutoLab, School of Artificial Intelligence, Shanghai Jiao Tong University
2Anyverse Dynamics
✉ Corresponding Author
InfiniteVGGT achieves higher reconstruction quality and more accurate camera pose estimation from input streams of thousands of frames.
- [Jan 6, 2026] Paper release.
- [Jan 6, 2026] Code release.
- [Jan 19, 2026] Long3D dataset release.
- Welcome to check out our previous collaborative work FastVGGT.
We propose InfiniteVGGT, a causal visual geometry transformer that utilizes a training-free rolling memory mechanism to enable stable, infinite-horizon streaming, and introduce the Long3D benchmark to rigorously evaluate long-term continuous 3D geometry performance. Our main contributions are summarized as follows:
- InfiniteVGGT, an unbounded-memory architecture for continuous 3D geometry understanding, built on a novel, dynamic, and interpretable explicit memory system.
- State-of-the-art performance on long-sequence benchmarks and a unique capability for robust, infinite-horizon reconstruction without memory overflow.
- The Long3D benchmark, a new dataset for the rigorous evaluation of long-term performance, addressing a critical gap in the field.
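The training-free rolling memory can be pictured as a fixed-capacity token buffer that evicts the oldest entries as new frames arrive, so memory use stays bounded no matter how long the stream runs. The sketch below is purely illustrative: the capacity, token representation, and FIFO eviction policy here are our assumptions, not the paper's actual mechanism.

```python
from collections import deque

class RollingMemory:
    """Illustrative fixed-capacity memory: once capacity is reached,
    the oldest frame's tokens are evicted, so memory never grows
    unbounded over an endless stream (assumed FIFO policy)."""

    def __init__(self, capacity: int):
        # capacity = max number of frames whose tokens are retained
        self.buffer = deque(maxlen=capacity)

    def update(self, frame_tokens):
        # Appending beyond capacity silently drops the oldest entry
        self.buffer.append(frame_tokens)

    def context(self):
        # Tokens available to attend to when processing the next frame
        return list(self.buffer)

mem = RollingMemory(capacity=3)
for t in range(5):
    mem.update(f"tokens_frame_{t}")
print(mem.context())  # only the 3 most recent frames survive
```

The key property is that per-frame cost and memory stay constant with stream length, which is what makes infinite-horizon streaming feasible.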
- Clone InfiniteVGGT

```shell
git clone https://github.com/AutoLab-SAI-SJTU/InfiniteVGGT.git
cd InfiniteVGGT
```

- Create conda environment

```shell
conda create -n infinitevggt python=3.11 cmake=3.14.0
conda activate infinitevggt
```

- Install requirements

```shell
pip install -r requirements.txt
conda install 'llvm-openmp<16'
```

- Download the StreamVGGT pretrained checkpoint and place it in the ./ckpt directory.
```shell
# Run on your own data
python run_inference.py --input_dir path/to/your/images_dir

# Run a long sequence and store per-frame results to a directory
python run_inference.py \
    --input_dir path/to/your/images_dir \
    --frame_cache_dir path/to/your/results_perframe_dir \
    --no_cache_results
```

We provide demo code based on the NRGBD dataset. You can run it using the following command:
```shell
python demo_viser.py \
    --seq_path path/to/nrgbd/image_sequence \
    --frame_interval 10 \
    --gt_path path/to/nrgbd/gt_camera  # optional
```

The Long3D Dataset is a benchmark designed for long-sequence 3D scene reconstruction. It provides 10 Hz image streams paired with dense ground-truth point clouds.
| File Name | Description |
|---|---|
| `image.7z` | Continuous image stream captured at a frequency of 10 Hz. |
| `dense_cloud_map.pcd` | Global ground-truth point cloud, acquired via a 3D spatial scanner. |
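Because the image stream is captured at a fixed 10 Hz, frame indices map directly to timestamps, which is convenient when aligning frames against the ground-truth map or reporting sequence lengths in seconds. A minimal helper (the function names are ours, not part of the dataset tooling):

```python
FRAME_RATE_HZ = 10.0  # Long3D image streams are captured at 10 Hz

def frame_timestamp(frame_index: int) -> float:
    """Timestamp in seconds of a frame within the stream (frame 0 = t0)."""
    return frame_index / FRAME_RATE_HZ

def sequence_duration(num_frames: int) -> float:
    """Wall-clock duration covered by a sequence of num_frames images."""
    return (num_frames - 1) / FRAME_RATE_HZ if num_frames > 0 else 0.0

print(frame_timestamp(150))     # frame 150 occurs 15.0 s into the stream
print(sequence_duration(6000))  # a 6000-frame sequence spans 599.9 s
```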
The most efficient way to download the dataset is with the huggingface-hub CLI. Ensure the library is installed (`pip install -U huggingface_hub`).
```shell
# export HF_ENDPOINT=https://hf-mirror.com
hf download --repo-type dataset \
    --resume-download AutoLab-SJTU/Long3D \
    --local-dir ./Long3D
```
Alternatively, you can browse and download files directly from the Long3D dataset.
- [x] Release the dataset.
We would like to acknowledge the following open-source projects that served as a foundation for our implementation:
DUSt3R CUT3R VGGT Point3R StreamVGGT FastVGGT TTT3R
Many thanks to these authors!
If you incorporate our work into your research, please cite:
@misc{yuan2026infinitevggt,
title={InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams},
author={Shuai Yuan and Yantai Yang and Xiaotian Yang and Xupeng Zhang and Zhonghao Zhao and Lingming Zhang and Zhipeng Zhang},
journal={arXiv preprint arXiv:2601.02281},
year={2026}
}


