AutoLab-SAI-SJTU/InfiniteVGGT

InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

Shuai Yuan1, Yantai Yang1,2, Xiaotian Yang1, Xupeng Zhang1,  
Zhonghao Zhao1, Lingming Zhang, Zhipeng Zhang1 ✉  

1AutoLab, School of Artificial Intelligence, Shanghai Jiao Tong University  
2Anyverse Dynamics

✉ Corresponding Author

Paper PDF Hugging Face

Achieving higher reconstruction quality and more accurate camera pose estimation from inputs of thousands of frames.

📰 News

  • [Jan 6, 2026] Paper release.
  • [Jan 6, 2026] Code release.
  • [Jan 19, 2026] Long3D dataset release.

🔍 Recommendation

  • You may also want to check out our earlier collaborative work, FastVGGT.

📖 Overview

We propose InfiniteVGGT, a causal visual geometry transformer that utilizes a training-free rolling memory mechanism to enable stable, infinite-horizon streaming, and introduce the Long3D benchmark to rigorously evaluate long-term continuous 3D geometry performance. Our main contributions are summarized as follows:

  1. InfiniteVGGT, an unbounded memory architecture for continuous 3D geometry understanding, built on a novel, dynamic, and interpretable explicit memory system.
  2. State-of-the-art performance on long-sequence benchmarks and a unique capability for robust, infinite-horizon reconstruction without memory overflow.
  3. The Long3D benchmark, a new dataset for the rigorous evaluation of long-term performance, addressing a critical gap in the field.
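
For intuition about how a bounded memory can serve an unbounded stream, here is a toy Python sketch. It is not the InfiniteVGGT mechanism: the class name `RollingMemory`, the fixed `budget`, and the oldest-first eviction rule are all assumptions chosen for illustration, and the paper describes how the actual memory is maintained.

```python
from collections import deque


class RollingMemory:
    """Toy fixed-budget memory (hypothetical, for illustration only)."""

    def __init__(self, budget: int):
        if budget <= 0:
            raise ValueError("budget must be positive")
        # deque(maxlen=...) drops the oldest entry once the budget is reached.
        self._entries = deque(maxlen=budget)

    def update(self, frame_id: int, tokens: list) -> None:
        """Store the tokens of a new frame, evicting the oldest if needed."""
        self._entries.append((frame_id, tokens))

    def frame_ids(self) -> list:
        """Return the ids of the frames currently held in memory."""
        return [fid for fid, _ in self._entries]

    def __len__(self) -> int:
        return len(self._entries)


memory = RollingMemory(budget=4)
for frame_id in range(1000):  # stands in for an endless stream
    memory.update(frame_id, tokens=[float(frame_id)])

print(len(memory), memory.frame_ids())  # 4 [996, 997, 998, 999]
```

The point of the sketch is only that memory use stays constant however long the stream runs; a real system would choose what to keep by something more informative than arrival order.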

🌍 Installation

  1. Clone InfiniteVGGT
git clone https://github.com/AutoLab-SAI-SJTU/InfiniteVGGT.git
cd InfiniteVGGT
  2. Create the conda environment
conda create -n infinitevggt python=3.11 cmake=3.14.0
conda activate infinitevggt
  3. Install the requirements
pip install -r requirements.txt
conda install 'llvm-openmp<16'
  4. Download the StreamVGGT pretrained checkpoint and place it in the ./ckpt directory.
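
Before running inference, it can help to confirm that the checkpoint actually landed in `./ckpt`. The following is a minimal sketch using only the Python standard library. The helper name `list_checkpoints` is hypothetical, and because the checkpoint's file name is not stated here, the script simply lists whatever files it finds.

```python
from pathlib import Path


def list_checkpoints(ckpt_dir: str = "./ckpt") -> list:
    """Return the files found in the checkpoint directory, sorted by name."""
    root = Path(ckpt_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    found = list_checkpoints()
    if found:
        for path in found:
            # Report size in megabytes so a truncated download is easy to spot.
            print(f"{path.name}: {path.stat().st_size / 1e6:.1f} MB")
    else:
        print("No files in ./ckpt; download the StreamVGGT checkpoint first.")
```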

▶️ Run Inference

# Run on your own data
python run_inference.py --input_dir path/to/your/images_dir

# Run a long sequence and store each frame's result in a directory
python run_inference.py \
    --input_dir path/to/your/images_dir \
    --frame_cache_dir path/to/your/results_perframe_dir \
    --no_cache_results
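
To process several sequences in one go, the commands above can be wrapped in a small Python driver. This is a sketch under stated assumptions: the helper names and the one-subdirectory-per-sequence layout are hypothetical, and it uses only the flags shown above, passing `--frame_cache_dir` together with `--no_cache_results` as the long-sequence command does.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_inference_command(input_dir: str,
                            frame_cache_dir: Optional[str] = None) -> List[str]:
    """Build the argv list for run_inference.py from the documented flags."""
    cmd = [sys.executable, "run_inference.py", "--input_dir", input_dir]
    if frame_cache_dir is not None:
        # Mirrors the long-sequence command: per-frame results go to disk.
        cmd += ["--frame_cache_dir", frame_cache_dir, "--no_cache_results"]
    return cmd


def run_all(sequences_root: str, results_root: str) -> None:
    """Run inference on every subdirectory of sequences_root (assumed layout)."""
    for seq_dir in sorted(Path(sequences_root).iterdir()):
        if not seq_dir.is_dir():
            continue
        out_dir = Path(results_root) / seq_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True stops the loop at the first failing sequence.
        subprocess.run(build_inference_command(str(seq_dir), str(out_dir)),
                       check=True)
```

Call `run_all("path/to/sequences", "path/to/results")` from the repository root so that `run_inference.py` resolves.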

🚀 Run Demo

We provide demo code based on the NRGBD dataset. You can run it using the following command:

python demo_viser.py  \
    --seq_path path/to/nrgbd/image_sequence \
    --frame_interval 10 \
    --gt_path path/to/nrgbd/gt_camera  # optional
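
To preview which frames a given interval would select, here is a small Python sketch. It assumes that `--frame_interval 10` means "keep every 10th frame of the name-sorted sequence". That reading is an assumption, not something confirmed by the demo script, and the helper name and the list of image suffixes are hypothetical.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def subsample_frames(seq_path: str, frame_interval: int = 10) -> list:
    """Return every `frame_interval`-th image of a sequence, sorted by name."""
    if frame_interval < 1:
        raise ValueError("frame_interval must be >= 1")
    frames = sorted(
        p for p in Path(seq_path).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Slicing with a step keeps frames 0, k, 2k, ... of the sorted list.
    return frames[::frame_interval]
```

Sorting is lexicographic, so this only matches capture order when file names are zero-padded.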

🧊 Long3D Dataset

The Long3D Dataset is a benchmark designed for long-sequence 3D scene reconstruction. It provides 10 Hz image streams paired with dense ground-truth point clouds.

📊 Data Description

| File Name | Description |
| --- | --- |
| image.7z | Continuous image stream captured at 10 Hz. |
| dense_cloud_map.pcd | Global ground-truth point cloud, acquired with a 3D spatial scanner. |
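
Before loading a large `.pcd` file, it can be useful to inspect its header (fields, point count, encoding). The sketch below reads a PCD header with the Python standard library only. The helper name `read_pcd_header` is hypothetical, and the sample bytes are made up for demonstration; the actual fields in `dense_cloud_map.pcd` are not documented here.

```python
import io
from typing import BinaryIO, Dict, List


def read_pcd_header(stream: BinaryIO) -> Dict[str, List[str]]:
    """Read PCD header entries up to and including the DATA line."""
    header: Dict[str, List[str]] = {}
    while True:
        raw = stream.readline()
        if not raw:
            raise ValueError("reached end of file before the DATA line")
        line = raw.decode("ascii", errors="replace").strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, *values = line.split()
        header[key.upper()] = values
        if key.upper() == "DATA":
            return header  # the point payload starts right after this line


# Made-up sample header, used only to demonstrate the parser.
sample = (
    b"# .PCD v0.7 - Point Cloud Data file format\n"
    b"VERSION 0.7\nFIELDS x y z\nSIZE 4 4 4\nTYPE F F F\nCOUNT 1 1 1\n"
    b"WIDTH 2\nHEIGHT 1\nVIEWPOINT 0 0 0 1 0 0 0\nPOINTS 2\nDATA ascii\n"
    b"0 0 0\n1 1 1\n"
)
header = read_pcd_header(io.BytesIO(sample))
print(header["FIELDS"], header["POINTS"], header["DATA"])
```

For the real file, open it in binary mode (`open("dense_cloud_map.pcd", "rb")`) and pass the handle to the same function.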

📥 Download Instructions

Option 1: Hugging Face CLI

The most efficient way to download the dataset is using the huggingface-hub CLI. Ensure you have the library installed (pip install -U huggingface_hub).

# Optional: uncomment to download through a mirror endpoint
# export HF_ENDPOINT=https://hf-mirror.com
hf download --repo-type dataset \
    --resume-download AutoLab-SJTU/Long3D \
    --local-dir ./Long3D
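
If you prefer to stay in Python, the same download can be sketched with `snapshot_download` from the `huggingface_hub` library. The repository id and local directory are copied from the CLI command above, and the two helper function names are hypothetical.

```python
def long3d_download_kwargs(local_dir: str = "./Long3D") -> dict:
    """Arguments mirroring the CLI command: dataset repo and target folder."""
    return {
        "repo_id": "AutoLab-SJTU/Long3D",
        "repo_type": "dataset",
        "local_dir": local_dir,
    }


def download_long3d(local_dir: str = "./Long3D") -> str:
    """Download the dataset snapshot and return the local folder path."""
    # Imported lazily so the module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(**long3d_download_kwargs(local_dir))


if __name__ == "__main__":
    print(download_long3d())
```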

Option 2: Manual Access

Alternatively, you can browse and download files directly from the Long3D dataset.

📋 Checklist

  • [√] Release the dataset.

🙏 Acknowledgement

We would like to acknowledge the following open-source projects that served as a foundation for our implementation:

DUSt3R, CUT3R, VGGT, Point3R, StreamVGGT, FastVGGT, TTT3R

Many thanks to these authors!

📜 Citation

If you incorporate our work into your research, please cite:

@misc{yuan2026infinitevggt,
        title={InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams}, 
        author={Shuai Yuan and Yantai Yang and Xiaotian Yang and Xupeng Zhang and Zhonghao Zhao and Lingming Zhang and Zhipeng Zhang},
        journal={arXiv preprint arXiv:2601.02281},
        year={2026}
}

About

The official implementation of InfiniteVGGT
