- Zhejiang University
- Hangzhou, China
Stars
ClawPhD is an agent for research that can turn academic papers into publication-ready diagrams, posters, videos, and more.
OpenMMLab 3D Human Parametric Model Toolbox and Benchmark
The repository provides code for running inference with the SAM 3D Body Model (3DB), links for downloading the trained model checkpoints and datasets, and example notebooks that show how to use the…
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Native Multimodal Models are World Learners
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
A procedural Blender pipeline for photorealistic training image generation
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Official inference repo for FLUX.1 models
Enjoy the magic of Diffusion models!
[CVPR 2025] StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
CVPR 2024: Language Guided Generation of 3D Embodied AI Environments.
Extensible memoizing collections and decorators
GeoCalib: Learning Single-image Calibration with Geometric Optimization (ECCV 2024)
Reference PyTorch implementation and models for DINOv3
[ICCV 2025] Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
[CVPR 2025] EnvGS: Modeling View-Dependent Appearance with Environment Gaussian
[ICCV 2025] SpatialTrackerV2: 3D Point Tracking Made Easy
Code for "MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training", arXiv 2025.
4DHumans: Reconstructing and Tracking Humans with Transformers
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
DeepStream SDK Python bindings and sample applications
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
Official repository for our work on micro-budget training of large-scale diffusion models.
[SIGGRAPH Asia 2023 (Technical Communications)] EasyVolcap: Accelerating Neural Volumetric Video Research
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
High-resolution models for human tasks.
SkyReels-A2: Compose anything in video diffusion transformers
