T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting (CVPR2025)

Paper (ArXiv)

Official implementation of the CVPR 2025 paper T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting.

Preparation

Environment: Create a virtual environment with Anaconda and install all dependencies:

conda env create -f environment.yaml

Data: We conduct experiments on three datasets; download whichever ones you want to test. The datasets can be downloaded from: FSC-147 | CARPK. Note that the annotations of FSC-147 must be downloaded separately from their repo.

Extract the downloaded data into the data/ directory. The complete file structure should look like the tree below. You do not need every dataset for evaluation, but FSC-147 is required to train the model.

data
├─CARPK/
│  ├─Annotations/
│  ├─Images/
│  ├─ImageSets/
│
├─FSC/    
│  ├─gt_density_map_adaptive_384_VarV2/
│  ├─images_384_VarV2/
│  ├─FSC_147/
│  │  ├─ImageClasses_FSC147.txt
│  │  ├─Train_Test_Val_FSC_147.json
│  │  ├─annotation_FSC147_384.json
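
Before training or evaluating, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper `missing_paths` and the two path lists are hypothetical names, and the paths are copied from the tree above on the assumption that `data/` sits in the current working directory.

```python
from pathlib import Path

# Paths copied from the directory tree above, relative to the data/ root.
FSC_PATHS = [
    "FSC/gt_density_map_adaptive_384_VarV2",
    "FSC/images_384_VarV2",
    "FSC/FSC_147/ImageClasses_FSC147.txt",
    "FSC/FSC_147/Train_Test_Val_FSC_147.json",
    "FSC/FSC_147/annotation_FSC147_384.json",
]
CARPK_PATHS = [
    "CARPK/Annotations",
    "CARPK/Images",
    "CARPK/ImageSets",
]


def missing_paths(data_root, expected):
    """Return the entries of `expected` that do not exist under `data_root`."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root, next to data/.
    for name, expected in (("FSC-147", FSC_PATHS), ("CARPK", CARPK_PATHS)):
        missing = missing_paths("data", expected)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{name}: {status}")
```

Only the FSC-147 entries need to pass for training; the CARPK entries matter only if you evaluate on CARPK.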

Stable Diffusion: Our model is developed by fine-tuning Stable Diffusion v1.5, whose original weights can be downloaded from here. Please put the downloaded weight file in the configs/ dir.

FSC-147-S-v2

During the review process, the reviewers raised concerns regarding the dataset. In response, we reassessed it thoroughly and introduced a revised version, FSC-147-S-v2. The updated version adds a further set of images, bringing the total to 230, so the statistics of v2 differ from those originally reported in the paper. In this new subset, the objects originally annotated in these images in FSC-147 have an average count of 44.98, while the newly annotated objects have an average count of 3.96. The results of the baseline methods and our method are provided here. The updated dataset (v2) is in FSC-147-S.json; the original subset used in the paper can be downloaded here. We sincerely apologize for any confusion caused.

Method	MAE	RMSE
CLIP-Count	45.59	98.96
CountX	28.67	89.18
VLCounter	33.10	69.34
PseCo	30.53	43.92
DAVE	46.36	97.11
T2ICount (Ours)	5.99	10.55

We hope that this small subset can serve as an evaluation set to verify whether a model is truly performing zero-shot object counting.
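
For readers comparing their own model against the table above, the two reported metrics can be computed from per-image counts as sketched below. This uses the standard definitions of mean absolute error and root mean squared error; it is an illustrative sketch, not the repository's evaluation code, and the function name `mae_rmse` and the sample counts are hypothetical.

```python
import math


def mae_rmse(predicted, ground_truth):
    """Return (MAE, RMSE) over paired per-image object counts."""
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - g for p, g in zip(predicted, ground_truth)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Hypothetical counts, for illustration only (not taken from the dataset).
pred = [3.0, 5.0, 10.0]
gt = [4, 5, 6]
mae, rmse = mae_rmse(pred, gt)
print(f"MAE={mae:.2f} RMSE={rmse:.2f}")
```

Because RMSE squares each error before averaging, a few images with large miscounts raise it much more than they raise MAE.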

Train

Once you have prepared the data and the pretrained weights of SD1.5, you can train the model using the following command.

We have tested the reproducibility of this code and obtained consistent results; the training log is provided along with the reproduced model.

CUDA_VISIBLE_DEVICES=0 python train.py --content exp --crop-size 384 --concat-size 224 --data-dir data/FSC --batch-size 16 --lr 5e-5 --weight-decay 5e-5

Evaluation and the pretrained model

We provide a pretrained checkpoint of our full model, which is the exact model used to produce the results reported in the paper.

FSC val MAE	FSC val RMSE	FSC test MAE	FSC test RMSE	CARPK MAE	CARPK RMSE
13.78	58.78	11.76	97.86	8.61	13.47

FSC S-v2 MAE	FSC S-v2 RMSE
5.99	10.55

Set --data to fsc147 or carpk, depending on the dataset to evaluate:

CUDA_VISIBLE_DEVICES=0 python test.py --model-path ./best_model_paper.pth --data fsc147 --batch-size 16 --dataset_type FSC --ckpt path/to/model.ckpt

Gallery

Citation

Please consider citing our paper if you find it useful in your research :).

@inproceedings{qian2025t2icount,
  title={T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting},
  author={Qian, Yifei and Guo, Zhongliang and Deng, Bowen and Lei, Chun Tong and Zhao, Shuai and Lau, Chun Pong and Hong, Xiaopeng and Pound, Michael P},
  year={2025},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}
}
