Can Machines Understand Composition? A Dataset and Benchmark for Photographic Image Composition Embedding and Understanding
CVPR 2025 Highlight
CVPR 2025 Paper
Supplementary Appendix
PICD is a large-scale dataset for photographic image composition analysis, currently containing 49,123 high-quality images annotated with 24 composition categories.
This dataset is intended to support the evaluation and advancement of composition learning in AI models. It is applicable to a wide range of tasks, including aesthetic quality assessment, composition-aware image cropping, and more. We encourage researchers and practitioners to explore creative uses of PICD.
The composition label system is structured along two axes:
- Element Types: Points, Lines, and Shapes (inspired by Kandinsky's principles)
- Arrangement Patterns: Rule of Thirds, Centered, Diagonal, Vertical, Horizontal, Triangle, C-curve, O-curve, S-curve, Radial, Dense, Scatter, etc.
For detailed category definitions and label design, please refer to the Appendix.
Figure 1. The PICD label system is structured along two axes: element types and arrangement patterns. Column 1 (green) shows arrangement types; Columns 2-4 show compositional element types. Categories are numbered 1-24, with abbreviations in blue. Red boxes indicate merged categories; blue strikethroughs mark categories excluded due to low frequency. Column 5 highlights dominant compositional factors.
Figure 2. Sample images from the 24 composition categories in PICD. Category abbreviations appear in blue parentheses.
PICD is actively maintained and will continue to be expanded. The current release includes:
- 49,123 images (released)
- Verified composition category annotations (released)
- Negative samples (images not conforming to any predefined category): coming soon
- Composition quality scores: coming soon
- Textual composition descriptions: coming soon
PICD consists of both image files and annotations.
1) Direct Access:
Image download is divided into two parts based on licensing:
Part 1: Images with redistribution permission
- This includes 44,577 images from open platforms and redistributable open-source datasets (e.g., Unsplash, Pexels).
- These images can be downloaded directly via the following link:
Baidu Netdisk download link (extraction code: 1517)
Google Drive download link
Part 2: Images requiring user-side access
- This includes 4,546 images from public datasets that do not permit redistribution (e.g., AVA).
- For these, we provide a mapping file that links each PICD-assigned image ID to the original dataset image ID or URL. You may download the original images from their respective sources using this mapping:
Image ID Mapping File
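As a rough illustration of working with such a mapping, the sketch below separates entries that point to a URL from entries that carry an original dataset image ID. The file layout is not documented here, so everything about it is an assumption: a CSV with a header row, the hypothetical column names `picd_id` and `source`, and the sample rows themselves. Adjust these to match the actual mapping file.

```python
import csv
import io
from urllib.parse import urlparse


def is_url(value):
    """Return True if the value looks like an http(s) URL."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def split_mapping(handle, id_col="picd_id", source_col="source"):
    """Split mapping rows into URL entries and source-dataset-ID entries.

    The column names are hypothetical; pass the real ones as arguments.
    """
    by_url, by_source_id = {}, {}
    for row in csv.DictReader(handle):
        picd_id = row[id_col]
        source = row[source_col].strip()
        target = by_url if is_url(source) else by_source_id
        target[picd_id] = source
    return by_url, by_source_id


# Made-up sample rows, for illustration only.
SAMPLE = """picd_id,source
picd_9001,https://example.org/photos/123.jpg
picd_9002,953619
"""

if __name__ == "__main__":
    urls, ids = split_mapping(io.StringIO(SAMPLE))
    print("download by URL:", urls)
    print("look up in source dataset:", ids)
```

URL entries can then be fetched directly, while ID entries need to be resolved against a local copy of the source dataset (e.g., AVA) obtained under that dataset's own terms.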
2) Alternative Access:
If you prefer to request both parts directly via email (especially for convenience or if you encounter access issues), please send a message to picd2025@outlook.com with your affiliation and intended use. We will respond with the download links after reviewing your request.
Accessing or using the dataset in any way implies agreement to the PICD Dataset Terms of Use (PDF).
- Image Annotation File
Download Annotations
This CSV file contains the following fields:
- img_id: The PICD image ID
- category_id: Index of the composition category (1-24)
- category_abbre: Abbreviated category label (as shown in Figure 2)
- category_full_name: Full name of the composition category
The mapping among category_id, category_abbre, and category_full_name follows the structure shown in Figure 1.
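A minimal sketch of reading the annotation file and counting images per category, using only the Python standard library. It assumes the CSV is comma-delimited with a header row whose names match the four fields listed above; the sample rows, abbreviations, and category names below are made up for illustration and do not come from PICD.

```python
import csv
import io
from collections import Counter

# Field names as documented for the annotation CSV.
FIELDS = ("img_id", "category_id", "category_abbre", "category_full_name")


def load_annotations(handle):
    """Read annotation rows, checking the header and the 1-24 id range."""
    reader = csv.DictReader(handle)
    missing = [f for f in FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        category = int(row["category_id"])
        if not 1 <= category <= 24:
            raise ValueError(
                f"category_id out of range for {row['img_id']}: {category}"
            )
        row["category_id"] = category
        rows.append(row)
    return rows


def count_per_category(rows):
    """Return a Counter mapping category_id to number of images."""
    return Counter(row["category_id"] for row in rows)


# Made-up sample rows, for illustration only.
SAMPLE = """img_id,category_id,category_abbre,category_full_name
picd_0001,1,AB1,Example Category One
picd_0002,1,AB1,Example Category One
picd_0003,7,AB7,Example Category Seven
"""

if __name__ == "__main__":
    rows = load_annotations(io.StringIO(SAMPLE))
    for category, n in sorted(count_per_category(rows).items()):
        print(category, n)
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded annotation CSV; the encoding is also an assumption.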
PICD is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
All users must also agree to the dataset-specific terms of use:
PICD Dataset Terms of Use (PDF)
If you use PICD in your research, please cite:
@inproceedings{zhao2025can,
title={Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding},
author={Zhao, Zhaoran and Lu, Peng and Zhang, Anran and Li, Peipei and Li, Xia and Liu, Xuannan and Hu, Yang and Chen, Shiyi and Wang, Liwei and Guo, Wenhao},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={14411--14421},
year={2025}
}
