DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline
This is the official repository for DEAL-300K. It contains the DEAL-300K dataset along with the other tools mentioned in the paper. The code and related pre-trained models will be released after the paper is accepted.
- Release DEAL-300K dataset.
- Release SAM-CD pre-trained weights.
- Release Qwen-VL pre-trained weights.
- Release MFPT pre-trained weights.
- Release Full Code.
The advent of diffusion-based image editing techniques has revolutionized image manipulation, providing intuitive, semantic-level editing capabilities. These advancements significantly lower the barrier for non-experts to produce high-quality edits but also raise concerns regarding potential misuse. Traditional datasets, primarily focused on binary classification of diffusion-generated images or localization of manual manipulations, do not address the challenges posed by diffusion-based edits, which blend seamlessly with the original content. In response, we introduce the Diffusion-Based Image Editing Area Localization Dataset (DEAL-300K), a novel dataset comprising over 300,000 annotated images specifically designed for diffusion-based image manipulation localization (DIML). Our dataset generation leverages multimodal large language models (MLLMs) for instruction-driven editing, combined with an active learning annotation process, ensuring both diversity and quality at an unprecedented scale. Additionally, we present a novel benchmarking approach that combines Visual Foundation Models (VFMs) with Multi-Frequency Prompt Tuning (MFPT), capturing the intricate details of diffusion-edited regions. Our thorough evaluation highlights the effectiveness of our method, achieving an impressive pixel-level F1 score of
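The pixel-level F1 score used for evaluation compares a predicted binary edit mask against the ground-truth mask. Below is a minimal NumPy sketch of that metric; it is illustrative only and not the paper's evaluation code, and the toy masks are made up:

```python
import numpy as np

def pixel_f1(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Pixel-level F1 between two binary masks of the same shape.

    pred, gt: arrays of {0, 1}, where 1 marks an edited pixel.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # edited pixels correctly found
    fp = np.logical_and(pred, ~gt).sum()   # false alarms
    fn = np.logical_and(~pred, gt).sum()   # missed edits
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return float(2 * precision * recall / (precision + recall + eps))

# Toy example: the prediction covers all 4 edited pixels plus 2 extra ones.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1:3] = 1             # 4 edited pixels
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1           # 6 predicted pixels, 4 of them correct
print(round(pixel_f1(pred, gt), 3))  # 0.8
```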
The dataset has been uploaded to OneDrive.
Training Set Images can be downloaded from train.zip
Validation Set Images can be downloaded from val.zip
Testing Set Images can be downloaded from test.zip
Labels can be downloaded from label.zip
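After unpacking the archives, each image can be paired with its label mask by filename. The directory names and stem-matching convention below are assumptions about the layout, not guaranteed by the release; a minimal sketch:

```python
from pathlib import Path
import tempfile

def pair_images_with_masks(img_dir, mask_dir):
    """Match each image to a mask sharing the same filename stem (assumed convention)."""
    masks = {p.stem: p for p in Path(mask_dir).iterdir() if p.is_file()}
    pairs = []
    for img in sorted(Path(img_dir).iterdir()):
        if img.is_file() and img.stem in masks:
            pairs.append((img, masks[img.stem]))
    return pairs

# Demo on a temporary toy layout standing in for the unpacked train/ and label/ dirs.
with tempfile.TemporaryDirectory() as root:
    img_dir = Path(root, "train"); img_dir.mkdir()
    mask_dir = Path(root, "label"); mask_dir.mkdir()
    for stem in ("000001", "000002"):
        (img_dir / f"{stem}.jpg").touch()
        (mask_dir / f"{stem}.png").touch()
    (img_dir / "000003.jpg").touch()      # image without a mask is skipped
    pairs = pair_images_with_masks(img_dir, mask_dir)
    print(len(pairs))  # 2
```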
2026/1/21: We now also provide a Hugging Face download link.
Our dataset is based on InstructPix2Pix, with all instructions generated by the fine-tuned Qwen-VL. You can view all the instructions in instructions. The original images come from the MS COCO dataset. The word cloud of the editing instructions is shown in the image below.
| Dataset | Year | Source Images | Edited Images | Scenario | Generative Model |
|---|---|---|---|---|---|
| CoCoGlide | 2023 | 512 | 512 | General | GLIDE |
| AutoSplice | 2023 | 2,273 | 3,621 | General | DALL·E 2 |
| MagicBrush | 2023 | 5,313 | 10,388 | General | DALL·E 2 |
| Repaint-P2/CelebA-HQ | 2024 | 10,800 | 41,472 | Face | RePaint |
| DEAL-300K | 2024 Apr | 119,371 | 221,097 | General | InstructPix2Pix |
Some random examples from the training set
Our work is built upon the foundational work of MS COCO, InstructPix2Pix, Qwen-VL, ISAT and SAM-CD.