DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline
This is the official repository for DEAL-300K. It contains the DEAL-300K dataset along with the other tools mentioned in the paper. The code and related pre-trained models will be released after the paper is accepted.
- Release DEAL-300K dataset.
- Release SAM-CD pre-trained weights.
- Release Qwen-VL pre-trained weights.
- Release MFPT pre-trained weights.
- Release Full Code.
The advent of diffusion-based image editing techniques has revolutionized image manipulation, providing intuitive, semantic-level editing capabilities. These advancements significantly lower the barrier for non-experts to produce high-quality edits but also raise concerns regarding potential misuse. Traditional datasets, primarily focused on binary classification of diffusion-generated images or localization of manual manipulations, do not address the challenges posed by diffusion-based edits, which blend seamlessly with the original content. In response, we introduce the Diffusion-Based Image Editing Area Localization Dataset (DEAL-300K), a novel dataset comprising over 300,000 annotated images specifically designed for diffusion-based image manipulation localization (DIML). Our dataset generation leverages multimodal large language models (MLLMs) for instruction-driven editing, combined with an active learning annotation process, ensuring both diversity and quality at an unprecedented scale. Additionally, we present a novel benchmarking approach that combines Visual Foundation Models (VFMs) with Multi-Frequency Prompt Tuning (MFPT), capturing the intricate details of diffusion-edited regions. Our thorough evaluation highlights the effectiveness of our method, achieving an impressive pixel-level F1 score of
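The pixel-level F1 score used for evaluation compares a predicted binary edit mask against the ground-truth mask. Below is a minimal NumPy sketch of that metric; it is illustrative only and not the paper's evaluation code, and the toy masks are made up:

```python
import numpy as np

def pixel_f1(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Pixel-level F1 between two binary masks of the same shape.

    pred, gt: arrays of {0, 1}, where 1 marks an edited pixel.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # edited pixels correctly found
    fp = np.logical_and(pred, ~gt).sum()   # false alarms
    fn = np.logical_and(~pred, gt).sum()   # missed edits
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return float(2 * precision * recall / (precision + recall + eps))

# Toy example: the prediction covers all 4 edited pixels plus 2 extra ones.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1:3] = 1             # 4 edited pixels
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1           # 6 predicted pixels, 4 of them correct
print(round(pixel_f1(pred, gt), 3))  # 0.8
```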
The dataset has been uploaded to OneDrive.
Training Set Images can be downloaded from train.zip
Validation Set Images can be downloaded from val.zip
Testing Set Images can be downloaded from test.zip
Labels can be downloaded from label.zip
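After unpacking the archives, each image can be paired with its label mask by filename. The directory names and stem-matching convention below are assumptions about the layout, not guaranteed by the release; a minimal sketch:

```python
from pathlib import Path
import tempfile

def pair_images_with_masks(img_dir, mask_dir):
    """Match each image to a mask sharing the same filename stem (assumed convention)."""
    masks = {p.stem: p for p in Path(mask_dir).iterdir() if p.is_file()}
    pairs = []
    for img in sorted(Path(img_dir).iterdir()):
        if img.is_file() and img.stem in masks:
            pairs.append((img, masks[img.stem]))
    return pairs

# Demo on a temporary toy layout standing in for the unpacked train/ and label/ dirs.
with tempfile.TemporaryDirectory() as root:
    img_dir = Path(root, "train"); img_dir.mkdir()
    mask_dir = Path(root, "label"); mask_dir.mkdir()
    for stem in ("000001", "000002"):
        (img_dir / f"{stem}.jpg").touch()
        (mask_dir / f"{stem}.png").touch()
    (img_dir / "000003.jpg").touch()      # image without a mask is skipped
    pairs = pair_images_with_masks(img_dir, mask_dir)
    print(len(pairs))  # 2
```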
2026/1/21: We now also provide a Hugging Face download link.
Our dataset is based on InstructPix2Pix, with all instructions generated by the fine-tuned Qwen-VL. You can view all the instructions in instructions. The original images come from the MS COCO dataset. The word cloud of the editing instructions is shown in the image below.
| Dataset | Year | Source Images | Edited Images | Scenario | Generative Model |
|---|---|---|---|---|---|
| CoCoGlide | 2023 | 512 | 512 | General | GLIDE |
| AutoSplice | 2023 | 2,273 | 3,621 | General | DALL·E 2 |
| MagicBrush | 2023 | 5,313 | 10,388 | General | DALL·E 2 |
| Repaint-P2/CelebA-HQ | 2024 | 10,800 | 41,472 | Face | RePaint |
| DEAL-300K | 2024 Apr | 119,371 | 221,097 | General | InstructPix2Pix |
Some random examples from the training set
Our work is built upon the foundational work of MS COCO, InstructPix2Pix, Qwen-VL, ISAT and SAM-CD.