
DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline

Description

This is the official repository for DEAL-300K. It contains the DEAL-300K dataset along with the other tools described in the paper. The code and related pre-trained models will be released after the paper is accepted.

TODO List

  • Release DEAL-300K dataset.
  • Release SAM-CD pre-trained weights.
  • Release Qwen-VL pre-trained weights.
  • Release MFPT pre-trained weights.
  • Release full code.

Overview

The advent of diffusion-based image editing techniques has revolutionized image manipulation, providing intuitive, semantic-level editing capabilities. These advancements significantly lower the barrier for non-experts to produce high-quality edits but also raise concerns regarding potential misuse. Traditional datasets, primarily focused on binary classification of diffusion-generated images or localization of manual manipulations, do not address the challenges posed by diffusion-based edits, which blend seamlessly with the original content. In response, we introduce the Diffusion-Based Image Editing Area Localization Dataset (DEAL-300K), a novel dataset comprising over 300,000 annotated images specifically designed for diffusion-based image manipulation localization (DIML). Our dataset generation leverages multimodal large language models (MLLMs) for instruction-driven editing, combined with an active learning annotation process, ensuring both diversity and quality at an unprecedented scale. Additionally, we present a novel benchmarking approach that combines Visual Foundation Models (VFMs) with Multi-Frequency Prompt Tuning (MFPT), capturing the intricate details of diffusion-edited regions. Our thorough evaluation highlights the effectiveness of our method, achieving an impressive pixel-level F1 score of 82.56% on our specialized test set and 80.97% on the external CoCoGlide set, demonstrating its strong performance across different datasets.
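For reference, the pixel-level F1 reported above can be sketched as follows. This is an illustrative snippet, not the paper's evaluation code; `pred` and `gt` are hypothetical binary masks where 1 marks an edited pixel.

```python
import numpy as np

def pixel_f1(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pixel-level F1 between two binary masks (1 = edited region)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # edited pixels found
    fp = np.logical_and(pred, ~gt).sum()   # false alarms
    fn = np.logical_and(~pred, gt).sum()   # missed edited pixels
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy masks: prediction and ground truth overlap in 2 of 3 marked pixels each.
pred = np.zeros((4, 4), dtype=np.uint8)
gt = np.zeros((4, 4), dtype=np.uint8)
pred[0, :3] = 1
gt[0, 1:4] = 1
print(round(pixel_f1(pred, gt), 4))
```

In practice, per-image F1 scores would be averaged over the test set; whether the paper averages per image or pools pixels globally is not specified here.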

DEAL-300K dataset

Download

The dataset has been uploaded to OneDrive.

Training set images can be downloaded from train.zip.

Validation set images can be downloaded from val.zip.

Testing set images can be downloaded from test.zip.

Labels can be downloaded from label.zip.

2026/1/21: We now also provide a Hugging Face download link.
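A minimal sketch for unpacking the archives after download. The archive names follow the links above, but the output folder `DEAL-300K` and the directory layout inside the zips are assumptions:

```python
import zipfile
from pathlib import Path

# Archive names match the download links above; adjust paths as needed.
archives = ["train.zip", "val.zip", "test.zip", "label.zip"]
out_dir = Path("DEAL-300K")
out_dir.mkdir(exist_ok=True)

for name in archives:
    if Path(name).exists():
        # Extract each split into its own subfolder, e.g. DEAL-300K/train/.
        with zipfile.ZipFile(name) as zf:
            zf.extractall(out_dir / Path(name).stem)

print(sorted(p.name for p in out_dir.iterdir()))
```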

Instructions

Our dataset is based on InstructPix2Pix, with all instructions generated by the fine-tuned Qwen-VL. You can view all the instructions in instructions. The original images come from the MS COCO dataset. The word cloud of the editing instructions is shown in the image below.

Word Cloud

Quantitative comparison of DEAL-300K with existing publicly available DIML datasets.

| Dataset | Year | Source Images | Edited Images | Image Size | Scenario | Generative Model |
| --- | --- | --- | --- | --- | --- | --- |
| CoCoGlide | 2023 | 512 | 512 | $256 \times 256$ | General | Glide |
| AutoSplice | 2023 | 2,273 | 3,621 | $256 \times 256$ – $4232 \times 4232$ | General | DALL-E2 |
| MagicBrush | 2023 | 5,313 | 10,388 | $1024 \times 1024$ | General | DALL-E2 |
| Repaint-P2/CelebA-HQ | 2024 | 10,800 | 41,472 | $256 \times 256$ | Face | Repaint |
| DEAL-300K | 2024 Apr | 119,371 | 221,097 | $128 \times 512$ – $512 \times 576$ | General | InstructPix2Pix |

Visualization

Some random examples from the training set

[Grid of 18 example images from the training set]

Acknowledgments

Our work builds on MS COCO, InstructPix2Pix, Qwen-VL, ISAT, and SAM-CD.
