Add D-FINE Model into Transformers by VladOS95-cyber · Pull Request #36261 · huggingface/transformers

VladOS95-cyber · 2025-02-18T15:08:03Z

What does this PR do?

D-FINE is a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).

It uses the HGNet-V2 (High Performance GPU Net) image classification model as its backbone.

This PR adds D-FINE and HGNet-V2 to the Transformers library. Transformers has a new mechanism called modular, which adds new models by creating a modular_modelname.py file from which the modeling file is generated. Since D-FINE updates several parts of the RT-DETR architecture while keeping the rest of the model unchanged, it is an ideal use case for this modular approach.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. Request to add D-FINE #35283
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@qubvel @Rocketknight1 @ArthurZucker @NielsRogge

VladOS95-cyber · 2025-02-18T15:10:00Z

Hi @qubvel! I decided to create a new PR as the previous one (#35400) was broken, sorry for the mess. This PR contains all the latest changes.
To sum up, I cleaned up the modular files according to your comments. At this step we have two unresolved problems: 1. Wrong namespace import in modeling_d_fine_resnet.py, where the conversion script generates from .configuration_d_fine_resnet_resnet. 2. DFine config conversion using modular does not work properly: it places the call super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs) at the very beginning of __init__, before all the self.args assignments, which is wrong as well.

Cyrilvallez · 2025-02-18T21:06:07Z

Hey, very sorry, I had very low bandwidth last week as I was presenting at a conference! Are you still experiencing issues with how modular works, or is it resolved?

VladOS95-cyber · 2025-02-18T21:09:02Z

Hey @Cyrilvallez! Thank you for your answer. Yes, there are still some issues: 1. Wrong namespace import in modeling_d_fine_resnet.py, where the conversion script generates from .configuration_d_fine_resnet_resnet (one resnet too many). 2. DFine config conversion using modular does not work properly: it places the call super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs) at the very beginning of __init__, before all the self.args assignments, which is wrong as well. It must be placed after all the self.args assignments.

Cyrilvallez · 2025-02-19T19:47:48Z

Ok @VladOS95-cyber, I looked into it, thanks for the feedback 🤗

1. is indeed an issue: you have 2 modular files in the same folder, which was unprecedented, and we were not parsing that correctly. [modular] allow multiple modular files in the same model folder #36287 will solve it
2. This is actually by design, as super().__init__() calls are usually (but not always, true) placed at the beginning. In this instance it should not be an issue, should it? Meaning that I don't think the order has any importance in this case

Cyrilvallez · 2025-02-19T19:54:43Z

Oh, or is 2. an issue because of the attribute_map? I may have seen this issue when reviewing RtDetrV2. But in this case, I would argue that using the attribute_map itself is usually very confusing, and it would definitely be best not to if possible (e.g. by renaming the arg directly)

VladOS95-cyber · 2025-02-19T20:54:12Z

Hi @Cyrilvallez! Well, I suppose it is an issue when the config uses an attribute map, yes. If we call super().__init__(**kwargs) at the beginning and do the self.args assignments later, we override some attribute values that were passed in before. In RTDetrV2 this problem was resolved by implementing the config from scratch, but in our case we reuse RTDetrConfig, so unfortunately the problem came up.
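
The ordering problem described above can be illustrated with a toy sketch. It does not use the real transformers classes: ToyBaseConfig, EarlySuperConfig and LateSuperConfig are hypothetical stand-ins, and the hidden_size to d_model entry is only an assumed example of an attribute_map mapping. The sketch only shows why an early super().__init__(**kwargs) followed by explicit assignments can overwrite a value that arrived through a mapped keyword.

```python
class ToyBaseConfig:
    """Hypothetical stand-in for a config base class with an attribute_map."""

    attribute_map = {}

    def __setattr__(self, name, value):
        # Redirect aliased names (e.g. hidden_size) to their real attribute.
        name = type(self).attribute_map.get(name, name)
        object.__setattr__(self, name, value)

    def __init__(self, **kwargs):
        # Remaining keyword arguments become attributes (aliases included).
        for key, value in kwargs.items():
            setattr(self, key, value)


class EarlySuperConfig(ToyBaseConfig):
    attribute_map = {"hidden_size": "d_model"}

    def __init__(self, d_model=256, **kwargs):
        super().__init__(**kwargs)  # sets d_model from hidden_size=...
        self.d_model = d_model      # ...then the default overwrites it


class LateSuperConfig(ToyBaseConfig):
    attribute_map = {"hidden_size": "d_model"}

    def __init__(self, d_model=256, **kwargs):
        self.d_model = d_model      # default first
        super().__init__(**kwargs)  # aliased kwarg wins afterwards


early = EarlySuperConfig(hidden_size=512)
late = LateSuperConfig(hidden_size=512)
print(early.d_model, late.d_model)
```

In this toy version the value passed as hidden_size survives only when the super().__init__ call comes after the assignments, which is the behavior requested in this thread.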

VladOS95-cyber · 2025-02-19T20:54:41Z

@Cyrilvallez So, what do you suggest in this case?

qubvel

Thanks, @VladOS95-cyber for addressing the comments, refactoring, and adding type hints! I left a few minor comments, but it's in good shape. As the next step, let's also ensure:

Modular issues are resolved (cc @Cyrilvallez)
torch.compile (fullgraph) and torch.export are supported (see comments)
All checkpoints can be converted (please provide links for converted checkpoints on the Hub) and logits/boxes match the original implementation
Model can be fine-tuned (here is an example for RT-DETRv2, it would work with the change of checkpoint path only)
CI is happy: consistency, docs, style issues are resolved

Thanks for the great work!

VladOS95-cyber · 2025-02-25T07:47:33Z

Hi @qubvel! Here is a short summary of what we've achieved so far:

Modular issues are resolved (cc @Cyrilvallez): still in progress. Moreover, I would like to discuss what we are going to do about the config and attribute map problem.
torch.compile (fullgraph) works and compiles the model, but I cannot run the compiled model for inference afterwards: it fails with the exception I provided in the comments. I checked the behavior of other models in this case and it is the same, so I assume either I am doing something wrong or it is expected.
torch.export is supported. I added it to the tests and it ran successfully.
All main checkpoints were converted: https://huggingface.co/vladislavbro/dfine_x_coco, https://huggingface.co/vladislavbro/dfine_l_coco, https://huggingface.co/vladislavbro/dfine_m_coco, https://huggingface.co/vladislavbro/dfine_s_coco. I'll add model cards later. All logits/boxes match the original implementation.
The model can be fine-tuned: I ran the script you provided with the checkpoint changed to a D-FINE one, and it worked.
CI is not happy yet because it depends on the modular conversion issues. Once they are resolved, CI will pass.

qubvel

Hi @VladOS95-cyber, thanks for iterating and addressing comments! I left a few more comments, but in general looks great!

I am having difficulty running fine-tuning for the "m" checkpoint, while the "l" checkpoint is fine. Can you please double-check the modeling code, losses and initialization strategy? Here is the error I got:

/usr/local/lib/python3.11/dist-packages/transformers/loss/loss_for_object_detection.py in generalized_box_iou(boxes1, boxes2)
    416     # so do an early check
    417     if not (boxes1[:, 2:] >= boxes1[:, :2]).all():
--> 418         raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
    419     if not (boxes2[:, 2:] >= boxes2[:, :2]).all():
    420         raise ValueError(f"boxes2 must be in [x0, y0, x1, y1] (corner) format, but got {boxes2}")

ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[4.0848e-02, 6.0155e-02, 9.7476e-01, 9.4098e-01],
        [4.3698e-02, 1.0271e-01, 9.6435e-01, 8.6763e-01],
        [4.0578e-06, 6.4026e-06, 1.4135e-05, 2.2186e-05],
        ...,
        [9.9940e-01, 3.7292e-02, 1.0063e+00, 4.3024e-02],
        [9.8414e-01, 4.3545e-02, 1.0219e+00, 4.8508e-02],
        [7.8416e-01, 6.7056e-01, 8.3837e-01, 7.0462e-01]], device='cuda:0')

P.S. torch.compile(model, fullgraph=True) works fine with my env, thanks!
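
The check that raised the ValueError above requires every box to satisfy x1 >= x0 and y1 >= y0. A minimal pure-Python sketch of that condition follows; the helper names cxcywh_to_xyxy and is_valid_corner_box are hypothetical, and the NaN case is only one possible way for the check to fail, not a confirmed cause of the error in this thread.

```python
def cxcywh_to_xyxy(box):
    """Convert a (center_x, center_y, width, height) box to corner format."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def is_valid_corner_box(box):
    """Same condition as the early check in the traceback: x1 >= x0 and y1 >= y0."""
    x0, y0, x1, y1 = box
    return x1 >= x0 and y1 >= y0


good = cxcywh_to_xyxy((0.5, 0.5, 0.2, 0.4))
negative_width = cxcywh_to_xyxy((0.5, 0.5, -0.2, 0.4))
nan_box = cxcywh_to_xyxy((float("nan"), 0.5, 0.2, 0.4))

for name, box in [("good", good), ("negative_width", negative_width), ("nan", nan_box)]:
    print(name, is_valid_corner_box(box))
```

A negative predicted width or height, or any NaN coordinate (comparisons with NaN are always false), makes the check fail even when the box was produced by a correct format conversion.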

VladOS95-cyber · 2025-02-27T15:29:51Z

@qubvel do you know if anything changed in transformers recently? I cannot load the model for fine-tuning anymore.
One note: I am installing the transformers package from my remote branch: !uv run pip install --upgrade -q -U git+https://github.com/VladOS95-cyber/transformers.git@add-dfine-model . But it worked yesterday.

model = AutoModelForObjectDetection.from_pretrained(
    checkpoint,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True)

It fails with:

RuntimeError: Error(s) in loading state_dict for DFineForObjectDetection:
	size mismatch for model.denoising_class_embed.weight: copying a param with shape torch.Size([81, 256]) from checkpoint, the shape in current model is torch.Size([6, 256]).
	size mismatch for model.enc_score_head.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.enc_score_head.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
	size mismatch for model.decoder.class_embed.0.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.decoder.class_embed.0.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
	size mismatch for model.decoder.class_embed.1.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.decoder.class_embed.1.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
	size mismatch for model.decoder.class_embed.2.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.decoder.class_embed.2.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
	size mismatch for model.decoder.class_embed.3.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.decoder.class_embed.3.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
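
With ignore_mismatched_sizes=True, parameters whose shapes differ between the checkpoint and the model are expected to be skipped and newly initialized instead of raising. The sketch below only illustrates that partitioning on shapes taken from the log above; split_by_shape is a hypothetical helper, not the library's loading code, and model.backbone.toy_weight is an invented entry added to show a matching parameter.

```python
def split_by_shape(checkpoint_shapes, model_shapes):
    """Return (loadable, mismatched) parameter names, comparing shapes only."""
    loadable, mismatched = [], []
    for name, ckpt_shape in checkpoint_shapes.items():
        if name not in model_shapes:
            continue  # unexpected key, ignored in this sketch
        if model_shapes[name] == ckpt_shape:
            loadable.append(name)
        else:
            mismatched.append(name)
    return loadable, mismatched


# Shapes from the error log: 80 COCO classes in the checkpoint, 5 in the new model.
checkpoint_shapes = {
    "model.denoising_class_embed.weight": (81, 256),
    "model.enc_score_head.weight": (80, 256),
    "model.enc_score_head.bias": (80,),
    "model.backbone.toy_weight": (64, 3),  # hypothetical matching parameter
}
model_shapes = {
    "model.denoising_class_embed.weight": (6, 256),
    "model.enc_score_head.weight": (5, 256),
    "model.enc_score_head.bias": (5,),
    "model.backbone.toy_weight": (64, 3),
}

loadable, mismatched = split_by_shape(checkpoint_shapes, model_shapes)
print("load from checkpoint:", loadable)
print("re-initialize:", mismatched)
```

Only the class-dependent heads end up in the mismatched group, which is why the RuntimeError above points to a loading regression rather than to the model definition.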

qubvel · 2025-02-27T18:21:52Z

@VladOS95-cyber hmm, yes, there were some PRs regarding from_pretrained refactoring. It might be broken, let me check

qubvel · 2025-02-28T15:30:02Z

PR with fix merged 👍 thanks for reporting

VladOS95-cyber · 2025-03-05T12:41:09Z

Hi @qubvel! Latest state: I added tests for resnet. I have checked the modeling code many times but could not find anything that might be the root cause of this fine-tuning issue. I made a small fix but it did not help, and the behavior is weird: the loss calculation fails at a different step each time and sometimes does not fail at all. The returned scores and boxes are the same as in the original implementation. There are some differences in the states, yes, but there is a logical explanation for that and it should not be a problem. Unfortunately, I am a little bit stuck right now...

qubvel · 2025-03-05T18:04:03Z

Hey @VladOS95-cyber, thanks for checking. I suppose it's not a blocker, and maybe we can figure it out later. At least inference works for all models, and fine-tuning for bigger models also works fine. I will take a deep dive next week and try to resolve the remaining issues to get this merged.

VladOS95-cyber · 2025-03-05T18:24:02Z

@qubvel ok, thank you. Basically, everything else is fine. There are just a couple of unresolved modular problems left. I think they can be resolved easily; we just need to make a decision about them.

VladOS95-cyber · 2025-03-10T12:11:11Z

Hey @qubvel! I am just curious, I found an issue #36516. Could it be the case that an existing issue with fine-tuning for D-FINE is related to this one?

qubvel · 2025-03-11T12:08:17Z

Well, it might be the issue; however, fine-tuning other models works fine with this dataset. This model might be a bit more sensitive to noisy labels, let me try running with a clean one

qubvel · 2025-04-08T11:04:52Z

run-slow: d_fine, hgnet_v2

github-actions · 2025-04-08T11:06:11Z

This comment contains run-slow, running the specified jobs:

models: ['models/d_fine', 'models/hgnet_v2']
quantizations: [] ...

qubvel · 2025-04-08T11:56:23Z

cc @ArthurZucker for final review

The model is based on RT-DETR. The loss implementation is different, so we added some additional attributes to the RT-DETR output to propagate it through the model and avoid overriding all corresponding forward methods. It's a bit against our philosophy, but I believe it is worth it in order to make the code more standard and modular

qubvel · 2025-04-08T12:06:02Z

run-slow: d_fine, hgnet_v2

ArthurZucker

Happy to merge if you like it @qubvel ! 🤗
LGTM great review and great work!

HuggingFaceDocBuilderDev · 2025-04-29T10:42:24Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qubvel · 2025-04-29T11:19:17Z

Merging! @VladOS95-cyber thanks a lot for contributing and iterating on the PR 🤗 Really huge work done!

VladOS95-cyber · 2025-04-29T11:29:48Z

Thank you very much for your support and review!

* copy the last changes from broken PR
* small format
* some fixes and refactoring after review
* format
* add config attr for loss
* some fixes and refactoring
* fix copies
* fix style
* add test for d-fine resnet
* fix decoder layer prop
* fix dummies
* format init
* remove extra print
* refactor modeling, move resnet into separate folder
* fix resnet config
* change resnet on hgnet_v2, add clamp into decoder
* fix init
* fix config doc
* fix init
* fix dummies
* fix config docs
* fix hgnet_v2 config typo
* format modular
* add image classification for hgnet, some refactoring
* format tests
* fix dummies
* fix init
* fix style
* fix init for hgnet v2
* fix index.md, add init rnage for hgnet
* fix conversion
* add missing attr to encoder
* add loss for d-fine, add additional output for rt-detr decoder
* tests and docs fixes
* fix rt_detr v2 conversion
* some fixes for loos and decoder output
* some fixes for loss
* small fix for converted modeling
* add n model config, some todo comments for modular
* convert script adjustments and fixes, small refact
* remove extra output for rt_detr
* make some outputs optionsl, fix conversion
* some posr merge fixes
* small fix
* last field fix
* fix not split for hgnet_v2
* disable parallelism test for hgnet_v2 image classification
* skip multi gpu for d-fine
* adjust after merge init
* remove extra comment
* fix repo name references
* small fixes for tests
* Fix checkpoint path
* Fix consistency
* Fixing docs
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

Matvezy · 2025-06-20T21:17:55Z

Hello @VladOS95-cyber @qubvel thanks a lot for adding D-FINE to transformers! I'm interested in fine-tuning and benchmarking your implementation on a custom dataset. I found a fine-tuning notebook example here; however, I noticed that it doesn't fully follow the original D-FINE fine-tuning config. Specifically, it does not use the same augmentations or optimizer parameters, and it does not use EMA as the original D-FINE configs here do.
I wanted to ask if you previously tried fine-tuning with the original D-FINE recipe? I suspect that not having it in place may result in a lower benchmark score, so I wanted to make sure I try fine-tuning with the original recipe as well. Right now I'm just adding those changes myself, but I wanted to see if you may already have anything you tried that I can use?

qubvel · 2025-06-23T12:23:11Z

Hi @Matvezy! We did not adapt the full fine-tuning recipe for D-FINE. The notebook you mentioned is just a starting point that can be significantly improved with many techniques. We would appreciate any contribution, e.g. if you have some results on your fine-tuning or upgrade the notebook 🤗
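
For readers unfamiliar with the EMA mentioned above, here is a minimal sketch of an exponential moving average over model weights, using plain Python lists in place of tensors. The ema_update helper and the decay value are assumptions for illustration only and are not taken from the original D-FINE configs.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, model_weights)]


# Toy "training loop": the raw weights jump around, the EMA copy moves slowly.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.2, 1.8], [0.9, 2.1]):
    ema = ema_update(ema, step_weights, decay=0.9)

print(ema)
```

The averaged copy is typically the one used for evaluation, while the optimizer keeps updating the raw weights.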

VladOS95-cyber force-pushed the add-dfine-model branch from fa6144a to 4bf1909 Compare February 18, 2025 15:24

Cyrilvallez mentioned this pull request Feb 19, 2025

[modular] allow multiple modular files in the same model folder #36287

Open

qubvel added New model Vision labels Feb 20, 2025

qubvel self-requested a review February 21, 2025 11:54

qubvel reviewed Feb 21, 2025

VladOS95-cyber force-pushed the add-dfine-model branch 2 times, most recently from a78006a to 90a3536 Compare February 22, 2025 12:15

qubvel self-requested a review February 25, 2025 09:06

qubvel reviewed Feb 26, 2025

VladOS95-cyber force-pushed the add-dfine-model branch from 45f9f76 to 95c15f0 Compare February 27, 2025 15:13

qubvel mentioned this pull request Feb 27, 2025

Fix loading models with mismatched sizes #36463

Merged

VladOS95-cyber force-pushed the add-dfine-model branch 2 times, most recently from 5b2be76 to 2817dac Compare March 4, 2025 19:05

qubvel requested a review from ArthurZucker April 8, 2025 11:05

VladOS95-cyber added 2 commits April 8, 2025 13:43

disable parallelism test for hgnet_v2 image classification

730a45f

skip multi gpu for d-fine

dff2e53

huggingface deleted a comment from github-actions Bot Apr 8, 2025

ArthurZucker approved these changes Apr 11, 2025

Comment thread src/transformers/models/d_fine/modular_d_fine.py Outdated

VladOS95-cyber and others added 7 commits April 14, 2025 16:06

Merge remote-tracking branch 'upstream/main' into add-dfine-model

98921c2

adjust after merge init

3575df5

remove extra comment

3dc9bf8

Merge remote-tracking branch 'upstream/main' into add-dfine-model

b857d7f

fix repo name references

95dc472

small fixes for tests

706a851

Merge branch 'main' into add-dfine-model

f981b95

qubvel self-requested a review April 29, 2025 08:57

qubvel added 3 commits April 29, 2025 09:01

Fix checkpoint path

d8267e5

Fix consistency

ebaec43

Fixing docs

cf2396c

qubvel merged commit 4abeb50 into huggingface:main Apr 29, 2025
20 checks passed

This was referenced Apr 30, 2025

Add ONNX export support for D-FINE huggingface/optimum#2249

Merged

Add support for D-FINE huggingface/transformers.js#1303

Merged

NielsRogge mentioned this pull request May 11, 2025

Request to add D-FINE #35283

Closed

2 tasks
