
Add D-FINE Model into Transformers #36261

Merged
qubvel merged 58 commits into huggingface:main from VladOS95-cyber:add-dfine-model on Apr 29, 2025

Conversation

VladOS95-cyber (Contributor) commented Feb 18, 2025

What does this PR do?

D-FINE is a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).
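To make the idea behind FDR concrete: instead of regressing each box edge directly, the model predicts a probability distribution over discrete offset bins for each edge, and the refined edge is obtained from the expectation of that distribution. Below is a simplified pure-Python sketch of one such step. It assumes uniformly spaced bins and a single refinement. The real D-FINE uses a non-uniform weighting function and refines iteratively across decoder layers, and GO-LSD is not shown. All names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits, bin_values):
    """Expectation of the bin values under the softmax distribution."""
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, bin_values))


def refine_distances(distances, box_size, edge_logits, bin_values):
    """Refine (top, bottom, left, right) distances from an anchor point.

    Each offset is the expectation over bins, scaled by the box height for
    top/bottom and by the box width for left/right.
    """
    height, width = box_size
    scales = (height, height, width, width)
    return tuple(
        d + s * expected_offset(logits, bin_values)
        for d, s, logits in zip(distances, scales, edge_logits)
    )


# Hypothetical uniform bins; D-FINE itself uses a non-uniform weighting function.
BINS = [-0.5, -0.25, 0.0, 0.25, 0.5]

flat = [0.0] * 5                      # uniform distribution -> offset close to 0
peaked = [0.0, 0.0, 0.0, 0.0, 50.0]   # almost all mass on the +0.5 bin
print(refine_distances((10.0, 10.0, 20.0, 20.0), (20.0, 40.0), [flat, flat, peaked, flat], BINS))
```

Because each edge is an expectation over a distribution, a flat distribution expresses uncertainty about that edge and a peaked one expresses confidence, rather than collapsing everything into a single regressed number.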

As its backbone, D-FINE uses HGNet-V2 (High Performance GPU Net), an image classification model.

This PR adds D-FINE and HGNet-V2 to the Transformers library. It uses modular, a relatively new mechanism in Transformers: a new model is written as a modular_modelname.py file that inherits from existing model code, and the full modeling_modelname.py file is generated from it. Since D-FINE updates several parts of the RT-DETR architecture while keeping the rest of the model unchanged, it is an ideal use case for this modular approach.

Before submitting

Who can review?

@qubvel @Rocketknight1 @ArthurZucker @NielsRogge

VladOS95-cyber (Contributor, Author) commented Feb 18, 2025

Hi @qubvel! I decided to create a new PR because the previous one (#35400) was broken; sorry for the mess. This PR contains all the latest changes.
To sum up, I cleaned up the modular files according to your comments. At this stage there are two unresolved problems:
  1. Wrong import path in modeling_d_fine_resnet.py: the conversion script generates from .configuration_d_fine_resnet_resnet.
  2. Converting the DFine config with modular does not work properly: the generated code calls super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs) at the very beginning of __init__, before all the self.<arg> assignments, which is also wrong.

Cyrilvallez (Member) commented

Hey, very sorry, I had very low bandwidth last week as I was presenting at a conference! Are you still experiencing issues with how modular works, or is it resolved?

VladOS95-cyber (Contributor, Author) commented Feb 18, 2025

Hey @Cyrilvallez! Thank you for your answer. Yes, there are still some issues:
  1. Wrong import path in modeling_d_fine_resnet.py: the conversion script generates from .configuration_d_fine_resnet_resnet (one "resnet" too many).
  2. Converting the DFine config with modular does not work properly: the generated code calls super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs) at the very beginning of __init__, before all the self.<arg> assignments. It must be placed after all of them.

Cyrilvallez (Member) commented Feb 19, 2025

Ok @VladOS95-cyber, I looked into it, thanks for the feedback 🤗

  1. is indeed an issue, as you have 2 modular files in the same folder, this was unprecedented and we were not parsing correctly: [modular] allow multiple modular files in the same model folder #36287 will solve it
  2. This is actually by design, as super().__init__() calls are usually (though not always) placed at the beginning. In this instance it should not be an issue, should it? I don't think the order matters in this case

Cyrilvallez (Member) commented

Oh, or is 2. an issue because of the attribute_map? I may have seen this issue when reviewing RtDetrV2. But in this case, I would argue that using the attribute_map itself is usually very confusing, and it would definitely be best not to if possible (e.g. by renaming the arg directly)

VladOS95-cyber (Contributor, Author) commented Feb 19, 2025

Hi @Cyrilvallez! Well, I think it is an issue when a config uses attribute_map, yes. If we call super().__init__(**kwargs) at the beginning and do the self.<arg> assignments afterwards, we override attribute values that were passed in through kwargs. In RTDetrV2 this problem was avoided by implementing the config from scratch, but here we reuse RTDetrConfig, so unfortunately the problem comes up.
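The ordering problem can be reproduced without Transformers. The toy classes below are a hypothetical, simplified stand-in for how a config with attribute_map behaves: writing an alias such as hidden_size is redirected to the real attribute (d_model), and the base __init__ turns leftover kwargs into attributes. If super().__init__ runs first, a later self.d_model = d_model assignment silently discards the value passed via the alias.

```python
class ToyBaseConfig:
    """Hypothetical, simplified stand-in for a config base class with aliases."""

    attribute_map: dict = {}

    def __init__(self, **kwargs):
        # Leftover kwargs become attributes (aliases are redirected in __setattr__).
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, key, value):
        # Writing an alias (e.g. hidden_size) writes the real attribute (d_model).
        super().__setattr__(type(self).attribute_map.get(key, key), value)

    def __getattr__(self, key):
        # Only called when normal lookup fails: resolve aliases on read.
        real = type(self).attribute_map.get(key)
        if real is None:
            raise AttributeError(key)
        return getattr(self, real)


class EarlySuperConfig(ToyBaseConfig):
    attribute_map = {"hidden_size": "d_model"}

    def __init__(self, d_model=256, **kwargs):
        super().__init__(**kwargs)  # hidden_size=128 sets d_model = 128 ...
        self.d_model = d_model      # ... then the default 256 overwrites it


class LateSuperConfig(ToyBaseConfig):
    attribute_map = {"hidden_size": "d_model"}

    def __init__(self, d_model=256, **kwargs):
        self.d_model = d_model
        super().__init__(**kwargs)  # alias kwargs are applied last, so they win


print(EarlySuperConfig(hidden_size=128).d_model)  # 256: the passed value is lost
print(LateSuperConfig(hidden_size=128).d_model)   # 128
```

Moving the call after the assignments restores the passed value; the other option raised in this thread is to drop the alias and accept the real argument name directly.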

VladOS95-cyber (Contributor, Author) commented Feb 19, 2025

@Cyrilvallez So, what do you suggest in this case?

qubvel (Contributor) left a comment

Thanks, @VladOS95-cyber for addressing the comments, refactoring, and adding type hints! I left a few minor comments, but it's in good shape. As the next step, let's also ensure:

  • Modular issues are resolved (cc @Cyrilvallez)
  • torch.compile (fullgraph) and torch.export are supported (see comments)
  • All checkpoints can be converted (please provide links for converted checkpoints on the Hub) and logits/boxes match the original implementation
  • Model can be fine-tuned (here is an example for RT-DETRv2, it would work with the change of checkpoint path only)
  • CI is happy: consistency, docs, style issues are resolved
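The "logits/boxes match the original implementation" item usually comes down to an element-wise tolerance comparison between the outputs of the original and the converted model on the same input. Here is a pure-Python sketch of that check. The tolerances and sample values are illustrative assumptions; with real tensors one would typically use torch.allclose instead.

```python
def max_abs_diff(expected, actual):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(expected) != len(actual):
        raise ValueError("outputs have different lengths")
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0.0)


def outputs_match(expected, actual, atol=1e-4, rtol=1e-4):
    """allclose-style rule: |actual - expected| <= atol + rtol * |expected|."""
    if len(expected) != len(actual):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for e, a in zip(expected, actual))


# Hypothetical values: flattened logits from the original and converted models.
original = [0.1234, -2.5, 3.75]
converted = [0.12341, -2.50002, 3.75]
print(max_abs_diff(original, converted), outputs_match(original, converted))
```

Reporting the maximum difference alongside the pass/fail result makes it easier to tell a genuine conversion bug from ordinary floating-point noise.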

Thanks for the great work!

Comment thread docs/source/en/model_doc/d_fine.md Outdated
Comment thread docs/source/en/model_doc/d_fine.md Outdated
Comment thread src/transformers/models/d_fine/__init__.py Outdated
Comment thread src/transformers/models/d_fine/configuration_d_fine.py Outdated
Comment thread src/transformers/models/d_fine/configuration_d_fine_resnet.py Outdated
Comment thread src/transformers/models/d_fine/modular_d_fine.py Outdated
Comment thread src/transformers/models/d_fine/modular_d_fine_resnet.py Outdated
Comment thread src/transformers/models/d_fine/modular_d_fine_resnet.py Outdated
Comment thread src/transformers/models/d_fine/modular_d_fine_resnet.py Outdated
Comment thread tests/models/d_fine/test_modeling_d_fine.py
VladOS95-cyber force-pushed the add-dfine-model branch 2 times, most recently from a78006a to 90a3536 on February 22, 2025 12:15
VladOS95-cyber (Contributor, Author) commented Feb 25, 2025

Hi @qubvel! Here is a short summary of where things stand:

  1. Modular issues are resolved (cc @Cyrilvallez): still in progress. I would also like to discuss what we are going to do about the config and attribute_map problem.
  2. torch.compile (fullgraph) works and compiles the model, but running inference with the compiled model then fails with the exception I posted in the comments. Other models behave the same way in this case, so I assume either I am doing something wrong or this is expected.
  3. torch.export is supported. I added it to the tests and it runs successfully.
  4. All main checkpoints were converted: https://huggingface.co/vladislavbro/dfine_x_coco, https://huggingface.co/vladislavbro/dfine_l_coco, https://huggingface.co/vladislavbro/dfine_m_coco, https://huggingface.co/vladislavbro/dfine_s_coco. I'll add model cards later. All logits/boxes match the original implementation.
  5. The model can be fine-tuned: I ran the script you provided with the checkpoint switched to a D-FINE one, and it worked.
  6. CI is not happy yet because it depends on the modular conversion issues. Once those are resolved, CI should pass.

qubvel self-requested a review February 25, 2025 09:06
qubvel (Contributor) left a comment

Hi @VladOS95-cyber, thanks for iterating and addressing comments! I left a few more comments, but in general looks great!

I am having difficulty running fine-tuning for the "m" checkpoint, while the "l" checkpoint is fine. Can you please double-check the modeling code, losses and initialization strategy? Here is the error I got:

/usr/local/lib/python3.11/dist-packages/transformers/loss/loss_for_object_detection.py in generalized_box_iou(boxes1, boxes2)
    416     # so do an early check
    417     if not (boxes1[:, 2:] >= boxes1[:, :2]).all():
--> 418         raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
    419     if not (boxes2[:, 2:] >= boxes2[:, :2]).all():
    420         raise ValueError(f"boxes2 must be in [x0, y0, x1, y1] (corner) format, but got {boxes2}")

ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[4.0848e-02, 6.0155e-02, 9.7476e-01, 9.4098e-01],
        [4.3698e-02, 1.0271e-01, 9.6435e-01, 8.6763e-01],
        [4.0578e-06, 6.4026e-06, 1.4135e-05, 2.2186e-05],
        ...,
        [9.9940e-01, 3.7292e-02, 1.0063e+00, 4.3024e-02],
        [9.8414e-01, 4.3545e-02, 1.0219e+00, 4.8508e-02],
        [7.8416e-01, 6.7056e-01, 8.3837e-01, 7.0462e-01]], device='cuda:0')
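The error comes from a sanity check in generalized_box_iou: every box must already be in corner format with x1 >= x0 and y1 >= y0. The pure-Python sketch below mirrors that condition together with a center-to-corners conversion. The helper names are hypothetical, and a negative or NaN width/height coming out of the model is only one assumed way the check can trip.

```python
def center_to_corners(box):
    """(cx, cy, w, h) -> (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def corners_are_valid(boxes):
    """Same condition as the check in the traceback: x1 >= x0 and y1 >= y0.

    Comparisons with NaN are False, so a NaN coordinate also fails.
    """
    return all(x1 >= x0 and y1 >= y0 for x0, y0, x1, y1 in boxes)


ok = center_to_corners((0.5, 0.5, 0.2, 0.2))
negative_width = center_to_corners((0.5, 0.5, -0.1, 0.2))
with_nan = center_to_corners((0.5, 0.5, float("nan"), 0.2))
print(corners_are_valid([ok]), corners_are_valid([ok, negative_width]), corners_are_valid([with_nan]))
# True False False
```

Because the check uses .all() over the whole batch, a single bad row is enough to raise, even if every row printed in the truncated tensor looks valid.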

P.S. torch.compile(model, fullgraph=True) works fine with my env, thanks!

Comment thread src/transformers/models/d_fine/modular_d_fine_resnet.py Outdated
Comment thread src/transformers/models/d_fine/modeling_d_fine_resnet.py Outdated
Comment thread src/transformers/models/d_fine/modeling_d_fine_resnet.py Outdated
Comment thread src/transformers/models/d_fine/modeling_d_fine_resnet.py Outdated
Comment thread src/transformers/models/d_fine/modeling_d_fine_resnet.py Outdated
Comment thread src/transformers/models/d_fine/modular_d_fine.py Outdated
Comment thread src/transformers/models/d_fine/modeling_d_fine.py Outdated
VladOS95-cyber (Contributor, Author) commented Feb 27, 2025

@qubvel do you know if anything changed in transformers recently? I can no longer load the model for fine-tuning.
One thing to note: I install the transformers package from my remote branch with !uv run pip install --upgrade -q -U git+https://github.com/VladOS95-cyber/transformers.git@add-dfine-model, but this worked yesterday.

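from transformers import AutoModelForObjectDetection

# checkpoint, id2label and label2id are assumed to be defined beforehand
model = AutoModelForObjectDetection.from_pretrained(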
    checkpoint,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True)

It fails with:

RuntimeError: Error(s) in loading state_dict for DFineForObjectDetection:
	size mismatch for model.denoising_class_embed.weight: copying a param with shape torch.Size([81, 256]) from checkpoint, the shape in current model is torch.Size([6, 256]).
	size mismatch for model.enc_score_head.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.enc_score_head.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
	size mismatch for model.decoder.class_embed.0.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.decoder.class_embed.0.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
	size mismatch for model.decoder.class_embed.1.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.decoder.class_embed.1.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
	size mismatch for model.decoder.class_embed.2.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.decoder.class_embed.2.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
	size mismatch for model.decoder.class_embed.3.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
	size mismatch for model.decoder.class_embed.3.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([5]).
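The shapes in this error follow from the label count: the checkpoint head has 80 COCO classes, the model being built has 5 labels, and the denoising class embedding has one extra row (num_labels + 1, hence 81 vs 6). With ignore_mismatched_sizes=True such tensors are expected to be skipped and left freshly initialized instead of raising, which is why this error pointed to a loading regression. The sketch below is a simplified, hypothetical illustration of that filtering, not the actual from_pretrained code; the "hypothetical.shared.weight" entry stands in for any label-independent parameter.

```python
def filter_mismatched(checkpoint_shapes, model_shapes, ignore_mismatched_sizes=False):
    """Split checkpoint entries into loadable names and size-mismatched names.

    Shapes are plain tuples; keys missing from the model are ignored here.
    """
    loadable, mismatched = [], []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is None:
            continue
        if ckpt_shape == model_shape:
            loadable.append(name)
        elif ignore_mismatched_sizes:
            mismatched.append(name)  # stays at its fresh initialization
        else:
            raise RuntimeError(
                f"size mismatch for {name}: copying a param with shape {ckpt_shape} "
                f"from checkpoint, the shape in current model is {model_shape}."
            )
    return loadable, mismatched


def class_head_shapes(num_labels, d_model=256):
    """A few class-dependent shapes as reported in the error, plus one shared param."""
    return {
        "model.denoising_class_embed.weight": (num_labels + 1, d_model),
        "model.enc_score_head.weight": (num_labels, d_model),
        "model.enc_score_head.bias": (num_labels,),
        "hypothetical.shared.weight": (d_model, d_model),
    }


loadable, mismatched = filter_mismatched(class_head_shapes(80), class_head_shapes(5), ignore_mismatched_sizes=True)
print(loadable)
print(mismatched)
```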

qubvel (Contributor) commented Feb 27, 2025

@VladOS95-cyber hmm, yes, there were some PRs refactoring from_pretrained... it might be broken, let me check

qubvel (Contributor) commented Feb 28, 2025

PR with fix merged 👍 thanks for reporting

VladOS95-cyber force-pushed the add-dfine-model branch 2 times, most recently from 5b2be76 to 2817dac on March 4, 2025 19:05
VladOS95-cyber (Contributor, Author) commented Mar 5, 2025

Hi @qubvel! Latest state: I added tests for resnet. I have checked the modeling code many times but could not find anything that might be the root cause of the fine-tuning issue. I made a small fix, but it did not help, and the behavior is odd: the loss calculation fails at a different step each time and sometimes does not fail at all. The returned scores and boxes are the same as in the original implementation. There are some differences in states, yes, but there is a logical explanation for them and they should not be a problem. Unfortunately, I am a bit stuck right now...

qubvel (Contributor) commented Mar 5, 2025

Hey @VladOS95-cyber, thanks for checking. I suppose it's not a blocker, and maybe we can figure it out later. At least inference works for all models, and fine-tuning for bigger models also works fine. I will take a deep dive next week and try to resolve the remaining issues to get this merged.

VladOS95-cyber (Contributor, Author) commented

@qubvel ok, thank you. Basically, everything else is fine. There are still a couple of unresolved modular problems. I think they can be resolved easily; we just need to make a decision about them.

VladOS95-cyber (Contributor, Author) commented

Hey @qubvel! Just curious: I found issue #36516. Could the existing problem with fine-tuning D-FINE be related to it?

qubvel (Contributor) commented Mar 11, 2025

Well, it might be the cause; however, fine-tuning other models on this dataset works fine. This model might be a bit more sensitive to noisy labels, let me try running with a clean one
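One quick way to test the noisy-label hypothesis is to scan the annotations for degenerate or out-of-image boxes before training. The sketch below assumes COCO-style [x, y, width, height] boxes in pixels; the function name and the min_size threshold are hypothetical.

```python
def find_bad_boxes(annotations, image_width, image_height, min_size=1.0):
    """Return indices of boxes that are degenerate or extend outside the image."""
    bad = []
    for idx, (x, y, w, h) in enumerate(annotations):
        # NaN width/height also count as too small, since NaN comparisons are False.
        too_small = not (w >= min_size and h >= min_size)
        outside = x < 0 or y < 0 or x + w > image_width or y + h > image_height
        if too_small or outside:
            bad.append(idx)
    return bad


boxes = [(10, 10, 50, 50), (0, 0, 0, 20), (90, 90, 20, 20)]
print(find_bad_boxes(boxes, image_width=100, image_height=100))  # [1, 2]
```

Dropping or clipping the flagged boxes gives a "clean" variant of the dataset to compare against.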

qubvel (Contributor) commented Apr 8, 2025

run-slow: d_fine, hgnet_v2

qubvel requested a review from ArthurZucker April 8, 2025 11:05
github-actions bot commented Apr 8, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/d_fine', 'models/hgnet_v2']
quantizations: [] ...

qubvel (Contributor) commented Apr 8, 2025

cc @ArthurZucker for final review

The model is based on RT-DETR. The loss implementation is different, so we added some additional attributes to the RT-DETR output to propagate it through the model and avoid overriding all corresponding forward methods. It's a bit against our philosophy, but I believe it is worth it in order to make the code more standard and modular
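The approach described here, adding optional fields to the shared RT-DETR output so that D-FINE can pass loss-specific tensors through the existing forward methods while RT-DETR simply leaves them unset, can be sketched with a plain dataclass. The class and field names below are hypothetical placeholders, not the actual Transformers output classes.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class DecoderOutputSketch:
    last_hidden_state: Any = None
    intermediate_logits: Any = None
    intermediate_reference_points: Any = None
    # Extra, model-specific fields: left as None by RT-DETR, filled in by D-FINE.
    extra_corners: Optional[Any] = None
    extra_refs: Optional[Any] = None


def to_tuple(output):
    """Keep only the fields that are set, similar in spirit to ModelOutput.to_tuple()."""
    values = (getattr(output, f.name) for f in fields(output))
    return tuple(v for v in values if v is not None)


rt_detr_out = DecoderOutputSketch(last_hidden_state="h", intermediate_logits="l")
d_fine_out = DecoderOutputSketch("h", "l", "r", extra_corners="c")
print(to_tuple(rt_detr_out), to_tuple(d_fine_out))
```

Because unset fields default to None and are dropped, existing consumers of the RT-DETR output see the same values as before.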

huggingface deleted a comment from github-actions bot Apr 8, 2025
qubvel (Contributor) commented Apr 8, 2025

run-slow: d_fine, hgnet_v2

ArthurZucker (Collaborator) left a comment

Happy to merge if you like it @qubvel ! 🤗
LGTM great review and great work!

Comment thread src/transformers/models/d_fine/modular_d_fine.py Outdated
qubvel self-requested a review April 29, 2025 08:57
HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qubvel merged commit 4abeb50 into huggingface:main Apr 29, 2025
20 checks passed
qubvel (Contributor) commented Apr 29, 2025

Merging! @VladOS95-cyber thanks a lot for contributing and iterating on the PR 🤗 Really huge work done!

VladOS95-cyber (Contributor, Author) commented

Thank you very much for your support and review!

NielsRogge mentioned this pull request May 11, 2025
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* copy the last changes from broken PR

* small format

* some fixes and refactoring after review

* format

* add config attr for loss

* some fixes and refactoring

* fix copies

* fix style

* add test for d-fine resnet

* fix decoder layer prop

* fix dummies

* format init

* remove extra print

* refactor modeling, move resnet into separate folder

* fix resnet config

* change resnet on hgnet_v2, add clamp into decoder

* fix init

* fix config doc

* fix init

* fix dummies

* fix config docs

* fix hgnet_v2 config typo

* format modular

* add image classification for hgnet, some refactoring

* format tests

* fix dummies

* fix init

* fix style

* fix init for hgnet v2

* fix index.md, add init rnage for hgnet

* fix conversion

* add missing attr to encoder

* add loss for d-fine, add additional output for rt-detr decoder

* tests and docs fixes

* fix rt_detr v2 conversion

* some fixes for loos and decoder output

* some fixes for loss

* small fix for converted modeling

* add n model config, some todo comments for modular

* convert script adjustments and fixes, small refact

* remove extra output for rt_detr

* make some outputs optionsl, fix conversion

* some posr merge fixes

* small fix

* last field fix

* fix not split for hgnet_v2

* disable parallelism test for hgnet_v2 image classification

* skip multi gpu for d-fine

* adjust after merge init

* remove extra comment

* fix repo name references

* small fixes for tests

* Fix checkpoint path

* Fix consistency

* Fixing docs

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Matvezy commented Jun 20, 2025

Hello @VladOS95-cyber @qubvel, thanks a lot for adding D-FINE to transformers! I'm interested in fine-tuning and benchmarking your implementation on a custom dataset. I found a fine-tuning notebook example here, but I noticed that it doesn't fully follow the original D-FINE fine-tuning config: specifically, it doesn't use the same augmentations or optimizer parameters, and it doesn't use EMA like the original D-FINE configs here.
Have you previously tried fine-tuning with the original D-FINE recipe? I suspect that without it the benchmark score may be lower, so I want to make sure I also try fine-tuning with the original recipe. Right now I'm adding those changes myself, but I wanted to check whether you already have anything you tried that I can use.
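For reference, the EMA mentioned here keeps a shadow copy of the weights that is updated after each optimizer step as ema = decay * ema + (1 - decay) * weight. Below is a minimal pure-Python sketch over a dict of scalar parameters. The default decay, the warmup ramp, and the class name are assumptions for illustration, not values taken from the D-FINE configs.

```python
import math


class WeightEMA:
    """Hypothetical exponential moving average of model weights."""

    def __init__(self, params, decay=0.9999, warmup_steps=2000):
        self.shadow = dict(params)
        self.decay = decay
        self.warmup_steps = warmup_steps
        self.updates = 0

    def current_decay(self):
        # Ramp the decay up from 0 so early averages are not dominated by the
        # initial weights; warmup_steps <= 0 disables the ramp.
        if self.warmup_steps <= 0:
            return self.decay
        return self.decay * (1 - math.exp(-self.updates / self.warmup_steps))

    def update(self, params):
        self.updates += 1
        d = self.current_decay()
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1 - d) * value


ema = WeightEMA({"w": 0.0}, decay=0.5, warmup_steps=0)
ema.update({"w": 1.0})
ema.update({"w": 1.0})
print(ema.shadow["w"])  # 0.75
```

In a real training loop the shadow weights would be evaluated and saved, while the optimizer keeps updating the live weights.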

qubvel (Contributor) commented Jun 23, 2025

Hi @Matvezy! We did not adapt the full fine-tuning recipe for D-FINE. The notebook you mentioned is just a starting point that can be significantly improved with many techniques. We would appreciate any contribution, e.g. if you have some results on your fine-tuning or upgrade the notebook 🤗

6 participants