Add D-FINE Model into Transformers #36261
Hi @qubvel! I decided to create a new PR because the previous one (#35400) was broken; sorry for the mess. This PR contains all the latest changes.
Hey, very sorry, I had very low bandwidth last week as I was presenting at a conference! Are you still experiencing issues with how modular works, or is it resolved?
Hey @Cyrilvallez! Thank you for your answer. Yes, there are still some issues:
1. Wrong namespace import for modeling_d_fine_resnet.py: the conversion script generates from .configuration_d_fine_resnet_resnet (one "resnet" too many).
2. The DFine config conversion using modular does not work properly: it places the super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs) call at the very beginning of __init__, before all the self.<attr> assignments, which is wrong. The call must come after all the self.<attr> assignments.
Ok @VladOS95-cyber, I looked into it, thanks for the feedback 🤗
Oh, or is 2. an issue because of the
Hi @Cyrilvallez! Well, I suppose it is an issue when the config uses an attribute map, yes. If we call super().__init__(**kwargs) at the beginning and do the self.<attr> assignments later, we override some attribute values that were passed in before. In RTDetrV2 this problem was solved by implementing the config from scratch, but in our case we reuse RTDetrConfig, so unfortunately the problem came up.
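A toy reproduction of issue 2, as a sketch under stated assumptions: ToyBaseConfig below is a hypothetical stand-in that only loosely mimics how transformers' PretrainedConfig routes aliased keyword arguments (e.g. hidden_size for d_model) through attribute_map; it is not the real class. The two subclasses differ only in where the super().__init__ call sits.

```python
class ToyBaseConfig:
    """Hypothetical stand-in for a config base class with attribute aliasing.

    Loosely mimics the attribute_map behaviour of transformers' PretrainedConfig;
    this is NOT the real implementation.
    """

    attribute_map: dict = {}

    def __setattr__(self, key, value):
        # Writes to an alias (e.g. hidden_size) land on the canonical name (e.g. d_model).
        key = type(self).attribute_map.get(key, key)
        super().__setattr__(key, value)

    def __init__(self, **kwargs):
        # Leftover kwargs are stored as attributes, passing through the alias map.
        for key, value in kwargs.items():
            setattr(self, key, value)


class EarlySuperConfig(ToyBaseConfig):
    attribute_map = {"hidden_size": "d_model"}

    def __init__(self, d_model=256, **kwargs):
        super().__init__(**kwargs)  # an aliased kwarg like hidden_size=512 sets d_model here...
        self.d_model = d_model      # ...and the default value then overwrites it


class LateSuperConfig(ToyBaseConfig):
    attribute_map = {"hidden_size": "d_model"}

    def __init__(self, d_model=256, **kwargs):
        self.d_model = d_model      # assign own attributes first
        super().__init__(**kwargs)  # aliased kwargs are applied last and win
```

Under these assumptions, EarlySuperConfig(hidden_size=512).d_model comes out as 256 (the alias is silently lost), while LateSuperConfig(hidden_size=512).d_model stays 512, which is why the call has to come after the attribute assignments.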
@Cyrilvallez So, what do you suggest in this case?
Thanks, @VladOS95-cyber, for addressing the comments, refactoring, and adding type hints! I left a few minor comments, but it's in good shape. As the next step, let's also ensure:
- Modular issues are resolved (cc @Cyrilvallez)
- torch.compile(fullgraph=True) and torch.export are supported (see comments)
- All checkpoints can be converted (please provide links to the converted checkpoints on the Hub) and the logits/boxes match the original implementation
- The model can be fine-tuned (here is an example for RT-DETRv2; it should work by changing only the checkpoint path)
- CI is happy: consistency, docs, style issues are resolved
Thanks for the great work!
Hi @qubvel! Here is a short summary of what we've achieved so far:
Hi @VladOS95-cyber, thanks for iterating and addressing comments! I left a few more comments, but in general looks great!
I am having difficulty running fine-tuning for the "m" checkpoint, while the "l" checkpoint is fine. Can you please double-check the modeling code, losses and initialization strategy? Here is the error I got:
/usr/local/lib/python3.11/dist-packages/transformers/loss/loss_for_object_detection.py in generalized_box_iou(boxes1, boxes2)
416 # so do an early check
417 if not (boxes1[:, 2:] >= boxes1[:, :2]).all():
--> 418 raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
419 if not (boxes2[:, 2:] >= boxes2[:, :2]).all():
420 raise ValueError(f"boxes2 must be in [x0, y0, x1, y1] (corner) format, but got {boxes2}")
ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[4.0848e-02, 6.0155e-02, 9.7476e-01, 9.4098e-01],
[4.3698e-02, 1.0271e-01, 9.6435e-01, 8.6763e-01],
[4.0578e-06, 6.4026e-06, 1.4135e-05, 2.2186e-05],
...,
[9.9940e-01, 3.7292e-02, 1.0063e+00, 4.3024e-02],
[9.8414e-01, 4.3545e-02, 1.0219e+00, 4.8508e-02],
[7.8416e-01, 6.7056e-01, 8.3837e-01, 7.0462e-01]], device='cuda:0')
P.S. torch.compile(model, fullgraph=True) works fine in my env, thanks!
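For context, the check that raises in the traceback above compares corner coordinates after (cx, cy, w, h) boxes are converted to (x0, y0, x1, y1). A plain-Python sketch (hypothetical helper names, not the library code) shows two ways a row can fail it: a negative width or height, and a NaN, because any comparison involving NaN is False. Clamping sizes to be non-negative handles the first case but not the second.

```python
from typing import List, Sequence


def center_to_corners(box: Sequence[float]) -> List[float]:
    # (cx, cy, w, h) -> (x0, y0, x1, y1)
    cx, cy, w, h = box
    return [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h]


def is_valid_corner_box(box: Sequence[float]) -> bool:
    # Same condition as the check in the traceback, for a single box:
    # x1 >= x0 and y1 >= y0. Any comparison with NaN is False, so NaN boxes fail too.
    x0, y0, x1, y1 = box
    return x1 >= x0 and y1 >= y0


def clamp_size(box: Sequence[float], min_size: float = 0.0) -> List[float]:
    # Force non-negative width/height. Note: max(nan, 0.0) returns nan,
    # so this does NOT repair NaN predictions.
    cx, cy, w, h = box
    return [cx, cy, max(w, min_size), max(h, min_size)]
```

A box like [0.5, 0.5, -0.1, 0.2] fails the check until its width is clamped, while a NaN width keeps failing, so a failure that appears at random steps may point to unstable (NaN) predictions rather than bad geometry.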
@qubvel do you know if anything has changed in transformers recently? I can't download the model for fine-tuning anymore.
@VladOS95-cyber hmm, yes, there were some PRs regarding
The PR with the fix is merged 👍 Thanks for reporting!
Hi @qubvel! Latest state: I added tests for resnet. I have checked the modeling code many times, but I could not find any problem that might be the root cause of this fine-tuning issue. I made a small fix, but it did not help. It seems weird, since the loss calculation fails at a different step each time and sometimes does not fail at all. The returned scores and boxes are the same as in the original implementation. There are some differences in states, yes, but there is a logical explanation for that and it should not be a problem. Unfortunately, I am a little bit stuck right now...
Hey @VladOS95-cyber, thanks for checking. I suppose it's not a blocker, and maybe we can figure it out later. At least inference works for all models, and fine-tuning for bigger models also works fine. I will take a deep dive next week and try to resolve the remaining issues to get this merged. |
@qubvel OK, thank you. Basically, everything else is fine. There are just a couple of unresolved modular problems left. I think they can be resolved easily; we just need to make a decision about them.
Well, it might be an issue; however, fine-tuning other models on this dataset works fine. This model might be a bit more sensitive to noisy labels, so let me try running with a clean one.
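One simple way to build a "clean" variant of a dataset for this kind of test, sketched under the assumption of COCO-style annotations with bbox = [x_min, y_min, width, height] in pixels: clip boxes to the image and drop non-finite or degenerate ones. The min_size threshold is an arbitrary choice, and this is not necessarily what was run in this thread.

```python
import math
from typing import Any, Dict, List


def clean_coco_boxes(
    annotations: List[Dict[str, Any]],
    image_width: float,
    image_height: float,
    min_size: float = 1.0,
) -> List[Dict[str, Any]]:
    """Clip COCO-style [x_min, y_min, width, height] boxes to the image; drop bad ones."""
    cleaned = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        # Skip boxes with NaN/inf values: min/max clipping would not remove them.
        if not all(math.isfinite(v) for v in (x, y, w, h)):
            continue
        # Clip both corners to the image bounds.
        x0 = min(max(x, 0.0), image_width)
        y0 = min(max(y, 0.0), image_height)
        x1 = min(max(x + w, 0.0), image_width)
        y1 = min(max(y + h, 0.0), image_height)
        new_w, new_h = x1 - x0, y1 - y0
        # Drop boxes that are negative-sized or collapse below min_size after clipping.
        if new_w < min_size or new_h < min_size:
            continue
        cleaned.append({**ann, "bbox": [x0, y0, new_w, new_h]})
    return cleaned
```

Boxes that partially leave the image are kept in clipped form rather than dropped, which preserves as many labels as possible while guaranteeing valid geometry.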
run-slow: d_fine, hgnet_v2
This comment contains run-slow, running the specified jobs: models: ['models/d_fine', 'models/hgnet_v2']
cc @ArthurZucker for final review. The model is based on RT-DETR. The loss implementation is different, so we added some additional attributes to the RT-DETR output to propagate them through the model and avoid overriding all the corresponding forward methods. It's a bit against our philosophy, but I believe it is worth it to keep the code more standard and modular.
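A minimal sketch of the design choice described above, with hypothetical field and function names (not the actual RT-DETR output classes): extra decoder outputs are added as optional fields that default to None, so existing code that builds or consumes the RT-DETR-style output keeps working, and only the loss that needs them reads them. The sums are placeholder arithmetic standing in for real loss terms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyDecoderOutput:
    # Fields an RT-DETR-style consumer already relies on.
    last_hidden_state: List[float]
    intermediate_logits: Optional[List[float]] = None
    # Hypothetical extra field for a D-FINE-style loss. Defaulting to None keeps
    # every existing constructor call (which never passes it) valid.
    intermediate_predicted_corners: Optional[List[float]] = None


def rt_detr_style_loss(output: ToyDecoderOutput) -> float:
    # Ignores the extra field entirely (placeholder arithmetic).
    return float(sum(output.intermediate_logits or []))


def d_fine_style_loss(output: ToyDecoderOutput) -> float:
    # Reuses the base terms and adds one only when the extra field is populated.
    extra = float(sum(output.intermediate_predicted_corners or []))
    return rt_detr_style_loss(output) + extra
```

The trade-off is a slightly wider shared output class in exchange for not duplicating every forward method that passes the output along.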
run-slow: d_fine, hgnet_v2 |
ArthurZucker left a comment
Happy to merge if you like it @qubvel ! 🤗
LGTM, great review and great work!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Merging! @VladOS95-cyber thanks a lot for contributing and iterating on the PR 🤗 Really huge work done! |
Thank you very much for your support and review! |
* copy the last changes from broken PR
* small format
* some fixes and refactoring after review
* format
* add config attr for loss
* some fixes and refactoring
* fix copies
* fix style
* add test for d-fine resnet
* fix decoder layer prop
* fix dummies
* format init
* remove extra print
* refactor modeling, move resnet into separate folder
* fix resnet config
* change resnet on hgnet_v2, add clamp into decoder
* fix init
* fix config doc
* fix init
* fix dummies
* fix config docs
* fix hgnet_v2 config typo
* format modular
* add image classification for hgnet, some refactoring
* format tests
* fix dummies
* fix init
* fix style
* fix init for hgnet v2
* fix index.md, add init range for hgnet
* fix conversion
* add missing attr to encoder
* add loss for d-fine, add additional output for rt-detr decoder
* tests and docs fixes
* fix rt_detr v2 conversion
* some fixes for loss and decoder output
* some fixes for loss
* small fix for converted modeling
* add n model config, some todo comments for modular
* convert script adjustments and fixes, small refact
* remove extra output for rt_detr
* make some outputs optional, fix conversion
* some post-merge fixes
* small fix
* last field fix
* fix not split for hgnet_v2
* disable parallelism test for hgnet_v2 image classification
* skip multi gpu for d-fine
* adjust after merge init
* remove extra comment
* fix repo name references
* small fixes for tests
* Fix checkpoint path
* Fix consistency
* Fixing docs

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Hello @VladOS95-cyber @qubvel, thanks a lot for adding D-FINE to transformers! I'm interested in fine-tuning and benchmarking your implementation on a custom dataset. I found a fine-tuning notebook example here; however, I noticed that it doesn't fully follow the original D-FINE fine-tuning config. Specifically, it doesn't use the same augmentations or optimizer parameters, and it doesn't use EMA like the original D-FINE configs here.
Hi @Matvezy! We did not adapt the full fine-tuning recipe for D-FINE. The notebook you mentioned is just a starting point that can be significantly improved with many techniques. We would appreciate any contribution, e.g. if you have some results on your fine-tuning or upgrade the notebook 🤗 |
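Weight EMA, one of the pieces mentioned above that the notebook does not use, keeps a slowly updated copy of the model weights that is typically used for evaluation. Below is a dependency-free sketch over a dict of floats, offered as an illustration only: the decay and warmup_steps defaults are placeholders rather than the values from the original D-FINE configs, the warmup ramp is one common variant rather than a confirmed detail of D-FINE, and a real version would average tensors (e.g. a model's state_dict) instead.

```python
import math
from typing import Dict


class WeightEMA:
    """Exponential moving average of model weights (toy version over a dict of floats)."""

    def __init__(self, params: Dict[str, float], decay: float = 0.9999, warmup_steps: int = 0):
        self.shadow = dict(params)  # the averaged copy, initialized from the live weights
        self.decay = decay
        self.warmup_steps = warmup_steps
        self.updates = 0

    def current_decay(self) -> float:
        if self.warmup_steps <= 0:
            return self.decay
        # Ramp the decay up from ~0 so the shadow copy tracks the live weights closely
        # early in training instead of staying anchored to the initialization.
        return self.decay * (1.0 - math.exp(-self.updates / self.warmup_steps))

    def update(self, params: Dict[str, float]) -> None:
        # Call once per optimizer step with the current live weights.
        self.updates += 1
        d = self.current_decay()
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1.0 - d) * value
```

With decay=0.9 and no warmup, one update from a shadow value of 0.0 toward a live value of 1.0 moves the shadow to 0.1; with a long warmup, early updates copy the live weights almost directly.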
What does this PR do?
D-FINE is a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD).
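To make "redefining the bounding box regression task" a bit more concrete: roughly speaking, FDR predicts a probability distribution over candidate offsets for each box edge instead of a single value, and refines those distributions across decoder layers. The toy single-edge, single-step sketch below turns such a distribution into an offset; the uniform bin values and the scale normalizer are simplifying assumptions, not D-FINE's actual weighting function or code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits: Sequence[float], bin_values: Sequence[float]) -> float:
    # Treat the logits as a distribution over candidate offsets and take its mean.
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, bin_values))


def refine_edge(edge: float, logits: Sequence[float], bin_values: Sequence[float], scale: float) -> float:
    # Move one box edge by the expected offset; `scale` is a hypothetical
    # normalizer (e.g. proportional to the current box size).
    return edge + scale * expected_offset(logits, bin_values)
```

Uniform logits over symmetric bins yield a zero offset, while a confident peak on one bin moves the edge by almost that bin's value times the scale.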
As its backbone, it uses the HGNet-V2 (High Performance GPU Net) image classification model.
This PR adds D-FINE and HGNet-V2 to the Transformers library. It uses modular, a new mechanism in Transformers for adding models: instead of writing modeling_modelname.py by hand, you write a modular_modelname.py file that inherits from existing models, and the full modeling file is generated from it. Since D-FINE modifies several parts of the RT-DETR architecture while keeping the rest of the model unchanged, it is an ideal use case for this modular approach.
Before submitting
Request to add D-FINE #35283
Who can review?
@qubvel @Rocketknight1 @ArthurZucker @NielsRogge