Resolve errors when trying to use distilbert in an encoder/decoder model by FrancisBehnen · Pull Request #1 · KMFODA/transformers

FrancisBehnen · 2022-04-05T21:16:39Z

Hey, I'm trying to use your code to run distilbert as an encoder/decoder, but I ran into some errors. These changes fix the errors; however, the model now does not seem to be identifying the masks properly. I was hoping you could offer some insights, as this seems like a task you're trying to enable with your implementation.

I have this simple bert2bert pipeline for summarization: https://colab.research.google.com/drive/1uFttq_3cNzOlOYSQLEGc9Vcd_t_15l3g?usp=sharing. I was hoping to transform it into a 'distilbert2distilbert' pipeline: https://colab.research.google.com/drive/1UpvyFv_X86xL28mxK8Nt9TN3oymeRYEc#scrollTo=9a88f91a&uniqifier=1.

However, as you can see under the training cell in the second Colab, this throws the following error: TypeError: forward() got an unexpected keyword argument 'use_cache'. The first three edits in this PR, which concern use_cache, solve this error, and the remaining edit solves the next exception that you'll run into. Unfortunately, the validation loss is now NaN and nothing particularly useful is generated...
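To make the use_cache failure concrete outside of transformers, here is a minimal sketch in plain Python. The classes and helpers (DecoderWithoutCache, DecoderAcceptingCache, call_decoder, accepts_use_cache) are hypothetical stand-ins invented for illustration; they are not the DistilBERT code and not the exact edits in this PR. The sketch only shows the general pattern: a wrapper forwards use_cache to a forward() that does not declare it, and the call succeeds once the parameter is accepted.

```python
import inspect


class DecoderWithoutCache:
    """Hypothetical decoder whose forward() does not declare use_cache."""

    def forward(self, input_ids, attention_mask=None):
        return {"logits": [len(input_ids)]}


class DecoderAcceptingCache:
    """Same hypothetical decoder, but forward() accepts use_cache.

    The argument is simply ignored here; a real model would decide
    what, if anything, to do with it.
    """

    def forward(self, input_ids, attention_mask=None, use_cache=None):
        return {"logits": [len(input_ids)]}


def call_decoder(decoder, input_ids, **kwargs):
    """Mimic a wrapper that passes extra keyword arguments to the decoder."""
    return decoder.forward(input_ids, **kwargs)


def accepts_use_cache(decoder):
    """Check up front whether forward() declares a use_cache parameter."""
    return "use_cache" in inspect.signature(decoder.forward).parameters


if __name__ == "__main__":
    try:
        call_decoder(DecoderWithoutCache(), [1, 2, 3], use_cache=False)
    except TypeError as err:
        print("fails:", err)

    print("works:", call_decoder(DecoderAcceptingCache(), [1, 2, 3], use_cache=False))
```

A check like accepts_use_cache can help find which forward() in the call chain is missing the parameter before starting a training run.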

Here is a Colab where I switched out your repo for mine: https://colab.research.google.com/drive/1fxf8k7nJZ6jwEb56HD6NMw81z3uZFvfd?usp=sharing

Do you have any idea what might be going on? I hope I'm not doing anything stupid and this is actually a useful insight for you!

@ydshieh

* chore: initial commit Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets.
* chore: porting the rest of the modules to tensorflow did not change the documentation yet, yet to try the playground on the model
* Fix initilizations (#1)
* fix: code structure in few cases.
* fix: code structure to align tf models.
* fix: layer naming, bn layer still remains.
* chore: change default epsilon and momentum in bn.
* chore: styling nits.
* fix: cross-loading bn params.
* fix: regnet tf model, integration passing.
* add: tests for TF regnet.
* fix: code quality related issues.
* chore: added rest of the files.
* minor additions..
* fix: repo consistency.
* fix: regnet tf tests.
* chore: reorganize dummy_tf_objects for regnet.
* chore: remove checkpoint var.
* chore: remov unnecessary files.
* chore: run make style.
* Update docs/source/en/model_doc/regnet.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* chore: PR feedback I.
* fix: pt test. thanks to @ydshieh.
* New adaptive pooler (huggingface#3)
* feat: new adaptive pooler Co-authored-by: @Rocketknight1
* chore: remove image_size argument. Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: matt <rocketknight1@gmail.com>
* Empty-Commit
* chore: remove image_size comment.
* chore: remove playground_tf.py
* chore: minor changes related to spacing.
* chore: make style.
* Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
* Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
* chore: refactored __init__.
* chore: copied from -> taken from./g
* adaptive pool -> global avg pool, channel check.
* chore: move channel check to stem.
* pr comments - minor refactor and add regnets to doc tests.
* Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* minor fix in the xlayer.
* Empty-Commit
* chore: removed from_pt=True.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Resolve errors when trying to use distilbert in an encoder/decoder model

045e96f

FrancisBehnen wants to merge 1 commit into KMFODA:add-DistilBertForCausalLM from FrancisBehnen:add-DistilBertForCausalLM