Resolve errors when trying to use distilbert in an encoder/decoder model #1

Open
FrancisBehnen wants to merge 1 commit into KMFODA:add-DistilBertForCausalLM from FrancisBehnen:add-DistilBertForCausalLM

@FrancisBehnen

Hey, I'm trying to use your code to run distilbert as an encoder/decoder, but I ran into some errors. These changes fix the errors; however, the model now doesn't seem to identify the masks properly. I was hoping you could offer some insight, since this looks like a use case you're trying to enable with your implementation.

I have this simple bert2bert summarization pipeline: https://colab.research.google.com/drive/1uFttq_3cNzOlOYSQLEGc9Vcd_t_15l3g?usp=sharing, which I was hoping to turn into a 'distilbert2distilbert' pipeline: https://colab.research.google.com/drive/1UpvyFv_X86xL28mxK8Nt9TN3oymeRYEc#scrollTo=9a88f91a&uniqifier=1.

However, as you can see under the training cell in the second Colab, this throws the following error: TypeError: forward() got an unexpected keyword argument 'use_cache'. The first three edits in this PR, which concern use_cache, resolve this error, and the remaining edit resolves the next exception you would run into. Unfortunately, the validation loss is now NaN and nothing particularly useful is generated...
[Screenshot: WhatsApp Image 2022-04-05 at 8 04 29 PM]
[Screenshot: WhatsApp Image 2022-04-05 at 9 20 24 PM]
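
For readers unfamiliar with this kind of failure, here is a minimal sketch of how a `use_cache` TypeError can arise and how accepting the argument makes it go away. It does not reproduce the transformers code or the actual diff in this PR: the classes `ToyDecoderWithoutCache` and `ToyDecoderWithCache` and the helpers `run_decoder` and `accepts_use_cache` are hypothetical names made up for illustration, and the assumption is simply that some wrapper always passes `use_cache` to the decoder's `forward()`.

```python
import inspect


class ToyDecoderWithoutCache:
    """Hypothetical stand-in for a decoder whose forward() has no use_cache parameter."""

    def forward(self, input_ids, attention_mask=None):
        return {"logits": [float(t) for t in input_ids]}


class ToyDecoderWithCache:
    """Same toy decoder, but forward() accepts (and here ignores) use_cache."""

    def forward(self, input_ids, attention_mask=None, use_cache=None):
        # A real decoder would return cached key/value states when use_cache is True;
        # this sketch only accepts the argument so the call no longer fails.
        return {"logits": [float(t) for t in input_ids]}


def run_decoder(decoder, input_ids):
    """Hypothetical wrapper that always forwards use_cache to the decoder."""
    return decoder.forward(input_ids, attention_mask=None, use_cache=False)


def accepts_use_cache(decoder):
    """Check up front whether forward() would accept a use_cache keyword."""
    params = inspect.signature(decoder.forward).parameters
    has_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return "use_cache" in params or has_var_kwargs


error_message = None
try:
    run_decoder(ToyDecoderWithoutCache(), [1, 2, 3])
except TypeError as err:
    # The message names the unexpected keyword argument 'use_cache'.
    error_message = str(err)

# With the extended signature, the same call succeeds.
fixed_output = run_decoder(ToyDecoderWithCache(), [1, 2, 3])
```

Accepting the argument only removes the exception; whether the cache is then used correctly is a separate question, which fits the description above that the error disappears while the results are still not useful.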

Here is a Colab where I switched out your repo for mine: https://colab.research.google.com/drive/1fxf8k7nJZ6jwEb56HD6NMw81z3uZFvfd?usp=sharing

Do you have any idea what might be going on? I hope I'm not doing anything stupid and this is actually a useful insight for you!

KMFODA pushed a commit that referenced this pull request Jul 4, 2022
* chore: initial commit

Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets.

* chore: porting the rest of the modules to tensorflow

did not change the documentation yet, yet to try the playground on the model

* Fix initializations (#1)

* fix: code structure in few cases.

* fix: code structure to align tf models.

* fix: layer naming, bn layer still remains.

* chore: change default epsilon and momentum in bn.

* chore: styling nits.

* fix: cross-loading bn params.

* fix: regnet tf model, integration passing.

* add: tests for TF regnet.

* fix: code quality related issues.

* chore: added rest of the files.

* minor additions..

* fix: repo consistency.

* fix: regnet tf tests.

* chore: reorganize dummy_tf_objects for regnet.

* chore: remove checkpoint var.

* chore: remove unnecessary files.

* chore: run make style.

* Update docs/source/en/model_doc/regnet.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* chore: PR feedback I.

* fix: pt test. thanks to @ydshieh.

* New adaptive pooler (huggingface#3)

* feat: new adaptive pooler

Co-authored-by: @Rocketknight1

* chore: remove image_size argument.

Co-authored-by: matt <rocketknight1@gmail.com>

Co-authored-by: matt <rocketknight1@gmail.com>

* Empty-Commit

* chore: remove image_size comment.

* chore: remove playground_tf.py

* chore: minor changes related to spacing.

* chore: make style.

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: amyeroberts <aeroberts4444@gmail.com>

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: amyeroberts <aeroberts4444@gmail.com>

* chore: refactored __init__.

* chore: copied from -> taken from./g

* adaptive pool -> global avg pool, channel check.

* chore: move channel check to stem.

* pr comments - minor refactor and add regnets to doc tests.

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* minor fix in the xlayer.

* Empty-Commit

* chore: removed from_pt=True.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>