[ASR] Add official ASR CTC example to examples/pytorch/speech-recognition #13620

Merged
patrickvonplaten merged 33 commits into
huggingface:master from
patrickvonplaten:add_asr_example
Sep 24, 2021

Conversation

@patrickvonplaten patrickvonplaten commented Sep 17, 2021

Contributor

This PR adds a generic CTC speech recognition example. It has been tested with single-GPU and distributed training on Common Voice and is currently being tested on Librispeech.

Once datasets has merged https://github.com/huggingface/datasets/pull/2324/files and published a new release, I will slightly adapt the script to leverage the new audio feature.

A couple of example runs with this script:

This example folder should eventually contain two additional scripts, one for Seq2Seq ASR and one for CTC + LM decoding, which are left for future work.

Comment thread examples/pytorch/speech-recognition/README.md Outdated
Comment thread src/transformers/models/hubert/configuration_hubert.py
Comment thread src/transformers/models/wav2vec2/configuration_wav2vec2.py
Comment thread src/transformers/models/hubert/configuration_hubert.py
Comment thread src/transformers/models/wav2vec2/configuration_wav2vec2.py
# 3. Next, we create the vocabulary of the model by extracting all unique characters from
# the training and evaluation datasets
# We need to make sure that only first rank saves vocabulary
if training_args.world_size == 1 or dist.get_rank() == 0:

Contributor Author

This caused me a headache for 3 days: in distributed training, each process was creating a different ordering of characters in the vocabulary, which essentially meant that each process had different label ids.

By using sorted(...) and making sure that only the first process creates & saves the vocabulary, the problem is solved.
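
A minimal sketch of the idea described above, not the script's actual code. It builds the character vocabulary in sorted order and writes it from the first rank only. The helper names `build_vocab` and `save_vocab_on_main_process`, and passing `world_size` and `rank` as plain integers, are assumptions made for illustration; in a real distributed run the rank would come from something like `dist.get_rank()`, and the other ranks would need to wait (for example on a barrier) before reading the file.

```python
import json
import os
import tempfile


def build_vocab(transcripts):
    """Map every unique character to an id in a deterministic (sorted) order."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    # sorted(...) fixes the ordering. Iterating the raw set is not reliable here:
    # Python randomizes string hashing per interpreter by default, so two
    # processes can walk the same set in different orders.
    return {ch: idx for idx, ch in enumerate(sorted(chars))}


def save_vocab_on_main_process(vocab, path, world_size, rank):
    """Write the vocabulary file from the first rank only (hypothetical helper)."""
    if world_size == 1 or rank == 0:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(vocab, f, ensure_ascii=False)
        return True
    return False


# Usage: the same characters always receive the same ids, whatever the input order.
transcripts = ["hello world", "speech recognition"]
vocab = build_vocab(transcripts)
vocab_other_order = build_vocab(list(reversed(transcripts)))

vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.json")
skipped = save_vocab_on_main_process(vocab, vocab_path, world_size=2, rank=1)
written = save_vocab_on_main_process(vocab, vocab_path, world_size=2, rank=0)
```

With the ordering fixed and a single writer, every process that later loads the file sees identical label ids.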

Member

Nice find!

Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
@patrickvonplaten patrickvonplaten changed the title [WIP][ASR] Add official ASR CTC example to examples/pytorch/speech-recognition [ASR] Add official ASR CTC example to examples/pytorch/speech-recognition Sep 22, 2021
Comment thread examples/pytorch/speech-recognition/README.md Outdated

@sgugger sgugger left a comment

Collaborator

Thanks a lot for adding this example and great job figuring out the problem in a distributed setup!

Comment thread examples/pytorch/speech-recognition/README.md Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated

@patil-suraj patil-suraj left a comment

Contributor

Looks really good! Thanks for adding this example

Comment thread examples/pytorch/speech-recognition/requirements.txt Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
patrickvonplaten and others added 3 commits September 23, 2021 11:19
Co-authored-by: Suraj Patil <surajp815@gmail.com>

@anton-l anton-l left a comment

Member

Looks good, thank you very much for figuring out the DDP problem!

The torchaudio loader seems to be the best fit for the example 🙂
Although I think Windows users will be out of luck when they try to load mp3s (soundfile is used as the backend there, and it specifically excludes mp3: http://www.mega-nerd.com/libsndfile/#Features).

P.S. So sorry for the typo spam 😅

Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/README.md
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread src/transformers/models/hubert/configuration_hubert.py Outdated
Comment thread src/transformers/models/hubert/configuration_hubert.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
patrickvonplaten and others added 4 commits September 23, 2021 16:51
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Comment thread examples/pytorch/speech-recognition/run_speech_recognition_ctc.py Outdated
@patrickvonplaten patrickvonplaten merged commit 4a320f6 into huggingface:master Sep 24, 2021
@patrickvonplaten patrickvonplaten deleted the add_asr_example branch September 24, 2021 05:01
5 participants