[ASR] Add official ASR CTC example to examples/pytorch/speech-recognition#13620
…into add_asr_example
# 3. Next, we create the vocabulary of the model by extracting all unique characters from
# the training and evaluation datasets
# We need to make sure that only first rank saves vocabulary
if training_args.world_size == 1 or dist.get_rank() == 0:
This caused me a headache for 3 days: in distributed training, each process was creating a different ordering of characters in the vocabulary, which essentially meant that each process had different label ids.
By using sorted(...) and making sure that only the first process creates & saves the vocabulary, the problem is solved.
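A minimal sketch of the fix described above, not the actual code from this PR. The helper names `build_vocab` and `save_vocab_if_main` are hypothetical, and `rank` / `world_size` are passed in explicitly here instead of being read from `training_args.world_size` and `dist.get_rank()`. The different orderings most likely come from Python randomizing string hashes per process, so iterating over a `set` of characters can yield a different order in each worker; sorting removes that dependence.

```python
import json


def build_vocab(texts):
    """Map every unique character to an id, in sorted (deterministic) order."""
    chars = set()
    for text in texts:
        chars.update(text)
    # sorted(...) makes the char -> id mapping identical in every process,
    # regardless of the per-process iteration order of the set.
    return {char: idx for idx, char in enumerate(sorted(chars))}


def save_vocab_if_main(vocab, path, rank=0, world_size=1):
    """Write the vocabulary only from the first process.

    Returns True if this process wrote the file, False otherwise.
    """
    if world_size == 1 or rank == 0:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(vocab, f, ensure_ascii=False)
        return True
    return False


# Toy transcripts standing in for the training and evaluation text columns.
texts = ["hello world", "speech"]
vocab = build_vocab(texts)
```

In a real distributed run the other ranks would also need to wait until the file exists before loading it (for example with a barrier); that synchronization is assumed here and not shown.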
sgugger
left a comment
Thanks a lot for adding this example and great job figuring out the problem in a distributed setup!
patil-suraj
left a comment
Looks really good! Thanks for adding this example.
Co-authored-by: Suraj Patil <surajp815@gmail.com>
anton-l
left a comment
Looks good, thank you very much for figuring out the DDP problem!
The torchaudio loader seems to be the best fit for the example 🙂
Although I think Windows users will be out of luck when they try to load mp3s (soundfile is used as the backend there, and it specifically excludes mp3: http://www.mega-nerd.com/libsndfile/#Features).
P.S. So sorry for the typo spam 😅
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
…/transformers into add_asr_example
This PR adds a generic CTC speech recognition example. It has been tested for single-GPU and distributed training on Common Voice and is currently being tested on Librispeech.
Once datasets has https://github.com/huggingface/datasets/pull/2324/files merged and made a new release, I will slightly adapt the script to leverage the new audio feature.
A couple of example runs with this script:
This example folder should eventually have two additional scripts, one for Seq2Seq ASR and one for CTC + LM decoding, both of which are left for future work.