[ASR] Add official ASR CTC example to `examples/pytorch/speech-recognition` by patrickvonplaten · Pull Request #13620 · huggingface/transformers

patrickvonplaten · 2021-09-17T09:33:05Z

This PR adds a generic CTC speech recognition example. It has been tested for single-GPU and distributed training on Common Voice and is currently being tested on Librispeech.

Once datasets has merged https://github.com/huggingface/datasets/pull/2324/files and made a new release, I will slightly adapt the script to leverage the new audio feature.

A couple of example runs with this script:

This example folder should eventually have two additional scripts, one for Seq2Seq ASR and one for CTC + LM decoding, which are left for future work.

patrickvonplaten · 2021-09-22T17:20:06Z

+    # 3. Next, we create the vocabulary of the model by extracting all unique characters from
+    # the training and evaluation datasets
+    # We need to make sure that only first rank saves vocabulary
+    if training_args.world_size == 1 or dist.get_rank() == 0:


This caused me a headache for 3 days: in distributed training, each process was creating a different ordering of characters in the vocabulary, which essentially meant that each process had different label ids.

By using sorted(...) and making sure that only the first process creates & saves the vocabulary, the problem is solved.
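For readers who want to see the idea in isolation, here is a minimal sketch of the fix, not the code from the script itself. The helper names `build_vocab` and `save_vocab_on_first_rank`, and the explicit `rank` / `world_size` arguments, are assumptions made for illustration; in the script those values come from `training_args.world_size` and `dist.get_rank()`.

```python
import json
import os
import tempfile


def build_vocab(transcripts):
    """Map every unique character to an id, in sorted (deterministic) order."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    # sorted(...) removes the dependence on set iteration order, which can
    # differ between processes because string hashing is randomized per process.
    return {char: idx for idx, char in enumerate(sorted(chars))}


def save_vocab_on_first_rank(transcripts, path, rank=0, world_size=1):
    """Hypothetical helper: write the vocabulary file on the first rank only.

    Returns True if this process wrote the file, False otherwise.
    """
    if world_size == 1 or rank == 0:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(build_vocab(transcripts), f, ensure_ascii=False)
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vocab_path = os.path.join(tmp, "vocab.json")
        wrote = save_vocab_on_first_rank(["hello world"], vocab_path)
        print(wrote, build_vocab(["hello world"]))
```

In a real distributed run the other ranks would also have to wait until the file exists before loading it, for example behind a barrier; that synchronization is left out of this sketch.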

sgugger

Thanks a lot for adding this example and great job figuring out the problem in a distributed setup!

patil-suraj

Looks really good! Thanks for adding this example

Co-authored-by: Suraj Patil <surajp815@gmail.com>

anton-l

Looks good, thank you very much for figuring out the DDP problem!

The torchaudio loader seems to be the best fit for the example 🙂
Although I think Windows users will be out of luck when they try to load mp3s (soundfile is used as a backend there, and it specifically excludes mp3: http://www.mega-nerd.com/libsndfile/#Features).

P.S. So sorry for the typo spam 😅

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

patrickvonplaten added 4 commits September 16, 2021 17:57

up

9d169c4

rename

873ae06

add asr example

ef9b969

add auto feature extractor

adc9a66

This was referenced Sep 17, 2021

[Trainer] Add nan/inf logging filter #13619

Merged

AutoTokenizer - add from_model_name method #13623

Closed

patrickvonplaten added 5 commits September 18, 2021 00:18

some more fixes

769aa5b

correct layerdrop

c7e4845

correct for multi-gpu dist

5306cdd

Merge branch 'master' of https://github.com/huggingface/transformers into add_asr_example

f85401b

clean up

97936d3

LysandreJik mentioned this pull request Sep 20, 2021

some error when I finetune wav2vec2 by rum_common_voice.py #13651

Closed

patrickvonplaten added 9 commits September 21, 2021 22:54

refactor

712fbc7

Merge branch 'master' of https://github.com/huggingface/transformers into add_asr_example

1fa221c

refactor

2c40b50

more fixes

a8b51f3

more fixes

3c217c2

Merge branch 'master' of https://github.com/huggingface/transformers into add_asr_example

24d4e27

clean-up

0e39d2f

finish

30f2611

up

0c93c7a