## Quick links
- [Words by count in BNC with pronunciations](local_intermediate/correlated_ipa_no_spaces)
- [Phonemes by frequency](local_target/q1_frequencies)
- [Phonemes by frequency post-/w/](local_target/q2_post_w_frequencies)

## Summary
An estimate of the relative frequencies of English phonemes, along
with an estimate of the relative frequencies of the phonemes that
immediately follow /w/.

## Methodology
Reproducing the work of Doug Blumeyer [1], I correlated the CMU
Pronouncing Dictionary ("CMUdict") [2] with Adam Kilgarriff's
unlemmatized frequency list for the British National Corpus [3],
matching each word in the frequency list to its CMUdict
pronunciation, to estimate overall phoneme frequencies. I then
extended the same technique to estimate the frequencies of the
phonemes that follow /w/.

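The counting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Blumeyer's code or the exact script used here: the function name `phoneme_frequencies` and its `after` parameter are hypothetical, stress digits on CMUdict vowels are dropped, words missing from the dictionary are skipped, and the word counts are made-up toy values.

```python
from collections import Counter


def phoneme_frequencies(word_counts, pronunciations, after=None):
    """Estimate relative phoneme frequencies from word counts.

    word_counts:    dict of lowercase word -> corpus frequency
    pronunciations: dict of lowercase word -> list of ARPAbet phonemes
    after:          if set (e.g. "W"), count only phonemes that
                    immediately follow this phoneme within a word
    """
    totals = Counter()
    for word, count in word_counts.items():
        phones = pronunciations.get(word)
        if phones is None:
            continue  # word missing from the dictionary: skipped
        # Assumption: stress digits are dropped, so AH0 and AH1 both count as AH.
        phones = [p.rstrip("012") for p in phones]
        if after is None:
            # Every phoneme occurrence is weighted by the word's frequency.
            for p in phones:
                totals[p] += count
        else:
            # Walk adjacent pairs and keep the second phoneme of each
            # pair whose first phoneme matches `after`.
            for prev, p in zip(phones, phones[1:]):
                if prev == after:
                    totals[p] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {p: c / grand_total for p, c in totals.items()}


# Toy counts for illustration only; these are not BNC figures.
word_counts = {"the": 100, "we": 10, "away": 5}
pronunciations = {
    "the": ["DH", "AH0"],
    "we": ["W", "IY1"],
    "away": ["AH0", "W", "EY1"],
}

overall = phoneme_frequencies(word_counts, pronunciations)
post_w = phoneme_frequencies(word_counts, pronunciations, after="W")
print(post_w)  # IY: 10/15, EY: 5/15
```

Passing `after="W"` yields the post-/w/ distribution. Because the sketch only pairs adjacent phonemes inside a single pronunciation, a /w/ at the end of one word is never paired with the first phoneme of the next word; whether that matches the intended definition of "post-/w/" is a judgment call.
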
## Limitations
As Blumeyer notes, the source datasets have some limitations.
CMUdict conflates "schwa with the near-open central vowel" and
has "several noticeable errors." Kilgarriff's frequency list has
formatting issues that make it hard to handle words containing
accents or apostrophes, including common contractions. For now,
I have ignored this issue entirely.

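One way to make "ignoring this issue" explicit is to drop problematic word forms while loading the frequency list. The sketch below assumes a four-field line format (frequency, word, part-of-speech tag, file count) and sums counts for the same word form across tags; the function name `load_word_counts` and the filtering rule are hypothetical and may not match how the list was actually handled here.

```python
from collections import Counter


def load_word_counts(lines):
    """Sum frequencies per word form across part-of-speech tags.

    Assumption: each line of the unlemmatized list has four
    whitespace-separated fields, "frequency word pos-tag file-count",
    so the same word form can appear on several lines with different tags.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # malformed or header line
        freq, word = int(fields[0]), fields[1].lower()
        # Hypothetical policy: drop any form that is not plain ASCII letters.
        # This discards forms with apostrophes (e.g. contraction pieces),
        # accented words, and hyphenated forms instead of repairing them.
        if not (word.isascii() and word.isalpha()):
            continue
        counts[word] += freq
    return counts
```

Any iterable of strings works as input, including an open file. Filtering at load time keeps the skipped forms out of the totals entirely, rather than letting them silently fail to match CMUdict later.
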
Blumeyer manually checked several hundred of the most common
words for errors. I have not done this.

CMUdict lists multiple pronunciations for some words. For these
words, I used only the first pronunciation given. It is not clear
to me whether the alternatives are ordered meaningfully or
arbitrarily.

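Keeping only the first pronunciation is straightforward because CMUdict 0.7b lists each alternate pronunciation as its own entry with a numbered headword, such as `READ(1)`. A minimal sketch, assuming `;;;`-prefixed comment lines and whitespace-separated fields; the function name `first_pronunciations` is hypothetical.

```python
import re

# Alternate pronunciations carry a numeric suffix on the headword, e.g. "READ(1)".
VARIANT = re.compile(r"\(\d+\)$")


def first_pronunciations(lines):
    """Map each lowercase word to the phonemes of its first listed entry."""
    prons = {}
    for line in lines:
        if not line.strip() or line.startswith(";;;"):
            continue  # blank line or comment
        word, *phones = line.split()
        if VARIANT.search(word):
            continue  # alternate pronunciation: keep only the first
        # setdefault guards against a duplicate unnumbered headword.
        prons.setdefault(word.lower(), phones)
    return prons
```

In practice you would pass the opened dictionary file, e.g. `open(path, encoding="latin-1")` with `path` pointing at the local copy; the Latin-1 encoding is an assumption about that copy.
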
## Other notes
Although the Kilgarriff list is drawn from the British National
Corpus, a quick inspection suggests that the correlated
pronunciations are American rather than British, which is
consistent with CMUdict being a dictionary of North American English.

## References
1. Doug Blumeyer, ["Relative Frequencies of English Phonemes"][blumeyer].
2. [CMU Pronouncing Dictionary][cmudict], version 0.7b. Local copy retrieved May 28, 2018.
3. Adam Kilgarriff, [word frequencies for the BNC][kilgarriff]. Local copy retrieved May 28, 2018.