The CMU Pronouncing Dictionary

“The Carnegie Mellon University Pronouncing Dictionary is an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations.”

As the project page suggests, this data could be used for speech recognition and speech synthesis. Presumably it could also be used to generate a rhyming dictionary automatically, though the page doesn't discuss that use.

Some years back, someone on Stack Overflow asked how to create a rhyming dictionary; the answers pretty much explain how to do it.
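A common approach is to treat two words as rhyming when their pronunciations match from the last stressed vowel to the end. Here is a minimal Python sketch of that idea. It assumes the classic cmudict text format: one `WORD  PHONEMES` entry per line, `;;;` comment lines, alternate pronunciations written as `WORD(1)`, and a stress digit (0, 1 or 2) on every vowel. The function names and the sample entries are my own, for illustration only.

```python
import re


def parse_entry(line):
    """Split one cmudict-style line into (word, phonemes); None for comments or blanks."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phonemes = line.split()
    # Alternate pronunciations are written WORD(1), WORD(2), ...; drop that suffix.
    word = re.sub(r"\(\d+\)$", "", word).lower()
    return word, phonemes


def rhyme_part(phonemes):
    """Return the phonemes from the last stressed vowel (stress 1 or 2) onward."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1] in "12":
            return tuple(phonemes[i:])
    return tuple(phonemes)  # no stressed vowel: fall back to the whole pronunciation


# Illustrative entries written in the cmudict style (not copied from the real file).
sample = """\
;;; sample lines
CAT  K AE1 T
HAT  HH AE1 T
DOG  D AO1 G
"""

# One pronunciation per word here; a later alternate would overwrite an earlier one.
pron = dict(e for e in map(parse_entry, sample.splitlines()) if e)
print(rhyme_part(pron["cat"]) == rhyme_part(pron["hat"]))  # True
print(rhyme_part(pron["cat"]) == rhyme_part(pron["dog"]))  # False
```

Because the stress digits are part of the comparison, a vowel with secondary stress (`EH2`) won't match the same vowel with primary stress (`EH1`); stripping the digits before comparing is an easy change if you want looser matches.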

Someone else used the CMU dictionary to create a list of Shakespearean sonnet lines and match the rhyming lines with each other. It's not a very good poetry generator, but I'm linking to it because it makes the useful point that the CMU dictionary “doesn't provide a way to disambiguate different pronunciations according to the word’s part of speech or other context.” It also points out that a word doesn’t rhyme with itself.
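Both caveats show up as soon as you build an index over the whole file. Since alternate pronunciations carry no part-of-speech tag, one simple option is to keep every pronunciation of a word and report rhymes for all of them, then drop the query word from its own results. Another minimal Python sketch under the same format assumptions; `build_index`, `find_rhymes` and the sample entries (including the two pronunciations of *read*) are hypothetical, for illustration.

```python
import re
from collections import defaultdict


def rhyme_part(phonemes):
    """Phonemes from the last stressed vowel (stress 1 or 2) to the end."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1] in "12":
            return tuple(phonemes[i:])
    return tuple(phonemes)


def build_index(lines):
    """Map each word to all its pronunciations, and each rhyme part to its words."""
    prons = defaultdict(list)
    by_rhyme = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        word, *phonemes = line.split()
        word = re.sub(r"\(\d+\)$", "", word).lower()  # READ(1) -> read
        prons[word].append(phonemes)  # keep every pronunciation, no disambiguation
        by_rhyme[rhyme_part(phonemes)].add(word)
    return prons, by_rhyme


def find_rhymes(word, prons, by_rhyme):
    """Rhymes for every listed pronunciation of `word`, excluding the word itself."""
    word = word.lower()
    result = set()
    for phonemes in prons.get(word, []):
        result |= by_rhyme[rhyme_part(phonemes)]
    result.discard(word)  # a word doesn't rhyme with itself
    return sorted(result)


# Illustrative entries written in the cmudict style (not copied from the real file).
sample = """\
READ  R EH1 D
READ(1)  R IY1 D
BED  B EH1 D
NEED  N IY1 D
DOG  D AO1 G
"""

prons, by_rhyme = build_index(sample.splitlines())
print(find_rhymes("read", prons, by_rhyme))  # ['bed', 'need']
print(find_rhymes("dog", prons, by_rhyme))   # []
```

The trade-off is visible in the output: *read* gets rhymes for both its present-tense and past-tense sounds, which is exactly the ambiguity the quoted caveat describes.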

Anyway, I’m not going to embark on this project right now, but if I ever want to put together a digital rhyming dictionary, this seems like the easiest way to do it.
