Natural Language Processing

I have become increasingly interested in tools for natural language processing. So much of how we approach and see problems is tied up in the words we find to describe them. I am currently exploring ways to use language to help define and understand problems, as well as to get out of creative blocks.

Below are a few useful tools:

Princeton’s WordNet is a massive linguistic database containing not just definitions but also how words relate to one another.

The Python Natural Language Toolkit (NLTK) provides tools for parsing and understanding natural language semantics.

The Carnegie Mellon Pronouncing Dictionary breaks words down into their phonemes, providing information about pronunciation, rhyming, and syllable counts.
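To make the phoneme format concrete, here is a minimal sketch of a rhyme check. The three entries are typed in by hand in the dictionary's layout (a word followed by its phonemes, with a stress digit on each vowel) rather than read from the real file, and rhyme_part and rhymes are hypothetical helper names of my own, not part of any library. It treats two words as rhyming when their phonemes match from the last primary-stressed vowel onward, which is a simplification of what rhyme really is.

```python
# Hand-typed sample entries in the CMU dictionary layout (an assumption for
# this demo, not read from the real file): WORD followed by its phonemes.
SAMPLE = """\
CAT  K AE1 T
HAT  HH AE1 T
DOG  D AO1 G
"""

# Map each word to its list of phonemes.
entries = {}
for line in SAMPLE.splitlines():
    pieces = line.split()
    entries[pieces[0]] = pieces[1:]

def rhyme_part(word):
    """Return the phonemes from the last primary-stressed vowel onward."""
    phonemes = entries[word.upper()]
    # Primary stress is marked by a trailing "1" on the vowel phoneme.
    stressed = [i for i, p in enumerate(phonemes) if p.endswith("1")]
    start = stressed[-1] if stressed else 0
    return phonemes[start:]

def rhymes(a, b):
    """Simplified rhyme test: same sounds from the stressed vowel to the end."""
    return rhyme_part(a) == rhyme_part(b)

print(rhymes("cat", "hat"))
print(rhymes("cat", "dog"))
```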

Having found no simple resource to provide syllable counts for common words, I want to share a quick solution I wrote in Python, which uses the CMU dictionary. Simply download the dictionary to the same directory as this script, naming it cmu_pron.txt.

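    # Load the CMU Pronouncing Dictionary into a dict: WORD -> list of phonemes.
    # latin-1 decodes any byte, so unusual characters in the file won't raise.
    words = {}
    with open("cmu_pron.txt", encoding="latin-1") as f:
        for line in f:
            pieces = line.split()
            # Skip blank lines and ";;;" comment lines.
            if not pieces or pieces[0].startswith(";;;"):
                continue
            words[pieces[0]] = pieces[1:]

    def num_syllables(word):
        # Each vowel phoneme ends in a stress digit (0, 1, or 2), so counting
        # those digits counts the syllables.
        phonemes = words[word.upper()]
        return sum(1 for c in "".join(phonemes) if c in "012")

    print("alphabet: %s" % num_syllables("alphabet"))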

The result for alphabet is indeed 3. There are probably more efficient ways to do this, but it gets the job done.
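One possible refinement, sketched here under a few assumptions: count the syllables once per word while loading, and return None for a word that is not in the dictionary instead of raising a KeyError. The name load_counts is hypothetical, and the in-memory sample stands in for cmu_pron.txt so the snippet runs on its own.

```python
import io

def load_counts(lines):
    """Build a dict mapping WORD -> syllable count from dictionary lines."""
    counts = {}
    for line in lines:
        pieces = line.split()
        # Skip blank lines and ";;;" comment lines.
        if not pieces or pieces[0].startswith(";;;"):
            continue
        # Each vowel phoneme ends in a stress digit, so count those phonemes.
        counts[pieces[0]] = sum(1 for p in pieces[1:] if p[-1] in "012")
    return counts

# Hand-typed sample standing in for cmu_pron.txt (an assumption for this demo).
sample = io.StringIO(
    ";;; sample comment line\n"
    "ALPHABET  AE1 L F AH0 B EH2 T\n"
    "CAT  K AE1 T\n"
)
counts = load_counts(sample)

def num_syllables(word):
    """Return the syllable count, or None if the word is not listed."""
    return counts.get(word.upper())

print(num_syllables("alphabet"))  # 3 for the sample entry above
print(num_syllables("zzyzx"))
```

To run this against the real dictionary, an open file can be passed in place of the sample, since load_counts only needs something it can iterate over line by line.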