Over the last few weeks I’ve been exploring language and words and how to deal with them algorithmically. Lately I’ve been thinking about ways to visualize various aspects of language, and one of the first that came to mind was the idea of representing the sound of words with color.
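I am using the Carnegie Mellon Pronouncing Dictionary (CMUPD) to encode each word. The CMUPD provides a set of 39 phonemes, the distinct sounds that make up spoken English. These are as follows:
AA, AE, AH, AO, AW, AY, B, CH, D, DH, EH, ER, EY, F, G, HH, IH, IY, JH, K, L, M, N, NG, OW, OY, P, R, S, SH, T, TH, UH, UW, V, W, Y, Z, ZH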
Initially I gave each phoneme a different hue (from 1-360 degrees) by spreading the phonemes equidistantly around the color wheel. With this mapping applied to the phrase “Who knew sniffing glue could give you the flu?”, the resulting image clearly shows the phoneme UW in magenta at the end of each rhyming word (who, knew, glue, you, flu). Unfortunately, displaying all the other phonemes introduces a lot of visual noise.
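A minimal sketch of this kind of equidistant mapping, in Python. The alphabetical ordering of the phonemes, the starting hue of 0 degrees, and the helper names `equidistant_hues` and `hue_to_rgb` are all assumptions made for illustration, not necessarily what the test application does.

```python
import colorsys

# The 39 phonemes of the CMU Pronouncing Dictionary, stress digits removed.
# Alphabetical order is an assumption; any fixed order would work.
PHONEMES = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]


def equidistant_hues(phonemes):
    """Spread the phonemes evenly around the color wheel (hue in degrees)."""
    step = 360.0 / len(phonemes)
    return {phoneme: index * step for index, phoneme in enumerate(phonemes)}


def hue_to_rgb(hue_degrees, saturation=1.0, value=1.0):
    """Convert a hue in degrees to an 8-bit (r, g, b) tuple."""
    r, g, b = colorsys.hsv_to_rgb((hue_degrees % 360.0) / 360.0, saturation, value)
    return tuple(round(channel * 255) for channel in (r, g, b))


HUES = equidistant_hues(PHONEMES)

# "who knew", transcribed by hand as HH UW, N UW
colors = [hue_to_rgb(HUES[p]) for p in ["HH", "UW", "N", "UW"]]
```

Every phoneme gets its own color here, which is exactly why the result is noisy: 39 hues leave only about 9 degrees between neighbors.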
Realizing that not all phonemes are equally represented in the corpus, I decided to find the distribution of each. The sound AH (the “uhh” in the word “hut”) is the most common, accounting for nearly 10% of phonemes (this may differ if only the most common 10% of words are analyzed; I haven’t checked). This also happens to answer a question I asked my cousin, who studied linguistics, at Christmas dinner some years back after doing imitations of other languages: “So, what does a non-English speaker hear when an American speaks?” It sounds like the answer may be “uhh… duh… buh… fuh.”
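One way to compute such a distribution is sketched below. It assumes input lines in the dictionary's plain-text layout (a word followed by its phonemes, vowels carrying a stress digit, comment lines starting with `;;;`). The function name `phoneme_distribution` and the three sample entries are hypothetical, so the numbers it prints say nothing about the real corpus.

```python
from collections import Counter


def phoneme_distribution(lines):
    """Return {phoneme: share of all phonemes}, most common first."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        _word, *phones = line.split()
        # Drop the stress digit so AH0, AH1 and AH2 all count as AH.
        counts.update(phone.rstrip("012") for phone in phones)
    total = sum(counts.values())
    return {phoneme: n / total for phoneme, n in counts.most_common()}


# Tiny hand-written sample, not real corpus statistics.
sample = [
    ";;; sample entries",
    "HUT  HH AH1 T",
    "GLUE  G L UW1",
    "FLU  F L UW1",
]
distribution = phoneme_distribution(sample)
```

Counting over dictionary entries weights every word equally. Weighting each word by how often it is actually used would be a different measurement, which is the caveat raised in the parenthetical above.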
The next step is to take the frequency distribution into account. Before diving any further into palette selection, I created a simple test application which you may play with (though be warned, it may be broken and may be an older version).
In the next version, the same phrase was rendered with a palette restricted to vowels, with colors assigned according to each phoneme’s frequency. Phonemes that occur more often received hues that were more distinct from the others.
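The exact assignment rule is not spelled out here, so the following is only one possible interpretation, sketched in Python: each vowel gets an arc of the color wheel proportional to its frequency, and its hue sits at the center of that arc. Frequent vowels then end up further from their neighbors. The names `VOWELS`, `vowels_only` and `weighted_hues` are hypothetical, and the example frequencies are made up.

```python
# The 15 vowel phonemes of the CMU set, stress digits removed.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def vowels_only(frequencies):
    """Keep only the vowel entries of a {phoneme: frequency} mapping."""
    return {p: f for p, f in frequencies.items() if p in VOWELS}


def weighted_hues(frequencies):
    """Give each phoneme a hue at the center of an arc sized by its frequency."""
    total = sum(frequencies.values())
    hues = {}
    arc_start = 0.0
    # Most frequent first, so the widest arcs are laid out first.
    ranked = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    for phoneme, frequency in ranked:
        arc = 360.0 * frequency / total
        hues[phoneme] = arc_start + arc / 2.0
        arc_start += arc
    return hues


# Made-up frequencies for illustration only.
example = {"AH": 0.5, "T": 0.3, "IH": 0.25, "UW": 0.25}
hues = weighted_hues(vowels_only(example))
```

In this sketch the most frequent vowel, AH, lands 135 degrees from its nearest neighbor, while IH and UW are only 90 degrees apart. Consonants are dropped before the hues are computed, so they would need a neutral color such as gray when drawn.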
This change appears to have improved the legibility somewhat. More experimentation will be necessary. I suspect this linear format will be a failed experiment. It seems there is little improvement in recognition of rhyme, as the viewer must use working memory to hold what amounts to merely a new representation of sound rather than use pre-attentive factors to quickly match sounds together.