<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Luke Loeffler &#187; linguistics</title>
	<atom:link href="http://lukeloeffler.com/tag/linguistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://lukeloeffler.com</link>
	<description></description>
	<lastBuildDate>Thu, 29 Apr 2010 20:07:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Color of Rhyme</title>
		<link>http://lukeloeffler.com/2009/the-color-of-rhyme/</link>
		<comments>http://lukeloeffler.com/2009/the-color-of-rhyme/#comments</comments>
		<pubDate>Sat, 08 Aug 2009 02:42:57 +0000</pubDate>
		<dc:creator>luke</dc:creator>
				<category><![CDATA[blog]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://lukeloeffler.com/?p=344</guid>
		<description><![CDATA[Over the last few weeks I&#8217;ve been exploring language and words and how to deal with them algorithmically. Lately I&#8217;ve been thinking about ways to visualize various aspects of language, and one of the first that came to mind was the idea of representing the sound of words with color.
I am using the Carnegie Mellon [...]]]></description>
			<content:encoded><![CDATA[<p>Over the last few weeks I&#8217;ve been exploring language and words and how to deal with them algorithmically. Lately I&#8217;ve been thinking about ways to visualize various aspects of language, and one of the first that came to mind was the idea of representing the sound of words with color.</p>
<p>I am using the Carnegie Mellon Pronouncing Dictionary (CMUPD) to encode each word. The CMUPD provides a list of 39 phonemes, the distinct sounds that make up spoken English. These are as follows:</p>
<div style="height: 150px; width: 300px; overflow-x: hidden; overflow-y: scroll;">
<pre>Phoneme	Example	Translation
AA	odd	AA D
AE	at	AE T
AH	hut	HH AH T
AO	ought	AO T
AW	cow	K AW
AY	hide	HH AY D
B 	be	B IY
CH	cheese	CH IY Z
D 	dee	D IY
DH	thee	DH IY
EH	Ed	EH D
ER	hurt	HH ER T
EY	ate	EY T
F 	fee	F IY
G 	green	G R IY N
HH	he	HH IY
IH	it	IH T
IY	eat	IY T
JH	gee	JH IY
K 	key	K IY
L 	lee	L IY
M 	me	M IY
N 	knee	N IY
NG	ping	P IH NG
OW	oat	OW T
OY	toy	T OY
P 	pee	P IY
R 	read	R IY D
S 	sea	S IY
SH	she	SH IY
T 	tea	T IY
TH	theta	TH EY T AH
UH	hood	HH UH D
UW	two	T UW
V 	vee	V IY
W 	we	W IY
Y 	yield	Y IY L D
Z 	zee	Z IY
ZH	seizure	S IY ZH ER</pre>
</div>
<p>Initially I gave each phoneme its own hue by spacing the 39 phonemes evenly around the color wheel (1 to 360 degrees). Using this mapping, the phrase &#8220;Who knew sniffing glue could give you the flu?&#8221; translates to the following image: <a href="http://lukeloeffler.com/wordpress/wp-content/uploads/2009/08/phrase1.png"><img class="alignnone size-full wp-image-345" title="phrase1" src="http://lukeloeffler.com/wordpress/wp-content/uploads/2009/08/phrase1.png" alt="phrase1" width="215" height="23" /></a>. You can clearly see the phoneme UW in magenta at the end of each rhyming word (who, knew, glue, you and flu). Unfortunately, displaying all the other phonemes introduces a lot of visual noise.</p>
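<p>A minimal Python sketch of that equidistant mapping follows. It assumes the phonemes are ordered alphabetically and that hues start at 0 degrees; neither detail is stated above, and the function names are made up for illustration. With this ordering UW lands near 305 degrees, which is close to magenta.</p>

```python
import colorsys

# The 39 CMUdict phonemes, in the order of the list above (alphabetical).
PHONEMES = """AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG
OW OY P R S SH T TH UH UW V W Y Z ZH""".split()


def equidistant_hues(phonemes):
    """Map each phoneme to a hue in degrees, evenly spaced around the wheel."""
    step = 360.0 / len(phonemes)
    return {p: i * step for i, p in enumerate(phonemes)}


def hue_to_hex(hue_degrees, saturation=1.0, value=1.0):
    """Convert an HSV color (hue given in degrees) to a #rrggbb string."""
    r, g, b = colorsys.hsv_to_rgb(hue_degrees / 360.0, saturation, value)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


HUES = equidistant_hues(PHONEMES)

# "who knew" in CMUdict notation, with stress digits removed
for phoneme in ["HH", "UW", "N", "UW"]:
    print(phoneme, round(HUES[phoneme], 1), hue_to_hex(HUES[phoneme]))
```

<p>Drawing one colored bar per phoneme with these hex values would give a strip similar in spirit to the image above.</p>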
<p>Realizing that not all phonemes are equally represented in the corpus, I decided to find the distribution of each. The sound AH (the &#8220;uhh&#8221; in the word &#8220;hut&#8221;) is the most common, accounting for about 9.4% of all phonemes (this may be different if only the most common 10% of words are analyzed; I haven&#8217;t looked). This also happens to answer the question I asked my cousin, who studies linguistics, at Christmas dinner some years back after doing imitations of other languages: &#8220;So, what does a non-English speaker hear when an American speaks?&#8221; It sounds like the answer may be &#8220;uhh&#8230; duh&#8230; buh&#8230; fuh.&#8221;</p>
<div style="height: 150px; width: 300px; overflow-x: hidden; overflow-y: scroll;">
<pre>Phon.   Count   Prob.
AH	70564	0.0938934151927
N	53577	0.0712902826622
L	44148	0.0587439274124
S	43349	0.0576807671786
T	41698	0.0554839241923
R	40794	0.0542810495348
K	38174	0.0507948420096
IH	33779	0.0449467954168
IY	30957	0.0411918039527
D	28491	0.0379105109157
M	26330	0.0350350550142
ER	25871	0.0344243033905
EH	24564	0.0326851914686
Z	23955	0.0318748478111
AA	22175	0.0295063556757
AE	19151	0.0254825802726
B	18943	0.0252058126523
P	17305	0.0230262676423
OW	17147	0.0228160306999
G	12248	0.0162973548733
F	12147	0.0161629629038
EY	11851	0.0157691012903
AO	10059	0.0133846417922
AY	9838	0.0130905761956
V	9349	0.0124399061651
NG	8692	0.0115656930567
UW	8579	0.0114153337245
HH	8439	0.0112290478262
W	7737	0.0102949571077
SH	7730	0.0102856428128
JH	5461	0.00726648064689
Y	4392	0.00584405475209
CH	4378	0.00582542616226
AW	2932	0.00390135895563
TH	2597	0.00345560341329
UH	2021	0.00268917000318
OY	1124	0.00149560964056
DH	504	0.000670629233846
ZH	482	0.000641355735543</pre>
</div>
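<p>A table like this one can be produced with a short counting pass over the dictionary. The sketch below is an assumption about the method, not the script actually used: it runs on a handful of inline entries in the CMUdict layout rather than the full file, strips the stress digits from vowels, and counts alternate pronunciations such as THE(1) as separate entries.</p>

```python
from collections import Counter

# A few entries in the CMUdict "WORD  PH1 PH2 ..." layout, standing in for the full file.
SAMPLE_ENTRIES = """\
;;; comment lines in the dictionary start with three semicolons
GLUE  G L UW1
FLU  F L UW1
SNIFFING  S N IH1 F IH0 NG
THE  DH AH0
THE(1)  DH IY0
"""


def phoneme_distribution(lines):
    """Count phonemes across entries; return (phoneme, count, probability) rows."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        _word, *pronunciation = line.split()
        # Vowels carry a trailing stress digit (0, 1 or 2); drop it to merge variants.
        counts.update(p.rstrip("012") for p in pronunciation)
    total = sum(counts.values())
    return [(p, n, n / total) for p, n in counts.most_common()]


for phoneme, count, prob in phoneme_distribution(SAMPLE_ENTRIES.splitlines()):
    print(phoneme, count, round(prob, 4))
```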
<p>The next step is to take the frequency distribution into account. Before diving any further into palette selection, I created a simple <a href="http://lukeloeffler.com/labs/WordDna.html">test application</a>, which you may play with (though be warned, it may be broken or out of date).</p>
<p>In this screenshot, syllable stress is taken into account as I experiment with the pre-attentive characteristic of height to aid visualization.<br />
<a href="http://lukeloeffler.com/wordpress/wp-content/uploads/2009/08/Safari.png"><img src="http://lukeloeffler.com/wordpress/wp-content/uploads/2009/08/Safari.png" alt="test with stress" title="test with stress" width="423" height="149" class="alignnone size-full wp-image-350" /></a></p>
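<p>As a rough sketch of the stress-to-height idea: the dictionary marks each vowel with a stress digit (1 for primary stress, 2 for secondary, 0 for none), so one option is to look up a bar height per digit. The pixel heights and the helper names below are made up for illustration and are not taken from the test application.</p>

```python
# Hypothetical bar heights in pixels for each CMUdict stress marker.
STRESS_HEIGHTS = {"1": 24, "2": 18, "0": 12}
CONSONANT_HEIGHT = 8  # consonants carry no stress digit


def split_stress(symbol):
    """Split a symbol such as 'IH1' into ('IH', '1'); consonants give (symbol, None)."""
    if symbol[-1].isdigit():
        return symbol[:-1], symbol[-1]
    return symbol, None


def bar_heights(pronunciation):
    """Return (phoneme, height) pairs for one space-separated pronunciation."""
    bars = []
    for symbol in pronunciation.split():
        phoneme, stress = split_stress(symbol)
        height = CONSONANT_HEIGHT if stress is None else STRESS_HEIGHTS[stress]
        bars.append((phoneme, height))
    return bars


# "sniffing": the stressed first vowel gets the tallest bar
print(bar_heights("S N IH1 F IH0 NG"))
```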
<p>In the following version, the same phrase is rendered with a palette restricted to vowels only, with colors assigned according to probability. Phonemes with a higher rate of occurrence received hues that were more distinct from the others.<br />
<a href="http://lukeloeffler.com/wordpress/wp-content/uploads/2009/08/Safari1.png"><img src="http://lukeloeffler.com/wordpress/wp-content/uploads/2009/08/Safari1.png" alt="palette restriction" title="palette restriction" width="413" height="40" class="alignnone size-full wp-image-353" /></a></p>
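<p>One possible way to make frequent vowels more distinct is to give each vowel an arc of the color wheel proportional to its count and place its hue at the center of that arc. This is only one reading of the approach described above, sketched under that assumption; the counts are the vowel rows of the frequency table, and the function name is hypothetical.</p>

```python
# Vowel counts copied from the frequency table above, most frequent first.
VOWEL_COUNTS = {
    "AH": 70564, "IH": 33779, "IY": 30957, "ER": 25871, "EH": 24564,
    "AA": 22175, "AE": 19151, "OW": 17147, "EY": 11851, "AO": 10059,
    "AY": 9838, "UW": 8579, "AW": 2932, "UH": 2021, "OY": 1124,
}


def weighted_hues(counts):
    """Give each phoneme an arc proportional to its count; return arc centers in degrees."""
    total = sum(counts.values())
    hues = {}
    start = 0.0
    for phoneme, count in counts.items():
        arc = 360.0 * count / total
        hues[phoneme] = start + arc / 2.0  # center of this phoneme's arc
        start += arc
    return hues


HUES = weighted_hues(VOWEL_COUNTS)
for phoneme, hue in HUES.items():
    print(phoneme, round(hue, 1))
```

<p>With this scheme AH is separated from its neighbors by tens of degrees, while rare vowels such as UH and OY end up only a degree or two apart.</p>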
<p>This change appears to have improved legibility somewhat, but more experimentation will be necessary. I suspect this linear format will turn out to be a failed experiment. It seems to do little for the recognition of rhyme, since the viewer must hold in working memory what amounts to merely a new representation of sound, rather than relying on pre-attentive factors to quickly match sounds together.</p>
]]></content:encoded>
			<wfw:commentRss>http://lukeloeffler.com/2009/the-color-of-rhyme/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
