<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Measuring vocabulary richness with python</title>
	<atom:link href="http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/feed" rel="self" type="application/rss+xml" />
	<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528</link>
	<description>Drinker of tea</description>
	<lastBuildDate>Wed, 22 May 2013 13:03:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
	<item>
		<title>By: list(g)</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3465</link>
		<dc:creator>list(g)</dc:creator>
		<pubDate>Mon, 10 Oct 2011 22:16:00 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3465</guid>
		<description>Thanks for this, but the M2 = sum(...) line is giving me a TypeError (&#039;list&#039; object is not callable) in Python 2.6.

list(g) seems to be causing the error, but I&#039;m not sure why. Each g is an itertools._grouper object, so I&#039;ll take a look at itertools, but if anyone has a suggestion please let me know.</description>
		<content:encoded><![CDATA[<p>Thanks for this, but the M2 = sum(&#8230;) line is giving me a TypeError (&#8216;list&#8217; object is not callable) in Python 2.6.</p>
<p>list(g) seems to be causing the error, but I&#8217;m not sure why. Each g is an itertools._grouper object, so I&#8217;ll take a look at itertools, but if anyone has a suggestion please let me know.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: A geek with a hat &#187; I want to analyze your blog</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3416</link>
		<dc:creator>A geek with a hat &#187; I want to analyze your blog</dc:creator>
		<pubDate>Sat, 01 Oct 2011 00:00:48 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3416</guid>
		<description>[...] Yule&#8217;s I &#8211; this is a measure of vocabulary richness, where I&#8217;m assuming a broader active vocabulary is a positive thing [...]</description>
		<content:encoded><![CDATA[<p>[...] Yule&#8217;s I &#8211; this is a measure of vocabulary richness, where I&#8217;m assuming a broader active vocabulary is a positive thing [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Links 30/9/2011: FOSS Catchup, Many More Linux Tablets &#124; Techrights</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3415</link>
		<dc:creator>Links 30/9/2011: FOSS Catchup, Many More Linux Tablets &#124; Techrights</dc:creator>
		<pubDate>Fri, 30 Sep 2011 08:33:39 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3415</guid>
		<description>[...] Measuring vocabulary richness with python [...]</description>
		<content:encoded><![CDATA[<p>[...] Measuring vocabulary richness with python [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Swizec</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3414</link>
		<dc:creator>Swizec</dc:creator>
		<pubDate>Thu, 29 Sep 2011 23:12:00 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3414</guid>
		<description>Oh right, I always forget about .get :)

But I&#039;ll take collections when I&#039;m on 2.7. It&#039;s just prettier.</description>
		<content:encoded><![CDATA[<p>Oh right, I always forget about .get <img src="http://swizec.com/blog/wp-includes/images/smilies/icon_smile.gif?1d5d3d" alt=':)' class='wp-smiley' /> </p>
<p>But I&#8217;ll take collections when I&#8217;m on 2.7. It&#8217;s just prettier.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3413</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Thu, 29 Sep 2011 16:04:00 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3413</guid>
		<description>Most of the code is pretty basic Haskell. The only advanced things I used are the Arrow class and the uncurry function. uncurry takes a two-argument function, something of the form (a -&gt; b -&gt; c), and converts it into a function that takes a pair of arguments, so something of the form ((a, b) -&gt; c). The Arrow class is a bit harder to explain. First, in my opinion, class was a poor choice of name in Haskell. Scala did a much better job at this and calls them traits, which helps avoid confusing all the OO programmers and is also, I think, slightly more descriptive. A trait is something partway between an interface and an abstract class. Unlike in OO, traits may be added to types at any time simply by providing an implementation of them. The Arrow class (trait) simply represents a transformation of some kind: you&#039;ve got an A type and you want to convert it to a B type, and that conversion can be represented as an Arrow. The (&gt;&gt;&gt;) operator composes two arrows in sequence, and the (&amp;&amp;&amp;) operator feeds the same input to two arrows and pairs up their results (think of it a bit like a beam splitter). In this code I&#039;m taking advantage of the fact that there&#039;s an instance of Arrow for (-&gt;), which is the Haskell function type. In other words, any function in Haskell can automatically be used as an Arrow.

If you have a function from a b to a c called foo:

foo :: b -&gt; c

It&#039;s also an Arrow of the type:

Arrow a =&gt; a b c

That bit at the beginning is just a restriction for the benefit of the compiler telling it that a is some instance of Arrow. Technically the type of foo could also be written as:

foo :: (-&gt;) b c

which is just the non-infix version of the previous signature. Notice that since (-&gt;) is an instance of Arrow, it also matches the Arrow constraint given previously.</description>
		<content:encoded><![CDATA[<p>Most of the code is pretty basic Haskell. The only advanced things I used are the Arrow class and the uncurry function. uncurry takes a two-argument function, something of the form (a -&gt; b -&gt; c), and converts it into a function that takes a pair of arguments, so something of the form ((a, b) -&gt; c). The Arrow class is a bit harder to explain. First, in my opinion, class was a poor choice of name in Haskell. Scala did a much better job at this and calls them traits, which helps avoid confusing all the OO programmers and is also, I think, slightly more descriptive. A trait is something partway between an interface and an abstract class. Unlike in OO, traits may be added to types at any time simply by providing an implementation of them. The Arrow class (trait) simply represents a transformation of some kind: you&#8217;ve got an A type and you want to convert it to a B type, and that conversion can be represented as an Arrow. The (&gt;&gt;&gt;) operator composes two arrows in sequence, and the (&amp;&amp;&amp;) operator feeds the same input to two arrows and pairs up their results (think of it a bit like a beam splitter). In this code I&#8217;m taking advantage of the fact that there&#8217;s an instance of Arrow for (-&gt;), which is the Haskell function type. In other words, any function in Haskell can automatically be used as an Arrow.</p>
<p>If you have a function from a b to a c called foo:</p>
<p>foo :: b -&gt; c</p>
<p>It&#8217;s also an Arrow of the type:</p>
<p>Arrow a =&gt; a b c</p>
<p>That bit at the beginning is just a restriction for the benefit of the compiler telling it that a is some instance of Arrow. Technically the type of foo could also be written as:</p>
<p>foo :: (-&gt;) b c</p>
<p>which is just the non-infix version of the previous signature. Notice that since (-&gt;) is an instance of Arrow, it also matches the Arrow constraint given previously.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Foo</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3412</link>
		<dc:creator>Foo</dc:creator>
		<pubDate>Thu, 29 Sep 2011 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3412</guid>
		<description>err so much for my line breaks :(</description>
		<content:encoded><![CDATA[<p>err so much for my line breaks <img src="http://swizec.com/blog/wp-includes/images/smilies/icon_sad.gif?1d5d3d" alt=':(' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Foo</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3411</link>
		<dc:creator>Foo</dc:creator>
		<pubDate>Thu, 29 Sep 2011 15:59:00 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3411</guid>
		<description>Instead of the try ... except surrounding the count increment, you can also do: d[w] = d.get(w, 0) + 1 (not as pretty as Alice Atlas&#039;s collections.Counter, of course :)</description>
		<content:encoded><![CDATA[<p>Instead of the try &#8230; except surrounding the count increment, you can also do: d[w] = d.get(w, 0) + 1 (not as pretty as Alice Atlas&#8217;s collections.Counter, of course <img src="http://swizec.com/blog/wp-includes/images/smilies/icon_smile.gif?1d5d3d" alt=':)' class='wp-smiley' /> </p>
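<p>A minimal sketch of the two counting styles side by side, assuming a plain list of already-stemmed words. The sample list and the function names count_with_try and count_with_get are made up for illustration and are not taken from the post:</p>

```python
def count_with_try(words):
    """Count occurrences using try/except around the increment."""
    d = {}
    for w in words:
        try:
            d[w] += 1
        except KeyError:
            # First time we see this word: start its count at 1.
            d[w] = 1
    return d


def count_with_get(words):
    """Count occurrences using dict.get with a default of 0."""
    d = {}
    for w in words:
        d[w] = d.get(w, 0) + 1
    return d


sample = ["tea", "blog", "tea", "python", "tea"]  # hypothetical input
counts = count_with_get(sample)
print(counts == count_with_try(sample))
```

<p>Both functions build the same dictionary; the .get version just skips the exception handling.</p>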
]]></content:encoded>
	</item>
	<item>
		<title>By: Swizec</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3410</link>
		<dc:creator>Swizec</dc:creator>
		<pubDate>Thu, 29 Sep 2011 10:30:00 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3410</guid>
		<description>That looks really cool. I should give this Haskell thing a try ... at least enough of a try so I could read that code :)</description>
		<content:encoded><![CDATA[<p>That looks really cool. I should give this Haskell thing a try &#8230; at least enough of a try so I could read that code <img src="http://swizec.com/blog/wp-includes/images/smilies/icon_smile.gif?1d5d3d" alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Swizec</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3409</link>
		<dc:creator>Swizec</dc:creator>
		<pubDate>Thu, 29 Sep 2011 10:27:00 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3409</guid>
		<description>Ooh nice, that is really cool. Did not know about collections :)

Thanks!</description>
		<content:encoded><![CDATA[<p>Ooh nice, that is really cool. Did not know about collections <img src="http://swizec.com/blog/wp-includes/images/smilies/icon_smile.gif?1d5d3d" alt=':)' class='wp-smiley' /> </p>
<p>Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3408</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Wed, 28 Sep 2011 23:49:00 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3408</guid>
		<description>I wrote something equivalent in Haskell, mostly as an exercise for my own benefit. I made rather heavy use of arrows and map, and I replaced d with two different lists, although one of them is derived from the other. It&#039;s probably not the best Haskell code around, and there are probably all kinds of better ways to do this, but it looks to me like it&#039;s working, more or less.
https://ideone.com/lAnzc

I copied in a chunk of your post as some test data and it produces a number that at first glance seems fairly reasonable.
</description>
		<content:encoded><![CDATA[<p>I wrote something equivalent in Haskell, mostly as an exercise for my own benefit. I made rather heavy use of arrows and map, and I replaced d with two different lists, although one of them is derived from the other. It&#8217;s probably not the best Haskell code around, and there are probably all kinds of better ways to do this, but it looks to me like it&#8217;s working, more or less.<br />
<a href="https://ideone.com/lAnzc" rel="nofollow">https://ideone.com/lAnzc</a></p>
<p>I copied in a chunk of your post as some test data and it produces a number that at first glance seems fairly reasonable.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alice Atlas</title>
		<link>http://swizec.com/blog/measuring-vocabulary-richness-with-python/swizec/2528/comment-page-1#comment-3407</link>
		<dc:creator>Alice Atlas</dc:creator>
		<pubDate>Wed, 28 Sep 2011 21:30:00 +0000</pubDate>
		<guid isPermaLink="false">http://swizec.com/blog/?p=2528#comment-3407</guid>
		<description>You can replace that dictionary and for loop with a collections.Counter object if you&#039;re on Python 2.7 or later.

&lt;code&gt;stemmer = PorterStemmer()
d = collections.Counter(stemmer.stem(w).lower() for w in words(entry))&lt;/code&gt;


(also adding &lt;code&gt;import collections&lt;/code&gt; to the top, of course.)</description>
		<content:encoded><![CDATA[<p>You can replace that dictionary and for loop with a collections.Counter object if you&#8217;re on Python 2.7 or later.</p>
<p><code>stemmer = PorterStemmer()<br />
d = collections.Counter(stemmer.stem(w).lower() for w in words(entry))</code></p>
<p>(also adding <code>import collections</code> to the top, of course.)</p>
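<p>A self-contained sketch of the same idea, assuming Python 2.7 or later. The stem function here is a hypothetical stand-in for PorterStemmer().stem, and the word list stands in for words(entry), so this runs without the rest of the post&#8217;s code:</p>

```python
import collections


def stem(word):
    # Hypothetical stand-in for PorterStemmer().stem:
    # it only strips a single trailing "s".
    return word[:-1] if word.endswith("s") else word


def count_stems(words):
    """Count stemmed, lowercased words with collections.Counter."""
    return collections.Counter(stem(w).lower() for w in words)


d = count_stems(["Cats", "cat", "dogs", "Dog", "cat"])  # hypothetical input
print(d.most_common(1))
```

<p>A Counter behaves like a dictionary but returns 0 for words it has never seen, so the try &#8230; except around the increment is no longer needed.</p>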
]]></content:encoded>
	</item>
</channel>
</rss>
