Skip to content
Swizec Teller - a geek with a

Science Wednesday: Towards a computational model of poetry generation

English: Three quarter length portrait of Osca...

Towards A Computational Model of Poetry Generation is a paper by Manurung, Ritchie and Thompson (whomever they are) published in May 2000 and so far seems to be the best starting point for my graduation thesis.

There are three main parts to this story:

  1. why!?
  2. what makes it hard
  3. how it used to be done
  4. how it should be done

Obviously my idea of how it should be done matches somewhat with the authors of this paper, otherwise I wouldn't hold it in such a high regard :P


Poetry is a unique artifact of human natural language pro- duction, with the distinctive feature of having a strong unity between its content and its form. The creation of poetry is a task that requires intelligence, expert mastery over world and linguistic knowledge, and creativity. Al- though some researchwork has been devoted towards cre- ative language such as story generation, poetry writing has not been afforded the same attention. It is the aim of this research to fill that gap, and to shed some light on what often seems to be the most enigmatic andmysterious forms of artistic expression.

In short, this is a really cool problem to solve because it's something that hasn't really been done before. And it just looks interesting when you give computers - strictly logical reasonable machines - even the most modest of abilities to create art.

When rationalization is required ... well somebody needs to create all those pop songs. Imagine if we could get a computer to write them, then sell them to all the pop music labels to feed to the Spears and Aguileras of this world. Marvelous!

What makes it hard

The problem with poetry is two-fold. On the one hand one finds themselves at a serious disadvantage in comparison to creating a traditional NLG(natural language generation) tool because poetry is less rigid. There is no message you are trying to convey and so the goal you are trying to achieve is a bit muddy.

English: Portrait of Allen Ginsberg and Bob Dy...

Furthermore, poetry possesses a certain unity between message and form. As Levin stated in 1962 "In poetry the form of the discourse and its meaning are fused into a higher unity".

This creates a problem because traditional NLG systems are decomposed into simpler units of content determination, text planning and surface realisation. In poetry form and message are too intertwined to allow this approach.

The only advantage we might have from creating poetry instead of informative text, is that readers expect to do a lot of interpretative work, but realistically this again only serves to make it more difficult to define a goal for our algorithm.

Essentially: it's hard to decide what we _want _and traditional well researched approaches break down. Great.

How it used to be done

Slovenska lipa srebrnik reverz

As with a lot of NLPresearch, there were several attempts at a strictly academic approach to poetry generation back in the 80's. More recently, there have been several-ish web based attempts, but using similar techniques.

The problem with old approaches is that they were basically party tricks - humans generated extremely detailed grammars and poetic structures, so the computer ended up just semi-randomly filling in the words. While this produced good looking poetry and in case of RACTER even lead to publication.

Leaving aside considerations of whether the poetry at this point is even computer generated, all of these approaches completely ignored semantic meaning and most poetics as well - rhythm, rhyme and figurative language.

How it should be done

Just as I'm planning to in my thesis, authors of this paper have decided to focus mostly on poetic form and ensure their results follow a strict verse, rhyming structure and other features usually attributed to classic poetry. Mostly because it is easy to pass off almost anything as modern experimental poetry and the more structure we can muster, the easier it becomes to actually verify results.

English: Jim Morrison in 1970.

The approach they suggest is a stochastic hillclimbing algorithm - particularly an evolutionary approach where the algorithm follows this kind of loop:

  1. Generate
  2. Evaluate

Generation is done by mutating the best candidates from the previous cycle through three simple constructs: Adding, Changing and Deleting.

So if you had a verse "John walked" it could become "John walked to the store", or "John lumbered". Or for instance "John likes Jill and Mary" it becomes "John likes Jill".

Because of an integrated architecture (combining all of semantics, form, etc.) these changes can and should happen at any level so special care must be taken that changing the semantics of a verse, doesn't negatively affect its rhythm. Although from the paper I don't grok why exactly this is a problem, since the evaluation step should take care of negative mutations.

After we have a new population, these are then evaluated for correctness and the candidates with highest scores from the fitness functions get to go into the next round of mutations.

The authors go into little detail about assessing the different poetic structures, mostly because they have not yet cracked how to implement all of them. Detecting rhythm seems to be where they've made the most progress and a solution is suggested where we give the algorithm a target phonetic form, like this limerick:


Where w means a weak stress syllable and s means a strong one. In parentheses the rhyming structure is described.

It is suggested that with the exception figurative language all the criteria can be dumbed down to numerical arguments that can be easily quantified through different means that are only loosely described.

They also go into a bit of formal detail explaining the sort of grammars that are used, but you should go read the actual paper if you're interested in such heavily theoretical things :)

Here is an example of what their algorithm can produce:

Target semantics

{john(1),mary(2), dog(3), bottle(4), love(5,6,7), slow(8), smile(9,10)}

Target Form



the bottle was loved by Luke
a bottle was loved by a dog



Enhanced by Zemanta

Did you enjoy this article?

Published on March 7th, 2012 in art, Ljubljana, Poetry, Uncategorized

Learned something new?
Want to become a high value JavaScript expert?

Here's how it works 👇

Leave your email and I'll send you an Interactive Modern JavaScript Cheatsheet 📖right away. After that you'll get thoughtfully written emails every week about React, JavaScript, and your career. Lessons learned over my 20 years in the industry working with companies ranging from tiny startups to Fortune5 behemoths.

Start with an interactive cheatsheet 📖

Then get thoughtful letters 💌 on mindsets, tactics, and technical skills for your career.

"Man, love your simple writing! Yours is the only email I open from marketers and only blog that I give a fuck to read & scroll till the end. And wow always take away lessons with me. Inspiring! And very relatable. 👌"

~ Ashish Kumar

Join over 10,000 engineers just like you already improving their JS careers with my letters, workshops, courses, and talks. ✌️

Have a burning question that you think I can answer? I don't have all of the answers, but I have some! Hit me up on twitter or book a 30min ama for in-depth help.

Ready to Stop copy pasting D3 examples and create data visualizations of your own?  Learn how to build scalable dataviz components your whole team can understand with React for Data Visualization

Curious about Serverless and the modern backend? Check out Serverless Handbook, modern backend for the frontend engineer.

Ready to learn how it all fits together and build a modern webapp from scratch? Learn how to launch a webapp and make your first 💰 on the side with ServerlessReact.Dev

Want to brush up on your modern JavaScript syntax? Check out my interactive cheatsheet:

By the way, just in case no one has told you it yet today: I love and appreciate you for who you are ❤️