Another step in my Automagic Poetry Generation project.
when i think we have tickets we't my abortion i drank it't my best friend melissa mahoney oh we tried to the building
~The Markov Chain, based on Amanda Palmer's Oasis
I'm making an evolutionary algorithm to generate poetry, but it needs a good base to start from. Random data is preferred, but starting from complete randomness means the first couple of epochs are spent just getting to something solid to work from - a waste of time.
To make everything a bit smoother, I've written a Markov chain generator in Haskell that generates the initial population. The principle is very simple:
- Read seed poem from file
- Tokenize
- Train Markov chain
- A generator spits out an infinitely long list based on the Markov chain
- Take X tokens
- Detokenize
- Done
Simple.
A quick brush up for people who don't breathe "hardcore" Comp Sci stuff: a Markov chain is essentially a probabilistic state automaton, usually represented with a probability matrix. You take the current state and move to the next state according to the transition probabilities.
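For a toy example, feed it the corpus "the cat sat the cat ran". The chain just remembers what follows what:
- the -> cat, cat
- cat -> sat, ran
- sat -> the
From "the" you always go to "cat"; from "cat" it's a coin flip between "sat" and "ran". Recording "cat" twice is exactly what makes it twice as likely to be picked.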
The output is a random string that looks a lot like some drunk tweets you might see on a Friday night.
Le Code
Our main function is a simple composition of a bunch of things:
start_population::(RandomGen g) => g -> String -> IO String
start_population gen start = do
  return . (foldr detokenize "") . (take Config.seed_length) . (produce gen start) . chain . tokenize =<< readFile Config.seed_data
Reading it from right to left you can see that it first reads some data, tokenizes it, makes the chain, produces some output, cuts it to the proper length and then shoves it back into a normal string.
Simple.
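Calling it from the outside looks something like this. To be clear, this is hypothetical driver code I'm sketching for illustration - main, getStdGen, and the "the" start token aren't part of the module:

import System.Random (getStdGen)
import Initiators.MarkovChain (start_population)

-- hypothetical driver; "the" is an arbitrary start token
main :: IO ()
main = do
  gen <- getStdGen
  poem <- start_population gen "the"
  putStrLn poem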
Because I wanted to handle punctuation and new lines, which are important in poems, I had to write my own tokenization and detokenization functions. Otherwise the built-in words function would be sufficient.
tokenize::String -> [String]
tokenize s = Prelude.filter (\x -> x /= " " && x /= "") $
             Split.split (whenElt
                          (\x -> isSeparator x || isPunctuation x || x == '\n')) $
             Prelude.map toLower s

detokenize::String -> String -> String
detokenize a b
  | punctuation a || punctuation b = a++b
  | otherwise = a++" "++b
  where punctuation = (\x -> length x > 0 && isPunctuation (x!!0))
You can see that tokenize splits on pretty much everything and detokenize takes special care not to put spaces around punctuation.
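A quick GHCi sanity check of what I'd expect these two to do (note that detokenize avoids spaces on both sides of punctuation, so the comma loses its trailing space too):

ghci> tokenize "Hello, world!"
["hello",",","world","!"]
ghci> foldr detokenize "" (tokenize "Hello, world!")
"hello,world!"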
Another important step is building the chain itself.
chain::[String] -> Map String [String]
chain [now, last] =
  insert now [last] $ singleton last []
chain (token:xs) =
  insertWith (\new old -> new++old) token [xs!!0] $ chain xs
Simply put - this function builds a HashMap from a token to many tokens. The idea here is to make a note of every token that comes after some other token. To make things simpler, if a pair of tokens happens twice, it will be recorded twice.
This magically gives us the ability to properly weigh the random function that chooses what to generate next.
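For the toy corpus from before, I'd expect the map to come out something like this (assuming Data.HashMap's toList, and ignoring whatever order it returns keys in):

ghci> toList $ chain ["the","cat","sat","the","cat","ran"]
[("the",["cat","cat"]),("cat",["sat","ran"]),("sat",["the"]),("ran",[])]

Note that the very last token, "ran", ends up with no successors at all.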
next_token::(RandomGen g) => g -> Map String [String] -> String -> (g, String)
next_token gen map s =
  let choices = findWithDefault [] s map
      (i, gen') = randomR (0, length choices - 1) gen
  in (gen', choices!!i)

produce::(RandomGen g) => g -> String -> Map String [String] -> [String]
produce gen s map =
  let (gen', next) = next_token gen map s
  in s:(produce gen' next map)
I have a nasty suspicion the next_token and produce functions could be merged, but I found this easier to reason about.
next_token is the meat of our algorithm - it does nothing but take a token, find the list of its possible successors in the HashMap, and return a random member of that list. To avoid lookup failures it falls back to an empty list when a token has no known successors.
The produce function takes care of driving next_token and makes sure it gets a fresh random generator every time.
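For what it's worth, here's roughly how the two could be merged with Data.List.unfoldr. Just a sketch of the idea, not what the generator uses, and it leans on the same assumption that every token has at least one successor:

-- needs: import Data.List (unfoldr)
produce'::(RandomGen g) => g -> String -> Map String [String] -> [String]
produce' gen s m = s : unfoldr step (gen, s)
  where
    step (g, token) =
      let choices = findWithDefault [] token m
          (i, g') = randomR (0, length choices - 1) g
          next = choices!!i
      in Just (next, (g', next))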
A problem with my technique is that once you give a random generator to the Markov chain, you're not getting it back. While it does ensure the result will always be fresh, you might be using a stale generator in other parts of your code if you're not careful.
Maybe I should finally look into that random monad I've been hearing about.
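Until then, one way to sidestep the stale-generator risk is System.Random's split, which forks one generator into two independent ones. A minimal sketch, with safeStart being a hypothetical wrapper I just made up:

-- hypothetical wrapper: give one half of the generator to the chain,
-- return the other half so the caller never reuses a stale one
safeStart :: StdGen -> String -> IO (String, StdGen)
safeStart gen start =
  let (genForChain, genToKeep) = split gen
  in do
    poem <- start_population genForChain start
    return (poem, genToKeep)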
Either way, here's the full code. It's 38 sloc because I like including the function headers - they make code easier to think about, even though I've tried it and it does work without any type hints. Haskell is smarter than I am. Plus I added the module header that only exposes start_population to the outside world, which isn't otherwise necessary.
module Initiators.MarkovChain (
  start_population
) where

import System.Random
import Data.HashMap
import Data.List.Split as Split
import Data.Char

import Config

-- read corpus data
-- build markov chain
-- spit out data

start_population::(RandomGen g) => g -> String -> IO String
start_population gen start = do
  return . (foldr detokenize "") . (take Config.seed_length) . (produce gen start) . chain . tokenize =<< readFile Config.seed_data

tokenize::String -> [String]
tokenize s = Prelude.filter (\x -> x /= " " && x /= "") $
             Split.split (whenElt
                          (\x -> isSeparator x || isPunctuation x || x == '\n')) $
             Prelude.map toLower s

detokenize::String -> String -> String
detokenize a b
  | punctuation a || punctuation b = a++b
  | otherwise = a++" "++b
  where punctuation = (\x -> length x > 0 && isPunctuation (x!!0))

chain::[String] -> Map String [String]
chain [now, last] =
  insert now [last] $ singleton last []
chain (token:xs) =
  insertWith (\new old -> new++old) token [xs!!0] $ chain xs

next_token::(RandomGen g) => g -> Map String [String] -> String -> (g, String)
next_token gen map s =
  let choices = findWithDefault [] s map
      (i, gen') = randomR (0, length choices - 1) gen
  in (gen', choices!!i)

produce::(RandomGen g) => g -> String -> Map String [String] -> [String]
produce gen s map =
  let (gen', next) = next_token gen map s
  in s:(produce gen' next map)