Image via Wikipedia
On Monday I got a little bored at class. This is something that is likely to happen when you're stuck at classes for nearly the whole day.
During a break a classmate of mine showed me something cool. It was a Java applet that used genetic algorithms (we think it was just basic evolution) to create a model of a car that can travel over rugged terrain ... Actually thanks to Zemanta I found the link -> Go here for a very cool genetic algorithm demo
This gave me a cool idea to retry making an algorithm I was making a year ago in Lisp.
Because wi-fi wasn't working I wrote it in python this time, simply because I don't need to reference documentation to write python.
So I made something cool. (I think) In an hour of solid hacking I created a simple evolutionary algorithm that given enough time can create pretty much any piece of text by performing random mutations and breeding strings together. My classmates can verify that it only took an hour.
Sure, nothing very impressive on a grand scale of things. I mean, what's cool about taking a bunch of random characters and applying random transformations to them to create a known piece of text right?
Well I don't know, but my nerd side finds it incredibly satisfying.
Anyway I tried it with Oscar Wilde's Amor Intellectualis.
You can see the whole code on github.
To make this post about a little bit more than just geek bravado, I'm going to try explaining why I believe the algorithm works at all and what makes it pretty successful. Then I'm going to explain a better approach that maybe I will implement some day.
Ok so the basics of evolution algorithms are pretty simple. You take a population of random stuff. With every generation you mutate the members, perform breeding amongst the best members and remove the worst members from the population.
Yep, pretty much like natural selection+evolution in nature.
Obviously this means that choosing a proper breeding and fitness functions is crucial here. The mutation function isn't very interesting ... and in fact as I discovered it's almost better if you don't have it at all. The real reason it's usually there is because it helps the algorithm get out of local minimums.
def mutate(a):return "".join([random.choice(ALPHABET) if abs(random.random()-0.5) < MUTATE_CHANCE else k for k in a])
Basically mutate just goes through the string and randomly changes characters, if some event with a low chance of happening occurs.
def distance(a, b):return sum([1 if a[i]!=b[i] else 0 for i in range(len(a))])
Distance between strings is my fitness function. It is roughly based on my understanding of Leiningen distance. It simply counts the number of characters that differ between two strings. Very basic stuff.
def compete(population):population.sort(key=lambda a: distance(a, TARGET))return population
My compete function is pretty basic as well. Because of python's awesomeness and because my fitness function returns a simple integer, I can simply sort the population by distance. Lovely.
def breed(a, b):if random.random()-0.5 < BREED_CHANCE:s = random.randint(0, len(a)-10)e = s+random.randint(1, BREED_MAX_CHUNK)c = a[0:s]+b[s:e]+a[e:]return celse:return mutate(b)
The breed function is where it gets interesting and I firmly believe the way I implemented this, is the reason the algorithm performs as well as it does ... it's also probably the reason why it doesn't perform better :)
As you can see what I'm doing here is I'm taking a string and replacing a random chunk of it with the same random chunk from a different string. The key thing here is that I'm not shuffling things around. What this does is that it preserves good candidates in a roughly good state.
For example, if you're at only a single character difference from the target text, this type of breeding assures that nothing but that single character will be changing.
Also of interest might be how I choose whom to breed. Because the breeding function basically clones a part of a worse subject into a part of a better subject (due to the way I'm calling it) what I do is I choose the best 20 members and breed the best 60 members with them. Roughly this ratio seems to produce the best results.
Oh and I had problems with inbreeding. It's funny, but if the population was too small and not enough members are breeding what happens is that every member in the population becomes identical, while none of them match the target. Guess all there's something to all those hillbilly jokes :D
I haven't really put this algorithm through its paces, but I was pleasantly surprised when it only took 10 generations to evolve a simple sentence and only about 150 generations to create a stanza of Hamlet.
The Oscar Wilde poem mentioned earlier is taking a bit longer. After 1000 generations it seems to be mostly prancing around the 5-8 differences range and showing a general downward trend. The lowest peak I saw was at 4. It will probably finish eventually, but perhaps some tweaking of the settings is needed.
At the time of writing the algorithm has managed to turn this:
llajCbeq'abvnM.pvvTSEjl -Sc ue-v,ub,lWoqf'nmhpojfWNNufyv-EywDCO,m:ayu.fqOMTbew.hCyiOkp:vlgSffDFun.nyiM'np bwlkbkj'EyST.kN.li -ogEeiqctEi.aSCwcjNSwmpnEamhoeODSc.,wltnOvMN:oprSbi,CobNh-iyw..hbpF-NNk'fusCtgtT.upmetwAjenlWOsAEgAncoilDSvAO.DmMoiyFttAawTiCjaNuasoCtynkhO-CoDdWWW,gEiOwuk.yFS vNmdqtvs'fgiAS,cWtwOotm:r,dt,FNqen:f-y,Aa, pMNrDMEOFFgqjMMwmtldkD Sb.eq'D:ncc-vdvkpTme,:,'jDabdnNS'p'bFTgWhudWprFpFywAh.bdCWOwcNk-fbou,,gWosvfTyFhrkACd CrhfbrDNoMw'cEfiEcqpqSpTdpyFT,u',TEl hmhskd,'jrAskgOaetibdEEfqaAyd,vhh.sdp'Wicr: uMfa'Wl,f,WEdumCfgEDgqfhhFfDu:mnbvM'iAWjSw :hrb-FltrDkt'atC
Oft have be trod the vales of CastalyAnd heard sweet notes of sylvan music blownFrom antique reeds to common folk unknown:And often launchd ou bark upon that seaWhich the nEne Muses hold in empery,And ploughed free furrows through the wave and foam,Nor spread reluctant sail for more safe homeTill we had freighted well our argosy.Of which despoiled treasures these remain,Sor:ello's passion, and the honied lineOf young Endymion, lordly TamburlaineDriving his pampered jades, and more than these,The seven-fold vision of the Florentine,And grave-browed Milton's solemn harmonies.
Since this simple hack got me interested in this field, I think I've come up with an even better way of creating texts. One that could perhaps even be capable of creating works of art not known in advance and thus become actually a useful implementation of an evolution algorithm.
The general idea is such: instead of mixing around random characters and comparing them to a text. What if we would first use this same algorithm to generate words roughly fitting the English syllabic structure.
Then we could start putting those words together using a similar principle as I've used here. Just make sure when breeding you are working on the level of words. Then your fitness function basically checks for proper poetic structure.
Not sure that would be at all useful, but it might just be something pretty cool and nerdy to make some day.
Comments regarding the suggested algorithm especially wanted!
I write articles with real insight into the career and skills of a modern software engineer. "Raw and honest from the heart!" as one reader described them. Fueled by lessons learned over 20 years of building production code for side-projects, small businesses, and hyper growth startups. Both successful and not.
Subscribe below 👇
Join Swizec's Newsletter and get insightful emails 💌 on mindsets, tactics, and technical skills for your career. Real lessons from building production software. No bullshit.
"Man, love your simple writing! Yours is the only newsletter I open and only blog that I give a fuck to read & scroll till the end. And wow always take away lessons with me. Inspiring! And very relatable. 👌"
Senior Mindset Book
Get promoted, earn a bigger salary, work for top companiesLearn more
20% off this week
Have a burning question that you think I can answer? Hit me up on twitter and I'll do my best.
Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.
Want to become a true senior engineer? Take ownership, have autonomy, and be a force multiplier on your team. The Senior Engineer Mindset ebook can help 👉 swizec.com/senior-mindset. These are the shifts in mindset that unlocked my career.
Curious about Serverless and the modern backend? Check out Serverless Handbook, for frontend engineers 👉 ServerlessHandbook.dev
Want to Stop copy pasting D3 examples and create data visualizations of your own? Learn how to build scalable dataviz React components your whole team can understand with React for Data Visualization
Did someone amazing share this letter with you? Wonderful! You can sign up for my weekly letters for software engineers on their path to greatness, here: swizec.com/blog
By the way, just in case no one has told you it yet today: I love and appreciate you for who you are ❤️