[18:21:38] <HairyFotr> _chatty-botko_ we desire to hear moar of you're infinite wisdom
[18:21:38] <_chatty-botko_> know it broken ? _chatty-botko_ doesn't though gets
IRC might seem quaint and outdated, but for keeping in touch with a group of friends it beats Twitter, Facebook and Google+ hands down.
There's a bunch of others on those things. With IRC there's just us ... and a robot overlord.
Mostly __botko__ makes sure to mention when we repost links and keeps IRC logs for us. But as soon as @smotko added random phrases to make keeping us in place more interesting, we wanted more.
We wanted our bot to speak.
It's been ages since I coded just for fun, so last night I made __botko__ talk.
What to say?
[18:03:13] <_chatty-botko_> _botko_ probably feels left out
[18:12:45] <_chatty-botko_> auto complete :D whoa! mind = blown
So how do you make a robot speak?
Natural language generation is a huge field. Hell, I'm doing my thesis in that general vicinity. For this project I just wanted to have a little fun, not invent a monster, so I went with Markov chain text generation!
The way a Markov chain text generator works is basically:
- split a text into words
- create n-grams (n-length groups of words, I used n=1)
- create unique hashes of all n-grams
- map each n-gram to a list of next n-grams
You end up with this sort of data structure.
{-8818677644356330256: {'next': {-6361492750444014453: 1},
                        'text': ['dobu'],
                        'weight': 1.9},
 -8629397782117610386: {'next': {}, 'text': ['podatkov'], 'weight': 0.9},
 7424044602067048: {'next': {14336086128129331: 0.9, 1480645349370722979: 1},
                    'text': [':D'],
                    'weight': 2.9},
 # and so on for quite a while
}
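Here's a minimal sketch of how a structure like this could be built for n=1. The function name `build_chain` and the use of Python's built-in `hash()` as the n-gram key are assumptions made for illustration, and the real code on github may well do it differently.

```python
def build_chain(text, chain=None):
    """Add the words of `text` to a dict of n-gram nodes (n=1)."""
    chain = {} if chain is None else chain
    words = text.split()
    keys = [hash(w) for w in words]  # assumption: built-in hash() as the key

    for i, (word, key) in enumerate(zip(words, keys)):
        node = chain.setdefault(key, {'next': {}, 'text': [word], 'weight': 0})
        node['weight'] += 1  # how often this n-gram has been seen
        if i + 1 < len(words):
            nxt = keys[i + 1]
            # count the transition from this word to the word that follows it
            node['next'][nxt] = node['next'].get(nxt, 0) + 1
    return chain


chain = build_chain("mind = blown mind = gone")
print(chain[hash("mind")]['weight'])  # -> 2, "mind" shows up twice
```

Passing an existing `chain` back in lets you keep feeding it new lines as they arrive.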
While this works great for static text, an IRC chatroom is a rolling time series that never really ends. Not only will you soon run out of memory, you also have to figure out how to stay relevant.
I had to wait until 4:03AM for the solution to hit me!
It doesn't have to be complicated at all - just decay the weights of all next items whenever you poke an element. And do the same whenever a new text is added to the Corpus.
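A minimal sketch of that decay idea, assuming nodes shaped like the ones in the data structure above. The factor `0.9` and the helper names `decay_next` and `add_transition` are made up for this example.

```python
DECAY = 0.9  # assumption: old weights are multiplied by this on every touch


def decay_next(node, factor=DECAY):
    """Fade every outgoing transition of a node a little."""
    for key in node['next']:
        node['next'][key] *= factor


def add_transition(chain, key, next_key):
    """Decay the old transitions, then record the fresh one at full weight."""
    node = chain[key]
    decay_next(node)
    node['next'][next_key] = node['next'].get(next_key, 0) + 1


chain = {1: {'next': {2: 1}, 'text': ['mind'], 'weight': 1}}
add_transition(chain, 1, 3)
print(chain[1]['next'])  # the old transition to 2 now weighs less than the new one to 3
```

Transitions that keep getting reinforced stay heavy, while ones nobody has used in a while slowly fade towards zero.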
Especially important is to choose the starting point well. The more relevant it is to current discussion, the more chance you have of saying something relevant. So instead of choosing at random, we make a weighted choice here as well.
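Picking the starting point by weight could look roughly like this. It's a sketch, not the code from the repo: it leans on the standard library's `random.choices` instead of a hand-rolled weighted choice, and `pick_start` is a hypothetical name.

```python
import random


def pick_start(chain):
    """Pick a starting n-gram key, favouring heavier n-grams."""
    keys = list(chain)
    weights = [chain[k]['weight'] for k in keys]
    return random.choices(keys, weights=weights, k=1)[0]


chain = {
    1: {'next': {}, 'text': ['dobu'], 'weight': 1.9},
    2: {'next': {}, 'text': ['podatkov'], 'weight': 0.9},
    3: {'next': {}, 'text': [':D'], 'weight': 2.9},
}
start = pick_start(chain)
print(chain[start]['text'])  # ':D' comes up most often, 'podatkov' least
```

Since the weights decay over time, whatever was said recently is the most likely place to start.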
When everything is packaged together, usage becomes pretty simple:
corpus.add(text) # repeat this a couple of times
# to generate, you just
corpus.rewind()
text = " ".join(take(corpus, 5)) # creates a 5 word text
All the code is on GitHub.
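The `take` helper in the usage snippet isn't a Python builtin. Assuming it behaves like the `take` recipe from the itertools documentation, it could be sketched like this, with a plain iterator of strings standing in for the corpus:

```python
from itertools import islice


def take(iterable, n):
    """Return up to the first n items of an iterable as a list."""
    return list(islice(iterable, n))


words = iter(["mind", "=", "blown", "whoa!", ":D", "auto", "complete"])
print(" ".join(take(words, 5)))  # mind = blown whoa! :D
```

Because `islice` stops quietly when the iterator runs dry, a chain that hits a dead end early just gives you a shorter sentence.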
Making the data structure iterable like that was especially interesting. Now I can change how the generator works without affecting external code.
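import random


class Corpus(object):
    # excerpt; class name assumed from the usage above
    # add(), rewind(), __getitem__() and __hash() are not shown

    def __iter__(self):
        return self

    def __setitem__(self, ngram, value):
        key = self.__hash(ngram)
        self.data[key] = value

    # adapted from
    # http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/
    def __weighted_choice(self, items):
        # items are (weight, key) pairs; copy them so a one-shot zip() works too
        items = list(items)
        rnd = random.random() * sum(w for w, k in items)
        for weight, key in items:
            rnd -= weight
            if rnd < 0:
                return key
        return items[-1][1]  # rounding left rnd at 0, fall back to the last key

    def __next(self, ngram):
        item = self.__getitem__(ngram)
        if len(item['next']) == 0:
            raise StopIteration  # dead end, nothing ever followed this n-gram
        key = self.__weighted_choice(zip(item['next'].values(),
                                         item['next'].keys()))
        return self.data[key]['text']

    def next(self):
        if not hasattr(self, 'current_ngram'):
            self.rewind()
        self.current_ngram = self.__next(self.current_ngram)
        return self.current_ngram

    __next__ = next  # Python 3 spells the iterator method __next__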
Pretty cool, right?
When to speak?
Okay, that takes care of generating the text. But when should __botko__ speak anyway?
You certainly don't want the bot to spam everyone. But you don't want him to be completely quiet most of the time. Speaking every X amount of events is simply too predictable and boring ...
Ideally you'd want him to speak more when the chatroom is busier and less when it's a bit quiet. When it's been quiet for a long time and it suddenly becomes very busy, that's also a good time to speak.
Sort of like a greeting.
I ended up using a combination of chat velocity and the rate of chat acceleration change to determine the probability of speaking, which is then still left up to randomness.
Velocity is measured as the ratio between the timespan of the last 10 messages and the timespan of the last 60. This basically measures how densely the messages are coming into the chatroom right now.
velocity_rate = (now - time_10_messages_ago) / (now - time_60_messages_ago)
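Here's a small sketch of how that ratio could be tracked, assuming you keep the arrival times of the last 60 messages around. The class name `VelocityMeter` and returning `None` until enough messages have arrived are choices made for this example.

```python
import time
from collections import deque

SHORT, LONG = 10, 60  # window sizes from the text


class VelocityMeter:
    """Keeps the arrival times of the last LONG messages."""

    def __init__(self):
        self.times = deque(maxlen=LONG)

    def message(self, when=None):
        self.times.append(time.time() if when is None else when)

    def velocity_rate(self, now=None):
        if len(self.times) < LONG:
            return None  # assumption: stay undecided until 60 messages are in
        now = time.time() if now is None else now
        long_span = now - self.times[0]        # time since 60 messages ago
        short_span = now - self.times[-SHORT]  # time since 10 messages ago
        if long_span <= 0:
            return None
        return short_span / long_span


meter = VelocityMeter()
for t in range(60):  # one message per second, perfectly steady
    meter.message(when=t)
print(meter.velocity_rate(now=60))  # 10/60, about 0.17
```

With this reading of the formula, a steady trickle sits around 10/60, and a smaller value means the last 10 messages arrived in a burst.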
The rate of acceleration change idea took some time to materialize in my head. The idea is that you want to make it likelier for the bot to speak when there is a sudden flurry of activity and fall back to the velocity_rate formula when conditions are mostly stable.
v_i = average_speed_of_last_[0+i : i*5]_messages
a_i = delta_of_two_speeds
accel_rate = a_1/a_2
It looks simple in pseudocode, but I promise it took a fair amount of head banging to come up with that!
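The pseudocode leaves a lot of room for interpretation, so treat the following as one possible reading rather than what the bot actually does. It assumes consecutive windows of 5 messages, speed measured in messages per second, and `None` as the signal to fall back to `velocity_rate`. The names `window_speeds` and `accel_rate` (as a function) are hypothetical.

```python
def window_speeds(times, window=5, count=3):
    """Speeds (messages per second) of the `count` newest windows.

    `times` are message timestamps, oldest first. The newest window
    comes first in the result: [v_1, v_2, v_3].
    """
    needed = window * count + 1  # window gaps need window + 1 timestamps
    if len(times) < needed:
        return None
    recent = times[-needed:]
    result = []
    for i in range(count):
        hi = len(recent) - 1 - i * window  # newest timestamp in this window
        lo = hi - window                   # `window` messages earlier
        span = max(recent[hi] - recent[lo], 1e-6)  # avoid dividing by zero
        result.append(window / span)
    return result


def accel_rate(times, window=5):
    """Ratio of the two most recent changes in speed, a_1 / a_2."""
    v = window_speeds(times, window, count=3)
    if v is None:
        return None
    a1 = v[0] - v[1]
    a2 = v[1] - v[2]
    if a2 == 0:
        return None  # assumption: speed was steady, let velocity_rate decide
    return a1 / a2


# gaps of 10s, then 5s, then 1s: the room is speeding up
times = [0, 10, 20, 30, 40, 50, 55, 60, 65, 70, 75, 76, 77, 78, 79, 80]
print(window_speeds(times))  # newest window first
print(accel_rate(times))     # well above 1: the speed-up itself is speeding up
```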
This part of the code is also on GitHub.
Lovely!
That's pretty much it. Our IRC room now has a bot that entertains everyone with its nonsense and I couldn't make myself go to bed until almost five in the morning.
All that's left to do now is tweak the parameters a bit and perhaps iron out a bug or two.
I'm almost tempted to connect this guy to #startups ...
[18:27:27] wisdom ACTION uses this quote in blogpost because it's starting
[18:27:43] I almost have to use that ...
[18:27:44] cool, that you've started talking a lot again, before
Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.