Swizec Teller - a geek with a hat (swizec.com)

    Making our irc bot talk

    [18:21:38] <HairyFotr> _chatty-botko_ we desire to hear moar of you're infinite wisdom
    [18:21:38] <_chatty-botko_> know it broken ? _chatty-botko_ doesn't though gets

    IRC might seem quaint and outdated, but for keeping in touch with a group of friends it beats Twitter, Facebook and Google+ hands down.


    There's a bunch of others on those things. With IRC there's just us ... and a robot overlord.

    Mostly __botko__ makes sure to mention when we repost links and keeps IRC logs for us. But as soon as @smotko added random phrases to make keeping us in place more interesting, we wanted more.

    We wanted our bot to speak.

    It's been ages since I coded just for fun, so last night I made __botko__ talk.

    What to say?

    [18:03:13] <_chatty-botko_> _botko_ probably feels left out
    [18:12:45] <_chatty-botko_> auto complete :D whoa! mind = blown

    So how do you make a robot speak?

    Natural language generation is a huge field; hell, I'm doing my thesis in that general vicinity. For this project I just wanted to have a little fun, not invent a monster, so: Markov chain text generation!

    The way a Markov chain text generator works is basically:

    1. split a text into words
    2. create n-grams (n-length groups of words, I used n=1)
    3. create unique hashes of all n-grams
    4. map each n-gram to a list of next n-grams

    You end up with this sort of data structure.

    {-8818677644356330256: {'next': {-6361492750444014453: 1},
                            'text': ['dobu'],
                            'weight': 1.9},
     -8629397782117610386: {'next': {}, 'text': ['podatkov'], 'weight': 0.9},
     7424044602067048: {'next': {14336086128129331: 0.9, 1480645349370722979: 1},
                        'text': [':D'],
                        'weight': 2.9},
     # and so on for quite a while
    }
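
The four steps above can be sketched roughly like this. It is a minimal illustration, not the code from the repository: `build_chain` is a hypothetical name, n-grams are assumed to slide one word at a time, and Python's built-in `hash` stands in for whatever hashing the bot really uses.

```python
def build_chain(text, n=1):
    """Map each n-gram hash to its text, weight and weighted successors."""
    words = text.split()                                   # 1. split into words
    ngrams = [tuple(words[i:i + n])                        # 2. create n-grams
              for i in range(len(words) - n + 1)]
    chain = {}
    for ngram in ngrams:
        key = hash(ngram)                                  # 3. hash each n-gram
        entry = chain.setdefault(key, {"text": list(ngram), "weight": 0, "next": {}})
        entry["weight"] += 1
    for current, following in zip(ngrams, ngrams[1:]):     # 4. map to next n-grams
        successors = chain[hash(current)]["next"]
        successors[hash(following)] = successors.get(hash(following), 0) + 1
    return chain


chain = build_chain("the cat sat on the mat")
the = chain[hash(("the",))]
followers = sorted(chain[key]["text"][0] for key in the["next"])
print(the["weight"], followers)  # 2 ['cat', 'mat']
```

Here "the" was seen twice and can be followed by either "cat" or "mat", which is exactly the branching the generator later walks through.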

    While this works great for static text, an IRC chatroom is a rolling time series that never really ends. Not only will you soon run out of memory, but how do you stay relevant?

    I had to wait until 4:03AM for the solution to hit me!

    It doesn't have to be complicated at all: just decay the weights of all next items whenever you poke an element, and do the same whenever new text is added to the corpus.
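
A rough sketch of that decay idea, under a few assumptions: the 0.9 factor is only a guess suggested by weights like 0.9, 1.9 and 2.9 in the sample structure, the `floor` cutoff for forgetting faded links is my own addition to keep memory bounded, and `decay_next` and `touch` are hypothetical names.

```python
DECAY = 0.9  # assumed factor; tune to taste


def decay_next(entry, factor=DECAY, floor=0.05):
    """Shrink every outgoing weight and forget links that have faded away."""
    decayed = {key: weight * factor for key, weight in entry["next"].items()}
    entry["next"] = {key: w for key, w in decayed.items() if w >= floor}


def touch(chain, key, successor):
    """Record that `successor` followed `key`, decaying the older links first."""
    entry = chain[key]
    decay_next(entry)
    entry["next"][successor] = entry["next"].get(successor, 0) + 1


chain = {"a": {"text": ["a"], "weight": 1, "next": {"b": 1, "c": 0.05}}}
touch(chain, "a", "b")
print(chain["a"]["next"])  # "b" is reinforced, the faded "c" link is dropped
```

Links that keep getting used stay heavy, while anything the room stopped saying slowly drops out of the structure.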

    Choosing the starting point well is especially important. The more relevant it is to the current discussion, the better your chance of saying something relevant. So instead of choosing at random, we make a weighted choice here as well.
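
As an illustration of a weighted starting point, here is a small sketch that leans on the standard library's `random.choices` instead of a hand-rolled weighted choice. `pick_start` is a hypothetical name, and the entries are assumed to carry a `weight` field like the ones in the data structure shown earlier.

```python
import random


def pick_start(chain, rng=random):
    """Choose a starting key with probability proportional to its weight."""
    keys = list(chain)
    weights = [chain[key]["weight"] for key in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


chain = {
    "fresh": {"text": ["fresh"], "weight": 2.9, "next": {}},
    "stale": {"text": ["stale"], "weight": 0.1, "next": {}},
}
rng = random.Random(42)  # seeded only to make the demo repeatable
picks = [pick_start(chain, rng) for _ in range(1000)]
print(picks.count("fresh") > picks.count("stale"))  # True
```

Because decayed entries carry small weights, recently used words win the starting spot far more often than stale ones.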

    When everything is packaged together, usage becomes pretty simple:

    corpus.add(text) # repeat this a couple of times
    # to generate, you just
    text = " ".join(take(corpus, 5)) # creates a 5 word text

    All the code is on github.

    Making the data structure iterable like that was especially interesting. Now I can change how the generator works without affecting external code.

    import random

    def __iter__(self):
        return self

    def __setitem__(self, ngram, value):
        key = self.__hash(ngram)
        self.data[key] = value

    # adapted from
    # http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/
    def __weighted_choice(self, items):
        # items is a list of (weight, key) pairs
        rnd = random.random() * sum(w for w, k in items)
        for weight, key in items:
            rnd -= weight
            if rnd < 0:
                return key

    def __next(self, ngram):
        item = self.__getitem__(ngram)
        if len(item['next']) == 0:
            raise StopIteration
        # pair each weight with the key of the n-gram it leads to
        return self.data[self.__weighted_choice(list(zip(item['next'].values(),
                                                         item['next'].keys())))]

    def next(self):
        if not hasattr(self, 'current_ngram'):
            # first call: pick the starting n-gram by weighted choice;
            # __start is a hypothetical name, that line is missing from this listing
            self.current_ngram = self.__start()
        else:
            self.current_ngram = self.__next(self.current_ngram)
        return self.current_ngram

    Pretty cool, right?

    When to speak?

    Okay, that takes care of generating the text. But when should __botko__ speak anyway?

    You certainly don't want the bot to spam everyone, but you don't want him to be completely quiet most of the time either. Speaking every X events is simply too predictable and boring ...

    Ideally you'd want him to speak more when the chatroom is busier and less when it's a bit quiet. When it's been quiet for a long time and it suddenly becomes very busy, that's also a good time to speak.

    Sort of like a greeting.

    I ended up using a combination of chat velocity and the rate of chat acceleration change to determine the probability of speaking; whether the bot actually speaks is then still left up to randomness.

    Velocity is measured as the ratio between the timespan of the last 10 messages and the timespan of the last 60. This basically measures how densely the messages are coming into the chatroom right now.

    velocity_rate = (now - time_10_messages_ago) / (now - time_60_messages_ago)
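
Spelled out as runnable code, the formula might look like the sketch below. It assumes message arrival times are kept as a list of seconds, oldest first; the function signature and the `None` return for too little history are my own choices, not something taken from the bot.

```python
def velocity_rate(timestamps, now, short=10, long=60):
    """Ratio of the time spanned by the last `short` messages to the last `long`.

    `timestamps` holds message arrival times in seconds, oldest first.
    Returns None until at least `long` messages have been seen.
    """
    if len(timestamps) < long:
        return None
    short_span = now - timestamps[-short]
    long_span = now - timestamps[-long]
    if long_span <= 0:
        return None
    return short_span / long_span


# 60 messages arriving evenly, one every 10 seconds
steady = [10 * i for i in range(60)]
print(velocity_rate(steady, now=600))  # roughly 1/6 for an even stream
```

An even stream gives about 10/60. A smaller value means the last 10 messages arrived faster than the longer-run pace, so the room is busier than usual right now.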

    The rate of acceleration change idea took some time to materialize in my head. The idea is that you want to make it likelier for the bot to speak when there is a sudden flurry of activity and fall back to the velocity_rate formula when conditions are mostly stable.

    v_i = average_speed_of_last_[0+i : i*5]_messages
    a_i = delta_of_two_speeds
    accel_rate = a_1/a_2

    It looks simple in pseudocode, but I promise it took a fair amount of head banging to come up with that!
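
The pseudocode leaves some room for interpretation, so the sketch below is only one possible reading: speeds are measured in messages per second over three consecutive 5-message windows, `a_1` is the change between the two newest windows and `a_2` the change between the two older ones. The names `window_speed` and `accel_rate`, and the `None` return for undefined cases, are assumptions of mine.

```python
def window_speed(timestamps):
    """Average speed of a window in messages per second."""
    span = timestamps[-1] - timestamps[0]
    return (len(timestamps) - 1) / span if span > 0 else 0.0


def accel_rate(timestamps, window=5):
    """a_1 / a_2 over the three most recent windows, or None if undefined."""
    if len(timestamps) < 3 * window:
        return None
    recent = timestamps[-3 * window:]
    # v[0] is the newest window, v[2] the oldest of the three
    v = [window_speed(recent[i * window:(i + 1) * window]) for i in (2, 1, 0)]
    a_1 = v[0] - v[1]
    a_2 = v[1] - v[2]
    return a_1 / a_2 if a_2 != 0 else None


# a room that goes from one message per 10 s, to one per 5 s, to one per second
ts = [0, 10, 20, 30, 40, 50, 55, 60, 65, 70, 80, 81, 82, 83, 84]
print(accel_rate(ts))  # about 8: the latest speed-up dwarfs the previous one
```

When the chat pace is stable, `a_2` is zero and this sketch returns `None`, which is the cue to fall back to `velocity_rate`.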

    This part of the code is also on github.


    That's pretty much it. Our IRC room now has a bot that entertains everyone with its nonsense and I couldn't make myself go to bed until almost five in the morning.

    All that's left to do now is tweak the parameters a bit and perhaps iron out a bug or two.

    I'm almost tempted to connect this guy to #startups ...

    [18:27:27] wisdom ACTION uses this quote in blogpost because it's starting
    [18:27:43] I almost have to use that ...
    [18:27:44] kul, to da si se spet zacel ful pogovarjat, prej [cool that you've started talking a lot again, before]

    Published on April 25th, 2012 in Artificial intelligence, Facebook, Internet Relay Chat, Markov chain, Time series, Twitter, Uncategorized
