Swizec Teller - a geek with a hat (swizec.com)

    Making our irc bot talk

    [18:21:38] <HairyFotr> _chatty-botko_ we desire to hear moar of you're infinite wisdom
    [18:21:38] <_chatty-botko_> know it broken ? _chatty-botko_ doesn't though gets

    IRC might seem quaint and outdated, but for keeping in touch with a group of friends it beats Twitter, Facebook and Google+ hands down.


    There's a bunch of others on those things. With IRC there's just us ... and a robot overlord.

    Mostly __botko__ makes sure to mention when we repost links and keeps IRC logs for us. But as soon as @smotko added random phrases to make keeping us in place more interesting, we wanted more.

    We wanted our bot to speak.

    It's been ages since I coded just for fun, so last night I made __botko__ talk.

    What to say?

    [18:03:13] <_chatty-botko_> _botko_ probably feels left out
    [18:12:45] <_chatty-botko_> auto complete :D whoa! mind = blown

    So how do you make a robot speak?

    Natural language generation is a huge field; hell, I'm doing my thesis in that general vicinity. For this project I just wanted to have a little fun, not invent a monster, so I went with Markov chain text generation!

    The way a Markov chain text generator works is basically:

    1. split a text into words
    2. create n-grams (n-length groups of words, I used n=1)
    3. create unique hashes of all n-grams
    4. map each n-gram to a list of next n-grams

    You end up with this sort of data structure.

    {-8818677644356330256: {'next': {-6361492750444014453: 1},
                            'text': ['dobu'],
                            'weight': 1.9},
     -8629397782117610386: {'next': {}, 'text': ['podatkov'], 'weight': 0.9},
     7424044602067048: {'next': {14336086128129331: 0.9, 1480645349370722979: 1},
                        'text': [':D'],
                        'weight': 2.9},
    # and so on for quite a while
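    A minimal sketch of those four steps, for illustration only. The function name `build_chain`, the use of Python's built-in `hash()` on word tuples, and the choice to link each n-gram to the one that starts right after it ends are all assumptions, not the bot's actual code. Weights here are plain counts, with no decay yet.

```python
def build_chain(text, n=1):
    """Map hash(n-gram) -> {'text', 'weight', 'next'} for one piece of text."""
    words = text.split()                              # 1. split into words
    ngrams = [tuple(words[i:i + n])                   # 2. create n-grams
              for i in range(len(words) - n + 1)]
    chain = {}
    for ngram in ngrams:
        key = hash(ngram)                             # 3. hash each n-gram
        entry = chain.setdefault(
            key, {'text': list(ngram), 'weight': 0, 'next': {}})
        entry['weight'] += 1
    # 4. link each n-gram to the n-gram that follows it
    for current, following in zip(ngrams, ngrams[n:]):
        followers = chain[hash(current)]['next']
        followers[hash(following)] = followers.get(hash(following), 0) + 1
    return chain


chain = build_chain("the cat sat on the mat")
the = chain[hash(('the',))]
# 'the' was seen twice and can be followed by either 'cat' or 'mat'
print(the['text'], the['weight'], len(the['next']))
```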

    While this works great for static text, an IRC chatroom is a rolling time series that never really ends. Not only will you soon run out of memory, you also have to figure out how to stay relevant.

    I had to wait until 4:03AM for the solution to hit me!

    It doesn't have to be complicated at all - just decay the weights of all next items whenever you poke an element. And do the same whenever a new text is added to the Corpus.

    Especially important is to choose the starting point well. The more relevant it is to current discussion, the more chance you have of saying something relevant. So instead of choosing at random, we make a weighted choice here as well.
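    Roughly, the decay and the weighted starting choice could look like the sketch below. The decay factor of 0.9 and the helper names `decay` and `weighted_choice` are made up for illustration; the real bot may decay differently.

```python
import random

DECAY = 0.9  # hypothetical decay factor, not taken from the bot's code


def decay(weights, factor=DECAY):
    """Shrink every weight in place so older entries slowly fade."""
    for key in weights:
        weights[key] *= factor


def weighted_choice(weights, rng=random):
    """Pick a key with probability proportional to its weight."""
    if not weights:
        raise ValueError("nothing to choose from")
    rnd = rng.random() * sum(weights.values())
    for key, weight in weights.items():
        rnd -= weight
        if rnd < 0:
            return key
    return key  # only reached through float rounding or all-zero weights


# Starting candidates: decay everything, then reinforce what was just said.
starts = {'old topic': 1.0, 'new topic': 1.0}
decay(starts)
starts['new topic'] += 1
print(weighted_choice(starts))  # 'new topic' is now the likelier pick
```

    Because every poke decays the old weights before anything is reinforced, words from the current conversation end up dominating both the starting choice and the choice of what comes next.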

    When everything is packaged together, usage becomes pretty simple:

    corpus.add(text) # repeat this a couple of times
    # to generate, you just
    text = " ".join(take(corpus, 5)) # creates a 5 word text

    All the code is on github.
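    The `take` helper used above isn't shown in this post. A minimal stand-in, assuming it simply pulls the first n items out of any iterable (note the argument order matches the usage above, not the `take(n, iterable)` recipe from the itertools docs):

```python
from itertools import islice


def take(iterable, n):
    """Return up to n items from iterable as a list."""
    return list(islice(iterable, n))


words = iter(["mind", "=", "blown", ":D"])
print(" ".join(take(words, 3)))  # mind = blown
```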

    Making the data structure iterable like that was especially interesting. Now I can change how the generator works without affecting external code.

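        # Excerpt from the corpus class. Needs `import random` at module level;
        # __hash and __getitem__ are defined elsewhere in the class.
        def __iter__(self):
            return self

        def __setitem__(self, ngram, value):
            key = self.__hash(ngram)
            self.data[key] = value

        # adapted from
        # http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/
        def __weighted_choice(self, items):
            items = list(items)  # zip() is a one-shot iterator on Python 3
            rnd = random.random() * sum(w for w, k in items)
            for weight, key in items:
                rnd -= weight
                if rnd < 0:
                    return key
            return items[-1][1]  # guard against float rounding

        def __next(self, ngram):
            item = self.__getitem__(ngram)
            if len(item['next']) == 0:
                raise StopIteration
            return self.data[self.__weighted_choice(zip(item['next'].values(),
                                                        item['next'].keys()))]

        def next(self):
            if not hasattr(self, 'current_ngram'):
                # The body of this branch is missing from the excerpt.
                # Assumed here: a weighted choice of the starting n-gram,
                # as described in the text above.
                self.current_ngram = self.data[self.__weighted_choice(
                    [(item['weight'], key) for key, item in self.data.items()])]
                return self.current_ngram
            self.current_ngram = self.__next(self.current_ngram)
            return self.current_ngram

        __next__ = next  # Python 3 spelling of the iterator protocol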

    Pretty cool right?

    When to speak?

    Okay, that takes care of generating the text. But when should __botko__ speak anyway?

    You certainly don't want the bot to spam everyone. But you don't want him to be completely quiet most of the time either. Speaking every X events is simply too predictable and boring ...

    Ideally you'd want him to speak more when the chatroom is busier and less when it's a bit quiet. When it's been quiet for a long time and it suddenly becomes very busy, that's also a good time to speak.

    Sort of like a greeting.

    I ended up using a combination of chat velocity and the rate of chat acceleration change to determine the probability of speaking, which is then still left up to randomness.

    Velocity is measured as the ratio between the timespan of the last 10 messages and the timespan of the last 60. This basically measures how densely the messages are coming into the chatroom right now.

    velocity_rate = (now - time_10_messages_ago) / (now - time_60_messages_ago)
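    As a rough sketch, that ratio could be computed from a list of message timestamps like this. The function name, the `timestamps` list (seconds, oldest first) and the `None` return for too little history are assumptions for illustration.

```python
def velocity_rate(timestamps, now, recent=10, window=60):
    """Timespan of the last `recent` messages over the last `window` messages."""
    if len(timestamps) < window:
        return None  # not enough history yet
    recent_span = now - timestamps[-recent]
    window_span = now - timestamps[-window]
    if window_span <= 0:
        return None
    return recent_span / window_span


# Steady chat: one message every 10 seconds for 10 minutes.
steady = list(range(0, 600, 10))
# Bursty chat: same start, but the last 10 messages land within 10 seconds.
bursty = list(range(0, 500, 10)) + [590 + i for i in range(10)]

print(velocity_rate(steady, now=600))
print(velocity_rate(bursty, now=600))
```

    With evenly spaced messages the ratio sits at 10/60. It drops below that when the most recent messages arrive more densely than the longer-run average, so a smaller value means a busier room right now.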

    The rate of acceleration change idea took some time to materialize in my head. The idea is that you want to make it likelier for the bot to speak when there is a sudden flurry of activity and fall back to the velocity_rate formula when conditions are mostly stable.

    v_i = average_speed_of_last_[0+i : i*5]_messages
    a_i = delta_of_two_speeds
    accel_rate = a_1/a_2

    It looks simple in pseudocode, but I promise it took a fair amount of head banging to come up with that!

    This part of the code is also on github.
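    The pseudocode above leaves the exact windows open, so here is just one possible reading of it, not the bot's actual code: measure speed over three back-to-back windows of five messages, take the two differences between neighbouring speeds, and divide the newer difference by the older one. The names `speed` and `accel_rate`, the window size, and the `None` return for a flat baseline are all assumptions.

```python
def speed(timestamps):
    """Messages per second across a run of timestamps (oldest first)."""
    span = timestamps[-1] - timestamps[0]
    return (len(timestamps) - 1) / span if span > 0 else 0.0


def accel_rate(timestamps, size=5):
    """Ratio of the newest change in speed to the change before it."""
    if len(timestamps) < 3 * size:
        return None  # not enough history yet
    newest = timestamps[-size:]
    middle = timestamps[-2 * size:-size]
    oldest = timestamps[-3 * size:-2 * size]
    a_1 = speed(newest) - speed(middle)
    a_2 = speed(middle) - speed(oldest)
    if a_2 == 0:
        return None  # no earlier change to compare against
    return a_1 / a_2


# Slow chat, then a bit faster, then a sudden flurry.
times = [0, 10, 20, 30, 40, 50, 55, 60, 65, 70, 71, 72, 73, 74, 75]
print(accel_rate(times))  # about 8: the latest jump dwarfs the previous one
```

    A large value signals a sudden flurry of activity. When the speeds barely change, a bot built this way could fall back to the velocity ratio alone.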


    That's pretty much it. Our IRC room now has a bot that entertains everyone with its nonsense and I couldn't make myself go to bed until almost five in the morning.

    All that's left to do now is tweaking the parameters a bit, perhaps iron out a bug or two.

    I'm almost tempted to connect this guy to #startups ...

    [18:27:27] wisdom ACTION uses this quote in blogpost because it's starting
    [18:27:43] I almost have to use that ...
    [18:27:44] cool that you've started talking a lot again, before

    Published on April 25th, 2012 in Artificial intelligence, Facebook, Internet Relay Chat, Markov chain, Time series, Twitter, Uncategorized
