[18:21:38] <HairyFotr> _chatty-botko_ we desire to hear moar of you’re infinite wisdom
[18:21:38] <_chatty-botko_> know it broken ? _chatty-botko_ doesn’t though gets

IRC might seem quaint and outdated, but for keeping in touch with a group of friends it beats Twitter, Facebook and Google+ hands down.

There’s a bunch of others on those things. With IRC there’s just us … and a robot overlord.

Mostly _botko_ makes sure to mention when we repost links and keeps IRC logs for us. But as soon as @smotko added random phrases to make keeping us in place more interesting, we wanted more.

We wanted our bot to speak.

It’s been ages since I coded just for fun, so last night I made _botko_ talk.

What to say?

[18:03:13] <_chatty-botko_> _botko_ probably feels left out
[18:12:45] <_chatty-botko_> auto complete 😀 whoa! mind = blown

So how do you make a robot speak?

Natural language generation is a huge field; hell, I'm doing my thesis in that general vicinity. For this project I just wanted to have a little fun, not invent a monster, so: Markov chain text generation!

The way a Markov chain text generator works is basically:

  1. split a text into words
  2. create n-grams (n-length groups of words, I used n=1)
  3. create unique hashes of all n-grams
  4. map each n-gram to a list of next n-grams
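
Here's a minimal sketch of those four steps. It is not the code from the repo: `build_chain` is a made-up name, the n-grams are assumed to be non-overlapping chunks of `n` words, and Python's built-in `hash` on a tuple stands in for whatever hashing the real thing uses.

```python
def build_chain(text, n=1):
    """Map each n-gram's hash to its words, its weight and its followers."""
    # 1. split the text into words
    words = text.split()

    # 2. group the words into n-grams (assumption: non-overlapping chunks)
    ngrams = [tuple(words[i:i + n]) for i in range(0, len(words) - n + 1, n)]

    chain = {}
    for ngram in ngrams:
        # 3. hash every n-gram and use the hash as its key
        node = chain.setdefault(hash(ngram),
                                {"text": list(ngram), "next": {}, "weight": 0})
        node["weight"] += 1  # one point per occurrence

    # 4. map each n-gram to the n-grams that followed it, counting repeats
    for current, following in zip(ngrams, ngrams[1:]):
        followers = chain[hash(current)]["next"]
        followers[hash(following)] = followers.get(hash(following), 0) + 1

    return chain


chain = build_chain("a b a c")
```

With `n=1` every word is its own n-gram, so in the example `a` ends up pointing at both `b` and `c`.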

You end up with this sort of data structure.

{-8818677644356330256: {'next': {-6361492750444014453: 1},
                        'text': ['dobu'],
                        'weight': 1.9},
 -8629397782117610386: {'next': {}, 'text': ['podatkov'], 'weight': 0.9},
 7424044602067048: {'next': {14336086128129331: 0.9, 1480645349370722979: 1},
                    'text': [':D'],
                    'weight': 2.9},
# and so on for quite a while

While this works great for static text, an IRC chatroom is a rolling time series that never really ends. Not only will you soon run out of memory, but how do you stay relevant?

I had to wait until 4:03AM for the solution to hit me!

It doesn’t have to be complicated at all – just decay the weights of all next items whenever you poke an element. And do the same whenever a new text is added to the Corpus.
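
A rough sketch of the decay idea, under a few assumptions: `poke` and `decay_next` are hypothetical names, the node layout follows the sample structure shown earlier, and the `0.9` factor is a guess that merely happens to agree with the sample weights.

```python
DECAY = 0.9  # assumed decay factor, not taken from the real code


def decay_next(node, factor=DECAY):
    """Fade the weight of every follower of this node."""
    for key in node["next"]:
        node["next"][key] *= factor


def poke(chain, key, following_key):
    """Touch a node: decay its old followers, then reinforce the one just seen."""
    node = chain[key]
    decay_next(node)
    node["next"][following_key] = node["next"].get(following_key, 0) + 1


chain = {10: {"next": {20: 1}}}
poke(chain, 10, 30)  # the old follower fades, the new one starts at full weight
```

Followers that keep showing up stay heavy, while ones that stop appearing shrink a little with every poke.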

Especially important is to choose the starting point well. The more relevant it is to current discussion, the more chance you have of saying something relevant. So instead of choosing at random, we make a weighted choice here as well.
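
Picking the starting point could look roughly like this. It is a sketch, not the repo's code: `pick_start` is a hypothetical name, it assumes every node carries a `weight` like in the sample structure, and it leans on the standard library's `random.choices` (Python 3.6+) instead of a hand-rolled weighted choice.

```python
import random


def pick_start(chain):
    """Pick a starting key, favouring nodes with a higher weight."""
    keys = list(chain)
    weights = [chain[key]["weight"] for key in keys]
    # random.choices returns a list of k picks; we only want one
    return random.choices(keys, weights=weights, k=1)[0]


chain = {
    "stale": {"weight": 0.1},
    "fresh": {"weight": 2.9},
}
start = pick_start(chain)  # "fresh" most of the time, "stale" now and then
```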

When everything is packaged together, usage becomes pretty simple:

corpus.add(text) # repeat this a couple of times
 
# to generate, you just
corpus.rewind()
text = " ".join(take(corpus, 5)) # creates a 5 word text
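
The `take` helper isn't shown in the snippet. A plausible version, assuming it takes the iterable first and the count second (as in the call above) and that each item coming out of the corpus is a list of words that needs flattening:

```python
from itertools import chain, islice


def take(iterable, n):
    """Return the words of the first n items of iterable as one flat list."""
    # islice stops after n items, or earlier if the iterable runs dry
    return list(chain.from_iterable(islice(iterable, n)))


# stand-in for a corpus: an iterator that yields one-word n-grams
fake_corpus = iter([["mind"], ["="], ["blown"]])
text = " ".join(take(fake_corpus, 2))
```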

All the code is on github.

Making the data structure iterable like that was especially interesting. Now I can change how the generator works without affecting external code.

    def __iter__(self):
        return self
 
    def __setitem__(self, ngram, value):
        key = self.__hash(ngram)
        self.data[key] = value
 
    # adapted from
    # http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/
    def __weighted_choice(self, items):
        rnd = random.random() * sum([w for w, k in items])
        for i, item in enumerate(items):
            rnd -= item[0]
            if rnd < 0:
                return item[1]
 
    def __next(self, ngram):
        item = self.__getitem__(ngram)
 
        if len(item['next']) == 0:
            raise StopIteration
 
        return self.data[self.__weighted_choice(zip(item['next'].values(),
                                                    item['next'].keys()))
                         ]['text']
 
    def next(self):
        if not hasattr(self, 'current_ngram'):
            self.rewind()
 
        self.current_ngram = self.__next(self.current_ngram)
        return self.current_ngram

Pretty cool, right?

When to speak?

Okay, that takes care of generating the text. But when should _botko_ speak anyway?

How Should You Act on IRC (Internet Relay Chatroom)? (Photo credit: Chris Pirillo)

You certainly don’t want the bot to spam everyone. But you don’t want him to be completely quiet most of the time. Speaking every X amount of events is simply too predictable and boring …

Ideally you’d want him to speak more when the chatroom is busier and less when it’s a bit quiet. When it’s been quiet for a long time and it suddenly becomes very busy, that’s also a good time to speak.

Sort of like a greeting.

I ended up using a combination of chat velocity and the rate of chat acceleration change to determine the probability of speaking, which is then still left up to randomness.

Velocity is measured as the ratio between the timespan of the last 10 messages and the timespan of the last 60. This basically measures how densely the messages are coming into the chatroom right now.

velocity_rate = (now - time_10_messages_ago) / (now - time_60_messages_ago)
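
As a runnable sketch, assuming the bot keeps a list of message arrival times in seconds, oldest first (the function signature and the `None` fallback for a short history are my own choices here):

```python
def velocity_rate(timestamps, now, short=10, long=60):
    """Ratio of the timespan of the last `short` messages to the last `long`."""
    if len(timestamps) < long:
        return None  # not enough history yet
    short_span = now - timestamps[-short]
    long_span = now - timestamps[-long]
    if long_span <= 0:
        return None  # avoid dividing by zero
    return short_span / long_span


# one message per second for a minute: a perfectly steady room
steady = list(range(60))
rate = velocity_rate(steady, now=60)
```

At a steady pace the ratio sits at 10/60. The faster the last 10 messages arrived compared to the last 60, the smaller it gets.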

The rate of acceleration change idea took some time to materialize in my head. The idea is that you want to make it likelier for the bot to speak when there is a sudden flurry of activity and fall back to the velocity_rate formula when conditions are mostly stable.

v_i = average_speed_of_last_[0+i : i*5]_messages
a_i = delta_of_two_speeds
accel_rate = a_1/a_2

It looks simple in pseudocode, but I promise it took a fair amount of head banging to come up with that!
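
The pseudocode leaves room for interpretation, so here is just one possible reading of it, not the code from the repo. It assumes three consecutive windows of 5 messages each, speed measured in messages per second, and `None` whenever the ratio is undefined. `window_speed` and `accel_rate` are names made up for this sketch.

```python
def window_speed(times):
    """Messages per second across a window of timestamps (oldest first)."""
    span = times[-1] - times[0]
    return (len(times) - 1) / span if span > 0 else 0.0


def accel_rate(timestamps, window=5):
    """Ratio of the newest change in speed to the one before it."""
    if len(timestamps) < 3 * window:
        return None  # not enough history yet
    recent = timestamps[-3 * window:]
    # speeds of the newest, middle and oldest window, in that order
    v = [window_speed(recent[i:i + window]) for i in (2 * window, window, 0)]
    a_1 = v[0] - v[1]  # newest change in speed
    a_2 = v[1] - v[2]  # the change before that
    if a_2 == 0:
        return None  # speed was stable, the ratio is undefined
    return a_1 / a_2


# the gaps between messages shrink from 4s to 2s to 1s
times = [0, 4, 8, 12, 16, 20, 22, 24, 26, 28, 30, 31, 32, 33, 34]
rate = accel_rate(times)
```

When the speed has been stable this reading returns `None`, which would be the cue to fall back to `velocity_rate`.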

This part of the code is also on github.

Lovely!

That’s pretty much it. Our IRC room now has a bot that entertains everyone with its nonsense and I couldn’t make myself go to bed until almost five in the morning.

All that’s left to do now is tweaking the parameters a bit, perhaps iron out a bug or two.

I’m almost tempted to connect this guy to #startups …

[18:27:27] wisdom ACTION uses this quote in blogpost because it’s starting
[18:27:43] I almost have to use that …
[18:27:44] kul, to da si se spet zacel ful pogovarjat, prej
