Swizec Teller - a geek with a hat (swizec.com)

    More messing with time: Deduping messages between iOS and JavaScript

    I've written about time before: Days are not 60∙60∙24 seconds, Time is funny in Ruby, and Distributed clocks are hard to synchronize.

    Here's another for the Things You Never Thought Could Go Wrong pile: Deduping messages in a chat application.

    For months now, we've had this elusive bug at Yup that nobody could figure out. When a student and a tutor are talking, if the tutor refreshes his or her webapp, student messages are doubled. Not because there's a display issue, oh no, because they're all saved twice.

    The student only sees the bug if they go back to read their session history later.

    Tutor messages are never doubled.


    Can you figure it out? Let me give you some background.

    The background

    Chat sessions happen between a student and a tutor and our server. The server is there to establish a direct connection between student and tutor via a Pusher channel and to store every message in our database. Sometimes it sends system messages.

    Student talks through an iPhone or Android app, tutor talks through a webapp. Both try to save every message, sent or received, to the server.

    This 3-way approach ensures the whole session transcript gets saved even if one of the clients loses connection to our server. As long as the two humans can talk to each other and at least one of them can talk to the server, all is well.

    Sometimes, maybe, the humans can talk and neither of them can talk to the server. If that happens, our service is down. When we get it back, the chat session is hopefully still going and clients can save history.

    Here's what a typical message looks like:

    chat_id: <number>,
    sent_at: <timestamp>,
    sent_from: <sender>,
    sent_to: <recipient>,
    content_type: <img|text>,
    text: <message>

    chat_id tells you which session a message belongs to, sent_at when it was sent, sent_from who sent it, sent_to who's receiving it (because the system can send messages to either human), content_type whether you should render the text or assume it's an image URL, and text gives you the content.

    On the backend, we rely on a database index to dedupe messages. Our index uses the combination of chat, timestamp, sender, and text to ensure uniqueness.

    add_index "messages", ["chat_id", "text", "sent_at", "sent_from"], name: "messages_must_be_different", unique: true, using: :btree

    You can think of this as: In a chatroom, a person can send the same message at the same time only once.
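
    To make that rule concrete, here's a minimal in-memory sketch of what the unique index does for us. Message and MessageStore are hypothetical stand-ins, not our actual models, and a Set of tuples plays the role of the btree index.

```ruby
require "set"

# Hypothetical stand-in for a row in the messages table.
Message = Struct.new(:chat_id, :sent_at, :sent_from, :sent_to, :content_type, :text)

# Toy store that mimics the unique index on
# [chat_id, text, sent_at, sent_from].
class MessageStore
  attr_reader :rows

  def initialize
    @rows = []
    @index = Set.new
  end

  # Returns true if the message was stored, false if the index rejected it.
  def save(message)
    key = [message.chat_id, message.text, message.sent_at, message.sent_from]
    return false unless @index.add?(key)

    @rows << message
    true
  end
end

store = MessageStore.new
sent_at = Time.utc(2017, 7, 4, 23, 0, 14)
msg = Message.new(1, sent_at, "tutor", "student", "text", "afeaw")

store.save(msg)      # first client saves the message
store.save(msg.dup)  # second client tries to save the same message
puts store.rows.size # prints 1
```

    As long as both clients report exactly the same sent_at, the second save bounces off the index and the transcript stays clean.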

    The problem

    So… what's wrong?

    If you guessed "iOS and JavaScript round timestamps differently which means the same message saved from a different client has a different sent_at," you were right! Congratulations, you're a wizard.

    It was… not the first thing I thought of. 😅

    Here's a dump from a debugging session on my localhost 👇 Student messages are doubled; that's easy to see. The part that's hard to see is that each student message is saved with two different sent_at timestamps.

    2.2.0 :001 > puts Session.last.messages.pluck(:sent_from, :sent_at, :text).to_yaml
    - - student
      - 2017-07-04 23:00:02.786000000 Z
      - Bdbr
    - - student
      - 2017-07-04 23:00:02.786581000 Z
      - Bdbr
    - - tutor
      - 2017-07-04 23:00:14.032999000 Z
      - fawefaw
    - - tutor
      - 2017-07-04 23:00:14.857000000 Z
      - afeaw
    - - student
      - 2017-07-04 23:00:17.053721000 Z
      - Hsj
    - - student
      - 2017-07-04 23:00:17.052999000 Z
      - Hsj
    - - system alert
      - 2017-07-04 23:00:19.403000000 Z
      - Student ended session

    Who sent a message, when, what it was. Pay attention to the timestamps. 2017-07-04 23:00:02.786000000 Z vs. 2017-07-04 23:00:02.786581000 Z, for example.

    That's 0.000581 seconds apart, half a thousandth of a second. Different enough that our database index relying on timestamps decides these are two different messages.
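
    A quick check in plain Ruby, using the first pair of timestamps from the dump, shows why the index treats them as different values. Nothing here comes from our codebase; FIRST_SAVE, SECOND_SAVE, and microseconds_apart are names made up for the demo.

```ruby
# The two sent_at values stored for the same "Bdbr" message.
# Rational seconds keep the comparison exact (no float noise).
FIRST_SAVE  = Time.utc(2017, 7, 4, 23, 0, Rational(2_786_000, 1_000_000))
SECOND_SAVE = Time.utc(2017, 7, 4, 23, 0, Rational(2_786_581, 1_000_000))

# Absolute distance between two Time values, in whole microseconds.
def microseconds_apart(a, b)
  ((a.to_r - b.to_r).abs * 1_000_000).to_i
end

puts FIRST_SAVE == SECOND_SAVE                    # prints false
puts microseconds_apart(FIRST_SAVE, SECOND_SAVE)  # prints 581
```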


    I'm not sure how this bug got introduced, but it boils down to this line somewhere in our webapp code.

    sent_at: moment(message.sent_at),

    Don't believe me? Watch this.

    > moment("2017-07-04 23:00:02.786581000 Z").format('YYYY-MM-DDTHH:mm:ss.SSSSSSS')


    In fact, this is not even a momentjs problem. JavaScript dates only go down to the millisecond, and iOS timestamps carry finer precision than that. Why, I don't know.
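
    To see the effect without leaving Ruby, here's a small sketch that mimics a millisecond-only clock by dropping everything past the third decimal. truncate_to_millis is a hypothetical helper for illustration. It isn't what moment or the browser actually runs, and whether a given client truncates or rounds is an assumption on my part.

```ruby
# Hypothetical helper: keep only whole milliseconds, the way a
# millisecond-resolution clock would store the value.
def truncate_to_millis(time)
  millis = (time.to_r * 1000).floor   # whole milliseconds since the epoch
  Time.at(Rational(millis, 1000)).utc
end

fine_grained = Time.utc(2017, 7, 4, 23, 0, Rational(2_786_581, 1_000_000))
puts truncate_to_millis(fine_grained).strftime("%Y-%m-%d %H:%M:%S.%N")
# prints 2017-07-04 23:00:02.786000000
```

    Same message, same instant, and yet the value that comes out no longer matches the one that went in.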

    The solution

    Change the index!

    No, not take out sent_at. Heavens no, we still need that. Instead, we can make sure all timestamps have the same precision. We proooooobably don't need to ensure the same message isn't sent twice per nanosecond. Twice per hundredth of a second should suffice.
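
    Here's roughly what coarsening to 1/100s does to a timestamp pair from the dump. hundredths is a hypothetical helper that works on exact Rational values so the demo has no float noise. One caveat to keep in mind: rounding only merges values that land in the same bucket, so two timestamps on either side of a bucket edge would still come out different.

```ruby
# Hypothetical helper: round a Time to hundredths of a second,
# returning exact Rational seconds since the epoch.
def hundredths(time)
  time.to_r.round(2)
end

a = Time.utc(2017, 7, 4, 23, 0, Rational(2_786_000, 1_000_000))
b = Time.utc(2017, 7, 4, 23, 0, Rational(2_786_581, 1_000_000))
puts hundredths(a) == hundredths(b)   # prints true, both land on :02.79

# Made-up values that straddle a bucket edge still differ.
c = Time.utc(2017, 7, 4, 23, 0, Rational(2_784_900, 1_000_000))
d = Time.utc(2017, 7, 4, 23, 0, Rational(2_785_100, 1_000_000))
puts hundredths(c) == hundredths(d)   # prints false
```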

    We introduce a key field and index based on that. Added bonus: improve space performance by hashing the text. Makes the index smaller and maybe faster 🤓

    def generate_key_if_needed
      if key.blank?
        # round to 1/100s precision
        timestamp = sent_at.to_f.round(2)
        # hash the text so the index stores a short digest, not the full message
        hash = Digest::MurmurHash3_x86_32.hexdigest(text)
        self.key = "#{chat_id}:#{sent_from}:#{timestamp}:#{hash}"
      end
    end

    MurmurHash is one of the best algorithms out there for non-cryptographic hashing. That is, one-way hashing that cares about collisions more than guessability.
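
    If you want to try the key idea outside of Rails, here's a self-contained sketch. It swaps MurmurHash for Zlib.crc32 from the standard library purely so it runs without extra gems; that's a stand-in, not what we use. dedupe_key is a hypothetical name and the chat id of 42 is made up.

```ruby
require "zlib"

# Hypothetical standalone version of the key: chat, sender,
# timestamp rounded to 1/100s, and a short hash of the text.
# Zlib.crc32 stands in for MurmurHash3 here.
def dedupe_key(chat_id:, sent_from:, sent_at:, text:)
  timestamp = sent_at.to_f.round(2)
  hash = Zlib.crc32(text).to_s(16)
  "#{chat_id}:#{sent_from}:#{timestamp}:#{hash}"
end

# The two sent_at values stored for the same "Hsj" message.
first_save  = Time.utc(2017, 7, 4, 23, 0, Rational(17_053_721, 1_000_000))
second_save = Time.utc(2017, 7, 4, 23, 0, Rational(17_052_999, 1_000_000))

key_a = dedupe_key(chat_id: 42, sent_from: "student", sent_at: first_save, text: "Hsj")
key_b = dedupe_key(chat_id: 42, sent_from: "student", sent_at: second_save, text: "Hsj")
puts key_a == key_b   # prints true
```

    With a unique index on the key, the second save of "Hsj" gets rejected even though the raw timestamps disagree.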


    And that's how you fix what looks like a frontend bug by writing code on the backend and re-engineering your biggest database table.

    Score for the fullstack generalists! 💪🏼


    Published on July 5th, 2017 in Front End, Ruby, Technical
