Swizec Teller - a geek with a hat (swizec.com)

    Lesson learned, test your migrations on the big dataset

    Swizec Teller @Swizec:
    Data migrations:

    Oh I can do this in an hour. Two tops.

    Two days later: (╯°□°)╯︵ ┻━┻

    On Thursday we pushed to production. Three months' worth of code went from no users to thousands of users. Or hundreds. Depends how you count.

    My point is, our code went through the ultimate integration test - the real world.

    The whole thing was much less intense than expected. We pushed in the middle of the night, ran our migrations, waited 36 minutes for them to run, ate some pizza while waiting for the fires to start, and went home when they didn't. An uneventful deploy is the best kind of deploy.

    Well okay, I say no fires started, but that's not quite true. The whole engineering team has spent the last couple of days fixing known bugs that slipped through, squashing unknown bugs that users on strange computers in strange lands discovered by clicking something they shouldn't, manually tweaking the production database to get people out of unforeseen corner cases, and implementing some features we forgot we needed.

    Such is life without QA. You forget things.

    As David Heinemeier Hansson once said: it's not worth fixing if a user doesn't complain about it.


    I kid, we knew hiccups would happen. Hair-on-fire mode lasting just three days after a major deploy is good. Very good.

    Swizec Teller @Swizec:
    That moment when workers run while you're importing data and duplicate ids and your db doesn't get primary keys and rails gets confused.

    But the week before was the most frustrating week of my life. I was this close to tearing my hair out, punching my computer in the face, and taking up goat farming. This close. *does the this-close finger gesture*

    I was doing a test run of the database migrations we accumulated in those three months. Shouldn't take more than two hours, right? Wrong. It took a week.

    Some of our database migrations are also data migrations. Mistake no. 1.

    None of the migrations we'd accumulated over those three months had ever been run against the full database. Mistake no. 2.

    Our whole database is under 500,000 rows. The entire data dump falls well under 200 megabytes. Our database is tiny. What could possibly go wrong?

    Oh so very much could.

    Most migrations ran in a fraction of a second. Many took a second or two. A few beasts took ten minutes.

    Ten whole minutes for a single migration on a tiny database. The mind recoils in horror, unable to understand what could be taking so long.

    When we wrote a migration, we tested it on our local machines. 2.3 gigahertz computers with six or eight CPU cores, 16 gigabytes of memory, a solid state drive, and a database that could fit on a floppy drive.

    "It was fast when I tried it". Of course it was. Your computer is a monster running on bare metal. Heroku dynos are paltry "computers" running on many layers of virtualization. They're meant to zerg rush a problem, not handle a big thing on their own.

    You run into problems when your migrations look like this:

    def change
      add_column :promotional_tokens, :type, :string
      Token::PromotionalToken.all.find_each do |token|
        if token.promotion_type.eql? 'referral'
          # remove if user is not a student
          user = User.find(token.user_id)
          if user.is? :student # refactor it to student profile
            token.update!(type: 'Token::ReferralToken')
            ref_token = Token::ReferralToken.find(token.id)
            user.profile.update!(referral_token: ref_token)
          else # remove it as it is now redundant
            token.destroy!
          end
        end
      end
    end
    This migration adds a type column to promotional_tokens, then goes through existing tokens - of which each user has one - and updates them. Students get a new token, everybody else has theirs removed. Because that's what business wants.

    There are 90,000 users in the system.

    Let's play spot the problem.

    1. Using .all loads the whole table into memory
    2. We run a DB query for every token to fetch the user
    3. Then we run a DB query to update the token
    4. Then we run a DB query to get the token again, but with the correct object type
    5. Then we run a DB query to move the token reference to user.profile
    6. If the user isn't a student, we run a destroy query instead

    There are only 200 users who aren't students. We can ignore those.

    For every user we have, this migration hits the database four times. 360,000 database queries. And just to make it more fun, each is made using ActiveRecord instead of raw SQL.
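    That back-of-envelope number is easy to sanity-check in plain Ruby. The 90,000 / 200 split comes from the article; the per-user query count comes from the list above:

    ```ruby
    # Toy count of the queries the naive migration issues.
    students     = 90_000 - 200  # users whose token gets converted
    non_students = 200           # users whose token gets destroyed

    queries_per_student = 4  # find user, update token, re-find token, update profile
    total = students * queries_per_student + non_students  # one destroy each

    puts total  # => 359400, in the ballpark of the 360,000 estimate
    ```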

    It took 10 minutes to run through the whole dataset on my local machine. It flat out crashed Heroku's one-off dyno. Out of memory.

    And if it didn't crash, who knows. Our database runs on a separate server. Maybe in the same datacenter, maybe not. Heroku knows, we don't.

    That makes migrations even slower.
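    To see why a remote database hurts so much, multiply the query count by a per-query round trip. The latencies below are illustrative guesses, not measurements:

    ```ruby
    # Rough wall-clock cost of 360,000 sequential queries at different latencies.
    # Round-trip times are assumptions for illustration.
    queries       = 360_000
    local_rtt_ms  = 0.1  # app and database on the same machine
    remote_rtt_ms = 1.0  # database on another server, maybe another datacenter

    local_minutes  = queries * local_rtt_ms  / 1000.0 / 60
    remote_minutes = queries * remote_rtt_ms / 1000.0 / 60

    puts format('local: %.1f min, remote: %.1f min', local_minutes, remote_minutes)
    # local: 0.6 min, remote: 6.0 min
    ```

    At one millisecond per round trip the naive migration spends six minutes on the network alone, before the database does any actual work.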

    Swizec Teller @Swizec:
    That moment when your laptop is so much stronger than Heroku dynos that you decide to migrate data locally.

    I'm not kidding. I legit considered snapshotting the production database, running migrations locally, then swapping production with the updated snapshot. That could screw things up, especially permissions, but at least I could run it.

    We salvaged this migration after many struggles. Like this:

    def change
      add_column :promotional_tokens, :type, :string
      User.with_role(:tutor).includes(:promotional_tokens).find_each do |tutor|
        tutor.promotional_tokens.each(&:destroy!) # delete non-student tokens
      end
      Token::PromotionalToken.update_all(type: 'Token::ReferralToken')
      User.with_role(:student).includes(:referral_token, :profile).find_each do |student|
        tok = student.referral_token
        student.profile.update_attribute(:referral_token, tok)
      end
    end

    Doesn't look like much, but an order of magnitude fewer queries hit the database. The magic of scoping and .includes.

    200 queries run to delete non-student tokens. Nothing you can do about that. A single query updates the type of all tokens still in the system. And a single query per token moves the token reference to student.profile.

    Some 90,003 queries in total. Takes a minute or two.
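    The same sanity check for the rewritten migration, with counts approximated from the breakdown above (find_each's own batched SELECTs account for the last few queries):

    ```ruby
    # Query count for the rewritten migration.
    non_student_deletes = 200     # one DELETE per non-student token
    bulk_type_update    = 1       # a single UPDATE ... SET type = '...'
    per_student_moves   = 89_800  # one UPDATE per student profile (assumed split)

    total = non_student_deletes + bulk_type_update + per_student_moves
    puts total  # => 90001, roughly a 4x cut from the naive 360,000
    ```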

    In school they teach you that O(4n+1) is the same as O(n). It's just linear complexity, who cares. Reality cares. 4 times 90,000 is a lot more than 90,000. Especially when you're talking to a database across TCP/IP.
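    You can watch that constant factor on a clock with nothing but stdlib Ruby. Both loops below are O(n); one just makes four passes, like four queries per user:

    ```ruby
    require 'benchmark'

    data = Array.new(90_000) { rand }
    data.each { |x| x * 2 }  # warm-up pass so both timings are comparable

    one_pass  = Benchmark.realtime { data.each { |x| x * 2 } }
    four_pass = Benchmark.realtime { 4.times { data.each { |x| x * 2 } } }

    # The "same" linear complexity, four times the wall clock.
    puts four_pass > one_pass
    ```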

    All in all it took me and the genius who sits across from me an entire week to get our migrations running in under 20 minutes on a MacBook Pro. We removed a lot of .alls, reduced many factors of linear complexity, and in one epic instance even turned linear complexity into constant complexity. That was cool.

    We solved the rest by using --size=PX. That ran our migrations on Heroku's beefiest dyno: 2.5 gigs of memory, a dedicated machine, and 12x compute. Whatever that means.

    It took 36 minutes to run our migrations on production.

    It took 36 seconds to test them in our dev environment.

    Lesson learned.

    Published on September 2nd, 2015 in Active record pattern, CPU time, Database, Heroku, Rails, Technical
