Swizec Teller - a geek with a hat (swizec.com)

    Make mistakes easy to fix

    You can't prevent bugs. You'll burn out. Instead, you can focus on making them quick to fix.

    Catch and fix bugs quickly

    At work we had a particularly startup moment one day when we shipped a small update, patted ourselves on the back, and immediately got slammed by a flood of messages. People can't do the most critical work in the company, everything is broken, fire fire fire!!!

    It was a deep and narrow crevasse.

    We had changed the color of some buttons and took the opportunity to delete an unused piece of code mapping data states to colors. Buttons looked great so we shipped.

    And that broke a key page in an important technician workflow during peak use. Page wouldn't even load. The code we deleted was reused on this unrelated page to avoid duplication and in our testing we didn't think to check.

    A classic case of superficial similarity leading to incorrect DRY (don't repeat yourself) mixed with a dash of Hyrum's Law: anything that can be a dependency, will be.

    We didn't think to roll back the deploy, but we had a fix out in 10 minutes. Crisis averted.

    Habits that help you fix fast

    A few habits helped us respond so quickly. The big one was that our change was small and that made it easy to know what broke. Small search space.

    • we found out fast because users had a direct line to engineering and could tell us something's wrong
    • engineers were available because we shipped during regular work hours
    • the change was small because we ship at least daily instead of letting work in progress pile up
    • the fix was small because the change was small
    • observability showed us what to fix; we had logging in place that showed live errors from production with stack traces and those logs were indexed and easy to search
    • deploys are quick because they're automated and engineers just have to ~~press a button~~ run a script
    • deploys are safe because they're deterministic, regularly exercised, and there are almost no manual steps to remember
    • our code is always deployable because we avoid merging in-progress work to main; if it's merged, it's ready to go
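
    The observability habit in the list above can be sketched in a few lines. This is a minimal illustration, not our production setup: the logger name `app.errors`, the `log_errors` decorator, and `load_technician_page` are all hypothetical. It relies only on the standard library's `logging.exception`, which attaches the current stack trace to the log record.

```python
import functools
import logging

# Hypothetical logger name; a real setup would ship these records
# to whatever indexed log store you search during an incident.
logger = logging.getLogger("app.errors")


def log_errors(func):
    """Log any unhandled exception with its stack trace, then re-raise."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and includes the traceback.
            logger.exception("Unhandled error in %s", func.__name__)
            raise

    return wrapper


@log_errors
def load_technician_page():
    # Hypothetical page handler that hits the deleted mapping.
    raise KeyError("in_progress")
```

    The error still reaches the caller, so nothing is swallowed. The only difference is that a searchable record with a stack trace exists by the time users start messaging you.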

    All this means we can go from code to shipped with little overhead. Automated testing runs on every pull request, a peer can review the code while testing runs, and you can get your change to prod a few minutes later.

    There's no long discussion or committee or some burned out dude who needs to approve every little thing. We trust you to do your best and make sure the code works. And if it doesn't, we know you'll fix it.

    Shift even more left

    Even better than quick fixes in production is catching bugs before they ship. The sooner you find mistakes, the easier they are to fix. This is a core lesson Titus Winters writes about in Software Engineering at Google.

    The type of bug where a function goes missing and breaks far-off code is preventable without killing your velocity, with better tooling and a few engineering habits.

    Static types and linters would've caught the bug in the engineers' text editor. Delete the function and squiggly lines appear wherever it is used. Can't even run the code.

    Commit checks work great when you can't see the squiggly lines because they appear in a different file. Try to commit the code, run type validation, get an error.

    Both of those are hard to do in Python, which we use. Python is a super dynamic language with few static guarantees. You have to run the code to know exactly what it does.
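
    Hard is not the same as impossible, though. Here is a rough sketch of a commit check for the simplest version of this bug: a `from module import name` where `name` no longer exists in `module`. It uses only the standard library's `ast` module and never runs the code. The function names and the idea of passing sources as a dict are assumptions for illustration; it ignores star imports, submodules, and names created dynamically, which real tools such as mypy or pyflakes handle far better.

```python
import ast


def top_level_names(source: str) -> set[str]:
    """Collect names a module defines at its top level."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names.add(node.target.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                names.add((alias.asname or alias.name).split(".")[0])
    return names


def broken_imports(modules: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (importer, module, name) for imports of names that no longer exist.

    `modules` maps a module name to its source code. Only absolute
    `from module import name` statements between known modules are checked.
    """
    exported = {name: top_level_names(src) for name, src in modules.items()}
    problems = []
    for importer, src in modules.items():
        for node in ast.walk(ast.parse(src)):
            if not isinstance(node, ast.ImportFrom):
                continue
            if node.level != 0 or node.module not in exported:
                continue
            for alias in node.names:
                if alias.name != "*" and alias.name not in exported[node.module]:
                    problems.append((importer, node.module, alias.name))
    return problems
```

    Run on a hypothetical `colors` module that lost its `STATE_COLORS` mapping and a `technician_page` module that still imports it, the check reports the dangling import before anything ships.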

    Lower architectural complexity would've helped us side-step the whole issue. When you closely collocate code that works together and maintain clean APIs with the rest of your system, mistakes like this become less likely. You're never changing unrelated code by accident.

    Better test coverage of critical user flows would've helped. Just a few broad tests would've told us something's broken before we even merge the pull request. Ideally tests catch unintended changes in your logic, but they'll do as a cumbersome type checker in a pinch.
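
    A broad test like that can be as blunt as "render every critical page and see if anything blows up". The sketch below assumes each page can be rendered by calling a function with no arguments; `find_broken_pages`, the page functions, and `CRITICAL_PAGES` are hypothetical names, and `STATE_COLORS` is deliberately left undefined to mimic the deleted mapping.

```python
from typing import Callable


def find_broken_pages(pages: dict[str, Callable[[], str]]) -> dict[str, str]:
    """Render every page and return {page name: error} for the ones that fail."""
    failures = {}
    for name, render in pages.items():
        try:
            render()
        except Exception as exc:
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def render_buttons() -> str:
    # The page we actually changed and tested by hand.
    return "<button class='primary'>Save</button>"


def render_technician_page() -> str:
    # Far-off page that still relies on the mapping we deleted.
    # Python only notices the missing name when this line runs.
    return f"<span class='{STATE_COLORS['in_progress']}'>In progress</span>"


# Hypothetical registry of the flows that must never break.
CRITICAL_PAGES = {
    "buttons": render_buttons,
    "technician_workflow": render_technician_page,
}
```

    Calling `find_broken_pages(CRITICAL_PAGES)` flags `technician_workflow` with a `NameError`. In a real project this would more likely be a handful of test cases in your test runner that hit each route, run on every pull request.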

    Automated alerting on critical user flows could've told us when the code broke before our users even realized. We've since set this up and nothing brings me more joy than reaching out to users with a "Hey we saw this error. What happened?"
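
    The core of such an alert is small. This is a sketch of one common approach, a sliding-window error count, and not a description of our actual alerting. The class name, threshold, and window size are assumptions; timestamps are passed in as plain seconds so the logic stays easy to test.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when `threshold` errors land within `window_seconds` of each other."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: deque[float] = deque()

    def record_error(self, now: float) -> bool:
        """Record an error at time `now` (seconds); return True if the alert should fire."""
        self._timestamps.append(now)
        # Drop errors that have fallen out of the window.
        while now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

    With `ErrorRateAlert(threshold=3, window_seconds=60)`, three errors within a minute trip the alert while the same three errors spread over several minutes do not. The counting window is what separates a broken page from background noise.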

    Engineers should proactively hunt for production issues before they turn into fires. This builds trust and keeps your systems running smoothly so there's less confusion when a fire does occur.

    All this maps to the four DORA metrics: deployment frequency, lead time for changes, change failure rate, and time to restore service.
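
    If you keep a record of each deploy, all four numbers fall out of some simple arithmetic. A minimal sketch, assuming a hypothetical `Deploy` record with commit time, deploy time, a failure flag, and the time the fix landed; real measurements usually come from your CI and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    fixed_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deploy], period_days: int) -> dict:
    """Summarize deploy records into the four DORA-style numbers."""
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    fix_times = [d.fixed_at - d.deployed_at for d in failures if d.fixed_at is not None]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deploys) if deploys else None,
        "median_time_to_fix": median(fix_times) if fix_times else None,
    }
```

    For the incident in this story, the failed deploy would carry a time to fix of about 10 minutes. That number stays small only if the other three do: frequent deploys and short lead times are what keep each change small enough to debug quickly.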

    Cheers,
    ~Swizec

    Published on October 8th, 2024 in Software Engineering, Scaling Fast Book, Mindset, Teamwork, Observability
