Swizec Teller - a geek with a hat, swizec.com

Senior Mindset Book

Get promoted, earn a bigger salary, work for top companies


    Why you need observability more than tests

    Here's a short and sweet story about a Friday deploy. I love Friday deploys.

    Here's how it went:

    1. We deployed an update
    2. 2 minutes later, we saw SQL error messages in our "something's wrong" Slack channel
    3. It was a distributed transaction constraint violation
    4. We couldn't roll back because software only moves forward
    5. 5 minutes later, we shipped a PR reverting the update
    6. The errors stopped
    7. An hour later we had the full fix ready to go

    We didn't ship that one, though, because a 4:53pm Friday deploy feels too aggressive even to me, especially when the systems are working and the problem can wait.

    Why tests didn't catch this

    It was a distributed systems problem. The code worked locally and in tests: you do operation A, then B, and everything is fine.

    But in production, B sometimes happens before A, and the database goes "lol mate hold on what is this object you're referencing??"
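    To make the ordering problem concrete, here is a minimal single-process analogy, not the actual bug. SQLite stands in for the production database, and the orders and payments tables are hypothetical. Operation A creates a parent row, operation B creates a row that references it, and running B first trips a foreign key constraint.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE payments ("
        " id INTEGER PRIMARY KEY,"
        " order_id INTEGER NOT NULL REFERENCES orders(id))"
    )
    return conn


def operation_a(conn: sqlite3.Connection, order_id: int) -> None:
    # A: create the parent row (hypothetical "orders" table).
    conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))


def operation_b(conn: sqlite3.Connection, order_id: int) -> None:
    # B: create a row that references the parent (hypothetical "payments" table).
    conn.execute("INSERT INTO payments (order_id) VALUES (?)", (order_id,))


def run(steps) -> str:
    """Run the operations in the given order against a fresh database."""
    conn = make_db()
    try:
        for step in steps:
            step(conn, 1)
        conn.commit()
        return "ok"
    except sqlite3.IntegrityError as err:
        return f"IntegrityError: {err}"
    finally:
        conn.close()


print(run([operation_a, operation_b]))  # ok
print(run([operation_b, operation_a]))  # IntegrityError: FOREIGN KEY constraint failed
```

    In a test, the steps always run in the A-then-B order, so the second case never shows up. In production, the reordering comes from timing across services, which is exactly the part a local test has trouble reproducing.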

    You could write a test for this, but you might end up with one of those flaky tests that everybody hates. You know the kind: it fails every 98th time, nobody knows why, and you all just ignore it. "Oh, that test? Yeah, that one sucks. Hit rerun and it'll be fine."

    In production that 98th time happens to a user every day 😉

    And even if you did write the test, you'd never know whether it passes because your code behaves more deterministically in a test environment or because you accurately captured all the nuance of a live production environment.

    How observability did catch it, fast

    It's easy. We send all error logs to a central location where robots watch them. When an error mentions SQL, we send it to Slack as a warning. If there are many, we trigger a proper alert that wakes people up.
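    The routing rule above can be sketched in a few lines, assuming the robots see each error as a message string with a timestamp. SqlErrorRouter, the substring match on "sql", and the threshold and window values are all hypothetical; in practice this logic usually lives in the log platform's alerting rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class SqlErrorRouter:
    """Hypothetical routing rule for centrally collected error logs."""

    page_threshold: int = 10       # assumed: SQL errors per window before paging
    window_seconds: float = 300.0  # assumed: 5-minute sliding window
    _recent: list[float] = field(default_factory=list, init=False, repr=False)

    def route(self, message: str, ts: float) -> str:
        """Return "ignore", "slack_warning", or "page" for one error message."""
        if "sql" not in message.lower():
            return "ignore"
        # Drop SQL errors that fell out of the window, then count this one.
        self._recent = [t for t in self._recent if ts - t < self.window_seconds]
        self._recent.append(ts)
        if len(self._recent) >= self.page_threshold:
            return "page"
        return "slack_warning"


router = SqlErrorRouter(page_threshold=3)
print(router.route("ValueError: bad input", ts=0))          # ignore
print(router.route("SQL error: constraint violated", ts=1))  # slack_warning
print(router.route("SQL error: constraint violated", ts=2))  # slack_warning
print(router.route("SQL error: constraint violated", ts=3))  # page
```

    A single SQL error only produces a warning; it takes a burst inside the window to wake someone up.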

    We use OpenTelemetry (OTEL), integrated into our Python logger. Anyone can hook into this infrastructure with current_app.logger.debug/info/warning/error. Default error handling is already instrumented, so you don't need to think about it.
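    Here is a minimal standard-library sketch of that shape, not the actual OTEL setup. logging.getLogger("app") stands in for Flask's current_app.logger (which is a standard logging.Logger), the hypothetical CentralSinkHandler appends to a list where a real OTEL exporter would ship records to the central store, and the hypothetical instrumented decorator plays the role of default error handling.

```python
import functools
import logging


class CentralSinkHandler(logging.Handler):
    """Hypothetical stand-in for an OTEL log exporter: collects structured records."""

    def __init__(self, sink: list):
        super().__init__()
        self.sink = sink

    def emit(self, record: logging.LogRecord) -> None:
        exc = record.exc_info
        self.sink.append({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "exc_type": exc[0].__name__ if exc and exc[0] else None,
        })


def instrumented(logger: logging.Logger):
    """Hypothetical default error handling: log any unhandled exception, then re-raise."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                logger.exception("unhandled error in %s", fn.__name__)
                raise
        return wrapper
    return decorator


sink: list = []
logger = logging.getLogger("app")  # stands in for current_app.logger
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(CentralSinkHandler(sink))


@instrumented(logger)
def handle_request(x: int) -> float:
    return 10 / x


logger.info("request received")  # an explicit, one-line log
try:
    handle_request(0)  # logged automatically by the decorator
except ZeroDivisionError:
    pass
```

    The point of the design is that both paths end up in the same place: engineers add logs with one line, and anything they forget to handle still gets reported.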

    The same ability exists on the client side in JavaScript.

    The keys to making this useful are:

    • default instrumentation, so common errors are captured without extra work
    • low friction to add new logs, traces, or spans
    • easy search through all this data (we use Sumo Logic)
    • self-serve alerts, so anyone can observe their own code
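    To illustrate what low friction can look like for spans, here is a hypothetical one-line span helper built on the standard logging module. It is not OpenTelemetry's API (OTEL's tracer offers a similar context manager, start_as_current_span); span, the "spans" logger, and ListHandler are names invented for this sketch.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("spans")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False


@contextmanager
def span(name: str, **attributes):
    """Hypothetical one-line span: logs name, attributes, duration, and outcome.

    Attribute keys must not collide with built-in LogRecord fields such as
    "name" or "message", or logging raises KeyError.
    """
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "span %s finished with status %s",
            name,
            status,
            extra={"span": name, "status": status, "duration_ms": duration_ms, **attributes},
        )


class ListHandler(logging.Handler):
    """Collects records in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


capture = ListHandler()
logger.addHandler(capture)

with span("charge_card", order_id=42):
    time.sleep(0.01)

try:
    with span("flaky_step"):
        raise RuntimeError("boom")
except RuntimeError:
    pass
```

    Wrapping a block in one with statement is about as cheap as instrumentation gets, which is what makes people actually do it.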

    Crucially, you don't need to deploy code to make a new alert or dashboard. As long as the events are there, you can start observing anything that you think is causing problems.
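    One way to picture why no deploy is needed: an alert is just data evaluated against events that are already being collected. The sketch below assumes rules arrive as JSON config; AlertRule, evaluate, the rule names, and the sample events are hypothetical and not Sumo Logic's rule format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical self-serve alert rule, stored as data rather than code."""

    name: str
    level: str      # log level an event must have
    contains: str   # case-insensitive substring the message must contain
    threshold: int  # fire when at least this many events match


def evaluate(rule: AlertRule, events: list[dict]) -> bool:
    """Check a rule against events that are already being collected."""
    matches = [
        e for e in events
        if e["level"] == rule.level and rule.contains.lower() in e["message"].lower()
    ]
    return len(matches) >= rule.threshold


# Rules arrive as config (e.g., saved from a UI), so adding one needs no deploy.
RULES_JSON = """
[
  {"name": "fk-violations", "level": "ERROR", "contains": "foreign key", "threshold": 2},
  {"name": "timeouts", "level": "ERROR", "contains": "timeout", "threshold": 1}
]
"""

# Events that were already being logged before anyone wrote the rules.
events = [
    {"level": "ERROR", "message": "IntegrityError: FOREIGN KEY constraint failed"},
    {"level": "ERROR", "message": "IntegrityError: FOREIGN KEY constraint failed"},
    {"level": "INFO", "message": "request received"},
]

rules = [AlertRule(**r) for r in json.loads(RULES_JSON)]
fired = [rule.name for rule in rules if evaluate(rule, events)]
print(fired)  # ['fk-violations']
```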

    And then you can fix 'em :)

    Cheers,
    ~Swizec

    Published on November 16th, 2024 in Software Engineering, Scaling Fast Book, Observability


    Have a burning question that you think I can answer? Hit me up on Twitter and I'll do my best.

    Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.


    By the way, just in case no one has told you it yet today: I love and appreciate you for who you are ❤️

    Created by Swizec with ❤️