Swizec Teller - a geek with a hat | swizec.com

    A short lesson in debugging complex systems

    Hello friend 👋

    Today I wanted to share a quick lesson in debugging complex systems. It's from my upcoming course on building modern React apps with GraphQL, Serverless, and such. Currently a live video series :)

    On Tuesday we hit a snag that derailed the whole session. Unfortunate, yes, but I think it produced a great teachable moment: how do you debug complex systems and figure out what's wrong?

    Enjoy ❤️

    Near the beginning a wild error appeared: 403 Forbidden.

    GraphQL failed. Nothing went through.

    It was a great lesson in systematically debugging complex systems. And in the end we did get our allWidget(userId: "some-long-id") to work.

    As Nico later said:

    Even though we didn't make a lot of measurable progress this evening, I still feel like I learned a lot about the troubleshooting side of things 🙂 Notably, creating the basic apollo server was 👌. You explained how to narrow down where the issue could be via simplification, and that thought process might not exist for audience members that don't have an Ops or DevOps background.

    Thanks Nico ❤️

    Here's how it all went down 👇

    Add userId to the allWidget query

    We wanted to display per-user widget lists on the homepage.


    The homepage shows email and video widgets, which is a fine list, but check this out: they're from two different users.


    The best way to fix that is to change our allWidget query so it accepts a userId and returns only that user's widgets. It's important this filtering happens on the backend so data the user shouldn't see never reaches their browser.

    The query now accepts a userId parameter.
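
    As a rough sketch, the schema change might look something like this. The Widget fields are made up for illustration and the real schema likely has more; the only part taken from this session is allWidget accepting a userId.

```javascript
// GraphQL schema (SDL) kept as a plain string.
// The Widget fields below are hypothetical placeholders.
const typeDefs = `
  type Widget {
    widgetId: String!
    userId: String!
  }

  type Query {
    # userId is a required argument at this point in the session
    allWidget(userId: String!): [Widget]
  }
`;
```

    A string like this can be handed to Apollo Server as its typeDefs.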

    Passing a few extra arguments into the scanItems query in our resolver makes the magic work.

    DynamoDB scan queries can be cumbersome. Getting a list of everything is easy: no arguments. A filtered list takes 3 arguments. That's a lot 🤨

    • FilterExpression says what we're filtering by. Equality with userId in our case
    • ExpressionAttributeNames maps name placeholders in the filter to table attributes
    • ExpressionAttributeValues maps value placeholders in the filter to real values
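
    Here's a minimal sketch of how those three arguments fit together. It assumes a DocumentClient-style scan (plain JavaScript values), a table attribute called userId, and a made-up table name; the actual scanItems helper from the session may shape things differently.

```javascript
// Hypothetical table name, for illustration only.
const TABLE_NAME = "widgets";

// Build Scan parameters that keep only one user's widgets.
function widgetsForUserParams(userId) {
  return {
    TableName: TABLE_NAME,
    // "#userId" and ":userId" are placeholders resolved by the two maps below
    FilterExpression: "#userId = :userId",
    // name placeholder -> real attribute name in the table
    ExpressionAttributeNames: { "#userId": "userId" },
    // value placeholder -> the value we compare against
    ExpressionAttributeValues: { ":userId": userId },
  };
}

console.log(widgetsForUserParams("some-long-id"));
```

    Keeping the placeholders separate from the values means the userId never gets spliced into the expression string by hand.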

    Try the query and 403

    We decided to test our new query in Apollo Playground first. This would let us focus on the query itself and not on the details of Gatsby or React being just right.


    And that's when all hell broke loose.

    Server unreachable. Console full of Forbidden: 403 errors. Gatsby local server crashed out on us. Everything died.

    Narrow down the problem

    Something was clearly very wrong. But we didn't change anything that could cause this, did we? 🤔

    Zeroth step: check the logs. There were no useful errors there, which is a sign you're about to go on a wild ride.

    First step: make sure we didn't change something.

    Check out the old version of our server with git. Deploy the lambdas. Try our playground again. If this works, we know some code change on our end broke the server.

    Still didn't work.

    Okay, second step: try to remember what was wrong last time something like this happened.

    We had a problem a few sessions ago where our lambda wasn't configured to accept POST requests. Looking at serverless.yml confirmed that POST requests were in fact configured.

    Third step: is Serverless messing up somehow and misconfiguring AWS API Gateway?

    We looked into the AWS dashboard, spelunked through the UI, and found the configuration for our API Gateway. Everything looked right. GET requests were connected to our lambda. So were POST requests.


    Okay what else can we try?

    Fourth step: try deploying to a new stage. Maybe we clicked something in the UI that broke things in a way Serverless can't correct.

    So we ran sls deploy -s dev2 to create a whole new AWS setup for our service. Everything fresh and from scratch, to make sure the environment wasn't the problem.

    Nope. Still broken.


    Okay, fifth step: reduce our server to the absolute smallest possible setup. Make sure nothing about our code could be causing this.

    If that works, we know the problem is somewhere in how we configured GraphQL.

    Sixth step, getting desperate: what if we create a server that just returns a hello world?
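
    A hello world server can be as small as one Lambda handler. This is a sketch, not the exact code from the session: it assumes the usual API Gateway proxy response shape and leaves out Apollo entirely.

```javascript
// Smallest possible Lambda handler behind API Gateway.
// No GraphQL, no database: if this still answers with a 403,
// the problem is not in our application code.
async function handler(event) {
  return {
    statusCode: 200,
    headers: { "Content-Type": "text/plain" },
    body: "hello world",
  };
}

// In a deployed function this would be exported under the name
// that serverless.yml points at, e.g. as `handler`.
```

    The fewer moving parts in the handler, the more a failure tells you about everything around it.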

    That was a dead end too. Now what ...

    Seventh step: start googling. Has anyone else had this problem before?

    We found an old GitHub issue on Apollo's repository. Something about Apollo Playground using the wrong server URL ...

    The facepalm 🤦‍♀️

    That was it. Apollo was using the wrong server URL.

    Change the target URL to /dev/graphql and everything works. Everything.
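
    Why /dev/graphql? API Gateway serves each Serverless stage under its own path prefix, so a function mounted at /graphql on the dev stage lives at /dev/graphql. One way to make Playground pick that up, assuming Apollo Server 2's playground option, is to pass the endpoint explicitly. The graphqlEndpoint helper below is hypothetical.

```javascript
// Prefix a function's path with the API Gateway stage name.
function graphqlEndpoint(stage, path = "/graphql") {
  return `/${stage}${path}`;
}

// Options object in the shape Apollo Server 2 accepts for `playground`.
// It would be passed along as: new ApolloServer({ ..., playground })
const playground = { endpoint: graphqlEndpoint("dev") };

console.log(playground.endpoint); // "/dev/graphql"
```

    Typing the full URL into the Playground address bar by hand works too; the option just saves doing it every time.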

    No change we made had broken anything. And to make matters worse, Gatsby crashed because of a completely unrelated problem, which seemed to confirm our fears that the server was broken 🤦‍♀️

    Make allWidget work without userId

    Gatsby crashed not because our server was broken but because it queries allWidget, without a userId, to create pages for thumbsup/thumbsdown.

    We have to make the query work with and without userId.

    Make the userId optional and pass the filter arguments into scanItems only when it's available.
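
    A sketch of that idea, assuming scanItems takes a single params object and resolves to a list of items (its real signature isn't shown here). makeAllWidgetResolver and allWidgetScanParams are hypothetical names; the scan function is passed in so the logic can be tried without touching DynamoDB.

```javascript
// Return filter arguments only when a userId was given.
function allWidgetScanParams(userId) {
  if (!userId) {
    return {}; // no filter: scan everything, as Gatsby's page query expects
  }
  return {
    FilterExpression: "#userId = :userId",
    ExpressionAttributeNames: { "#userId": "userId" },
    ExpressionAttributeValues: { ":userId": userId },
  };
}

// Build the allWidget resolver around whatever scan function we have.
function makeAllWidgetResolver(scanItems) {
  return async (_parent, { userId } = {}) =>
    scanItems(allWidgetScanParams(userId));
}

// Usage with a stand-in scan function that just echoes its params.
const allWidget = makeAllWidgetResolver(async (params) => [params]);
allWidget(null, { userId: "some-long-id" }).then(console.log);
```

    In the schema, the matching change is dropping the required marker on the userId argument.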

    And just like that everything works again. That was fun.

    To recap

    We used what I've heard referred to as the Sherlock method 👉 make a hypothesis, find clues, follow every clue to its end. Repeat until solved.

    You can tackle any debugging problem this way.

    Cheers, ~Swizec

    Published on July 26th, 2019 in Technical
