There comes a time in every startup's growth when you think "Wow we do be stuffing a lot of side-effects in that endpoint". Usually placing an order.
You want an email to go out, things saved to the database, a few 3rd party systems to get notified, kick off any processing, the list keeps growing. At Tia we got up to ~20 promises called after you make an appointment. It was our most brittle endpoint. At Plasmidsaurus we're juuuust starting to go "Where do we put this? Oh yeah, right after you place an order" 😅
Stuffing makes you brittle
Stuffing a bunch of side-effects into an endpoint makes your code slow and brittle.
Each new side-effect has, let's say, a 0.5% chance of erroring out. Hopefully less. But lots can go wrong when calling a 3rd party or internal API. Most of it transient in nature.
Plus it's slow. Say a great API has a 30ms response time. We're seeing more like 150ms for our own things. Flask and Python can get pretty slow for IO-bound server workloads. But they're great for compute.
That means stuffing 10 side-effects into an endpoint gives you a 5% error rate with a 300ms lower bound response time. You can improve the time by parallelizing those calls, but the error rate stays. Assuming you need all effects always to run.
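Back-of-the-envelope, with the numbers above (0.5% failure chance and 30ms per call), the math looks like this:

const perCallErrorRate = 0.005 // 0.5% chance each side-effect errors out
const perCallLatencyMs = 30 // the "great API" response time
const sideEffects = 10

// chance that at least one of the calls fails
const endpointErrorRate = 1 - (1 - perCallErrorRate) ** sideEffects
console.log(endpointErrorRate.toFixed(3)) // ≈ 0.049, call it 5%

// sequential calls stack up
console.log(sideEffects * perCallLatencyMs) // 300ms lower bound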
Events to the rescue
You don't want 5% of your Place Order calls to fail, do you? That hits you right in the revenue. How many users retry after a failure? Depends what you're selling.
The typical solution is to move to an events-based system.
- An order was placed,
- an event flies into the aether,
- systems react and do their thing
Your first implementation of this can be simple:
async function placeOrder() {
  // stuff
  await Promise.allSettled([
    schedule(effect1),
    schedule(effect2),
    schedule(effect3),
    // ...
  ])
}
Loop through your side-effects and schedule a background task on the queue for each thing that needs to happen. Can be 1 queue or many. Each effect would have its own consumer.
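A minimal sketch of what that looks like with a subscribe-style queue API; queue.consume and the handler names are hypothetical stand-ins for whatever your queue system provides:

// every side-effect gets its own consumer for the same event
queue.consume("order_placed", sendConfirmationEmail)
queue.consume("order_placed", notifyFulfillmentPartner)
queue.consume("order_placed", startSampleProcessing)

If one of them fails, the others keep working and only the failed one retries.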
Queue processing
Putting work on the queue has 1 major benefit: Your tasks are stored.
The queue acts as persistent storage. Your queue machinery ensures tasks don't get lost. Don't build your own – use Celery, Kafka, or any of the popular queueing systems.
Stored tasks let you retry on error. API down? Responding with issues? That's okay. Let your task wait on the queue and try again later.
Eventually your task will succeed and all will be well.
Have a system in place to detect poison pills – tasks that never succeed because there's a bug. Add alerting to notify engineers if a task fails more than X times. Having it on the queue makes this easy to investigate.
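As a rough sketch (assuming your queue system tells you how many times a task has been attempted, and that you have some alertEngineers helper wired to Slack or PagerDuty; both are assumptions, not part of the article):

const MAX_ATTEMPTS = 5 // pick a threshold that fits your retry schedule

// hypothetical failure hook your queue system calls on each failed attempt
function onTaskFailed(task, error, attempts) {
  logger.log("Task failed", task, error)
  if (attempts >= MAX_ATTEMPTS) {
    // probably a poison pill: retrying won't fix a bug, a human needs to look
    alertEngineers(`${task.event} for order ${task.order_id} failed ${attempts} times`, error)
  }
}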
Whole system went down for a few hours?
That's okay. Your queues will restart processing when everything's back. Careful of the stampede! All your queues restarting at once is a common cause of a 2nd outage :)
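One way to soften the stampede (my sketch, not from the article): add a random delay, i.e. jitter, before each consumer starts pulling work again so everything doesn't hammer your APIs at the same instant. queue.startConsuming is a stand-in for whatever your queue system calls it.

// hypothetical startup code: spread consumer restarts over ~30 seconds
const jitterMs = Math.random() * 30_000
setTimeout(() => queue.startConsuming("order_placed", doTheTask), jitterMs)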
Thin tasks, smart task functions
Put as little code as possible in your scheduling. Keep your task packets small. Make your task function do all the work.
Anemic scheduler
An ideal scheduler looks like this:
function scheduleTask(task) {
  logger.log("Scheduling task", task)
  queue.push(task)
}
No logic, just add to the queue. A log that you attempted to do this will save you lots of stress later.
Small task
An ideal task looks like an event:
const task = {
  event: "order_placed",
  order_id: 123,
}
Put as little info as possible in your task. Mainly a pointer to some database object with all the details. This saves memory, makes your code easier to debug, and your queues easier to rebuild in case of catastrophic failure.
Smart task function
An ideal task function looks like this:
// called by queue system
function doTheTask(task) {
  logger.log("Checking task", task)
  // parameterized query, don't interpolate values into SQL
  const order = db.query(`select * from orders where id = $1`, [task.order_id])
  if (order.task_not_done_yet) {
    logger.log("Doing task", task)
    // do the work
    // THROW on error
    logger.log("Finished task", task)
  } else {
    logger.log("Skipping task", task)
  }
}
Pull data from the database, check that the work needs doing. Your tasks may execute more than once. Do the work and make sure you throw on error; that's how the queue system knows to retry.
Silly logs will save you stress in the future. Always log when attempting, doing, finishing, or skipping a task.
Backup scheduling
Keep track of tasks that completed successfully.
I should be able to run a query like select * from orders where not task_not_done_yet. Obviously the real query would be more complicated.
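For example (my sketch, not the article's schema): a per-effect completion timestamp on the orders row gives you that query and also tells you when the effect ran. The column name here is made up.

// at the end of doTheTask, once the work succeeded
db.query(`update orders set confirmation_email_sent_at = now() where id = $1`, [
  task.order_id,
])
// "task not done yet" then becomes: where confirmation_email_sent_at is null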
This helps you 3-fold:
- Guaranteeing exactly-once delivery is impossible. Most queue systems go for at-least-once delivery. This means you have to make sure tasks are idempotent (running 2x is okay)
- You can rebuild the queue in case it gets lost
- You can re-schedule tasks that failed to schedule
Things happen. Maybe there was a bug scheduling a task. Or you kicked the wrong server at the wrong time and all queues got wiped. Or you had a 3-day outage, fixed a bug, and need to re-drive all those queues.
With thin tasks and keeping track this is easy:
async function rebuildQueue() {
  const orders = db.query(`select id from orders where task_not_done_yet`)
  for (const order of orders) {
    // re-schedule the same thin task the endpoint would have scheduled
    await scheduleTask({ event: "order_placed", order_id: order.id })
  }
}
Run that every hour or so. You can make this a task on the queue! This is called the fan-out pattern – a task that schedules other tasks reliably.
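You could wire that up with something like this sketch, where queue.consume and scheduleEvery are stand-ins for whatever consumer registration and periodic trigger your queue system provides (Celery beat, cron, etc.):

// hypothetical consumer: the rebuild is just another task handler
queue.consume("rebuild_order_queue", rebuildQueue)

// hypothetical periodic trigger: Celery beat or a cron job plays this role in practice
scheduleEvery("1 hour", () => scheduleTask({ event: "rebuild_order_queue" }))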
Now you'll never miss a side-effect and they won't slow down your endpoints 😊
Cheers, ~Swizec