👋 here's an excerpt from the Manning book I'm writing that I think you'll like.
In his famous Things You Should Never Do essay, Joel Spolsky says never to attempt a rewrite. It cannot succeed and you're throwing away lessons learned. Lessons you'll need to re-learn.
But research and experience show that you can't fix the wrong abstraction – you have to rewrite the code to fit the new reality.
I think we're talking about different definitions of "rewrite". Joel is talking about the types of rewrites engineers first think of when they hear "rewrite":
- Stop the world and rewrite
- Build a new system next to the old
These sound clean and tidy. The old and the new system stay neatly separated while you work. But that almost never works.
The culprit is opportunity cost. These sorts of rewrites tend to run away in scope.
When Christopher's team started work on rewriting the systems of the world's largest furniture manufacturer, they understood the challenge: Big system, lots of code, decades old.
Just knowing the full scope of what you're building can be difficult in a system like that. The team wanted to mitigate risk by delivering incremental improvements and adopting new code piece by piece.
But the business side was not ready. They couldn't work in a half-new, half-old way. The whole migration would have to come down to pressing the big red button when everything was ready.
They never pressed the button.
After 2 years, the business decided that switching was too risky and scrapped the project. Now Christopher's team had to go back and retrofit 2 years' worth of improvements to the old system. Because it lay there neglected while they were busy building the new world.
This is a painful story, but not uncommon. A lot can go wrong when you stop the world.
I was once asked to fix a company's billing system.
5 years of legacy business models started to stack up and bad code was slowing us down. Business had a new experiment to try and we ... couldn't. There was no way to make it work.
Nobody understood how the system worked – too many cooks over too many years. We knew roughly where the code was, which tables held the data, and that was it.
My charge was to:
- make the code support our new business model,
- kill old business models,
- support some not too old models,
- allow everyone to migrate to the new model during a grace period where old models keep working
You know, the "make this impossible but also possible" ask that businesses love to make. I estimated 2 weeks of work and got to cleaning.
Our 2-week estimate was based on assumptions we hoped were true. A proper estimate would mean digging into the code so much you may as well do the work.
Naturally, the system was even worse than we feared. Spaghetti code barely begins to describe it. Control flow bounced from function to function, module to module, with zero rhyme or reason. A masterpiece produced by years of "Oh I'll just add this quick special case right here".
Projects like this are plagued by known and unknown unknowns. There is no way to know what you'll find without doing an extensive roadmapping project ahead of time. But that may take longer than the business is willing to spend.
"Can I spend a week estimating how long fixing this code will take?" is a tough ask. It takes an experienced engineering leader to say yes.
Making a guess and adjusting as you learn more is usually the best you can do.
6 weeks into my 2-week estimate the billing system was a mess. The original code limped along, the window of opportunity to try our new business model was all but gone, and my long-lived branch was full of bugs and half-written code.
When stakeholders asked what was going on it was like that scene from Malcolm in the Middle – Lois finds Hal in the garage under the car covered in grease and says "Hal, can you fix the lightbulb?" Hal rolls out and shouts, annoyed, "What does it look like I'm doing!?"
We were stuck past the point of no return. Couldn't make changes to the old code because they'd be gone soon, couldn't use the new code because it wasn't ready.
Eventually we negotiated some complexity, supported fewer old business models, and completed the rewrite. Then spent 2 months fixing bugs and re-adding old lessons.
Nobody likes to feel stuck during a hard-to-estimate rewrite. You try a different approach next time – split the team. One team works on the fresh rewrite, the other maintains existing code.
- bugs keep getting fixed
- product keeps adding features and running experiments
- the rewrite team can focus
But the exact problem you're avoiding is now the biggest risk for your rewrite: The target is running away!
The only way you'll ever catch up is if you code even faster. A little bit of slope beats a lot of y-intercept, yes, but how much slope does the maintenance team have?
If the old code has 104 weeks' worth of features, and the maintenance team works on fixes and additions for 1 day per week, the target grows by a fifth of a week every week ... you'll need 130 weeks to catch up. That's two and a half years 🥲
To catch up faster, you'll have to do more than a week's worth of work per week. You can save a few weeks by avoiding old mistakes. And you totally won't make any new mistakes, right?
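Here's a rough sketch of that catch-up arithmetic in Python. The function name, the parameters, and the 5-day work week are my assumptions for illustration; real teams don't move at a constant speed, so treat the result as a back-of-the-envelope number.

```python
def weeks_to_catch_up(
    backlog_weeks: float,
    maintenance_days_per_week: float,
    rewrite_weeks_per_week: float = 1.0,
    workdays_per_week: int = 5,  # assumption: a 5-day work week
) -> float:
    """Calendar weeks until the rewrite matches the old system's features."""
    # The old system keeps growing by this many weeks of work per calendar week.
    target_growth = maintenance_days_per_week / workdays_per_week
    # How fast the gap between old and new actually shrinks.
    closing_speed = rewrite_weeks_per_week - target_growth
    if closing_speed <= 0:
        # The target runs away at least as fast as you chase it.
        return float("inf")
    return backlog_weeks / closing_speed


# 104 weeks of features, maintenance adds 1 day of work per week
print(round(weeks_to_catch_up(104, 1)))  # 130

# Same backlog, but the rewrite team does 1.2 weeks of work per week
print(round(weeks_to_catch_up(104, 1, rewrite_weeks_per_week=1.2)))  # 104
```

Notice what happens when the maintenance team is as busy as the rewrite team: the closing speed drops to zero and you never catch up.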
The same old challenges remain:
- known and unknown unknowns
- nobody understands the old code
- use-cases you forgot existed
Like when we set out to rewrite that login page in React and discovered that there are two login flows – email and SMS. The estimate ballooned every time we looked at the code. 😅
The biggest challenge with chasing a moving target is that the old code continues to kick the can. As other teams find new use-cases and uncover bugs, you have to build those twice – once in the old code, once in the new code.
Meanwhile your new code is not kicking the can because it sits there unused until it's ready. Imagine all the bugs and missed use-cases you'll find when it ships ...
Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.