You can't stop the business, or why rewrites fail

👋 here's an excerpt from the Manning book I'm writing that I think you'll like.

In his famous Things You Should Never Do essay, Joel Spolsky says never to attempt a rewrite. It cannot succeed and you're throwing away lessons learned. Lessons you'll need to re-learn.

But research and experience show that you can't fix the wrong abstraction – you have to rewrite the code to fit the new reality.

I think we're talking about different definitions of "rewrite". Joel is talking about the types of rewrites engineers first think of when they hear "rewrite":

Stop the world and rewrite
Build a new system next to the old

These sound clean and tidy. The old and the new system stay neatly separated while you work. But that almost never works.

The culprit is opportunity cost. These sorts of rewrites tend to run away in scope.

You can't stop the world and rewrite

Stop the world rewrite – old software stops evolving, gets slowly worse, rewrite starts from scratch

When Christopher's team started work on rewriting the systems of the world's largest furniture manufacturer, they understood the challenge: Big system, lots of code, decades old.

Just knowing the full scope of what you're building can be difficult in a system like that. The team wanted to mitigate risk by delivering incremental improvements and adopting new code piece by piece.

But the business side was not ready. They couldn't work in a half-new, half-old way. The whole migration would have to come down to pressing the big red button when everything's ready.

They never pressed the button.

After 2 years, the business decided that switching was too risky and scrapped the project. Now Christopher's team had to go back and retrofit 2 years' worth of improvements to the old system. Because it lay there neglected while they were busy building the new world.

This is a painful story, but not uncommon. A lot can go wrong when you stop the world.

Stopping creates risk

I was once asked to fix a company's billing system.

5 years of legacy business models started to stack up and bad code was slowing us down. Business had a new experiment to try and we ... couldn't. There was no way to make it work.

Nobody understood how the system worked – too many cooks over too many years. We knew roughly where the code was, which tables held the data, and that was it.

My charge was to:

make the code support our new business model,
kill old business models,
support some not too old models,
allow everyone to migrate to the new model during a grace period where old models keep working

You know, the "make this impossible but also possible" ask that businesses love to make. I estimated 2 weeks of work and got to cleaning.

You can't estimate accurately

Our 2-week estimate was based on assumptions we hoped were true. A proper estimate would mean digging into the code so much you may as well do the work.

Naturally, the system was even worse than we feared. Spaghetti code barely begins to describe it. Control flow bounced from function to function, module to module, with zero rhyme or reason. A masterpiece produced by years of "Oh I'll just add this quick special case right here".

The control flow I was facing – lots of small "reusable" functions with poor domain modeling

Projects like this are plagued by known and unknown unknowns. There is no way to know what you'll find without doing an extensive roadmapping project ahead of time. But that may take longer than the business is willing to spend.

"Can I spend a week estimating how long fixing this code will take?" is a tough ask. It takes an experienced engineering leader to say yes.

Making a guess and adjusting as you learn more is usually the best you can do.

And now you're stuck

6 weeks into my 2-week estimate the billing system was a mess. The original code limped along, the window of opportunity to try our new business model was all but gone, and my long-lived branch was full of bugs and half-written code.

When stakeholders asked what was going on it was like that scene from Malcolm in the Middle – Lois finds Hal in the garage under the car covered in grease and says "Hal can you fix the lightbulb?" Hal rolls out and shouts, annoyed, "What does it look like I'm doing!?"

We were stuck past the point of no return. We couldn't make changes to the old code because it would be gone soon, and couldn't use the new code because it wasn't ready.

Eventually we negotiated some complexity, supported fewer old business models, and completed the rewrite. Then spent 2 months fixing bugs and re-adding old lessons.

You can't build a new system next to the old

New next to old rewrite – old software keeps evolving, rewrite tries to catch up

Nobody likes to feel stuck during a hard-to-estimate rewrite. You try a different approach next time – split the team. One team works on the fresh rewrite, the other maintains existing code.

Sounds great:

bugs keep getting fixed
product keeps adding features and running experiments
the rewrite team can focus

But the exact problem you're avoiding is now the biggest risk for your rewrite: The target is running away!

The only way you'll ever catch up is if you code even faster. A little bit of slope beats a lot of y-intercept, yes, but how much slope does the maintenance team have?

If the old code has 104 weeks' worth of features, and the maintenance team adds fixes and features at a rate of 1 day per 5-day week ... you'll need 130 weeks to catch up. That's two and a half years 🥲
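Here's a rough sketch of where the 130 weeks comes from. It assumes a 5-day work week (so 1 day per week is 1/5 of a week of work) and a rewrite team that delivers exactly one week of features per week. The function and variable names are made up for illustration.

```python
from fractions import Fraction


def weeks_to_catch_up(backlog_weeks, rewrite_rate, maintenance_rate):
    """Calendar weeks until the rewrite matches the old system.

    backlog_weeks: weeks of features already in the old code
    rewrite_rate: weeks of features the rewrite delivers per calendar week
    maintenance_rate: weeks of features the old code gains per calendar week
    """
    # The gap only shrinks by the difference between the two rates.
    closing_rate = rewrite_rate - maintenance_rate
    if closing_rate <= 0:
        raise ValueError("the target moves at least as fast as the rewrite")
    return backlog_weeks / closing_rate


# Assumption: a 5-day work week, so 1 day per week is 1/5 of a week.
maintenance = Fraction(1, 5)

# Rewrite team delivers one week of work per week.
print(weeks_to_catch_up(104, Fraction(1), maintenance))  # 130

# Hypothetical faster team: 1.25 weeks of work per week.
print(float(weeks_to_catch_up(104, Fraction(5, 4), maintenance)))
```

Under these assumptions, even a team that sustains 25% more output every week still needs roughly 99 weeks. And if the old code ever moves as fast as the rewrite, the function raises an error because you never catch up.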

To catch up faster, you'll have to do more than a week's worth of work per week. You can save a few weeks by avoiding old mistakes. And you totally won't make any new mistakes, right?

The same old challenges remain:

known and unknown unknowns
nobody understands the old code
use-cases you forgot existed

Like when we set out to rewrite that login page in React and discovered that there are two login flows – email and SMS. The estimate ballooned every time we looked at the code. 😅

The biggest challenge with chasing a moving target is that the old code continues to kick the can. As other teams find new use-cases and uncover bugs, you have to build those twice – once in the old code, once in the new code.

Meanwhile your new code is not kicking the can because it sits there unused until it's ready. Imagine all the bugs and missed use-cases you'll find when it ships ...

Cheers,
~Swizec

Published on July 11th, 2023 in Refactoring, Rewrites, Software Engineering

Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.
