Finding modules in a big ball of mud

Pulling modules out of a big ball of mud is like grabbing a slice of cheesy pizza. It's kinda separate but also not really.

The trick is to realize this is normal. If you're here, it means the software is working! Fixing this stuff is the job.

Good architecture emerges

Big balls of mud are popular because they work. They're the best way to get code working before you know the right architecture to use. You've got to get your hands dirty, and the wrong architecture too early will cause more problems than it solves.

My team built a microservice once where we said "Okay this is important, we've got to get it right the first time".

We designed a layered architecture, started building, and 6 weeks later it was one of the worst codebases I'd ever worked on. The layer separations were in all the wrong places and even the tiniest bug fix involved dozens of files.

That microservice was a pain in our necks for 2 years before we gave up and threw it away. Everyone was afraid to touch that code. The wrong architecture is worse than no architecture.

The dependency graph

We've talked about how architecture is like a path in the woods and good abstractions follow desire paths laid by the team. But how do you find those paths in the quagmire?

You look at your code as a dependency graph or matrix.

Each box in the graph is code that does something. Each line is a connection between that code and other code.

At the level of a function, boxes are loops, conditionals and other groupings of code that works together. Lines are shared variables, conditionals, and so on.

At the level of a file, boxes are functions or classes, and lines are how they call each other. Inside a class, each method is a box and every shared property or method call is a line.

At the level of a module, boxes are a mix of files, objects, and functions, depending on how you organize your code. Lines are imports, function calls, and any shared variables.

At the level of a codebase, you're looking at imports and exports. At the level of a system, it's about who's calling what API. Just because you're using microservices doesn't mean you didn't build a big ball of mud.
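To make the codebase level concrete, here's a minimal sketch in Python that turns import statements into a dependency graph. Everything in it is an assumption for illustration: the `invoices`, `payments`, and `customers` modules are made up and inlined as strings so the example runs on its own, and `import_graph` is a hypothetical helper, not a real tool. Boxes are modules, lines are imports.

```python
import ast


def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the other known modules it imports."""
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            for target in targets:
                # Only keep lines between boxes we own; skip third-party imports.
                if target in sources and target != name:
                    graph[name].add(target)
    return graph


# Hypothetical three-file codebase, inlined as source strings.
SOURCES = {
    "invoices": "import customers\nfrom payments import charge\n",
    "payments": "import customers\n",
    "customers": "",
}

graph = import_graph(SOURCES)
for module, deps in sorted(graph.items()):
    print(module, "->", sorted(deps))
```

This only sees static imports. Function calls, shared variables, and API calls between services are lines too, and they would need their own way of being collected.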

Tooling to generate these visualizations exists, but I haven't had much luck in making it useful. Better to use your imagination and a piece of paper or a whiteboard. Invite others so you can talk about intent more than current implementation. More on that in a bit.

The dependency matrix

Similar to a dependency graph is a dependency matrix.

Your boxes become rows and columns and your lines turn into dots at their intersections. This lets you see the structure of your code and how messy it is. The more uniformly distributed your dots, the bigger the quagmire.
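Here's a rough sketch of that transformation in Python, assuming you already have a dependency graph as a dictionary. The module names and the `dependency_matrix` and `render` helpers are hypothetical. Each row is a box, each `x` is a line from the row's box to the column's box.

```python
# Hypothetical graph: each module maps to the modules it depends on.
GRAPH = {
    "billing.invoices": {"billing.payments", "billing.tax"},
    "billing.payments": {"billing.tax"},
    "billing.tax": set(),
    "users.accounts": {"users.sessions"},
    "users.sessions": {"users.accounts", "billing.invoices"},
}


def dependency_matrix(graph, order):
    """Build a square matrix where cell [row][col] means row depends on col."""
    return [[col in graph[row] for col in order] for row in order]


def render(matrix, order):
    """Draw the matrix as text, one dot or x per cell."""
    width = max(len(name) for name in order)
    return "\n".join(
        f"{name:<{width}} " + " ".join("x" if cell else "." for cell in row)
        for name, row in zip(order, matrix)
    )


ORDER = list(GRAPH)  # row and column order decides which clusters you can see
MATRIX = dependency_matrix(GRAPH, ORDER)
print(render(MATRIX, ORDER))
```

The ordering matters. Related boxes only show up as a visible cluster of marks when they sit next to each other in the rows and columns.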

[Figure: Dependency matrix of Mozilla before and after a refactor]

It's like seeing the entropy of your code structure. The more uniform the dots, the less information you gain from the organization of your code. This makes it harder to navigate, understand, and update.

A 2006 study of Mozilla, the open-source browser that became Firefox, found that purposefully refactoring code leads to cleaner dependency diagrams and more modularized code. That sounds obvious, but it's nice to see that with effort we can fight entropy.

The ball of mud doesn't need to win!

I have not found any tooling to make these diagrams outside of academia, although you could use the raw data output from dependency diagram tools to visualize code in this way.

Look for dependency neighborhoods

The goal of visualizing dependencies is to find neighborhoods: areas of the visualization with lots of internal connections and few external connections.

Those are your natural modules.

[Figure: Identifying a module in a dependency graph]

You may not know the name of your module yet, but every tightly connected neighborhood is a module waiting to be found. I like to give them names that match the domain.
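If you want a number to go with your gut feeling, one way to check a candidate neighborhood is to count the lines that stay inside it against the lines that cross its border. This is a minimal Python sketch under stated assumptions: the file names, the candidate groupings, and the `edge_counts` helper are all hypothetical.

```python
# Hypothetical dependency graph: each file maps to the files it depends on.
GRAPH = {
    "invoice_pdf": {"tax_rates", "payment_gateway"},
    "payment_gateway": {"tax_rates"},
    "tax_rates": set(),
    "login": {"session_store"},
    "session_store": {"login", "invoice_pdf"},
}


def edge_counts(graph, group):
    """Return (internal, external) line counts for a candidate neighborhood."""
    internal = external = 0
    for source, targets in graph.items():
        for target in targets:
            ends_inside = (source in group) + (target in group)
            if ends_inside == 2:
                internal += 1  # both ends inside the neighborhood
            elif ends_inside == 1:
                external += 1  # the line crosses the border
    return internal, external


CANDIDATES = {
    "billing?": {"invoice_pdf", "payment_gateway", "tax_rates"},
    "auth?": {"login", "session_store"},
    "random pair": {"tax_rates", "login"},
}

for label, group in CANDIDATES.items():
    internal, external = edge_counts(GRAPH, group)
    print(f"{label}: {internal} internal, {external} external")
```

Lots of internal lines and few external ones suggests a natural module. A grouping where every line crosses the border, like the random pair here, is not one.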

It's possible, even likely, that you'll find a module cuts through an existing box. That's where you DRY'd up some code that deals with separate concerns whose implementation happened to look similar.

Common examples are components and functions with boolean arguments that enable different branches of behavior. You're almost always better off separating those into multiple units.

That holds even if the implementation looks similar today or they do almost the same thing. Just because it's "editing an invoice" doesn't mean the end user, the billing department, and your finance team have the same needs.
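Here's what that split can look like, sketched in Python. The `Invoice` fields, the rule that posted invoices are locked for customers, and all the function names are assumptions made up for this example, not rules from any real billing system.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Invoice:
    customer_note: str
    amount_cents: int
    posted: bool = False


# Before: one DRY function where a boolean picks between two concerns.
def edit_invoice(invoice, *, note=None, amount_cents=None, is_finance=False):
    if is_finance:
        return replace(invoice, amount_cents=amount_cents)
    if invoice.posted:
        raise PermissionError("posted invoices are locked for customers")
    return replace(invoice, customer_note=note)


# After: one unit per audience, each free to evolve on its own.
def update_customer_note(invoice, note):
    """End-user edit: only allowed before the invoice is posted."""
    if invoice.posted:
        raise PermissionError("posted invoices are locked for customers")
    return replace(invoice, customer_note=note)


def correct_amount(invoice, amount_cents):
    """Finance edit: may adjust the amount even after posting."""
    return replace(invoice, amount_cents=amount_cents)


draft = Invoice(customer_note="", amount_cents=1000)
print(update_customer_note(draft, "Please add our PO number"))
print(correct_amount(draft, 1200))
```

The two "after" functions share almost no logic once the flag is gone, which is a hint that they were separate concerns all along.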

Let the domain guide you

Finding those boxes that should turn into two boxes is where whiteboards and paper shine. These can be hard (even impossible) to infer from code without understanding the business domain.

Good architecture is invisible. Bad architecture is everybody's problem. The difference, in my experience, comes down to how closely your architecture follows the domain of your business.

The business will always have its own ideas of how concepts connect, what's possible, and where ideas diverge. You can either fight this and make everyone's lives difficult, or you can build your software to match the domain. This is called domain-driven design (DDD).

Good architecture is invisible.
Bad architecture is everybody’s problem.
— Swizec Teller (@Swizec) November 20, 2024

To get this right, you need to talk to stakeholders, users, and other engineers. Listen carefully to unstated assumptions and ask questions about how they use the system to get work done.

How to detangle the domain

My favorite tool for detangling business domains is event storming. It's a workshop technique designed to build common understanding between engineers and stakeholders.

For small domains you can use it as a guide to ask the right questions. Your collaborators don't even need to realize it's happening. For large domains I've found scheduled time with everyone whiteboarding together works best.

You start with domain events. These describe what happened – user signed up, purchase made, etc. Every stakeholder will identify different events. This is important: you're using event storming to uncover things you didn't know about.

You then match those events with commands. These are actions that cause an event – sign up, purchase, etc. Again you're looking to uncover unknown or unsupported ways people want to use your system. It's common to find commands that currently take multiple steps as people work around the limitations of your code.

Then you add the actors. Who is taking these actions? This helps you uncover the different types of users your code will need to support. Actors may also be internal or external systems that react to or trigger events, such as a service that sends a welcome email.

Once you have the events, commands, actors, and systems identified, you can draw lines between them to identify business processes.

With all that laid out, it becomes pretty obvious where to draw your module boxes. You'll see parts of the system that are tightly connected, belong to the same actors, etc. A common approach is to build a core business logic module with separate interfaces for different actors.
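The output of a session like this can be written down as plain data, which makes the groupings easier to see. This is only a sketch in Python: the sign up, purchase, and welcome email entries come from the examples above, while the refund and invoice entries, the `Step` type, and the `commands_by_actor` helper are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    actor: str    # who takes the action (a person or a system)
    command: str  # the action that causes the event
    event: str    # what happened, in past tense


# Sticky notes from an imagined event storming session.
STORM = [
    Step("visitor", "sign up", "user signed up"),
    Step("email system", "send welcome email", "welcome email sent"),
    Step("customer", "purchase", "purchase made"),
    Step("customer", "edit invoice", "invoice edited"),           # hypothetical
    Step("billing department", "issue refund", "refund issued"),  # hypothetical
]


def commands_by_actor(steps):
    """Group commands by the actor who issues them."""
    grouped = defaultdict(list)
    for step in steps:
        grouped[step.actor].append(step.command)
    return dict(grouped)


for actor, commands in commands_by_actor(STORM).items():
    print(f"{actor}: {', '.join(commands)}")
```

Each actor's list of commands is a candidate interface. The logic those interfaces share is a candidate for the core module.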

If you get this right, nobody will ever notice. The code is going to feel obvious and simple and users' evolving needs will be easy to adapt to :)

Cheers,
~Swizec

Published on November 21st, 2024 in Scaling Fast Book, Software Engineering, Architectural Complexity
