Lately I've been thinking a lot about architectural complexity and how better tooling might help.
What if we had self-organizing codebases?
You write functions/components, your editor finds the best file structure based on dependencies.
I think we have the tech, someone just needs to cobble it together.
— Swizec Teller (@Swizec) August 12, 2023
Quick refresher
- architectural complexity describes how interconnected your files are
- it lowers productivity by 50%
- it increases bugs 3x
- it kinda sorta leads to a 10x increase in employee turnover
The easiest way to recognize you're dealing with high architectural complexity is when you've got a bunch of ../../ imports. That means your codebase isn't structured to group related files.
Otherwise known as a Big Ball of Mud. Everything glommed together with no sense of structure.
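If you want a number instead of a gut feeling, here's a rough sketch that counts those climbing imports in a file's source. It uses a simple regex rather than a real parser, so treat it as a smell detector. The function name `deep_relative_imports` and the sample source are made up for illustration.

```python
import re

# Matches the module specifier in `import ... from "x"`, `import "x"`,
# and `require("x")`. A rough regex, not a real JavaScript parser.
IMPORT_RE = re.compile(
    r"""(?:\bfrom\s+|\bimport\s+|\brequire\(\s*)["']([^"']+)["']"""
)


def deep_relative_imports(source: str, min_depth: int = 2) -> list[str]:
    """Return import specifiers that climb at least `min_depth` directories."""
    hits = []
    for spec in IMPORT_RE.findall(source):
        depth = 0
        rest = spec
        # Count how many leading "../" segments the specifier has.
        while rest.startswith("../"):
            depth += 1
            rest = rest[3:]
        if depth >= min_depth:
            hits.append(spec)
    return hits


# Hypothetical file contents, for illustration only.
example = '''
import React from "react";
import { Button } from "../../components/Button";
import { theme } from "./theme";
const utils = require("../../../shared/utils");
'''
print(deep_relative_imports(example))
# ['../../components/Button', '../../../shared/utils']
```

Run it over every file and the ones with the most hits are probably sitting in the wrong folder.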
Modularizing your code is hard
Teasing apart a big ball of mud is hard.
Splitting your code is the easy part. The hard part is figuring out what you even want to split. What belongs together? What domains are hiding in here? Who are all the actors involved? Who are all the teams that care?
We deal with this a lot at work.
Leadership is reverse conway-ing the hell out of this problem, but at ground level we still gotta figure out where the boundaries lie. Not some fuzzy notion of "Oh yeah there's 3 domains in there". No, what exactly are the domains? Which file belongs where?
You can't implement something you don't understand.
Find natural modules in your code
This is where tooling comes in. Maybe. I think we have the individual pieces, someone "just" needs to put them together in a working package.
Here's the idea.
You can use tools like Madge to visualize file dependencies in your code. This works on imports/exports which is good enough for now. You'd need to visualize function calls to get the full picture.
Take our design system library as an example.
Visualize your dependencies
Any community of files with tight internal connections and loose external coupling is a module candidate. That's your code saying these files belong together. Naturally.
No need to think deep about your domain modeling. Look at what naturally works together.
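To make "tight internal connections and loose external coupling" concrete, here's a minimal sketch that scores a candidate group of files. It counts import edges that stay inside the group versus edges that cross its boundary. The file names and the `coupling_score` helper are hypothetical, not taken from our library.

```python
def coupling_score(edges, group):
    """Count import edges inside `group` vs. edges crossing its boundary."""
    group = set(group)
    internal = sum(1 for a, b in edges if a in group and b in group)
    # An edge crosses the boundary when exactly one endpoint is in the group.
    external = sum(1 for a, b in edges if (a in group) != (b in group))
    return internal, external


# Hypothetical (importer, imported) pairs, for illustration only.
edges = [
    ("Form.tsx", "Input.tsx"),
    ("Form.tsx", "Select.tsx"),
    ("Input.tsx", "Label.tsx"),
    ("Select.tsx", "Label.tsx"),
    ("index.tsx", "Form.tsx"),
    ("index.tsx", "Button.tsx"),
]
form_files = ["Form.tsx", "Input.tsx", "Select.tsx", "Label.tsx"]
internal, external = coupling_score(edges, form_files)
print(internal, external)  # 4 1
```

Four edges inside, one crossing out. That's the shape of a module candidate.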
This becomes difficult to see in a bigger codebase. You get a mess of squiggles and squares with few obvious patterns.
Get AI to analyze the graph
This sort of human intuition is difficult to toolify. People aren't gonna look at ~~pretty~~ confusing pictures of their codebase as part of a regular workflow.
So I tried seeing what ChatGPT can do. As an experiment. To see if it's even possible.
I used node-dependency-tree, the dependency graph builder Madge uses, to get a JSON dump of dependencies in the codebase, pasted it into ChatGPT, and asked for an analysis.
You can see the full conversation here. Sharing highlights below.
The input data looks like a blob. Human readable with great effort.
{"/Users/Swizec/Documents/tia-ui/src/index.tsx":{"/Users/Swizec/Documents/tia-ui/src/components/Button.tsx":{"/Users/Swizec/Documents/tia-ui/src/index.tsx":{}},"/Users/Swizec/Documents/tia-ui/src/components/Heading.tsx":{"/Users/Swizec/Documents/tia-ui/src/index.tsx":{}},"/Users/Swizec/Documents/tia-ui/src/components/Logo.tsx":
...
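The blob gets easier to work with once you flatten it. A minimal sketch, assuming the dump has the nested `{file: {dependency: {...}}}` shape shown above, turns it into a set of (file, dependency) pairs. The paths below are shortened stand-ins, and `tree_to_edges` is a name I made up for this example.

```python
import json


def tree_to_edges(tree, edges=None):
    """Flatten the nested dependency dump into (file, dependency) pairs."""
    if edges is None:
        edges = set()
    for parent, children in tree.items():
        for child in children:
            edges.add((parent, child))
        # Walk down into each dependency's own dependencies.
        tree_to_edges(children, edges)
    return edges


# Shortened, hypothetical paths in the same shape as the real dump.
dump = """
{"src/index.tsx": {
    "src/components/Button.tsx": {"src/index.tsx": {}},
    "src/components/Heading.tsx": {"src/index.tsx": {}}
}}
"""
tree = json.loads(dump)
for importer, imported in sorted(tree_to_edges(tree)):
    print(importer, "->", imported)
```

Using a set means a file that shows up under several parents only contributes each edge once.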
But the ChatGPT prompt is simple. I wanted to see what this LLM can figure out on its own.
he following is a dependency graph of files in a codebase in JSON format. Analyze it like a graph and identify communities of tightly connected files with loose connections to other files.
I even typo'd the first word. Wonderful 😂
ChatGPT correctly interpreted the JSON structure:
The dependencies between the different files seem to be well-defined and clearly denote how different components rely on each other. However, this data is represented in a nested dictionary format, and for effective analysis, we will first need to represent it as a graph. ~ ChatGPT
Then provided a summary of what's going on:
There is a core file /Users/Swizec/Documents/tia-ui/src/index.tsx which has connections to a wide range of components, acting as the root node. Most of the components are directly dependent on the index file, while a few have more intricate dependencies.
ChatGPT's analysis of our unstructured-ish library
The analysis ChatGPT came up with purely based on file imports feels spot on. This is how I'd organize the codebase, if I had time to go back and move things around.
- Community 1: General Components. These files directly depend on the index file and seem to be basic components for the user interface
- Community 2: Form Components. These files are related to form handling and interact with each other as well as the index file
- Community 3: Modal Components. These files seem to handle modals and utility functions
- Community 4: Blocks Components. This community represents some decorative or structural components
- Community 5: Miscellaneous. There are a few files that don't seem to be directly connected to others
Not sure about that util in Community 3. A more granular dependency graph that understands function calls would help declutter grab-bag files like this.
Keep in mind ChatGPT saw none of the code and zero meta information. Filenames and import structure only.
And it correctly suggested that purpose-built tools exist for this analysis.
To perform more sophisticated analysis, graph theory tools such as network clustering algorithms could be applied to this structure. A visualization tool would also be helpful in depicting these relationships visually.
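For a sense of what a traditional clustering algorithm looks like here, below is a naive sketch of greedy modularity merging: start with every file as its own community, then keep merging the pair whose merge raises modularity the most. This is one option among many, not something ChatGPT or I ran, and it's written for readability rather than speed. The file names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def greedy_modularity(edges):
    """Merge communities greedily while a merge still increases modularity."""
    # Treat imports as undirected links; drop duplicates and self-loops.
    links = {frozenset(e) for e in edges if e[0] != e[1]}
    m = len(links)
    degree = defaultdict(int)
    for link in links:
        for node in link:
            degree[node] += 1
    communities = [{node} for node in sorted(degree)]

    def gain(c1, c2):
        # Modularity change from merging c1 and c2:
        # (links between them) / m - (degree sum 1 * degree sum 2) / (2 m^2)
        between = sum(
            1 for link in links
            if len(link & c1) == 1 and len(link & c2) == 1
        )
        d1 = sum(degree[n] for n in c1)
        d2 = sum(degree[n] for n in c2)
        return between / m - d1 * d2 / (2 * m * m)

    while len(communities) > 1:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(communities)), 2):
            g = gain(communities[i], communities[j])
            if g > best_gain:
                best_gain, best_pair = g, (i, j)
        if best_pair is None:
            break  # no merge improves modularity any further
        i, j = best_pair
        communities[i] |= communities[j]
        del communities[j]
    return sorted(sorted(c) for c in communities)


# Hypothetical import edges: two tight clusters joined by a single link.
edges = [
    ("Form.tsx", "Input.tsx"), ("Input.tsx", "Label.tsx"), ("Form.tsx", "Label.tsx"),
    ("Modal.tsx", "Overlay.tsx"), ("Overlay.tsx", "Portal.tsx"), ("Modal.tsx", "Portal.tsx"),
    ("Label.tsx", "Modal.tsx"),
]
print(greedy_modularity(edges))
# [['Form.tsx', 'Input.tsx', 'Label.tsx'], ['Modal.tsx', 'Overlay.tsx', 'Portal.tsx']]
```

Unlike an LLM, this gives the same answer every time for the same graph, which makes it a useful baseline to compare against.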
Next steps
I need to play around some more. Analyze results, compare to traditional graph analysis algorithms. Maybe this was a fluke? Maybe it's no better than a stable algo? Who knows!
If that works, the next step would be to get it working on a larger codebase. I struggled with context size.
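One thing that might help with context size, as an untested idea: the nested dump repeats the full absolute path for every file. A sketch that strips the shared directory prefix and prints one short line per edge could shrink the input a lot. `compact_edges` is a hypothetical helper, and the edge list below reuses the three paths visible in the dump above.

```python
import posixpath


def compact_edges(edges):
    """Strip the shared directory prefix and emit one 'a -> b' line per edge."""
    paths = sorted({p for edge in edges for p in edge})
    # Longest directory prefix shared by every path in the graph.
    root = posixpath.commonpath(paths)
    short = {p: posixpath.relpath(p, root) for p in paths}
    return "\n".join(f"{short[a]} -> {short[b]}" for a, b in sorted(edges))


base = "/Users/Swizec/Documents/tia-ui/src"
edges = [
    (f"{base}/index.tsx", f"{base}/components/Button.tsx"),
    (f"{base}/index.tsx", f"{base}/components/Heading.tsx"),
    (f"{base}/components/Button.tsx", f"{base}/index.tsx"),
]
print(compact_edges(edges))
```

Whether ChatGPT analyzes an edge list as well as the nested JSON is an open question I haven't checked.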
Then wrap it up in a VSCode extension or something. Make this 1-click useful as a tool. See if it can re-org a codebase maybe 🤔
What do you think? Yay or nay?
Cheers, ~Swizec