Swizec Teller - a geek with a hat (swizec.com)

    Week 16: On the criteria to be used in decomposing systems into modules

    [This post is part of an ongoing challenge to understand 52 papers in 52 weeks. You can read previous entries or subscribe to be notified of new posts by email.]

    In December 1972, David L. Parnas published On The Criteria To Be Used In Decomposing Systems Into Modules and set the stage for the concept of information hiding in systems design.

    Information hiding is one of the main principles of modern programming, so this paper reads like one big d'oh. But remember, it was written 42 years ago.

    Then again, I run into code that forces me to intimately understand its internals on a daily basis. Pay attention!

    What is modularization

    Modularization gives us three main benefits:

    1. shorter dev time - teams can work in parallel
    2. flexibility - you can change parts of the system without affecting others
    3. comprehensibility - you can understand one piece at a time

    Mind you, we aren't talking about subprograms or objects or whatever. Modularization is something you have to do before any of that, when you're just deciding how to split a problem into responsibilities.

    For example, let's take a simple KWIC (Key Word In Context) index. You get an ordered set of lines, each of which is an ordered set of words, which are in turn ordered sets of characters. Any line can be circularly shifted by repeatedly taking the first word and placing it at the end.

    The KWIC system outputs a listing of all circular shifts in alphabetical order.
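
    To make the problem concrete, here is a minimal sketch of KWIC in Python. The function names and the case-insensitive sort are my assumptions, not something the paper prescribes.

```python
def circular_shifts(line: str) -> list[str]:
    """Return every circular shift of a line, starting with the line itself."""
    words = line.split()
    # Shift i moves the first i words to the end of the line.
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


def kwic(lines: list[str]) -> list[str]:
    """List all circular shifts of all lines in alphabetical order."""
    shifts = [shift for line in lines for shift in circular_shifts(line)]
    # Assumption: ignore case when alphabetizing.
    return sorted(shifts, key=str.lower)


for entry in kwic(["Clouds are white", "Pipes and filters"]):
    print(entry)
# and filters Pipes
# are white Clouds
# Clouds are white
# filters Pipes and
# Pipes and filters
# white Clouds are
```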

    These days, that's not very difficult. But in 1972 this was a problem that would take a good programmer one or two weeks to solve according to Parnas. Interesting.

    He proposes two different modularizations. The first:

    1. Input module - reads data lines, stores them in the core for further processing
    2. Circular Shift module - prepares an index of circular shifts
    3. Alphabetizing module - uses the previous results to alphabetize the circular shift index
    4. Output module - uses the alphabetized index and stored lines to create a nice output
    5. Master Control module - makes sure other modules are called in the right order

    This modularization makes sense. All modules are small and have well-defined interfaces, and according to Parnas this is the first design most programmers come up with.
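
    A rough sketch of what that flowchart-style design might look like in Python. The shared variables and their layout are hypothetical, picked only to show that every step has to know exactly how the data is stored.

```python
# Shared data that every step reads or writes directly (hypothetical layout).
lines: list[list[str]] = []               # each line stored as a list of words
shift_index: list[tuple[int, int]] = []   # (line number, index of first word)
sorted_index: list[tuple[int, int]] = []  # shift_index in alphabetical order


def read_input(text: str) -> None:
    """Input module: fills the shared `lines` structure."""
    lines.clear()
    lines.extend(raw.split() for raw in text.splitlines() if raw.strip())


def build_shifts() -> None:
    """Circular Shift module: one index entry per possible first word."""
    shift_index.clear()
    for line_no, words in enumerate(lines):
        shift_index.extend((line_no, start) for start in range(len(words)))


def alphabetize() -> None:
    """Alphabetizing module: must know both the line and the index format."""
    def shifted_words(entry: tuple[int, int]) -> list[str]:
        line_no, start = entry
        words = lines[line_no]
        return [w.lower() for w in words[start:] + words[:start]]

    sorted_index[:] = sorted(shift_index, key=shifted_words)


def write_output() -> list[str]:
    """Output module: also reaches straight into the shared structures."""
    result = []
    for line_no, start in sorted_index:
        words = lines[line_no]
        result.append(" ".join(words[start:] + words[:start]))
    return result


def master_control(text: str) -> list[str]:
    """Master Control module: calls the steps in the right order."""
    read_input(text)
    build_shifts()
    alphabetize()
    return write_output()
```

    Note how `alphabetize` and `write_output` both index into `lines` directly. Change how lines are stored and you get to rewrite all of them.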

    The second:

    1. Line Storage module - provides a bunch of functions to work with strings, essentially
    2. Input module - reads data, uses Line Storage to store it
    3. Circular Shifter module - has a function that builds the index, then offers an interface similar to module 1's, but for shifted lines
    4. Alphabetizer module - a function that alphabetizes and another that retrieves according to the index
    5. Output module - prints stuff
    6. Master Control module - as above, drives the whole process

    This modularization sounds a lot closer to what we call objects these days and generally smells like modern programming. Instead of modules doing stuff and saving data that other modules operate on, they are a collection of functions that act as an interface. Shiny.
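
    Here is one way the second modularization could be sketched in Python. The class and method names are my own invention and the paper's actual function list differs. The point is only that each module talks to the others through functions, never through shared data.

```python
class LineStorage:
    """Only this class knows how lines are kept in memory (here: word lists)."""

    def __init__(self) -> None:
        self._lines: list[list[str]] = []

    def add_line(self, text: str) -> None:
        self._lines.append(text.split())

    def line_count(self) -> int:
        return len(self._lines)

    def word_count(self, line_no: int) -> int:
        return len(self._lines[line_no])

    def word(self, line_no: int, word_no: int) -> str:
        return self._lines[line_no][word_no]


def read_input(text: str, storage: LineStorage) -> None:
    """Input module: parses raw text and hands each line to Line Storage."""
    for raw in text.splitlines():
        if raw.strip():
            storage.add_line(raw)


class CircularShifter:
    """Looks like Line Storage from the outside, but serves shifted lines."""

    def __init__(self, storage: LineStorage) -> None:
        self._storage = storage
        self._shifts = [
            (line_no, start)
            for line_no in range(storage.line_count())
            for start in range(storage.word_count(line_no))
        ]

    def shift_count(self) -> int:
        return len(self._shifts)

    def word_count(self, shift_no: int) -> int:
        line_no, _ = self._shifts[shift_no]
        return self._storage.word_count(line_no)

    def word(self, shift_no: int, word_no: int) -> str:
        line_no, start = self._shifts[shift_no]
        count = self._storage.word_count(line_no)
        return self._storage.word(line_no, (start + word_no) % count)


def shift_text(shifter: CircularShifter, shift_no: int) -> str:
    """Helper: rebuild a shifted line using only the shifter's interface."""
    count = shifter.word_count(shift_no)
    return " ".join(shifter.word(shift_no, w) for w in range(count))


class Alphabetizer:
    """Sorts shift numbers, then answers 'which shift comes i-th?'."""

    def __init__(self, shifter: CircularShifter) -> None:
        self._order = sorted(
            range(shifter.shift_count()),
            key=lambda s: shift_text(shifter, s).lower(),
        )

    def ith(self, i: int) -> int:
        return self._order[i]


def write_output(shifter: CircularShifter, alphabetizer: Alphabetizer) -> list[str]:
    """Output module: prints stuff (here: returns the lines to print)."""
    return [
        shift_text(shifter, alphabetizer.ith(i))
        for i in range(shifter.shift_count())
    ]


def master_control(text: str) -> list[str]:
    storage = LineStorage()
    read_input(text, storage)
    shifter = CircularShifter(storage)
    return write_output(shifter, Alphabetizer(shifter))
```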

    The criteria

    Both of those modularizations work. The system will do what it's told in both cases. Neither is much more complex than the other, and neither has hidden performance issues. Hell, they could both use the same algorithms!

    However, the second is much easier to work with. Let's see.

    Changeability. We might want to change a bunch of things at a later date. Everything from the input format to how lines are stored in memory. With the first modularization everybody needs to know how lines are stored, whereas the second hides that information from everything but Line Storage.

    This is the case with almost any change you can think of, from changing how alphabetization works to changing whether the circular index is calculated on the fly or stored. The first modularization necessitates poking everything every time. The second does not.
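
    As a small illustration of that changeability, here are two hypothetical storage classes with the same interface but completely different internals. The names and the packed layout are assumptions made up for this sketch. The client function works with both and never finds out which one it got.

```python
class WordListStorage:
    """Keeps every line as a list of word strings."""

    def __init__(self) -> None:
        self._lines: list[list[str]] = []

    def add_line(self, text: str) -> None:
        self._lines.append(text.split())

    def line_count(self) -> int:
        return len(self._lines)

    def word(self, line_no: int, word_no: int) -> str:
        return self._lines[line_no][word_no]


class PackedStorage:
    """Same interface, but all characters live in one string plus offsets."""

    def __init__(self) -> None:
        self._text = ""
        self._spans: list[list[tuple[int, int]]] = []

    def add_line(self, text: str) -> None:
        spans = []
        for word in text.split():
            start = len(self._text)
            self._text += word
            spans.append((start, len(self._text)))
        self._spans.append(spans)

    def line_count(self) -> int:
        return len(self._spans)

    def word(self, line_no: int, word_no: int) -> str:
        start, end = self._spans[line_no][word_no]
        return self._text[start:end]


def first_words(storage) -> list[str]:
    """A client that relies on the interface only, not the representation."""
    return [storage.word(n, 0) for n in range(storage.line_count())]


for storage_class in (WordListStorage, PackedStorage):
    storage = storage_class()
    storage.add_line("Clouds are white")
    storage.add_line("Pipes and filters")
    print(storage_class.__name__, first_words(storage))
```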

    The first modularization doesn't help much with Independent Development either. Everybody needs to agree on formats, and storing things, and whatnot. A lot of work. The second modularization is just a bunch of abstract interfaces, which is fairly quick to agree on.

    Subjectively, the second modularization also has greater comprehensibility because you don't have to understand how everything else works just to read the output module. With the first, you always have to understand the whole system.

    You'll quickly notice that the first system was designed out of a flowchart. Think about data flowing through the system from input to output, each step gets a big box - turn those boxes into modules.

    The second was designed according to information hiding. How can we decompose this problem into modules so that as many details as possible are hidden?

    This creates a system that is easier to work with and can make a huge difference in how much technical debt you accumulate over the years. Parnas only warns that the second modularization could pose a performance issue if you aren't careful about implementation. Everything keeps calling everything.

    Hierarchical structure

    There is also a hierarchy to the second modularization.

    Line Storage doesn't use any of the other modules, so it's level 1. Input and Circular Shifter do need Line Storage, so they're level 2. Alphabetizer and Output need the circular shifts so they're level 3.

    But line storage and circular shifter are somewhat compatible. We could alphabetize/output just the raw lines via some sort of parametrization. Our system can run at two levels of hierarchy! Cool.
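
    One way to picture that parametrization, as a sketch under my own assumptions: give raw lines and shifted lines the same tiny interface (the hypothetical `LineSource` below), and let the alphabetizer accept either.

```python
from typing import Iterable, Protocol


class LineSource(Protocol):
    """Hypothetical common interface for raw and shifted lines."""

    def count(self) -> int: ...
    def text(self, index: int) -> str: ...


class RawLines:
    """Level 1: just the lines as given."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = list(lines)

    def count(self) -> int:
        return len(self._lines)

    def text(self, index: int) -> str:
        return self._lines[index]


class ShiftedLines:
    """Level 2: built on top of any LineSource, exposes the same interface."""

    def __init__(self, source: LineSource) -> None:
        self._shifts: list[str] = []
        for i in range(source.count()):
            words = source.text(i).split()
            self._shifts.extend(
                " ".join(words[k:] + words[:k]) for k in range(len(words))
            )

    def count(self) -> int:
        return len(self._shifts)

    def text(self, index: int) -> str:
        return self._shifts[index]


def alphabetized(source: LineSource) -> list[str]:
    """Level 3: works on whichever level it is handed."""
    return sorted(
        (source.text(i) for i in range(source.count())), key=str.lower
    )


raw = RawLines(["Pipes and filters", "Clouds are white"])
print(alphabetized(raw))                # sorts just the raw lines
print(alphabetized(ShiftedLines(raw)))  # sorts every circular shift
```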

    What's really cool here, though, is that this hierarchy allows us to reuse parts of the system. Line Storage can be used for anything that needs to store strings, for instance.

    At any point we can prune the higher levels in the hierarchy and build something new!

    More importantly, those higher levels are greatly simplified by reusing the lower parts. It's pretty nifty.


    As you can see, Parnas had some really good ideas here. So good, in fact, that we still expect well-designed software to follow these principles.

    We have databases that handle our data, and servers take care of serving, and there's a piece that talks to views, and models talk to the database, and there's a piece that handles interactivity with the user. All the frameworks we use these days are designed with information hiding in mind.

    But we often forget to do that ourselves.

    We build tightly coupled systems just because it's the first thing that comes to mind. Or we get sloppy and modules become more and more coupled.

    Code like that sucks to work with, so keep information hiding in mind the next time you can no longer decide where a new function is supposed to go.

    Published on February 19th, 2014 in 52papers52weeks, Information hiding, Languages, Learning, Modules, Personal, Programming, Papers
