Swizec Teller - a geek with a hat (swizec.com)

    Why people making compilers are superheroes

    So you want to create a programming language? Awesome!

    Should you do it? Definitely not. Better yet, go ahead, but don't take it lightly.

    I was about 17 when I created my first programming (scripting? is there even a difference?) language: a lovely templating language that a series of regexes transformed into PHP code. It had everything from variables to functions and loops. Wonderful.

    My next foray into language creation was about two years ago. Older and wiser, I knew I wanted to create "a lisp without parentheses". Cool huh?

    I failed as soon as I realized I didn't know how to parse "if this then if this then that else that".

    Remember, no parentheses.

    Building a real compiler

    This semester I jumped at the chance to take a compilers class - we built a compiler for a stripped down version of Pascal. Practically from scratch.

    Turns out that _"if this then if this then that else that" _cannot be parsed with a linear grammar - you need an "elif" construct or parentheses. Using a recursive grammar would be too slow.

    Writing a compiler is fun! And by fun I mean it makes you feel like driving a metal rod through your brain. It's fun in that rewarding "Holy crap, did I just survive that!? I survived that? Damn." kind of way.

    The complexity is immense. The difficulty of discovering there's a problem at all ... even immenser.

    A compiler works in several stages:

    1. Lexical analysis - strips out comments and whitespace and reduces the source text to a uniform list of lexemes (you use JFlex or something similar)
    2. Syntactical analysis - checks that the syntax is correct and builds the Abstract Syntax Tree (using a context-free grammar with a tool like java_cup)
    3. Semantic analysis - takes care of the semantics of the language (you only call things that are functions, supply the correct parameters etc. - type checking)
    4. Frames - essentially memory management. Give functions some breathing space, pointers to their memory and so on.
    5. Intermediate code generation - this stage turns the AST into a tree of assembler-like instructions
    6. Code linearization - the next step is to turn that tree into a linear sequence of instructions, make sure registers are used well and so on. At this point you can run an interpreter.
    7. There are a few more stages before reaching machine code; luckily we stopped here.
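
    The stages above can be sketched in miniature. What follows is a toy pipeline in Python for arithmetic expressions only, not the Pascal compiler from the class: it skips semantic analysis and frames, and it collapses intermediate code generation and linearization into one step that emits stack-machine instructions. Every name in it (lex, parse, linearize, run, the opcodes) is an assumption made up for the illustration.

```python
import re

TOKEN_RE = re.compile(r"(?P<num>[0-9]+)|(?P<op>[-+*()])|(?P<bad>\S)")
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def lex(source):
    """Lexical analysis: drop whitespace, return a list of (kind, text) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup == "bad":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((match.lastgroup, match.group()))
    return tokens


def parse(tokens):
    """Syntactical analysis: check the syntax and build an AST of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, text = advance()
        if kind == "num":
            return ("num", int(text))
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = factor()
        while peek()[1] == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "eof":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


def linearize(node):
    """Flatten the AST into a linear list of stack-machine instructions."""
    if node[0] == "num":
        return [("PUSH", node[1])]
    op, left, right = node
    return linearize(left) + linearize(right) + [(OPCODES[op], None)]


def run(code):
    """Interpreter for the linear instruction list."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
            continue
        right, left = stack.pop(), stack.pop()
        results = {"ADD": left + right, "SUB": left - right, "MUL": left * right}
        stack.append(results[opcode])
    return stack.pop()


code = linearize(parse(lex("2 + 3 * (4 - 1)")))
print(code)
print(run(code))  # 11
```

    Even at this size, a wrong answer could come from the token regex, the precedence in the parser, the operand order in linearize, or the interpreter itself.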

    The really fun part is that, given a random issue, any of those stages can be the problem. Even though separately they all look like they're working perfectly.

    The debugging

    The debugging ... oh god the debugging. This relatively simple compiler is beyond a doubt the toughest little bastard I have ever had the pleasure of fixing.

    For starters, you don't even know whether there is a bug. Your only chance at debugging (and at finding the bugs in the first place) is to write programs in the language you are compiling and hope they break something.

    1. Compile the compiler, see that Java devours it and all is well
    2. Run the compiler, there are no runtime errors
    3. Write some code in the language you are compiling
    4. Compile+run with your compiler/interpreter

    One of two things will happen. The code will run smoothly and output the correct result.

    Or there will be a syntax error. Or a semantic error. Or the result will be simply wrong.
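
    That loop of writing sample programs and watching what happens can be automated a little. Below is a rough sketch in Python of a harness that sorts each sample into one of the outcomes above. The toy_compile_and_run function is a made-up stand-in for a real compiler plus interpreter (with a deliberate bug planted in it), and SemanticError is a hypothetical exception type, so treat all of it as assumptions rather than anything from the class project.

```python
class SemanticError(Exception):
    """Hypothetical error a semantic-analysis stage might raise."""


def toy_compile_and_run(source):
    """Stand-in for a real compiler + interpreter; handles only 'a + b' and 'a - b'."""
    parts = source.split()
    if len(parts) != 3 or parts[1] not in ("+", "-"):
        raise SyntaxError(f"cannot parse {source!r}")
    if not (parts[0].isdecimal() and parts[2].isdecimal()):
        raise SemanticError(f"operands must be integers in {source!r}")
    left, right = int(parts[0]), int(parts[2])
    if parts[1] == "+":
        return left + right
    return right - left  # deliberate bug: operands are swapped


def check(compile_and_run, cases):
    """Run every (source, expected) pair and classify the outcome."""
    report = {}
    for source, expected in cases:
        try:
            actual = compile_and_run(source)
        except SyntaxError as error:
            report[source] = f"syntax error: {error}"
        except SemanticError as error:
            report[source] = f"semantic error: {error}"
        else:
            if actual == expected:
                report[source] = "ok"
            else:
                report[source] = f"wrong result: got {actual}, expected {expected}"
    return report


# Expected values are worked out by hand; there is no other compiler to ask.
cases = [("1 + 2", 3), ("5 - 5", 0), ("7 - 2", 5), ("7 - x", None)]
report = check(toy_compile_and_run, cases)
for source, outcome in report.items():
    print(f"{source!r}: {outcome}")
```

    Note that "5 - 5" passes even though subtraction is broken. A sample program only exposes a bug if it happens to step on it.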

    You now have to carefully look through the example code and decide that it is in fact correct, written properly and should work. Remember, you cannot test it anywhere else, because you are creating the compiler. In a class setting, your mates can help with their compilers (which are also buggy). If you're creating a new language, you're on your own.

    Once you've decided the sample code is correct, it's time to look through your compiler.

    In the case of syntax/semantic errors the task is simple - look at the output of the appropriate stage and decide that after several months of everything working, hey your grammar is actually wrong. Or hey, your type checker is actually doing that one thing wrong. Or maybe your name checker is being silly ... whatever.

    Easily fixed.

    The really nasty buggers are those logical errors - the code didn't come up with the right result. There is no real symptom to look at. Your only hope of success is carefully inspecting the intermediate code and seeing if anything looks wrong.
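
    One way to make that inspection less painful is to have the interpreter record what it does at every step. Here is a small sketch in Python, assuming a made-up stack machine with PUSH and SUB instructions (not the intermediate code from the class), fed with the kind of output a hypothetical buggy code generator might emit for "7 - 2".

```python
def run_with_trace(code):
    """Execute stack-machine instructions, recording the stack after each step."""
    stack, trace = [], []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "SUB":
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        trace.append((opcode, arg, list(stack)))
    return stack[-1], trace


# Hypothetical generator bug: the operands of "7 - 2" are pushed in the wrong order.
buggy_code = [("PUSH", 2), ("PUSH", 7), ("SUB", None)]
result, trace = run_with_trace(buggy_code)

for opcode, arg, stack in trace:
    shown = "" if arg is None else str(arg)
    print(f"{opcode:<4} {shown:<3} stack={stack}")
print("result:", result)  # -5 instead of the expected 5
```

    The trace shows the stack holding [2, 7] right before the SUB, which points at the code generator rather than the interpreter. Without it, all you have is a wrong number.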

    Even once you've found the problem, there's still the issue of what's actually causing it.

    For instance: I was chasing a bug for days. Arrays were overwriting their neighbours in a record ... turns out my sample code wasn't properly reserving memory and shouldn't be working anyway. That was fun.
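
    To see how an array can trample its neighbour in a record, here is a toy model in Python of a flat memory and a record layout. This is not the actual bug from my compiler, just a sketch of the general mechanism; the field names, offsets and the store_item helper are all made up for the illustration.

```python
# Hypothetical record layout: { items: array[0..2] of integer; count: integer }
FIELD_OFFSETS = {"items": 0, "count": 3}
ITEMS_LENGTH = 3
RECORD_SIZE = 4


def store_item(memory, base, index, value, checked=False):
    """Write items[index] for the record that starts at memory[base]."""
    if checked and not 0 <= index < ITEMS_LENGTH:
        raise IndexError(f"index {index} outside items[0..{ITEMS_LENGTH - 1}]")
    memory[base + FIELD_OFFSETS["items"] + index] = value


memory = [0] * RECORD_SIZE
memory[FIELD_OFFSETS["count"]] = 42

# Writing one element past the end lands exactly on the neighbouring field.
store_item(memory, base=0, index=3, value=99)
print(memory)  # [0, 0, 0, 99]
print("count is now", memory[FIELD_OFFSETS["count"]])  # 99, not 42
```

    With checked=True the same call raises an IndexError instead of silently corrupting count, which is the difference between a five minute fix and a bug you chase for days.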

    Superheroes

    And keep in mind that finding the bugs in the first place is really hard. The professor gave my very buggy compiler a 100%, simply because every program he ran worked.

    That's why it can take decades to discover a bug in a compiler used by millions of people. And how many buggy compilers are out there when people just assume their code is the problem and change it?

    Seriously, the people out there who make compilers and languages used by millions of people are superheroes. I can't imagine doing that and keeping even a semblance of my fragile sanity.

    Published on June 11th, 2012 in Abstract syntax tree, Bytecode, Code generation (compiler), Compiler, Machine code, Parsing, Programming language, Type system, Uncategorized
