So you want to create a programming language? Awesome!
Should you do it? Definitely not. Actually, go ahead, but don't take it lightly.
I created my first programming (scripting? is there even a difference?) language when I was about 17 - a lovely templating language that a series of regexes transformed into PHP code. It had everything from variables to functions and loops. Wonderful.
My next foray into language creation was about two years ago. Older and wiser, I knew I wanted to create "a lisp without parentheses". Cool huh?
It failed as soon as I realized I didn't know how to parse "if this then if this then that else that".
Remember, no parentheses.
Building a real compiler
This semester I jumped at the chance to take a compilers class - we built a compiler for a stripped down version of Pascal. Practically from scratch.
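Turns out that _"if this then if this then that else that"_ is ambiguous: nothing says which _if_ the _else_ belongs to. It's the classic dangling-else problem, and an LALR parser generator like java_cup reports it as a shift/reduce conflict. You settle it by convention (the else binds to the nearest if, which is what Pascal does), with an "elif"-style construct, or with explicit delimiters such as parentheses.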
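A hand-written recursive-descent parser gets the nearest-if convention almost for free, because the innermost `if` that is still being parsed gets first chance to consume an `else`. Below is a minimal Python sketch of that idea. It is not how java_cup handles it: a yacc-style LALR generator instead reports a shift/reduce conflict and, by default, resolves it by shifting, which gives the same binding. The sketch assumes a toy token format in which tokens are whitespace-separated and conditions and plain statements are single identifiers (`a`, `b`, `x`, `y`). `parse`, `parse_stmt` and `ParseError` are hypothetical names.

```python
class ParseError(Exception):
    """Raised when the token stream doesn't fit the toy grammar."""

def parse_stmt(tokens, pos):
    """Parse one statement starting at tokens[pos]; return (ast, next_pos).

    Toy grammar, ambiguous on purpose:
        stmt := 'if' COND 'then' stmt [ 'else' stmt ]  |  IDENT
    The innermost 'if' that is still being parsed grabs an optional 'else'
    greedily, so every else binds to the nearest unmatched if.
    """
    if pos >= len(tokens):
        raise ParseError("unexpected end of input")
    if tokens[pos] in ("then", "else"):
        raise ParseError(f"unexpected {tokens[pos]!r} at token {pos}")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1  # a plain statement such as "x"
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise ParseError(f"expected 'then' at token {pos + 2}")
    cond = tokens[pos + 1]
    then_branch, pos = parse_stmt(tokens, pos + 3)
    else_branch = None
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos

def parse(source):
    """Split on whitespace, parse one statement, and reject leftover tokens."""
    tokens = source.split()
    ast, pos = parse_stmt(tokens, 0)
    if pos != len(tokens):
        raise ParseError(f"unexpected trailing token {tokens[pos]!r}")
    return ast

print(parse("if a then if b then x else y"))
# ('if', 'a', ('if', 'b', 'x', 'y'), None)
```

The `else` ends up inside the inner `if`, and the outer `if` has no else branch. Getting the other reading requires explicit delimiters, which brings back exactly what a parentheses-free design was trying to avoid.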
Writing a compiler is fun! And by fun I mean it makes you feel like you're driving a metal rod through your brain. It's fun in that rewarding "Holy crap, did I just survive that!? I survived _that_? Damn." kind of way.
The complexity is immense. The difficulty of discovering there's a problem at all ... even immenser.
A compiler works in several stages:
- Lexical analysis - strips out comments and whitespace and turns the raw source into a list of tokens (lexemes); you use JFlex or something similar
- Syntactical analysis - checks the syntax is correct and builds the Abstract Syntax Tree (from a context-free grammar, using an LALR parser generator like java_cup)
- Semantic analysis - takes care of the semantics of the language: only calling things that are functions, supplying the correct parameters and so on. This is where type checking happens.
- Frames - essentially memory management. Give each function some breathing space for its locals and parameters, pointers to that memory and so on.
- Intermediate code generation - this stage turns the AST into a tree of assembler-like instructions
- Code linearization - the next step turns that tree into a linear sequence of instructions, makes sure registers are used well and so on. At this point you can run an interpreter.
- There are a few more stages before reaching machine code; luckily we stopped here.
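The toy Python pipeline below makes the stage list concrete. It works on a hypothetical mini-language of integer assignments and `print` statements separated by `;`. It is only a sketch under heavy simplifying assumptions:

- A regex lexer stands in for JFlex.
- Hand-written recursive descent stands in for java_cup.
- Frames and the tree form of intermediate code are skipped entirely.
- Code generation goes straight to a flat list of stack-machine instructions, which a tiny interpreter then runs.

The function names (`lex`, `parse`, `check`, `gen`, `run`) and the instruction format are made up for illustration. Python's built-in `SyntaxError` and `NameError` are reused for convenience.

```python
import re

# Stage 1, lexical analysis: skip whitespace and '#' comments, emit (kind, text) tokens.
TOKEN_RE = re.compile(r"\s+|#[^\n]*|(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*=;])|(?P<bad>.)")

def lex(src):
    tokens = []
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup == "bad":
            raise ValueError(f"unexpected character {m.group()!r}")
        if m.lastgroup:  # whitespace and comments match no named group
            tokens.append((m.lastgroup, m.group()))
    return tokens + [("eof", "")]

# Stage 2, syntactic analysis: recursive descent into a tuple-based AST.
#   stmt := 'print' expr | ID '=' expr   expr := term (('+'|'-') term)*   term := atom ('*' atom)*
def parse(tokens):
    i = 0
    def take(kind=None, text=None):
        nonlocal i
        k, t = tokens[i]
        if (kind and k != kind) or (text and t != text):
            raise SyntaxError(f"unexpected token {t!r}")
        i += 1
        return t
    def atom():
        return ("num", int(take("num"))) if tokens[i][0] == "num" else ("var", take("id"))
    def term():
        node = atom()
        while tokens[i][1] == "*":
            node = ("bin", take(), node, atom())
        return node
    def expr():
        node = term()
        while tokens[i][1] in ("+", "-"):
            node = ("bin", take(), node, term())
        return node
    def stmt():
        name = take("id")
        if name == "print":  # 'print' acts as a keyword at the start of a statement
            return ("print", expr())
        take(text="=")
        return ("assign", name, expr())
    stmts = [stmt()]
    while tokens[i][1] == ";":
        take(text=";")
        stmts.append(stmt())
    take("eof")
    return stmts

# Stage 3, semantic analysis: every variable must be assigned before it is read.
def check(stmts):
    def reads(node):
        if node[0] == "var":
            return {node[1]}
        return reads(node[2]) | reads(node[3]) if node[0] == "bin" else set()
    defined = set()
    for s in stmts:
        missing = reads(s[-1]) - defined  # the expression is the last element of both statement kinds
        if missing:
            raise NameError(f"used before assignment: {sorted(missing)}")
        if s[0] == "assign":
            defined.add(s[1])
    return stmts

# Stage 4, code generation: flatten the AST into linear stack-machine instructions.
def gen(stmts):
    code = []
    def emit(node):
        if node[0] == "bin":  # operands first, then the operator
            emit(node[2])
            emit(node[3])
            code.append(("op", node[1]))
        else:  # ("num", n) becomes push n, ("var", x) becomes load x
            code.append(("push" if node[0] == "num" else "load", node[1]))
    for s in stmts:
        emit(s[-1])
        code.append(("print",) if s[0] == "print" else ("store", s[1]))
    return code

# Stage 5, execution: a tiny stack machine runs the linear code.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
def run(code):
    stack, env, out = [], {}, []
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            stack.append(env[args[0]])
        elif op == "store":
            env[args[0]] = stack.pop()
        elif op == "print":
            out.append(stack.pop())
        else:  # ("op", symbol): pop right then left operand, push the result
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[args[0]](a, b))
    return out

src = "x = 2 + 3 * 4;  # a comment\nprint x - 1"
print(run(gen(check(parse(lex(src))))))  # [13]
```

Every stage is simple here, but the point of the post still shows. A precedence slip in `parse` or an operand-order slip in `run` only shows up as a wrong number at the very end.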
The really fun part is that, given a random issue, any of those stages can be the cause, even though each one looks like it's working perfectly on its own.
The debugging
The debugging ... oh god the debugging. This relatively simple compiler is beyond a doubt the toughest little bastard I have ever had the pleasure of fixing.
For starters, you don't even know whether there is a bug. Your only chance at debugging (and at finding the bugs in the first place) is to write programs in the language you're compiling and hope they break something.
- Compile the compiler, see that Java happily accepts it and all is well
- Run the compiler, there are no runtime errors
- Write some code in the target language
- Compile+run with your compiler/interpreter
One of several things will happen. The code runs smoothly and outputs the correct result.
Or there's a syntax error. Or a semantic error. Or the result is simply wrong.
You now have to carefully look through the example code and decide that it is in fact correct, written properly and should work. Remember, you can't test it anywhere else, because you are creating the compiler. In a class setting your mates can help with their compilers (which are also buggy); if you're creating a new language, you're on your own.
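The compile, run and check loop can at least be automated into a small golden-output test suite, so that old test programs keep getting re-run as the compiler changes. Below is a minimal Python sketch. It sorts each test program into one of the four outcomes above. The exception classes, `Case`, `classify`, `run_suite` and the `compile_and_run` callable are all hypothetical stand-ins for whatever your compiler actually exposes. The expected outputs still have to be worked out by hand, because there is no other compiler to check against.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

class CompileSyntaxError(Exception):
    """Hypothetical: what your parser raises on bad syntax."""

class CompileSemanticError(Exception):
    """Hypothetical: what your semantic checks raise (types, names, arity)."""

@dataclass
class Case:
    name: str
    source: str           # a program in the language you're compiling
    expected_output: str  # worked out by hand; there's no other compiler to ask

def classify(case: Case, compile_and_run: Callable[[str], str]) -> str:
    """Run one test program and report which of the four outcomes happened."""
    try:
        actual = compile_and_run(case.source)
    except CompileSyntaxError as err:
        return f"syntax error: {err}"
    except CompileSemanticError as err:
        return f"semantic error: {err}"
    if actual != case.expected_output:
        return f"wrong result: expected {case.expected_output!r}, got {actual!r}"
    return "ok"

def run_suite(cases: List[Case], compile_and_run: Callable[[str], str]) -> Dict[str, str]:
    return {case.name: classify(case, compile_and_run) for case in cases}

# Stand-in for the real compiler + interpreter: only understands "print <literal>".
def fake_compile_and_run(source: str) -> str:
    if not source.startswith("print "):
        raise CompileSyntaxError("expected 'print'")
    return source[len("print "):]

cases = [
    Case("prints-literal", "print 42", "42"),
    Case("typo", "pritn 42", "42"),
    Case("wrong-answer", "print 41", "42"),  # deliberately mismatched to show the wrong-result path
]
for name, outcome in run_suite(cases, fake_compile_and_run).items():
    print(f"{name}: {outcome}")
```

The harness only reports which kind of failure happened. Deciding whether the test program or the compiler is at fault is still the manual step described here.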
Once you've decided the target code is correct it's time to look through your compiler.
In the case of syntax/semantic errors the task is simple - look at the output of the appropriate stage and discover that, after several months of everything working, hey, your grammar is actually wrong. Or hey, your type checker is doing that one thing wrong. Or maybe your name checker is being silly ... whatever.
Easily fixed.
The really nasty buggers are those logical errors - the code didn't come up with the right result. There is no real symptom to look at. Your only hope of success is carefully inspecting the intermediate code and seeing if anything looks wrong.
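Part of "inspecting the intermediate code and seeing if anything looks wrong" can be mechanized. Below is a minimal sketch of an IR linter in Python. It assumes a made-up three-address instruction format, not the instruction trees of the class compiler, in which any operand string starting with `t` is a temporary. `lint`, `is_temp` and `WRITES` are hypothetical names. The linter only catches mechanical slips, such as jumps to labels that don't exist, duplicate labels, or temporaries that are read but never written. It says nothing about whether the program computes the right thing, so a human still has to judge the logic.

```python
from collections import Counter

# Hypothetical three-address IR, one tuple per instruction:
#   ("label", "L1")  ("move", "t1", 5)  ("add", "t3", "t1", "t2")
#   ("jump", "L1")   ("cjump", "t3", "L_true", "L_false")
# Any operand string starting with "t" is a temporary.
WRITES = {"move", "add", "sub", "mul"}  # instructions whose first operand is the destination

def is_temp(x):
    return isinstance(x, str) and x.startswith("t")

def lint(code):
    """Flag mechanical mistakes in linear IR; a human still has to judge the logic."""
    warnings = []
    labels = Counter(ins[1] for ins in code if ins[0] == "label")
    warnings += [f"label {name} defined {n} times" for name, n in labels.items() if n > 1]
    written = {ins[1] for ins in code if ins[0] in WRITES}
    for idx, ins in enumerate(code):
        op = ins[0]
        if op == "jump":
            targets, reads = [ins[1]], []
        elif op == "cjump":
            targets, reads = list(ins[2:]), [ins[1]]
        elif op in WRITES:
            targets, reads = [], [x for x in ins[2:] if is_temp(x)]
        else:
            targets, reads = [], []
        warnings += [f"#{idx} {op}: jumps to undefined label {t}" for t in targets if t not in labels]
        warnings += [f"#{idx} {op}: reads {r}, which is never written" for r in reads if r not in written]
    return warnings

code = [
    ("move", "t1", 5),
    ("add", "t3", "t1", "t2"),            # t2 is never written anywhere
    ("cjump", "t3", "L_then", "L_end"),
    ("label", "L_then"),
    ("jump", "L_end"),                    # L_end is never defined
]
for warning in lint(code):
    print(warning)
```

A clean report from this linter does not mean the code is correct. It only narrows down where to look.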
Even once you've found the problem, there's still the issue of what's actually causing it.
For instance: I was chasing a bug for days. Arrays were overwriting their neighbours in a record ... turns out my sample code wasn't properly reserving memory and shouldn't be working anyway. That was fun.
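The symptom in that story, arrays inside a record clobbering each other, is what you get when something doesn't reserve enough room for an array. Below is a minimal Python sketch of record layout that shows the difference. It assumes one 4-byte word per scalar and fields laid out in declaration order. Both are simplifying assumptions, not the class compiler's actual layout rules, and `Field`, `layout` and `overlapping` are hypothetical names. In the story itself the culprit turned out to be the sample program, not the compiler.

```python
from dataclasses import dataclass
from itertools import combinations

WORD = 4  # assumption: every scalar (integer, boolean, pointer) takes one 4-byte word

@dataclass
class Field:
    name: str
    count: int = 1  # number of elements; greater than 1 for an array field

def layout(fields):
    """Give each field a byte offset inside the record; return (offsets, record size)."""
    offsets, offset = {}, 0
    for f in fields:
        offsets[f.name] = offset
        offset += f.count * WORD  # reserve room for every element, not just the first
    return offsets, offset

def overlapping(fields, offsets):
    """Return pairs of fields whose byte ranges [start, start + size) intersect."""
    spans = {f.name: (offsets[f.name], offsets[f.name] + f.count * WORD) for f in fields}
    return [(a, b) for a, b in combinations(spans, 2)
            if spans[a][0] < spans[b][1] and spans[b][0] < spans[a][1]]

fields = [Field("xs", count=3), Field("ys", count=3)]
offsets, size = layout(fields)
print(offsets, size)                 # {'xs': 0, 'ys': 12} 24
print(overlapping(fields, offsets))  # []

# A buggy layout that reserves one word per field regardless of array length:
buggy = {f.name: i * WORD for i, f in enumerate(fields)}
print(overlapping(fields, buggy))    # [('xs', 'ys')]
```

With the buggy offsets, writing `xs[1]` lands on the same bytes as `ys[0]`. That looks exactly like arrays overwriting their neighbours, and nothing crashes.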
Superheroes
And keep in mind that finding the bugs in the first place is *really hard*. The professor gave my very buggy compiler a 100%, simply because every program he ran worked.
That's why it can take years, even decades, to discover a bug in a compiler used by millions of people. And how many compiler bugs go unnoticed because people just assume their own code is the problem and change it?
Seriously, the people who make compilers and languages used by millions of people are superheroes. I can't imagine doing that and keeping even a semblance of my fragile sanity.
Continue reading about Why people making compilers are superheroes
Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.