Never underestimate the power of good marketing for bad products. That's my favorite line from What Goes Around Comes around ... And Around ..., a paper going into detail about why SQL and relational databases continue to be the best tool for transactional data even now.
The 2024 paper follows up from Stonebraker's influential 2005 paper that explored why SQL and relational databases won out over all the data management innovation that happened in the 90's. When all was said and done, businesses went back to the good stuff.
So I’m reading this SQL is Forever paper and the most surprising thing so far is how many of the famous NoSQL technologies have been actively removed from BigTech codebases by now because they don’t work
— Swizec Teller (@Swizec) July 16, 2024
What happened to 20 years of NoSQL?
Stonebraker's 2005 paper talks about 35 years of database research and how the current Document XML vs Relational debate is eerily similar to debates in the 1960's. You don't remember XML databases because they didn't survive.
Something similar happened to all the NoSQL web scale technologies of the early 2000's and 2010's. All the famous tech you've heard of has either:
- been removed from BigTech codebases (like MapReduce)
- gained transactional support (like Mongo)
- gained a flavor of SQL (like DynamoDB, Spanner, Mongo)
- primarily used for analytical workloads (like ClickHouse, DuckDB)
- been relegated to caching (like Redis)
As a fan of NoSQL technologies since the beginning (it's gotta be web scale bro!), I can't say I'm surprised. The more you use this stuff, the more you realize it has a few narrow use-cases and you should go with Postgres or MySQL, even Oracle (yuck) for core production data.
SQL offers flexibility
Just learn SQL. It’s not hard
— Swizec Teller (@Swizec) February 28, 2024
Every ORM I ever tried turned out to be a half-baked cumbersome implementation of half of SQL. There’s a reason we’ve been using SQL for 50+ years https://t.co/ywK3VoSGXK
Many engineers shy away from SQL because it's another thing to learn and there's an impedance mismatch between SQL and the rest of your code. Translating between flat data and objects in your language of choice sucks.
It's repetitive, cumbersome, and feels like a schlep. Json bureaucracy I like to call it.
So we build ORMs and various frameworks to sweep this dirty work under the rug. The more complex your domain becomes, the more you'll see engineers using the ORM's version of rawSql
and writing plain SQL queries. Because it's easier.
Same thing happened with NoSQL – it started as "Hey you never need to write SQL again! Your database natively understands your programming language 🤩".
And that is exactly the problem.
NoSQL databases are fantastic at reading data in exactly the ways engineers predicted in advance. Need something new? Coding time. Want the new path to also be fast? Tough! Optimizations are baked into your data structures.
Stonebraker says that "[NoSQL] systems are on a collision path with relational databases". In large part because the ability to ad-hoc query your production data and let the database engine optimize your execution, is super useful.
Transactions are good, actually
Similar to the SQL issue where these systems traded flexibility for familiarity, many web scale technologies had to trade ACID compliance for speed.
Atomicity, Consistency, Isolation, Durability.
Databases ensure those using transactions – a construct that ensures your in-flight queries save atomically (all or nothing), stay internally consistent (no stale data), isolated from other transactions, and have durability guarantees (once saved, it's saved).
Early web scale tech had to abandon these guarantees in favor of eventual consistency and other tricks. This has proven to be difficult to deal with and a common source of hard-to-fix bugs.
More and more of these NoSQL systems are adding transactional support.
Data is forever
While you and I were running around having fun playing with new tech, and the BigTech startups pushing this stuff had valid internal reasons to try to make it work, the majority of the world never got a chance to try.
Data is incredibly sticky. Once you have a working database you will never change it. Migrations to a new database are a myth. Ain't nobody got time for that.
Except Google and friends. They have thousands of engineers and billions of dollars to spend on projects like this. Even they eventually got off NoSQL because it wasn't working.
Cassandra, HBase, Spanner, etc now all have some flavor of SQL and transaction support. But even they had to tack that support onto the existing pile of NoSQL because data is sticky. Gotta keep the underlying tech whether you like it or not.
OLAP vs OLTP
OLAP – online analytical processing – is one area where database innovations of the past 20 years shine.
The columnar family of databases has found wide adoption in data analysis, data science, and data engineering. These databases tend to flatten your production OLTP – online transactional processing – data into heavily denormalized tables based on columns.
Columnar data orientation then lets you run fast analytical queries. Vectorizing/parallelizing your computation is easier when you zoom through a column of data that's all the same type. And you don't need to deal with joins when everything is already smooshed together.
But columnar databases have not proven out for production workloads. Too slow at handling rows.
Relational databases adopt the best features
Stonebraker notes a wonderful trend: The new tech innovates and the established relational databases adopt.
Every modern relational database supports JSON columns so you can use them as a document store. Plenty of use-cases where saving rich data in a single column comes handy.
SQL now lets you write deep JSON-based queries. Databases even support JSON indexes and other performance optimizations. Plus you get full ACID support. 👌
Similarly vector databases became all the rage with the rise of AI Engineering and guess what: relational databases started to support vectors. Again with all the benefits of relational data modeling, ACID compliance, SQL querying, etc.
Or as Stonebraker said: "It's a feature, not a database"
SQL and Relational's biggest problem
SQL and Relational databases suffer from the First Five Minutes Problem. You have to think about what you're doing, set up a database, create some tables, and then get to work.
Engineers hate thinking ahead. NoSQL lulls you into the false promise that you don't need to.
By the time you have lots of users and your data is falling apart, it's too late. You're stuck with NoSQL because data is sticky.
But if you can, always bet on SQL.
Cheers,
~Swizec
Continue reading about Why SQL is Forever
Semantically similar articles hand-picked by GPT-4
- Why SQL is Forever followup
- Notes on A Relational Model of Data for Large Shared Data Banks
- Logging 1,721,410 events per day with Postgres, Rails, Heroku, and a bit of JavaScript
- Why software only moves forward
- That time serverless melted my credit card
Learned something new?
Read more Software Engineering Lessons from Production
I write articles with real insight into the career and skills of a modern software engineer. "Raw and honest from the heart!" as one reader described them. Fueled by lessons learned over 20 years of building production code for side-projects, small businesses, and hyper growth startups. Both successful and not.
Subscribe below 👇
Software Engineering Lessons from Production
Join Swizec's Newsletter and get insightful emails 💌 on mindsets, tactics, and technical skills for your career. Real lessons from building production software. No bullshit.
"Man, love your simple writing! Yours is the only newsletter I open and only blog that I give a fuck to read & scroll till the end. And wow always take away lessons with me. Inspiring! And very relatable. 👌"
Have a burning question that you think I can answer? Hit me up on twitter and I'll do my best.
Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.
Want to become a true senior engineer? Take ownership, have autonomy, and be a force multiplier on your team. The Senior Engineer Mindset ebook can help 👉 swizec.com/senior-mindset. These are the shifts in mindset that unlocked my career.
Curious about Serverless and the modern backend? Check out Serverless Handbook, for frontend engineers 👉 ServerlessHandbook.dev
Want to Stop copy pasting D3 examples and create data visualizations of your own? Learn how to build scalable dataviz React components your whole team can understand with React for Data Visualization
Want to get my best emails on JavaScript, React, Serverless, Fullstack Web, or Indie Hacking? Check out swizec.com/collections
Did someone amazing share this letter with you? Wonderful! You can sign up for my weekly letters for software engineers on their path to greatness, here: swizec.com/blog
Want to brush up on your modern JavaScript syntax? Check out my interactive cheatsheet: es6cheatsheet.com
By the way, just in case no one has told you it yet today: I love and appreciate you for who you are ❤️