A smart man once said "Lies, damned lies, and statistics," and oh boy do I have a story for you today.
You know how I mentioned yesterday that total counts aren't a great stat to learn from? It's true, they suffer from a bunch of issues.
Compare them and you fall prey to the underlying base rates.
You see that 1219 black people got shot by police since 2015. Compare that to the 2320 white people and hey, that looks pretty good. Those poor white people!
Ah, but only 12% of the population is black. Per capita, you're roughly 3x more likely to get shot by police if you're black. 😬
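Here's a quick sketch of that per capita math in Python. The two counts and the 12% come from the numbers above. The 60% white population share is my own rough assumption, not something from the data, so treat the result as a ballpark.

```python
# Counts and the 12% share are taken from the text above.
BLACK_SHOT = 1219
WHITE_SHOT = 2320
BLACK_SHARE = 0.12
WHITE_SHARE = 0.60  # ASSUMPTION: rough share of the US population, not from the source


def relative_risk(count_a, share_a, count_b, share_b):
    """How many times more likely group A is to be shot than group B, per capita.

    Total population cancels out of the ratio, so population shares are enough.
    """
    return (count_a / share_a) / (count_b / share_b)


raw_ratio = BLACK_SHOT / WHITE_SHOT
per_capita_ratio = relative_risk(BLACK_SHOT, BLACK_SHARE, WHITE_SHOT, WHITE_SHARE)

print(f"raw count ratio:  {raw_ratio:.2f}x")
print(f"per capita ratio: {per_capita_ratio:.2f}x")
```

The raw ratio comes out below 1 and the per capita ratio comes out well above 2. Same facts, opposite story.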
Cathy O'Neil has a great chapter on this in her book Weapons of Math Destruction.
She argues that self-reinforcing statistics are the problem. Police departments don't have enough resources, so they use statistical modeling to deploy ... troops? Is it troops?
You take a city like New York and overlay a map of crime statistics. Where there's more crime, you deploy more officers.
Then those officers observe. They report the crime they see. And that reduces the crime rate, right?
Nope. More officers, more crimes spotted, more officers deployed, more crimes spotted ...
Imagine if a police officer wrote you up for every minor infraction. Just because they have nothing better to do at the moment. Crossed the street on a red? Crime. Littered? Crime. Took a swig of your alcoholic beverage? Crime. Loitered a bit? Crime.
Unfortunately, in the USA this over-policing falls along racial lines. Black people are more likely to have a police encounter. Just because there are more police in their neighborhoods. Just because the crime stats send them there.
Self-reinforcing statistic. Lying with facts.
Imagine if they did that with Wall Street bankers 😉
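If the loop is hard to picture, a toy simulation helps. This is my own made-up model, not the one from the book and not how any real department works: two neighborhoods with identical true crime, a small starting imbalance in officers, and a rule that shifts officers toward whichever spot recorded more crime. All the numbers are hypothetical.

```python
def simulate(rounds=10, true_crimes=1000, seen_per_officer=0.005):
    """Two neighborhoods with the SAME true crime. Officers follow recorded crime."""
    officers = {"A": 55, "B": 45}  # hypothetical: tiny initial imbalance
    history = []
    for _ in range(rounds):
        # Each officer spots a fixed fraction of the crime that actually happens.
        recorded = {
            hood: true_crimes * min(1.0, n * seen_per_officer)
            for hood, n in officers.items()
        }
        history.append((dict(officers), recorded))
        # "Data-driven" deployment: move 5 officers toward the hotter-looking spot.
        hot = max(recorded, key=recorded.get)
        cold = min(recorded, key=recorded.get)
        if hot != cold:
            shift = min(5, officers[cold])
            officers[hot] += shift
            officers[cold] -= shift
    return history


history = simulate()
for label, (officers, recorded) in (("first round", history[0]), ("last round", history[-1])):
    print(label, officers, {hood: round(count) for hood, count in recorded.items()})
```

True crime never changes in this model. Only the recorded numbers move, and they move toward wherever the officers already were.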
Self-reinforcing statistics are great, but difficult to explain. It took me years to grok.
Here's a simpler way 👉 dataviz.
Take total counts again. Terrible measure for comparison but put them on a map and it's fantastic. Use the power of human processing and intuition to understand what's going on.
Most maps just end up showing population density. Where there are more people there is more of everything.
Okay those have a bias for US Air Force bases too.
Axis manipulation is another great way to lie with facts and dataviz. Want to make a change look super big? Just don't start at zero.
Look at this impressive email subscriber growth! WOW this software that costs $149/mo is really crushing it for me 😍
Oh wait, it's just 379 net new subscribers in 3 months. 3% growth. Start the Y axis at 0 and you won't even notice the growth.
That wouldn't be very useful to visualize so let's give ConvertKit a pass. Maybe my list just needs to grow more 😛
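You can put a number on how much a chopped axis exaggerates. A minimal sketch: the 379 and the 3% come from above, the starting list size is just what those two imply, and the axis minimum of 12,500 is a hypothetical value I picked for illustration.

```python
def apparent_growth(start, end, axis_min=0.0):
    """Growth as the eye reads it from bar heights when the Y axis starts at axis_min."""
    return (end - axis_min) / (start - axis_min) - 1


NEW_SUBS = 379                  # from the text
START = round(NEW_SUBS / 0.03)  # list size implied by "3% growth"
END = START + NEW_SUBS
AXIS_MIN = 12_500               # HYPOTHETICAL: where a chart might start its Y axis

honest = apparent_growth(START, END)               # axis starts at zero
truncated = apparent_growth(START, END, AXIS_MIN)  # axis starts just below the data

print(f"axis from 0:      {honest:.0%} taller")
print(f"axis from {AXIS_MIN}: {truncated:.0%} taller")
```

With the axis at zero the second bar is about 3% taller. With the chopped axis it looks several times taller, even though the data is identical.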
Here's another great example.
This one is from Georgia, bragging about their success against the coronavirus. How they'd pretty much defeated it. Go Georgia!
Notice something fishy?
Here's that same graph with X ordered by time.
When you look at a graph that implies time, you think it's ordered by time. You see trends. Humans are great at trends.
And data visualization manipulators can use that tendency to lie with facts.
What? We didn't do anything wrong. The graph is clearly labeled. You're the dumb doodoo who assumed X was ordered by time like we're honest normal people.
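The trick is easy to reproduce. In this sketch the daily counts are invented for illustration, they are not Georgia's real numbers. Sort by count and you get a beautiful decline. Sort by date and the decline is gone.

```python
from datetime import date

# HYPOTHETICAL daily case counts, made up for this example.
cases = [
    (date(2020, 5, 1), 90),
    (date(2020, 5, 2), 120),
    (date(2020, 5, 3), 80),
    (date(2020, 5, 4), 130),
    (date(2020, 5, 5), 100),
    (date(2020, 5, 6), 140),
]


def is_strictly_declining(rows):
    """True if every count is lower than the one before it."""
    counts = [count for _, count in rows]
    return all(a > b for a, b in zip(counts, counts[1:]))


by_count = sorted(cases, key=lambda row: row[1], reverse=True)  # the misleading order
by_time = sorted(cases, key=lambda row: row[0])                 # the order readers assume

print("sorted by count, declining?", is_strictly_declining(by_count))
print("sorted by time, declining? ", is_strictly_declining(by_time))
```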
There's a fantastic story from WW2.
Allied bombers kept coming home riddled with bullet holes from German air defense. They'd go over there, drop some bombs, turn into a sieve, and make it home.
Yeah, it was real bad.
So they did some statistics. Where do airplanes get the most holes?
Wonderful! Slap some armor on those spots and away we go. Problem solved.
Except those are the airplanes that came home. The ones that didn't come home had holes in the white areas. The holes we never got to measure because the planes died.
That's where you put the armor.
This is a type of survivorship bias. You're using facts to tell a story, but not all the facts.
Another way to lie with facts ✌️
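You can watch the bias appear in a small simulation. Everything here is hypothetical: the plane sections, the chance that a hit to each section downs the plane, and the number of hits per plane are numbers I made up. Hits land uniformly, but the holes you get to count on returning planes do not look uniform at all.

```python
import random
from collections import Counter

# HYPOTHETICAL chance that a single hit to each section downs the plane.
LETHALITY = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.05}


def fly_missions(n_planes=10_000, hits_per_plane=3, seed=42):
    """Return (holes on all planes, holes on planes that made it home)."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    all_hits, surviving_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane comes home only if it survives every single hit.
        if all(rng.random() > LETHALITY[section] for section in hits):
            surviving_hits.update(hits)
    return all_hits, surviving_hits


def shares(counter):
    """Convert raw hole counts into a share per section."""
    total = sum(counter.values())
    return {section: counter[section] / total for section in LETHALITY}


all_hits, surviving_hits = fly_missions()
print("all planes:     ", {s: f"{v:.0%}" for s, v in shares(all_hits).items()})
print("returned planes:", {s: f"{v:.0%}" for s, v in shares(surviving_hits).items()})
```

On the planes that return, engine and cockpit holes are rare. Not because those spots get hit less, but because planes hit there mostly don't come back.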
The Texas sharpshooter fallacy is another version of this.
Give Joe Sixpack a gun and ask him to shoot at the side of a barn. 200 shots for good measure.
Then you inspect the barn and find the 10 bullet holes with the tightest clustering.
Congratulations, Joe Sixpack is a sharpshooter! In just 10 shots he got clustering so tight an Olympic shooter would be jealous.
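Here's a rough sketch of that cherry-picking. The barn dimensions and the random seed are made up, and the cluster search is a simple heuristic (each shot plus its nearest neighbors), not a guaranteed-optimal one. It's enough to show how tight 10 hand-picked holes look next to all 200.

```python
import math
import random


def spread(points):
    """Mean distance of the points from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.dist(p, (cx, cy)) for p in points) / len(points)


def tightest_cluster(shots, k=10):
    """Heuristic: for each shot, group it with its k-1 nearest neighbors; keep the tightest group."""
    best = None
    for center in shots:
        group = sorted(shots, key=lambda p: math.dist(p, center))[:k]
        if best is None or spread(group) < spread(best):
            best = group
    return best


rng = random.Random(7)
# 200 shots scattered uniformly over a 10 x 4 barn wall (HYPOTHETICAL dimensions).
shots = [(rng.uniform(0, 10), rng.uniform(0, 4)) for _ in range(200)]
cherry_picked = tightest_cluster(shots)

print(f"spread of all 200 shots:   {spread(shots):.2f}")
print(f"spread of the 'best' 10:   {spread(cherry_picked):.2f}")
```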
And finally, my favorite one – the lack of pirates caused global warming. True fact.
There's even a graph about it. Perfect correlation and everything. Factual data.
You can see more of those over at Spurious Correlations.
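Getting a scary-strong correlation out of two unrelated trends takes a few lines. The numbers below are invented for illustration, they are not real pirate counts or real temperatures. One series goes down over time, the other goes up, and Pearson's r does the rest.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# MADE-UP numbers: pirates trending down, average temperature trending up.
pirates = [35000, 29000, 21000, 13000, 6000, 500]
avg_temp_c = [14.2, 14.4, 14.8, 15.0, 15.4, 15.6]

r = pearson(pirates, avg_temp_c)
print(f"correlation between pirates and temperature: {r:.2f}")
```

Any two things that both drift steadily over the same period will correlate like this. The number says nothing about one causing the other.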
Seen any good examples of lying with facts? Hit reply.