A smart man once said “Lies, damn lies, and statistics” and oh boy do I have a story for you today.
You know how I mentioned yesterday that total counts aren’t a great stat to learn from? It’s true, they suffer from a bunch of issues.
Compare raw totals and you fall prey to the underlying statistics.
You see that 1219 black people got shot by police since 2015, compare that to the 2320 white people, and hey, that looks pretty good. Those poor white people!
Ah, but only 12% of the population is black. Per capita, you’re about 3x more likely to get shot by police if you’re black. 😬
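Here’s a quick sketch of that per-capita math. The two counts and the 12% come from above. The 60% share for white people is my own rough assumption, so treat the exact result as a ballpark, not a statistic.

```python
# Counts quoted in the post.
shot_black = 1219
shot_white = 2320

# Population shares. 12% is from the post; 60% is an ASSUMPTION for illustration.
share_black = 0.12
share_white = 0.60  # assumption, not from the post

# Shootings per unit of population share. Only the ratio between them matters.
rate_black = shot_black / share_black
rate_white = shot_white / share_white

ratio = rate_black / rate_white
print(f"raw totals: {shot_black / shot_white:.2f}x")
print(f"per capita: {ratio:.2f}x")
```

With those assumed shares the raw totals say "about half as many" while the per-capita rate says "well over twice as likely", which is in the same ballpark as the 3x claim.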
Cathy O’Neil has a great chapter on this in her book Weapons of Math Destruction.
She argues that self-reinforcing statistics are the problem. Police departments don’t have enough resources, so they use statistical modeling to deploy … troops? Is it troops?
You take a city like New York and overlay a map of crime statistics. Where there’s more crime, you deploy more officers.
Then those officers observe. They report the crime they see. And that reduces the crime rate, right?
More officers, more crimes spotted, more officers deployed, more crimes …
Imagine if a police officer wrote you up for every minor infraction. Just because they have nothing better to do at the moment. Crossed the street on a red? Crime. Littered? Crime. Took a swig of your alcoholic beverage? Crime. Loitered a bit? Crime.
Unfortunately, in the USA this over-policing falls along racial lines. Black people are more likely to have a police encounter. Just because there are more police in their neighborhoods. Just because the crime stats send them there.
Self-reinforcing statistic. Lying with facts.
Imagine if they did that with Wall Street bankers 😉
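If you want to see the loop run, here’s a toy simulation. Everything in it is made up: two hypothetical neighborhoods with the exact same number of real infractions, a guessed spot rate per officer, and a deployment rule that moves one officer per round to wherever more crime got recorded. It’s a sketch of the idea, not a model of any real police department.

```python
# Two hypothetical neighborhoods with the SAME true number of infractions.
TRUE_INFRACTIONS = 100        # per neighborhood, per round (made up)
SPOT_RATE_PER_OFFICER = 0.02  # share of infractions one officer notices (made up)

officers = {"A": 11, "B": 9}  # a tiny initial imbalance


def recorded_crime(n_officers):
    """Recorded crime depends on how many officers are watching."""
    return TRUE_INFRACTIONS * min(1.0, n_officers * SPOT_RATE_PER_OFFICER)


history = []
for round_no in range(5):
    stats = {hood: recorded_crime(n) for hood, n in officers.items()}
    history.append((dict(officers), stats))
    # "Data-driven" deployment: move one officer toward more recorded crime.
    hot = max(stats, key=stats.get)
    cold = min(stats, key=stats.get)
    if hot != cold and officers[cold] > 0:
        officers[hot] += 1
        officers[cold] -= 1

for deployed, stats in history:
    print(deployed, stats)
print("final deployment:", officers)
```

Both neighborhoods behave identically the whole time. The only thing that changes is where the officers stand, and the stats follow the officers.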
Dataviz helps you lie with facts
Self-reinforcing statistics are great, but difficult to explain. It took me years to grok them.
Here’s a simpler way 👉 dataviz.
Take total counts again. Terrible measure for comparison but put them on a map and it’s fantastic. Use the power of human processing and intuition to understand what’s going on.
Most maps just end up showing population density. Where there are more people there is more of everything.
Okay those have a bias for US Air Force bases too.
There are other ways to lie with dataviz facts
Axis manipulation is another great way to lie with facts and dataviz. Want to make a change look super big? Just don’t start at zero.
Look at this impressive email subscriber growth! WOW this software that costs $149/mo is really crushing it for me 😍
Oh wait, it’s just 379 net new subscribers in 3 months. 3% growth. Start the Y axis at 0 and you won’t even notice the growth.
That wouldn’t be very useful to visualize so let’s give ConvertKit a pass. Maybe my list just needs to grow more 😛
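You can put a number on how much a chopped axis exaggerates. This sketch compares how tall the last point looks next to the first one, with the axis at zero and with it cut off. The 379 is from above. The starting list size and the axis floor are hypothetical values I picked so the growth lands near 3%.

```python
# 379 net new subscribers is from the post. The starting size is a guess,
# picked so that the growth comes out to about 3%.
start_subs = 12_600          # hypothetical
end_subs = start_subs + 379


def visual_ratio(axis_min):
    """How many times taller the last point looks than the first one."""
    return (end_subs - axis_min) / (start_subs - axis_min)


actual_growth = end_subs / start_subs - 1
honest = visual_ratio(axis_min=0)
truncated = visual_ratio(axis_min=12_500)  # hypothetical axis floor

print(f"actual growth:          {actual_growth:.1%}")
print(f"height ratio, axis at 0: {honest:.2f}x")
print(f"height ratio, truncated: {truncated:.2f}x")
```

Same data, same 3%. Move the axis floor closer to the first value and the ratio gets as dramatic as you like.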
Here’s another great example.
From Georgia bragging about their success against the coronavirus. How they’d pretty much defeated it. Go Georgia!
Notice something fishy?
Here’s that same graph with X ordered by time
When you look at a graph that implies time, you think it’s ordered by time. You see trends. Humans are great at trends.
And data visualization manipulators can use that tendency to lie with facts.
What? We didn’t do anything wrong. The graph is clearly labeled. You’re the dumb doodoo who assumed X was ordered by time like we’re honest normal people.
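Here’s how little it takes, assuming the trick is sorting the bars by height instead of by date. The daily counts below are invented for illustration; they are not the numbers from the Georgia chart.

```python
# Made-up daily case counts, listed in time order. NOT the real chart data.
cases_by_day = {
    "day 1": 90, "day 2": 110, "day 3": 95, "day 4": 120,
    "day 5": 105, "day 6": 115, "day 7": 100,
}

chronological = list(cases_by_day.items())
by_count = sorted(cases_by_day.items(), key=lambda item: item[1], reverse=True)


def is_steadily_falling(points):
    """True when every value is no larger than the one before it."""
    counts = [count for _, count in points]
    return all(a >= b for a, b in zip(counts, counts[1:]))


print("sorted by count:", [day for day, _ in by_count])
print("looks like a decline?", is_steadily_falling(by_count))
print("in time order:  ", [day for day, _ in chronological])
print("actually a decline?", is_steadily_falling(chronological))
```

Any data set looks like a steady decline when you sort it from biggest to smallest. The X axis labels stay technically correct the whole time.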
Lying with factual bias
There’s a fantastic story from WW2.
Allied bombers kept coming home riddled with bullet holes from German air defense. They’d go over there, drop some bombs, turn into a sieve, and make it home.
Yeah, it was real bad.
So they did some statistics. Where do airplanes get the most holes?
Wonderful! Slap some armor on those spots and away we go. Problem solved.
Those are the airplanes that came home. The ones that didn’t come home had holes in the white areas. The holes we never got to measure because the planes died.
That’s where you put the armor.
This is a type of survivorship bias. You’re using facts to tell a story, but not all the facts.
Another way to lie with facts ✌️
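A small simulation shows the effect. The section names and loss chances are made up: every plane takes one hit, equally likely anywhere, and a hit to the engine is far more likely to bring the plane down. Then we only count holes on the planes that make it back.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

SECTIONS = ["fuselage", "wings", "tail", "engine"]
# Made-up chance that a hit in each section brings the plane down.
LOSS_CHANCE = {"fuselage": 0.1, "wings": 0.1, "tail": 0.1, "engine": 0.8}

all_hits = {s: 0 for s in SECTIONS}
survivor_hits = {s: 0 for s in SECTIONS}

for _ in range(10_000):
    # Every plane takes one hit, equally likely in any section.
    section = SECTIONS[int(rng.random() * len(SECTIONS))]
    all_hits[section] += 1
    if rng.random() >= LOSS_CHANCE[section]:
        survivor_hits[section] += 1  # only these get counted back at base


def share(hits, section):
    """Fraction of all counted hits that landed in this section."""
    return hits[section] / sum(hits.values())


for s in SECTIONS:
    print(f"{s:9} all planes: {share(all_hits, s):.0%}   "
          f"survivors: {share(survivor_hits, s):.0%}")
```

Engines get hit just as often as everything else, but on the planes you can inspect they look almost untouched. The data is accurate and the conclusion is backwards.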
The Texas sharpshooter fallacy is another version of this.
Give Joe Sixpack a gun and ask him to shoot at the side of a barn. 200 shots for good measure.
Then you inspect the barn and find the 10 bullet holes with the tightest clustering.
Congratulations, Joe Sixpack is a sharpshooter! In just 10 shots he got clustering so tight an Olympic shooter would be jealous.
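You can reproduce Joe’s marksmanship with random numbers. This sketch scatters 200 shots uniformly over a wall (the 10 m size is my invention) and then goes looking for a tight group of 10. The search is a simple nearest-neighbor heuristic, so it may not find the very tightest group, and it still makes Joe look great.

```python
import math
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# 200 shots scattered uniformly over a 10 m x 10 m barn wall (size is made up).
shots = [(rng.random() * 10, rng.random() * 10) for _ in range(200)]


def spread(points):
    """Average distance from the group's center."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.dist((x, y), (cx, cy)) for x, y in points) / len(points)


def tightest_group(points, size=10):
    """Group each shot with its nearest neighbors and keep the tightest group.

    A simple heuristic, not a guaranteed optimum.
    """
    best = None
    for center in points:
        group = sorted(points, key=lambda p: math.dist(p, center))[:size]
        if best is None or spread(group) < spread(best):
            best = group
    return best


cherry_picked = tightest_group(shots)
print(f"spread of all 200 shots: {spread(shots):.2f} m")
print(f"spread of the 'best' 10: {spread(cherry_picked):.2f} m")
```

Nobody aimed at anything. The cluster only exists because we drew the target after the shooting was done.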
Factual correlation lies
And finally, my favorite one – the lack of pirates caused global warming. True fact.
There’s even a graph about it. Perfect correlation and everything. Factual data.
You can see more of those over at Spurious Correlations.
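Getting a strong correlation out of unrelated things is easy: any two series that both drift over time will do. Here’s a sketch with invented numbers (not the data from the pirate graph) and a hand-rolled Pearson correlation.

```python
import math

# Made-up numbers for illustration. NOT the data from the pirate graph.
pirates = [5000, 3000, 1500, 500, 100, 20]           # hypothetical head count
temperature = [14.2, 14.4, 14.6, 14.9, 15.3, 15.8]   # hypothetical average, in C


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(pirates, temperature)
print(f"correlation: {r:.2f}")
```

One series goes down, the other goes up, and the math reports a strong negative correlation. It has no idea whether pirates and thermometers have anything to do with each other.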
Seen any good examples of lying with facts? Hit reply
PS: You should also follow me on twitter 👉 here.
It's where I go to shoot the shit about programming.