Tech interviews have a magic answer: The hashmap.
"Great solution, how would you optimize it?"
Hashmap!
You get the job and start to work. Your vast interview prep (and half of college) prepared you to answer every question with a hashmap. Sometimes a graph.
And in life that instinct is wrong.
I was thinking about this while writing the performance chapter of Serverless Handbook. What belief about performance shoots engineers in the foot? ๐ค
โHashmapโ is always the answer in an interview. But in real life itโs not. pic.twitter.com/zCK9lDKeS5
โ Swizec Teller (@Swizec) January 18, 2021
A note on big-O
We're going to use Big-O to describe algorithm complexity. A popularly abused notation in engineering circles.
Formally Big-O talks about function growth at a particular type of limit. Other useful notations include Big Theta and Big Omega about average and lower-bound performance respectively.
Proper analysis can be confusing. We'll use the abused version:
O(1)
means a single operation lookup (hashmap, certain set operations, reading a variable)O(logn)
means you divide remaining steps with each iteration (divide and conquer algorithms)O(n)
means you iterate the full input (iteration algorithms)O(n^2)
means you iterate the full input for every entry (exponent counts nested loops)
We ignore constant factors โ O(2n) == O(n)
โ and that's where interview best practices break down.
An easy phone screen question
At Tia we ask senior fullstack candidates this question in the phone screen:
Given a newspaper article and a sentence, write a function that answers "Can you create the sentence out of words in the article?". Use any language. Words delimited by spaces.
Long for a phone screen, but we pay well ๐
You look at this problem and hashmap is the obvious answer. Count words in the article, count words in the sentence, compare.
def countWords(list):
counts = {}
for word in list:
if counts[word]:
counts[word] += 1
else:
counts[word] = 1
return counts
def canBuildSentence(article, sentence):
haystack = countWords(article.split(' '))
needle = countWords(sentence.split(' '))
for word in needle.keys():
haystack[word] -= needle[word]
if haystack[word] < 0:
return False
return True
A roughly O(n)
algorithm, fantastic.
You iterate the article to build a hashmap โ O(n)
. Each word lookup is O(1)
because hashmap. Iterate the sentence, another O(n)
. Then you iterate the keys from your sentence and do O(1)
lookups.
Can you improve performance?
This algorithm is hard on memory. We like to ignore memory constraints in 2021 because memory is cheap.
But you could get ridiculously large inputs.
The longest sentence in English literature contains 13,955 words. The New Yorker's Plague Year article is around 30,000 words long. Took the entire magazine. Wikipedia's longest article has 634kB of text โย about 33,000 words.
<tweet redacted>
Curveballs are fun!
Constants matter
You're not gonna run out of memory. All that text barely makes a dent next to your execution environment. But do you need to build two hashmaps?
Serverless environments pay per millisecond. Can you shave off milliseconds?
๐ค
To check the longest sentence against Joe Biden's endorsements, takes 33000 + 13955 + 13955 = 60,910
operations. Say each one takes 1/100th of a millisecond.
Your algorithm runs in 609 milliseconds. That's slow, mate. User-noticeable latency.
You can remove a hashmap:
def canBuildSentence(article, sentence):
haystack = countWords(article.split(' '))
for word in sentence.split(' '):
haystack[word] -= 1
if haystack[word] < 0:
return False
return True
Now you're down to 33000 + 13955 = 46,955
operations. 23% better performance, 23% lower cost, better conversions, more sales.
Same O(n)
complexity. ๐คฏ
When hashmaps aren't the answer
What if you know inputs are small?
A sentence is never longer than 5 unique words and an article never shorter than 5,000. Do you change your approach?
You don't have to. The same algorithm works โย 5005 operations.
Or you can leverage English word frequency. The top 1000 words cover 85.5% of uses, 10,000 gets you to 97.2%.
There's an 85.5% chance you'll find those 5 words in the first 1000 words of your article. Hmmmm ๐ค
A search algorithm might work better:
def wordInHaystack(word, haystack):
for needle in haystack:
if needle == word:
return True
return False
def canBuildSentence(article, sentence):
haystack = article.split(' ')
for word in sentence.split(' '):
if !wordInHaystack(word, haystack):
return False
return True
Iterate the article looking for a word. 85.5% of the time, you look at fewer than 1000 words. 15.5% of the time, you look at 5000.
Your algorithm has an expected operation count of E(5\*(1000\*0.855 + 5000\*0.155)) = 8150
. Which is worse than 5005.
But it's important to check. It works for 2-word sentences and shorter ๐
Or when you get lucky.
Cheers,
~Swizec
PS: you can sort both inputs, consume arrays, and compare that way. It's O(nlogn)
complexity but zero overhead and suitable for streaming/distributed solutions.
PPS: a great discussion on StackOverflow comparing Quicksort and Mergesort โย both typical O(nlogn) algorithms โย and how they behave differently in real world applications. Fun to skim โ๏ธ
Continue reading about Tech interview best practice that fails in life
Semantically similar articles hand-picked by GPT-4
- Google sent me a "what to know in on-site interviews" email. Here it is.
- Some life advice I learned from computer science
- Why others' code is hard to navigate
- That time serverless melted my credit card
- Logging 1,721,410 events per day with Postgres, Rails, Heroku, and a bit of JavaScript
Learned something new?
Read more Software Engineering Lessons from Production
I write articles with real insight into the career and skills of a modern software engineer. "Raw and honest from the heart!" as one reader described them. Fueled by lessons learned over 20 years of building production code for side-projects, small businesses, and hyper growth startups. Both successful and not.
Subscribe below ๐
Software Engineering Lessons from Production
Join Swizec's Newsletter and get insightful emails ๐ on mindsets, tactics, and technical skills for your career. Real lessons from building production software. No bullshit.
"Man, love your simple writing! Yours is the only newsletter I open and only blog that I give a fuck to read & scroll till the end. And wow always take away lessons with me. Inspiring! And very relatable. ๐"
Have a burning question that you think I can answer? Hit me up on twitter and I'll do my best.
Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.
Want to become a true senior engineer? Take ownership, have autonomy, and be a force multiplier on your team. The Senior Engineer Mindset ebook can help ๐ swizec.com/senior-mindset. These are the shifts in mindset that unlocked my career.
Curious about Serverless and the modern backend? Check out Serverless Handbook, for frontend engineers ๐ ServerlessHandbook.dev
Want to Stop copy pasting D3 examples and create data visualizations of your own? Learn how to build scalable dataviz React components your whole team can understand with React for Data Visualization
Want to get my best emails on JavaScript, React, Serverless, Fullstack Web, or Indie Hacking? Check out swizec.com/collections
Did someone amazing share this letter with you? Wonderful! You can sign up for my weekly letters for software engineers on their path to greatness, here: swizec.com/blog
Want to brush up on your modern JavaScript syntax? Check out my interactive cheatsheet: es6cheatsheet.com
By the way, just in case no one has told you it yet today: I love and appreciate you for who you are โค๏ธ