Tech interview best practice that fails in life

Tech interviews have a magic answer: The hashmap.

"Great solution, how would you optimize it?"

Hashmap!

You get the job and start to work. Your vast interview prep (and half of college) prepared you to answer every question with a hashmap. Sometimes a graph.

And in life that instinct is wrong.

I was thinking about this while writing the performance chapter of Serverless Handbook. What belief about performance shoots engineers in the foot? 🤔

“Hashmap” is always the answer in an interview. But in real life it’s not. pic.twitter.com/zCK9lDKeS5
— Swizec Teller (@Swizec) January 18, 2021

A note on big-O

We're going to use Big-O to describe algorithm complexity. A popularly abused notation in engineering circles.

Formally Big-O talks about function growth at a particular type of limit. Other useful notations include Big Theta and Big Omega about average and lower-bound performance respectively.

Proper analysis can be confusing. We'll use the abused version:

O(1) means a single operation lookup (hashmap, certain set operations, reading a variable)
O(logn) means you divide remaining steps with each iteration (divide and conquer algorithms)
O(n) means you iterate the full input (iteration algorithms)
O(n^2) means you iterate the full input for every entry (exponent counts nested loops)

We ignore constant factors – O(2n) == O(n) – and that's where interview best practices break down.

You iterate the article to build a hashmap – O(n). Each word lookup is O(1) because hashmap. Iterate the sentence, another O(n). Then you iterate the keys from your sentence and do O(1) lookups.

Can you improve performance?

This algorithm is hard on memory. We like to ignore memory constraints in 2021 because memory is cheap.

But you could get ridiculously large inputs.

The longest sentence in English literature contains 13,955 words. The New Yorker's Plague Year article is around 30,000 words long. Took the entire magazine. Wikipedia's longest article has 634kB of text – about 33,000 words.

Curveballs are fun!

Constants matter

You're not gonna run out of memory. All that text barely makes a dent next to your execution environment. But do you need to build two hashmaps?

Serverless environments pay per millisecond. Can you shave off milliseconds?

🤔

To check the longest sentence against Joe Biden's endorsements, takes 33000 + 13955 + 13955 = 60,910 operations. Say each one takes 1/100th of a millisecond.

Your algorithm runs in 609 milliseconds. That's slow, mate. User-noticeable latency.

You can remove a hashmap:

def canBuildSentence(article, sentence):
	haystack = countWords(article.split(' '))

	for word in sentence.split(' '):
		haystack[word] -= 1
		if haystack[word] < 0:
			return False

	return True

Now you're down to 33000 + 13955 = 46,955 operations. 23% better performance, 23% lower cost, better conversions, more sales.

Same O(n) complexity. 🤯

When hashmaps aren't the answer

What if you know inputs are small?

A sentence is never longer than 5 unique words and an article never shorter than 5,000. Do you change your approach?

You don't have to. The same algorithm works – 5005 operations.

Or you can leverage English word frequency. The top 1000 words cover 85.5% of uses, 10,000 gets you to 97.2%.

There's an 85.5% chance you'll find those 5 words in the first 1000 words of your article. Hmmmm 🤔

A search algorithm might work better:

def wordInHaystack(word, haystack):
	for needle in haystack:
		if needle == word:
			return True
	return False

def canBuildSentence(article, sentence):
	haystack = article.split(' ')

	for word in sentence.split(' '):
		if !wordInHaystack(word, haystack):
			return False

	return True

Iterate the article looking for a word. 85.5% of the time, you look at fewer than 1000 words. 15.5% of the time, you look at 5000.

Your algorithm has an expected operation count of E(5\*(1000\*0.855 + 5000\*0.155)) = 8150. Which is worse than 5005.

But it's important to check. It works for 2-word sentences and shorter 😇

Or when you get lucky.

Cheers,
~Swizec

PS: you can sort both inputs, consume arrays, and compare that way. It's O(nlogn) complexity but zero overhead and suitable for streaming/distributed solutions.

PPS: a great discussion on StackOverflow comparing Quicksort and Mergesort – both typical O(nlogn) algorithms – and how they behave differently in real world applications. Fun to skim ✌️

Published on January 22nd, 2021 in Technical, Computer Science

Did you enjoy this article?

👎👍

Continue reading about Tech interview best practice that fails in life

Semantically similar articles hand-picked by GPT-4

Senior Mindset Book

Get promoted, earn a bigger salary, work for top companies

Learn more

Have a burning question that you think I can answer? Hit me up on twitter and I'll do my best.

Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.

Want to become a true senior engineer? Take ownership, have autonomy, and be a force multiplier on your team. The Senior Engineer Mindset ebook can help 👉 swizec.com/senior-mindset. These are the shifts in mindset that unlocked my career.

Curious about Serverless and the modern backend? Check out Serverless Handbook, for frontend engineers 👉 ServerlessHandbook.dev

Want to Stop copy pasting D3 examples and create data visualizations of your own? Learn how to build scalable dataviz React components your whole team can understand with React for Data Visualization

Want to get my best emails on JavaScript, React, Serverless, Fullstack Web, or Indie Hacking? Check out swizec.com/collections

Did someone amazing share this letter with you? Wonderful! You can sign up for my weekly letters for software engineers on their path to greatness, here: swizec.com/blog

Want to brush up on your modern JavaScript syntax? Check out my interactive cheatsheet: es6cheatsheet.com

By the way, just in case no one has told you it yet today: I love and appreciate you for who you are ❤️

Senior Mindset Book

Start with a free chapter

Tech interview best practice that fails in life

A note on big-O

An easy phone screen question

Can you improve performance?

Constants matter

When hashmaps aren't the answer

Did you enjoy this article?

Continue reading about Tech interview best practice that fails in life

Learned something new?
Read more Software Engineering Lessons from Production

Software Engineering Lessons from Production

Senior Mindset Book