Swizec Teller - a geek with a hatswizec.com

Senior Mindset Book

Get promoted, earn a bigger salary, work for top companies

Senior Engineer Mindset cover
Learn more

    Tech interview best practice that fails in life

    Tech interviews have a magic answer: The hashmap.

    "Great solution, how would you optimize it?"

    Hashmap!

    You get the job and start to work. Your vast interview prep (and half of college) prepared you to answer every question with a hashmap. Sometimes a graph.

    And in life that instinct is wrong.

    I was thinking about this while writing the performance chapter of Serverless Handbook. What belief about performance shoots engineers in the foot? ๐Ÿค”

    A note on big-O

    We're going to use Big-O to describe algorithm complexity. A popularly abused notation in engineering circles.

    Formally Big-O talks about function growth at a particular type of limit. Other useful notations include Big Theta and Big Omega about average and lower-bound performance respectively.

    Proper analysis can be confusing. We'll use the abused version:

    • O(1) means a single operation lookup (hashmap, certain set operations, reading a variable)
    • O(logn) means you divide remaining steps with each iteration (divide and conquer algorithms)
    • O(n) means you iterate the full input (iteration algorithms)
    • O(n^2) means you iterate the full input for every entry (exponent counts nested loops)

    We ignore constant factors โ€“ O(2n) == O(n) โ€“ and that's where interview best practices break down.

    An easy phone screen question

    At Tia we ask senior fullstack candidates this question in the phone screen:

    Given a newspaper article and a sentence, write a function that answers "Can you create the sentence out of words in the article?". Use any language. Words delimited by spaces.

    Long for a phone screen, but we pay well ๐Ÿ˜›

    You look at this problem and hashmap is the obvious answer. Count words in the article, count words in the sentence, compare.

    def countWords(list):
    	counts = {}
    	for word in list:
    		if counts[word]:
    			counts[word] += 1
    		else:
    			counts[word] = 1
    	return counts
    
    def canBuildSentence(article, sentence):
    	haystack = countWords(article.split(' '))
    	needle = countWords(sentence.split(' '))
    
    	for word in needle.keys():
    		haystack[word] -= needle[word]
    		if haystack[word] < 0:
    			return False
    
    	return True
    

    A roughly O(n) algorithm, fantastic.

    You iterate the article to build a hashmap โ€“ O(n). Each word lookup is O(1) because hashmap. Iterate the sentence, another O(n). Then you iterate the keys from your sentence and do O(1) lookups.

    Can you improve performance?

    This algorithm is hard on memory. We like to ignore memory constraints in 2021 because memory is cheap.

    But you could get ridiculously large inputs.

    The longest sentence in English literature contains 13,955 words. The New Yorker's Plague Year article is around 30,000 words long. Took the entire magazine. Wikipedia's longest article has 634kB of text โ€“ย about 33,000 words.

    <tweet redacted>

    Curveballs are fun!

    Constants matter

    You're not gonna run out of memory. All that text barely makes a dent next to your execution environment. But do you need to build two hashmaps?

    Serverless environments pay per millisecond. Can you shave off milliseconds?

    ๐Ÿค”

    To check the longest sentence against Joe Biden's endorsements, takes 33000 + 13955 + 13955 = 60,910 operations. Say each one takes 1/100th of a millisecond.

    Your algorithm runs in 609 milliseconds. That's slow, mate. User-noticeable latency.

    You can remove a hashmap:

    def canBuildSentence(article, sentence):
    	haystack = countWords(article.split(' '))
    
    	for word in sentence.split(' '):
    		haystack[word] -= 1
    		if haystack[word] < 0:
    			return False
    
    	return True
    

    Now you're down to 33000 + 13955 = 46,955 operations. 23% better performance, 23% lower cost, better conversions, more sales.

    Same O(n) complexity. ๐Ÿคฏ

    When hashmaps aren't the answer

    What if you know inputs are small?

    A sentence is never longer than 5 unique words and an article never shorter than 5,000. Do you change your approach?

    You don't have to. The same algorithm works โ€“ย 5005 operations.

    Or you can leverage English word frequency. The top 1000 words cover 85.5% of uses, 10,000 gets you to 97.2%.

    There's an 85.5% chance you'll find those 5 words in the first 1000 words of your article. Hmmmm ๐Ÿค”

    A search algorithm might work better:

    def wordInHaystack(word, haystack):
    	for needle in haystack:
    		if needle == word:
    			return True
    	return False
    
    def canBuildSentence(article, sentence):
    	haystack = article.split(' ')
    
    	for word in sentence.split(' '):
    		if !wordInHaystack(word, haystack):
    			return False
    
    	return True
    

    Iterate the article looking for a word. 85.5% of the time, you look at fewer than 1000 words. 15.5% of the time, you look at 5000.

    Your algorithm has an expected operation count of E(5\*(1000\*0.855 + 5000\*0.155)) = 8150. Which is worse than 5005.

    But it's important to check. It works for 2-word sentences and shorter ๐Ÿ˜‡

    Or when you get lucky.

    Cheers,
    ~Swizec

    PS: you can sort both inputs, consume arrays, and compare that way. It's O(nlogn) complexity but zero overhead and suitable for streaming/distributed solutions.

    PPS: a great discussion on StackOverflow comparing Quicksort and Mergesort โ€“ย both typical O(nlogn) algorithms โ€“ย and how they behave differently in real world applications. Fun to skim โœŒ๏ธ

    Published on January 22nd, 2021 in Technical, Computer Science

    Did you enjoy this article?

    Continue reading about Tech interview best practice that fails in life

    Semantically similar articles hand-picked by GPT-4

    Senior Mindset Book

    Get promoted, earn a bigger salary, work for top companies

    Learn more

    Have a burning question that you think I can answer? Hit me up on twitter and I'll do my best.

    Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.

    Want to become a true senior engineer? Take ownership, have autonomy, and be a force multiplier on your team. The Senior Engineer Mindset ebook can help ๐Ÿ‘‰ swizec.com/senior-mindset. These are the shifts in mindset that unlocked my career.

    Curious about Serverless and the modern backend? Check out Serverless Handbook, for frontend engineers ๐Ÿ‘‰ ServerlessHandbook.dev

    Want to Stop copy pasting D3 examples and create data visualizations of your own? Learn how to build scalable dataviz React components your whole team can understand with React for Data Visualization

    Want to get my best emails on JavaScript, React, Serverless, Fullstack Web, or Indie Hacking? Check out swizec.com/collections

    Did someone amazing share this letter with you? Wonderful! You can sign up for my weekly letters for software engineers on their path to greatness, here: swizec.com/blog

    Want to brush up on your modern JavaScript syntax? Check out my interactive cheatsheet: es6cheatsheet.com

    By the way, just in case no one has told you it yet today: I love and appreciate you for who you are โค๏ธ

    Created by Swizec with โค๏ธ