Swizec Teller - a geek with a hatswizec.com

    Tech interview best practice that fails in life

    Tech interviews have a magic answer: The hashmap.

    "Great solution, how would you optimize it?"

    Hashmap!

    You get the job and start to work. Your vast interview prep (and half of college) prepared you to answer every question with a hashmap. Sometimes a graph.

    And in life that instinct is wrong.

    I was thinking about this while writing the performance chapter of Serverless Handbook. What belief about performance shoots engineers in the foot? 🤔

    Swizec Teller published ServerlessHandbook.dev avatarSwizec Teller published ServerlessHandbook.dev@Swizec
    “Hashmap” is always the answer in an interview. But in real life it’s not.
    Tweet media

    A note on big-O

    We're going to use Big-O to describe algorithm complexity. A popularly abused notation in engineering circles.

    Formally Big-O talks about function growth at a particular type of limit. Other useful notations include Big Theta and Big Omega about average and lower-bound performance respectively.

    Proper analysis can be confusing. We'll use the abused version:

    • O(1) means a single operation lookup (hashmap, certain set operations, reading a variable)
    • O(logn) means you divide remaining steps with each iteration (divide and conquer algorithms)
    • O(n) means you iterate the full input (iteration algorithms)
    • O(n^2) means you iterate the full input for every entry (exponent counts nested loops)

    We ignore constant factors – O(2n) == O(n) – and that's where interview best practices break down.

    An easy phone screen question

    At Tia we ask senior fullstack candidates this question in the phone screen:

    Given a newspaper article and a sentence, write a function that answers "Can you create the sentence out of words in the article?". Use any language. Words delimited by spaces.

    Long for a phone screen, but we pay well 😛

    You look at this problem and hashmap is the obvious answer. Count words in the article, count words in the sentence, compare.

    def countWords(list):
    counts = {}
    for word in list:
    if counts[word]:
    counts[word] += 1
    else:
    counts[word] = 1
    return counts
    def canBuildSentence(article, sentence):
    haystack = countWords(article.split(' '))
    needle = countWords(sentence.split(' '))
    for word in needle.keys():
    haystack[word] -= needle[word]
    if haystack[word] < 0:
    return False
    return True

    A roughly O(n) algorithm, fantastic.

    You iterate the article to build a hashmap – O(n). Each word lookup is O(1) because hashmap. Iterate the sentence, another O(n). Then you iterate the keys from your sentence and do O(1) lookups.

    Can you improve performance?

    This algorithm is hard on memory. We like to ignore memory constraints in 2021 because memory is cheap.

    But you could get ridiculously large inputs.

    The longest sentence in English literature contains 13,955 words. The New Yorker's Plague Year article is around 30,000 words long. Took the entire magazine. Wikipedia's longest article has 634kB of text – about 33,000 words.

    Kiran 🗳 avatarKiran 🗳@MichiganKiran
    The longest Wikipedia article is Joe Biden’s 2020 endorsements
    Tweet media

    Curveballs are fun!

    Constants matter

    You're not gonna run out of memory. All that text barely makes a dent next to your execution environment. But do you need to build two hashmaps?

    Serverless environments pay per millisecond. Can you shave off milliseconds?

    🤔

    To check the longest sentence against Joe Biden's endorsements, takes 33000 + 13955 + 13955 = 60,910 operations. Say each one takes 1/100th of a millisecond.

    Your algorithm runs in 609 milliseconds. That's slow, mate. User-noticeable latency.

    You can remove a hashmap:

    def canBuildSentence(article, sentence):
    haystack = countWords(article.split(' '))
    for word in sentence.split(' '):
    haystack[word] -= 1
    if haystack[word] < 0:
    return False
    return True

    Now you're down to 33000 + 13955 = 46,955 operations. 23% better performance, 23% lower cost, better conversions, more sales.

    Same O(n) complexity. 🤯

    When hashmaps aren't the answer

    What if you know inputs are small?

    A sentence is never longer than 5 unique words and an article never shorter than 5,000. Do you change your approach?

    You don't have to. The same algorithm works – 5005 operations.

    Or you can leverage English word frequency. The top 1000 words cover 85.5% of uses, 10,000 gets you to 97.2%.

    There's an 85.5% chance you'll find those 5 words in the first 1000 words of your article. Hmmmm 🤔

    A search algorithm might work better:

    def wordInHaystack(word, haystack):
    for needle in haystack:
    if needle == word:
    return True
    return False
    def canBuildSentence(article, sentence):
    haystack = article.split(' ')
    for word in sentence.split(' '):
    if !wordInHaystack(word, haystack):
    return False
    return True

    Iterate the article looking for a word. 85.5% of the time, you look at fewer than 1000 words. 15.5% of the time, you look at 5000.

    Your algorithm has an expected operation count of E(5\*(1000\*0.855 + 5000\*0.155)) = 8150. Which is worse than 5005.

    But it's important to check. It works for 2-word sentences and shorter 😇

    Or when you get lucky.

    Cheers,
    ~Swizec

    PS: you can sort both inputs, consume arrays, and compare that way. It's O(nlogn) complexity but zero overhead and suitable for streaming/distributed solutions.

    PPS: a great discussion on StackOverflow comparing Quicksort and Mergesort – both typical O(nlogn) algorithms – and how they behave differently in real world applications. Fun to skim ✌️

    Did you enjoy this article?

    Published on January 22nd, 2021 in Technical, Computer Science

    Learned something new?
    Want to become an expert?

    Here's how it works 👇

    Leave your email and I'll send you thoughtfully written emails every week about React, JavaScript, and your career. Lessons learned over 20 years in the industry working with companies ranging from tiny startups to Fortune5 behemoths.

    Join Swizec's Newsletter

    And get thoughtful letters 💌 on mindsets, tactics, and technical skills for your career. Real lessons from building production software. No bullshit.

    "Man, love your simple writing! Yours is the only newsletter I open and only blog that I give a fuck to read & scroll till the end. And wow always take away lessons with me. Inspiring! And very relatable. 👌"

    ~ Ashish Kumar

    Join over 10,000 engineers just like you already improving their careers with my letters, workshops, courses, and talks. ✌️

    Have a burning question that you think I can answer? I don't have all of the answers, but I have some! Hit me up on twitter or book a 30min ama for in-depth help.

    Ready to Stop copy pasting D3 examples and create data visualizations of your own?  Learn how to build scalable dataviz components your whole team can understand with React for Data Visualization

    Curious about Serverless and the modern backend? Check out Serverless Handbook, modern backend for the frontend engineer.

    Ready to learn how it all fits together and build a modern webapp from scratch? Learn how to launch a webapp and make your first 💰 on the side with ServerlessReact.Dev

    Want to brush up on your modern JavaScript syntax? Check out my interactive cheatsheet: es6cheatsheet.com

    By the way, just in case no one has told you it yet today: I love and appreciate you for who you are ❤️

    Created bySwizecwith ❤️