Swizec Teller - a geek with a hat | swizec.com


    How I answer the door with AWS Lambda and Twilio

    Remember the doorbell Slack bot I built in 2016? Probably not, but the girl whose sanity it saved became one of my best friends. For 2 years she could focus on work instead of answering the phone every time Yup Inc got a visitor.

    Then we moved and the bot died.

    2 weeks ago I moved and the new apartment has a front gate buzzer, but no package concierge. For the first time in 5 years we're gonna have to answer the door! 😱

    Millennials are killing the doorbell industry by texting "here" but that don't work for deliveries. Especially if you're not home.

    The buzzer looks like this:


    Delivery person finds your name, taps call and the box calls a pre-programmed phone number. You pick up the phone, talk to the person, and press 9 to let them in. They drop off the package behind a locked gate and people don't steal it. 👌

    Now here's the thing: My phone is where phone calls go to die. And I don't want my girlfriend to be on the hook every time I order something from Giant Dildos Dot Com.

    The box accepts 1 phone number.

    So I sat down for 3 hours and banged out a serverless app that answers the door, transcribes the audio, sends us a text, waits for reply, and opens the door if you say YES. 🤙

    Click through for source

    Final integration test around the 3:00:35 mark

    You can read the code on GitHub

    Code contains my old Twilio Auth Token because lazy. Someone racked up $300 in fraudulent calls within minutes. That was dumb.

    Click through for source

    How you can answer the door with AWS Lambda and Twilio


    👆 sketch of how it works. It's harder to draw than I thought. Here's a description of the process:

    1. Delivery person makes call
    2. Twilio picks up the phone
    3. Twilio sends request to AWS Lambda "What should I say?"
    4. Lambda responds with instructions
        4.1. Say "Welcome, what do you want? State your business after the beep and press any key"
        4.2. Record response
        4.3. Wait for 60 seconds
    5. Twilio talks to callbox
    6. Person says what they want and presses a key
    7. Twilio sends recording to AWS Lambda "now what?"
    8. Lambda responds with further instructions
        8.1. Say "Thanks, someone will let you in"
        8.2. Pause for 60 seconds
        8.3. If still in call, say "Sorry, nobody responded"
        8.4. Hang up
    9. Twilio sends all that to callbox and pauses the call
    10. In parallel, Twilio transcribes the recording
    11. Twilio sends transcript to AWS Lambda "here transcript, now what?"
    12. Lambda saves Call ID and callbox number in DB for later
    13. Lambda tells Twilio to text Swiz
    14. Twilio sends text
    15. Swiz sees text and replies with YES
    16. Twilio gets text response and sends to AWS Lambda "here, response. Now what?"
    17. Lambda looks up original Call ID
        17.1. If no ongoing call, bail
        17.2. If more than 60 seconds since call, bail (it hung up)
    18. Lambda checks if my text matches yes
    19. Send voice call response to Twilio
        19.1. Say "Letting you in" or "Sorry, you can't come in"
        19.2. Dial 9
    20. Twilio sends all that to callbox
    21. Door unlocks, delivery gets delivered
    22. Lambda sends text to Swizec saying "All good, person was let in"

    Sounds complicated, right? Thanks to AWS Lambda and Serverless it's pretty easy 👉 Each step becomes a standalone JavaScript function. The sophistication comes from how they work together.

    Like I mention in the Serverless Pros & Cons chapter of Serverless Handbook:

    Serverless lets you trade function complexity for systems complexity. Individual pieces are easier to build & test, but the system becomes hairier.

    You can see this in action during the livestream. Every few minutes we integration test the next piece of the puzzle. 🤘


    Step 1: Picking up the phone

    This is the first Lambda in our system. It answers the phone when Twilio converts it to an API POST request.

    Click through for source

    Twilio sends a POST request with various params, which we ignore since our response is always the same: A TwiML message constructed via Twilio's node library.

    TwiML is Twilio's XML-based markup language for responding to voice calls and text messages.

    response.say() turns into a <Say>Hello</Say> line and becomes a spoken computer voice. response.record() allows us to record the person's reply.

    In this case we're giving a 60 second timeout, asking Twilio to transcribe, and telling it to send the recording to an acceptRecording endpoint. Twilio is smart enough to handle relative URLs so we don't have to worry about that.
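Here's a minimal sketch of what that first response could look like. To stay dependency-free it builds the TwiML string by hand instead of using Twilio's node library like the real code does. The function names and the relative `acceptRecording` / `acceptTranscript` paths are assumptions for illustration, not copied from the repo.

```javascript
// Escape text so it is safe inside an XML element or attribute.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the TwiML that greets the visitor and records their answer.
// Hand-rolled XML for illustration; the post uses Twilio's node library.
function answerCallTwiml() {
  const greeting =
    "Welcome, what do you want? State your business after the beep and press any key";
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Say>${escapeXml(greeting)}</Say>`,
    // timeout: the 60 second timeout from the post
    // action: where Twilio POSTs once the recording ends (assumed path)
    // transcribeCallback: where the transcript arrives later (assumed path)
    '  <Record timeout="60" transcribe="true"' +
      ' action="acceptRecording" transcribeCallback="acceptTranscript"/>',
    "</Response>",
  ].join("\n");
}

console.log(answerCallTwiml());
```

The `<Say>` and `<Record>` elements are what `response.say()` and `response.record()` produce when you go through the library.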

    We use Twilio's dashboard to map a phone number to an API endpoint.


    Step 2: Accept voice recording, ask to wait

    After the person says what they want, they press a button. This tells Twilio to stop recording and talk to our next lambda: acceptRecording.

    Click through for source

    Same spiel as before 👉 we get a POST request and respond with some TwiML constructed with Twilio's node library. Let the person know someone's about to answer the door, wait 60 seconds, and if nothing happens deliver the bad news.
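A rough sketch of the TwiML this lambda might return, again hand-built so it runs without the Twilio library. The function name `waitForAnswerTwiml` is made up; the spoken lines come from the flow described earlier.

```javascript
// Build the "please wait" TwiML returned after the recording ends.
// Hand-rolled XML for illustration; the post uses Twilio's node library.
function waitForAnswerTwiml(waitSeconds = 60) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Say>Thanks, someone will let you in</Say>",
    // Keep the call open while we wait for a text reply.
    `  <Pause length="${waitSeconds}"/>`,
    // Only reached if nothing took over the call during the pause.
    "  <Say>Sorry, nobody responded</Say>",
    "  <Hangup/>",
    "</Response>",
  ].join("\n");
}

console.log(waitForAnswerTwiml());
```

Twilio runs these verbs top to bottom, so the bad news only plays if the pause runs out.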

    Btw, the sendTwiml function is a helper to avoid code duplication:

    Click through for source

    Status Code 200 means request succeeded, content type application/xml so Twilio API doesn't get confused, and twiml converted to a string as the body.
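My guess at the shape of that helper, assuming the usual Lambda proxy response format. The body below is a reconstruction from the description, not the code from the repo.

```javascript
// Wrap a TwiML document in a Lambda proxy response.
// `twiml` can be a string or any object with a toString() method.
function sendTwiml(twiml) {
  return {
    statusCode: 200, // request succeeded
    headers: { "Content-Type": "application/xml" }, // so Twilio parses it as TwiML
    body: String(twiml), // the response body has to be a string
  };
}

console.log(sendTwiml("<Response><Say>Hello</Say></Response>"));
```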

    Step 3: Accept transcript, send text

    Twilio's transcript API doesn't let you send TwiML into a phone call. That's why this is separate from the recording lambda, which just replies.

    Click through for source

    This time we do care about params Twilio sends with their request:

    • RecordingUrl is the audio file I can listen to
    • TranscriptionText is the machine transcription of the audio, usually good, sometimes hilariously wrong
    • CallSid is the original call ID, we'll need it to hook back into the call
    • Called is the phone number that was called, which helps us identify the callbox (future proofing, if I productize)

    We use updateItem to save the (CallSid, Called) pair in DynamoDB. Our next lambda will use this to hook into the original call and to keep track of whether the call was handled yet. Great when multiple people reply YES to the same call.
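A sketch of the parsing and saving part, under a few loud assumptions: Twilio's webhook body arrives form-encoded in `event.body`, a plain `Map` stands in for DynamoDB, records are keyed by the `Called` number, and the SMS wording is invented. None of the function names below come from the repo.

```javascript
// In-memory stand-in for the DynamoDB table (assumption for this sketch).
const callsTable = new Map();

// Twilio posts form-encoded bodies; pull out the four fields we care about.
function parseTranscriptEvent(event) {
  const params = new URLSearchParams(event.body || "");
  return {
    recordingUrl: params.get("RecordingUrl"),
    transcript: params.get("TranscriptionText"),
    callSid: params.get("CallSid"),
    called: params.get("Called"),
  };
}

// Remember the call so the SMS handler can hook back into it later.
function saveCall({ callSid, called }, now = Date.now()) {
  callsTable.set(called, { callSid, called, createdAt: now, handled: false });
}

// Text for the resident (made-up wording).
function buildSms({ transcript, recordingUrl }) {
  return `Someone at the door: "${transcript}". Reply YES to let them in. ${recordingUrl}`;
}

const sampleEvent = {
  body:
    "CallSid=CA123&Called=%2B15550001111" +
    "&TranscriptionText=package+for+you&RecordingUrl=https%3A%2F%2Fexample.com%2Frec",
};
const parsedCall = parseTranscriptEvent(sampleEvent);
saveCall(parsedCall);
console.log(buildSms(parsedCall));
```

The `handled: false` flag is what later stops a second YES from acting on the same call.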


    Step 4: Handle SMS reply

    Handling the YES reply to that SMS gets tricky. A bunch of situations to consider: What if there's no call? What if they're late? What if someone else said YES already?

    So we build the main handler method with a big conditional and call helper methods.

    Click through for source

    The handler takes the SMS Body and To phone number from Twilio's POST request and checks the DynamoDB database.

    If there's a call and it hasn't been handled and it's not too late, we call closeDoor or openDoor based on the text Body. Otherwise we reply with a text saying there's no call, it's too late, or all is well.
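That big conditional can be sketched as one pure function that's easy to test on its own. The record shape (`createdAt`, `handled`), the action names, and the YES matching rule are assumptions; the 60 second window comes from the pause in the earlier step.

```javascript
const ANSWER_WINDOW_MS = 60 * 1000; // the waiting call gives up after 60 seconds

// Decide what to do with an SMS reply.
// `call` is the stored record for this number, or undefined if there is none.
function decideSmsAction(call, smsBody, now = Date.now()) {
  if (!call) return "NO_CALL";
  if (call.handled) return "ALREADY_HANDLED";
  if (now - call.createdAt > ANSWER_WINDOW_MS) return "TOO_LATE";
  // Anything that doesn't start with "yes" keeps the door closed.
  return /^\s*yes\b/i.test(smsBody) ? "OPEN_DOOR" : "CLOSE_DOOR";
}

const exampleCall = { callSid: "CA123", createdAt: 1000, handled: false };
console.log(decideSmsAction(exampleCall, "YES", 2000));
console.log(decideSmsAction(exampleCall, "nope", 2000));
console.log(decideSmsAction(exampleCall, "YES", 1000 + 61 * 1000));
```

Keeping the decision separate from the Twilio and database calls means each branch can be checked without a phone in hand.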

    The openDoor and closeDoor functions are similar. They both call continueCall to talk to the callbox and send a text to let me know the deed is done.

    Click through for source

    Oh and they update the database to say call handled. Same updateItem method as before :)
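Since the two functions are so similar, one way to sketch them is a shared helper with the side effects passed in. Everything here is hypothetical: `finishCall`, the `deps` object, the argument lists, and the "turned away" message are my own; only the spoken lines and the "All good" text come from the flow above.

```javascript
// Shared flow for both outcomes. The three helpers are injected so this
// sketch runs without Twilio or DynamoDB.
async function finishCall(call, letIn, { continueCall, markHandled, sendText }) {
  const spoken = letIn ? "Letting you in" : "Sorry, you can't come in";
  await continueCall(call.callSid, spoken, letIn); // talk to the callbox
  await markHandled(call); // so a second reply is ignored
  await sendText(letIn ? "All good, person was let in" : "Person was turned away");
}

const openDoor = (call, deps) => finishCall(call, true, deps);
const closeDoor = (call, deps) => finishCall(call, false, deps);

// Usage with fake helpers that just log what they would do.
const fakeDeps = {
  continueCall: async (sid, spoken) => console.log(`call ${sid}: say "${spoken}"`),
  markHandled: async (call) => console.log(`mark ${call.callSid} handled`),
  sendText: async (message) => console.log(`sms: ${message}`),
};
openDoor({ callSid: "CA123" }, fakeDeps);
```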

    Hooking into the waiting call to open the door looks like this:

    Click through for source

    Here's where storing that callSid becomes useful. We can update an ongoing call without being part of the original API loop 💪

    We send Twilio some TwiML to let the person know they're being buzzed in and dial number 9 with some waits.
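A sketch of both halves: the TwiML that presses 9, and the REST request that pushes it into the live call. This only builds the request and doesn't send it. The real code most likely goes through Twilio's node library, so treat the function names, the `ww9` digit string, and the placeholder account SID as assumptions.

```javascript
// TwiML that tells the visitor what's happening and presses 9 for us.
// In <Play digits>, each "w" waits half a second before the tone.
function openDoorTwiml() {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Say>Letting you in</Say>",
    '  <Play digits="ww9"/>',
    "</Response>",
  ].join("\n");
}

// Describe the REST call that swaps new TwiML into an in-progress call.
// You'd send this with HTTP basic auth (account SID + auth token).
function buildCallUpdate(accountSid, callSid, twiml) {
  return {
    method: "POST",
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Calls/${callSid}.json`,
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ Twiml: twiml }).toString(),
  };
}

// "ACxxxx" is a placeholder, not a real account SID.
console.log(buildCallUpdate("ACxxxx", "CA123", openDoorTwiml()));
```

Whatever verbs the call was in the middle of, like the 60 second pause, get replaced by the new TwiML.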

    That is all ✌️

    And that completes the crazy flowchart from before. A sequence of small steps you can understand and test on their own. When combined they make magic.




    PS: wanna learn more about using serverless? I'm making Serverless Handbook the best way to get started

    Published on December 8th, 2019 in Back End, Technical

