Swizec Teller - a geek with a hat (swizec.com)

How I answer the door with AWS Lambda and Twilio

Remember the doorbell Slack bot I built in 2016? Probably not, but the girl whose sanity it saved became one of my best friends. For 2 years she could focus on work instead of answering the phone every time Yup Inc got a visitor.

Then we moved and the bot died.

2 weeks ago I moved and the new apartment has a front gate buzzer, but no package concierge. For the first time in 5 years we're gonna have to answer the door! 😱

Millennials are killing the doorbell industry by texting "here" but that don't work for deliveries. Especially if you're not home.

The buzzer looks like this:

[Photo of the buzzer]

Delivery person finds your name, taps call and the box calls a pre-programmed phone number. You pick up the phone, talk to the person, and press 9 to let them in. They drop off the package behind a locked gate and people don't steal it. 👌

Now here's the thing: My phone is where phone calls go to die. And I don't want my girlfriend to be on the hook every time I order something from Giant Dildos Dot Com.

The box accepts 1 phone number.

So I sat down for 3 hours and banged out a serverless app that answers the door, transcribes the audio, sends us a text, waits for reply, and opens the door if you say YES. 🤙

Click through for source

Final integration test around the 3:00:35 mark

You can read the code on GitHub

Code contains my old Twilio Auth Token because lazy. Someone racked up $300 in fraudulent calls within minutes. That was dumb.

Click through for source

How you can answer the door with AWS Lambda and Twilio

[Sketch of the call flow]

👆 sketch of how it works. It's harder to draw than I thought. Here's a description of the process:

  1. Delivery person makes call
  2. Twilio picks up the phone
  3. Twilio sends request to AWS Lambda "What should I say?"
  4. Lambda responds with instructions
     4.1. Say "Welcome, what do you want? State your business after the beep and press any key"
     4.2. Record response
     4.3. Wait for 60 seconds
  5. Twilio talks to callbox
  6. Person says what they want and presses a key
  7. Twilio sends recording to AWS Lambda "now what?"
  8. Lambda responds with further instructions
     8.1. Say "Thanks, someone will let you in"
     8.2. Pause for 60 seconds
     8.3. If still in call, say "Sorry, nobody responded"
     8.4. Hang up
  9. Twilio sends all that to callbox and pauses the call
  10. In parallel, Twilio transcribes the recording
  11. Twilio sends transcript to AWS Lambda "here transcript, now what?"
  12. Lambda saves Call ID and callbox number in DB for later
  13. Lambda tells Twilio to text Swiz
  14. Twilio sends text
  15. Swiz sees text and replies with YES
  16. Twilio gets text response and sends to AWS Lambda "here, response. Now what?"
  17. Lambda looks up original Call ID
      17.1. If no ongoing call, bail
      17.2. If more than 60 seconds since call, bail (it hung up)
  18. Lambda checks if my text matches yes
  19. Send voice call response to Twilio
      19.1. Say "Letting you in" or "Sorry, you can't come in"
      19.2. Dial 9
  20. Twilio sends all that to callbox
  21. Door unlocks, delivery gets delivered
  22. Lambda sends text to Swizec saying "All good, person was let in"

Sounds complicated, right? Thanks to AWS Lambda and Serverless it's pretty easy 👉 Each step becomes a standalone JavaScript function. The sophistication comes from how they work together.

Like I mention in the Serverless Pros & Cons chapter of Serverless Handbook:

Serverless lets you trade function complexity for systems complexity. Individual pieces are easier to build & test, but the system becomes hairier.

You can see this in action during the livestream. Every few minutes we integration test the next piece of the puzzle. 🤘


Step 1: Picking up the phone

This is the first Lambda in our system. It answers the phone when Twilio converts it to an API POST request.

Click through for source

Twilio sends a POST request with various params, which we ignore since our response is always the same: A TwiML message constructed via Twilio's node library.

TwiML is Twilio's XML-based markup language for responding to voice calls and text messages.

response.say() turns into a <Say>Hello</Say> line and becomes a spoken computer voice. response.record() allows us to record the person's reply.

In this case we're giving a 60 second timeout, asking Twilio to transcribe, and telling it to send the recording to an acceptRecording endpoint. Twilio is smart enough to handle relative URLs so we don't have to worry about that.
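
Here's a rough sketch of the TwiML that comes out of this step. It's written as a hand-built string so it runs without dependencies; the original builds the document with Twilio's node library instead. The `Record` attributes are real TwiML and `acceptRecording` is the endpoint named above, but `acceptTranscript` is a made-up name for the transcript endpoint.

```javascript
// Sketch only: hand-built TwiML, not the Twilio node library the original uses.
// "acceptRecording" comes from the article; "acceptTranscript" is a hypothetical name.
function answerCallTwiml() {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Say>Welcome, what do you want? State your business after the beep and press any key</Say>",
    // timeout: seconds of silence before Twilio stops recording
    // action: where Twilio goes next once the recording ends (relative URL)
    // transcribeCallback: where Twilio sends the transcript when it is ready
    '  <Record timeout="60" transcribe="true" transcribeCallback="acceptTranscript" action="acceptRecording" />',
    "</Response>",
  ].join("\n");
}
```

The function takes no arguments because the reply never depends on what Twilio sent.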

We use Twilio's dashboard to map a phone number to an API endpoint.


Step 2: Accept voice recording, ask to wait

After the person says what they want, they press a button. This tells Twilio to stop recording and talk to our next lambda: acceptRecording.

Click through for source

Same spiel as before 👉 we get a POST request and respond with some TwiML constructed with Twilio's node library. Let the person know someone's about to answer the door, wait 60 seconds, and if nothing happens deliver the bad news.
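
The reply described above boils down to four TwiML verbs. A minimal sketch, again as a hand-built string rather than the library calls the original uses; the function name and the `waitSeconds` parameter are mine:

```javascript
// Sketch only: the "please wait" reply as raw TwiML.
function holdCallTwiml(waitSeconds = 60) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Say>Thanks, someone will let you in</Say>",
    // Keeps the call open while we wait for a text reply
    `  <Pause length="${waitSeconds}" />`,
    // Only reached if nothing took over the call during the pause
    "  <Say>Sorry, nobody responded</Say>",
    "  <Hangup />",
    "</Response>",
  ].join("\n");
}
```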

Btw, the sendTwiml function is a helper to avoid code duplication:

Click through for source

Status Code 200 means request succeeded, content type application/xml so Twilio API doesn't get confused, and twiml converted to a string as the body.
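
A sketch of what such a helper can look like, based only on the description above. The exact original may differ; the response shape assumes the Lambda sits behind API Gateway.

```javascript
// Sketch only: wraps TwiML in the response object a Lambda returns to API Gateway.
// Accepts anything with toString(): a plain string or a response object
// built with Twilio's node library.
function sendTwiml(twiml) {
  return {
    statusCode: 200, // request succeeded
    headers: { "Content-Type": "application/xml" }, // so Twilio parses it as TwiML
    body: twiml.toString(),
  };
}
```

A handler would then end with something like `return sendTwiml(response)`.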

Step 3: Accept transcript, send text

Twilio's transcript API doesn't let you send TwiML into a phone call. That's why this is separate from the recording lambda, which just replies.

Click through for source

This time we do care about params Twilio sends with their request:

  • RecordingUrl is the audio file I can listen to
  • TranscriptionText is the machine transcription of the audio, usually good, sometimes hilariously wrong
  • CallSid is the original call ID, we'll need it to hook back into the call
  • Called is the phone number that was called, which helps us identify the callbox (future proofing, if I productize)

We use updateItem to save the (CallSid, Called) pair in DynamoDB. Our next lambda will use this to hook into the original call and to keep track of whether the call was handled yet. Great when multiple people reply YES to the same call.
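
To make the data flow concrete, here's a sketch of pulling those params out of the request and shaping what gets saved and texted. Twilio posts webhooks as form-encoded bodies, so `URLSearchParams` can parse them. The record fields (`createdAt`, `handled`) and the SMS wording are assumptions, and the DynamoDB write itself is left out.

```javascript
// Sketch only: parse the form-encoded body Twilio sends with the transcript.
function parseTranscriptWebhook(body) {
  const params = new URLSearchParams(body);
  return {
    recordingUrl: params.get("RecordingUrl"),
    transcript: params.get("TranscriptionText"),
    callSid: params.get("CallSid"),
    called: params.get("Called"),
  };
}

// Hypothetical record for the (CallSid, Called) pair. The real table layout
// is not shown in the article; createdAt and handled are assumed fields.
function callRecord({ callSid, called }, now = Date.now()) {
  return { callSid, called, createdAt: now, handled: false };
}

// Hypothetical wording for the text message.
function doorbellSms({ transcript, recordingUrl }) {
  return `Someone is at the door: "${transcript}". Listen: ${recordingUrl}. Reply YES to let them in.`;
}
```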


Step 4: Handle SMS reply

Handling the YES reply to that SMS gets tricky. A bunch of situations to consider: What if there's no call? What if they're late? What if someone else said YES already?

So we build the main handler method with a big conditional and call helper methods.

Click through for source

The handler takes the SMS Body and To phone number from Twilio's POST request and checks the DynamoDB database.

If there's a call and it hasn't been handled and it's not too late, we call closeDoor or openDoor based on the text Body. Otherwise we reply with a text saying there's no call, it's too late, or all is well.
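
The big conditional can be sketched as a pure function that decides what to do, which makes it easy to test on its own. The record fields, the returned labels, and the "starts with yes" rule are assumptions; the 60 second window comes from the pause in the waiting call.

```javascript
// Sketch only. `call` is the record saved when the transcript arrived,
// or undefined if nothing was found. Field names are assumed.
const ANSWER_WINDOW_MS = 60 * 1000; // the waiting call hangs up after its 60 second pause

function decideSmsAction(call, smsBody, now = Date.now()) {
  if (!call) return "no_call";
  if (call.handled) return "already_handled";
  if (now - call.createdAt > ANSWER_WINDOW_MS) return "too_late";
  // Anything that starts with "yes" opens the door, everything else keeps it shut
  return /^\s*yes\b/i.test(smsBody) ? "open_door" : "close_door";
}
```

The handler then maps each label to a helper or a text reply.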

The openDoor and closeDoor functions are similar. They both call continueCall to talk to the callbox and send a text to let me know the deed is done.

Click through for source

Oh and they update the database to say call handled. Same updateItem method as before :)

Hooking into the waiting call to open the door looks like this:

Click through for source

Here's where storing that callSid becomes useful. We can update an ongoing call without being part of the original API loop 💪

We send Twilio some TwiML to let the person know they're being buzzed in and dial number 9 with some waits.
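
A sketch of that hook, under a couple of assumptions. `client` is expected to behave like a Twilio REST client (in real code, `require("twilio")(accountSid, authToken)`), and I'm assuming inline TwiML is passed to the call update; the original might point Twilio at a URL instead. The number of waits is a guess.

```javascript
// Sketch only: TwiML that announces the buzz-in and presses 9.
function openDoorTwiml() {
  return [
    "<Response>",
    "  <Say>Letting you in</Say>",
    // In Play digits, each "w" is a half-second wait before the tone for 9
    '  <Play digits="ww9" />',
    "</Response>",
  ].join("\n");
}

// Redirects the in-progress call identified by callSid to new TwiML.
// `client` is passed in so the function can be tested with a fake.
async function continueCall(client, callSid, twiml) {
  return client.calls(callSid).update({ twiml });
}
```

Passing the client in keeps credentials out of the function, which also helps you avoid committing an Auth Token.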

That is all ✌️

And that completes the crazy flowchart from before. A sequence of small steps you can understand and test on their own. When combined they make magic.


Cheers,

~Swizec

PS: wanna learn more about using serverless? I'm making Serverless Handbook the best way to get started


Published on December 8th, 2019 in Back End, Technical
