Remember the doorbell Slack bot I built in 2016? Probably not, but the girl whose sanity it saved became one of my best friends. For 2 years she could focus on work instead of answering the phone every time Yup Inc got a visitor.

Then we moved and the bot died.

2 weeks ago I moved and the new apartment has a front gate buzzer, but no package concierge. For the first time in 5 years we’re gonna have to answer the door! 😱

Millennials are killing the doorbell industry by texting “here” but that don’t work for deliveries. Especially if you’re not home.

The buzzer looks like this:

Delivery person finds your name, taps call and the box calls a pre-programmed phone number. You pick up the phone, talk to the person, and press 9 to let them in. They drop off the package behind a locked gate and people don’t steal it. 👌

Now here’s the thing: My phone is where phone calls go to die. And I don’t want my girlfriend to be on the hook every time I order something from Giant Dildos Dot Com.

The box accepts 1 phone number.

So I sat down for 3 hours and banged out a serverless app that answers the door, transcribes the audio, sends us a text, waits for reply, and opens the door if you say YES. 🤙

Final integration test around the 3:00:35 mark

You can read the code on GitHub

Code contains my old Twilio Auth Token because lazy. Someone racked up $300 in fraudulent calls within minutes. That was dumb.

How you can answer the door with AWS Lambda and Twilio

👆 sketch of how it works. It’s harder to draw than I thought. Here’s a description of the process:

  1. Delivery person makes call
  2. Twilio picks up the phone
  3. Twilio sends request to AWS Lambda "What should I say?"
  4. Lambda responds with instructions
    4.1. Say “Welcome, what do you want? State your business after the beep and press any key”
    4.2. Record response
    4.3. Wait for 60 seconds
  5. Twilio talks to callbox
  6. Person says what they want and presses a key
  7. Twilio sends recording to AWS Lambda "now what?"
  8. Lambda responds with further instructions
    8.1. Say “Thanks, someone will let you in”
    8.2. Pause for 60 seconds
    8.3. If still in call, say “Sorry, nobody responded”
    8.4. Hang up
  9. Twilio sends all that to callbox and pauses the call
  10. In parallel, Twilio transcribes the recording
  11. Twilio sends transcript to AWS Lambda "here transcript, now what?"
  12. Lambda saves Call ID and callbox number in DB for later
  13. Lambda tells Twilio to text Swiz
  14. Twilio sends text
  15. Swiz sees text and replies with YES
  16. Twilio gets text response and sends to AWS Lambda "here, response. Now what?"
  17. Lambda looks up original Call ID
    17.1. If no ongoing call, bail
    17.2. If more than 60 seconds since call, bail (it hung up)
  18. Lambda checks if my text matches yes
  19. Send voice call response to Twilio
    19.1. Say “Letting you in” or “Sorry, you can’t come in”
    19.2. Dial 9
  20. Twilio sends all that to callbox
  21. Door unlocks, delivery gets delivered
  22. Lambda sends text to Swizec saying “All good, person was let in”

Sounds complicated, right? Thanks to AWS Lambda and Serverless it’s pretty easy 👉 Each step becomes a standalone JavaScript function. The sophistication comes from how they work together.

Like I mention in the Serverless Pros & Cons chapter of Serverless Handbook:

Serverless lets you trade function complexity for systems complexity. Individual pieces are easier to build & test, but the system becomes hairier.

You can see this in action during the livestream. Every few minutes we integration test the next piece of the puzzle. 🤘

Step 1: Picking up the phone

This is the first Lambda in our system. It answers the phone when Twilio converts it to an API POST request.

Twilio sends a POST request with various params, which we ignore since our response is always the same: A TwiML message constructed via Twilio’s node library.

TwiML is Twilio’s markup language based on XML used to respond to voice calls and handle text messages.

response.say() turns into a <Say>Hello</Say> line and becomes a spoken computer voice. response.record() allows us to record the person’s reply.

In this case we’re giving a 60 second timeout, asking Twilio to transcribe, and telling it to send the recording to an acceptRecording endpoint. Twilio is smart enough to handle relative URLs so we don’t have to worry about that.
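Here's a rough sketch of the TwiML that first lambda could return. To keep it dependency-free it builds the XML string by hand, whereas the original uses Twilio's node library (`response.say()` and `response.record()`), so treat this as an approximation. The `acceptTranscript` callback name is my assumption; `acceptRecording` comes from the description above.

```javascript
// Builds the TwiML for the first webhook by hand and returns it as a string.
// Assumption: the transcript callback lives at a relative URL named
// "acceptTranscript"; the original endpoint name may differ.
const pickupTwiml = () =>
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    // <Say> becomes a spoken computer voice
    "<Say>Welcome, what do you want? State your business after the beep and press any key</Say>",
    // timeout: the 60 second timeout mentioned above
    // transcribe: ask Twilio for a machine transcript
    // action: relative URL Twilio requests when the recording ends
    // transcribeCallback: relative URL that receives the transcript later
    '<Record timeout="60" transcribe="true" action="acceptRecording" transcribeCallback="acceptTranscript"/>',
    "</Response>",
  ].join("");

console.log(pickupTwiml());
```

By default any key press ends a `<Record>`, which matches the "press any key" prompt.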

We use Twilio’s dashboard to map a phone number to an API endpoint.

Step 2: Accept voice recording, ask to wait

After the person says what they want, they press a button. This tells Twilio to stop recording and talk to our next lambda: acceptRecording.

Same spiel as before 👉 we get a POST request and respond with some TwiML constructed with Twilio’s node library. Let the person know someone’s about to answer the door, wait 60 seconds, and if nothing happens deliver the bad news.
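A minimal sketch of that response, again written as a hand-built TwiML string instead of going through Twilio's node library like the original does. The wording of the messages comes from the flowchart steps; everything else is an approximation.

```javascript
// TwiML for the acceptRecording webhook: reassure the caller, hold the call
// open for 60 seconds, then deliver the bad news if nothing else happened.
const acceptRecordingTwiml = () =>
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "<Say>Thanks, someone will let you in</Say>",
    // <Pause> keeps the call alive while we wait for the SMS reply
    '<Pause length="60"/>',
    // Only reached if nobody redirected the call during the pause
    "<Say>Sorry, nobody responded</Say>",
    "<Hangup/>",
    "</Response>",
  ].join("");

console.log(acceptRecordingTwiml());
```

The pause is what makes the rest of the system possible: the call just sits there until another lambda takes it over or the 60 seconds run out.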

Btw, the sendTwiml function is a helper to avoid code duplication:

Status code 200 means the request succeeded, content type application/xml keeps the Twilio API from getting confused, and the twiml gets converted to a string for the body.
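Something like this would do the job. It's a sketch based on the description above, assuming the lambdas sit behind API Gateway's Lambda proxy integration, which expects a `statusCode`, `headers`, and `body` object. The `hello` handler is a made-up example.

```javascript
// Wraps TwiML in the response shape an API Gateway Lambda proxy expects.
// Accepts anything with a toString(), so a plain XML string works and so
// does a response object from Twilio's node library.
const sendTwiml = (twiml) => ({
  statusCode: 200,
  headers: { "Content-Type": "application/xml" },
  body: twiml.toString(),
});

// Hypothetical handler showing how a lambda would use the helper
const hello = async () => sendTwiml("<Response><Say>Hello</Say></Response>");

module.exports = { sendTwiml, hello };
```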

Step 3: Accept transcript, send text

Twilio delivers the transcript in a separate callback once transcription finishes, and that callback can’t send TwiML into the phone call. That’s why this lambda is separate from the recording lambda, which just replies.

This time we do care about params Twilio sends with their request:

  • RecordingUrl is the audio file I can listen to
  • TranscriptionText is the machine transcription of the audio, usually good, sometimes hilariously wrong
  • CallSid is the original call ID, we’ll need it to hook back into the call
  • Called is the phone number that was called, which helps us identify the callbox (future proofing, if I productize)

We use updateItem to save the (CallSid, Called) pair in DynamoDB. Our next lambda will use this to hook into the original call and to keep track of whether the call was handled yet. Great when multiple people reply YES to the same call.
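Roughly, the transcript lambda could look like the sketch below. Twilio posts its webhooks form-encoded, so the params need parsing out of the raw request body. `saveCall`, `sendText`, and `now` are hypothetical injected helpers standing in for the DynamoDB `updateItem` call and the Twilio SMS call, and the stored field names and message wording are my guesses.

```javascript
// Twilio webhooks arrive as application/x-www-form-urlencoded, so with an
// API Gateway proxy event the params are one raw string in event.body.
const parseTwilioParams = (event) => {
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body, "base64").toString("utf8")
    : event.body;
  return Object.fromEntries(new URLSearchParams(raw));
};

// saveCall, sendText and now are hypothetical dependencies passed in so the
// sketch runs without AWS or Twilio credentials.
const acceptTranscript = async (event, { saveCall, sendText, now = Date.now }) => {
  const { RecordingUrl, TranscriptionText, CallSid, Called } =
    parseTwilioParams(event);

  // Remember the call so the SMS lambda can hook back into it later
  await saveCall({
    callSid: CallSid,
    called: Called,
    createdAt: now(),
    handled: false,
  });

  // Assumed message wording
  await sendText(
    `Someone is at the door: "${TranscriptionText}". ` +
      `Listen: ${RecordingUrl}. Reply YES to let them in.`
  );

  // Twilio ignores the body of this response, so an empty 200 is enough
  return { statusCode: 200, body: "" };
};

module.exports = { parseTwilioParams, acceptTranscript };
```

Saving `createdAt` and a `handled` flag alongside the (CallSid, Called) pair is what lets the next lambda tell a live call from a stale or already-answered one.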

Step 4: Handle SMS reply

Handling the YES reply to that SMS gets tricky. A bunch of situations to consider: What if there’s no call? What if they’re late? What if someone else said YES already?

So we build the main handler method with a big conditional and call helper methods.

The handler takes the SMS Body and To phone number from Twilio’s POST request and looks up the call in DynamoDB.

If there’s a call and it hasn’t been handled and it’s not too late, we call closeDoor or openDoor based on the text Body. Otherwise we reply with a text saying there’s no call, it’s too late, or all is well.
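The big conditional is easier to reason about as a pure function. This is a sketch, not the original handler: the `call` record shape (`createdAt`, `handled`), the action names, and the YES regex are all assumptions. The 60 second window comes from the pause in the acceptRecording step.

```javascript
// The call only stays open for 60 seconds after the recording
const CALL_WINDOW_MS = 60 * 1000;

// Decides what to do with an SMS reply.
// call: the stored record for this callbox, or undefined if there is none
// body: the text of the SMS reply
// now:  current time in milliseconds
const decideReply = ({ call, body, now }) => {
  if (!call) return "NO_CALL";
  if (call.handled) return "ALREADY_HANDLED";
  if (now - call.createdAt > CALL_WINDOW_MS) return "TOO_LATE";
  // Anything that starts with "yes" opens the door, everything else closes it
  return /^\s*yes\b/i.test(body) ? "OPEN_DOOR" : "CLOSE_DOOR";
};

const call = { createdAt: 0, handled: false };
console.log(decideReply({ call, body: "YES", now: 30000 })); // OPEN_DOOR
console.log(decideReply({ call, body: "nope", now: 30000 })); // CLOSE_DOOR
console.log(decideReply({ call, body: "YES", now: 90000 })); // TOO_LATE
```

Keeping the decision separate from the Twilio and DynamoDB calls means you can test every edge case without picking up a phone.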

The openDoor and closeDoor functions are similar. They both call continueCall to talk to the callbox and send a text to let me know the deed is done.

Oh and they update the database to say call handled. Same updateItem method as before 🙂

Hooking into the waiting call to open the door looks like this:

Here’s where storing that CallSid becomes useful. We can update an ongoing call without being part of the original API loop 💪

We send Twilio some TwiML to let the person know they’re being buzzed in and dial number 9 with some waits.
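A sketch of how that could work, assuming a client from Twilio's node library (`require("twilio")(accountSid, authToken)`) gets passed in. Updating a live call with new TwiML goes through `client.calls(callSid).update({ twiml })`, and `<Play digits>` sends the keypad tone. The exact waits and the hang-up on the "no" branch are my guesses.

```javascript
// TwiML pushed into the waiting call.
// In the digits attribute, each "w" is a half-second pause around the 9.
const continueCallTwiml = (allowed) =>
  allowed
    ? '<Response><Say>Letting you in</Say><Play digits="ww9ww"/></Response>'
    : "<Response><Say>Sorry, you can't come in</Say><Hangup/></Response>";

// client: a Twilio REST client, passed in by the caller (assumption)
// callSid: the CallSid saved when the transcript arrived
const continueCall = (client, callSid, allowed) =>
  client.calls(callSid).update({ twiml: continueCallTwiml(allowed) });

module.exports = { continueCallTwiml, continueCall };
```

Replacing the call's TwiML interrupts the 60 second pause, so the caller hears the answer right away instead of waiting out the timer.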

That is all ✌️

And that completes the crazy flowchart from before. A sequence of small steps you can understand and test on their own. When combined they make magic.

Cheers,

~Swizec

PS: wanna learn more about using serverless? I’m making Serverless Handbook the best way to get started
