At Yup, we’ve been integrating a lot of 3rd-party services lately to help our sales team. First, we added BaseCRM to the mix to help us manage leads, then we added ActiveCampaign to help with email campaigns. A while ago, we integrated Mixpanel.
It took us 6 months to discover our Mixpanel integration had problems. Scaling hurt. Small scaling that is.
It took us 3 months to get BaseCRM to work reliably.
ActiveCampaign took just 1 week and it’s working great. 👌
That’s a 12x improvement in engineering team performance.
You might think ActiveCampaign is much easier to integrate. Or maybe that it’s more reliable and has better documentation. Mixpanel, by far, has the best documentation of all three. BaseCRM… ugh.
Really, it’s because we learned a couple painful lessons 👇
What makes 3rd party hard
Adding a 3rd party service is easy in theory. Every service out there claims to be one and done.
Copy this code. Plop it in your app. You’re done. Hooray \o/
Stripe famously takes just 7 lines of code to integrate. Mixpanel says you can do it with just a few function calls.
Basic integrations are easy. But most basic integrations take control away from you and your team and hand it over to the 3rd party service’s preferred setup.
When something goes wrong, how do you debug it? You don’t.
When the service errors out, what do you do? You crash your app.
When your business folks want to get metrics, how do they do it? They use ten different dashboards for ten different services and hope for the best.
That’s no way to build a unicorn. That’s how you build a sidehustle.
Track your own stuff
How are we going to know when something is wrong? How do we know what’s wrong?
Every 3rd party integration can go wrong. Services fall down, networks get crappy, traffic gets spikey.
Hell, you could just forget to fill up your credit card and get your access revoked. Things happen ¯(ツ)/¯
When something does happen, it’s important that you find out before your users do. Most services give you some ability to tell when something isn’t working. You can go into their dashboards, see metrics, and make sure things are wiggling in the right direction.
But that means someone has to remember to check.
Now, monitoring is a whole science of its own. If you have the funds, I warmly recommend getting a real monitoring expert to check your infrastructure and make suggestions. Read a book or three if you have time.
In lieu of expert monitoring, adding rows to your database every time you ping a service is good enough. Store who you called, what you sent them, why you did it, when you did it, and build a chart that wiggles around when you add stuff.
This helps you know things are working and gives you enough information to debug issues later. You can even replay requests and see what happens.
Keep your own data
Adding rows to the database for every service call brings us to the next lesson learned: Save your own data.
3rd party services looooove to say you can rely on them, keep all of your data there, and live a carefree life with not a worry on your mind. This is great for them because the more data they have, the harder it is for you to leave. This is bad for you because the more data they have, the harder it is for you to leave.
Storing your own data, even in a raw format, means you can pull the plug any time. Don’t like a service anymore? Find an alternative and switch.
You can import your old data (or not). At least you have it. No need to worry about whether a service offers data export. No need to fret about how useful or not useful that export.
Services love to offer data export without giving you anything crazy useful. You won’t find out until you try.
Besides, if you keep all your data, you can run your own analytics, build a common dashboard for business metrics, correlate stuff across services, etc. Your product managers will love you.
Encapsulate & isolate errors
You know all those places where you make a call to a service either through a library or directly? Each of those can fail.
I wrote about building distributed transactions before. Correctness across multiple services is really hard to ensure.
When a call does go wrong, it’s important that it doesn’t break the rest of the system.
A common problem is when you decide to process a bunch of entries in a loop.
for entry in queue thirdParty(entry) end
thirdParty call fails, the queue stops processing. When you try again, it fails at the same spot. Your queue never processes, and it keeps growing.
You can wrap each call in a
try/catch statement for your language. Skip over the failures, process everything else, and make a note in a database that the particular entry failed.
An even better fix I’ve found is to make as many calls as possible in isolated background workers of their own. Tiny workers with one per 3rd party service call.
It gives you error isolation, only 1 call per worker, and an easy way to retry calls. Most queue systems offer that, and you ensure entries don’t have to wait for each other. This makes your system fast.
Eventual consistency is your friend
The background worker approach to error isolation and call retrying brings us to the last lesson we learned: Eventual consistency is your friend.
Eventual consistency was all the rage a few years ago when Cassandra and other NoSQL databases topped the charts of internet hype. Those continue to be great ideas. Your team is probably too small to make good use of them. Ours definitely is.
You can use eventual consistency even if you’re in Postgres. Process stuff in workers.
When a request comes in that involves a 3rd party service (or a webhook from a service), store the raw payload in your database and have a background worker checking every few minutes to process those requests. Most things don’t have to happen inbound.
When you’re sending things to a service, put them in the database instead and send them later. Then update whatever needs to update as a result of that call.
It makes your system harder to test and understand, but it’s totally worth it. Would you rather have a more Rube-Goldbergian architecture or hotpatch a bug in production at 3am when everything falls apart and users are crying?
PS: You should also follow me on twitter 👉 here.
It's where I go to shoot the shit about programming.