Fun fact: When candidates put AI on their resume, the key thing I try to find out is whether they used evals. How did you measure improvement?
This filters out 80% of engineers.
When you work with AI and try to make something useful, you'll quickly find it's a bit ~~random~~ stochastic. You make a change and it works. Then you try again and it doesn't.
You're building a stochastic system that works 80% of the time. How do you know the next version works 85% of the time?
Evals.
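Here's a rough sketch of why the 80% vs 85% question is harder than it looks. It uses a Wilson score interval to put error bars on a pass rate. The run counts and the `wilson_interval` helper are my own illustration, not a prescribed method, and overlapping intervals are only a quick heuristic, not a formal significance test.

```python
import math


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a pass rate (Wilson score)."""
    if runs <= 0:
        raise ValueError("need at least one run")
    p = passes / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return center - half, center + half


# Hypothetical numbers: 100 eval runs per version.
old_low, old_high = wilson_interval(80, 100)
new_low, new_high = wilson_interval(85, 100)
small_sample_overlaps = new_low <= old_high  # intervals overlap, can't tell versions apart

# Same pass rates, 2000 runs per version.
big_old_low, big_old_high = wilson_interval(1600, 2000)
big_new_low, big_new_high = wilson_interval(1700, 2000)
large_sample_overlaps = big_new_low <= big_old_high  # intervals separate

print(f"100 runs:  old {old_low:.2f}-{old_high:.2f}, new {new_low:.2f}-{new_high:.2f}")
print(f"2000 runs: old {big_old_low:.2f}-{big_old_high:.2f}, new {big_new_low:.2f}-{big_new_high:.2f}")
```

With 100 runs the two intervals overlap, so "85 beats 80" could easily be noise. With a couple thousand runs they pull apart. That's why you want an eval suite that's cheap to run many times.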
What you need depends on what you're building. I'm a product engineer so I'd measure user behavior directly. Do users reach success? How many users? How often? When does it fail?
Build a dataset of what users are doing. Use that to create a test suite you can run quickly against different models, prompts, and tools.
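A minimal sketch of what that test suite could look like. Everything here is hypothetical: `EvalCase`, `run_evals`, and the two stand-in "systems" are illustration only. In a real setup the system would call your model with a given prompt and tools, and the cases would come from what your users actually do.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    user_input: str                    # something a real user asked for
    is_success: Callable[[str], bool]  # grader: did the output do the job?


def run_evals(system: Callable[[str], str], cases: list[EvalCase], repeats: int = 5) -> float:
    """Run every case `repeats` times (outputs are stochastic) and return the pass rate."""
    if not cases or repeats <= 0:
        raise ValueError("need at least one case and one repeat")
    passes = 0
    for case in cases:
        for _ in range(repeats):
            if case.is_success(system(case.user_input)):
                passes += 1
    return passes / (len(cases) * repeats)


# Toy dataset. Real cases should come from observed user behavior.
cases = [
    EvalCase("cancel my subscription", lambda out: "cancel" in out.lower()),
    EvalCase("what is my balance", lambda out: "balance" in out.lower()),
]


# Stand-ins for two variants (model + prompt + tools). Deterministic so the sketch runs offline.
def variant_a(text: str) -> str:
    return "I can help you cancel."


def variant_b(text: str) -> str:
    return f"Sure, here is what I found about: {text}"


score_a = run_evals(variant_a, cases)
score_b = run_evals(variant_b, cases)
print(f"variant A: {score_a:.0%}, variant B: {score_b:.0%}")
```

The point of the design: the system under test is just a function you pass in, so swapping models, prompts, or tools means swapping one argument while the dataset stays put.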
This is the moat.
We all have access to the same models. The models keep improving. Your moat is that dataset and your organization's expert knowledge of the problem. You need experts with unique insights and intuitions to build a differentiated AI product.
Make sure you don't overfit to the test data. Build feedback loops with reality.
Have a human-in-the-loop fallback for failures. Measure how often humans have to intervene. You want this number to go down. When something fails spectacularly, add it to the dataset.
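One way to sketch that feedback loop, assuming you can log whether each request needed a human. The `InterventionTracker` class and its field names are made up for illustration. Here every escalated input gets queued as a candidate eval case, and you'd review that queue and keep the spectacular failures.

```python
from dataclasses import dataclass, field


@dataclass
class InterventionTracker:
    total: int = 0
    escalated: int = 0
    # Inputs that needed a human, queued for review before joining the eval dataset.
    candidate_cases: list[str] = field(default_factory=list)

    def record(self, user_input: str, needed_human: bool) -> None:
        """Log one request and whether a human had to step in."""
        self.total += 1
        if needed_human:
            self.escalated += 1
            self.candidate_cases.append(user_input)

    @property
    def intervention_rate(self) -> float:
        """Share of requests that needed a human. You want this to go down."""
        return self.escalated / self.total if self.total else 0.0


tracker = InterventionTracker()
tracker.record("cancel my subscription", needed_human=False)
tracker.record("refund the order I never placed", needed_human=True)
tracker.record("what is my balance", needed_human=False)
tracker.record("update my email", needed_human=False)

print(f"intervention rate: {tracker.intervention_rate:.0%}")
print("review for the dataset:", tracker.candidate_cases)
```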
Make sure you know what a bad answer even looks like.
Cheers,
~Swizec

