Software Engineering · 14:10–14:28 · Cinema 2

Multi-Armed Bandits: The Scientific Shotgun for Evals

Ron Au

Senior Software Engineer · Canva (Leonardo.Ai)

A/B testing is too rigid a tool for AI systems. You're stuck serving worse results for the duration of the experiment and getting billed for slower models while three providers release SOTA updates this week.

Steal a trick from data science instead and use multi-armed bandits to organically surface ideal models, prompting choices and harnesses. You want your evals to be more than scores: make them an exploration in minimising regret.
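
To make the idea concrete, here is a minimal sketch of one common bandit strategy, Thompson sampling with a Beta-Bernoulli model. The abstract does not say which algorithm the talk uses, so treat this as an illustration. It assumes each eval result can be reduced to a pass/fail signal. The arm names (`model-a` and so on) and their pass rates are hypothetical, and `simulate` stands in for real traffic.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms.

    An "arm" is any candidate being compared: a model, a prompt, a harness.
    """

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per arm, stored as [passes + 1, fails + 1].
        self.stats = {arm: [1, 1] for arm in arms}

    def choose(self):
        # Draw a plausible pass rate for each arm, then serve the best draw.
        # Uncertain arms still win some draws, which is the exploration.
        draws = {
            arm: self.rng.betavariate(a, b)
            for arm, (a, b) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, arm, passed):
        # Record one pass/fail observation for the arm that was served.
        self.stats[arm][0 if passed else 1] += 1

    def pulls(self, arm):
        a, b = self.stats[arm]
        return a + b - 2  # subtract the two prior pseudo-counts


def simulate(true_pass_rates, rounds=2000, seed=0):
    """Run the bandit against hypothetical, fixed pass rates.

    Returns the bandit and the cumulative expected regret: the summed gap
    between the best arm's pass rate and the pass rate of the arm served.
    """
    env = random.Random(seed + 1)  # separate stream for simulated outcomes
    bandit = ThompsonBandit(true_pass_rates, seed=seed)
    best = max(true_pass_rates.values())
    regret = 0.0
    for _ in range(rounds):
        arm = bandit.choose()
        passed = env.random() < true_pass_rates[arm]
        bandit.update(arm, passed)
        regret += best - true_pass_rates[arm]
    return bandit, regret


if __name__ == "__main__":
    # Hypothetical arms and pass rates, for illustration only.
    rates = {"model-a": 0.55, "model-b": 0.75, "model-c": 0.40}
    bandit, regret = simulate(rates)
    for arm in rates:
        print(arm, bandit.pulls(arm))
    print("cumulative regret:", round(regret, 1))
```

Unlike a fixed A/B split, the share of traffic sent to each arm shifts as evidence accumulates, so weaker arms are served less often over time. Adding a newly released model amounts to adding another arm with a fresh prior.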