Three Lanes Below One Millisecond: A Rust SDK for Gemini Live
Full-duplex voice agents punish the choices text-only agents forgive. A 50ms GC pause is a perceptible glitch. A blocked event loop is a barged-in user talking over your model. Google’s Python ADK is a beautiful kit for building agents — until audio frames arrive every 20 milliseconds and refuse to wait.
This talk walks through gemini-rs, an open-source Rust SDK that rebuilds Google’s Agent Development Kit and the Gemini Multimodal Live wire protocol from scratch, in three layered crates: a zero-copy WebSocket transport, an agent runtime with phase machines and typed prefix-scoped state, and a fluent builder where a production voice agent fits in twenty lines of declarative Rust.
The architectural keystone is a three-lane processor: a sync fast lane for audio and VAD under one millisecond, an async control lane for tool dispatch and phase transitions, and a debounced telemetry lane built on atomics. We’ll cover what broke in the naive translation, why Vertex AI sends binary frames where Google AI sends text, how to compose tool-using agents without rebuilding LangChain, and where the Python ADK still belongs in the stack.