Workshop · 12:00–14:00 · Gandel Digital Future Lab

Observability and Evaluation for LLM Apps and Agentic AI with Langfuse

Muhammad Ali

Senior Solution Architect · ClickHouse

Shipping an LLM app is easy. Knowing whether it's actually working is hard. Unlike traditional software, LLM systems and agentic pipelines can run without errors while producing wrong or degraded answers — and with agents making multi-step decisions across tools and APIs, a single bad output can cascade silently through your entire system.

This hands-on workshop shows you how Langfuse gives your team the visibility and control to ship AI with confidence. You'll go from a blind prototype to a fully observable system — seeing exactly what your app costs, where quality drops, and which prompt changes actually improve things.

For agentic systems, Langfuse traces every step of an agent's decision-making, so when something goes wrong you know exactly why, rather than losing hours to guesswork. Features like the Prompt Playground and LLM-as-a-judge evals mean faster iteration with less manual effort.

Prerequisites: a laptop with Python 3.10+ installed. An OpenAI API key is optional. A free Langfuse account will be created on the day.