Diego Baldi
CTO focused on AI orchestration, governance, and agent reliability.
Notes on building AI systems that survive production — agent reliability, orchestration, and the kind of boring governance that keeps things working when the model changes underneath them.
- May 25 · 2026 Hello, world
A short note on what this site is, who it's for, and the thesis the rest of the writing will defend.
A governance & observability runtime for enterprise agents — policy enforcement, traceable tool calls, and human-in-the-loop checkpoints.
Open benchmark scoring agent answers against a frozen corpus, with adversarial paraphrase and recency probes.
Re-runs a stored agent session against a different model and diffs the divergence. Debugging aid, eval pipeline component.
Currently scaffolding the AI Control Plane and reading deeply on agent reliability engineering. See what I'm focused on this month →