Operator notes on building AI for production.
Architecture decisions we’d defend, failure modes we keep watching teams hit, and the parts most vendor pitches leave out. Written by the founders, in the voice we actually use.
9 min read · rag · retrieval · evaluation · production
Why most RAG systems fail before retrieval
The retrieval algorithm is rarely the problem. Most RAG failures happen earlier, at stages the team isn't looking at. Here's the failure shape we keep seeing and the order we'd actually debug it in.
10 min read · evaluation · production · ai-engineering
Why eval harnesses belong in week one
Most teams treat evaluation as a post-launch optimisation. By the time launch happens, the team is debugging with vibes and reverting changes based on hunches. The eval set is week-one work, not week-six work.
10 min read · agents · cost · operations · production
The real operational cost of AI agents
Token bills are the visible part of the cost. The bigger numbers hide in retries, fallbacks, growing conversation context, and the cost accounting nobody set up. Cost discipline is an architecture decision, not an optimisation.
9 min read · architecture · infrastructure · production
Boring on purpose: the stack that survives a year in production
Every framework you adopt is migration risk you accept on day one. The cheapest production system is the one made of components that have been in production for years. Boring is a feature.
9 min read · voice-ai · latency · performance · production
The latency budget you didn't know you had
Many voice AI and real-time agent projects ship with no explicit latency budget. They discover the budget exists when users start hanging up. The median is the lie; the p95 is the system.
