
Prompt caching in production: what actually matters

Apr 8, 2026 · #applied-ai

Prompt caching is the lowest-effort, highest-impact optimisation in applied AI right now. The trick is putting the stable parts at the front: system prompt, tool definitions, retrieved context that rarely changes.
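A minimal sketch of that ordering, assuming a chat-style messages API. `SYSTEM_PROMPT`, `TOOL_DEFS`, and `build_prompt` are hypothetical names, not any provider's API; the point is just that everything before the first token that varies per request is prefix-cacheable.

```python
# Hypothetical stable content -- defined once, identical on every request.
SYSTEM_PROMPT = "You are a support assistant."
TOOL_DEFS = '[{"name": "lookup_order", "description": "Look up an order by id"}]'


def build_prompt(context: str, user_input: str) -> list[dict]:
    """Assemble messages stable-first so the shared prefix can hit the cache.

    Anything before the first changed token is eligible for a prefix cache
    hit; the variable user turn goes last.
    """
    return [
        {"role": "system", "content": SYSTEM_PROMPT + "\n\nTools:\n" + TOOL_DEFS},
        {"role": "system", "content": "Context:\n" + context},  # rarely changes
        {"role": "user", "content": user_input},  # varies every request
    ]
```

The inverse ordering, with user input interpolated near the top, invalidates the cache on every request even though the total token count is identical.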

Variable user input goes last. Cache hit rate becomes a metric you watch the way you watch p99 latency. Mine sits above 80% on the hot paths — that's not a tuning win, that's just structure.
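One way to make that metric concrete is a token-weighted hit rate: cached input tokens over total input tokens, accumulated from per-request usage counters. This is a sketch under that assumption; `CacheStats` and its fields are illustrative names, and the per-request numbers would come from whatever usage metadata your provider returns.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Token-weighted cache hit rate, accumulated across requests."""

    cached_tokens: int = 0
    total_input_tokens: int = 0

    def record(self, cached: int, total: int) -> None:
        # Per-request counters, e.g. read from the provider's usage metadata.
        self.cached_tokens += cached
        self.total_input_tokens += total

    def hit_rate(self) -> float:
        if self.total_input_tokens == 0:
            return 0.0
        return self.cached_tokens / self.total_input_tokens
```

Weighting by tokens rather than counting requests matters: a miss on a 50k-token prefix costs far more than a miss on a short one, and the token-weighted number is the one that tracks your actual spend.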