Inference & Serving
Patterns for controlling how requests reach LLM providers and how responses come back. This pillar covers the infrastructure layer between your application code and model APIs — routing, caching, failover, and cost control at the inference boundary.
Patterns
LLM Gateway: Centralized proxy for API key management, rate limiting, logging, and multi-provider routing.
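The gateway idea can be sketched as a small in-process class; this is an illustrative sketch only (the provider callables, key format, and rate-limit policy are assumptions, not any real gateway's API):

```python
import time
import logging

class LLMGateway:
    """Toy gateway: key checks, per-key rate limiting, logging, provider routing."""

    def __init__(self, providers, rate_limit_per_min=60):
        self.providers = providers      # provider name -> callable(prompt) -> response
        self.keys = {}                  # api_key -> timestamps of recent requests
        self.rate_limit = rate_limit_per_min
        self.log = logging.getLogger("gateway")

    def register_key(self, api_key):
        self.keys[api_key] = []

    def complete(self, api_key, provider, prompt):
        # Authentication: only registered keys may pass.
        if api_key not in self.keys:
            raise PermissionError("unknown API key")
        # Sliding-window rate limit: count requests in the last 60 seconds.
        now = time.time()
        recent = [t for t in self.keys[api_key] if now - t < 60]
        if len(recent) >= self.rate_limit:
            raise RuntimeError("rate limit exceeded")
        recent.append(now)
        self.keys[api_key] = recent
        # Centralized logging before routing to the chosen provider.
        self.log.info("key=%s provider=%s prompt_chars=%d",
                      api_key[:4], provider, len(prompt))
        return self.providers[provider](prompt)

# Usage with a stand-in provider (a real deployment would wrap vendor SDKs):
providers = {"echo": lambda prompt: "echo: " + prompt}
gw = LLMGateway(providers, rate_limit_per_min=5)
gw.register_key("demo-key")
print(gw.complete("demo-key", "echo", "hello"))  # echo: hello
```

In production this sits behind the network boundary as a standalone proxy, so application code holds only a gateway key and never touches provider credentials directly.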
Model Router: Route simple queries to cheap, fast models and complex ones to more capable models, for an 85-99% cost reduction.
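A minimal sketch of complexity-based routing, using a crude keyword-and-length heuristic (a real router would typically use a small classifier model; the model names here are placeholders, not real provider IDs):

```python
def estimate_complexity(prompt: str) -> int:
    """Score 0-3: higher means the task likely needs a stronger model."""
    score = 0
    if len(prompt) > 500:          # long prompts often carry harder tasks
        score += 1
    if "```" in prompt:            # embedded code suggests a coding task
        score += 1
    reasoning_cues = ("prove", "analyze", "step by step", "debug")
    if any(kw in prompt.lower() for kw in reasoning_cues):
        score += 1
    return score

def route(prompt: str,
          cheap_model: str = "small-fast-model",
          strong_model: str = "large-capable-model") -> str:
    """Pick a model tier based on the complexity estimate."""
    return strong_model if estimate_complexity(prompt) >= 2 else cheap_model

print(route("What is the capital of France?"))        # small-fast-model
print(route("Debug this step by step:\n```\nx=1\n```"))  # large-capable-model
```

The cost savings come from the fact that most traffic in many workloads is simple enough for the cheap tier, so the strong model is invoked only for the minority of hard queries.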
Semantic Caching: Cache LLM responses by meaning rather than exact text match, serving semantically similar queries from cache for a 40-60% cost reduction.
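The lookup-by-meaning step can be sketched with an embedding plus cosine similarity. The character-bigram "embedding" below is a toy stand-in for a real embedding model, and the 0.85 threshold is an illustrative choice, not a recommended value:

```python
import math

def embed(text: str) -> dict:
    # Toy embedding: character-bigram counts. A real cache would call
    # an embedding model and store dense vectors instead.
    vec = {}
    t = text.lower()
    for i in range(len(t) - 1):
        bigram = t[i:i + 2]
        vec[bigram] = vec.get(bigram, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def get(self, query: str):
        # Linear scan for the nearest cached query; production systems
        # use a vector index (ANN search) instead.
        qv = embed(query)
        best, best_sim = None, 0.0
        for ev, resp in self.entries:
            sim = cosine(qv, ev)
            if sim > best_sim:
                best, best_sim = resp, sim
        return best if best_sim >= self.threshold else None

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("What is the capital of France?", "Paris")
print(cache.get("what's the capital of France"))  # Paris (similar enough)
print(cache.get("How do I bake bread?"))          # None (cache miss)
```

On a miss, the application calls the model and writes the new (query, response) pair back with `put`, so rephrasings of popular queries stop reaching the provider at all.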