Patterns
20 production-ready patterns across 10 pillars.
Cascading Context Assembly
Start with minimal context and progressively enrich only when the model signals low confidence, avoiding maximum context stuffing on every request.
Token Budget Pattern
Hard limits on input/output tokens per request with trimming or summarization when limits are hit.
Data Contract Pattern
Schema, quality, and SLA agreements enforced as code between data producers and consumers to prevent upstream drift from breaking downstream AI.
Semantic Deduplication
Deduplicate documents at the meaning level before indexing to prevent redundant chunks from wasting context window slots during retrieval.
LLM-as-Judge
Use a strong LLM to evaluate outputs from another model, replacing expensive human evaluation with scalable automated quality scoring.
Entity Resolution Graph
Use graph-based approaches to deduplicate, link, and merge entity mentions across heterogeneous data sources into a unified knowledge graph.
Graph of Thoughts
Structure LLM reasoning as a directed graph where thoughts branch, merge, and loop to solve complex problems that linear chains cannot.
GraphRAG
Augment RAG with knowledge graphs for multi-hop reasoning, entity relationships, and structured context that vector search alone cannot provide.
Model Card Pattern
Standardized documentation of model capabilities, limitations, training data, and known failure modes.
Prompt Canary Deployment
Deploy prompt changes to a small traffic slice, monitor quality metrics, and auto-rollback on regression — treating prompts as deployable artifacts, not configuration.
LLM Gateway
Centralized proxy for API key management, rate limiting, logging, and multi-provider routing.
Model Router
Route queries to cheap/fast models vs powerful ones based on complexity for 85-99% cost reduction.
Semantic Caching
Cache LLM responses by meaning rather than exact text match, serving similar queries from cache for 40-60% cost reduction.
Embedding Drift Detector
Monitor embedding distribution shifts over time to detect silent RAG degradation before it surfaces as bad answers.
Span-Level Tracing
Trace every step of an AI pipeline with latency and token counts per span for debugging and optimization.
Circuit Breaker for LLMs
Detect LLM provider degradation early and trip to fallback before user impact accumulates.
Hybrid Search
Combine dense vector search with sparse BM25 keyword search for retrieval that consistently outperforms either method alone.
Retrieval Freshness Watermark
Attach temporal metadata to every retrieved chunk and surface it to the LLM so it can reason about staleness and prefer fresh evidence.
Input Sanitization
Filter prompt injection, jailbreaks, and PII before queries reach the model.
Tool Output Firewall
Sanitize and validate tool/API outputs before they re-enter the LLM context to block indirect prompt injection in agentic systems.