Patterns

35 production-ready patterns across 10 pillars.

Cascading Context Assembly

Start with minimal context and progressively enrich only when the model signals low confidence, avoiding maximum context stuffing on every request.

cost, context-window, retrieval, latency
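The escalation idea above can be sketched as follows. This is a minimal illustration, not a reference implementation: `cascade`, `fake_model`, the tier list, and the 0.7 threshold are all hypothetical, and how a real model reports confidence is left as an assumption.

```python
from typing import Callable

# Hypothetical model interface: returns (answer, confidence in [0, 1]).
# How confidence is actually obtained is an assumption of this sketch.
ModelFn = Callable[[str, list[str]], tuple[str, float]]


def cascade(question: str, context_tiers: list[list[str]], ask_model: ModelFn,
            min_confidence: float = 0.7) -> tuple[str, int]:
    """Add one context tier at a time until the model is confident enough."""
    context: list[str] = []
    answer, tiers_used = "", 0
    for tier in context_tiers:
        context.extend(tier)  # enrich the context with the next tier
        tiers_used += 1
        answer, confidence = ask_model(question, context)
        if confidence >= min_confidence:
            break  # confident enough, stop paying for more context
    return answer, tiers_used


def fake_model(question: str, context: list[str]) -> tuple[str, float]:
    # Toy stand-in: confidence grows with the amount of context supplied.
    return f"answer using {len(context)} snippets", min(1.0, 0.4 * len(context))


tiers = [["summary"], ["top retrieved chunk"], ["full document", "chat history"]]
result = cascade("What changed in v2?", tiers, fake_model)
```

With the toy model, the second tier is enough to cross the threshold, so the third and most expensive tier is never sent.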

Cost & Efficiency · Validated in Production

Dynamically compress conversation history and intermediate state into structured summaries when context windows approach capacity, preserving critical information while freeing tokens for new content.

context-window, compression, memory-management, token-optimization

Cost & Efficiency · Emerging

Effort Scaling

Dynamically adjust computational resources — agent count, tool call budget, model selection — based on query complexity to avoid overinvestment on simple tasks and underinvestment on complex ones.

resource-allocation, cost-optimization, dynamic-scaling, query-complexity

Cost & Efficiency · Validated in Production

Token Budget Pattern

Hard limits on input/output tokens per request with trimming or summarization when limits are hit.

cost, tokens, budgeting, efficiency
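A minimal sketch of the trimming branch, assuming a whitespace word count as a stand-in for a real tokenizer. `count_tokens` and `trim_to_budget` are illustrative names, not part of any library.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


def trim_to_budget(messages: list[str], max_input_tokens: int,
                   counter: Callable[[str], int] = count_tokens) -> list[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent turns survive trimming.
    for message in reversed(messages):
        cost = counter(message)
        if used + cost > max_input_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "system rules apply here",
    "first user question",
    "a long assistant answer with many words",
    "latest user question",
]
trimmed = trim_to_budget(history, max_input_tokens=10)
```

Dropping the oldest turns is the simplest policy. A summarization step could replace the dropped turns instead, at the cost of an extra model call.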

Data Patterns · Validated in Production

Data Contract Pattern

Schema, quality, and SLA agreements enforced as code between data producers and consumers to prevent upstream drift from breaking downstream AI.

data-quality, schema, contracts, pipelines

Data Patterns · Emerging

Semantic Deduplication

Deduplicate documents at the meaning level before indexing to prevent redundant chunks from wasting context window slots during retrieval.

data-quality, deduplication, embeddings, rag
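One way to picture meaning-level deduplication is a greedy pass over precomputed embeddings. In this sketch the vectors, the 0.95 threshold, and the `dedupe` helper are assumptions chosen for illustration; a real pipeline would get its vectors from an embedding model and would likely use an approximate nearest-neighbour index instead of the quadratic scan shown here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe(docs: list[tuple[str, list[float]]], threshold: float = 0.95) -> list[str]:
    """Keep a document only if it is not too similar to any already-kept one."""
    kept: list[tuple[str, list[float]]] = []
    for doc_id, vec in docs:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((doc_id, vec))
    return [doc_id for doc_id, _ in kept]


# Toy two-dimensional "embeddings": "b" points almost the same way as "a".
docs = [("a", [1.0, 0.0]), ("b", [0.99, 0.05]), ("c", [0.0, 1.0])]
unique_ids = dedupe(docs)
```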

Governance · Validated in Production

Human-in-the-Loop

Require explicit human approval before agents execute consequential or irreversible actions, enforcing safety boundaries through infrastructure-level gates rather than model discretion.

human-approval, safety, governance, confirmation

Governance · Validated in Production

Model Card Pattern

Standardized documentation of model capabilities, limitations, training data, and known failure modes.

governance, documentation, compliance, model-management

Governance · Emerging

Prompt Canary Deployment

Deploy prompt changes to a small traffic slice, monitor quality metrics, and auto-rollback on regression — treating prompts as deployable artifacts, not configuration.

governance, prompts, deployment, canary
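The traffic split and rollback rule might look roughly like this. It is a sketch under stated assumptions: `assign_variant`, `should_rollback`, the 5-point quality drop, and the minimum sample count are hypothetical choices, and the quality scores are presumed to come from whatever metric the team already monitors.

```python
import hashlib


def assign_variant(user_id: str, canary_percent: int) -> str:
    """Deterministically place a user in the canary or stable prompt bucket."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"


def should_rollback(stable_scores: list[float], canary_scores: list[float],
                    max_drop: float = 0.05, min_samples: int = 3) -> bool:
    """Roll back when the canary's mean quality trails stable by more than max_drop."""
    if len(canary_scores) < min_samples or not stable_scores:
        return False  # not enough evidence to decide yet
    stable_mean = sum(stable_scores) / len(stable_scores)
    canary_mean = sum(canary_scores) / len(canary_scores)
    return stable_mean - canary_mean > max_drop


variant = assign_variant("user-42", canary_percent=5)
rollback = should_rollback([0.9, 0.9, 0.9], [0.7, 0.7, 0.7])
```

Hashing the user ID keeps each user on the same prompt version across requests, which makes quality comparisons between the two slices cleaner.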

Graph Patterns · Emerging

Entity Resolution Graph

Use graph-based approaches to deduplicate, link, and merge entity mentions across heterogeneous data sources into a unified knowledge graph.

graph, entity-resolution, deduplication, data-quality

Graph Patterns · Emerging

Graph of Thoughts

Structure LLM reasoning as a directed graph where thoughts branch, merge, and loop to solve complex problems that linear chains cannot.

reasoning, graph, multi-path, planning

Graph Patterns · Validated in Production

GraphRAG

Augment RAG with knowledge graphs for multi-hop reasoning, entity relationships, and structured context that vector search alone cannot provide.

graph, rag, knowledge-graph, retrieval

Inference & Serving · Validated in Production

LLM Gateway

Centralized proxy for API key management, rate limiting, logging, and multi-provider routing.

gateway, routing, observability, cost

Inference & Serving · Validated in Production

Model Router

Route simple queries to cheap, fast models and complex ones to more powerful models, for an 85-99% cost reduction.

routing, cost, latency, model-selection
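A toy router illustrates the shape of the decision. Everything here is an assumption for illustration: the heuristic in `complexity_score`, the hint list, the 0.4 threshold, and the model names `small-model` and `large-model`. Production routers often use a trained classifier instead of keyword heuristics.

```python
# Hypothetical phrases that suggest a query needs multi-step reasoning.
REASONING_HINTS = ("why", "compare", "prove", "step by step", "analyze")


def complexity_score(query: str) -> float:
    """Crude complexity estimate in [0, 1] from length and reasoning hints."""
    lowered = query.lower()
    length_part = min(len(lowered.split()) / 50, 1.0)
    hint_part = 1.0 if any(hint in lowered for hint in REASONING_HINTS) else 0.0
    return 0.5 * length_part + 0.5 * hint_part


def route(query: str, threshold: float = 0.4) -> str:
    """Pick a model tier; the model names are placeholders."""
    return "large-model" if complexity_score(query) >= threshold else "small-model"


simple = route("What is the capital of France?")
hard = route("Compare these two designs and explain why one scales better")
```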

Inference & Serving · Validated in Production

Multi-Agent Orchestration

Coordinate multiple specialized agents through explicit delegation patterns to decompose complex tasks while maintaining control flow and context.

multi-agent, orchestration, coordination, delegation

Inference & Serving · Validated in Production

Semantic Caching

Cache LLM responses by meaning rather than exact text match, serving similar queries from cache for 40-60% cost reduction.

caching, cost, latency, embeddings

Loop Engineering · Validated in Production

Loop Termination

A composite stop-decision — iteration ceiling, goal verification, and no-progress signal — that runs after every step of an agentic loop so it stops for the right reason.

loops, agents, termination, iteration
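The composite stop-decision could be expressed as a single function that is called after every step. This is a sketch: `LoopState`, `StopReason`, and the default limits are hypothetical, and the goal check and stall counter are assumed to be computed elsewhere.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StopReason(Enum):
    GOAL_MET = "goal_met"
    NO_PROGRESS = "no_progress"
    MAX_ITERATIONS = "max_iterations"


@dataclass
class LoopState:
    iteration: int       # steps completed so far
    goal_verified: bool  # result of an external goal check
    stalled_steps: int   # consecutive steps without measurable progress


def stop_decision(state: LoopState, max_iterations: int = 10,
                  max_stalled: int = 3) -> Optional[StopReason]:
    """Return why the loop should stop, or None to keep going."""
    if state.goal_verified:
        return StopReason.GOAL_MET
    if state.stalled_steps >= max_stalled:
        return StopReason.NO_PROGRESS
    if state.iteration >= max_iterations:
        return StopReason.MAX_ITERATIONS
    return None


decision = stop_decision(LoopState(iteration=4, goal_verified=False, stalled_steps=3))
```

Checking the goal first means a loop that succeeds on its last allowed step is recorded as a success, not as hitting the ceiling.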

Loop Engineering · Emerging

Maker–Verifier Split

Structurally separate the agent that does the work from the agent or process that certifies it's done, so "done" is an independent claim rather than the maker grading its own homework.

loops, agents, verification, human-in-the-loop

Loop Engineering · Emerging

No-Progress Detection

Detect stalled or oscillating agent loops by measuring the delta between successive iterations, so the loop can stop before exhausting its full iteration ceiling.

loops, agents, convergence, oscillation
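Two small checks cover the stall and oscillation cases described above. This is a sketch, assuming each iteration yields a numeric progress score and a hashable state fingerprint; `detect_no_progress`, `detect_oscillation`, and their defaults are illustrative names and values.

```python
def detect_no_progress(scores: list[float], min_delta: float = 0.01,
                       patience: int = 3) -> bool:
    """True when the last `patience` score changes are all smaller than min_delta."""
    if len(scores) <= patience:
        return False  # not enough history to judge
    recent = scores[-(patience + 1):]
    deltas = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
    return all(delta < min_delta for delta in deltas)


def detect_oscillation(states: list[str], window: int = 4) -> bool:
    """True when the latest state repeats one seen at least two steps back
    within the recent window, as in A, B, A, B."""
    return len(states) >= 3 and states[-1] in states[-window:-2]


stalled = detect_no_progress([0.1, 0.5, 0.8, 0.801, 0.802, 0.8025])
looping = detect_oscillation(["plan_a", "plan_b", "plan_a", "plan_b"])
```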

Loop Engineering · Emerging

Reflexion Loop

A three-role loop — Actor, Evaluator, Self-Reflection — where verbal lessons from failed attempts are written to episodic memory and read back on the next try.

loops, agents, self-correction, memory

Observability & Evaluation · Emerging

Embedding Drift Detector

Monitor embedding distribution shifts over time to detect silent RAG degradation before it surfaces as bad answers.

observability, embeddings, drift, rag

Observability & Evaluation · Validated in Production

Span-Level Tracing

Trace every step of an AI pipeline with latency and token counts per span for debugging and optimization.

observability, tracing, latency, debugging
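A standard-library sketch shows what a per-span record might contain. The `Tracer` class and its field names are hypothetical; a production system would more likely emit spans through a dedicated tracing library than keep them in a list.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Collects one record per pipeline step (illustrative, in-memory only)."""

    def __init__(self) -> None:
        self.spans: list[dict] = []

    @contextmanager
    def span(self, name: str):
        record = {"name": name, "tokens": 0}
        start = time.perf_counter()
        try:
            yield record  # the step fills in its token count
        finally:
            # Record latency even if the step raised an exception.
            record["latency_ms"] = (time.perf_counter() - start) * 1000
            self.spans.append(record)


tracer = Tracer()
with tracer.span("retrieve") as span:
    span["tokens"] = 0      # retrieval consumes no LLM tokens in this toy run
with tracer.span("generate") as span:
    span["tokens"] = 128    # placeholder token count for the generation step
```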

Observability & Evaluation · Validated in Production

LLM-as-Judge

Use a strong LLM to evaluate outputs from another model, replacing expensive human evaluation with scalable automated quality scoring.

evaluation, llm-judge, quality, testing

Reliability & Resilience · Validated in Production

Circuit Breaker for LLMs

Detect LLM provider degradation early and trip to fallback before user impact accumulates.

reliability, resilience, fallback, monitoring
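A simplified breaker that counts consecutive failures gives the flavour of the pattern. This is a sketch only: the `CircuitBreaker` class, its thresholds, and the `primary` and `fallback` callables are assumptions, and real degradation detection would usually also look at latency and error rates over a time window.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Trips to a fallback after repeated primary failures (illustrative)."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close the breaker so traffic can probe the primary.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, primary: Callable[[str], str], fallback: Callable[[str], str],
             prompt: str) -> str:
        if self.is_open():
            return fallback(prompt)  # skip the degraded provider entirely
        try:
            result = primary(prompt)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback(prompt)
        self.failures = 0  # a success resets the consecutive-failure count
        return result


breaker = CircuitBreaker(failure_threshold=2, cooldown_seconds=10.0)
```

While the breaker is open, requests go straight to the fallback without waiting on the failing provider, which is where the user-facing latency saving comes from.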

Reliability & Resilience · Emerging

Request Hedging for LLMs

Send duplicate requests to multiple providers and return the fastest adequate response to mitigate tail latency.

reliability, latency, tail-latency, redundancy

Reliability & Resilience · Emerging

Uncertainty-Triggered Retry

Detect low-confidence LLM responses and retry with stronger models or improved prompts before serving degraded answers.

reliability, retry, confidence, quality-assurance

Retrieval & Memory · Validated in Production

Hybrid Search

Combine dense vector search with sparse BM25 keyword search for retrieval that consistently outperforms either method alone.

retrieval, rag, search, embeddings
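The pattern description does not say how the two result lists are merged. One common choice is reciprocal rank fusion, sketched below as an assumption and not as the catalog's prescribed method. The document IDs and the constant `k = 60` are illustrative, and the two input rankings are assumed to come from separate dense and BM25 retrievers.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores the sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_b", "doc_a", "doc_c"]  # hypothetical vector-search ranking
bm25_hits = ["doc_a", "doc_d", "doc_b"]   # hypothetical keyword-search ranking
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Rank-based fusion sidesteps the problem that cosine similarities and BM25 scores live on different scales and cannot be added directly.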

Retrieval & Memory · Emerging

Retrieval Freshness Watermark

Attach temporal metadata to every retrieved chunk and surface it to the LLM so it can reason about staleness and prefer fresh evidence.

retrieval, rag, freshness, metadata

Retrieval & Memory · Validated in Production

Retrieval Quality Gate

Filter retrieved chunks by relevance before injecting into context to prevent hallucinations grounded in irrelevant content.

retrieval, rag, quality-control, relevance

Retrieval & Memory · Emerging

Sliding Context Window with Memory Consolidation

Maintain long conversations by keeping recent turns verbatim and consolidating old turns into factual summaries to prevent context overflow.

memory, conversation, context-management, summarization

Security & Trust · Validated in Production

Guardrails

Validate inputs, outputs, and tool calls at pipeline boundaries using parallel validation functions that can block execution when safety or quality thresholds are violated.

validation, security, input-sanitization, output-filtering

Security & Trust · Validated in Production