AI Engineering Patterns

Patterns

20 production-ready patterns across 10 pillars.

Cost & Efficiency | Emerging

Cascading Context Assembly

Start with minimal context and progressively enrich only when the model signals low confidence, avoiding maximum context stuffing on every request.

cost, context-window, retrieval, latency
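A minimal sketch of the cascade, assuming a hypothetical `model` callable that returns an answer together with a confidence score in [0, 1]. The tier layout, the 0.7 threshold, and `toy_model` are illustrative assumptions, not part of the pattern description.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface (an assumption, not a real library API):
# takes (question, context chunks) and returns (answer, confidence).
ModelFn = Callable[[str, List[str]], Tuple[str, float]]


def answer_with_cascade(question: str, tiers: List[List[str]], model: ModelFn,
                        min_confidence: float = 0.7) -> Tuple[str, int]:
    """Try context tiers in order and stop at the first confident answer.

    Returns the answer and the number of tiers that were used.
    """
    context: List[str] = []
    answer = ""
    for used, tier in enumerate(tiers, start=1):
        context = context + tier  # enrich cumulatively, cheapest tier first
        answer, confidence = model(question, context)
        if confidence >= min_confidence:
            return answer, used
    # No tier was confident enough: return the last (richest) attempt.
    return answer, len(tiers)


def toy_model(question: str, context: List[str]) -> Tuple[str, float]:
    # Stand-in for a real model: confidence grows with the number of chunks.
    return f"answer from {len(context)} chunks", min(1.0, 0.4 * len(context))
```

With `toy_model`, a three-tier cascade of one chunk each stops after the second tier, so the third tier's tokens are never sent.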
Cost & Efficiency | Validated in Production

Token Budget Pattern

Hard limits on input/output tokens per request with trimming or summarization when limits are hit.

cost, tokens, budgeting, efficiency
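One possible shape for the input side of a token budget is sketched below. It trims by dropping the oldest messages; summarization is not shown. The whitespace-based counter is a crude stand-in for a real tokenizer, and the function name is hypothetical.

```python
def enforce_input_budget(messages, max_tokens,
                         count_tokens=lambda s: len(s.split())):
    """Keep the most recent messages whose total cost fits in max_tokens.

    count_tokens defaults to a whitespace word count, which only
    approximates what a model tokenizer would report.
    """
    kept = []
    used = 0
    # Walk from newest to oldest so recent turns survive trimming.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept, used
```

An output limit would normally be enforced separately, through whatever maximum-output setting the provider's API exposes.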
Data Patterns | Validated in Production

Data Contract Pattern

Schema, quality, and SLA agreements enforced as code between data producers and consumers to prevent upstream drift from breaking downstream AI.

data-quality, schema, contracts, pipelines
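As a rough illustration of "enforced as code", the sketch below checks only the schema part of a contract. Quality and SLA checks are out of scope here. The `CONTRACT` fields are invented for the example.

```python
# Hypothetical contract: field name -> required Python type.
CONTRACT = {
    "doc_id": str,
    "text": str,
    "updated_at": str,
}


def violations(record, contract=CONTRACT):
    """Return a list of human-readable schema violations for one record."""
    problems = []
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(
                f"{field}: expected {expected.__name__}, got {actual}")
    return problems
```

A producer-side pipeline could refuse to publish a batch when `violations` returns anything, so drift is caught before it reaches downstream consumers.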
Data Patterns | Emerging

Semantic Deduplication

Deduplicate documents at the meaning level before indexing to prevent redundant chunks from wasting context window slots during retrieval.

data-quality, deduplication, embeddings, rag
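A minimal sketch of meaning-level deduplication, assuming embeddings have already been computed elsewhere. The greedy keep-first strategy and the 0.95 cosine threshold are illustrative choices. The pairwise scan is quadratic, so a real index would likely use approximate nearest-neighbor search instead.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dedupe(docs, threshold=0.95):
    """docs: list of (doc_id, embedding). Keeps the first document of each
    near-duplicate group and returns the surviving doc_ids in input order."""
    kept = []
    for doc_id, vec in docs:
        # Keep the document only if it is not too similar to any kept one.
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((doc_id, vec))
    return [doc_id for doc_id, _ in kept]
```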
Evaluation & Testing | Validated in Production

LLM-as-Judge

Use a strong LLM to evaluate outputs from another model, replacing expensive human evaluation with scalable automated quality scoring.

evaluation, llm-judge, quality, testing
Graph Patterns | Emerging

Entity Resolution Graph

Use graph-based approaches to deduplicate, link, and merge entity mentions across heterogeneous data sources into a unified knowledge graph.

graph, entity-resolution, deduplication, data-quality
Graph Patterns | Emerging

Graph of Thoughts

Structure LLM reasoning as a directed graph where thoughts branch, merge, and loop to solve complex problems that linear chains cannot.

reasoning, graph, multi-path, planning
Graph Patterns | Validated in Production

GraphRAG

Augment RAG with knowledge graphs for multi-hop reasoning, entity relationships, and structured context that vector search alone cannot provide.

graph, rag, knowledge-graph, retrieval
Governance | Validated in Production

Model Card Pattern

Standardized documentation of model capabilities, limitations, training data, and known failure modes.

governance, documentation, compliance, model-management
Governance | Emerging

Prompt Canary Deployment

Deploy prompt changes to a small traffic slice, monitor quality metrics, and automatically roll back on regression, treating prompts as deployable artifacts rather than configuration.

governance, prompts, deployment, canary
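The two moving parts of a prompt canary, traffic slicing and the rollback decision, might look roughly like this. The function names, the 0.05 allowed drop, and the 50-sample minimum are assumptions for illustration. A production setup would probably use a proper statistical test rather than a difference of means.

```python
import hashlib


def prompt_version_for(user_id: str, canary_percent: int) -> str:
    """Deterministically assign a user to the 'canary' or 'stable' prompt."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket 0..99
    return "canary" if bucket < canary_percent else "stable"


def should_rollback(stable_scores, canary_scores,
                    max_drop=0.05, min_samples=50) -> bool:
    """Roll back when the canary's mean quality score trails the stable
    prompt's mean by more than max_drop, once enough samples exist."""
    if len(canary_scores) < min_samples or not stable_scores:
        return False  # not enough evidence yet
    stable_mean = sum(stable_scores) / len(stable_scores)
    canary_mean = sum(canary_scores) / len(canary_scores)
    return stable_mean - canary_mean > max_drop
```

Hashing the user ID keeps each user on the same prompt version across requests, which makes quality comparisons between the two slices cleaner.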
Inference & Serving | Validated in Production

LLM Gateway

Centralized proxy for API key management, rate limiting, logging, and multi-provider routing.

gateway, routing, observability, cost
Inference & Serving | Validated in Production

Model Router

Route simple queries to cheap, fast models and complex ones to more powerful models for 85-99% cost reduction.

routing, cost, latency, model-selection
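A toy router is sketched below to show the control flow only. The complexity heuristic (length plus a few keyword hints), the 0.4 threshold, and the model names `small-model` and `large-model` are all hypothetical. A real router might use a trained classifier instead.

```python
# Hypothetical phrases that hint a query needs multi-step reasoning.
REASONING_HINTS = ("prove", "step by step", "analyze", "compare", "refactor")


def estimate_complexity(query: str) -> float:
    """Crude complexity score in [0, 1] from length and keyword hints."""
    score = min(len(query.split()) / 200.0, 1.0)
    if any(hint in query.lower() for hint in REASONING_HINTS):
        score += 0.5
    return min(score, 1.0)


def route(query: str, threshold: float = 0.4,
          cheap: str = "small-model", strong: str = "large-model") -> str:
    """Pick the cheap model unless the query looks complex."""
    return strong if estimate_complexity(query) >= threshold else cheap
```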
Inference & Serving | Validated in Production

Semantic Caching

Cache LLM responses by meaning rather than exact text match, serving similar queries from cache for 40-60% cost reduction.

caching, cost, latency, embeddings
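A minimal in-memory sketch of lookup by meaning, assuming an `embed` function supplied by the caller (hypothetical here) and a 0.9 cosine threshold chosen for illustration. The linear scan stands in for the vector index a real cache would use.

```python
import math


class SemanticCache:
    """Return a cached response when a new query is close enough in
    embedding space to a previously cached query."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # assumed: str -> list of floats
        self.threshold = threshold
        self.entries = []           # list of (embedding, response)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def get(self, query):
        """Return the best-matching cached response, or None on a miss."""
        query_vec = self.embed(query)
        best, best_sim = None, -1.0
        for vec, response in self.entries:
            sim = self._cosine(query_vec, vec)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def put(self, query, response):
        self.entries.append((self.embed(query), response))
```

The threshold is the main tuning knob: too low and users receive answers to a different question, too high and the cache rarely hits.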
Observability | Emerging

Embedding Drift Detector

Monitor embedding distribution shifts over time to detect silent RAG degradation before it surfaces as bad answers.

observability, embeddings, drift, rag
Observability | Validated in Production

Span-Level Tracing

Trace every step of an AI pipeline with latency and token counts per span for debugging and optimization.

observability, tracing, latency, debugging
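To make "latency and token counts per span" concrete, here is a small standard-library sketch. The `Trace` class and its record fields are invented for illustration. A production system would more likely emit spans through a tracing library than keep them in a list.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects one record per pipeline step with latency and token count."""

    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name):
        record = {"name": name, "tokens": 0}
        start = time.perf_counter()
        try:
            yield record  # the caller fills in record["tokens"]
        finally:
            # Record latency even if the step raised an exception.
            record["latency_ms"] = (time.perf_counter() - start) * 1000.0
            self.spans.append(record)


trace = Trace()
with trace.span("retrieve") as s:
    s["tokens"] = 0        # retrieval consumed no LLM tokens
with trace.span("generate") as s:
    s["tokens"] = 512      # placeholder value for the example
```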
Reliability & Resilience | Validated in Production

Circuit Breaker for LLMs

Detect LLM provider degradation early and trip to fallback before user impact accumulates.

reliability, resilience, fallback, monitoring
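A minimal circuit breaker sketch that counts consecutive failures. The thresholds are illustrative assumptions. Detecting degradation "early" would in practice also look at latency and error rates, which this sketch omits.

```python
import time


class CircuitBreaker:
    """Trip to a fallback after repeated primary failures, then allow a
    single trial call once the cool-down period has elapsed."""

    def __init__(self, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: skip the primary entirely
            # Cool-down elapsed: fall through and try the primary once.
        try:
            result = primary()
        except Exception:
            self.failures += 1
            tripped = self.failures >= self.failure_threshold
            if self.opened_at is not None or tripped:
                self.opened_at = self.clock()  # open, or re-open after trial
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Here `primary` would wrap the call to the main LLM provider and `fallback` a secondary provider or a cached response.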
Retrieval & Memory | Validated in Production

Hybrid Search

Combine dense vector search with sparse BM25 keyword search for retrieval that consistently outperforms either method alone.

retrieval, rag, search, embeddings
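The description does not say how the dense and sparse results are combined. One common choice is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the pattern's prescribed method. It takes the ranked ID lists from each retriever as input and does not perform the searches itself.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is a conventional default. Ties are broken by doc ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["d1", "d2", "d3"]    # e.g. from vector search
sparse_hits = ["d2", "d4", "d1"]   # e.g. from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
# fused == ["d2", "d1", "d4", "d3"]
```

Documents found by both retrievers rise to the top, and RRF needs no score normalization because it uses ranks only.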
Retrieval & Memory | Emerging

Retrieval Freshness Watermark

Attach temporal metadata to every retrieved chunk and surface it to the LLM so it can reason about staleness and prefer fresh evidence.

retrieval, rag, freshness, metadata
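Surfacing the watermark to the model can be as simple as prefixing each chunk with its last-updated date. The chunk layout (`text` and `updated` keys) and the bracketed prefix format below are assumptions for illustration.

```python
from datetime import date


def render_chunks(chunks, today):
    """Format retrieved chunks newest-first, each prefixed with its
    last-updated date and age in days so the model can weigh staleness.

    chunks: list of dicts with 'text' (str) and 'updated' (datetime.date).
    """
    ordered = sorted(chunks, key=lambda c: c["updated"], reverse=True)
    lines = []
    for chunk in ordered:
        age_days = (today - chunk["updated"]).days
        stamp = chunk["updated"].isoformat()
        lines.append(f"[updated {stamp}, {age_days} days ago] {chunk['text']}")
    return "\n".join(lines)


context = render_chunks(
    [{"text": "old policy", "updated": date(2024, 1, 1)},
     {"text": "new policy", "updated": date(2024, 3, 1)}],
    today=date(2024, 3, 11),
)
```

The system prompt would then need to tell the model to prefer newer chunks when sources conflict.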
Security & Trust | Validated in Production

Input Sanitization

Filter prompt injection, jailbreaks, and PII before queries reach the model.

security, prompt-injection, pii, input-validation
Security & Trust | Emerging

Tool Output Firewall

Sanitize and validate tool/API outputs before they re-enter the LLM context to block indirect prompt injection in agentic systems.

security, agents, tool-use, indirect-injection