Input Sanitization
What It Is
Section titled “What It Is”Input sanitization is a defense layer that inspects, classifies, and transforms user input before it reaches the LLM. It detects and blocks prompt injection attacks, jailbreak attempts, and PII leakage at the input boundary — the same way web applications sanitize form inputs before passing them to a database.
The Problem It Solves
Section titled “The Problem It Solves”LLMs process user input as part of their prompt. Unlike traditional software where input and instructions are structurally separated, LLMs treat everything as text in the same context window. This creates a fundamental vulnerability:
- Prompt injection: Malicious instructions embedded in user input that override the system prompt (“Ignore previous instructions and…”).
- Jailbreaks: Attempts to bypass safety constraints through social engineering, role-playing scenarios, or encoding tricks.
- PII in prompts: Users inadvertently (or intentionally) including personal data that gets sent to external model providers, logged, or used in training.
Without input sanitization, every user query is a potential attack surface.
How It Works
Section titled “How It Works”User Input │ ▼┌──────────────────┐│ Length & Format │ Reject oversized or malformed input│ Validation │└────────┬─────────┘ │ ▼┌──────────────────┐│ PII Detection │ Detect and redact emails, SSNs,│ & Redaction │ phone numbers, credit cards└────────┬─────────┘ │ ▼┌──────────────────┐│ Injection │ Classify input for injection│ Detection │ patterns and jailbreaks└────────┬─────────┘ │ ┌────┴─────┐ │ │ Clean Flagged │ │ ▼ ▼ Pass to Block, log, model and return safe response- Format validation: Check input length, encoding, and structure. Reject inputs that exceed token budgets or contain suspicious encoding.
- PII detection: Scan for patterns matching personal data (emails, phone numbers, SSNs, credit card numbers). Redact or tokenize detected PII before the input moves forward.
- Injection detection: Classify the input for known injection patterns using one or more methods:
- Pattern matching: Regular expressions for common injection prefixes (“ignore previous”, “system:”, delimiter injection).
- Perplexity detection: Inputs with unusual token distributions may contain injected instructions.
- Classifier model: A fine-tuned model that classifies inputs as benign or adversarial.
- Decision: Clean inputs proceed to the model. Flagged inputs are blocked, logged for security review, and the user receives a generic safe response.
When to Use It
Section titled “When to Use It”- Your system accepts free-text user input that is incorporated into LLM prompts.
- Your application handles any type of sensitive or personal data.
- You expose LLM functionality to external or untrusted users.
- Compliance requirements (GDPR, HIPAA, PCI-DSS) mandate input data handling controls.
When NOT to Use It
Section titled “When NOT to Use It”- The system only processes internal, pre-validated data that never includes user input (batch processing of structured records). Sanitization adds latency to a path that does not need it.
- You use structured input exclusively (API calls with typed parameters, not free text). Schema validation is more appropriate than text-level sanitization.
- The LLM has no tools, no access to sensitive data, and no actions beyond text generation in a sandboxed environment. The blast radius of a successful injection is limited to the response text itself.
Trade-offs
Section titled “Trade-offs”- False positives — Aggressive injection detection blocks legitimate queries that happen to contain patterns resembling attacks. “Please ignore the previous error and retry” is a valid user request that matches injection patterns.
- Latency — Each sanitization step adds processing time. PII detection with NER models can add 10-50ms. Classifier-based injection detection adds more.
- Arms race — Injection techniques evolve continuously. Pattern-based detection has a short half-life as attackers find new phrasings. Classifier models require ongoing retraining.
- Incomplete coverage — No sanitization layer catches everything. Sophisticated indirect injection (through retrieved documents, tool outputs, or multi-turn context manipulation) bypasses input-level checks.
Implementation Example
Section titled “Implementation Example”import refrom dataclasses import dataclassfrom enum import Enum
class ThreatLevel(Enum): CLEAN = "clean" PII_DETECTED = "pii_detected" INJECTION_SUSPECTED = "injection_suspected" BLOCKED = "blocked"
@dataclassclass SanitizationResult: threat_level: ThreatLevel sanitized_input: str detections: list[str] original_length: int
PII_PATTERNS = { "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"), "ssn": re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b"), "phone": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"), "credit_card": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),}
INJECTION_PATTERNS = [ re.compile(r"ignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions", re.I), re.compile(r"disregard\s+(?:all\s+)?(?:previous|prior|your)\s+instructions", re.I), re.compile(r"you\s+are\s+now\s+(?:a|an)\s+", re.I), re.compile(r"new\s+instruction[s]?\s*:", re.I), re.compile(r"system\s*(?:prompt|message)\s*:", re.I), re.compile(r"\[INST\]|\[/INST\]|<<SYS>>|<\|im_start\|>", re.I), re.compile(r"```\s*system", re.I),]
MAX_INPUT_LENGTH = 10000REDACTION_PLACEHOLDER = "[REDACTED]"
def sanitize_input(text: str) -> SanitizationResult: detections: list[str] = [] original_length = len(text) sanitized = text
if len(text) > MAX_INPUT_LENGTH: return SanitizationResult( threat_level=ThreatLevel.BLOCKED, sanitized_input="", detections=[f"Input exceeds maximum length: {len(text)} > {MAX_INPUT_LENGTH}"], original_length=original_length, )
for pii_type, pattern in PII_PATTERNS.items(): matches = pattern.findall(sanitized) if matches: detections.append(f"{pii_type}: {len(matches)} instance(s) redacted") sanitized = pattern.sub(REDACTION_PLACEHOLDER, sanitized)
for pattern in INJECTION_PATTERNS: if pattern.search(sanitized): detections.append(f"Injection pattern detected: {pattern.pattern[:50]}") return SanitizationResult( threat_level=ThreatLevel.INJECTION_SUSPECTED, sanitized_input="", detections=detections, original_length=original_length, )
if any("redacted" in d for d in detections): threat_level = ThreatLevel.PII_DETECTED else: threat_level = ThreatLevel.CLEAN
return SanitizationResult( threat_level=threat_level, sanitized_input=sanitized, detections=detections, original_length=original_length, )
def enforce_sanitization(text: str) -> str: result = sanitize_input(text)
if result.threat_level == ThreatLevel.BLOCKED: raise ValueError("Input blocked by sanitization policy")
if result.threat_level == ThreatLevel.INJECTION_SUSPECTED: raise ValueError("Input blocked: potential prompt injection detected")
return result.sanitized_inputFor production systems, complement pattern-based detection with a trained classifier (e.g., a fine-tuned model on prompt injection datasets) and use a dedicated PII detection library (Presidio, Phileas) for higher accuracy across entity types and languages.
Tool Landscape
Section titled “Tool Landscape”| Tool | Type | Notes |
|---|---|---|
| Presidio | Open-source (Microsoft) | PII detection and anonymization with pluggable NER backends |
| LLM Guard | Open-source | Input/output sanitization specifically designed for LLM applications |
| Rebuff | Open-source | Prompt injection detection using multiple detection strategies |
| Lakera Guard | Managed API | Real-time prompt injection and content moderation |
| NVIDIA NeMo Guardrails | Open-source | Programmable guardrails for LLM applications |
Related Patterns
Section titled “Related Patterns”- Output Validation Pattern — Sanitize inputs before the model; validate outputs after. Both layers are needed.
- Schema-Enforced Input Pattern — Replace free text with structured input to eliminate injection surface entirely.
- Structured Output Enforcement — Constrain outputs to match a schema. Complements input sanitization.
- PII Scrubbing Pipeline — Dedicated PII handling for data that flows into training and logging pipelines.
- Audit Trail Pattern — Log sanitization decisions for incident investigation and compliance.