How to Redact PII in LLM Prompts: Step by Step

Why prompt-level redaction matters

Every prompt your team sends can carry names, emails, account numbers, health details, or secrets. Once that text reaches an external model, you have lost control of it. Redacting at the prompt level means sensitive data is removed or replaced before the request leaves your boundary, so the model still does useful work without ever seeing what it should not. The steps below walk through a runtime approach that holds up under audit.

Step 1: Classify what counts as sensitive

Write down the data classes you must protect: direct identifiers like names and emails, financial identifiers, health data, credentials and API keys, and any regulated categories specific to your industry. This list becomes your redaction policy. Without it, detection has no target and your controls cannot be audited against a clear standard.
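One way to keep this policy auditable is to store it as data rather than scatter it through application code. The sketch below is a minimal illustration; the class names, the actions assigned to them, and the `action_for` helper are all hypothetical and would need to reflect your own regulatory obligations.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REDACT = "redact"
    TOKENIZE = "tokenize"
    BLOCK = "block"


@dataclass(frozen=True)
class PolicyRule:
    data_class: str
    action: Action
    description: str


# Hypothetical policy: the classes and actions here are illustrative only.
POLICY = {
    rule.data_class: rule
    for rule in [
        PolicyRule("EMAIL", Action.TOKENIZE, "Direct identifier: email address"),
        PolicyRule("PERSON_NAME", Action.TOKENIZE, "Direct identifier: personal name"),
        PolicyRule("CARD_NUMBER", Action.REDACT, "Financial identifier"),
        PolicyRule("HEALTH", Action.BLOCK, "Health data"),
        PolicyRule("API_KEY", Action.BLOCK, "Credentials and API keys"),
    ]
}


def action_for(data_class: str) -> Action:
    """Return the configured action; unknown classes get the strictest one."""
    rule = POLICY.get(data_class)
    return rule.action if rule else Action.BLOCK
```

Falling back to `BLOCK` for an unlisted class is a design choice in this sketch: a class nobody has reviewed is treated as unsafe to send.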

Step 2: Detect sensitive data in the prompt

Run each prompt through detection before it is sent. Combine pattern matching for structured items (emails, card numbers, keys) with named-entity recognition for unstructured items (people, locations, organizations). Detection should run on the full prompt, including any context your application injects, since retrieved documents often carry more PII than what the user typed.
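The pattern-matching half of detection can be sketched with the standard library alone. The patterns below are deliberately simplified, the `sk-` key format is a hypothetical example, and the optional `ner` argument is a placeholder hook for whatever named-entity model you use; none of this is a complete detector.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Finding:
    data_class: str
    start: int
    end: int
    value: str


# Simplified, illustrative patterns; production coverage needs to be broader.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CARD_NUMBER": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),  # hypothetical key format
}


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum, used to discard digit runs that are not card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(
    prompt: str,
    ner: Optional[Callable[[str], Iterable[Finding]]] = None,
) -> List[Finding]:
    """Scan the full prompt (user input plus injected context) for findings."""
    findings: List[Finding] = []
    for data_class, pattern in PATTERNS.items():
        for match in pattern.finditer(prompt):
            if data_class == "CARD_NUMBER" and not luhn_valid(match.group()):
                continue  # fails the checksum, so probably not a card number
            findings.append(
                Finding(data_class, match.start(), match.end(), match.group())
            )
    if ner is not None:
        # Assumed hook: a callable that returns Finding objects for entities.
        findings.extend(ner(prompt))
    return sorted(findings, key=lambda f: f.start)
```

Call `detect` on the assembled prompt, for example `detect(retrieved_text + "\n" + user_input)`, so that retrieved documents are scanned along with what the user typed.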

Step 3: Redact, tokenize, or block

Decide what happens to each detected item. Redaction replaces the value with a placeholder so the model sees structure but not content. Tokenization swaps the value for a reversible token so you can restore it in the response, without the raw value ever leaving your boundary. Blocking stops the request entirely when the data class is too sensitive to send under any transformation. Choose per data class, and default to the safest option when confidence is low.
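The three outcomes can be sketched as a single function over detected spans. This is an illustration under stated assumptions: findings arrive as hypothetical `(data_class, start, end, confidence)` tuples, the `ACTIONS` table and the 0.8 threshold are made up, and "safest option" is read here as blocking the request.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class BlockedRequest(Exception):
    """Raised when a finding may not be sent under any transformation."""


@dataclass
class Transformed:
    text: str
    vault: Dict[str, str] = field(default_factory=dict)  # token -> original


# Hypothetical per-class actions; anything unlisted falls back to SAFEST.
ACTIONS = {"EMAIL": "tokenize", "CARD_NUMBER": "redact", "API_KEY": "block"}
SAFEST = "block"

Span = Tuple[str, int, int, float]  # (data_class, start, end, confidence)


def apply_policy(
    text: str, findings: List[Span], min_confidence: float = 0.8
) -> Transformed:
    # Decide every action first, so a block never leaves a half-edited prompt.
    decided = []
    for data_class, start, end, confidence in findings:
        action = ACTIONS.get(data_class, SAFEST)
        if confidence < min_confidence:
            action = SAFEST  # low confidence: fall back to the strictest action
        if action == "block":
            raise BlockedRequest(f"{data_class} may not be sent")
        decided.append((start, end, data_class, action))

    vault: Dict[str, str] = {}
    # Edit from right to left so earlier offsets stay valid.
    for start, end, data_class, action in sorted(decided, reverse=True):
        if action == "tokenize":
            replacement = f"<{data_class}_{secrets.token_hex(4)}>"
            vault[replacement] = text[start:end]
        else:
            replacement = f"[{data_class}]"
        text = text[:start] + replacement + text[end:]
    return Transformed(text, vault)
```

The vault stays inside your boundary; only `Transformed.text` is forwarded to the model.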

Step 4: Enforce at runtime, fail-closed

Place redaction in the request path at a gateway every prompt must cross, not as an optional client-side step a developer can skip. Set it to fail closed: if detection cannot run or a policy cannot be evaluated, the request is held rather than sent in the clear. This is what turns redaction from a best-effort feature into a control you can rely on.
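Fail-closed behavior comes down to one rule: the model call is unreachable unless sanitization returned successfully. A minimal sketch follows, in which `sanitize` and `send_to_model` are hypothetical callables standing in for your detection pipeline and your model client.

```python
from typing import Callable


class RequestHeld(Exception):
    """The request was not forwarded to the model."""


def guarded_call(
    prompt: str,
    sanitize: Callable[[str], str],
    send_to_model: Callable[[str], str],
) -> str:
    try:
        safe_prompt = sanitize(prompt)
    except Exception as exc:
        # Fail closed: any detection or policy error holds the request.
        # The original prompt is never passed to send_to_model.
        raise RequestHeld("redaction failed; request held") from exc
    return send_to_model(safe_prompt)
```

The important property is structural: there is no code path from a sanitization error to the model call, so a detector outage cannot silently turn into unredacted traffic.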

Step 5: Restore and validate the response

If you tokenized values in Step 3, map them back in the model response so the end user sees a complete, correct answer while the external model never held the raw data. Validate that no sensitive value leaked into the output, since models sometimes echo or infer protected details, and redact the response too when needed.
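One possible ordering is to scan the raw model output first and restore tokens second: before restoration, any raw identifier in the output came from the model rather than from your vault. The sketch below assumes a vault mapping tokens to original values and checks only a simplified email pattern; a real validator would reuse the full detection step.

```python
import re
from typing import Dict, Tuple

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def finalize(model_output: str, vault: Dict[str, str]) -> Tuple[str, bool]:
    """Redact identifiers the model produced, then restore tokenized values.

    Returns the text for the end user and a flag that is True when the
    model output contained a sensitive value that had to be redacted.
    """
    cleaned = EMAIL.sub("[EMAIL]", model_output)
    leaked = cleaned != model_output
    for token, original in vault.items():
        cleaned = cleaned.replace(token, original)
    return cleaned, leaked
```

The `leaked` flag gives the caller something to log or alert on when the model emits a value it was never sent.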

Step 6: Audit every redaction

Record what was detected, what action was taken, which policy fired, and who made the request, for every prompt. This log is the evidence that your redaction actually runs in production, which is what an assessor, or a regulator under regimes like the EU AI Act, is likely to ask to see. Difinity performs detection, redaction, enforcement, and logging in one runtime layer, so prompt-level PII protection is applied to every model call without per-app wiring.
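As a vendor-neutral illustration of the audit record described above, each event can be written as one JSON line. The field names are hypothetical, and leaving the detected value itself out of the record is an assumption of this sketch, made so the audit log does not become a second store of PII.

```python
import json
from dataclasses import asdict, dataclass
from typing import TextIO


@dataclass(frozen=True)
class AuditEvent:
    request_id: str
    user_id: str      # who made the request
    data_class: str   # what was detected (the class, not the value)
    action: str       # what was done: redact, tokenize, or block
    policy_id: str    # which policy fired
    timestamp: float  # seconds since the epoch


def write_audit(sink: TextIO, event: AuditEvent) -> None:
    """Append one event to the sink as a single JSON line."""
    sink.write(json.dumps(asdict(event), sort_keys=True) + "\n")
```

Any writable text stream works as the sink, such as an append-mode file or a pipe to a log shipper.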

Frequently asked questions

Should PII redaction run on the client or at a gateway?

At a gateway every prompt must cross. Client-side redaction can be skipped or bypassed; a gateway enforces it on every request and can fail closed when detection cannot run.

What is the difference between redaction and tokenization?

Redaction replaces sensitive values with placeholders the model cannot reverse. Tokenization swaps them for reversible tokens you map back in the response, so the model never sees the raw value but the user still gets a complete answer.

Do I need to redact retrieved context too, not just user input?

Yes. Documents pulled into a prompt through retrieval often contain more PII than the user typed. Detection must cover the full prompt, including injected context.