I take a look at two ways to (reversibly) redact sensitive info from prompts being sent to an LLM.
1. Using pattern matching and spaCy named-entity recognition models (via the Presidio library).
2. Using a small local Phi-3 mini model with structured generation (via Outlines) to anonymize the prompt and later de-anonymize the response.
1️⃣ Presidio Library
- Uses small neural networks and pattern matching to extract names, phone numbers, credit card numbers, etc.
- Replace sensitive entities with fake but realistic values (using "faker" library)
- Send anonymized prompt to LLM
- Try to substitute the fake values in the LLM's response back with the originals (a sketch of this flow follows the list)
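
Here's a minimal sketch of that flow, assuming presidio-analyzer, faker, and a spaCy English model are installed. The example prompt, the entity-to-faker mapping, and the naive string-replacement reversal are illustrative choices, not the exact setup used:

```python
from presidio_analyzer import AnalyzerEngine
from faker import Faker

fake = Faker()
analyzer = AnalyzerEngine()  # spaCy NER plus built-in pattern recognizers

prompt = "Hi, I'm Jane Doe, call me on 212-555-0199 about card 4111111111111111."

# 1. Detect PII spans
results = analyzer.analyze(text=prompt, language="en")

# 2. Swap each span for a fake but realistic value, keeping a reverse mapping
generators = {
    "PERSON": fake.name,
    "PHONE_NUMBER": fake.phone_number,
    "CREDIT_CARD": fake.credit_card_number,
}
mapping = {}  # fake value -> original value
anonymized = prompt
# Replace right-to-left so earlier character offsets stay valid
for r in sorted(results, key=lambda r: r.start, reverse=True):
    original = prompt[r.start:r.end]
    fake_value = generators.get(r.entity_type, fake.word)()
    mapping[fake_value] = original
    anonymized = anonymized[:r.start] + fake_value + anonymized[r.end:]

# 3. Send `anonymized` to the 3rd-party LLM, then...
# 4. ...naively reverse the substitutions in its response
def deanonymize(response: str, mapping: dict) -> str:
    for fake_value, original in mapping.items():
        response = response.replace(fake_value, original)
    return response
```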
2️⃣ Larger Local LLM (e.g. Phi-3 mini)
- Extract sensitive information into JSON format using the local LLM with structured generation
- Replace sensitive values with fakes
- Send to 3rd party LLM
- Use the local LLM to substitute the fakes back with the originals (see the sketch after this list)
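
As a rough sketch of the extraction step, here's how structured generation with Outlines might look. The outlines.models.transformers / outlines.generate.json calls match older 0.x releases of Outlines and may differ in newer versions; the model ID, schema, and prompt wording are assumptions:

```python
from pydantic import BaseModel
import outlines

# Schema the local model is constrained to fill in
class SensitiveInfo(BaseModel):
    names: list[str]
    phone_numbers: list[str]
    credit_cards: list[str]

# Load Phi-3 mini locally via transformers
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, SensitiveInfo)

prompt = "Hi, I'm Jane Doe, call me on 212-555-0199 about card 4111111111111111."
extraction = generator(
    "Extract every name, phone number and credit card number from the text "
    f"below and return them as JSON.\n\nText: {prompt}"
)
# `extraction` is a SensitiveInfo instance; swap each value for a fake one
# (as in the Presidio sketch) and keep the mapping to reverse it later.
print(extraction.names, extraction.phone_numbers, extraction.credit_cards)
```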
➡️ Pros and Cons:
- Presidio is faster but can struggle with inconsistent formats or typos in the input, and with reversing the anonymization in the response
- The local LLM approach is slower but can be more robust, both at extracting sensitive values and at reversing the substitutions
💡 Tips:
- Experiment with fuzzy matching to handle typos and inconsistencies when reversing the substitutions (a sketch follows this list)
- Consider combining NER and local LLM approaches
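
As one hypothetical way to do the fuzzy matching, here's a sketch using Python's stdlib difflib to reverse the substitutions even when the LLM slightly reformats a fake value. The 0.85 cutoff and the word-window approach are arbitrary illustrative choices:

```python
import difflib

def fuzzy_deanonymize(response: str, mapping: dict[str, str], cutoff: float = 0.85) -> str:
    """Swap fake values back to originals, tolerating small edits made by the LLM."""
    for fake_value, original in mapping.items():
        words = response.split()
        n = len(fake_value.split())
        best = None  # (similarity score, matched span)
        # Slide a window with the same number of words as the fake value
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i : i + n])
            score = difflib.SequenceMatcher(None, candidate, fake_value).ratio()
            if score >= cutoff and (best is None or score > best[0]):
                best = (score, candidate)
        if best:
            response = response.replace(best[1], original)
    return response
```

Punctuation stuck to a word (e.g. a trailing comma) will lower the similarity score, so a dedicated library like rapidfuzz with token-level scorers may hold up better in practice.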
Removing sensitive information before sending prompts to 3rd party LLMs can help with privacy. While not always perfect, these techniques can help minimize exposure of personal data.
That’s it for this week, see you later on the livestream, cheers, Ronan
Ronan McGovern
Resources/Support at Trelis.com/About