PII Redaction in Pega GenAI

GenAI brings powerful automation into enterprise workflows — but without strong data protection, it introduces serious risk.

In real-world Pega implementations, sensitive data (PII) often flows through prompts sent to LLMs. This is where redaction becomes critical.

Demo: https://youtu.be/5Me5araFBYQ?si=a6pddhHe6yOBskgN

Redaction Capability in Pega GenAI Connect

Pega Platform enables secure AI interactions through its GenAI Connect rule, where sensitive data masking is applied as an integral part of the execution pipeline:


  • When masking is enabled, Pega automatically applies the default configuration of the pyGenAIPIIdetector text analyzer rule, defined at the @baseclass level.

  • The detector builds on Pega’s text analyzer capability, with the underlying detection models and configurations managed in Prediction Studio, providing governance over how sensitive entities are identified.


  • The analyzer scans the outgoing prompt to detect sensitive entities such as names, email addresses, phone numbers, and account-related information.

  • Each identified value is replaced with a structured masked token in angle-bracket notation: <EMAIL_1>, <PHONE_1>, <NAME_1>.

  • The masked prompt preserves the original structure and context, allowing the AI model to generate meaningful responses without accessing real sensitive data.

  • The masked prompt is sent to the AI model, ensuring that no actual sensitive data is exposed externally.

  • The entire process is handled transparently within the platform.

  • Reference: https://docs.pega.com/bundle/platform/page/platform/gen-ai/masking-pii-for-gen-ai.html
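The mask-then-restore flow described above can be sketched conceptually. This is an illustrative, regex-based stand-in for demonstration only — Pega's pyGenAIPIIdetector uses ML-based text analysis, not these patterns, and the entity types and function names here are assumptions:

```python
import re

# Illustrative patterns only; the real detector relies on trained NLP models.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def mask_prompt(prompt: str):
    """Replace detected entities with <TYPE_n> tokens; return masked text and a mapping."""
    mapping = {}
    counters = {}

    def make_repl(entity_type):
        def _sub(match):
            counters[entity_type] = counters.get(entity_type, 0) + 1
            token = f"<{entity_type}_{counters[entity_type]}>"
            mapping[token] = match.group(0)  # remember the original value locally
            return token
        return _sub

    masked = prompt
    for entity_type, pattern in PATTERNS.items():
        masked = pattern.sub(make_repl(entity_type), masked)
    return masked, mapping

def unmask_response(response: str, mapping: dict) -> str:
    """Restore original values in the model's response after it comes back."""
    for token, value in mapping.items():
        response = response.replace(token, value)
    return response

masked, mapping = mask_prompt("Contact jane.doe@example.com or +1 555-123-4567.")
# masked -> "Contact <EMAIL_1> or <PHONE_1>."
```

Only the masked prompt leaves the platform; the token-to-value mapping stays local, which is what allows the response to be re-personalized without the LLM ever seeing real data.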

For more detail, please refer to: https://medium.com/@stellarnexus/designing-privacy-driven-ai-workflows-in-pega-securing-genai-with-built-in-redaction-2b6da4deda31


Thanks for sharing @Anandhuc17408577!

Good to highlight the PII masking mechanism.

I’d like to add a few considerations:

  • As you mention, the PII masking relies on a Text Analyzer rule — essentially an NLP machine learning model (not GenAI). Before relying on it, I recommend assessing whether the OOTB model needs additional training for the specific PII instances your organization wants to recognize. Think of non-standard address notations or client IDs specific to your organization.
  • It is good to be aware that the option to switch on PII masking is available for GenAI Connect rules. For features like Coach and Agents, it is not available in the same way.
  • In general, I believe we should aim for GenAI solutions that are inherently secure, even without optional PII masking. This starts with using GenAI solutions through the Pega GenAI Gateway, so that models are used in a stateless way, the RBAC settings on your application are respected, and all communication between your application and your chosen LLM is fully secured and encrypted.
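To illustrate the first point above: a general-purpose NLP model has no way of knowing organization-specific identifier formats, so these are typically covered by extra training data or supplementary pattern rules. A hypothetical sketch — the "CLT-" client-ID format is invented for illustration and does not come from Pega:

```python
import re

# Hypothetical org-specific client ID: "CLT-" followed by six digits.
# An OOTB model will not flag this; in Pega you would extend the text
# analyzer configuration (e.g. via Prediction Studio) to cover it.
CLIENT_ID = re.compile(r"\bCLT-\d{6}\b")

def mask_client_ids(prompt: str) -> str:
    """Replace each org-specific client ID with a numbered <CLIENT_ID_n> token."""
    count = 0

    def repl(match):
        nonlocal count
        count += 1
        return f"<CLIENT_ID_{count}>"

    return CLIENT_ID.sub(repl, prompt)

result = mask_client_ids("Escalate case for CLT-482913 and CLT-107744.")
# result -> "Escalate case for <CLIENT_ID_1> and <CLIENT_ID_2>."
```

The practical takeaway is the same as Tim's: audit what the default detector actually catches against your own data before trusting it in production prompts.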

@Tim_Straatsma Thanks for the feedback, Tim. Absolutely, and as mentioned in the article, this should be seen only as a starting point. In practice, not every entity that requires masking or pseudonymization will be covered by a machine learning model alone, exactly as you pointed out.

That is also one of the reasons we went ahead and built our own PII service on top of an open-source technology stack, with much broader coverage of PII entities and extensive fine-tuning on both real and synthetic data. If this is of interest, you can take a look here: https://medium.com/@stellarnexus/the-hidden-liability-in-your-data-pipelines-why-robust-pii-redaction-matters-in-the-age-of-ai-c9edda7695c2

Also, I would be curious to know what options Pega provides for customers who are unable to adopt the GenAI Gateway through Pega.
