Pega Platform enables secure AI interactions through its GenAI Connect rule, where sensitive data masking is applied as an integral part of the execution pipeline:
- When masking is enabled, Pega automatically applies the default configuration of the pyGenAIPIIdetector text analyzer rule, defined at the @baseclass level.
- The rule leverages Pega's text analyzer capability, where the underlying detection models and configurations are managed within Prediction Studio, giving you governance over how sensitive entities are identified.
- The analyzer scans the outgoing prompt to detect sensitive entities such as names, email addresses, phone numbers, and account-related information.
- Each identified value is replaced with a structured masked token using angle bracket notation: `<EMAIL_1>`, `<PHONE_1>`, `<NAME_1>`.
- The masked prompt preserves the original structure and context, allowing the AI model to generate meaningful responses without accessing real sensitive data.
- Because only the masked prompt is sent to the AI model, no actual sensitive data is exposed externally.
- The entire process is handled transparently within the platform.
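To make the token scheme concrete, here is a minimal illustrative sketch in Python. This is not Pega's implementation — the pyGenAIPIIdetector rule uses NLP models managed in Prediction Studio, not regexes — it only demonstrates the idea of replacing detected values with numbered tokens and keeping the token-to-value mapping inside the platform so responses can be restored. The regex patterns and function names are assumptions for illustration.

```python
import re

# Illustrative only: Pega's detector is an NLP text analyzer managed in
# Prediction Studio; simple regexes stand in for it here.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def mask_prompt(prompt: str):
    """Replace each detected entity with a numbered token like <EMAIL_1>."""
    mapping = {}   # token -> original value; never leaves the platform
    counters = {}
    masked = prompt
    for label, pattern in PATTERNS.items():
        for value in pattern.findall(masked):
            counters[label] = counters.get(label, 0) + 1
            token = f"<{label}_{counters[label]}>"
            mapping[token] = value
            masked = masked.replace(value, token, 1)
    return masked, mapping

def unmask_response(response: str, mapping: dict) -> str:
    """Restore original values in the model's response before display."""
    for token, value in mapping.items():
        response = response.replace(token, value)
    return response
```

For example, `mask_prompt("Contact jane.doe@example.com or +1 555-123-4567")` would yield `"Contact <EMAIL_1> or <PHONE_1>"` plus a mapping that lets `unmask_response` reinsert the real values — mirroring how the model only ever sees tokens while the platform retains the originals.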
As you mention, the PII masking relies on a Text Analyzer rule, which is essentially an NLP machine learning model (not GenAI). When you want to use this, I recommend assessing whether the out-of-the-box (OOTB) model needs additional training for the specific PII instances you want to recognize for your organization. Think of non-standard address notations or client IDs specific to your organization.
It is good to be aware that the option to switch on PII masking is available for GenAI Connect rules. For features like Coach and Agents, it is not available in the same way.
In general, I believe we should aim for GenAI solutions that are inherently secure, even without optional PII masking. This starts with using GenAI solutions through the Pega GenAI Gateway, so that models are used in a stateless way, the RBAC settings on your application are respected, and all communication between your application and your chosen LLM is fully secured and encrypted.
@Tim_Straatsma Thanks for the feedback, Tim. Absolutely, and as mentioned in the article, this should be seen only as a starting point. In practice, not every entity that requires masking or pseudonymization will be covered by a machine learning model alone, exactly as you pointed out.