Confidence Scoring and Learning Loop for Pega AI

I recently did a demo for a client showcasing some of Pega’s AI capabilities, including DocAI for document data extraction, AI agents that let users upload documents and ask questions about their content, and Knowledge Buddy for quick, contextual Q&A.

The client came back with the following questions, and I’m curious to hear inputs from experts in this space on the best way to respond.

  • Learning Loop: We discussed that the system is currently “prompt-driven.” Does Pega have a mechanism where manual user corrections can be used to refine the underlying prompts or “teach” the AI for future cases?

  • Accuracy Controls: Can we implement Confidence Scoring or visual indicators for fields where the AI has low certainty, ensuring a mandatory human-in-the-loop review?

Side note - as shown in the attached image, I did notice that the agent/LLM can return a confidence score when it's explicitly asked for one. That led to some follow-up questions as well.

  • Would it make sense to add a confidence score field for every field extracted from a document when using DocAI?
  • Can the confidence score be used to refine the prompts in any way, or is this on the product roadmap?
  • Does Knowledge Buddy support any sort of confidence scoring being returned?

Thank you in advance for any inputs and information that I can take back to the client!

Hello @morak

We did a POC for a similar requirement from one of our clients, where they wanted to read an incoming document using AI and get the agent's response as JSON containing the field name, the value, and a confidence score for how well the data was read from the document. The document we used was handwritten.

Once we had the JSON response, we parsed it, mapped the content onto the corresponding work page fields, and set up the confidence score list on the work page for the subsequent steps.

Using a custom DX component for the Text Input and Paragraph fields, we extended the OOTB field warning type display to show success, warning, or error based on the confidence score, with the percentage thresholds configured in a decision table. The output looked like the screenshot below.

Of course, this was a POC to showcase the requirement and has not yet been taken to production.

The intention was that the case would be populated with the AI-extracted content and then routed to a reviewer for verification. If the reviewer sees a low confidence score, they manually correct the information by looking at the document (which is already attached to the case).
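To make the flow concrete, here is a rough, simplified sketch of the parse-and-threshold step in Python (the JSON shape, field names, and threshold values are illustrative only; in the actual POC this lived in Pega rules and a decision table rather than code):

```python
import json

# Illustrative agent response: one entry per extracted field,
# with the model's self-reported confidence (0-100).
agent_response = '''
[
  {"name": "ApplicantName", "value": "Jane Doe",   "confidence": 96},
  {"name": "LoanAmount",    "value": "250000",     "confidence": 71},
  {"name": "PropertyCity",  "value": "Sprngfield", "confidence": 48}
]
'''

def warning_type(confidence: int) -> str:
    """Mirrors the decision table: map a confidence percentage
    to the OOTB field warning type shown by the DX component."""
    if confidence >= 90:
        return "success"
    if confidence >= 70:
        return "warning"
    return "error"

# Parse the JSON and build what would become the work page fields,
# plus the confidence score list consumed by the DX component.
work_page = {}
confidence_list = []
for field in json.loads(agent_response):
    work_page[field["name"]] = field["value"]
    confidence_list.append({
        "field": field["name"],
        "confidence": field["confidence"],
        "display": warning_type(field["confidence"]),
    })

for entry in confidence_list:
    print(entry)
```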

Maybe this will give you some inspiration as you think through your requirement.

Regards

JC


Please find my inline comments below:

  • Would it make sense to add a confidence score field for every field extracted from a document when using DocAI? Adding a confidence score at each field level would be overwhelming unless we go with an unstructured format.

  • Can the confidence score be used to refine the prompts in any way, or is this on the product roadmap? I feel this should be part of the product roadmap. I think it's good to show the confidence score so the human agent can review carefully before they take any action.

  • Does Knowledge Buddy support any sort of confidence scoring being returned? No, not that I have seen, to the best of my knowledge.

Hi @morak — these are excellent questions and touch on scenarios that frequently arise during client demos, especially when the conversation moves beyond the basics to more advanced, practical concerns.

@JayachandraSiddipeta’s proof-of-concept (POC) approach is spot-on from an architectural standpoint. Let’s break down each of your questions with a delivery-focused lens, clarifying the technical concepts and connecting them for a more cohesive understanding:

Confidence scoring per field (DocAI): As @RameshSangili points out, scoring every individual field can quickly become overwhelming. Instead, it’s more effective to apply confidence scoring selectively, focusing on fields that carry significant business risk. For example, in a mortgage application, the loan amount and property address require high accuracy and should be prioritized for confidence scoring, whereas less critical fields like the applicant’s middle initial can be deprioritized. This selective approach is also valuable in insurance claims processing: prioritize confidence scoring for high-value claims to ensure careful review, while routine approvals can be streamlined. The DX component approach refers to using Pega’s Digital Experience components to visually highlight fields with low confidence scores, prompting human review only when necessary. A decision table can automate this governance, setting thresholds such as: confidence above 90% leads to auto-acceptance, scores between 70–90% trigger a visual warning, and anything below 70% mandates a human review. This ensures both efficiency and risk management.
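A quick sketch of that governance layer (the thresholds, field names, and the list of business-critical fields below are illustrative assumptions, not product defaults):

```python
# Hypothetical per-field extraction results; confidence is 0-100.
extracted = {
    "LoanAmount": 88,
    "PropertyAddress": 95,
    "MiddleInitial": 40,
}

# Only business-critical fields drive the routing decision; a low
# confidence on a deprioritized field does not force a review.
CRITICAL_FIELDS = {"LoanAmount", "PropertyAddress"}

AUTO_ACCEPT = 90   # >= 90: auto-accept
WARN = 70          # 70-89: visual warning on the field
                   # < 70 on a critical field: mandatory human review

def route_case(confidences: dict[str, int]) -> str:
    critical = {f: c for f, c in confidences.items() if f in CRITICAL_FIELDS}
    if any(c < WARN for c in critical.values()):
        return "mandatory-review"
    if any(c < AUTO_ACCEPT for c in critical.values()):
        return "flag-for-review"
    return "auto-accept"

print(route_case(extracted))  # "flag-for-review": LoanAmount is 88
```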

Learning loop: Building on the importance of confidence scoring, the learning loop ensures continuous improvement by systematically capturing and analyzing corrections made by humans. Pega’s current GenAI architecture is prompt-driven, meaning that the AI’s responses are shaped by the instructions (prompts) provided at runtime, rather than by retraining or fine-tuning the underlying model. This allows for flexible adjustments through prompt updates. In practice, you collect human corrections (comparing original AI outputs with the revised values), review these patterns regularly, and refine prompts to address recurring misinterpretations. While this feedback loop is manual today, it’s a deliberate feature for clients in regulated industries—automatic self-training could compromise governance. Prompt versioning with audit trails supports transparency and compliance, making sure every change is tracked and justified. In sectors like banking or healthcare, this approach mitigates risks associated with ungoverned model updates.
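To make that loop tangible, here's a minimal sketch of what capturing one human correction could look like (the record shape, field names, and prompt-version tag are illustrative assumptions, not a Pega API):

```python
import json
from datetime import datetime, timezone

def log_correction(case_id: str, field: str, ai_value: str,
                   human_value: str, confidence: int,
                   prompt_version: str) -> dict:
    """Capture one human correction alongside the original AI output.
    Reviewing these records in bulk surfaces recurring misreads
    (e.g. a field the prompt consistently misinterprets), which then
    justifies a versioned, audited prompt change."""
    record = {
        "case_id": case_id,
        "field": field,
        "ai_value": ai_value,
        "human_value": human_value,
        "confidence_at_extraction": confidence,
        "prompt_version": prompt_version,
        "corrected_at": datetime.now(timezone.utc).isoformat(),
    }
    # In practice this would land in a report-friendly data store;
    # printing stands in for persistence here.
    print(json.dumps(record))
    return record

log_correction("C-1001", "PropertyCity", "Sprngfield", "Springfield",
               confidence=48, prompt_version="extract-v3")
```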

Knowledge Buddy confidence: Similarly, Knowledge Buddy’s confidence estimation complements the broader strategy of human-in-the-loop governance. While Knowledge Buddy doesn’t natively provide a confidence score (as far as I know), you can architect a solution using GenAI Connect—a tool that lets you customize AI interactions by crafting specific prompts. For instance, you can instruct the model to return both an answer and a self-assessed confidence score, parsing these into separate fields for downstream processing. Although the confidence score is generated by the model itself and isn’t absolute ground truth, it serves as a useful signal, especially for flagging situations where the model is working outside its familiar domain. This is particularly helpful in customer service or knowledge management scenarios, where ambiguous queries may need escalation to a human expert.
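As a rough sketch of that pattern (the prompt wording and the escalation threshold are assumptions, and the model call is stubbed out, since the real invocation would go through GenAI Connect):

```python
import json

# Prompt that asks the model for both an answer and a self-assessed
# confidence, in a shape that is easy to parse downstream.
PROMPT_TEMPLATE = """Answer the question below using only the provided context.
Respond as JSON with exactly two keys:
  "answer": your answer as a string
  "confidence": an integer 0-100 for how confident you are

Question: {question}
Context: {context}"""

def ask_with_confidence(question: str, context: str) -> tuple[str, int]:
    prompt = PROMPT_TEMPLATE.format(question=question, context=context)
    raw = call_model(prompt)  # stand-in for the GenAI Connect call
    parsed = json.loads(raw)
    return parsed["answer"], int(parsed["confidence"])

def call_model(prompt: str) -> str:
    # Stub so the sketch runs end to end; a real response would
    # come back from the LLM behind GenAI Connect.
    return '{"answer": "30 days from the invoice date", "confidence": 82}'

answer, confidence = ask_with_confidence(
    "What is the payment window?", "Invoices are due within 30 days.")
if confidence < 70:  # assumed escalation threshold
    print("Escalate to a human expert:", answer, confidence)
else:
    print(answer, confidence)
```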

Bringing it all together, the combination of confidence scoring, human-in-the-loop review, and correction logging forms the foundation for trustworthy AI deployment, as described in Ryan Easley’s “Mastering Trust” article. These mechanisms not only improve accuracy but also provide the evidence needed to expand AI’s role responsibly, starting with narrow use cases and widening scope as metrics and governance frameworks mature. Across industries, this practical, measured approach ensures both reliability and scalability in AI adoption.


@sikap My thought was that Pega has been very deliberate about when processing is directed to a GPU node versus a JVM node, to avoid anti-patterns in enterprise applications. By restricting retraining of the model, the behaviour stays deterministic and can run on JVM nodes. Pega's design intentionally avoids GPU processing in the cloud environment when it isn't needed, like a switch: when it's turned off, the whole process behaves deterministically.