Hi @morak — these are excellent questions and touch on scenarios that frequently arise during client demos, especially when the conversation moves beyond the basics to more advanced, practical concerns.
@JayachandraSiddipeta’s proof-of-concept (POC) approach is spot-on from an architectural standpoint. Let’s break down each of your questions with a delivery-focused lens:
Confidence scoring per field (DocAI): As @RameshSangili points out, scoring every individual field quickly becomes overwhelming. Instead, apply confidence scoring selectively, focusing on fields that carry significant business risk. In a mortgage application, for example, the loan amount and property address require high accuracy and should be prioritized, whereas a less critical field like the applicant’s middle initial can be deprioritized. The same logic applies in insurance claims processing: prioritize confidence scoring for high-value claims that warrant careful review, and streamline routine approvals. The DX component approach refers to using Pega’s Digital Experience components to visually highlight fields with low confidence scores, prompting human review only when necessary. A decision table can automate this governance with thresholds such as: above 90% confidence, auto-accept; between 70–90%, show a visual warning; below 70%, require human review. This balances efficiency with risk management; a sketch of the routing logic follows.
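To make that governance rule concrete, here is a minimal Python sketch of the decision-table logic. The function name, field names, and exact thresholds are illustrative assumptions, not Pega APIs; in practice this would live in a Pega decision table rather than code, but the logic is equivalent:

```python
# Minimal sketch of the confidence-routing logic described above.
# Thresholds and field names are illustrative, not Pega APIs.

def route_field(field_name: str, confidence: float, is_critical: bool) -> str:
    """Decide how a DocAI-extracted field should be handled."""
    if not is_critical:
        return "auto-accept"          # low-risk fields skip scoring entirely
    if confidence > 0.90:
        return "auto-accept"          # high confidence: straight-through
    if confidence >= 0.70:
        return "visual-warning"       # medium: highlight in the DX component
    return "human-review"             # low: mandatory manual check

# Example: fields from a mortgage application
fields = [
    ("LoanAmount", 0.94, True),
    ("PropertyAddress", 0.81, True),
    ("MiddleInitial", 0.55, False),   # deprioritized: no review despite low score
]
for name, conf, critical in fields:
    print(f"{name}: {route_field(name, conf, critical)}")
```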
Learning loop: Building on confidence scoring, the learning loop drives continuous improvement by systematically capturing and analyzing the corrections humans make. Pega’s current GenAI architecture is prompt-driven: the AI’s responses are shaped by the instructions (prompts) provided at runtime rather than by retraining or fine-tuning the underlying model, so improvements come through prompt updates. In practice, you log each correction (the original AI output alongside the revised value), review the patterns regularly, and refine prompts to address recurring misinterpretations. The loop is manual today, and that is a deliberate design choice for clients in regulated industries: automatic self-training could compromise governance. Prompt versioning with audit trails keeps every change tracked and justified, which is exactly what sectors like banking and healthcare need to mitigate the risks of ungoverned model updates. A sketch of what each correction record might capture follows.
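The record shape and names below are my assumptions, not a Pega data model; the point is pairing the original AI output with the human-corrected value and the prompt version in use, so recurring errors can be traced to specific prompt revisions:

```python
# Illustrative correction log for the learning loop (shape is an assumption).

import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class CorrectionRecord:
    case_id: str
    field_name: str
    ai_value: str            # what the model originally extracted
    corrected_value: str     # what the human reviewer changed it to
    prompt_version: str      # ties the error to a specific prompt revision
    corrected_at: str

def log_correction(case_id, field, ai_value, corrected_value, prompt_version):
    record = CorrectionRecord(
        case_id=case_id,
        field_name=field,
        ai_value=ai_value,
        corrected_value=corrected_value,
        prompt_version=prompt_version,
        corrected_at=datetime.now(timezone.utc).isoformat(),
    )
    # In a real implementation this would persist to a data type or audit
    # table; printing JSON keeps the sketch self-contained.
    print(json.dumps(asdict(record)))

log_correction("M-1042", "LoanAmount", "$250,00", "$250,000", "loan-extract-v3")
```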
Knowledge Buddy confidence: Similarly, Knowledge Buddy’s confidence estimation complements the broader human-in-the-loop strategy. Knowledge Buddy doesn’t natively expose a confidence score (as far as I know), but you can architect one using GenAI Connect, which lets you craft the prompt sent to the model. Instruct the model to return both an answer and a self-assessed confidence score, then parse these into separate fields for downstream processing. Because the score is generated by the model itself, it isn’t ground truth, but it is a useful signal for flagging cases where the model is operating outside its familiar domain. This is particularly helpful in customer service or knowledge management scenarios, where ambiguous queries may need escalation to a human expert; a sketch of the pattern follows.
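Here is a hedged sketch of that prompt-and-parse pattern, in Python for readability. GenAI Connect itself is configured inside Pega; this stands in for the prompt you would send and the parsing you would do on the reply, and the JSON contract is my assumption, not a documented Knowledge Buddy format:

```python
# Sketch of asking the model for an answer plus a self-assessed confidence,
# then splitting the reply into separate fields. JSON contract is assumed.

import json

PROMPT_TEMPLATE = """Answer the question below using only the supplied context.
Return strict JSON with two keys:
  "answer": your answer as a string
  "confidence": a number from 0 to 100 reflecting how well the context
                supports your answer

Question: {question}
Context: {context}"""

def parse_buddy_response(raw: str) -> tuple[str, float]:
    """Split the model's reply into answer and self-assessed confidence."""
    data = json.loads(raw)
    return data["answer"], float(data["confidence"])

# Simulated model output; in Pega you would map these to separate properties
# and route low-confidence answers to a human expert.
raw = '{"answer": "Standard claims close within 10 business days.", "confidence": 62}'
answer, confidence = parse_buddy_response(raw)
if confidence < 70:                    # escalation threshold is illustrative
    print(f"Escalate to human expert (confidence {confidence}): {answer}")
```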
Bringing it all together, the combination of confidence scoring, human-in-the-loop review, and correction logging forms the foundation for trustworthy AI deployment, as described in Ryan Easley’s “Mastering Trust” article. These mechanisms improve accuracy and, just as importantly, generate the evidence needed to expand AI’s role responsibly: start with narrow use cases and widen scope as your metrics and governance framework mature. That measured approach is what makes AI adoption both reliable and scalable, whatever the industry.