Best Practices Around the Latest Data Science Tools & AI Models: Your Key to Maximizing Value from CDH

Join us for an engaging webinar on leveraging Pega data science tools (pdstools) to uncover value from AI models in Pega. We will explore the latest updates, including APIs to Prediction Studio, while also delving into established best practices. Discover what the future holds with the possibility of running data science tools in the new decision job tier and expanding analysis beyond adaptive models. Don’t miss this opportunity to stay ahead of the curve in the world of data science with Pega!

https://players.brightcove.net/1519050010001/default_default/index.html?videoId=6370269417112

Watch a video replay…

Download the PDF (see attached)

Find other CDH Community Events like this one

PDSTools DS Webinar 2025.pdf (4.35 MB)

Q&A From APAC Audience:

Q1: Could you please show where to download data from Pega? Is this where you get the data from: https://academy.pega.com/challenge/exporting-adaptive-model-data-external-analysis/v1 ?

Ans: Correct, that academy challenge is the best document showing how to do it step by step.
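As a sketch of what the download contains: Pega dataset exports are typically a zip archive of newline-delimited JSON records. The file name `data.json` and the field names below are assumptions for illustration only; check your actual export, and note that the pdstools `ADMDatamart` class can usually read these exports directly.

```python
import io
import json
import zipfile

def read_ds_export(zip_bytes):
    """Parse a dataset export: a zip assumed to contain one or more
    .json files with one JSON record per line."""
    records = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.endswith(".json"):
                continue
            with zf.open(name) as fh:
                for line in fh:
                    line = line.strip()
                    if line:
                        records.append(json.loads(line))
    return records

# Build a tiny fake export in memory to show the expected shape.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.json", '{"pyName": "Offer1", "pyPositives": 5}\n')
models = read_ds_export(buf.getvalue())
```

Once parsed, each record corresponds to one adaptive model (or predictor bin, depending on which datamart table you exported).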

Q2: A question around Health Check reports: the overview of actions highlighted that the number of actions (only 4) is less than the best practice, i.e. 1,000. Can you elaborate on that? Is it 1,000 responses, or really the number of actions?

Ans: Best practice suggests aiming for more than 1,000 actions, as this allows for optimal utilization of our AI capabilities. However, depending on the industry, having over 200 actions can still be good enough. A higher number of actions is desirable because, during the decision-making process, many actions are filtered out and only a small portion (perhaps around 20) will reach the arbitration stage. If there are too few actions, the effectiveness of the machine learning algorithm is limited, as there are fewer options to evaluate and choose from during arbitration. We cannot personalize enough with a limited number of actions.

Q3: Please correct my understanding:
For the cold-start problem, if we use AGB then we don't need to worry about the starting propensity anymore. The system will ignore the current Thompson sampling to derive the starting propensity.

Ans: The starting propensity in an AGB setup is determined by the action metadata, which includes the hierarchy of the actions: issue, group, and treatment-specific information like banner color. In a naive Bayes (NB) approach, each action is independent, so we lack statistical insight into the starting propensity. However, AGB works across all issues/groups and uses this metadata, as well as other predictors about actions you provide, which aids in establishing a more informed starting propensity.
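To make the Thompson-sampling idea behind propensities concrete, here is a minimal sketch (not Pega's exact implementation) of drawing a propensity from a Beta posterior; the Beta(positives + 1, negatives + 1) parameterization is an assumption for illustration:

```python
import random

def sampled_propensity(positives, negatives):
    """Thompson sampling: draw a propensity from a Beta posterior.

    With no evidence (cold start) the Beta(1, 1) draw is uniform on
    [0, 1]; as positives and negatives accumulate, draws concentrate
    around the observed success rate.
    """
    return random.betavariate(positives + 1, negatives + 1)

random.seed(42)  # reproducible draws for the example
cold = [sampled_propensity(0, 0) for _ in range(5)]       # highly variable
warm = [sampled_propensity(200, 9800) for _ in range(5)]  # near 0.02
```

The point of the answer above is that AGB's use of action metadata means a brand-new action need not start from the uninformed, uniform end of this spectrum.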

Q4: One of the interesting findings we’ve encountered in Prediction Studio is the way in which categorical features are currently being binned. It seems as if they are binned by volume constraints, which means that unrelated categorical values often end up in the same bin, which may distort any predictive signal between the categorical predictor and the outcome.
Do you have any suggestions on how to solve this within CDH (or pdstools, if possible), or will this have to be solved in the data pipeline that sends data to CDH itself?

Ans: Categorical values are binned based on the level of discrimination they offer when considered independently or as part of a bin. The process involves iteratively assigning predictor values to bins, with the aim of maximizing discrimination, measured by the Z ratio. If two seemingly unrelated categorical values end up in the same bin, it suggests that, according to the model’s current evidence, they are correlated. Additionally, the bins are adaptive, meaning they can change over time as new evidence is gathered.
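As a rough illustration of the discrimination measure, the sketch below uses a standard two-proportion z statistic as a stand-in for the Z ratio reported in Prediction Studio; the exact formula Pega uses may differ:

```python
import math

def bin_z(bin_pos, bin_neg, total_pos, total_neg):
    """Two-proportion z statistic: how far a bin's success rate sits
    from the overall rate, in standard-error units. An illustrative
    stand-in for the per-bin Z ratio, not Pega's exact formula."""
    n_bin = bin_pos + bin_neg
    n_all = total_pos + total_neg
    p_bin = bin_pos / n_bin
    p_all = total_pos / n_all
    se = math.sqrt(p_bin * (1 - p_bin) / n_bin + p_all * (1 - p_all) / n_all)
    return (p_bin - p_all) / se

# Overall: 100 positives / 900 negatives (10% success rate).
strong = bin_z(30, 70, 100, 900)   # bin at 30% success: large positive z
neutral = bin_z(10, 90, 100, 900)  # bin at 10% success: z near 0
```

Two categorical values whose z values barely change when merged into one bin discriminate the outcome similarly, which is why seemingly unrelated symbols can end up sharing a bin.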