Takeaway…
Real customer interactions are not isolated events. Each decision changes the customer’s future responsiveness, for better or worse. Repeatedly showing irrelevant actions trains customers to disengage, lowering the value of everything that follows.
A PxV (propensity × value) strategy is mathematically optimal only if the interaction is the last one you will ever have. In any ongoing relationship, prioritising relevance and positive engagement can create far more value over time than maximising the expected return of a single decision.
I trained as a physicist. Physicists have a long and slightly deserved reputation for making drastic simplifications in order to make hard problems tractable.
A classic example is a student exercise that asks how long it would take to defrost a frozen woolly mammoth, given that it takes 24 hours to defrost a frozen chicken. On the surface, this is an awkward problem. Chickens are covered in feathers (or plastic), mammoths in thick fur. They have very different shapes, sizes, and appendages. Wings, legs, ears, tusks, beaks. All this impacts heat transfer.
However, the promising student begins their calculation with the phrase: “Let us assume a spherical mammoth in a vacuum…”
This kind of simplification turns an intractable problem into a solvable one, and often gets you surprisingly close to the right answer. But it only works if the simplification preserves what actually matters in the system, and discards only what is genuinely incidental. When they discard the wrong things, the maths still works, but the conclusions quietly stop being useful. The same is true in customer decisioning: simplifications are necessary, but only if they preserve how customers actually respond over time.
In my previous post, “Beyond Customer Propensity: Incorporating Business Value in AI-Driven Next Best Action,” I outlined ten core principles for combining Value (V) with Propensity (P) in Pega Customer Decision Hub (CDH) arbitration. We discussed how combining propensity (a measure of customer relevance) with business value can better align Next-Best-Action (NBA) decisions with organisational objectives. That conversation sparked an insightful comment by my long-time colleague Peter van der Putten, who highlighted a critical assumption underlying the simple PxV approach. In this follow-up post, I want to explore that assumption and propose how we might evolve beyond a simplistic PxV utility function to improve long-term customer engagement and business outcomes.
The Flaw in the Independence Assumption
The PxV model essentially treats each customer decision as an independent event: we multiply the probability of a positive response (P) by the attributed business value (V) of that response, aiming to maximise the expected value of each interaction. This is a neat simplification, although too neat, as it turns out. The fundamental flaw is assuming that a decision at time t does not influence future decisions at time t′ for the same customer.
In reality, customer interactions are not independent trials. Each experience leaves a mark on the customer’s future behaviour, shaping how they will respond to the next offer or message. Customers have memory of their experiences with you, and that memory influences behaviour.
When we relentlessly push high-value but low-relevance offers (e.g. repeatedly trying to sell an expensive new product to an uninterested customer), we risk diminishing the customer’s propensity to engage in the future. As Peter aptly noted, “…if we get offered lots of food (actions) we don’t like, we stop drooling (subsequent propensities will be lower).”
In other words, bombarding customers with messages they find irrelevant or unhelpful trains them to tune out and say “No”. Marketing research backs this up: unsubscribe rates increase when message relevance is low. Consistently low-quality interactions create a kind of “marketing fatigue” or negative conditioning that lowers response rates over time.
The flip side is also true. If customers are consistently presented with highly relevant interactions, even ones with modest business value, they are more likely to remain engaged and responsive in the long run. You might even say we can “train our customers to say yes” (in a positive sense) by rewarding them with useful, welcome offers. Each positive interaction builds trust and primes the customer to respond favourably to future propositions. In short, every NBA you show today affects the success of your future NBAs – for better or worse. Our customers are not perfectly spherical. In customer engagement, assuming perfectly rational, memoryless customers is an assumption too far.
The relationship with Customer Lifetime Value
CLV is not the value of a single response or a single offer. It is the expected value of an entire future relationship with a customer: the cumulative value of many interactions, spread over time, discounted by uncertainty and decay. In other words, CLV is explicitly a sequence‑level construct.
Where PxV asks, “What is the expected value of this one interaction?”, CLV asks a much harder question:
“How does this interaction change the value of all the interactions that come after it?”
That distinction matters. A decision that maximises short‑term PxV can still destroy long‑term value if it conditions the customer to disengage. Conversely, an interaction with modest immediate value can increase CLV if it builds trust, relevance, and responsiveness that pays off later.
This is why CLV has always been conceptually aligned with retention, loyalty, and relationship management rather than one‑off conversion. It reflects the reality that customers remember, adapt, and learn — exactly the behaviours that break the independence assumption behind naïve PxV.
The Pavlovian Pattern in Customer Behaviour: Brand Loyalty and Aversion
Human beings have memory, emotions, and learned behaviours. Over time, your customers can develop habits in how they respond to your interactions.
If you treat every touchpoint as an isolated opportunity to maximise immediate value, you might unknowingly create a negative habit. For example, consider a customer who keeps getting sales pitches for high-end smartphones and pricey plans when all they requested was a solution to a billing problem. After rejecting offer after offer, the customer becomes conditioned to assume that “these messages aren’t for me” and starts ignoring future communications entirely. From the business’s perspective, not only is the customer’s propensity to respond dropping, but any future value from that customer is now in jeopardy.
This Pavlovian effect extends beyond just the customer’s own habits. In assisted channels (such as call centres or retail stores), the people delivering the offers are also affected. If a sales agent makes ten consecutive offers (even to different customers) and all are refused, the agent will lose confidence in the system. As Peter observed, they might even start bypassing the AI’s recommendations – effectively “gaming” the system by not offering actions they expect to be declined. This human feedback loop can further distort your decisioning outcomes.
So, what does this tell us? It means that a naïve PxV approach, treating each recommendation like a one-off gamble, misses a crucial part of the reality of customer engagement. The simple multiplication P×V optimises for the immediate expected value of a single interaction, but that’s only truly optimal if you never plan to interact with the customer again. If you’re in the business of building long-term customer relationships (and most of us are), you have to think beyond the “spherical” customer and beyond the next single offer.
Rethinking the PxV Function: from “Value” to “Utility”
OK. Let’s roll up our sleeves and do some maths. And let’s be brave and work toward a strong candidate proposal for how to solve this.
Once you accept that customer decisions are not independent, the problem with plain PxV becomes clearer. “Value” is rarely linear in its effect on long-term outcomes. A £900 handset is not 15 times “better” than a £60 roaming bundle in the way a naïve PxV multiplication implies, especially when repeated, low-relevance pushes can train customers to disengage.
This is exactly why decision science distinguishes value (the raw magnitude in some unit) from utility (the experienced or effective value once you account for diminishing returns and behaviour). There is a deep literature on this idea. Classic expected utility theory models preferences using concave utility functions to capture diminishing marginal utility, and behavioural economics extends this further with concepts like diminishing sensitivity in value functions.
At this point it is worth being explicit about intent. The goal here is not to discover a “true” utility function in the economic sense, nor to perfectly model customer psychology. In a real‑time decisioning system, utility functions play a different role: they act as behavioural constraints on optimisation. They define what kinds of trade‑offs the system is allowed to make, not what the world “really” looks like.
From that perspective, the exact functional form matters less than the properties it enforces. We want utility to increase with value, but with diminishing returns. We want extreme values not to dominate relevance. And we want the result to be simple, interpretable, and governable. Any function with those properties would serve. The exponential form is not special because it is “true”, but because it is the smallest, smoothest function that gives us all three with a single, meaningful parameter.
What have users of CDH tried in the past?
Practitioners have not been blind to these issues. Long before anyone reaches for new utility functions, real CDH implementations tend to experiment with ways of softening the dominance of raw value in arbitration.
One common approach is to tilt arbitration more strongly toward propensity, making relevance the primary driver in the short term. Instead of a simple P×V, teams experiment with non‑linear combinations, such as compressing value with expressions like P×√V. The intent is clear: an action the customer is likely to accept is often preferable to a high‑value action that is likely to be rejected, because acceptance preserves engagement and future opportunity.
Another pragmatic pattern is additive or weighted scoring, for example:
Score = P + α·V
with α deliberately kept small. In this formulation, value acts as a tie‑breaker when relevance is comparable, rather than a force that can overwhelm it. An action with dramatically lower propensity should never leapfrog a highly relevant one simply because its nominal value is larger. This approach reflects a widely shared intuition: business priorities matter, but only after relevance has been respected.
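As a minimal sketch of how these two patterns behave, the snippet below compares multiplicative P×V with additive P + α·V on two hypothetical actions. The propensities, values, and α are illustrative assumptions, not figures from a real implementation.

```python
# Illustrative comparison of multiplicative PxV versus additive P + alpha*V.
# All propensities, values, and alpha below are hypothetical examples.

def pxv(p, v):
    """Naive expected value of a single interaction."""
    return p * v

def additive(p, v, alpha=0.0001):
    """Weighted additive score; small alpha keeps value as a tie-breaker."""
    return p + alpha * v

actions = {
    "premium handset": (0.05, 900.0),  # low relevance, high value
    "roaming bundle":  (0.30, 60.0),   # moderate relevance, moderate value
}

for name, (p, v) in actions.items():
    print(f"{name:16s}  PxV={pxv(p, v):6.2f}  additive={additive(p, v):.4f}")
```

Under P×V the handset wins purely on value; under the additive score with a small α, the more relevant roaming bundle comes out ahead, with value only nudging the result.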
These techniques are sensible, and they often improve outcomes compared to naïve PxV. But they all share the same structural limitation: they reshape the arithmetic without explicitly modelling diminishing returns. They reduce the impact of value, but they do not encode the idea that additional value should matter less and less as it grows. As a result, they remain difficult to reason about, difficult to govern, and difficult to evolve as the business changes.
A practical first proposal: concave exponential utility
We want to transform action value into action utility. A very useful, simple choice is the exponential utility function:
u(V) = 1 − e^(−αV)
This function has three properties that are exactly what we want for NBA arbitration:
- Monotonic: higher V always gives higher u(V), so value ordering is preserved.
- Concave: marginal utility diminishes as V increases, so huge values stop dominating.
- Flexible: α tunes how quickly the utility “saturates”. Larger α saturates faster, smaller α saturates more slowly.
In practice, you use it as a transform on V before combining with propensity. A simple pattern is:
Score = P × u(V) = P × (1 − e^(−αV))
This keeps propensity in the driving seat, but allows value to matter without turning the system into a “highest price wins” machine.
“Using this concave exponential utility function preserves value ordering but prevents extreme values from overwhelming propensity, which is exactly what you want in repeated, customer‑centric decisioning.”
To make this concrete, here is u(V) across a wide range of telco-style values. Think of V as any consistent business-value currency (margin, contribution, strategic points). The key is the shape, not the unit.
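A small sketch can generate such a table. The value points below are illustrative telco-style amounts (assumptions, not data from a real system), computed with u(V) = 1 − e^(−αV) for three candidate α settings:

```python
import math

def u(v, alpha):
    """Concave exponential utility: monotonic, concave, saturating toward 1."""
    return 1.0 - math.exp(-alpha * v)

values = [1, 5, 50, 100, 250, 500, 1000]  # illustrative value currency
alphas = [0.02, 0.005, 0.002]             # fast / balanced / slow saturation

print("V".rjust(6) + "".join(f"  a={a:<7}" for a in alphas))
for v in values:
    print(f"{v:6d}" + "".join(f"  {u(v, a):7.3f}" for a in alphas))
```

Notice that with the balanced α = 0.005, u(1000) ≈ 0.993 while u(50) ≈ 0.221: the larger value still ranks higher, but its influence is roughly 4–5× rather than 20×.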
How to read this table
- Value ordering is preserved: a flagship handset is always “worth more” than a roaming bundle, which is worth more than a free accessory.
- Dominance is removed: with a balanced α (0.005), a £1,000‑equivalent bundle is not twenty times more influential than a £50 add‑on. Its utility advantage is real, but bounded.
- α controls strategic emphasis:
  - α = 0.02: value saturates quickly, so value mostly acts as a tie‑breaker.
  - α = 0.005: value differentiates sensibly across handset tiers.
  - α = 0.002: value continues to matter across a wide commercial range.
This is why concave utility is so powerful in NBA arbitration. It allows you to compare a cheap handset, a premium handset, and a service interaction in the same decision frame, without allowing the largest price tag to overwhelm customer relevance.
Worked telco arbitration: PxV versus P×u(V)
Now consider four typical telco actions:
- Premium handset upgrade (high value, low relevance for many customers)
- Roaming bundle (moderate value, moderate relevance)
- Free accessory (low value, high relevance)
- Service reassurance message (tiny monetary value, very high relevance)
Using α=0.005:
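The propensity and value figures below are assumptions chosen only to reproduce the qualitative pattern described here; a minimal sketch of the arbitration comparison looks like this:

```python
import math

ALPHA = 0.005

def u(v, alpha=ALPHA):
    """Concave exponential utility with diminishing returns."""
    return 1.0 - math.exp(-alpha * v)

# Illustrative propensities and values (assumptions, not real data).
actions = {
    "premium handset upgrade": (0.05, 900.0),
    "roaming bundle":          (0.30, 60.0),
    "free accessory":          (0.50, 5.0),
    "service reassurance":     (0.70, 1.0),
}

print(f"{'action':26s} {'PxV':>8s} {'P*u(V)':>8s}")
for name, (p, v) in actions.items():
    print(f"{name:26s} {p * v:8.3f} {p * u(v):8.4f}")

pxv_winner = max(actions, key=lambda a: actions[a][0] * actions[a][1])
pu_winner = max(actions, key=lambda a: actions[a][0] * u(actions[a][1]))
print("PxV winner:   ", pxv_winner)   # raw value dominates
print("P*u(V) winner:", pu_winner)    # relevant AND reasonably valuable
```

With these numbers, P×V crowns the handset (0.05 × 900 = 45) while P×u(V) picks the roaming bundle (0.30 × 0.259 ≈ 0.078 beats 0.05 × 0.989 ≈ 0.049).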
Notice what happens:
- PxV is dominated by the handset upgrade, even with low P, because V is so large.
- P×u(V) allows the roaming bundle to win because it is both relevant and reasonably valuable, while the premium handset cannot overwhelm propensity simply by being expensive.
This is the behaviour you want if you care about repeat interactions, not one-off extraction.
What does α really mean?
α is a “saturation rate”:
- High α: value saturates quickly, so after a certain point “more value” barely matters. This is useful if you want value to act mostly as a tie-breaker.
- Low α: value saturates slowly, so value keeps differentiating actions over a broader range. This is useful if you genuinely want to discriminate between value bands, but still avoid domination.
A practical way to set α is to decide what “high value” means in your chosen value currency and tune α so u(V) is near saturation for that range. The point is not financial precision, it is behavioural control.
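That tuning heuristic can be written down directly. Assuming the concave exponential form u(V) = 1 − e^(−αV), solving u(V_high) = u_target for α gives a one-line formula; the 0.95 target and the “high value” point of 600 below are arbitrary illustrations:

```python
import math

def alpha_for_target(v_high, u_target=0.95):
    """Solve u(v_high) = u_target for alpha, where u(V) = 1 - exp(-alpha*V).

    Rearranging: alpha = -ln(1 - u_target) / v_high.
    """
    return -math.log(1.0 - u_target) / v_high

# Example: if "high value" in your currency is ~600 and you want utility
# to be ~95% saturated there, alpha comes out close to 0.005.
print(round(alpha_for_target(600.0), 4))  # → 0.005
```

This makes the behavioural intent explicit: you choose where saturation should happen in your value currency, and α falls out of that choice.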
But let’s look at the table again. Despite the blunting of value, the service reassurance message looks like it is still very unlikely to win arbitration. Yet it is exactly these sorts of actions that customers really appreciate and that can build significant brand loyalty. How do we ensure these never lose out entirely?
Going one better: explicitly rewarding positive engagement
We have still not gone far enough to ensure actions with negligible commercial value can sometimes win.
An agent in a store wishing a customer “Happy Birthday” has zero immediate economic impact. But how can we nudge the staff member to celebrate the occasion if its intrinsic value is nearly non-existent? And yet we know it will be met with a strong positive response.
If we want to include a term in our utility function that emphasises that any positive engagement has value, even if the business value in monetary terms is negligible, what would the expression look like?
Let’s simply add a constant that reflects the value of customer engagement
u(V) = β + (1 − e^(−αV))
This says: every action that is worth showing has a baseline reward (β), and then monetisation contributes additional utility with diminishing returns.
When you use Score = P×u(V), even low-V actions are not treated as worthless if they are relevant and keep the customer positively engaged.
Here is what that looks like with β=0.05 (same telco actions as above):
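As before, the propensity and value figures are illustrative assumptions; the sketch below adds the engagement baseline β to the utility and recomputes the scores:

```python
import math

ALPHA, BETA = 0.005, 0.05

def u(v, alpha=ALPHA, beta=BETA):
    """Baseline engagement reward beta, plus diminishing-returns monetary term."""
    return beta + (1.0 - math.exp(-alpha * v))

# Same illustrative actions as before (propensities/values are assumptions).
actions = {
    "premium handset upgrade": (0.05, 900.0),
    "roaming bundle":          (0.30, 60.0),
    "free accessory":          (0.50, 5.0),
    "service reassurance":     (0.70, 1.0),
}

for name, (p, v) in actions.items():
    print(f"{name:26s} P*u(V) = {p * u(v):.4f}")
```

With these numbers the reassurance message’s score rises roughly tenfold (from ≈0.0035 to ≈0.0385) while the roaming bundle still wins, which is exactly the intended behaviour.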
The reassurance message is still not going to beat strong commercial actions, but it no longer collapses to “basically zero” simply because its monetary V is small. It will be in with a fighting chance whenever no strong commercial offer is available.
From experience, we know that even when an offer is declined, high relevance can still produce a strong positive emotional response. NPS scores can be high even when the offer was not quite right. Making an empathetic offer has intrinsic value, and our β parameter reflects that.
Why this works, and why it matters
Putting it all together, an exponential utility function, with optional explicit engagement separation, gives us an arbitration scheme that is:
- Customer‑centric by construction, because propensity remains the dominant signal
- Business‑aware without being extractive, because value has diminishing returns
- Stable and governable, because α and β are simple, interpretable parameters
- Aligned to repeated interactions, because it avoids “big value now, customer fatigue later”
Crucially, this is not just mathematically convenient. It aligns extremely well with the ten principles laid out in the previous post.
A concave utility function preserves the clean separation between relevance and importance. Propensity still answers “what is the customer most likely to welcome?”, while value answers “how much do we care if this works?”, but without forcing value to do too much heavy lifting. Because extreme values no longer dominate arbitration, we rely less on downstream mechanisms such as aggressive weighting to stop sales actions overwhelming service or retention. That is a good thing.
Weights are powerful, but they are also blunt instruments. When arbitration itself is better behaved, weights can return to their proper role: expressing strategy and brand intent, not compensating for a utility signal that is structurally misaligned with long‑term experience. In other words, a well‑designed utility function reduces the need for corrective configuration elsewhere in the stack.
But how do we choose α and β?
This is the point where it is tempting to look for a theoretically “correct” answer. There isn’t one.
The right values for α and β cannot be derived from first principles, because they depend on something we do not fully know in advance: how spherical our customers really are.
A perfectly spherical customer, in the physicist’s sense, would respond independently to each decision, would not remember past interactions, and would not change their behaviour as a result of repeated experience. In that imaginary world, linear PxV would be fine, and α could be arbitrarily small. Value could scale without consequence.
Real customers are not like that.
Some customer bases are highly sensitive to over‑commercialisation. Push high‑value sales too early or too often and propensities collapse. Others are more tolerant, or even expect assertive selling once relevance is established. Some segments respond very positively to repeated low‑value, high‑relevance interactions. Others disengage unless there is a clear tangible benefit.
In other words, α and β are empirical properties of your line of business, your customer population, the market and your corporate strategy, not simply abstract constants. They encode how quickly customers experience diminishing returns from value, and how much intrinsic worth there is in engagement itself, independent of monetisation.
This implies an empirical, test‑and‑learn approach
The only credible way to tune these parameters is empirically.
Start with conservative assumptions. Use a value of α that ensures value saturates within a sensible commercial range, and a small β that explicitly recognises that positive engagement is never worthless. Then observe what happens.
Measure acceptance rates, downstream propensities, opt‑outs, channel fatigue, and longer‑term outcomes such as conversion later in the journey or retention. Simulate alternative settings. Run controlled experiments. Adjust deliberately.
Over time, you may discover that different parts of your business, or different customer populations, have measurably different “degrees of sphericity”. That opens the door to more sophisticated optimisation strategies, where the utility function itself evolves based on observed customer behaviour, not just static design assumptions.
That may sound ambitious, but it is entirely consistent with what Customer Decision Hub is already doing with propensity learning. Extending that learning mindset to how we combine relevance and value is a natural next step.
And if, in doing so, we move closer to optimising not just individual decisions but the long‑term performance of the entire enterprise, that is not overreach. It is simply taking seriously the idea that customer experience is a system, not a sequence of isolated offers.
The destination is not a single perfect utility function. The value is in adopting a framework that lets us evolve safely, empirically, and transparently, without pretending that our customers are perfectly spherical.
So how long does it actually take to defrost a frozen woolly mammoth?
The time to defrost depends on heat diffusion through the object, which for a spherical approximation scales with the square of the radius. Assuming similar densities, the radius scales with the cube root of the mass, so the defrosting time scales with the mass to the power of 2/3. A typical frozen whole chicken has a mass of about 2 kg, while an adult woolly mammoth has a mass of about 6,000 kg. The mass ratio is 6,000 / 2 = 3,000, and the time ratio is 3,000^(2/3) ≈ 208. Thus, the defrosting time for the mammoth is about 200 days, plus or minus a week.
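The arithmetic is easy to check:

```python
# Scaling check: defrost time ~ r^2 ~ m^(2/3), so
# t_mammoth = t_chicken * (m_mammoth / m_chicken)^(2/3).
t_chicken_hours = 24.0
mass_ratio = 6000.0 / 2.0

t_mammoth_hours = t_chicken_hours * mass_ratio ** (2.0 / 3.0)
print(f"{t_mammoth_hours / 24:.0f} days")  # → 208 days
```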



