How to set the number of Partitions in Batch Outbound Scheduler Dataflow of Customer Decision Hub?

Hi,

We need to adjust the number of partitions of the batch outbound scheduler dataflow on Customer Decision Hub from the default 99.

We have 6 batch nodes, each with 7 threads. During our daily scheduled run, processing slows down considerably once some nodes finish their partitions: only a few slow nodes are left working through the remaining partitions, which could be processed much faster if the idle nodes were able to pick them up.

I have changed the MKTSegPartitionCount and MKTEmailPartitionCount dynamic system settings and fully restarted the server, but it is still not working.

Can anyone tell me which configuration I should change for this?

Thank you for your help.

To adjust the number of partitions for the batch outbound scheduler dataflow in Pega Customer Decision Hub, create and configure a dynamic system setting:

1. In the header, click Create > SysAdmin > Dynamic System Settings.
2. On the New tab, in the Owning Ruleset field, enter: Pega-Engine.
3. In the Setting Purpose field, enter: prconfig/dsm/services/stream/pyTopicPartitionsCount/default.
4. In the Value field, enter the new global default partition count per topic.
5. Save the changes and restart the server.

Please note that the new setting is applied only to newly created partitions.

:warning: This is a GenAI-powered answer. All generated answers require validation against the provided references.

Changing the default number of partitions per topic

Managing Concurrent Campaign Data Flow Runs

==> @malaa1 @nairv1 please could you provide your input here?

@ClarissaL16661030

To follow up on this question: I've got the solution from GCS.

It can be addressed by updating the partition count using the following formula:

Number of partitions = 3 × number of threads (Infrastructure > Services) × number of batch nodes
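As a quick sanity check, the formula can be sketched in a few lines of Python (the function name and multiplier default are illustrative, not part of any Pega API); for the setup described in this thread, 6 batch nodes with 7 threads each:

```python
def recommended_partitions(threads_per_node: int, batch_nodes: int, multiplier: int = 3) -> int:
    """Partition count per the GCS formula: multiplier * threads * nodes."""
    return multiplier * threads_per_node * batch_nodes

# Setup from this thread: 6 batch nodes, 7 threads each
print(recommended_partitions(7, 6))  # 3 * 7 * 6 = 126
```

This 126 would be the value to enter in the MKTSegPartitionCount dynamic system setting for that cluster size.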

The SysAdmin > Dynamic System Settings > MKTSegPartitionCount setting has to be updated as per the above formula.

If the updated value is not taking effect, it could be because the Refresh audience before each campaign run option is not configured. You can also manually refresh the segment configured on the campaign and, during the campaign run, verify that the partitions on the dataflow were updated.
Try the above before increasing the number of threads or batch nodes.

Clarissa

@MarijeSchillern Hi Marije, thank you so much for your response. But there is another way: changing the MKTSegPartitionCount DSS and refreshing the segment, which is easier to implement in this case.

@ClarissaL16661030 I understand that the factor of 3 is to distribute the load evenly, but do we know why it is "3"? Shouldn't the number of partitions be a multiple of (number of nodes × number of threads)? In your case, a multiple of 42, like 84 or 126?

@DeepakRaghulR16785688 Hi Deepak, honestly I don't know the details of why it should be 3, but I believe 3 is just the recommended baseline to make sure we don't over-partition our segment.

I think we can configure the multiplier even higher than 3, depending on the amount of data processed, because in my case the speed of my batch nodes varies. Sometimes one node takes a long time to finish while all of the other nodes have already completed their work. In that case, increasing the number of partitions helps ensure that the load is evenly distributed and no nodes sit idle while the others are 'struggling'.
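The intuition above (finer partitions shorten the tail when partition work is uneven) can be illustrated with a toy scheduling simulation. This is not Pega's actual partition assignment, just a generic greedy model with made-up workload numbers:

```python
import heapq

def makespan(partition_times, num_nodes):
    """Greedy longest-processing-time assignment of partitions to nodes;
    returns the finish time of the slowest node (the run's wall-clock time)."""
    loads = [0] * num_nodes
    heapq.heapify(loads)
    for t in sorted(partition_times, reverse=True):
        least_loaded = heapq.heappop(loads)
        heapq.heappush(loads, least_loaded + t)
    return max(loads)

nodes = 6
# Same total work (320 units), one "heavy" region of the segment:
coarse = [100] + [20] * 11            # 12 partitions, one very large
fine = [25] * 4 + [5] * 44            # same work split into 48 smaller partitions

print(makespan(coarse, nodes))        # 100 -> run waits on the node with the big partition
print(makespan(fine, nodes))          # 55  -> load spreads out, run finishes much sooner
```

With coarse partitions the whole run waits on the one node holding the 100-unit partition; with finer partitions the same work finishes in roughly half the time, which matches the "struggling node" behavior described above.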

Just try configuring different numbers of partitions and compare the performance.