Non-Prod environments are frequently going down with high heap usage, both with and without traffic.
Please find the attached doc for details.
NonProd environment issue.docx (85.7 KB)
If you still need help, please log an INC via the MSP and provide the INC reference here.
This is a GenAI-powered tool. All generated answers require validation against the provided references.
Based on the information you shared, the primary issue appears to be related to the high number of batch requestors that remain in memory even when there’s no traffic. The data you provided shows 3,561 total requestors with 2,240 batch requestors, but only 35 active requestors. This significant discrepancy is a clear indicator of the problem.
In Pega environments, batch requestors created by background processes like Queue Processors, Agents, and Job Schedulers are not automatically cleared from memory. Instead, they remain in memory and are reused until the node is restarted. This explains why your heap usage remains consistently high (>80%) even without traffic, and why it quickly maxes out when traffic is introduced.
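To confirm this pattern on a node, you can sample old-generation occupancy while the environment is idle. A minimal sketch using the standard JDK `jstat` tool (the PID and the 10-second interval are placeholders; substitute your Pega JVM's process ID):

```shell
# Sample GC utilization every 10 seconds for the Pega JVM (replace <pid>).
# Column "O" is old-gen occupancy as a percent of capacity; if it stays
# above ~80% with no traffic, live objects (such as retained requestors)
# are pinning the heap rather than transient request load.
jstat -gcutil <pid> 10000
```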
Here are my recommendations to address this issue:
The most direct solution is to implement proper cleanup of orphaned batch requestors. Recent Pega Platform updates (specifically 24.2.1) include fixes for this exact issue. According to Pega support documentation: “An excessive number of batch requestors were causing performance issues and exceeding the node limit. Investigation showed the requestors were not being properly cleared, and an update has been added to return orphan batch requestors.”
If you’re not on this version, consider upgrading, or requesting an equivalent fix as a hotfix.
Perform a detailed analysis of your requestor count and implement proper requestor management: review which Queue Processors, Agents, and Job Schedulers are spawning batch requestors, and whether their counts are appropriate for a non-production environment.
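One low-level way to gauge how many requestor-related objects are resident on the heap is a live-object class histogram from the standard JDK `jmap` tool (the PID and the `requestor` match string are assumptions; inspect the unfiltered histogram first to find the exact class names on your platform):

```shell
# Histogram of live objects, filtered for requestor-related classes.
# Note: -histo:live forces a full GC before counting, so run this in a
# non-prod window. Compare counts at idle vs. under traffic.
jmap -histo:live <pid> | grep -i requestor
```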
While you’ve already increased the JVM heap from 8 GB to 15 GB, this treats the symptom rather than the cause. You can, however, also tune garbage collection.
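If you stay on the larger 15 GB heap, G1 with an explicit pause target is a common starting point. These flags are generic HotSpot tuning, not a Pega-specific prescription; the values and the `JAVA_OPTS` variable name are assumptions to validate against your application server's startup configuration:

```shell
# Illustrative JVM options (values are assumptions; tune per workload):
JAVA_OPTS="$JAVA_OPTS -Xms15g -Xmx15g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/pega/heapdumps"
```

Setting `-Xms` equal to `-Xmx` avoids resize pauses, and the heap-dump flags ensure that if a node does go down with an OutOfMemoryError, you capture evidence of what was retaining memory.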
As a stopgap while you work on the solutions above, consider scheduling node restarts during low-traffic periods. Since batch requestors are cleared only when the node restarts, this can provide temporary relief.
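One way to schedule such a restart is a cron entry during a low-traffic window. The script path, log path, and time below are hypothetical placeholders for whatever restart mechanism your environment uses:

```shell
# Restart the Pega node daily at 02:30 (hypothetical script and paths).
30 2 * * * /opt/pega/bin/restart-node.sh >> /var/log/pega/restart.log 2>&1
```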
You mentioned cleaning up custom tables that were taking longer to respond; continue to monitor their growth and response times.
After implementing these changes, closely monitor heap usage, garbage-collection frequency, and the batch requestor count, both under traffic and at idle.
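The heap-usage check can be automated with a small script. This is a hypothetical monitoring sketch, not a Pega utility; the function name, the 80% default threshold, and the `jstat`-based sampling shown in the comment are all assumptions to adapt:

```shell
#!/bin/sh
# Hypothetical sketch: compare old-gen occupancy against an alert threshold.
check_heap() {
  usage=$1            # old-gen occupancy percent (integer), e.g. obtained via:
                      #   jstat -gcutil "$PID" | awk 'NR==2 {print int($4)}'
  threshold=${2:-80}  # alert threshold, defaulting to 80%
  if [ "$usage" -gt "$threshold" ]; then
    echo "ALERT: old-gen at ${usage}% (threshold ${threshold}%)"
  else
    echo "OK: old-gen at ${usage}%"
  fi
}

check_heap 85   # above the 80% default, so this raises an alert
check_heap 45   # comfortably below the threshold
```

Wiring this into cron alongside the restart job would give you a simple record of whether heap pressure is actually dropping after the requestor cleanup.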