High heap usage even when no traffic

Non-Prod environments are frequently going down with high heap usage, both with and without traffic.

Please find the attached doc for details.

NonProd environment issue.docx (85.7 KB)

@DivyakrishnaV16705370

If you still need help, please log an INC via the MSP and provide the INC reference here.

:warning: This is a GenAI-powered tool. All generated answers require validation against the provided references.

Root Cause Analysis

Based on the information you shared, the primary issue appears to be related to the high number of batch requestors that remain in memory even when there’s no traffic. The data you provided shows 3,561 total requestors with 2,240 batch requestors, but only 35 active requestors. This significant discrepancy is a clear indicator of the problem.
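To make the discrepancy concrete, here is a quick calculation on the counts above (the figures are taken directly from the report):

```python
# Requestor counts from the Non-Prod report above.
total_requestors = 3561
batch_requestors = 2240
active_requestors = 35

batch_share = batch_requestors / total_requestors       # ~63% are batch
idle_requestors = total_requestors - active_requestors  # 3,526 idle
idle_share = idle_requestors / total_requestors         # ~99% idle

print(f"{batch_share:.0%} of requestors are batch; "
      f"{idle_share:.0%} are sitting idle in heap")
```

Roughly 99% of the requestors holding heap are doing no work, which is why memory stays high even with zero traffic.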

In Pega environments, batch requestors created by background processes like Queue Processors, Agents, and Job Schedulers are not automatically cleared from memory. Instead, they remain in memory and are reused until the node is restarted. This explains why your heap usage remains consistently high (>80%) even without traffic, and why it quickly maxes out when traffic is introduced.

Recommended Solutions

Here are my recommendations to address this issue:

1. Implement Batch Requestor Cleanup

The most direct solution is to implement proper cleanup of orphaned batch requestors. Recent Pega Platform updates (specifically 24.2.1) include fixes for this exact issue. According to Pega support documentation: “An excessive number of batch requestors were causing performance issues and exceeding the node limit. Investigation showed the requestors were not being properly cleared, and an update has been added to return orphan batch requestors.”

If you’re not on this version, consider upgrading, or requesting an equivalent hotfix from Pega support.

2. Optimize Requestor Management

Perform a detailed analysis of your requestor count and implement proper requestor management:

  • Review and optimize the configuration of Queue Processors, Agents, and Job Schedulers
  • Consider implementing a mechanism to periodically clean up idle requestors
  • Check for any custom background processes that might be creating requestors without properly releasing them

3. JVM and Garbage Collection Optimization

While you’ve already increased JVM capacity from 8GB to 15GB, this is treating the symptom rather than the cause. However, you can optimize garbage collection:

  • Ensure GC throughput is above 97% (considered good performance)
  • Monitor heap memory trends using PDC (Predictive Diagnostic Cloud) or JVisualVM
  • Review JVM configuration settings following Pega best practices
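GC throughput is the fraction of wall-clock time the JVM spends on application work rather than collecting. Given total uptime and accumulated GC pause time (e.g. the GCT column from `jstat -gcutil`, or totals from a GC log), the check against the 97% target is simple:

```python
def gc_throughput(uptime_s: float, gc_time_s: float) -> float:
    """Fraction of time spent on application work, not GC pauses."""
    return 1.0 - gc_time_s / uptime_s

# e.g. 86,400 s (one day) of uptime with 3,600 s of accumulated GC pauses
tp = gc_throughput(86_400, 3_600)
print(f"GC throughput: {tp:.1%}")  # 95.8% -> below the 97% target
```

A result below 97% suggests the heap is under sustained pressure, which in this case points back to the idle batch requestors rather than to GC tuning itself.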

4. Scheduled Node Restarts (Short-term Solution)

As a temporary measure while the above solutions are put in place, consider scheduling node restarts during low-traffic periods. Since batch requestors are only cleared when the node restarts, this can provide temporary relief.

5. Review Custom Tables and Data Management

You mentioned cleaning up custom tables that were taking longer to respond. Continue to monitor these tables and consider:

  • Implementing more aggressive archiving and purging strategies
  • Optimizing queries that access these tables
  • Reviewing indexes on frequently accessed columns
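A purge strategy can be as simple as a retention-window delete run on a schedule. The sketch below just builds the SQL; the table and column names are placeholders, so substitute the actual custom tables identified in your cleanup:

```python
from datetime import date, timedelta

def purge_statement(table: str, date_col: str, keep_days: int,
                    today: date) -> str:
    """Build a DELETE for rows older than the retention window.

    Table/column names here are placeholders, not real Pega tables.
    """
    cutoff = today - timedelta(days=keep_days)
    return (f"DELETE FROM {table} "
            f"WHERE {date_col} < DATE '{cutoff.isoformat()}'")

print(purge_statement("my_custom_log", "created_on", 90,
                      date(2024, 6, 1)))
# DELETE FROM my_custom_log WHERE created_on < DATE '2024-03-03'
```

Archiving the rows before deletion, and running the purge in batches during off-hours, keeps the operation from competing with application queries.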

Monitoring and Verification

After implementing these changes, closely monitor:

  • Requestor counts by type (especially batch requestors)
  • Heap usage patterns
  • Garbage collection frequency and duration
  • System performance metrics
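These checks can be folded into a small alerting rule. The thresholds below are illustrative starting points tied to the symptoms described above, not official Pega recommendations:

```python
def health_alerts(batch_requestors: int, heap_pct: float,
                  gc_throughput_pct: float) -> list[str]:
    """Return warning strings for any metric past its threshold."""
    alerts = []
    if batch_requestors > 500:          # illustrative ceiling
        alerts.append(f"batch requestors high: {batch_requestors}")
    if heap_pct > 80.0:                 # matches the >80% symptom above
        alerts.append(f"heap usage high: {heap_pct:.0f}%")
    if gc_throughput_pct < 97.0:        # below the recommended target
        alerts.append(f"GC throughput low: {gc_throughput_pct:.1f}%")
    return alerts

print(health_alerts(batch_requestors=2240, heap_pct=85.0,
                    gc_throughput_pct=95.8))
```

With the current reported numbers, all three alerts fire; after the cleanup takes effect, the list should come back empty during quiet periods.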
