Long running (Stale) Thread causing system restart

Hi team,

Can I get more information regarding this SR? We are facing the same issue. Which configuration is this article pointing to? Is it a DSS or an option exposed on a rule form?

https://community.pega.com/node/2396651

Alerts for queue processor (QP) items which took more than 15 minutes to run could result in the system marking the node as ‘unhealthy’. In environments with Pega Health Check enabled, this would shut down the node gracefully. It was not possible to change this default as it was hardcoded. In order to support systems that may have custom processes that run beyond 15 minutes, a new setting has been exposed that allows configuration of the interval after which a node with a long-running queue processor is marked as unhealthy and is restarted. By default this remains 900000 milliseconds / 900 seconds / 15 minutes, but it may be adjusted up to 24 hours to avoid premature node shutdown. The stale thread detection mechanism will take that setting into account and use the provided value or default to 15 minutes if the value was not provided. In addition, the threshold’s units in the UI have been changed from ms to seconds.

Resolved in Product Version: 8.3.6

SR/INC: 172675

Issue: 649451

@Tarun Bolla I’ve checked the description listed against this Issue-649451 (fixed in Pega 8.5.4 under ISSUE-640749)

The configuration mentioned appears to be an option to extend the node’s health check timeout beyond the default 15 minutes.

You should be able to find this configuration on the Queue Processor rule form, where you can set the timeout value that will be used to mark the node as unhealthy. This value defaults to 15 minutes and can be extended up to 24 hours.
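
To make the numbers concrete, here is a minimal sketch of the threshold arithmetic described in the release note. It is not Pega code: the names `effective_threshold_seconds` and `is_stale` are hypothetical, and capping values above 24 hours is my assumption, since the note only says the value "may be adjusted up to 24 hours".

```python
# Illustrative only: this models the documented numbers, not Pega's implementation.
DEFAULT_THRESHOLD_SECONDS = 15 * 60      # 900 s = 900000 ms, the documented default
MAX_THRESHOLD_SECONDS = 24 * 60 * 60     # 86400 s, the documented upper limit


def effective_threshold_seconds(configured=None):
    """Return the threshold in seconds, falling back to the 15-minute default."""
    if configured is None:
        # No value provided on the rule form: use the default.
        return DEFAULT_THRESHOLD_SECONDS
    # Assumption: values above 24 hours are capped rather than rejected.
    return min(configured, MAX_THRESHOLD_SECONDS)


def is_stale(run_time_seconds, configured=None):
    """True if a queue processor item has run longer than the threshold."""
    return run_time_seconds > effective_threshold_seconds(configured)
```

With no value configured, an item running for 20 minutes would count as stale under this model; with the threshold raised to 3600 seconds (1 hour), it would not.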

Hi @MarijeSchillern

But it’s mentioned in the article that this is available from 8.3.6 onwards. Can you confirm whether it’s a typo in the article?

Hi @Tarun Bolla I checked the Pega version you reported this against and responded using the correct incident ID for that particular version.

So for Pega 8.5 clients it is fixed in 8.5.4.

The same issue found by our 8.3 customers was reported against Issue 649451 and I can confirm that it was fixed in the Patch Release of 8.3.6 as reported in the Resolved Issues link you found.

I hope I’ve explained it a bit better. Feel free to let me know if you have any questions.

Hi @MarijeSchillern

Thanks for taking the time to explain in detail. This clarified all my questions. Screenshot for reference in case someone stumbles onto this question.

@Tarun Bolla @MarijeSchillern

Where do we find this setting in 8.4.1? We need to increase the threshold because our processing activity will take more time.

I see that it is there from 8.5 onwards.

Thanks in advance for any guidance.