Job pyNodeCleaner failing

Hi All,

On one of our Production systems, the job pyNodeCleaner fails to run on all the web nodes. Results in a SQL error.

Error message: Job Scheduler [pyNodeCleaner] activity [pzNodeCleaner] execution marked as failed with message [There was a problem getting a list: code: 1795 SQLState: 42000 Message: ORA-01795: maximum number of expressions in a list is 1000]. Exception message [-].*

Having checked this table, pr_data_op_session there are 2.9m records in here dating back to 2021, with an avg of 400k in each created month.

The SQL being generated is below, shortened since it’s very big.

Caused by: Error : 1795, Position : 73, Sql = DELETE FROM BASE.pr_data_op_session "PC0" WHERE  ( "PC0"."PZREQUESTORID" IN (:1  , :2  , :3  , :4  , :5  , :6  , :7  , :8  , :9  , :10  , :11  , :12  , :13  , :14  , :15  , :16  , :17  , :18  , :19  , :20  , :21  , :22  , :23  , :24  , :25  , :26  , :27  , :28  , :29  , :30  , :31  , :32  , :33  , :34  , :35  , :36  , :37  , :38  , :39  , :40  , :41  , :42  , :43  , :44  , :45  , :46  , :47  , :48  , :49  , :50  , :51  , :52  , :53  , :54  , :55  , :56  , :57  , :58  , :59  , :60  , :61  , :62  , :63  , :64  , :65  , :66  , :67  , :68  , :69  , :70  , :71  , :72  , :73  , :74  , :75  , :76  , :77  , :78  , :79  , :80  , :81  , :82  , :83  , :84  , :85  , :86  , :87  , :88

Any idea what causes this and how do we rectify?

Thanks
Craig

@CraigA52

It sounds like the table pr_data_op_session is not being purged correctly which means you have passivated sessions being retained.

See Operator Presence troubleshooting FAQs

Have you noticed Job Scheduler pzClusterAndDBCleaner failing?

Operator presence records are stored in the pr_data_op_session table in the database. If requestors remain in the Active state for a very long time (even when the requestor session has ended) then that is not a good sign. You can purge anything from this table, that is not in ‘away’ or ‘active’ status. ‘Away’ means its passivated. But if ‘Away’ is longer than 24 hours, then you can get rid of it because the passivation data is gone after 24 hours.

What is the setting of the following DSS setting ?prconfig/initialization/persistrequestor/SystemCleanerCanProcessDatabaseObjectsFromAllNodes/default value

This DSS will help to avoid concurrency issues like pyNodeCleaner scheduler enabled from 3-4 nodes running at same time Wth this DSS enabled pyClusterAndDBCleaner (enabled on backgroundProcessing nodetype) will qualify all the nodes’ data & cleanup.

The pr_sys_context will have the passivated requestor details and pr_page_store tables will have all the passivated pages for each requestor. Check the “initialization/persistrequestor/contentExpiryThreshold” setting(default is 1 or 2 day) to mark the passivation data as expired and remove it from those 2 tables.

Presence table is cleaned up as part of pyClusterAndDBCleaner job scheduler which should runs periodically (once in 24 hours for example). This job first removes the old records from pr_sys_context table (hold passivated requestors data) and then tries to remove all of those requestorids from the presence table pr_data_op_session which got deleted from sys_context table.

If you have the old entries in this table there could be multiple reasons, which can be investigated by looking at the logs.
One reason could be that the Job Scheduler pyClusterAndDBCleaner was not run or failed to run, due to which it did not clean up the old entries.

Only as a last resort should you use manual DB changes.

eg

truncate table pegadata.pr_log_reportstats;
delete from pr_data_op_session where pzrequestorid not in (select pxrequestorid from pr_sys_context) and pxispassivated = ‘true’ and pxState in(10,15) and pxcreatedatetime < CURRENT_TIMESTAMP - INTERVAL ‘2 DAY’;

Please also set this DSS to purge data more frequently from this table

Owning Ruleset: Pega-RulesEngine
Setting Purpose: reporting/reportstatistics/minimumDaysToPurgeFromDatabase
Value: 7

Also note:

If you are seeing ORA-01795 error while pzClusterAndDBCleaner job is running then this is a known problem (o BUG-695023). This error occurs when there are more than 1000 requestors in IN clause of delete query which deletes records from presence table.

The pzClusterAndDBCleaner job scheduler was not able to cleanup data in pr_op_data session table due to the delete query formed to clean up this table throwing “ora-01795 maximum number of expressions in a list is 1000 oracle 19c” exception. This has been resolved by splitting requestor IDs into batches of 1000.

Fixed in: 8.5.6, 8.6.4, 8.7.2 and 8.8. You can read about it in the ‘Resolved Issues’ documentation.