pyPersistClusterState OOTB job scheduler failing in production

Hi All,

We are getting failures while running the job scheduler pyPersistClusterState. The error message is below.

Job Scheduler pyPersistClusterState activity Log-System-State-Cluster:pzPersistClusterState execution failed on node pega-swc-prod-batch-85c946766b-mwmth. Error message: Job Scheduler [pyPersistClusterState] activity [pzPersistClusterState] execution marked as failed with message [Exception happened at remote node pega-swc-prod-web-5dc9c88488-kkj9p]. Exception message [-].

Upon checking the logs, we can see the following broken pipe error:

java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[?:?]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113) ~[?:?]
at sun.nio.ch.IOUtil.write(IOUtil.java:79) ~[?:?]
at sun.nio.ch.IOUtil.write(IOUtil.java:50) ~[?:?]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:462) ~[?:?]
at com.pega.hazelcast.v5.internal.networking.nio.NioOutboundPipeline.flushToSocket(NioOutboundPipeline.java:439) ~[pega-

Please let us know how to reduce these failures in production.

@PrasadN16719109

There could be several reasons for this issue.

It could be a timeout or a network issue. "Broken pipe" usually means the connection was closed before the update/write operation had completed. Check the logs on both nodes named in the error for any other exception that occurred before the pyPersistClusterState job scheduler failure. That may give a better idea of the root cause.
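If the logs are large, a small script can help with that check. The following is only a rough sketch, not a Pega utility: the function name, the two regular expressions, and the 200-line window are all assumptions for illustration, and they would need adjusting to your actual log layout. It finds the first pyPersistClusterState failure line and lists the exception lines that appear shortly before it.

```python
import re

# Assumption: failure lines mention the job scheduler name followed by "execution failed".
FAILURE_PATTERN = re.compile(r"pyPersistClusterState.*execution failed")
# Assumption: exception lines contain a Java class name ending in Exception or Error.
EXCEPTION_PATTERN = re.compile(r"\b[\w.$]+(?:Exception|Error)\b")


def exceptions_before_failure(lines, window=200):
    """Return (line_number, text) pairs for exception lines found in the
    `window` lines before the first job scheduler failure line."""
    for index, line in enumerate(lines):
        if FAILURE_PATTERN.search(line):
            start = max(0, index - window)
            return [
                (number + 1, text.rstrip())  # 1-based line numbers
                for number, text in enumerate(lines[start:index], start=start)
                if EXCEPTION_PATTERN.search(text)
            ]
    return []  # no failure line found
```

To use it, read the log file of each node into a list of lines (for example with `open(path).readlines()`, where `path` is wherever your log file lives) and pass the list to `exceptions_before_failure`. Comparing the results from the batch node and the web node may show which side closed the connection first.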