ISPN000299: Unable to acquire lock after 15 seconds for key SessionCreationMetaDataKey

This is related to incident INC-268701.

We recently upgraded our application from Pega 7.2 to Pega 8.6.5. For the first 2 days after we migrated to prod and set up the environment, everything worked fine. From the 3rd day onwards, however, we started seeing significant slowness, along with the following error when logging in with administrator access:
org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key SessionCreationMetaDataKey(NCb-6AkmthbIwwoZpSMc8kqoeRwtJ9G-JtQ5hYWu) and requestor GlobalTx:Pega-MX-slave2-mxcyvlpras2005:pega-mx-server-one:2036. Lock is held by GlobalTx:Pega-MX-slave2-mxcyvlpras2005:pega-mx-server-one:2034

We are also seeing many deadlocks in the logs related to the job scheduler PZPURGEPRSYSSTATUSNODES:

ERROR    - [PersistentJobExecutionFactory] Job[pzPurgePRSysStatusNodes] execution lock has failed. 
com.pega.pegarules.pub.database.LockFailureException: Exception occurred while retrieving existing lock PZPURGEPRSYSSTATUSNODES: code: <none> SQLState: Problem executing lock check: code: 1205 SQLState: 40001 Message: Transaction (Process ID 83) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. 
DatabaseException caused by prior exception: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 83) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. 
 | SQL Code: 1205 | SQL State: 40001 

We have also observed the following:

  1. The slowness is only observed for dev/ops/admin users, i.e. when accessing Dev Studio, Admin Studio, or App Studio. Our branch users who access the applications are not affected. The impact therefore falls mainly on the ops team, which needs to access Admin Studio, and it would also affect importing packages during deployments.

  2. All 6 nodes somehow got configured as STREAM nodes, and the job scheduler PZPURGEPRSYSSTATUSNODES runs on all of them.

Has anyone else faced similar issues? What is the impact if all nodes are configured as stream nodes?

Hi @SayoniM1, please let me know whether you found a fix for the above issue.

We are facing the same problem. Please share anything that helped.

Regards,

Surya

@SayoniM1 @SuryaYanamandra ticket INC-268701 logged against Pega 8.6.5 was closed in June with the following note:

"
For the connection parameters that are added to the URL for the MSSQL database there are a few important parameters, namely “selectMethod=cursor” and “sendStringParametersAsUnicode=false” that can cause some significant issues if not present.

In addition, we recommend that the JDBC transaction isolation level be set to “READ_COMMITTED_SNAPSHOT” in order to avoid some potential deadlocks.

Read committed snapshot is set at the database level, and is a setting that would need to be configured by sp_configure.

Letting the job scheduler run on a single node should be fine, it is not attempting to run on more than one node.

The user confirmed that decommissioning of a node and changing the node type in test environment worked fine"
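
For anyone applying the note above, here is a minimal sketch of what the connection URL could look like with both parameters appended. The host, port, and database name are placeholders, and the class and method names are made up for illustration; they are not part of Pega or the Microsoft JDBC driver. Also note that, as far as I know, READ_COMMITTED_SNAPSHOT is a per-database option that is normally enabled with ALTER DATABASE rather than sp_configure, so please confirm the exact statement with your DBA before running it. The sketch only builds the strings and does not connect to anything.

```java
// Hypothetical helper, for illustration only: it assembles the strings
// discussed in the ticket note and never opens a database connection.
public class PegaMssqlUrlSketch {

    // Builds a SQL Server JDBC URL with the two parameters named in the note.
    // host, port, and database are placeholders supplied by the caller.
    public static String buildUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + database
                + ";selectMethod=cursor"
                + ";sendStringParametersAsUnicode=false";
    }

    // Assumption: READ_COMMITTED_SNAPSHOT is enabled per database with
    // ALTER DATABASE. This only returns the statement text for a DBA to review.
    public static String enableReadCommittedSnapshotSql(String database) {
        return "ALTER DATABASE [" + database + "] SET READ_COMMITTED_SNAPSHOT ON";
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own environment's details.
        System.out.println(buildUrl("dbhost.example.com", 1433, "PegaRULES"));
        System.out.println(enableReadCommittedSnapshotSql("PegaRULES"));
    }
}
```

The resulting URL goes wherever your application server defines the Pega data source. Switching on READ_COMMITTED_SNAPSHOT usually needs a moment when no other connections are active on that database, so plan it for a maintenance window.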

Please post any new questions relating to this type of issue as a New Question.