com.pega.dsm.dnode.api.StreamServiceException: Cannot add new node

Hi,

Greetings, and I hope you are well. We have a situation where the stream service does not start because of the following exception:

2022-11-18 11:36:04,746 [StreamServer.Default] [ STANDARD] (rvice.operation.StartOperation) ERROR - Cannot start service [StreamServer.Default]. Will retry in 180 seconds. Remaining attempts: 2
com.pega.dsm.dnode.api.StreamServiceException: Cannot add new node. Make sure all listed nodes are available or reattach the volume from one of these nodes to this node.

We stop the instances every day, and they are restarted the next day. The IP addresses are not static, so the entries in the pr_data_stream_* tables today may not match the nodes that come up tomorrow. As a workaround, I have to truncate these tables every day and restart the EC2 instances; however, this cannot be the solution in production environments. Has anyone had a similar issue and found a resolution?

I have raised an incident with Pega: INC-249864.

If you have any suggestions, please let me know!

Regards,
Bharat

@KOMARINA Any updates on the referenced issue? Were you able to get the stream service started?

Thanks!

@JonathanB9941 @KOMARINA The support ticket was closed with the following information:

"Kafka-data is not on a persistent volume; it is maintained in the node file system, which is retired every day. This makes the stream cluster inconsistent: it tries to get the volume from the old nodes because the stream tables still hold the old node information.

The first thing we need to take care of is reattaching the volume, which was explained during a call on 18th December.

We have to create a separate EBS volume per node, and that volume has to be re-mounted on the new node that replaces the old node.

Please see the following URL:

Terminate Amazon EC2 instances - Amazon Elastic Compute Cloud

It is required to keep the same node IDs. Please check with your AWS team for more information on this."

@JonathanB9941

Try cleaning the OOTB tables if you are facing this issue with Pega Personal Edition (PE):

-- Clear the stream service session entries
delete from data.pr_data_stream_sessions where pyid is not null;
commit;
-- Clear the node status entries
delete from data.pr_sys_statusnodes;
commit;
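Before deleting anything, it may be worth checking what the cleanup would remove. This is a minimal read-only sketch, assuming the same `data` schema and the same two tables as the delete statements above; no columns other than `pyid` are assumed.

```sql
-- Read-only check; assumes the "data" schema used by the delete statements
-- Count the stream session rows that the first delete would remove
select count(*) from data.pr_data_stream_sessions where pyid is not null;
-- List the node status rows that the second delete would remove
select * from data.pr_sys_statusnodes;
```

If the listed rows refer to nodes that no longer exist, that would be consistent with the stale node information described in the support ticket.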