Externalized Elasticsearch restart causing Kubernetes cluster restarts

We have externalized our Elasticsearch as part of our migration to Kubernetes. We recently had an issue in our lower environment where a change to our Elastic cluster caused a pod restart. The Elastic cluster went yellow and momentarily had no master. We saw the following alert, although we don't have a root cause yet: PEGA0102.

Our question is: what is the impact to the Pega platform on Kubernetes if an external Elastic cluster loses its master node for a few moments?

Below is the PegaSearch Helm chart snippet that sets a minimum of 1 master node.

IMPORTANT: Important settings configuration | Elastic Docs

To prevent data loss, you must configure the discovery.zen.minimum_master_nodes setting so that each master-eligible node is set to the minimum number of master-eligible nodes that must be visible in order to form a cluster. Configure this value using the formula (n/2) + 1, where n is the replica count or desired capacity.

minimumMasterNodes: 1
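The (n/2) + 1 quorum formula from the docs quoted above can be sketched as a small helper (integer division; the function name is ours, not from the chart). Note that with a single master-eligible node the formula yields 1, which matches the snippet but leaves no tolerance for losing the master:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum formula (n // 2) + 1 from the Elastic docs quoted above."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1

# With one master-eligible node the quorum is 1 -- losing that node
# leaves the cluster with no master, as observed in the incident.
print(minimum_master_nodes(1))  # -> 1
# A 3-master cluster keeps a quorum of 2 and tolerates one master loss:
print(minimum_master_nodes(3))  # -> 2
```

This is why `minimumMasterNodes: 1` with a single master is fragile: any restart of that one node briefly leaves the cluster masterless.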

@DPCLARK12 I found that you already logged support tickets via the MSP for these questions.

Please include these details whenever you post on the PSC in parallel…


1. INC-B1654 (External Elastic cluster restart impacts K8 clusters)

Answer provided by GCS:

“What is the impact to the Pega platform if an externalized elastic cluster has no master node for a short duration of time?”

Without a master node in an Elasticsearch cluster, search functionality is completely broken.

Searching yields no results, and indexing is also impacted. This raises alerts such as the PEGA0102 alert you referenced, because the platform is unable to process the queued items for indexing.

Prolonged impact will cause the default health checks to fail, which subsequently shuts down and restarts the nodes.
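One way to watch for this condition is to poll Elasticsearch's standard `_cluster/health` endpoint. A minimal sketch, assuming the usual `status` field of that API (the sample response below is illustrative, not from the incident):

```python
import json

def cluster_is_healthy(health_json: str) -> bool:
    """Interpret an Elasticsearch _cluster/health response body.

    "red" (or a timed-out request while there is no master) means search
    and indexing are effectively down; "yellow" means some replicas are
    unassigned but primaries are still serving requests.
    """
    health = json.loads(health_json)
    return health.get("status") in ("green", "yellow")

# Illustrative response similar to a cluster in the yellow state:
sample = '{"cluster_name": "pega-search", "status": "yellow", "number_of_nodes": 3}'
print(cluster_is_healthy(sample))  # -> True
```

Alerting on anything other than "green"/"yellow" here would catch the masterless window before the platform health checks start restarting pods.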

2. INC-B2073 (Kafka Nodes intermittently Unreachable)

GCS analyzed your stream latency test results and details from your stream service landing page and provided you with the following recommendation:

The optimal latency should be less than 10 ms, but the observed latency is beyond 50 ms.

The recommendation is to move to the same region/data center to reduce network latency.

Next steps: Client to review the information provided and confirm back to GCS.
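The thresholds GCS cited (under 10 ms optimal, above 50 ms problematic) can be captured in a small check. This is a sketch: the function name and the middle "acceptable" band are our assumptions, not part of the GCS guidance:

```python
def classify_stream_latency(latency_ms: float) -> str:
    """Classify a stream round-trip time against the thresholds GCS cited.

    Under 10 ms is optimal; the observed >50 ms suggests the client and
    cluster sit in different regions/data centers.
    """
    if latency_ms < 10:
        return "optimal"
    if latency_ms <= 50:
        return "acceptable"  # assumed middle band, not stated by GCS
    return "investigate: likely cross-region network path"

print(classify_stream_latency(55))  # -> investigate: likely cross-region network path
```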


More detail from posts submitted already on this PSC forum:

If an external Elastic cluster loses a master node for a few moments, the impact on the Pega platform on Kubernetes depends on the configuration of your Elastic cluster and Pega platform. If you have multiple nodes configured on the search landing page, each of those nodes is master-eligible and can therefore handle search and indexing requests. If one node goes down, the others can take over the search requests.

The environment does not wait for the node to come back online if other nodes are available to service search and indexing requests. However, if the original node will never rejoin the cluster, the search landing page must be updated to remove the unavailable node and add a replacement. This requires manual intervention.
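The failover behavior described above (remaining configured nodes take over, but a permanently dead node stays in the list until removed manually) can be sketched as a client that tries each configured node in order. All names here are hypothetical illustrations, not Pega or Elasticsearch APIs:

```python
def query_with_failover(nodes, send_request):
    """Try each configured search node in order; use the first that responds.

    Mirrors the behavior described above: requests fail over to the
    remaining nodes, but an unreachable node stays in the configured list
    until it is removed manually on the search landing page.
    """
    errors = {}
    for node in nodes:
        try:
            return send_request(node)
        except ConnectionError as exc:
            errors[node] = exc  # node unavailable; try the next one
    raise RuntimeError(f"no search nodes reachable: {list(errors)}")

# Simulated: first node is down, second serves the request.
def fake_send(node):
    if node == "es-node-1:9200":
        raise ConnectionError("node down")
    return {"served_by": node}

print(query_with_failover(["es-node-1:9200", "es-node-2:9200"], fake_send))
# -> {'served_by': 'es-node-2:9200'}
```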


More details on Elasticsearch in a multi-node environment when the primary sea

Third-party externalized services FAQs > Why do clients have to move to externalized services?

@DPCLARK12 Tickets INC-B1654 and INC-B2073 have been closed, as you have not responded to our support team for 2 weeks.

Please click Accept Solution on the reply which answered your questions.