Multi-datacenter setup in 8.6.1 with two OpenShift clusters

Hi Pega Team,

Our Pega 8.6.1 infrastructure is moving from a single K8s cluster (whose nodes already span two datacenters) to a split-cluster setup. That is, in place of one K8s (OpenShift) cluster, there will be two separate K8s (OpenShift) clusters, one per datacenter. We therefore need to run our workloads - web, batch, search (Elasticsearch 5.6), and stream - in both DCs in an Active/Passive or Active/Active configuration. Please find attached a rough picture of the overall architecture we are considering. How could we do this?

  1. We are using an embedded Hazelcast cluster. How could we configure this to support two independent K8s clusters?

  2. We are considering a single Oracle DB (with replication, as in our existing single-cluster setup). What would the challenges around this be, and how do we resolve them?

  3. How do we do cross-cluster (cross-datacenter) replication of the pega-stream data? Is this required, and when should we do it?

  4. How do we do cross-cluster (cross-datacenter) replication of the Elasticsearch data? Is this required, and when should we do it?

  5. Should we use Hazelcast in client-server mode? How would this help?

If this architecture doesn’t look feasible, could you please suggest alternate approaches?

Thanks a lot,

best regards,
Hari

@HariR246 We are still looking for information on this. Hoping for any leads on what our best options could be.

@HariR246

We have the same requirement and raised an SR a month back, but no solution has been received yet. Please share any updates or solutions you get - that would help us.

Thanks in advance.

@DivyaKrishnaV0051

We came to know that the pods in a multi-cluster deployment should be reachable by pod IP, even when they are deployed across clusters. Unfortunately, Pega does not have a reference architecture for this.

Just to trigger a discussion -

For an active-passive setup (where POD IP addressing is not possible between the clusters):

  • Elasticsearch snapshot and restore could be run at frequent intervals to keep the search data nearly in sync.

  • For the stream pods, Kafka MirrorMaker may be an option for replicating the Kafka data across clusters.

  • But how can both application instances (on the different clusters) be up at the same time and use the same DB? Wouldn’t there be read-write conflicts, identifier clashes, etc., even with an Active-Passive setup? So would it be possible to run this with two separate databases instead?
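
To make the snapshot-and-restore idea above concrete, here is a rough sketch against the Elasticsearch 5.6 snapshot API, assuming a shared-filesystem snapshot repository. The repository name, path, and snapshot name are placeholders; how the shared location is exposed to both clusters (NFS, object storage plugin, etc.) is up to the deployment.

```
# On the active cluster: register a shared-filesystem repository.
# (path.repo in elasticsearch.yml must include /mnt/es-backup on every node.)
PUT _snapshot/dr_repo
{
  "type": "fs",
  "settings": { "location": "/mnt/es-backup" }
}

# Take a snapshot at a frequent interval (e.g. driven by a cron job).
PUT _snapshot/dr_repo/snapshot_1?wait_for_completion=true

# On the passive cluster (same repository registered there as well):
# indices being restored must be closed or absent, then restore the
# latest snapshot.
POST _snapshot/dr_repo/snapshot_1/_restore
```

Note that restore is not incremental at the index level - an open index with the same name must be closed before it can be restored over, which is part of why this only keeps the passive side "almost" in sync.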

For an active-active setup (where POD IP addressing is not possible between the clusters):

  • Opster has a multi-cluster load-balancer solution that is Elasticsearch-API aware. The load balancer fans out certain API calls to both ES clusters (which could help keep the index data on both sides similar) but directs certain other API calls to only one of the ES clusters. This sounds (to me) better than the ES cross-cluster replication option.

  • Running Kafka MirrorMaker on both clusters may be an option.

  • Not sure what issues would come up with a single-DB or a dual-DB configuration.

  • What issues would come up in the stateless web/batch workloads?
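
For the MirrorMaker point, a minimal MirrorMaker 2 properties sketch for bidirectional (active-active) mirroring might look like the following. The cluster aliases, bootstrap addresses, and topic pattern are placeholders - the pattern would have to match the actual Pega stream topic names in your environment.

```
# connect-mirror-maker.properties (run via bin/connect-mirror-maker.sh)
clusters = dc1, dc2
dc1.bootstrap.servers = kafka.dc1.example.com:9092
dc2.bootstrap.servers = kafka.dc2.example.com:9092

# Mirror in both directions for active-active
dc1->dc2.enabled = true
dc2->dc1.enabled = true

# Topic pattern to replicate (placeholder)
dc1->dc2.topics = pega-.*
dc2->dc1.topics = pega-.*

replication.factor = 3
```

One caveat: by default MirrorMaker 2 prefixes replicated topics with the source cluster alias (e.g. `dc1.pega-...` on dc2), so the consuming side has to be aware of the remote topic names - whether Pega’s stream tier can consume such renamed topics is exactly the kind of thing that would need Pega’s confirmation.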

@HariR246 Have you figured this out? Your scenario is a typical on-prem cloud setup. Too bad Pega has no answer.

Pega does not support, and therefore does not test, multi K8S cluster deployments.

@HariR246 As we know, Pega won’t support Active-Active. You said you are going to set up Active-Active with a single DB, but it is not a true 100% Active-Active setup until you separate the databases.

Coming to clustering across two different data centers: above you mentioned achieving this with Hazelcast. I did something similar in an earlier project, where you can define partition groups in a separate Hazelcast cluster file; but note that in newer Pega versions clustering is taken care of by the Service Registry - think about that.
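
For reference, this is roughly the kind of Hazelcast cluster file being described - a sketch using Hazelcast’s partition-group feature so that backup partitions land in the other data center. The interface ranges are placeholder pod subnets, and whether Pega’s embedded Hazelcast will honour a custom configuration like this (and whether members in the two K8s clusters can even reach each other by pod IP) would need to be confirmed with Pega.

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <!-- Keep primary and backup partitions in different member groups,
       one group per data center, selected by member IP range. -->
  <partition-group enabled="true" group-type="CUSTOM">
    <member-group>
      <!-- dc1 pod subnet (placeholder) -->
      <interface>10.1.*.*</interface>
    </member-group>
    <member-group>
      <!-- dc2 pod subnet (placeholder) -->
      <interface>10.2.*.*</interface>
    </member-group>
  </partition-group>
</hazelcast>
```

This only makes a single Hazelcast cluster datacenter-aware; it does not by itself give you two independent clusters, which is what the original question 1 was asking about.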

For Kafka storage, use EFS instead of EBS.