pyFTSIncrementalIndexer constantly requesting replication factor 2

Hi,

I am running into an issue after externalizing our Kafka services. Everything else works, but even though a pyFTSIncrementalIndexer topic has been created, we keep getting the following error:

Error processing create topic request CreateTopic(name='pega-PYFTSINCREMENTALINDEXER', numPartitions=20, replicationFactor=2, assignments=, configs=[CreateableTopicConfig(name='retention.ms', value='216000000')]) (kafka.server.ZkAdminManager)
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 2 larger than available brokers: 1.

When we look at pyFTSIncrementalIndexer in our rulesets, numPartitions is set to 5 and the replication factor is set to 1 in the settings. A topic matching this setup exists in Kafka, but Pega triggers the above error a couple of times a second.

We have searched everywhere for where there might be an overriding configuration that is requesting a replication factor of 2, but we can’t find it.

This is the only rule throwing this error; the other topics are running without problems.
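For anyone comparing the request in the error with their own settings, here is a minimal sketch that pulls the topic name, partition count and replication factor out of a log line like the one quoted above and lists where they differ from the expected values. The function names are hypothetical, the regular expression only assumes the field layout shown in that line, and the expected values (5 partitions, replication factor 1) are the ones from our ruleset.

```python
import re

# Matches the leading fields of the create-topic request as they appear in the
# log line quoted above; accepts straight or typographic quotes around the name.
CREATE_TOPIC = re.compile(
    r"name=['‘’\"](?P<name>[^'‘’\"]+)['‘’\"],\s*"
    r"numPartitions=(?P<partitions>\d+),\s*"
    r"replicationFactor=(?P<replication>\d+)"
)


def parse_create_topic(line):
    """Return (name, partitions, replication) from a log line, or None."""
    match = CREATE_TOPIC.search(line)
    if match is None:
        return None
    return (match["name"], int(match["partitions"]), int(match["replication"]))


def describe_mismatch(requested, expected_partitions, expected_replication):
    """List the settings in the request that differ from the expected ones."""
    _, partitions, replication = requested
    problems = []
    if partitions != expected_partitions:
        problems.append(
            f"numPartitions requested {partitions}, expected {expected_partitions}"
        )
    if replication != expected_replication:
        problems.append(
            f"replicationFactor requested {replication}, expected {expected_replication}"
        )
    return problems
```

Run against the line above with expected values 5 and 1, this reports both numPartitions (20 vs 5) and replicationFactor (2 vs 1) as mismatches, which suggests the request is not coming from the configuration we are looking at.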

More info 11/1/2023:

The topic that is attempting to be created is pega-PYFTSINCREMENTALINDEXER, whereas the configured, working topic is pega-p86-PYFTSINCREMENTALINDEXER. I am not sure where this “pega-” topic is coming from, considering the pyconfig specifies a replication factor of 1. A bunch of errors in the middle of the logs suggest that creation of multiple topics is being attempted; pega-PYFTSINCREMENTALINDEXER is the first. There is only one server configured for this external Kafka instance, and its IP matches the one instance we are running.

@Nicholas Edwards

Hello,

Just to clarify your setup: you have 1 stream node and 1 search node, and the replication factor for your stream node is set to 1.

Is this understanding correct?

Did the issue appear only when you externalised Kafka?

Have you tried this line in prconfig?

Thank you

Regards

Anthony

@Anthony_Gourtay , thanks for the response. Answers to your questions below:

Just to clarify your setup, you’ve got 1 stream node & 1 search node and replication factor for your stream node is set to 1. This I will have to check; we had the search node on the same server and only externalized Kafka.

Issue appeared only when you externalised Kafka? Yes

Have you tried this line in prconfig?


Yes, all the other topics are coming in and running uninterrupted with a replication factor of 1. Only pyFTSIncrementalIndexer is throwing this error, several times a second, and filling up the logs. However, I can see that a topic for pyFTSIncrementalIndexer has already been created and is running fine, which means this is happening in addition to the saved change we made to the rule.

@Nicholas Edwards

There are a couple of places where you can check the replication factor:

In PRCONFIG:

Or, if you would like to set the values via DSS, please see the syntax in the attachment.

Also, please ensure the topic in Confluent has min.insync.replicas set to 2 (see attached screenshot).

@SUMAN_GUMUDAVELLY - Sorry, but this is just OOTB Kafka, not Confluent. The equivalent settings are all set to a replication factor of 1. I will double-check the specific topic causing the issue to see whether there is a discrepancy in my system's equivalent of the setting shown in the last screenshot. I will post to this thread if I find one.

Thanks for everyone’s info and advice. The issue turned out to be very simple, but because our administration team is separate from our development team and run by a different organization there was a communication issue regarding cloning of the environments.

An old image of the dev environment was cloned and put into the same domain threshold using a configuration with a replication factor of 2.

Thanks @Anthony_Gourtay and @SUMAN_GUMUDAVELLY! While your responses didn’t provide the solution, they did help me eliminate some possible causes. When we asked our broader ecosystem, we found out that the wrong image and configuration were being used based on outdated information, and we are currently moving to rectify the issue.

Cheers!

PS - the thing that tipped me off that it was a different environment was the “pega-” prefix instead of our configured prefix of “pega-p86-”. Checking the topic prefix against the configured one is what I would suggest as a best practice to anyone who runs into a similar issue.
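In case it helps, that prefix check can be sketched in a few lines. This is only an illustration: the function name is hypothetical, it assumes you already have a list of the topic names seen in the broker logs or on the cluster, and the prefixes are the ones from our setup.

```python
def unexpected_topics(topic_names, expected_prefix):
    """Return Pega-style topic names that lack the expected environment prefix."""
    return [
        name
        for name in topic_names
        # Only "pega-" topics are considered; other topics are ignored.
        if name.startswith("pega-") and not name.startswith(expected_prefix)
    ]


seen = [
    "pega-p86-PYFTSINCREMENTALINDEXER",  # topic created by our environment
    "pega-PYFTSINCREMENTALINDEXER",      # topic named in the error
]
strangers = unexpected_topics(seen, "pega-p86-")
```

Any name that ends up in the result was requested by something using a different prefix, which in our case was the cloned environment.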