How to improve indexing performance for Elasticsearch in 8.4?

In our production environment, a complete re-index of all three default class types takes a lot of time. Is there a way to make it faster?

We have not configured any dedicated indexing.

In work object indexing, how can we restrict the properties that are indexed?

I understand that custom search properties are used only for dedicated indexing. Am I correct, or are they also used for search results with default indexing?

I read somewhere that in work object indexing, all the properties present in the relevant records for a class are indexed by default. Is that true? What is the relation between relevant records and indexing?

How can we find out which content of a work object is being indexed?

We have 4 nodes in our production environment. What is the best way to configure indexing across the nodes? How can we mark a particular node as primary or replica for storing the index files?

Hi, and happy new year! I will try to answer all your questions:

In work object indexing, how can we restrict the properties that are indexed?

By default, all properties are used for full-text search. If you want to restrict the properties used in indexing, follow the instructions below:

  1. Go to the Search landing page (Configure > System > Settings > Search) and select “For specified properties” at the bottom.
  2. Create a new instance of Custom Search Properties for your work class, and configure the properties for each case type according to your requirements (Include in search results / Available for full-text search).
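To make the two settings above concrete, here is a small Python model of how they interact: without “For specified properties”, every property is used for full-text search; with it, only the properties flagged in the Custom Search Properties instance are. This is a hypothetical sketch of the behaviour described in this thread, not Pega's implementation; the names CustomSearchProperty, full_text_properties and result_properties, and the property names CustomerName and Notes, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CustomSearchProperty:
    """Hypothetical model of one row in a Custom Search Properties instance."""
    name: str
    include_in_results: bool = False  # "Include in search results"
    full_text: bool = False           # "Available for full-text search"


def full_text_properties(all_properties, restrict, custom):
    """Return the set of property names used for full-text search.

    `restrict` models the "For specified properties" option on the
    Search landing page; `custom` is a list of CustomSearchProperty rows.
    """
    if not restrict:
        # Default behaviour described in the thread: every property is used.
        return set(all_properties)
    # Restricted: only properties explicitly flagged for full-text search.
    return {p.name for p in custom if p.full_text}


def result_properties(custom):
    """Return the property names flagged to appear in search results."""
    return {p.name for p in custom if p.include_in_results}


if __name__ == "__main__":
    all_props = ["pyID", "pyLabel", "CustomerName", "Notes"]  # illustrative
    custom = [
        CustomSearchProperty("pyID", include_in_results=True, full_text=True),
        CustomSearchProperty("CustomerName", full_text=True),
    ]
    print(sorted(full_text_properties(all_props, restrict=False, custom=custom)))
    print(sorted(full_text_properties(all_props, restrict=True, custom=custom)))
    print(sorted(result_properties(custom)))
```

The two flags are modelled independently on purpose, since a property can be searchable, shown in results, or both.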

I understand that custom search properties are used only for dedicated indexing. Am I correct, or are they also used for search results with default indexing?

Custom search properties are used for both dedicated and default indexes.

I read somewhere that in work object indexing, all the properties present in the relevant records for a class are indexed by default. Is that true? What is the relation between relevant records and indexing?

For clarification: a property can be used for full-text search, as part of search results, or both. I suppose that by "indexing properties" you mean that these properties will be available for full-text search.

As I said, by default all properties are used in full-text search, and you can change this behaviour from the Search landing page.

How can we find out which content of a work object is being indexed?

We have 4 nodes in our production environment. What is the best way to configure indexing across the nodes? How can we mark a particular node as primary or replica for storing the index files?

Nodes with the Search node type are master-eligible nodes of the Elasticsearch cluster. This means that Search nodes handle both indexing and search requests.

Without any context about the index size or the number of search requests you have to handle, the best answer is to have at least two dedicated Search nodes. Index storage can consume a variable amount of disk space, and search operations can be very demanding on CPU and RAM.

As I said, any Search node is master-eligible in the cluster. If you need more control over node and partition configuration, or you need to scale search independently of the Pega Platform cluster, you could connect an external Elasticsearch cluster to your on-premises Pega deployment: Pegasystems Documentation

Hi David,

Thanks for your explanation.

I still have some doubts about the last question. If we classify two nodes as Search nodes, do both work as master nodes? If so, do the index files on both nodes contain all the work objects? How can we identify which one is the master and which one is the replica?

If we have 4 nodes in our environment, with node-1 and node-2 marked as Search nodes, then the index files will be stored on both of those nodes. On node-3 and node-4, which node's index file path do we need to configure for search?

To clarify master nodes: any Search node is master-eligible, which means that each time the Elasticsearch cluster starts, one of them is elected master and the others act as replicas.

As for search requests from a node that does not host the index: Elasticsearch uses its own transport protocol to communicate between nodes over the port range 9300-9399. You can find the port number in the pr_sys_statusnodes table (pyIndexerAddress column).
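If you want to check these addresses in a script, the Python sketch below takes rows already read from pr_sys_statusnodes (a node id plus the pyIndexerAddress value) and keeps the ones whose port falls in the 9300-9399 transport range. It assumes that pyIndexerAddress stores a plain host:port string, which is an assumption; check the actual values in your environment. The helper names are hypothetical, and querying the table itself is left out.

```python
# 9300-9399 inclusive, the transport port range mentioned in this thread.
TRANSPORT_PORT_RANGE = range(9300, 9400)


def parse_indexer_address(address):
    """Split an address ASSUMED to look like 'host:port' into (host, port)."""
    host, sep, port_text = address.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected 'host:port', got {address!r}")
    return host, int(port_text)  # int() raises ValueError for a non-numeric port


def is_transport_port(port):
    """True if the port is inside the 9300-9399 transport range."""
    return port in TRANSPORT_PORT_RANGE


def transport_endpoints(rows):
    """Map node id -> (host, port) for rows whose port is a transport port.

    `rows` is an iterable of (node_id, pyIndexerAddress) tuples; rows with
    an empty address are skipped.
    """
    endpoints = {}
    for node_id, address in rows:
        if not address:
            continue  # skip rows with no indexer address
        host, port = parse_indexer_address(address)
        if is_transport_port(port):
            endpoints[node_id] = (host, port)
    return endpoints


if __name__ == "__main__":
    # Hypothetical rows; real values come from pr_sys_statusnodes.
    rows = [
        ("node-1", "10.0.0.11:9300"),
        ("node-2", "10.0.0.12:9301"),
        ("node-3", ""),
    ]
    print(transport_endpoints(rows))
```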

@satishJ

By increasing the thread count of the batch indexer queue processor and enabling it on multiple nodes, we improved search indexing performance. This resolved our issue.
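One plausible reason more threads helped is that, if indexing each item is dominated by waiting (database reads, network calls to Elasticsearch), several consumer threads can overlap that waiting. The Python sketch below illustrates the effect with a thread pool. It is a generic analogy, not how the Pega batch indexer queue processor is implemented, and index_item is a hypothetical stand-in that only sleeps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def index_item(item_id, latency=0.01):
    """Hypothetical stand-in for indexing one work object; sleeps to mimic I/O wait."""
    time.sleep(latency)
    return item_id


def run_indexer(item_ids, threads):
    """Index all items with `threads` workers; return (indexed ids, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in input order.
        done = list(pool.map(index_item, item_ids))
    return done, time.perf_counter() - start


if __name__ == "__main__":
    items = list(range(100))
    for threads in (1, 5, 20):
        done, elapsed = run_indexer(items, threads)
        print(f"{threads:>2} thread(s): {len(done)} items in {elapsed:.2f}s")
```

In this sketch the elapsed time drops roughly in proportion to the thread count because sleeping is the only cost; in a real system the gain levels off once CPU, the database, or the Search nodes become the bottleneck.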