Our environment went down overnight with the following message:
2020-12-07 00:47:22,531 [ervice-registry-task] [ STANDARD] ( database.tasks.Heartbeat) ERROR - Current heartbeat is taking longer than [30000]ms. The node will soon become unhealthy. Creating a thread dump…
However, I don’t see any thread dump in the log files.
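In case it helps anyone checking their own logs, here is a rough Python sketch for finding these heartbeat errors so you can look at what was (or was not) written right after each one. The regular expression is modelled only on the single line quoted above, and the function name `find_heartbeat_errors` is my own, so treat both as assumptions and adjust them to your log format.

```python
import re
from datetime import datetime

# Pattern modelled on the one log line quoted in this post (an assumption:
# other versions or log layouts may format the line differently).
HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"ERROR - Current heartbeat is taking longer than \[(?P<ms>\d+)\]ms"
)

SAMPLE = (
    "2020-12-07 00:47:22,531 [ervice-registry-task] [ STANDARD] "
    "( database.tasks.Heartbeat) ERROR - Current heartbeat is taking longer "
    "than [30000]ms. The node will soon become unhealthy. Creating a thread dump…"
)


def find_heartbeat_errors(lines):
    """Return a list of (timestamp, threshold_ms) for each heartbeat error line."""
    hits = []
    for line in lines:
        match = HEARTBEAT_RE.match(line)
        if match:
            # The log uses a comma before the milliseconds, hence ",%f".
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            hits.append((ts, int(match.group("ms"))))
    return hits


if __name__ == "__main__":
    for ts, threshold_ms in find_heartbeat_errors([SAMPLE]):
        print(ts.isoformat(), threshold_ms)
```

To run it against a real file, pass the open file object instead of the sample list, for example `find_heartbeat_errors(open(path, encoding="utf-8"))`, where `path` is wherever your log lives. The timestamps it returns tell you where to look in the surrounding log files for a thread dump.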
I raised an incident (INC-161157). Improvements were made to the heartbeat process in versions 8.4.3 and 8.4.4, so I guess we need to patch to the latest version.
Having said that, the heartbeat error could be caused by a slow database or network, or by another system resource being unavailable. We had an agent running processes that were not well optimized and consumed a lot of resources. After we turned the agent off, the issue disappeared. We are now investigating our application code.