We’re running Pega Platform 24.1.3 in on-premises containers on Red Hat OpenShift, with the embedded Search and Stream services.
So far, Kafka diagnostic logs and Kafka data are retained only within the pods. We need a way to persist the logs and Kafka data, both for better application performance and so we can analyze the Kafka logs when something goes wrong.
Could you configure your Stream pods to mount a Persistent Volume via Persistent Volume Claims (PVCs)? That would let the Kafka data (topics, offsets, and logs) survive pod restarts and rescheduling.
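If it helps, the PVCs can be requested directly in the Pega Helm chart's values.yaml as part of the stream tier definition. A minimal sketch, assuming the standard pega chart tier format; verify the key names against your chart version, and the size against your retention needs:

```yaml
# Sketch: per-pod PVCs for the Stream tier via the pega Helm chart.
# The 20Gi size is a placeholder; confirm the exact keys and whether
# you need a storageClassName against your chart version's docs.
global:
  tier:
    - name: "stream"
      nodeType: "Stream"
      replicas: 3
      volumeClaimTemplate:
        resources:
          requests:
            storage: 20Gi   # size for Kafka data plus the retention window
```

With a volumeClaimTemplate in place, each stream pod gets its own PVC, so broker data follows the pod identity across restarts.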
Alternatively, you can externalize the Kafka logs. For example, deploy a logging sidecar container (Filebeat or Fluentd) alongside your Stream pod to ship the Kafka logs to an external log platform such as Kibana or Splunk.
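For the sidecar approach, one possible shape of the pod spec is below. This is a sketch, not a drop-in config: the image tag, mount paths, and the assumption that the stream container writes its Kafka logs to a shareable volume are all placeholders you would adapt to your deployment.

```yaml
# Sketch: Filebeat sidecar tailing Kafka logs from a volume shared with
# the stream container. Paths and the image tag are assumptions.
spec:
  containers:
    - name: stream
      # ... Pega stream container, writing Kafka logs into the shared volume
      volumeMounts:
        - name: kafka-logs
          mountPath: /opt/pega/kafkadata/logs
    - name: filebeat
      image: docker.elastic.co/beats/filebeat:8.13.0
      args: ["-e", "-c", "/etc/filebeat/filebeat.yml"]  # config shipped via ConfigMap
      volumeMounts:
        - name: kafka-logs
          mountPath: /var/log/kafka
          readOnly: true
  volumes:
    - name: kafka-logs
      emptyDir: {}
```

The Filebeat config itself (inputs watching /var/log/kafka, output to Elasticsearch or Logstash) would be mounted from a ConfigMap.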
Here is how to make Kafka logs and data survive pod restarts in Pega 24.1.3 on OpenShift:

- Use the Pega Helm chart to run the stream tier as a StatefulSet with a volumeClaimTemplate, so Kafka writes to a PVC rather than the ephemeral container filesystem. In values.yaml, enable persistent storage for the stream tier and set a storageClass and size; this creates per-pod PVCs.
- Critically, mount the PVC at the actual Kafka log directory used by Pega's stream image. Earlier builds wrote to /opt/pega/kafkadata while charts mounted /opt/pega/streamvol, so verify and align the mount path, or the data will still be ephemeral.
- After confirming the mount, set Kafka retention and cleanup settings via the stream tier properties: log.retention.ms or log.retention.hours, log.segment.bytes, and log.cleanup.policy=delete (or delete,compact where compaction is appropriate).
- If you want to preserve data across redeploys, keep PVCs after helm uninstall by using a Retain reclaim policy.
- For diagnostics, don't store large file logs on the same PVC. Ship container stdout and any Kafka server logs to a central store using cluster logging (e.g., Fluent Bit or Vector into ELK, Loki, or Splunk), and keep the pod volumes focused on broker data.
- If you need stronger SLOs and simpler operations, consider externalizing the stream service: point Pega at a managed Kafka on OpenShift (Red Hat AMQ Streams/Strimzi) or your enterprise Kafka. Pega explicitly recommends externalized Kafka for new deployments. With AMQ Streams/Strimzi, request persistent storage for Kafka and ZooKeeper/KRaft via their CRDs and let that stack handle retention, tiering, and compaction.
- Set topic-level retention overrides only where needed, to avoid over-retention on internal Pega topics.
- Test failover by deleting a stream pod and confirming that leadership and partitions recover with data intact from the PVC.
- Monitor disk usage and segment counts; reduce log.segment.bytes if you need faster log rolling and smaller recovery windows.
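As an illustration of the retention and cleanup settings mentioned above, the broker-side properties might look like the following. The numbers are examples only; tune them to your disk size and recovery objectives, and check how your Pega version passes custom properties to the embedded brokers.

```properties
# Example Kafka retention/cleanup settings; values are placeholders.
log.retention.hours=72                 # keep data for 3 days
log.segment.bytes=268435456            # 256 MiB segments for faster rolling
log.cleanup.policy=delete              # or "compact,delete" for compacted topics
log.retention.check.interval.ms=300000 # how often the cleaner evaluates segments
```

Note that retention is enforced per closed segment, so smaller log.segment.bytes values make the retention window more precise at the cost of more files.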
Document a backup/restore runbook at the storage layer (snapshots of the PV or the storage backend) rather than exporting topics from inside the pods. Finally, keep the search service storage persistent in the same way, but treat it as a rebuildable cache and prioritize Kafka durability first. This approach gives you durable broker data, centralized logs for analysis, and cleaner day-2 operations.
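For the storage-layer backup, if your cluster's CSI driver supports snapshots, a VolumeSnapshot of the stream PVC is one way to capture broker data without touching the pods. The PVC name and snapshot class below are assumptions about your environment:

```yaml
# Sketch: CSI snapshot of a stream PVC. The PVC name and
# volumeSnapshotClassName are placeholders for your cluster.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: stream-data-backup
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: pega-stream-pega-stream-0
```

Restoring is then a matter of creating a new PVC with `dataSource` pointing at the snapshot; quiesce or snapshot all brokers close together in time so partition replicas stay roughly consistent.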
I have already implemented the same method, and I applied a similar approach to retain the Pega application logs as well.
Kafka performance is better than before, but I still sometimes see under-replicated partitions and a com.pega.charlatan.utils.CharlatanException$SessionExpiredException.
CharlatanException$SessionExpiredException typically occurs when a client's session with Charlatan, Pega's internal coordination service for the embedded stream tier, expires because it timed out or lost its connection.
Could you check whether your stream pods have sufficient memory and CPU, so that the service isn't being evicted or restarted frequently? PVC disk throughput can also suffer when multiple brokers share the same backend storage resources.
Also verify that all Kafka brokers are healthy (no frequent restarts or performance bottlenecks) and that they are part of the in-sync replica (ISR) set.
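A few commands that can help with these checks. The namespace, label selector, pod name, and the in-container path to the Kafka scripts are assumptions about your environment; adjust them to match your deployment.

```shell
# Check restart counts and resource pressure on the stream pods
oc get pods -n pega -l app=pega-stream -o wide
oc adm top pods -n pega -l app=pega-stream

# From inside a stream pod, list partitions that are under-replicated.
# Empty output means every partition has a full in-sync replica set.
oc exec -n pega pega-stream-0 -- \
  /opt/pega/kafka/bin/kafka-topics.sh \
    --bootstrap-server localhost:9092 \
    --describe --under-replicated-partitions
```

If the under-replicated list is non-empty only around pod restarts or GC pauses, that points at session timeouts rather than a persistent storage problem.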