Best Practice to Move from BIX Batch Extracts to Kafka-Based Streaming in Pega

We currently have BIX-based data extraction implemented in our Pega application and are exploring the possibility of introducing Kafka for near real-time data streaming. I would like guidance on the recommended approach and best practices.

Current Architecture

  • We have 52 BIX extract rules, each created for a different class

  • Data extraction runs daily using a Job Scheduler

  • The Job Scheduler invokes an OOTB BIX extraction activity

  • Extracted data is written as files

  • Files are then moved to an external FTP location (Sterling) using:

    • File Listener

    • FTP Server configuration

  • This is currently a batch-oriented process

New Requirement

We are planning to introduce Kafka to support data streaming / near real-time integration instead of (or along with) the existing batch file-based approach.

Additional questions:

  • Is Kafka-based streaming helpful for my use case?

  • What is the recommended Pega approach to publish data to Kafka?

    • Using Kafka Connect / Pega Stream Service

    • Using Queue Processors

    • Using Data Flows

    • Using Real-time event publishing (case processing hooks)

  • Should we:

    • Replace the existing BIX extracts completely?

    • Or keep BIX for batch use cases and introduce Kafka separately for streaming?

  • How can we efficiently handle multiple classes (52 classes) when publishing data to Kafka?

  • Are there any OOTB integrations or best practices for Kafka in recent Pega versions?

  • What are the performance and scalability considerations when moving from file-based BIX extracts to Kafka?

@KiranmaiK Kafka is a good fit here because you want incremental, near real-time change events instead of once-a-day files.
In Pega, publish those change events using a Data Flow that writes to a Kafka Data Set, and send them asynchronously via a Queue Processor so online case commits stay fast and reliable.
Use your existing 52 extract definitions as the controlled list of classes to stream, following the real-time extraction pattern (Extract + Data Flow to Kafka) and map each class to the same message structure.
Standardize topic naming and include a stable key, event type, and timestamp so downstream systems can upsert safely and handle retries without duplicates.
Keep the nightly BIX run only for full backfill and reconciliation, not for integration, so you can recover cleanly if any consumer falls behind.
Plan capacity around peak change volume by tuning partitions and Stream Service sizing, keeping payloads small, and actively monitoring lag and retries for performance and scalability.
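To make the message contract above concrete, here is a minimal sketch outside Pega of what a standardized envelope with a stable key, event type, and timestamp could look like, plus the consumer-side idempotent upsert it enables. The topic name and all field names (`eventType`, `eventTs`, `sourceClass`, etc.) are illustrative assumptions, not Pega-defined names:

```python
import json
from datetime import datetime, timezone

# Example topic naming convention (assumption, not a Pega default)
TOPIC = "pega.case-events.v1"

def build_event(source_class, case_id, event_type, payload):
    """Wrap class-specific data in a shared envelope so all 52 classes
    can publish to the same schema. Field names are illustrative."""
    return {
        "key": f"{source_class}:{case_id}",   # stable key -> consistent partitioning
        "eventType": event_type,              # e.g. CREATE / UPDATE / RESOLVE
        "eventTs": datetime.now(timezone.utc).isoformat(),
        "sourceClass": source_class,          # lets consumers route per class
        "payload": payload,                   # class-specific properties
    }

def upsert(store, event):
    """Consumer-side idempotent upsert: keep only the newest event per key,
    so redeliveries and retries do not create duplicates."""
    current = store.get(event["key"])
    if current is None or event["eventTs"] >= current["eventTs"]:
        store[event["key"]] = event
    return store

store = {}
e1 = build_event("MyCo-Work-Claim", "C-1001", "CREATE", {"status": "Open"})
upsert(store, e1)
upsert(store, e1)          # redelivery of the same event is harmless
print(len(store))          # → 1 (one record per stable key)
print(json.dumps(e1["key"]))
```

The stable key doubles as the Kafka message key, so all events for one case land on the same partition and arrive in order; the timestamp comparison is what makes at-least-once delivery safe for downstream systems.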

Hi @Sairohith

Thank you for taking the time to provide such a detailed explanation earlier — I really appreciate it.

Since we currently have 52 BIX extracts for 52 different classes (including Work, Data and Index classes), I’d like to clarify a few architectural points before proceeding:

  1. Topic Strategy:
    For these 52 classes, would you recommend creating one Kafka topic per class, or using a shared topic and including the class name as part of a standardized event payload?

  2. Kafka Data Set Design:
    Should we create 52 Kafka Data Sets aligned to each class, or design a single generic Kafka Data Set with a common JSON structure for all classes?

  3. Real-Time Implementation Approach:
    For live streaming, would you suggest creating 52 Declare Triggers (one per class), or implementing a reusable utility activity that dynamically publishes events based on class?

  4. Handling Index Classes:
    For the index classes currently used in BIX joins, is it better to:

    • Stream them as separate Kafka messages, or

    • Embed the index data within the parent Work class payload (denormalized JSON)?

I am very new to Kafka and still trying to understand all of these rules, and the BIX functionality we currently have in the application is huge.