Setting up Kafka and Integrating It with Pega Using Stream Dataset, Event Strategy and Dataflow

One of the key design patterns that I have come across while working with clients in the last couple of years is the requirement of

  • Ingesting data from data streams
  • Detecting patterns and insights
  • Acting on the insights if needed

The solution I have often recommended in these scenarios is the powerful combination of “Stream Dataset”, “Dataflow” and “Event Strategy”, particularly when the volume of data is high.

The initial use was mostly for “Pega Marketing”/“Customer Decision Hub” clients, but it has also proved quite popular for “Process AI”.

It’s good to propose this pattern, but I firmly believe that to get comfortable with any pattern it makes sense to try it out on a local instance, hence this article.

In this article I’ll document the steps for the following (on Windows 10):

  • Set up a local Kafka instance
  • Connect it to Pega using a Stream Dataset
  • Use an Event Strategy to detect patterns
  • Bring it all together using a Dataflow

Setting up Kafka

There are quite a few articles on this, some of them out of date, so I’ll keep it very simple and describe the process for the latest version of Kafka.

  • Go to https://kafka.apache.org/downloads

  • Download the latest binary release (I used 2.13, which on the downloads page is the Scala build of the binary rather than the Kafka version)

  • Unzip/Extract using your favorite unzip utility. (I did it in “C:\Apps\Kafka”)

  • Create the data and logs folders, and the kafka and zookeeper subfolders under data. So, I created C:\Apps\Kafka\logs, C:\Apps\Kafka\data, C:\Apps\Kafka\data\kafka and C:\Apps\Kafka\data\zookeeper

  • Go to the config folder, open the zookeeper.properties file and set dataDir. For me it was “dataDir=C:/Apps/Kafka/data/zookeeper”

  • From the same config folder, open server.properties and set log.dirs. For me it was “log.dirs=C:/Apps/Kafka/logs”

  • Go back to the Kafka root folder (C:\Apps\Kafka) and create batch files to start and stop Kafka and Zookeeper (you can use the command line as well). I just created the batch files since they make it easy to start and stop

    rem startkafka.bat
    rem %~dp0 is the folder holding this batch file, so the script works from any directory
    
    cd /d "%~dp0bin\windows"
    kafka-server-start.bat ../../config/server.properties
    
    rem startzookeeper.bat
    
    cd /d "%~dp0bin\windows"
    zookeeper-server-start.bat ../../config/zookeeper.properties
    
    rem stopkafka.bat
    
    cd /d "%~dp0bin\windows"
    kafka-server-stop.bat
    
    rem stopzookeeper.bat
    
    cd /d "%~dp0bin\windows"
    zookeeper-server-stop.bat
    

    Start Zookeeper first, using the batch file or the command prompt, and start Kafka once Zookeeper is running.
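
If you want to confirm the start order before launching Kafka, a small Python sketch like the one below can check whether each service is accepting connections. It assumes the default ports (2181 for Zookeeper, 9092 for Kafka); the function names are my own and are not part of Kafka. A successful TCP connection only shows that something is listening on the port, not that the service is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def startup_status(host: str = "localhost",
                   zookeeper_port: int = 2181,  # assumed default clientPort
                   kafka_port: int = 9092) -> str:  # assumed default listener port
    """Describe which of the two services is reachable, in start order."""
    if not port_is_open(host, zookeeper_port):
        return "Zookeeper is not reachable; start it first"
    if not port_is_open(host, kafka_port):
        return "Zookeeper is up; Kafka can be started now"
    return "Zookeeper and Kafka are both reachable"


if __name__ == "__main__":
    print(startup_status())
```

If you changed the ports in zookeeper.properties or server.properties, pass those values instead of the defaults.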

Now that everything is up and running, it’s time to create a Kafka topic and post some messages.

  • Once both are up and running, open another command prompt and navigate to the Kafka bin folder. For me it was “C:\Apps\Kafka\bin\windows”

  • Create a topic using the command

    kafka-topics.bat --create --topic DeviseAlerts --bootstrap-server localhost:9092
    

    Here DeviseAlerts is the topic name and localhost:9092 is the broker host and port

  • Connect to the topic as a producer

    kafka-console-producer.bat --topic DeviseAlerts --bootstrap-server localhost:9092
    

    Post some messages. The sample I used is given below; you can post anything, preferably JSON, since Pega works out of the box with JSON

    { "devisename": "SmartMeter", "deviseid": "SM0824832049832", "alarm": { "alarmid": "HP398432", "alarmtype": "tamper", "threshold": 200.78, "reading": 239, "date": "2022-08-12T03:40:49.453Z" } }
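
The console producer sends each line you type as a separate message, so a JSON payload needs to stay on a single line. As a minimal sketch (the helper name is my own), the Python below serialises the sample alarm as compact one-line JSON that can be pasted into the producer prompt.

```python
import json


def to_console_line(message: dict) -> str:
    """Serialise a message as compact JSON on a single line."""
    # json.dumps escapes any newline inside string values, so the output never
    # contains a raw line break that the console producer would split on.
    return json.dumps(message, separators=(",", ":"))


# The sample alarm from the article, as a Python dictionary.
sample = {
    "devisename": "SmartMeter",
    "deviseid": "SM0824832049832",
    "alarm": {
        "alarmid": "HP398432",
        "alarmtype": "tamper",
        "threshold": 200.78,
        "reading": 239,
        "date": "2022-08-12T03:40:49.453Z",
    },
}

if __name__ == "__main__":
    print(to_console_line(sample))
```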
    

Next, connect to the topic using a Stream Dataset

This is quite simple

  • First create a Kafka Configuration instance (Records → SysAdmin → Kafka), fill in the details of your broker (localhost and port 9092 for the setup above) and click on test connectivity

  • Then create a Data Set (Records → Data Model → Data Set). You can set the name to be the same as the name of the topic, and the type is Kafka. So for me the name was DeviseAlerts

  • Run the dataset and view the results

That’s it, all done with the connectivity. Optionally, use a data transform to map the JSON to properties on the clipboard.
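
In Pega the mapping itself is configured on the data transform rule form, not in code. Purely to illustrate what the mapping amounts to, the Python sketch below flattens the nested sample message into dotted paths that could each correspond to a clipboard property. The path names are simply the JSON keys; which Pega properties they map to is up to your data model.

```python
import json


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dictionaries into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into embedded objects such as "alarm".
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


raw = ('{ "devisename": "SmartMeter", "deviseid": "SM0824832049832", '
       '"alarm": { "alarmid": "HP398432", "alarmtype": "tamper", '
       '"threshold": 200.78, "reading": 239, "date": "2022-08-12T03:40:49.453Z" } }')

if __name__ == "__main__":
    for path, value in flatten(json.loads(raw)).items():
        print(path, "=", value)
```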

Now create an Event Strategy

This again is really simple

  • Create an instance of “Event Strategy” (Decision → Event strategy)

[Figure: Event Strategy.gif]

Finally, create the Data Flow

This too is really simple

  • Create an instance of “Data Flow” (Data Model → Data Flow)

[Figure: Data Flow.gif]

Now that everything is done, we need to run the Dataflow, but before that, set up Pega for stream processing

  • Stop the Pega Personal Edition instance

  • Go to the tomcat\bin folder for Personal Edition, open setenv.bat and update -DNodeType to the following

    -DNodeType=Search,WebUser,BackgroundProcessing,Batch,RealTime,Stream
    
  • Restart Personal Edition. This enables Batch and Real Time processing; you can check this from the Decisioning: Services landing page

Finally, open the Dataflow and run it. This creates a Dataflow Run, and that’s it, all done.

Any messages that are now sent to the Kafka topic will be processed by the Dataflow, and if the condition given in the Event Strategy is true (in my case, when more than 5 events come from a device in a 1 hour window), a case is created to investigate the alarm.
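
To make the condition concrete, here is a rough Python sketch of the same rule outside Pega: count events per device over the last hour and flag the device when the count exceeds 5. This is not how the Event Strategy is implemented. It assumes a sliding window keyed on deviseid and events arriving in time order, and the class and function names are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def parse_date(value: str) -> datetime:
    """Parse the timestamp format used in the sample message."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


class AlarmWindow:
    """Track events per device and report when a device exceeds the threshold."""

    def __init__(self, threshold: int = 5, window: timedelta = timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        self._events = defaultdict(deque)  # deviseid -> timestamps still in the window

    def add(self, devise_id: str, when: datetime) -> bool:
        """Record one event; return True if the device now has more than `threshold` events."""
        events = self._events[devise_id]
        events.append(when)
        # Drop events that fell out of the window, measured from the newest event.
        while events and when - events[0] >= self.window:
            events.popleft()
        return len(events) > self.threshold


if __name__ == "__main__":
    window = AlarmWindow()
    start = parse_date("2022-08-12T03:40:49.453Z")
    for minute in range(6):
        fired = window.add("SM0824832049832", start + timedelta(minutes=minute))
        print(minute + 1, "events ->", "create case" if fired else "no action")
```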

@KAUSTAV.DUTTA awesome content. Please share more stuff.

@KAUSTAV.DUTTA

Thank you for these details. Did you come across an issue where the Kafka embedded in Personal Edition conflicted with the external instance? Because of this, Personal Edition would not start. How would you suggest resolving it?

This can be resolved by updating the listener port in server.properties (for example, to 9099). The file gives the format as

listeners = PLAINTEXT://your.host.name:9092

so to use port 9099, set

listeners=PLAINTEXT://:9099

Settings that worked for me

Config files
zookeeper.properties

clientPort=2999

server.properties

listeners=PLAINTEXT://:9099

zookeeper.connect=localhost:2999
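
When changing ports like this, the one thing that has to line up is that zookeeper.connect in server.properties points at the clientPort set in zookeeper.properties. A small Python sketch of that check follows. The function names are my own, and it assumes zookeeper.connect holds a single host:port entry with no chroot path.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def port_of(address: str) -> int:
    """Extract the port from values like 'localhost:2999' or 'PLAINTEXT://:9099'."""
    return int(address.rsplit(":", 1)[1])


def ports_agree(zookeeper_text: str, server_text: str) -> bool:
    """Check that the broker connects to the port Zookeeper listens on."""
    zookeeper = parse_properties(zookeeper_text)
    server = parse_properties(server_text)
    return int(zookeeper["clientPort"]) == port_of(server["zookeeper.connect"])


if __name__ == "__main__":
    zookeeper_properties = "clientPort=2999\n"
    server_properties = "listeners=PLAINTEXT://:9099\nzookeeper.connect=localhost:2999\n"
    print("ports agree:", ports_agree(zookeeper_properties, server_properties))
```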

Commands

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
.\bin\windows\kafka-server-start.bat .\config\server.properties
.\bin\windows\kafka-topics.bat --create --topic quickstart-events --bootstrap-server localhost:9099
.\bin\windows\kafka-console-producer.bat --topic quickstart-events --bootstrap-server localhost:9099