Processing a Large Data Set in Pega

In our organization, Pega will be receiving one large data set (around 2 lakh, i.e. 200,000, records) from a non-Pega team. It needs to be processed and stored in Pega as data type objects, and this operation will happen biweekly.

Currently, due to an operational limitation, this non-Pega team cannot connect to the Pega repository or consume any Pega API to push the data to Pega. Their only option at the moment is SharePoint.

Given this situation, we have thought through the approaches below:

  1. The non-Pega team pushes the file to SharePoint. A job scheduler is set up that uses the SharePoint component (D_SPOnlineGetFileContent) to read the file content and push it to the Pega repository. A file listener then picks up the file from the repository, processes it, and stores the records in the Pega data type.

  2. The non-Pega team pushes the file to SharePoint. A job scheduler is set up that uses the SharePoint component (D_SPOnlineGetFileContent) to read the file content and push it to the Pega repository. A data set is created to read the file from the repository, and a data flow reads the data and writes it to the Pega data type.

  3. The non-Pega team pushes the file to SharePoint. A job scheduler is set up that uses the SharePoint component (D_SPOnlineGetFileContent) to read the file content, parses it (using a binary file template and the pxParseExcelFile activity), and writes it directly to the Pega data type, bypassing the Pega repository.

Can someone suggest which of these is the ideal option, or whether a better approach exists, specifically from a performance perspective?

We are on Pega 8.7 Cloud.

@AVIKCEMK this question would be best directed to our Pega Consulting services.

Based on the provided context, the third approach seems to be the most efficient. It involves reading the file content from SharePoint, parsing it, and directly writing it to a Pega data type. This approach bypasses the need to store the file in the Pega repository, which could save time and resources. However, it’s important to consider the size of the data and the performance of the parsing operation. If the data is too large, it might be more efficient to temporarily store it in the Pega repository and process it in smaller chunks. Ultimately, the best approach would depend on the specific requirements and constraints of your project.
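To illustrate the chunked-processing idea mentioned above: in Pega this batching is configured on the data flow or activity rather than hand-coded, but the principle is the same regardless of platform. The sketch below is a minimal, hypothetical illustration (the batch size, column names, and `write_batch` callback are assumptions, standing in for a bulk write to the data type's table), showing how a large file can be parsed and handed off in fixed-size batches so memory stays bounded even at 2 lakh rows.

```python
import csv
import io

def process_in_batches(csv_text, batch_size, write_batch):
    """Parse a CSV and hand records to the writer in fixed-size batches,
    so only one batch is held in memory at a time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    batch, written = [], 0
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            write_batch(batch)   # e.g. a bulk insert into the data type table
            written += len(batch)
            batch = []
    if batch:                    # flush the final partial batch
        write_batch(batch)
        written += len(batch)
    return written

# Simulated input: a header plus 12 rows standing in for the 200k records.
sample = "id,name\n" + "\n".join(f"{i},item{i}" for i in range(12))
batches = []
total = process_in_batches(sample, batch_size=5, write_batch=batches.append)
# 12 rows with a batch size of 5 -> batches of 5, 5, and 2
```

The same trade-off the answer describes applies here: smaller batches keep memory flat but add more write round trips, while larger batches do the reverse.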

:warning: This is a GenAI-powered tool. All generated answers require validation against the provided references.

How to read data from Microsoft-Sharepoint


Big volume of data processing Real-time in Pega

Upload CSV File on UI from Data-Portal and then parse the CSV Records

How to Connect Pega with One Drive / Share Point

@MarijeSchillern Thanks for the suggestion.

Considering the parsing time, we are going ahead with the second approach.