Restrict data flow processing to a set number of records

We have millions of records in a table, and we need to migrate them to a new table using some complex filtering. To make tracking easier and avoid issues, we want to migrate a fixed number of records each day over a certain period by calling the data flow from an activity and running that activity from a job scheduler.

The report configuration has all the required columns, filters, and a max records setting (as per our use case). It also has a left join to the new class, with a filter on the new class, so that subsequent runs return only records that have not yet been migrated to the new table.
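For readers who think in SQL, the report logic described above amounts to an anti-join with a row cap. The following is only a rough sketch of that idea using Python's built-in `sqlite3` module, not Pega configuration; the table names `source_records` and `migrated_records`, the columns, and the helper names are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical source and target tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_records (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE migrated_records (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO source_records VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(1, 11)],
    )
    # Pretend rows 1 and 2 were migrated by an earlier run.
    conn.executemany(
        "INSERT INTO migrated_records VALUES (?, ?)",
        [(1, "row-1"), (2, "row-2")],
    )
    return conn


def fetch_unmigrated_batch(conn, batch_size):
    """Return up to batch_size source rows that have no match in the target table."""
    sql = """
        SELECT s.id, s.payload
        FROM source_records AS s
        LEFT JOIN migrated_records AS m ON m.id = s.id
        WHERE m.id IS NULL          -- keep only rows not yet migrated
        ORDER BY s.id
        LIMIT ?                     -- the "max records" cap
    """
    return conn.execute(sql, (batch_size,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(fetch_unmigrated_batch(conn, 3))
```

Each run picks up where the previous one stopped, because rows already present in the target table are filtered out before the cap is applied.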

The data flow configuration is simple: the source is a report on the first class (all filtering is done in the report), followed by a convert shape that converts the records to the new class, and the destination is a data set of the new class configured to insert only new records.

Overall, this approach is not working: whatever max records value we set in the main report, the data flow completely ignores it during execution and processes all the data in the first table. This defeats our use case.

It honors all the columns and filters in the report and ignores only the max records setting, which is a bit strange.

Is there any way to restrict the data flow to execute only for a certain number of records at a time?

@RITVIK MOHAPATRA

In your case you may use the “Display Top Ranked …” configuration of the report definition to restrict the DFD to execute the first n records only.

Please try that and let us know how it goes.

[Attachment: Display top ranked config.png]

@RITVIK MOHAPATRA you have to break this process into two steps.

Update your original activity that calls the data flow as follows:

  1. First, call DataFlow-Execute to fetch the data. This data flow needs just two shapes: a data set as the source and an abstract destination as the output. You can specify the number of records in the method parameters.

  2. Then call the second data flow, which takes the results from the page and processes them. The source of this data flow will be abstract.

This will solve the problem of limiting the number of records.
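As a language-neutral illustration of the two steps above, here is a minimal Python sketch. It is not Pega code and does not use any Pega API; `fetch_batch`, `process_batch`, and the record layout are hypothetical names chosen only to show the pattern of reading a capped batch into memory first and processing it separately.

```python
from itertools import islice


def fetch_batch(source, max_records):
    """Step 1: read at most max_records from the source into an in-memory list."""
    return list(islice(source, max_records))


def process_batch(page, already_migrated):
    """Step 2: convert each fetched record and insert only the new ones."""
    inserted = []
    for record in page:
        if record["id"] in already_migrated:
            continue  # skip records that exist in the target already
        already_migrated.add(record["id"])
        inserted.append({"id": record["id"], "migrated": True})
    return inserted


if __name__ == "__main__":
    source = ({"id": i} for i in range(100))  # stands in for the large table
    migrated = {0, 1}                         # ids migrated by earlier runs
    page = fetch_batch(source, 5)
    print(process_batch(page, migrated))
```

The cap is applied in the first step, so the second step only ever sees the limited batch, regardless of how large the source is.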

Thanks,

Puneet

@SoumyajitB Thanks for the suggestion. Yes, we went ahead with that approach. The drawback is that if the report has any grouping on the columns (such as max, count, etc.), this option is not displayed. So we created two reports, one with grouping and one without grouping in which we specified the number of rows to display, then merged both in the data flow, and it worked for us.

I was hoping to find a more straightforward way to restrict the records.

@B.Puneet Thanks Puneet. Yes, this solution is also working and it’s a good way to restrict the records.