Hello everyone,
I’m looking for advice / best practices regarding a large‑scale cleanup of files stored in the Pega Repository / File Storage.
Context
During several years of platform usage, files in a specific repository folder were never deleted.
We now need to reorganize it and remove files according to defined business/technical criteria.
From my analysis, the only supported way to delete repository files is via the standard data pages (D_pxDeleteFile, etc.).
Current approach
I’m using D_pxListFiles to list the files in a given repository folder.
Since the data page returns paginated results, I:
Call it the first time with an empty marker
Then keep calling it, passing the returned marker each time, to retrieve the next page
For each page, I iterate over the files and delete those matching the removal criteria
I’ve verified that:
The marker value changes dynamically on each invocation
Therefore, it doesn’t seem feasible to first collect all markers and then process them later
The only viable approach appears to be streaming through D_pxListFiles and deleting files on the fly, roughly as in the sketch below
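For reference, the shape of my current loop as a rough Java sketch. RepositoryClient, FilePage, and FileInfo are hypothetical stand-ins for the D_pxListFiles / D_pxDeleteFile data pages, not actual Pega APIs:

```java
import java.util.List;
import java.util.function.Predicate;

public class StreamingCleanup {

    // Hypothetical stand-ins for the repository data pages; not Pega APIs.
    interface RepositoryClient {
        FilePage listFiles(String folder, String marker); // ~ D_pxListFiles
        boolean deleteFile(String path);                  // ~ D_pxDeleteFile
    }

    record FileInfo(String path, long sizeBytes, long modifiedEpochMs) {}
    record FilePage(List<FileInfo> files, String nextMarker) {}

    static void cleanup(RepositoryClient repo, String folder,
                        Predicate<FileInfo> shouldDelete) {
        String marker = "";                                // first call: empty marker
        do {
            FilePage page = repo.listFiles(folder, marker);
            for (FileInfo f : page.files()) {
                if (shouldDelete.test(f)) {
                    repo.deleteFile(f.path());             // delete on the fly
                }
            }
            // The marker is only valid for the next call, so pages cannot be
            // collected up front and processed later.
            marker = page.nextMarker();
        } while (marker != null && !marker.isEmpty());
    }
}
```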
Problem
If we assume hundreds of thousands or even ~1 million files, this approach runs for hours (or more), even when executed as a batch job.
Question
Is there a more efficient or recommended approach to handle mass deletion / cleanup of repository files in Pega?
Specifically, I’d appreciate guidance on:
Known patterns or best practices for large repository cleanups
Whether a batch / queue‑based / Job Scheduler / async approach is preferred
Any built‑in purge mechanisms, repository‑level tricks, or supported shortcuts
Things to avoid (e.g. long single activities vs chunking)
Real‑world experiences with similar volumes
Thanks in advance for any insights or recommendations.
Absolutely, the challenge is about execution. I’m already trying to use a Job Scheduler, but the executions are still too long.
In particular, the problem I’m facing is that after the first N runs, the first X elements (with X very large) have already been examined, yet each subsequent run must iterate over them again just to check whether they should be deleted. This creates a growing amount of rework as the process progresses.
I was also thinking about a two‑phase approach, separating:
File identification
File deletion
The idea would be to first scan the repository and extract the full list of files (with the relevant metadata), store that information somewhere (for example in a Data Type), and only afterward iterate over that list to perform the actual deletion.
Of course, the first phase alone could run for more than 24 hours, but it would be executed only once. Then the deletion phase would be much more controlled and resumable. Do you think storing something like ~1 million rows in a Data Type could be a reasonable and supported approach in this scenario?
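Roughly what I have in mind for phase 1, again as a Java sketch with hypothetical helper types (TrackingStore would sit on top of the Data Type; nothing here is a Pega API):

```java
import java.util.List;

public class IdentificationPhase {

    // Hypothetical stand-ins; not Pega APIs.
    interface RepositoryClient {
        FilePage listFiles(String folder, String marker); // ~ D_pxListFiles
    }

    // Hypothetical DAO writing rows into the tracking Data Type.
    interface TrackingStore {
        void insertPending(String path, long sizeBytes, long modifiedEpochMs);
    }

    record FileInfo(String path, long sizeBytes, long modifiedEpochMs) {}
    record FilePage(List<FileInfo> files, String nextMarker) {}

    static void identify(RepositoryClient repo, TrackingStore store, String folder) {
        String marker = "";
        do {
            FilePage page = repo.listFiles(folder, marker);
            for (FileInfo f : page.files()) {
                // Persist lightweight metadata only, flagged PENDING, so that
                // the deletion phase is fully resumable and auditable.
                store.insertPending(f.path(), f.sizeBytes(), f.modifiedEpochMs());
            }
            marker = page.nextMarker();
        } while (marker != null && !marker.isEmpty());
    }
}
```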
I’d be very interested in your thoughts or any alternative patterns you’ve seen working in similar cases.
For a cleanup at that scale, I would not recommend one long, single-threaded activity that streams through D_pxListFiles and deletes inline. The supported delete API is still the right mechanism, but the execution pattern should be chunked and asynchronous so you avoid holding one request open for hours and reduce the blast radius if something fails.
You could try the following:
Use D_pxListFiles only to enumerate a small batch of files at a time.
Push each file path, or each batch of file paths, into a queue processor or other async work queue.
Let the async workers call D_pxDeleteFile for the targeted files.
This approach gives you smaller transactions, retry capability, better observability, and the ability to throttle the delete rate.
I’d suggest this structure:
A scheduler starts the cleanup job.
A controller activity reads one page of file names from D_pxListFiles.
For each candidate file, push an item into a queue processor.
The queue processor deletes the file using D_pxDeleteFile.
The controller continues with the next marker and repeats until complete (see the sketch after this list).
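A rough Java sketch of that split, with QueueClient standing in for a queue processor and RepositoryClient for the repository data pages (hypothetical interfaces, not the Pega API):

```java
import java.util.List;
import java.util.function.Predicate;

public class ChunkedCleanup {

    // Hypothetical stand-ins; not Pega APIs.
    interface RepositoryClient {
        FilePage listFiles(String folder, String marker); // ~ D_pxListFiles
        boolean deleteFile(String path);                  // ~ D_pxDeleteFile
    }
    interface QueueClient {
        void enqueue(String filePath);                    // ~ Queue-For-Processing
    }

    record FileInfo(String path) {}
    record FilePage(List<FileInfo> files, String nextMarker) {}

    // Controller: enumerate one page at a time, hand candidates to the queue.
    static void controller(RepositoryClient repo, QueueClient queue,
                           String folder, Predicate<FileInfo> matches) {
        String marker = "";
        do {
            FilePage page = repo.listFiles(folder, marker);
            for (FileInfo f : page.files()) {
                if (matches.test(f)) {
                    queue.enqueue(f.path());   // small, independent work items
                }
            }
            marker = page.nextMarker();
        } while (marker != null && !marker.isEmpty());
    }

    // Worker: runs once per queued item, so each delete is its own small,
    // retryable transaction.
    static void worker(RepositoryClient repo, String filePath) {
        if (!repo.deleteFile(filePath)) {
            throw new IllegalStateException("Delete failed: " + filePath); // let the queue retry
        }
    }
}
```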
This is much closer to how Pega expects large-volume, purge-like operations to be handled: scheduled, chunked, and time-bounded rather than all at once.
There is no general-purpose “purge this repository folder by criteria” wizard equivalent to case purge/archive for repository files. Repository file cleanup is typically done through the repository APIs, with your own orchestration around them.
@VincenzoF1238
I agree you’re heading in the right direction. The first step is to identify the attachments that need to be deleted.
You can retrieve the attachment metadata and repository path from Data‑WorkAttach‑File. Persist the key identifiers (such as attachment key, work object key, repository reference, and file path) into a separate data type/table dedicated to deletion tracking.
Once the attachments are identified:
Store only the required key parameters in this data table
Create appropriate indexes to support efficient querying and batch processing
Storing even millions of records is not a concern, since the table will contain only lightweight metadata and keys.
Using this table as a control list, you can then execute a controlled, batch‑based deletion process to remove the corresponding files from the repository, ensuring proper tracking, retry handling, and auditability.
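To make the batch phase concrete, here is a minimal sketch of one scheduled run over the control table. TrackingStore and RepositoryClient are hypothetical interfaces over the Data Type and the repository data pages, not Pega APIs:

```java
import java.util.List;

public class ControlledDeletion {

    // Hypothetical stand-ins; not Pega APIs.
    interface RepositoryClient {
        boolean deleteFile(String path);                 // ~ D_pxDeleteFile
    }
    interface TrackingStore {
        List<String> claimPending(int batchSize);        // query on an indexed status column
        void markDeleted(String path);
        void markFailed(String path, String reason);
    }

    // One scheduled run processes a bounded batch, keeping each execution
    // time-boxed; the status column makes the whole process resumable.
    static void runBatch(RepositoryClient repo, TrackingStore store, int batchSize) {
        for (String path : store.claimPending(batchSize)) {
            try {
                if (repo.deleteFile(path)) {
                    store.markDeleted(path);
                } else {
                    store.markFailed(path, "delete returned false");
                }
            } catch (RuntimeException e) {
                store.markFailed(path, e.getMessage()); // row is kept for retry and audit
            }
        }
    }
}
```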