How to delete the Attachments stored in S3 for cases that are already expunged?

I read the article "Will case attachments get deleted after expunger has deleted all traces of an archived case?" and understand that before 25.x, the expunger job in the archival process does not delete the attachments in S3. If we need to clean up S3 for all the expunged cases, how can we do it? The only approach I could think of is a combination of the D_pxListFiles and D_pxDelete APIs, but I doubt these APIs will be effective given that we have been storing attachments in S3 for several years. Any suggestions or ideas? @Eliseo_Olla @Will_Cho @Sairohith @Chetan.Chaudhari

My suggestion is to use S3 Lifecycle Rules.

How it works:

  • Configure S3 lifecycle policies at the bucket level

  • Example strategies:

    • Delete objects older than N days

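Lifecycle rules are evaluated by S3 itself, outside Pega, so they scale to very large buckets without adding load to the application. Keep in mind that an age-based rule expires every matching object regardless of case status, so it only fits if no active case still needs attachments older than N days.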
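As a rough illustration of the "delete objects older than N days" strategy, here is a minimal Python sketch that builds the rule document S3 expects. The rule ID, the `attachments/` prefix and the 2555-day value are hypothetical placeholders, not values taken from any real bucket, and the helper name `build_expiration_rule` is my own.

```python
def build_expiration_rule(rule_id, prefix, days):
    """Build one S3 lifecycle rule that expires objects under `prefix` after `days` days."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},   # only objects whose key starts with this prefix
        "Status": "Enabled",
        "Expiration": {"Days": days},   # age is counted from object creation
    }


# Hypothetical values for illustration only.
lifecycle_config = {
    "Rules": [build_expiration_rule("expire-old-attachments", "attachments/", 2555)]
}

# With boto3 installed and credentials configured, the document could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="<your-bucket>", LifecycleConfiguration=lifecycle_config)
```

The boto3 call is left as a comment because it needs direct access to the bucket, which you may not have on every hosting model.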

Another option is to run a report definition that joins Data-WorkAttach-File (pxRefObjectKey) to the Work table (pzInsKey) with an outer join, to identify the attachments whose cases have been purged from the Work tables. You can then trigger D_pxDelete from a background job.
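To make the outer join concrete, here is a small sketch using an in-memory SQLite database in place of the Pega database. The work table name `pc_work`, the lowercase column names and the sample keys are assumptions for illustration; in a real application this would be a report definition, and the work table is whatever your case class maps to.

```python
import sqlite3


def find_orphaned_attachments(conn):
    """Return attachment rows whose pxRefObjectKey has no matching pzInsKey in the work table."""
    sql = """
        SELECT a.pzinskey, a.pxrefobjectkey
        FROM pc_data_workattach AS a
        LEFT OUTER JOIN pc_work AS w ON w.pzinskey = a.pxrefobjectkey
        WHERE w.pzinskey IS NULL          -- no work row left, so the case was purged
        ORDER BY a.pzinskey
    """
    return conn.execute(sql).fetchall()


# Toy schema and data (hypothetical keys): case C-1 still exists, case C-2 was purged.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pc_work (pzinskey TEXT PRIMARY KEY);
    CREATE TABLE pc_data_workattach (pzinskey TEXT PRIMARY KEY, pxrefobjectkey TEXT);
    INSERT INTO pc_work VALUES ('MYORG-WORK C-1');
    INSERT INTO pc_data_workattach VALUES
        ('DATA-WORKATTACH-FILE MYORG-WORK C-1!20240101T000000.000 GMT', 'MYORG-WORK C-1'),
        ('DATA-WORKATTACH-FILE MYORG-WORK C-2!20240102T000000.000 GMT', 'MYORG-WORK C-2');
""")
orphans = find_orphaned_attachments(conn)
```

This only finds anything if the attachment rows themselves survived the expunge; if they were removed along with the case, the join returns nothing.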

Hi @RameshSangili - the problem with this is that there is no joining condition that leads from the pzInsKey in the pc_data_workattach table to the attachment object in S3. Is there a way to link them?

Hi @anandchouti - we are on Pega Managed Cloud. Ideally we want to find a way to do it from our end, but this would certainly be a great recommendation if we were on a client-managed cloud.

Work table - pzInsKey

Data-WorkAttach-File - pxRefObjectKey

If the corresponding pxRefObjectKey is not present in the Work table, you can get the Repo name and Repo file name and use them to delete the attachment from S3.

Hi @RameshSangili - that is the challenge here. Even if I find the pxRefObjectKey details in the pc_data_workattach table, there is no way to find the corresponding attachment object in the repository. The pegacloudfilestorage repository stores the attachments under the attachments folder, and there is no correlation between the pxRefObjectKey and the contents of that folder. I already tried to find one before posting my question here on the forum. Please let me know if I'm missing something.

Once the case is expunged, Pega no longer has a reliable case-level link to the attachment object in the repository. So there is no clean OOTB way to go from pzInsKey, or even pxRefObjectKey, back to the exact S3 object name under pegacloudfilestorage/attachments.

That means the suggested D_pxListFiles + D_pxDelete approach only works if you already have some external way to identify the orphaned files, for example a naming convention, a retained metadata index, or a custom mapping table captured at upload time. Without that, you are basically trying to reverse-engineer the repository path after the fact, and Pega does not expose a direct correlation for that in the cloud storage layout.

For already-expunged cases, there is no referential join from pc_data_workattach to the S3 object if that mapping was never preserved. On Pega Managed Cloud, this is best treated as a separate repository cleanup exercise, not as an extension of case expunge.

So a Pega-native cleanup is possible only if you have retained enough metadata to identify the objects; otherwise, there is no standard Pega-native way to do it after expunge.

Since your case metadata is gone, you could try a reconciliation process to find and delete the orphaned files. First, create a utility activity that lists every file path currently in your S3 bucket and saves the paths to a temporary database table. Next, run a comparison query to identify every file in that table that no longer has a matching case ID in your active work tables. Finally, feed these orphaned paths into a Queue Processor that deletes the files from S3 in small batches.
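The compare-and-batch part of that reconciliation could look roughly like the Python sketch below. It assumes you have some retained mapping from file path to case ID (`path_to_case`), which, as noted earlier in this thread, may not exist for files uploaded in the past. All names, paths and IDs here are hypothetical. Paths with no mapping are reported separately instead of being deleted, since nothing proves they are orphaned.

```python
from itertools import islice


def find_orphans(listed_paths, path_to_case, active_case_ids):
    """Split listed paths into orphans (case is gone) and unresolved (no mapping known)."""
    orphans, unresolved = [], []
    for path in listed_paths:
        case_id = path_to_case.get(path)
        if case_id is None:
            unresolved.append(path)       # cannot prove it is orphaned, so do not delete
        elif case_id not in active_case_ids:
            orphans.append(path)          # mapped case no longer exists in the work tables
    return orphans, unresolved


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical data: what the repository listing and the retained mapping might contain.
listed = ["attachments/a.pdf", "attachments/b.pdf", "attachments/c.pdf", "attachments/d.pdf"]
mapping = {"attachments/a.pdf": "C-1", "attachments/b.pdf": "C-2", "attachments/c.pdf": "C-3"}
active = {"C-1"}

orphans, unresolved = find_orphans(listed, mapping, active)
batches = list(batched(orphans, 1))  # each batch would become one queue item
```

In Pega, each batch would be one item queued to the Queue Processor, which then calls D_pxDelete per path. A small batch size keeps a single failure from blocking the rest of the cleanup.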