Design Approach - Uploading an Excel file with >100k records for validation

Would it be a viable and scalable approach to design a user interface in Pega that allows users to upload Excel files containing over 100,000 records, perform validations, and display the results in a grid format?

@ABAGAN

Approach #1: Case Type–Driven Upload and Processing

This approach leverages Pega’s case management capabilities to track and orchestrate the upload lifecycle.

Steps:

  1. Create a dedicated Case Type for Excel uploads.

  2. Screen 1: Upload the Excel file as a case attachment.

  3. On submit, route the case to a Wait shape or Processing assignment.

  4. Trigger a Queue Processor (QP) or Job Scheduler (JS) to:

    • Read the attachment stream.

    • Parse and validate the Excel data.

    • Save validated rows into a Data Type, using the Case ID as a key.

  5. Once processing is complete, move the case to a Review stage.

  6. Display results using a Data Page reference in the case details or footer.
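The batching logic in step 4 could be sketched as follows. This is a minimal illustration in plain Java, not Pega API code: `UploadBatcher` and the `sink` callback are hypothetical names, with the sink standing in for the batched database insert keyed by the Case ID.

```java
import java.util.*;
import java.util.function.Consumer;

// Illustrative sketch of the queue processor's step 4, assuming rows
// arrive as String[] cells from the parsed attachment.
class UploadBatcher {

    // A row passes here if no cell is blank; real checks would live in
    // Pega validate rules or a data transform.
    static boolean isValid(String[] row) {
        for (String cell : row) {
            if (cell == null || cell.isBlank()) return false;
        }
        return true;
    }

    // Streams rows, counts valid/invalid, and flushes valid rows to the
    // sink in fixed-size batches so 100k rows never form one giant commit.
    static int[] process(String caseId, Iterator<String[]> rows,
                         int batchSize, Consumer<List<String[]>> sink) {
        List<String[]> batch = new ArrayList<>(batchSize);
        int valid = 0, invalid = 0;
        while (rows.hasNext()) {
            String[] row = rows.next();
            if (!isValid(row)) { invalid++; continue; }
            valid++;
            batch.add(row);              // Case ID would be stamped on each record here
            if (batch.size() == batchSize) {
                sink.accept(new ArrayList<>(batch));   // commit checkpoint
                batch.clear();
            }
        }
        if (!batch.isEmpty()) sink.accept(batch);      // final partial batch
        return new int[]{valid, invalid};
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[]{"a", "b"}, new String[]{"", "c"},
            new String[]{"d", "e"}, new String[]{"f", "g"});
        int[] counts = process("C-123", rows.iterator(), 2,
                b -> System.out.println("flushed " + b.size() + " rows"));
        System.out.println("valid=" + counts[0] + " invalid=" + counts[1]);
    }
}
```

The key point is that the batch size, not the file size, bounds memory and transaction length.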

Benefits:

  • Full traceability via case history.

  • Easy to extend with audit, retry, or escalation logic.

  • Ideal for business workflows that require approval or review.

Approach #2: Landing Page with Data Type Tracking

This approach is UI-centric and better suited for operational users who need quick access to upload history and results.

Steps:

  1. Create a Landing Page with:

    • Excel upload control.

    • A grid listing previously uploaded files.

  2. On upload:

    • Save the file as a Data-WorkAttach-File record.

    • Create a record in a Track Excel Uploads Data Type.

    • Link the attachment to the tracking record.

  3. Trigger a QP or JS to:

    • Read and parse the attachment.

    • Validate and store results in a separate Data Type.

    • Update a flag in the tracking record once complete.

  4. Enable a Preview button once processing is done.

  5. On click, show a modal dialog with the validated data.
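For step 5, the modal grid should page through the validated results rather than load them all at once. A minimal paging sketch, assuming the results are already persisted and retrievable as a list (`PreviewPager` is an illustrative name, not a Pega rule; in practice a paged report definition or data page would do this):

```java
import java.util.*;

// Illustrative paging helper for the preview modal: returns one page of
// results at a time instead of rendering the whole result set.
class PreviewPager {

    // Returns the slice of results for a 1-based page number,
    // or an empty list if the page is out of range.
    static <T> List<T> page(List<T> results, int pageNumber, int pageSize) {
        int from = (pageNumber - 1) * pageSize;
        if (from < 0 || from >= results.size()) return List.of();
        int to = Math.min(from + pageSize, results.size());
        return results.subList(from, to);
    }

    // Total number of pages, via ceiling division.
    static int pageCount(int total, int pageSize) {
        return (total + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) data.add(i);
        System.out.println(page(data, 2, 4));       // second page of four
        System.out.println(pageCount(10, 4));       // pages needed in total
    }
}
```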

Benefits:

  • Clean separation of UI and processing logic.

  • Easy to manage multiple uploads and preview results.

  • No need for case management overhead.

@ABAGAN

Short answer: no, not directly in one go. For files over 100k rows, upload to a repository first and process in the background with a queue processor or data flow instead of parsing on the requestor thread. Convert to CSV if you can, since Excel parsing is heavier and memory-hungry. Stage rows into a temporary table or data set, run validations in chunks with checkpoints, and write only failures and key stats. Do not try to render 100k rows in a grid; show a paginated summary with counts and the first few hundred errors, plus a download link to a full error file. Add server-side limits on file size and row count, and reject files that exceed safe thresholds with a friendly message. Use optimistic concurrency and retries around inserts to avoid lock issues during bulk loads. Log per batch and surface progress to the user with a status page that auto-refreshes. Once the data is clean, trigger the final import step as a separate job.
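The staged CSV validation described above can be sketched as follows, assuming the file was already converted to CSV. `CsvValidator`, the expected-column check, and the error cap are all illustrative assumptions; real validation would apply the actual business rules, and real CSV parsing needs quoting support:

```java
import java.io.*;
import java.util.*;

// Illustrative sketch: stream a CSV line by line, validate, and keep only
// failures and counts, so the full 100k rows never sit in memory.
class CsvValidator {

    static final int MAX_ERRORS_KEPT = 300;   // "first few hundred errors" for the summary

    // expectedCols is an assumed per-file setting; a real run would also
    // enforce file-size and row-count limits before reaching this point.
    static Map<String, Object> validate(Reader csv, int expectedCols) throws IOException {
        BufferedReader in = new BufferedReader(csv);
        List<String> errors = new ArrayList<>();
        long total = 0, failed = 0;
        String line;
        int lineNo = 0;
        while ((line = in.readLine()) != null) {
            lineNo++;
            total++;
            String[] cells = line.split(",", -1);  // naive split; real CSV needs quoting rules
            if (cells.length != expectedCols) {
                failed++;
                if (errors.size() < MAX_ERRORS_KEPT) {
                    errors.add("line " + lineNo + ": expected " + expectedCols
                             + " columns, got " + cells.length);
                }
            }
        }
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("total", total);     // key stats only; the full error file
        stats.put("failed", failed);   // would be written out separately
        stats.put("errors", errors);
        return stats;
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,b,c\n1,2\nx,y,z\n";
        System.out.println(validate(new StringReader(csv), 3));
    }
}
```

Only the counts and the capped error list go back to the UI; the complete error detail belongs in the downloadable file.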