Finding the CORRECT MIME type of a file attached in Pega

RahulChoudhary · February 1, 2024, 12:10pm

What is MIME (Multipurpose Internet Mail Extensions):

It is a two-part identifier for file formats and format contents transmitted on the Internet. It indicates the nature and format of a document or a file. Because it describes the content, the true MIME type does not change when the extension of a file is changed.

A MIME type consists of two parts, a type and a subtype, separated by a slash. The type represents the general category the data falls into, such as video or text. The subtype identifies the exact kind of data within that category.

Here are some examples of MIME types:

- `text/plain`: Plain text
- `application/octet-stream`: Any kind of binary data
- `text/html`: HTML files
- `image/jpeg`: JPEG images
- `application/json`: JSON data
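
To make the type/subtype structure concrete, here is a minimal sketch in plain Java that splits a MIME type string into its two parts. The class and method names (`MimeTypeParts`, `split`) are made up for this illustration and are not part of Pega or Tika. It also assumes that optional parameters after a semicolon (for example `; charset=utf-8`) should simply be dropped.

```java
import java.util.Locale;

class MimeTypeParts {
    /** Splits "type/subtype[; params]" into {type, subtype}, lowercased. */
    static String[] split(String mimeType) {
        // Drop optional parameters such as "; charset=utf-8" (assumption: not needed here).
        String essence = mimeType.split(";", 2)[0].trim();
        int slash = essence.indexOf('/');
        // Both sides of the slash must be non-empty.
        if (slash <= 0 || slash == essence.length() - 1) {
            throw new IllegalArgumentException("Not a type/subtype pair: " + mimeType);
        }
        return new String[] {
            essence.substring(0, slash).toLowerCase(Locale.ROOT),
            essence.substring(slash + 1).toLowerCase(Locale.ROOT)
        };
    }

    public static void main(String[] args) {
        String[] parts = split("text/html; charset=utf-8");
        System.out.println(parts[0] + " / " + parts[1]); // prints "text / html"
    }
}
```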

In Pega, whenever a file is attached to a case, an instance of the Data-WorkAttach-File class is created for it.

This class does contain the MIME type, in a property called pyAttachMimeType. But it is not always correct, because Pega derives the MIME type from the file extension. If we change the extension of a PDF file to .png, the property will show the MIME type as image/png.

An external library can be used to find the correct MIME type. In this case we are using Apache Tika.
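
The difference between extension-based and content-based detection can be sketched in plain Java without any library. This is only an illustration, not how Pega or Tika work internally: the class `ExtensionVsContent`, its two lookup methods, and the tiny two-format tables are all hypothetical. The content check looks only at the well-known leading bytes of PDF (`%PDF-`) and PNG files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

class ExtensionVsContent {
    // Illustrative extension table; a real one would be far larger.
    private static final Map<String, String> BY_EXTENSION =
        Map.of("pdf", "application/pdf", "png", "image/png");

    // Leading "magic" bytes of the two formats.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG_MAGIC =
        {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    /** Guesses from the file name only, so renaming the file changes the answer. */
    static String guessFromName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
    }

    /** Guesses from the first bytes of the content; the file name is ignored. */
    static String guessFromContent(byte[] data) {
        if (startsWith(data, PDF_MAGIC)) return "application/pdf";
        if (startsWith(data, PNG_MAGIC)) return "image/png";
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A PDF that has been renamed to "invoice.png".
        byte[] content = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessFromName("invoice.png")); // prints "image/png"
        System.out.println(guessFromContent(content));    // prints "application/pdf"
    }
}
```

A library such as Tika applies the same idea with a much larger set of signatures and format-specific checks.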

The JAR can be found here: Tika Jar. Version: 1.18 (chosen so that it is compatible with Pega).

Be sure to consult your LSA before importing the JAR, and take a cold backup of the system before the import.

Create an activity that takes the pzInsKey of the file's Data-WorkAttach-File instance as a parameter.

Function (takes the attachment page as DownloadPage and returns the MIME type as a String):

Java code:

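try {
  // pyAttachStream holds the file content as a Base64-encoded string.
  String encodedAttachStream = DownloadPage.getProperty("pyAttachStream").toString();
  String fileName = DownloadPage.getProperty("pxAttachName").toString();
  byte[] decodedAttachStream = Base64Util.decodeToByteArray(encodedAttachStream);
  Tika tika = new Tika();
  String mimeType;
  // Detect from the content only; the file name is deliberately not passed to Tika.
  // try-with-resources closes the stream once detection is done.
  try (TikaInputStream stream = TikaInputStream.get(decodedAttachStream)) {
    mimeType = tika.detect(stream);
  }
  // Informational message, so it is not logged at error level.
  oLog.infoForced("MIME for file: " + fileName + " is " + mimeType);
  return mimeType;
} catch (Exception e) {
  oLog.error("Exception occurred in finding the MIME type: " + e);
}
// Reached only when detection failed.
return "ERROR";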

Imports:

org.apache.tika.Tika
org.apache.tika.io.TikaInputStream
com.pega.pegarules.pub.util.Base64Util

Output:

VishruthiReddy · February 14, 2025, 5:15pm

@RahulChoudhary: Thanks for the clear explanation.

I could reproduce this by converting a CSV to .docx in Pega 24.1 (cloud); we still have this issue where the MIME type is deciphered from the extension by Pega.

Is it recommended to use Apache Tika on Pega Cloud solutions? Would future upgrades of Pega have a solution for this problem?

Thanks

Vishruthi
