Finding the CORRECT MIME type of a file attached in Pega

What is MIME (Multipurpose Internet Mail Extensions):

A MIME type is a two-part identifier for file formats and content transmitted on the Internet. It indicates the nature and format of a document or file. Because it describes the content itself, a file's actual MIME type does not change when the file's extension is changed.

A MIME type consists of two parts, a type and a subtype, separated by a slash. The type represents the general category the data falls into, such as video or text. The subtype identifies the exact kind of data within that category.

Here are some examples of MIME types:

- `text/plain`: Plain text
- `application/octet-stream`: Any kind of binary data
- `text/html`: HTML files
- `image/jpeg`: JPEG images
- `application/json`: JSON data
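The type/subtype structure described above can be sketched in a few lines of Java. This is only an illustration: the class name `MimeTypeParts` and its `parse` method are hypothetical helpers, not part of Pega or any library.

```java
import java.util.Locale;

// Hypothetical helper for illustration: holds the two halves of a MIME type.
final class MimeTypeParts {
    final String type;     // general category, e.g. "image"
    final String subtype;  // exact kind within the category, e.g. "jpeg"

    private MimeTypeParts(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    // Splits a "type/subtype" string at the slash.
    static MimeTypeParts parse(String mimeType) {
        String value = mimeType.trim().toLowerCase(Locale.ROOT);
        int slash = value.indexOf('/');
        if (slash <= 0 || slash == value.length() - 1) {
            throw new IllegalArgumentException("Not a type/subtype pair: " + mimeType);
        }
        return new MimeTypeParts(value.substring(0, slash), value.substring(slash + 1));
    }

    public static void main(String[] args) {
        MimeTypeParts parts = parse("application/json");
        System.out.println(parts.type + " | " + parts.subtype); // application | json
    }
}
```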

In Pega, whenever a file is attached to a case, an instance is created in the Data-WorkAttach-File class.

This class does contain the MIME type, in a property called pyAttachMimeType. But the value is not always correct, because Pega derives the MIME type from the file extension. If we change the extension of a PDF file to .png, the property will show the MIME type as image/png.
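To see why an extension-based lookup can be fooled, here is a minimal sketch that compares it with a check of the file's leading bytes. Everything in it is an assumption made for illustration: the class `ExtensionVsContent`, both methods, and the two-entry lookup tables are hypothetical and are not how Pega works internally. The signatures themselves are standard: PDF files start with `%PDF-` and PNG files start with a fixed 8-byte signature.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical illustration: name-based guess versus content-based guess.
final class ExtensionVsContent {
    private static final byte[] PDF_SIGNATURE = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG_SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // Looks only at the file name, so renaming the file changes the answer.
    static String guessFromExtension(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) return "application/pdf";
        if (name.endsWith(".png")) return "image/png";
        return "application/octet-stream";
    }

    // Looks only at the leading bytes, so renaming the file has no effect.
    static String guessFromContent(byte[] data) {
        if (startsWith(data, PDF_SIGNATURE)) return "application/pdf";
        if (startsWith(data, PNG_SIGNATURE)) return "image/png";
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A PDF whose extension was changed to .png
        byte[] pdfBytes = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessFromExtension("invoice.png")); // image/png
        System.out.println(guessFromContent(pdfBytes));        // application/pdf
    }
}
```

A real detector covers far more formats than these two, which is the reason for reaching for a dedicated library rather than hand-written signature checks.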

An external library can be used to find the correct MIME type from the file's content. In this case we are using Apache Tika.

The JAR can be found here: Tika Jar. Version: 1.18 (so that it is compatible with Pega).

Be sure to consult your LSA (Lead System Architect) before importing the JAR, and take a cold backup of the system before the import.

Create an activity that takes the pzInsKey of the file's Data-WorkAttach-File instance as a parameter.

Function:

Java code:

try {
  // pyAttachStream holds the file content as a Base64-encoded string
  String encodedAttachStream = DownloadPage.getProperty("pyAttachStream").toString();
  String fileName = DownloadPage.getProperty("pxAttachName").toString();
  byte[] decodedAttachStream = Base64Util.decodeToByteArray(encodedAttachStream);
  Tika tika = new Tika();
  // Detection runs on the decoded bytes, so a renamed extension has no effect
  String mimeType = tika.detect(TikaInputStream.get(decodedAttachStream));
  // Informational message, so it is not logged at error level
  oLog.infoForced("MIME for file: " + fileName + " is " + mimeType);
  return mimeType;
} catch (Exception e) {
  oLog.error("Exception occurred in finding the MIME type: " + e);
}
// Reached only when detection failed
return "ERROR";
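
The same decode-then-detect flow can be tried outside Pega without the Tika JAR. The sketch below is a stand-in, not the function above: it swaps Tika for the JDK's own `URLConnection.guessContentTypeFromStream`, which recognizes far fewer formats, and swaps Pega's `Base64Util` for `java.util.Base64`. The class name `AttachStreamSniffer` and its `detect` method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.util.Base64;

// Hypothetical stand-in for the Pega function: JDK sniffing instead of Tika.
final class AttachStreamSniffer {
    // Decodes a Base64 attachment stream and guesses the type from its leading bytes.
    // Returns null when the JDK does not recognize the content.
    static String detect(String encodedAttachStream) {
        byte[] decoded = Base64.getDecoder().decode(encodedAttachStream);
        try (InputStream in = new ByteArrayInputStream(decoded)) {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The 8-byte PNG signature, Base64-encoded the way an attachment stream would be
        byte[] pngSignature = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
        String encoded = Base64.getEncoder().encodeToString(pngSignature);
        System.out.println(detect(encoded)); // image/png
    }
}
```

Note that no file name is passed in at all, so whatever extension the attachment carries cannot influence the result.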

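Imports:

- `org.apache.tika.Tika`
- `org.apache.tika.io.TikaInputStream`
- `com.pega.pegarules.pub.util.Base64Util`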

Output:

Output.png

@RahulChoudhary: Thanks for the clear explanation.

I could reproduce this: I converted a CSV to DOCX in Pega 24.1 (cloud), and we still have this issue where Pega determines the MIME type from the extension.

Is it recommended to use Apache Tika on Pega Cloud solutions? Will future upgrades of Pega have a solution for this problem?

Thanks

Vishruthi