Hi All,
Greetings!
I am currently working on a scenario where entities need to be extracted from inbound emails using the Email Channel in version 24.2.2.
The challenge is that the incoming emails are completely unstructured, with no predefined or consistent format. I first tried to address this by training the email parser (pxEmailParser) on a set of sample emails. This worked at first, but an issue emerged once new email formats were introduced and trained: some of the previously trained formats, which had been working correctly, stop working after new training sessions.
This appears to be caused by incorrect classification of the subject, body, and signature by the email parser. Given that regular retraining is not feasible, waiting for the model to gradually become robust is not a practical option.
As an alternative, I am considering removing the email parser entirely. However, this introduces another challenge: the email signature. In several cases, information present in the signature section leads to incorrect identification of both entities and intent.
Therefore, I would like to seek guidance from the community on the following:
- Is there a reliable way to filter out or isolate only the signature portion from inbound emails?
- Can this be achieved even when the signature appears in trailing email content (for example, in forwarded or replied messages)?
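For illustration, here is a minimal Python sketch of the kind of rule-based pre-processing these two questions describe. It is not a Pega feature and does not use pxEmailParser. The function names (`split_segments`, `strip_signature`, `clean_email`) and both regex pattern lists are assumptions made up for this example. It splits the mail at reply/forward headers and then cuts each segment at the first line that looks like a sign-off, so a signature in trailing content is removed as well. How such a step would be hooked into the Email Channel is not shown.

```python
import re

# Assumed patterns for lines that open a quoted reply or forwarded message.
QUOTE_HEADER = re.compile(
    r"^(On .+ wrote:|-{2,}\s*(Original|Forwarded) Message\s*-{2,}|From:\s.+)$",
    re.IGNORECASE,
)

# Assumed patterns for lines that open a signature: the "--" delimiter
# or a common closing phrase standing alone on its line.
SIGNATURE_START = re.compile(
    r"^(--\s*|(thanks|thanks\s*(&|and)\s*regards|(best|kind|warm)?\s*regards"
    r"|sincerely|cheers)[,.!]?)$",
    re.IGNORECASE,
)


def split_segments(text):
    """Split the mail into segments, starting a new one at each quote header."""
    segments, current = [], []
    for line in text.splitlines():
        if QUOTE_HEADER.match(line.strip()) and current:
            segments.append(current)
            current = []
        current.append(line)
    if current:
        segments.append(current)
    return segments


def strip_signature(lines):
    """Drop everything from the first signature-like line to the segment end."""
    for i, line in enumerate(lines):
        if SIGNATURE_START.match(line.strip()):
            return lines[:i]
    return lines


def clean_email(text):
    """Return the mail text with the signature removed from every segment."""
    cleaned = [strip_signature(seg) for seg in split_segments(text)]
    return "\n".join("\n".join(seg).strip() for seg in cleaned if seg)


sample = (
    "Hi team,\n"
    "Please update order 12345.\n"
    "Thanks & regards,\n"
    "Alice\n"
    "Acme Corp, +1 555 0100\n"
    "\n"
    "On Mon, Jan 1, 2024, Bob wrote:\n"
    "Can you confirm the order number?\n"
    "Regards,\n"
    "Bob\n"
    "Phone: 555 0199"
)
result = clean_email(sample)
print(result)
```

The limits of this approach are part of the question: a signature with no closing phrase is not detected, and a body line consisting only of "Thanks" would be treated as the start of a signature.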
Any suggestions, best practices, or approaches to handle this scenario would be greatly appreciated.
Thanks & regards,
Viswanath