RUTA Script - Keywords

Hi,

Could you please help me with a RUTA script to parse keywords from the example below:

1. Certificate of Title
2. Amend Title
3. Title

The script needs to detect the above-mentioned keywords in the email.

It works fine for ‘Title’, but it is unable to recognise ‘Certificate of Title’, i.e. when there is a space between the words. The expression below works when I test it in a regex builder, but it does not work from Pega.

This is the regex expression I have created; please point out where I went wrong.

W?{REGEXP("(?i)(Certificate of Title|Amend Title|Title)") -> MARK(EntityType,1,2,3,4,5,6,7,8,9,10,11,12)};

@AnushaK4695 Have you already consulted the following articles?

Modifying Apache Ruta scripts to extract custom structured entities

Creating entity extraction rules for text analytics

Best practices for pattern extraction in text analytics

@MarijeSchillern

We created a function (HTMLToPlaintext) that converts the HTML email body to plain text before it is analysed. This strips the HTML tags and avoids the entity parsing issues, and it has resolved the problem with entity detection.
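For anyone hitting the same problem, here is a minimal sketch of what such a conversion might look like in plain Java. It is not the function we actually deployed: the class name, method names, and the small list of entities it decodes are assumptions made for illustration. The idea is that a phrase such as "Certificate&nbsp;of&nbsp;Title" in the raw HTML does not contain ordinary spaces, so a pattern written with plain spaces cannot match until the markup and entities are normalised.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not a Pega API.
class HtmlToPlainText {
    // Anything that looks like an HTML tag, e.g. <p>, </b>, <br/>.
    private static final Pattern TAGS = Pattern.compile("<[^>]+>");

    // The keyword alternation from the question; longer phrases are listed first.
    private static final Pattern KEYWORDS =
            Pattern.compile("(?i)(Certificate of Title|Amend Title|Title)");

    // Strip tags, decode a few common entities, and collapse whitespace.
    static String toPlainText(String html) {
        String text = TAGS.matcher(html).replaceAll(" ");
        text = text.replace("&nbsp;", " ")
                   .replace("&#160;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");   // decode &amp; last to avoid double decoding
        // \s does not match the no-break space by default, so replace it explicitly.
        text = text.replace('\u00A0', ' ');
        return text.replaceAll("\\s+", " ").trim();
    }

    // Return the first keyword found in the text, or null if there is none.
    static String firstKeyword(String text) {
        Matcher m = KEYWORDS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p>Please issue a <b>Certificate</b>&nbsp;of&nbsp;Title</p>";
        String plain = toPlainText(html);
        System.out.println(plain);
        System.out.println(firstKeyword(plain));
    }
}
```

In a real application a proper HTML parser would be more robust than regex-based tag stripping; the sketch only shows why normalising the text first lets the multi-word keyword match.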