RUTA does not support special characters and Chinese

GAVINHSU · March 18, 2026, 8:18am

We’re trying to extract fields from an email subject like the one below:

Tom Cruise (汤姆克鲁斯)'s Overtime Application [OTA2026030001] is waiting for your approval

Expected fields:

applicant name: Tom Cruise (汤姆克鲁斯)

application name: Overtime Application

document number: OTA2026030001

The subject contains both a special character (a single quote) and Chinese characters.

We tried to handle these characters in the RUTA script, but it does not seem to work.

We got an error saying “” when we tried the script below:
W{ REGEXP("[a-zA-Z0-9\\']+") -> MARK(EntityType) };

We also tried using a Unicode escape instead of the character itself, as below. There was no error, but it did not work either:
W{ REGEXP("[a-zA-Z0-9'']+") -> MARK(EntityType) };

For the Chinese characters, we also tried Unicode escapes. Again there was no error, but it did not work either.
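
To check the regular expression on its own, outside RUTA, the expected extraction can be sketched in plain Java. This is only a minimal sketch, assuming the RUTA REGEXP condition is backed by Java's `java.util.regex`. The class name `SubjectFields`, the method names, and the pattern itself are hypothetical and not part of our script. The pattern runs against the whole subject rather than a single `W` token, and it accepts both the straight apostrophe and the typographic one (U+2019).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubjectFields {
    // Hypothetical pattern for the sample subject:
    //   applicant   = everything before "'s " (straight or typographic apostrophe)
    //   application = everything between "'s " and " ["
    //   document    = uppercase letters and digits inside the square brackets
    private static final Pattern SUBJECT = Pattern.compile(
        "^(?<applicant>.+?)['\\u2019]s\\s+(?<application>.+?)\\s+\\[(?<document>[A-Z0-9]+)\\]");

    // Matches any single character in the Han script (Chinese characters).
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}");

    // Returns {applicant, application, document}, or null if the subject does not match.
    public static String[] extract(String subject) {
        Matcher m = SUBJECT.matcher(subject);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            m.group("applicant"), m.group("application"), m.group("document")
        };
    }

    // True if the text contains at least one Han character.
    public static boolean hasHan(String text) {
        return HAN.matcher(text).find();
    }

    public static void main(String[] args) {
        String subject = "Tom Cruise (汤姆克鲁斯)'s Overtime Application "
            + "[OTA2026030001] is waiting for your approval";
        String[] fields = extract(subject);
        System.out.println("applicant name: " + fields[0]);
        System.out.println("application name: " + fields[1]);
        System.out.println("document number: " + fields[2]);
        System.out.println("contains Chinese characters: " + hasHan(fields[0]));
    }
}
```

If the pattern matches here but not inside the script, the regular expression itself is probably not the problem, and the issue is more likely in how the script tokenizes the subject or escapes the characters.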

RameshSangili · March 18, 2026, 2:20pm

@VikasRaidhan Any thoughts?
