Ingesting files in UTF-16 format

We have successfully used a service package, service file, file listener, File dataset, activities … to ingest data files in Pega Cloud, and have also ingested files on a timed basis directly using an activity and File datasets.

BUT … we are now being sent a file in UTF-16 format and need to convert it at ingestion time to UTF-8 for processing, or be able to cleanly read UTF-16 into a File dataset …

I have spent some time using the Embed-Repository-File data pages and looked at the underlying Amazon S3 activities they use, and am pretty sure that with Java steps we could just use byte arrays to do the conversion by creating a new file in the repository. But that just seems ugly.

I can plainly see that on a Service File rule you can specify UTF-16 in the Data Description.

So … the question is: does Pega have a functional model to handle the ingestion of data in UTF-16 format?

NOTE: the supplier of the UTF-16 files to the Pega Cloud SFTP location will deliver only the data files (no manifest, no token control file), so the example process for file ingestion does not work in this case.

Any suggestions, please.

Hello @PaulN396

Could you please check out this post?

Thank you!

@PoojaGadige I did … that is not helpful.

I see two ways of doing this …

(1) use custom stream processing for a File DataSet …

an example that assumes a fair bit of knowledge of how Pega implements InputStream, but would just convert the byte array from UTF-16 to UTF-8 on the way in …

OR

(2) use the Pega data pages on Embed-Repository-File: read the UTF-16 file as input and write to a UTF-8 file as output …
which will involve Java steps that will, of course, violate the guardrails Pega puts in place …

This seems less invasive to me … I just have to figure out how to intercept the stream and provide the conversion …

I plainly have the stream; I just need to figure out where to insert the conversion …

READ a file…

step page

D_pxGetFile[repositoryName:Param.RepositoryName,filePath:Local.FilePath,responseType:"STREAM"]

Java Step

java.io.InputStream fileData = (java.io.InputStream) myStepPage.getObject("pyStream");
tools.getParameterPage().putObject("FileDataStream", fileData);

Initialize a NEW file …

step page

D_pxNewFile[repositoryName:Param.RepositoryName,filePath:Local.TargetFile]

Java Step

tools.getStepPage().getProperty("pyStream").setValue(tools.getParameterPage().getObject("FileDataStream"));

Save the NEW file …

step page

D_pxNewFile[repositoryName:Param.RepositoryName,filePath:Local.TargetFile]

Save-DataPage D_pxNewFile Param.RepositoryName Local.TargetFile

If this were native Java it would be easy to either convert the whole file or convert each line as a byte array …

If you had direct access to the files you would just use normal file I/O:

Reader in = new InputStreamReader(new FileInputStream(infile), "UTF-16");
Writer out = new OutputStreamWriter(new FileOutputStream(outfile), "UTF-8");
char[] cbuf = new char[2048];
int len;
while ((len = in.read(cbuf, 0, cbuf.length)) != -1) {
    out.write(cbuf, 0, len);
}
out.close();
in.close();
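For reference, here is that loop wrapped into a small self-contained class (the class and method names are mine, not Pega's), recoding an in-memory UTF-16 stream to UTF-8 so it can be tried without real files:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamRecode {

    // Copy a UTF-16 byte stream to a UTF-8 byte stream, 2048 chars at a time.
    // Works for any InputStream/OutputStream pair, file-backed or in-memory.
    public static void recode(InputStream src, OutputStream dst) throws IOException {
        Reader in = new InputStreamReader(src, StandardCharsets.UTF_16);
        Writer out = new OutputStreamWriter(dst, StandardCharsets.UTF_8);
        char[] cbuf = new char[2048];
        int len;
        while ((len = in.read(cbuf, 0, cbuf.length)) != -1) {
            out.write(cbuf, 0, len);
        }
        out.flush(); // push any bytes still buffered in the writer
    }
}
```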

Or, if you read the file, you could convert it line by line using byte arrays:

String str = new String(bytes, 0, len, "UTF-16");
byte[] outbytes = str.getBytes("UTF-8");
OutputStream out = new FileOutputStream(outfile);
out.write(outbytes);
out.close();
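The same round trip as a tiny self-contained helper (the name is mine): decode a UTF-16 byte array via String, then re-encode it as UTF-8 bytes:

```java
import java.nio.charset.StandardCharsets;

public class ByteRecode {

    // Decode len bytes of UTF-16 input and re-encode them as UTF-8.
    // The String constructor handles the UTF-16 BOM if one is present.
    public static byte[] utf16ToUtf8(byte[] bytes, int len) {
        String str = new String(bytes, 0, len, StandardCharsets.UTF_16);
        return str.getBytes(StandardCharsets.UTF_8);
    }
}
```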

@PaulN396

It would be nice to have a Pega engineer who works with File dataset I/O comment on this …

We kept testing, and yes, following the example in the Pegasystems documentation you just override the reader:

public static class UTFInputStream extends InputStream {
    private Reader reader = null;
    public UTFInputStream(InputStream inputStream) {
        try {
            reader = new InputStreamReader(inputStream, "UTF-16");
        } catch (IOException ex) {
            // ignore; reader stays null
        }
    }
    // read() overrides that pull from the reader go here …
}

and the writer of course …
then build a simple jar and import it …
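For anyone following along, here is a fuller sketch of such a wrapper (the class name is mine, not Pega's). It buffers the re-encoded UTF-8 bytes so that InputStream.read() can hand them out one at a time. One caveat: a chunk boundary could in principle split a surrogate pair, which this sketch does not guard against.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf16ToUtf8InputStream extends InputStream {

    private final Reader reader;              // decodes the incoming UTF-16 bytes
    private final char[] cbuf = new char[2048];
    private byte[] buf = new byte[0];         // pending re-encoded UTF-8 output
    private int pos = 0;

    public Utf16ToUtf8InputStream(InputStream in) {
        reader = new InputStreamReader(in, StandardCharsets.UTF_16);
    }

    @Override
    public int read() throws IOException {
        if (pos >= buf.length) {
            int n = reader.read(cbuf, 0, cbuf.length);
            if (n == -1) {
                return -1;                    // source exhausted
            }
            // Re-encode the decoded chars as UTF-8 and start serving them.
            buf = new String(cbuf, 0, n).getBytes(StandardCharsets.UTF_8);
            pos = 0;
        }
        return buf[pos++] & 0xFF;             // InputStream returns unsigned bytes
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```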

I need to be more adventurous