Search and replace text-file for weird characters

JimmyHartington
Member
Posts: 28
Joined: Tue Mar 22, 2011 7:38 am

Post by JimmyHartington »

Hi

I have a problem with a text file supplied by my customer.
It sometimes contains weird characters, and I would like to find a way to remove them from the file.

I can see the weird characters in Notepad++, where they are displayed as VT.
See this screenshot:
Image

And here is an example of the text-file: http://d.pr/f/5Tym

Is it possible to search and replace within the text file using scripting?

Kind regards,
Jimmy Hartington
sander
Advanced member
Posts: 228
Joined: Wed Oct 01, 2014 8:58 am
Location: The Netherlands

Re: Search and replace text-file for weird characters

Post by sander »

How are your scripting skills? This replaceLineBreak snippet should get you started. I use it when I read an XML file, to replace LF with CRLF before I inject the content into my SQL database.

Code:

// Replace LF with CRLF, Prodist needs CRLF for correct parsing of e.g. addresses
var replaceLineBreak = readBody.replace(/\n/g,"\r\n");
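
One caveat with the snippet above: if the input already contains some CRLF line endings, replacing every \n would turn them into \r\r\n. Below is a minimal sketch in plain JavaScript that matches an optional \r in front of each \n instead. The helper name normalizeToCrlf is hypothetical and not part of the original script, and it is an assumption that the Switch script engine handles this pattern the same way as standard JavaScript.

```javascript
// Hypothetical helper (not from the original post):
// convert bare LF and existing CRLF line endings to CRLF.
function normalizeToCrlf(text) {
    // \r?\n matches a bare LF as well as an existing CRLF pair,
    // so a CRLF is replaced by itself instead of becoming \r\r\n.
    return text.replace(/\r?\n/g, "\r\n");
}

// Mixed line endings in, uniform CRLF out.
var mixed = "street\ncity\r\ncountry";
var normalized = normalizeToCrlf(mixed);
```

If the input is known to contain only LF line endings, the simpler /\n/g pattern gives the same result.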
bens
Member
Posts: 130
Joined: Thu Mar 03, 2011 10:13 am

Re: Search and replace text-file for weird characters

Post by bens »

VT is probably "Vertical Tab", an old control character that is rarely used nowadays. Its ASCII code is 0x0B (11). I don't know whether Switch regular expressions support it, but you could try searching for \xb or \x0b. Backslash x means "interpret the following hexadecimal digits as a character code". Many regular expression flavours expect exactly two hex digits after \x, so \x0b is the safer form to try first.

You may need to experiment a bit to see which syntax Switch accepts. See here for some more info: http://www.regular-expressions.info/nonprint.html
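
To make the experimenting concrete, here is a small sketch in plain JavaScript showing three spellings that standard JavaScript regular expressions treat as the vertical tab. The variable and function names are made up for illustration, and whether the Switch script engine accepts all three spellings is an assumption that needs testing.

```javascript
// The vertical tab has character code 11 (0x0B).
var VT = String.fromCharCode(11);
var sample = "line one" + VT + "line two";

// Three spellings of the same character in standard JavaScript regular expressions:
// two-digit hex escape, four-digit Unicode escape, and the \v shorthand.
var patterns = [/\x0b/g, /\u000b/g, /\v/g];

// Hypothetical helper: remove every match of the pattern from the text.
function stripWith(pattern, text) {
    return text.replace(pattern, "");
}

// Each entry should hold the sample text with the vertical tab removed.
var results = [];
for (var i = 0; i < patterns.length; i++) {
    results.push(stripWith(patterns[i], sample));
}
```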
JimmyHartington
Member
Posts: 28
Joined: Tue Mar 22, 2011 7:38 am

Re: Search and replace text-file for weird characters

Post by JimmyHartington »

Hi

Thanks for the help with identifying the character and for the replace script.

I already had a script that I have used before to manipulate text files.
By adding the replace command to it, I got it to work.

Here is the code:

Code:

// Is invoked each time a new job arrives in one of the input folders for the flow element.
// The newly arrived job is passed as the second parameter.
function jobArrived( s : Switch, job : Job ){
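   // The output extension comes from the "Extension" script property.
   var extension = s.getPropertyValue("Extension");
   var tempFile = job.createPathWithExtension(extension);
   var myFile = new File(tempFile);
   var InputPath = job.getPath();
   // Read the whole input file and strip every vertical tab (0x0B).
   var inputFileText = File.read(InputPath);
   var outputFileText = inputFileText.replace(/\x0b/g,"");
   // Write the cleaned text to the temporary output file.
   myFile.open( File.WriteOnly | File.Truncate );
   myFile.writeLine(outputFileText);
   myFile.close();
   // Pass the cleaned file on and discard the original input.
   job.sendToSingle(tempFile);
   job.sendToNull(InputPath);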
}

// Is invoked at regular intervals regardless of whether a new job arrived or not.
// The interval can be modified with s.setTimerInterval().
function timerFired( s : Switch )
{
}
And here is the SwitchScript file:
http://d.pr/f/UTUv
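
In case the customer's files ever turn out to contain other control characters besides VT, the same replace call can be widened with a character class. This is only a sketch in plain JavaScript, assuming that tab, line feed and carriage return should be kept. The helper name stripControlChars is hypothetical, and I have not verified the pattern inside Switch itself.

```javascript
// Hypothetical helper: remove C0 control characters except
// tab (0x09), line feed (0x0A) and carriage return (0x0D).
function stripControlChars(text) {
    // Ranges covered: 0x00-0x08, 0x0B (VT), 0x0C (form feed), 0x0E-0x1F.
    return text.replace(/[\x00-\x08\x0b\x0c\x0e-\x1f]/g, "");
}

// The vertical tab and the NUL character are removed; tab and CRLF are kept.
var cleaned = stripControlChars("name\x0b\tcity\x00\r\n");
```

In the script above, this would replace the single /\x0b/g pattern in the line that builds outputFileText.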

Thanks for the help.