
Getting the XML out of the PDF

Posted: Sun Sep 23, 2012 5:47 pm
by Flow666
Hi,

I have a PDF with XML data (JDF) embedded in the file.

I can see the XML if I open the PDF as a text file.

It isn't visible in the metadata when I search for it.

I have tried to save the PDF in Acrobat and export it to XML, but that's not working (error messages).

Which application can I use to get the data out?


Posted: Tue Sep 25, 2012 12:17 am
by dkelly
Apago's PDFspy


Posted: Tue Sep 25, 2012 8:13 am
by Peter Kleinheider
Flow666 wrote: Hi,

I have a PDF with XML data (JDF) embedded in the file.

I can see the XML if I open the PDF as a text file.

It isn't visible in the metadata when I search for it.

I have tried to save the PDF in Acrobat and export it to XML, but that's not working (error messages).

Which application can I use to get the data out?

Can you please provide the PDF? There are various ways to extract the JDF from a PDF and get access to the XML data.



peter[at]inpetto[dot]cc



Thx,

Peter Kleinheider


Posted: Wed Sep 26, 2012 10:33 am
by Clive Andrews
Yeah - if you can put a link to it, I'll have a look too...


Posted: Sun Sep 30, 2012 2:40 pm
by Peter Kleinheider
Good afternoon,



The XML code you refer to is part of a PostScript Form XObject. I do not know of any software that extracts such PostScript parts as part of its functionality.



The only solution I know of is to write a Switch script that searches for such XML inside PS Form XObjects and either saves it to a separate file or attaches it as a dataset.
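
To illustrate the kind of search such a script would perform, here is a rough sketch in plain Python (not a Switch script). It rests on several assumptions that are not confirmed by the file in question: that the PostScript XObject's dictionary carries /Subtype /PS or /Subtype2 /PS, that its stream is either uncompressed or Flate-compressed, and that the JDF appears in the stream as plain text with a <JDF> root element. The function name and the regular expressions are made up for this example, and the pattern matching is a heuristic rather than a real PDF parser.

```python
import re
import zlib
from typing import List

# Heuristic: an indirect object whose body reaches "stream" before "endobj".
OBJ_STREAM_RE = re.compile(
    rb"\d+\s+\d+\s+obj(?P<head>(?:(?!endobj).)*?)stream\r?\n(?P<data>.*?)endstream",
    re.DOTALL,
)

# PostScript XObjects are marked /Subtype /PS, or /Subtype /Form plus /Subtype2 /PS.
PS_SUBTYPE_RE = re.compile(rb"/Subtype2?\s*/PS\b")

# Greedy up to the LAST </JDF>, because JDF nodes can be nested inside each other.
# Assumption: at most one JDF document per stream.
JDF_RE = re.compile(rb"(?:<\?xml[^>]*\?>\s*)?<JDF\b.*</JDF>", re.DOTALL)


def extract_jdf_from_ps_xobjects(pdf_bytes: bytes) -> List[bytes]:
    """Return the JDF/XML fragments found inside PostScript XObject streams."""
    found = []
    for match in OBJ_STREAM_RE.finditer(pdf_bytes):
        head = match.group("head")
        if not PS_SUBTYPE_RE.search(head):
            continue  # not a PostScript XObject

        data = match.group("data")
        if b"/FlateDecode" in head:
            try:
                # decompressobj tolerates the end-of-line bytes before "endstream"
                data = zlib.decompressobj().decompress(data)
            except zlib.error:
                continue  # other filters or damaged data are skipped here

        xml = JDF_RE.search(data)
        if xml:
            found.append(xml.group(0))
    return found


# Possible usage ("job.pdf" is a placeholder file name):
#   with open("job.pdf", "rb") as fh:
#       for i, xml in enumerate(extract_jdf_from_ps_xobjects(fh.read())):
#           with open(f"job_{i}.jdf", "wb") as out:
#               out.write(xml)
```

Streams that use other filters, object streams, or XML wrapped in PostScript string escapes would need more work than this sketch does.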



If that is something you are interested in, just drop me a line or get in touch with other folks here on the list.



Cheers,

Peter