Extract images from PDF

Posted: Wed May 03, 2017 10:44 pm
by Terkelsen
Acrobat has a function to export all images from a PDF as JPEG, JPEG 2000, PNG or TIFF. Is this possible using a script with the Acrobat Configurator in Switch?

Re: Extract images from PDF

Posted: Thu May 04, 2017 9:55 am
by jan_suhr
There is a CLI utility called pdfimages that can extract the images inside a PDF.

It is a simple tool that can be run with the Execute command tool.

Depending on how the images are stored inside the PDF, you can get it to save them as .jpg. Normally, however, it writes the images in a raw .ppm format, which can be converted to any image format you like with ImageMagick, run as a second step.
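
As a rough illustration of that two-step flow, here is a minimal Python sketch. It is not taken from a working Switch flow: the function names, the "img" file prefix and the PNG target are only assumptions for the example. It also assumes that pdfimages and ImageMagick are on the PATH, and that ImageMagick is called as "convert" (on ImageMagick 7 the command is "magick", and on Windows "convert" can clash with a system tool of the same name).

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, image_root, keep_jpeg=True):
    """Build the argument list for: pdfimages [-j] PDF-file image-root."""
    cmd = ["pdfimages"]
    if keep_jpeg:
        # -j writes JPEG (DCT) encoded images as .jpg instead of .ppm
        cmd.append("-j")
    cmd += [str(pdf_path), str(image_root)]
    return cmd


def convert_command(source, target_ext=".png", binary="convert"):
    """Build the ImageMagick call that converts one extracted file."""
    source = Path(source)
    return [binary, str(source), str(source.with_suffix(target_ext))]


def extract_images(pdf_path, out_dir, target_ext=".png"):
    """Step 1: extract with pdfimages. Step 2: convert the raw files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # pdfimages names its output <image-root>-NNN.<ext>, here img-000.ppm etc.
    subprocess.run(pdfimages_command(pdf_path, out_dir / "img"), check=True)
    for raw in sorted(out_dir.glob("img-*")):
        # Only the raw formats need converting; .jpg files are left alone.
        if raw.suffix in {".ppm", ".pbm"}:
            subprocess.run(convert_command(raw, target_ext), check=True)
```

Calling extract_images("input.pdf", "images") would then leave the extracted .jpg files and the converted .png files in the "images" folder. In Switch, the same two commands could presumably be run from Execute command elements instead of a script.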

The program can be downloaded here, it is a package of some PDF utilities and pdfimages is part of the package:
http://www.foolabs.com/xpdf/download.html

Here is a list of options and commands for pdfimages:
http://linuxcommand.org/man_pages/pdfimages1.html


Good luck

Re: Extract images from PDF

Posted: Thu May 04, 2017 11:24 am
by Terkelsen
Thanks a lot, Jan. I'll definitely have a closer look at this solution.

Meanwhile, I found out that if you let the Acrobat configurator save as XML, it actually creates a folder containing all the images :o . In the Acrobat preferences you can even choose between JPEG, PNG or TIFF, set the resolution, and turn saving of the image folder on or off.

If the PDF has been flattened, the usual image export from Acrobat creates a puzzle of small images. I don't know how pdfimages handles this, but when saving as XML the images are not split into fragments. The downside is that text overlapping the images is included in the images. I solved this by using a PitStop action to remove the text elements before saving as XML.