Extracting images and text

You can extract text and images from a PDF document using the extracttext and extractimage actions of the cfpdf tag.

The extracttext action extracts all words from the specified pages of the PDF document and writes them to the file named in the destination attribute, as shown in the following code snippet:

<cfpdf action = "extracttext" source = "../myBook.pdf" pages = "5-20, 29, 80" destination = "../adobe/textdoc.txt">
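
The following is a minimal sketch of how the extracttext call might be wrapped in a page, not a definitive implementation. It assumes that the overwrite attribute is available for this action in your ColdFusion version and that the destination directory already exists. The variable names pdfPath, textPath, and extractedText are hypothetical, and the paths simply reuse the ones from the snippet above.

```cfml
<!--- Resolve the relative paths from the snippet above to absolute paths. --->
<cfset pdfPath = expandPath("../myBook.pdf")>
<cfset textPath = expandPath("../adobe/textdoc.txt")>

<cfif fileExists(pdfPath)>
    <!--- Assumption: overwrite="yes" replaces an existing text file. --->
    <cfpdf action = "extracttext"
           source = "#pdfPath#"
           pages = "5-20, 29, 80"
           destination = "#textPath#"
           overwrite = "yes">

    <!--- Read the extracted text back to confirm that something was written. --->
    <cfset extractedText = fileRead(textPath)>
    <cfoutput>Extracted #len(extractedText)# characters to #textPath#</cfoutput>
<cfelse>
    <cfoutput>Source PDF not found: #pdfPath#</cfoutput>
</cfif>
```

Checking for the source file first keeps a missing PDF from surfacing as a tag error at run time.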

The extractimage action extracts all images from the specified pages of a PDF document, as shown in the following code snippet:

<cfpdf action = "extractimage" source = "../myBook.pdf" pages = "1-200" destination = "../mybookimages" imageprefix = "mybook">

The extracted images are saved in the directory that you specify in the destination attribute. Use the imageprefix attribute to set a prefix for the extracted image file names; if you omit it, the system generates a prefix similar to “cf+page number”. To save the images in a specific format, use the format attribute.
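
As an illustration of the format attribute, the sketch below extracts the images as PNG files and then lists what was written. It rests on a few assumptions: that "png" is an accepted format value in your ColdFusion version, that the overwrite attribute is available for this action, and that the extracted file names begin with the imageprefix value (which the filter relies on). The names imageDir and extractedImages are hypothetical.

```cfml
<!--- Resolve the image directory and create it if it does not exist yet. --->
<cfset imageDir = expandPath("../mybookimages")>
<cfif not directoryExists(imageDir)>
    <cfdirectory action = "create" directory = "#imageDir#">
</cfif>

<!--- Assumption: format="png" and overwrite="yes" are supported here. --->
<cfpdf action = "extractimage"
       source = "../myBook.pdf"
       pages = "1-200"
       destination = "#imageDir#"
       imageprefix = "mybook"
       format = "png"
       overwrite = "yes">

<!--- List the extracted files; the filter assumes names start with the prefix. --->
<cfdirectory action = "list"
             directory = "#imageDir#"
             filter = "mybook*.png"
             name = "extractedImages">

<cfoutput query = "extractedImages">
    #extractedImages.name# (#extractedImages.size# bytes)<br>
</cfoutput>
```

Listing the directory afterward is a quick way to see the file names that the chosen prefix and format actually produce.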