Extracting data from a PDF form submission



Data extraction differs based on how the PDF form is submitted. ColdFusion supports two types of PDF form submission: HTTP post, which submits the form data, but not the form itself, and PDF, which submits the entire PDF file.

One use for PDF submission is archiving: because the form is submitted together with its data, you can write the output to a file. HTTP post submissions are processed faster because only the field data is transmitted, which is useful for updating a database or manipulating specific data collected from the form. However, you cannot write an HTTP post submission directly to a file.

Note: Although forms created in LiveCycle Designer allow several types of submission, including XDP and XML, ColdFusion can extract data from HTTP post and PDF submissions only.

In LiveCycle Designer, the XML code for an HTTP post submission looks like the following example:

<submit format="formdata" target="http://localhost:8500/pdfforms/pdfreceiver.cfm" textEncoding="UTF-8"/>

In LiveCycle Designer, the XML code for a PDF submission looks like the following example:

<submit format="pdf" target="http://localhost:8500/pdfforms/pdfreceiver.cfm" textEncoding="UTF-16" xdpContent="pdf datasets xfdf"/>
Note: Acrobat forms are submitted in binary format, not XML format.
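
A single receiving page can handle both submission types. The following is a minimal sketch rather than a definitive implementation: it assumes that PDF.content is defined only when a PDF submission arrives, and that the form uses the root name form1, as the examples in this section do. The read action it uses is described in the next section.

```cfml
<!--- Sketch of pdfreceiver.cfm: branch on the kind of submission received. --->
<cfif isDefined("PDF.content")>
    <!--- PDF submission: the whole file arrived, so read its fields into a structure. --->
    <cfpdfform source="#PDF.content#" action="read" result="fields"/>
    <cfdump var="#fields#">
<cfelseif isDefined("FORM.form1")>
    <!--- HTTP post submission: only the field data arrived, in the FORM scope. --->
    <cfdump var="#FORM.form1#">
<cfelse>
    <!--- Neither kind of form data is present (assumed fallback message). --->
    <p>No PDF form data was submitted.</p>
</cfif>
```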

Extracting data from a PDF submission

Use the following code to extract data from a PDF submission and write it to a structure called fields:

<!--- The following code reads the submitted PDF file and generates a result structure called fields. ---> 
<cfpdfform source="#PDF.content#" action="read" result="fields"/>

Use the cfdump tag to display the data structure, as follows:

<cfdump var="#fields#">
Note: When you extract data from a PDF submission, always specify "#PDF.content#" as the source.

You can set the form fields to a variable, as the following example shows:

<cfset empForm = fields.form1>

Use the populate action of the cfpdfform tag to write the output to a file. Specify "#PDF.content#" as the source. In the following example, the unique filename is generated from a field on the PDF form:

<cfpdfform action="populate" source="#PDF.content#" 
    destination="timesheets\#empForm.txtsheet#.pdf" overwrite="yes"/>
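
Because the filename in this example comes from a value that the user typed into the form, it is prudent to restrict it before using it in a path. The following sketch assumes that timesheet names contain only letters, digits, hyphens, and underscores. The variable name safeName and the fallback name unnamed are illustrative and not part of the original example.

```cfml
<!--- Keep only characters that are safe in a filename (assumed naming rule). --->
<cfset safeName = reReplace(empForm.txtsheet, "[^A-Za-z0-9_-]", "", "all")>
<!--- If nothing usable remains, fall back to a placeholder name (hypothetical). --->
<cfif len(safeName) EQ 0>
    <cfset safeName = "unnamed">
</cfif>
<cfpdfform action="populate" source="#PDF.content#" 
    destination="timesheets\#safeName#.pdf" overwrite="yes"/>
```

Stripping characters such as slashes and dots keeps a submitted value from pointing the destination outside the timesheets directory.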

Writing LiveCycle form output to an XDP file

For Acrobat forms, you can write the output to a PDF file only. For LiveCycle forms, you have the option to write the output to an XDP file. The filename extension determines the file format: to save the output in XDP format, simply use an XDP extension in the destination filename, as the following example shows:

<cfpdfform action="populate" source="#PDF.content#" 
    destination="timesheets\#empForm.txtsheet#.xdp" overwrite="yes"/>

An XDP file is an XML representation of a PDF file. In LiveCycle Designer, an XDP file contains the structure, data, annotations, and other data relevant to LiveCycle forms. This information is used to render the form at run time.

ColdFusion XDP files contain the XDP XML code and the PDF image. Therefore, an XDP file is larger than the corresponding PDF file. Write PDF forms to XDP files only if you must incorporate them into the LiveCycle Designer workflow on a LiveCycle server.

Writing PDF output to an XML file

ColdFusion lets you extract data from a PDF form and write the output to an XML data file. To do so, you must first save the form output as a PDF file, because the source of the cfpdfform tag must always be a PDF file.

To write the output of a PDF file to an XML file, use the read action of the cfpdfform tag, as the following example shows:

<cfpdfform action="read" source="timesheets\#empForm.txtsheet#.pdf" 
    XMLdata="timesheets\#empForm.txtsheet#.xml"/>

To save disk space, you can delete the PDF file and maintain the XML data file. As long as you keep the blank PDF form used as the template, you can use the populate action to regenerate the PDF file. For more information on populating forms, see Populating a PDF form with XML data.
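
Regenerating the PDF file from the saved XML data might look like the following sketch. The template filename timesheetForm.pdf is a hypothetical name for the blank form; substitute the path of your own template.

```cfml
<!--- Merge the saved XML data back into the blank template to re-create the filled-in PDF. --->
<!--- timesheetForm.pdf is an assumed name for the blank template form. --->
<cfpdfform action="populate" source="timesheetForm.pdf" 
    XMLdata="timesheets\#empForm.txtsheet#.xml" 
    destination="timesheets\#empForm.txtsheet#.pdf" overwrite="yes"/>
```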

Extracting data from an HTTP post submission

For an HTTP post submission, use the cfdump tag with the form name as the variable to display the data structure, as follows:

<cfdump var="#FORM.form1#">
Note: When you extract data from an HTTP post submission, always specify the form name as the source. For example, specify "#FORM.form1#" for a form generated from a standard template in LiveCycle.

Notice that the structure is not necessarily the same as the structure of the PDF file used as the template (before submission). For example, the structure of a form before submission could look like the following example:

struct
    form1                   struct
        txtDeptName         [empty string]
        txtEMail            [empty string]
        txtEmpID            [empty string]
        txtFirstName        [empty string]
        txtLastName         [empty string]
        txtPhoneNum         [empty string]

After submission by using HTTP post, the resulting structure would look like the following example:

struct
    FORM1                           struct
        SUBFORM                     struct
            HEADER                  struct
                HTTPSUBMITBUTTON1   [empty string]
                TXTDEPTNAME         Sales
                TXTFIRSTNAME        Carolynn
                TXTLASTNAME         Peterson
                TXTPHONENUM         (617) 872-9178
            TXTEMPID                1
            TXTEMAIL                carolynp@company

Note: When data extraction using the cfpdfform tag results in more than one page, instead of returning one structure, the extraction returns one structure per page.

The difference in structure reflects internal rules applied by Acrobat for the HTTP post submission.
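
Because the nesting after submission can differ from the template, it can be convenient to collapse the submitted structure into a single level before using it. The following is a sketch under stated assumptions, not part of ColdFusion: flattenFields is a hypothetical helper name, and it assumes that field names are unique across subforms (if two subforms contain the same field name, the later value overwrites the earlier one).

```cfml
<!--- Hypothetical helper: copies every non-struct value found at any depth into one flat struct. --->
<cffunction name="flattenFields" returntype="struct" output="false">
    <cfargument name="node" type="struct" required="true">
    <cfargument name="result" type="struct" required="false" default="#structNew()#">
    <cfset var key = "">
    <cfloop collection="#arguments.node#" item="key">
        <cfif isStruct(arguments.node[key])>
            <!--- Subform: recurse. Structs are passed by reference, so result accumulates. --->
            <cfset flattenFields(arguments.node[key], arguments.result)>
        <cfelse>
            <!--- Field: copy its value into the flat struct under the field name. --->
            <cfset arguments.result[key] = arguments.node[key]>
        </cfif>
    </cfloop>
    <cfreturn arguments.result>
</cffunction>

<!--- Usage: every field is now reachable without knowing its subform. --->
<cfset flat = flattenFields(FORM.form1)>
<cfdump var="#flat#">
```

With the structure shown above, flat.txtemail and flat.txtfirstname would both resolve, even though one field sits under SUBFORM and the other under HEADER.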

To extract the data from the HTTP post submission and update a database with the information, for example, map the database columns to the form fields, as the following code shows:

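<!--- Bind every submitted value with cfqueryparam instead of embedding it in the SQL string. ---> 
<cfquery name="updateEmpInfo" datasource="cfdocexamples"> 
UPDATE EMPLOYEES 
    SET FIRSTNAME = <cfqueryparam value="#FORM.FORM1.SUBFORM.HEADER.TXTFIRSTNAME#" cfsqltype="cf_sql_varchar">, 
        LASTNAME = <cfqueryparam value="#FORM.FORM1.SUBFORM.HEADER.TXTLASTNAME#" cfsqltype="cf_sql_varchar">, 
        DEPARTMENT = <cfqueryparam value="#FORM.FORM1.SUBFORM.HEADER.TXTDEPTNAME#" cfsqltype="cf_sql_varchar">, 
        IM_ID = <cfqueryparam value="#FORM.FORM1.SUBFORM.TXTEMAIL#" cfsqltype="cf_sql_varchar">, 
        PHONE = <cfqueryparam value="#FORM.FORM1.SUBFORM.HEADER.TXTPHONENUM#" cfsqltype="cf_sql_varchar"> 
    WHERE EMP_ID = <cfqueryparam value="#FORM.FORM1.SUBFORM.TXTEMPID#" cfsqltype="cf_sql_integer"> 
</cfquery>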

You can set a variable to create a shortcut to the field names, as the following code shows:

<cfset fields = FORM.form1.subform.header>

Use the cfoutput tag to display the form data:

<h3>Employee Information</h3> 
<cfoutput> 
    <table> 
        <tr> 
            <td>Name:</td> 
            <td>#fields.txtfirstname# #fields.txtlastname#</td> 
        </tr> 
        <tr> 
            <td>Department:</td> 
            <td>#fields.txtdeptname#</td> 
        </tr> 
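        <tr> 
            <td>E-Mail:</td> 
            <!--- TXTEMAIL is a child of SUBFORM, not HEADER, so reference it by its full path. ---> 
            <td>#FORM.form1.subform.txtemail#</td> 
        </tr> 
        <tr> 
            <td>Phone:</td> 
            <td>#fields.txtphonenum#</td> 
        </tr> 
    </table> 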
</cfoutput>