Using Internet queries



With the Internet query parser, users can search entire documents or parts of documents (zones and fields) by entering words, phrases, and plain language, much like the queries that many web search engines accept. Adobe ColdFusion supports two Internet query parsers, which you specify in the cfsearch type attribute.

Internet
Uses standard, web-style query syntax. For more information, see Query syntax.

Internet_basic
Similar to Internet. This query parser improves performance but produces less accurate relevancy statistics.
Note: Verity also includes the Internet_BasicWeb and Internet_AdvancedWeb query parsers, which are not directly supported by ColdFusion.

Search terms

In a search form enabled with the Internet query parser, users can enter words, phrases, and plain language. The Internet parser does not support the Verity query language (VQL).

Words

To search for multiple words, separate them with spaces.

Phrases

To search for an exact phrase, surround it with double-quotation marks. A string of capitalized words is assumed to be a name. Separate a series of names with commas. Commas aren’t needed when the phrases are surrounded by quotation marks.

The following example searches for a document that contains the phrases “San Francisco” and “sourdough bread”:

"San Francisco" "sourdough bread"

Plain language

To search with plain language, enter a question or concept. The Internet query parser identifies the important words and searches for them. For example, enter a question such as:

Where is the sales office in San Francisco?

This query produces the same results as entering:

sales office San Francisco 
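The reduction from the question to its important words can be pictured with a short sketch. This is not the parser's actual implementation: the function name important_words is hypothetical, and STOP_WORDS is only a small subset of the English stop-word list shown in the Stop words section.

```python
# Illustrative subset of the English stop words listed in this document.
STOP_WORDS = {"a", "an", "and", "are", "at", "for", "in", "is",
              "of", "the", "to", "what", "where", "who"}

def important_words(question):
    """Return the words of a plain-language query that are not stop words."""
    words = []
    for raw in question.split():
        word = raw.strip("?.,!;:")  # drop surrounding punctuation
        if word and word.lower() not in STOP_WORDS:
            words.append(word)
    return words

# prints "sales office San Francisco"
print(" ".join(important_words("Where is the sales office in San Francisco?")))
```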

Including and excluding search terms

You can limit searches by excluding or requiring search terms, or by limiting the areas of the document that are searched.

A minus sign (-) immediately preceding a search term (word or phrase) excludes documents that contain the term.

A plus sign (+) immediately preceding a search term (word or phrase) requires every returned document to contain the term.

If neither sign is associated with the search term, the results can include documents that do not contain the specified term as long as they meet other search criteria.
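The three cases can be modeled with a small filter. This is a minimal sketch under simplifying assumptions: each document is treated as a plain list of words, the function name matches and the sample documents are hypothetical, and unsigned terms are ignored here because they influence ranking rather than inclusion.

```python
def matches(document_words, required=(), excluded=()):
    """True if the document has every required term and no excluded term."""
    words = {w.lower() for w in document_words}
    has_required = all(t.lower() in words for t in required)
    has_excluded = any(t.lower() in words for t in excluded)
    return has_required and not has_excluded

# Hypothetical collection of three tiny documents.
docs = {
    "doc1": ["chocolate", "cake", "recipes"],
    "doc2": ["rum", "cake", "recipes"],
    "doc3": ["cake", "decorating"],
}

# Models the query: cake +recipes -rum
hits = [name for name, words in docs.items()
        if matches(words, required=["recipes"], excluded=["rum"])]
print(hits)  # ['doc1']
```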

Field searches

The Internet parser lets users perform field searches. The fields that are available for searching depend on field extraction rules based on the document type of the documents in the collection.

To search a document field, type the name of the field, a colon (:), and the search term with no spaces.

field:term 

If you enter a minus sign (-) immediately preceding the field name, as in -field:term, documents that contain the specified term in the specified field are excluded from the search results.

If you enter a plus sign (+) immediately preceding the field search specification, such as +field:term, documents are included in the search results only if the search term is present in the specified field.

Field searches are enabled by the enableField parameter in a template file. This parameter, set to 0 by default, must be set to 1 to allow searching a document field.

Important: The enableField parameter is the only parameter in a template file that you should modify.

Query syntax

The query syntax is like the syntax that users expect to use on the web. Queries are interpreted according to the following rules:

  • Individual search terms are separated by whitespace characters (such as a space or tab) or by commas, for example:

    cake recipes

  • Search phrases are entered within double-quotation marks, for example:

    "chocolate cake" recipe

  • Exclude terms with the negation operator, the minus sign (-), or the NOT operator, for example:

    cake recipes -rum

    cake recipes NOT rum

  • Require a compulsory term with the unary inclusion operator, plus sign (+); in this example, the term chocolate must be included:

    cake recipes +chocolate

  • Require compulsory terms with the binary inclusion operator AND; in this example, the terms recipes and chocolate must be included:

    cake recipes and chocolate
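As a rough illustration of these rules, the following sketch splits a query into required, excluded, and optional terms. It is an approximation written for this page, not the parser's real logic: the name parse_query is hypothetical, and the sketch deliberately leaves out the AND operator and field searches.

```python
import re

# A quoted phrase (optionally signed), or a run of characters
# that contains no whitespace and no comma.
TOKEN = re.compile(r'[+-]?"[^"]*"|[^\s,]+')

def parse_query(query):
    """Split a web-style query into required, excluded, and optional terms."""
    result = {"required": [], "excluded": [], "optional": []}
    negate_next = False
    for token in TOKEN.findall(query):
        if token == "NOT":
            negate_next = True  # the next term is excluded
            continue
        if token.startswith("+"):
            bucket, term = "required", token[1:]
        elif token.startswith("-") or negate_next:
            bucket, term = "excluded", token.lstrip("-")
        else:
            bucket, term = "optional", token
        negate_next = False
        result[bucket].append(term.strip('"'))
    return result

print(parse_query('"chocolate cake" recipes -rum +frosting'))
```

With this sketch, cake recipes -rum and cake recipes NOT rum produce the same result: two optional terms and one excluded term.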

Field searches

You can search fields or zones by specifying name:term (with no spaces around the colon), where:

name is the name of the field or zone

term is an individual search term or phrase

For example:

bakery city:"San Francisco" 
bakery city:Sunnyvale

For more information, see Refining your searches with zones and fields.
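The structure of a field search token, including an optional leading plus or minus sign, can be sketched as follows. The helper name parse_field_term and its return shape are assumptions made for illustration; the sketch shows only how the pieces of the syntax relate and does not reflect how Verity processes the token internally.

```python
def parse_field_term(token):
    """Split a token such as +city:"San Francisco" into (sign, name, term)."""
    sign = ""
    if token and token[0] in "+-":
        sign, token = token[0], token[1:]
    name, colon, term = token.partition(":")
    if not colon:
        # No colon: an ordinary search term rather than a field search.
        name, term = None, token
    return sign, name, term.strip('"')

print(parse_field_term('city:"San Francisco"'))  # ('', 'city', 'San Francisco')
print(parse_field_term("-city:Sunnyvale"))       # ('-', 'city', 'Sunnyvale')
```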

Pass-through of terms

Search terms are passed through to the VDK level, where they are interpreted as Verity Query Language (VQL) syntax. No issues arise if the terms contain only alphabetic or numeric characters. Other kinds of characters might be interpreted by the language you’re using. If a term contains a character that is not handled by the specified language, it can be interpreted as VQL. For example, a search term that includes an asterisk (*) can be interpreted as a wildcard.
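An application that wants to warn users about such terms before submitting a search could flag them with a simple check. This is a sketch of one possible approach, not something the product provides; the name needs_review is hypothetical, and the check treats any character other than a letter or digit as worth reviewing.

```python
def needs_review(term):
    """True if the term has any character other than letters and digits."""
    return not term.isalnum()

terms = ["cake", "recipe2", "choc*"]
flagged = [t for t in terms if needs_review(t)]
print(flagged)  # ['choc*']
```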

Stop words

The configurable Internet query parser uses its own stop-word list, qp_inet.stp, to specify terms to ignore for natural language processing.

Note: You can prevent a word from being treated as a stop word by enclosing it in quotation marks.

For example, the following stop words are provided in the query parser’s stop-word file for the English (Basic) template:

a        did      i        or       what
also     do       i’m      should   when
an       does     if       so       where
and      find     in       than     whether
any      for      is       that     which
am       from     it       the      who
are      get      its      there    whose
as       got      it’s     to       why
at       had      like     too      will
be       has      not      want     with
but      have     of       was      would
can      how      on       were     <or>

Verity provides a populated stop-word file for the English and English (Advanced) languages. You need not modify the qp_inet.stp file for these languages. If you use the configurable Internet query parser for another language, provide your own qp_inet.stp file that contains the stop words that you want to ignore in that language. This stop-word file must contain, at a minimum, the language-equivalent words for or and <or>.

Note: The configurable Internet query parser’s stop-word file contains a different word list than the vdk30.stp word file, which is used for other purposes, such as summarization.