Composing search expressions
The following rules apply to the composition
of search expressions.
Case sensitivity
Verity searches are case sensitive only when
the search term is entered in mixed case. For example, a search
for zeus finds zeus, Zeus, or ZEUS; however, a search for Zeus finds
only Zeus.
To have your application always ignore the case that the user
types, use the ColdFusion LCase function in the criteria attribute
of cfsearch. The following code
converts user input to lowercase, thereby eliminating case sensitivity
concerns:
<cfsearch name="results"
collection="#form.collname#"
criteria="#LCase(form.criteria)#"
type="#form.type#">
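The case rule above can be made concrete with a toy Python model. This is an illustration of the rule as this section describes it, not Verity's implementation; the function name `verity_term_matches` is hypothetical. It also shows why lowercasing the term first, which is what LCase does to the criteria, makes every search case-insensitive.

```python
def verity_term_matches(query_term: str, indexed_word: str) -> bool:
    """Toy model of the documented case rule (not Verity's actual code).

    All-lowercase or all-uppercase query terms match regardless of case;
    mixed-case terms match only the identical spelling.
    """
    if query_term.islower() or query_term.isupper():
        return query_term.lower() == indexed_word.lower()
    return query_term == indexed_word


words = ["zeus", "Zeus", "ZEUS"]
print([w for w in words if verity_term_matches("zeus", w)])  # ['zeus', 'Zeus', 'ZEUS']
print([w for w in words if verity_term_matches("Zeus", w)])  # ['Zeus']
# Lowercasing the user's input first (the role of LCase) removes case sensitivity:
print([w for w in words if verity_term_matches("Zeus".lower(), w)])  # ['zeus', 'Zeus', 'ZEUS']
```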
Prefix and infix notation
By default, Verity uses infix notation,
in which precedence is implicit in the expression; for example,
the AND operator takes precedence over the OR operator.
You can use prefix notation with any operator except an
evidence operator (such as STEM, WILDCARD, or WORD; for a description
of evidence operators, see Evidence operators). In prefix notation, the expression explicitly
specifies precedence. Rather than repeating an operator, you can
use prefix notation to list the operator once and list the search
targets in parentheses. For example, the following expressions are
equivalent:
Moses <NEAR> Larry <NEAR> Jerome <NEAR> Daniel <NEAR> Jacob
<NEAR>(Moses,Larry,Jerome,Daniel,Jacob)
The following prefix notation example evaluates the AND expression first
and then applies OR, so it selects documents that contain Moses, or that
contain both Larry and Jerome:
OR (Moses, AND (Larry,Jerome))
The infix notation equivalent of this is as follows:
Moses OR (Larry AND Jerome)
Commas in expressions
If an expression includes two or more search terms within parentheses,
a comma is required between the elements (whitespace is ignored).
The following example searches for documents that contain both
Larry and Jerome:
AND (Larry, Jerome)
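The equivalence between prefix and infix notation, and the comma rule, can be sketched as a small string builder. This is a hypothetical Python helper, not part of ColdFusion or Verity: the tuple representation and the names `to_prefix` and `to_infix` are assumptions, and spacing follows the examples in this section (Verity ignores the whitespace).

```python
from typing import Union

# A search term (str) or an (OPERATOR, [operands]) tuple.
Expr = Union[str, tuple]

# Evidence operators cannot be written in prefix notation.
EVIDENCE_OPERATORS = {"STEM", "WILDCARD", "WORD", "THESAURUS", "SOUNDEX", "TYPO"}


def operator_token(op: str) -> str:
    """AND and OR may be written without angle brackets; other operators need them."""
    return op if op in ("AND", "OR") else f"<{op}>"


def to_prefix(expr: Expr) -> str:
    """List the operator once, followed by comma-separated operands in parentheses."""
    if isinstance(expr, str):
        return expr
    op, operands = expr
    if op.split("/")[0] in EVIDENCE_OPERATORS:
        raise ValueError(f"{op} cannot be used in prefix notation")
    inner = ", ".join(to_prefix(o) for o in operands)
    return f"{operator_token(op)} ({inner})"


def to_infix(expr: Expr) -> str:
    """Repeat the operator between operands; nested expressions get parentheses.

    Only covers operators written between operands (AND, OR, NEAR, ...).
    """
    if isinstance(expr, str):
        return expr
    op, operands = expr
    parts = [f"({to_infix(o)})" if isinstance(o, tuple) else o for o in operands]
    return f" {operator_token(op)} ".join(parts)


names = ["Moses", "Larry", "Jerome", "Daniel", "Jacob"]
print(to_infix(("NEAR", names)))   # Moses <NEAR> Larry <NEAR> Jerome <NEAR> Daniel <NEAR> Jacob
print(to_prefix(("NEAR", names)))  # <NEAR> (Moses, Larry, Jerome, Daniel, Jacob)

expr = ("OR", ["Moses", ("AND", ["Larry", "Jerome"])])
print(to_prefix(expr))  # OR (Moses, AND (Larry, Jerome))
print(to_infix(expr))   # Moses OR (Larry AND Jerome)
```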
Precedence rules
Expressions are read from left to right. The AND operator takes precedence over the
OR operator; however, terms enclosed in parentheses are evaluated
first. When the search engine encounters nested parentheses, it
starts with the innermost term.
Example | Search result
Moses AND Larry OR Jerome | Documents that contain Moses and Larry, or Jerome
(Moses AND Larry) OR Jerome | (Same as above)
Moses AND (Larry OR Jerome) | Documents that contain Moses and either Larry or Jerome
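The precedence rules can be checked with a toy boolean evaluator in Python. This sketch only models AND, OR, and parentheses over a set of lowercase words in a single document; it is an illustration of the rules above, not Verity's parser, and the name `evaluate` is hypothetical.

```python
import re


def evaluate(expression: str, document_words: set[str]) -> bool:
    """Toy evaluator for AND/OR expressions over a set of lowercase words.

    Parentheses are evaluated first, AND binds more tightly than OR,
    and operators of equal precedence are applied left to right.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or() -> bool:
        result = parse_and()
        while peek() is not None and peek().upper() == "OR":
            take()
            rhs = parse_and()  # always parse the operand before combining
            result = result or rhs
        return result

    def parse_and() -> bool:
        result = parse_term()
        while peek() is not None and peek().upper() == "AND":
            take()
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term() -> bool:
        token = take()
        if token == "(":
            value = parse_or()
            take()  # consume the matching ")"
            return value
        return token.lower() in document_words

    return parse_or()


doc = {"jerome"}  # a document that mentions only Jerome
print(evaluate("Moses AND Larry OR Jerome", doc))    # True
print(evaluate("(Moses AND Larry) OR Jerome", doc))  # True
print(evaluate("Moses AND (Larry OR Jerome)", doc))  # False
```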
Delimiters in expressions
You use angle brackets (< >), double quotation marks ("), and backslashes (\) to delimit
various elements in a search expression, as the following table
describes:
Character
|
Usage
|
< >
|
Left and right angle brackets are reserved
for designating operators and modifiers. They are optional for
AND, OR, and NOT, but required for all other operators.
|
"
|
Use double quotation marks in expressions
to search for a word that is otherwise reserved as an operator or
modifier, such as AND, OR, and NOT.
|
\
|
To include a backslash in a search expression,
insert two backslashes for each backslash character that you want included
in the search; for example, C:\\CFusion\\bin.
|
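When criteria are built from user input, the delimiter rules above can be applied term by term. The following Python helper is a hypothetical sketch; it assumes that only AND, OR, and NOT need quoting, because those are the reserved words this table names. In CFML, the backslash doubling could be done with Replace(term, "\", "\\", "all").

```python
RESERVED_WORDS = {"AND", "OR", "NOT"}  # assumption: only the words named in the table


def escape_search_term(term: str) -> str:
    """Prepare one literal term for a Verity criteria string (toy helper)."""
    # Each backslash in the term must be written as two backslashes.
    escaped = term.replace("\\", "\\\\")
    # A word that is reserved as an operator or modifier goes in double quotation marks.
    if escaped.upper() in RESERVED_WORDS:
        escaped = f'"{escaped}"'
    return escaped


print(escape_search_term(r"C:\CFusion\bin"))  # C:\\CFusion\\bin
print(escape_search_term("and"))              # "and"
print(escape_search_term("zeus"))             # zeus
```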
Operators and modifiers
You are probably familiar with searches containing AND,
OR, and NOT. Verity has many additional operators and modifiers,
of various types, that offer you a high degree of specificity in
setting search parameters.
Operators
An operator represents
logic to be applied to a search element. This logic defines the
qualifications that a document must meet to be retrieved. You can
use operators to refine your search or to influence the results
in other ways.
For example, you can construct an HTML form for conducting searches.
In the form, users can search for a single term and then refine the
search by limiting its scope in several ways: operators can limit a
query to a sentence or paragraph, or select words based on their
proximity to one another.
Ordinarily, you use operators in explicit searches, as follows:
"<operator>search_string"
The following operator types are available:
Operator type | Purpose
Concept | Identifies a concept in a document by combining the meanings of search elements.
Relational | Searches fields in a collection.
Evidence | Specifies basic and intelligent word searches.
Proximity | Specifies the relative location of words in a document.
Score | Manipulates the score returned by a search element. You can set the score percentage display to four decimal places.
The following list shows the operators, grouped by type, that
are available for conducting searches of ColdFusion Verity collections:
Concept: ACCRUE, ALL, AND, ANY, OR
Relational: <, <=, =, >, >=, CONTAINS, MATCHES, STARTS, ENDS, SUBSTRING
Evidence: STEM, WILDCARD, WORD, THESAURUS, SOUNDEX, TYPO/N
Proximity: NEAR, NEAR/N, PARAGRAPH, PHRASE, SENTENCE, IN
Score: YESNO, PRODUCT, SUM, COMPLEMENT
Concept operators
Concept operators combine the meaning of search elements to identify a concept
in a document. Documents retrieved using concept operators are ranked
by relevance. The following table describes each concept operator:
Operator
|
Description
|
AND
|
Selects documents that contain all the search
elements that you specify.
|
OR
|
Selects documents that show evidence of
at least one of the search elements that you specify.
|
ACCRUE
|
Selects documents that include at least
one of the search elements that you specify. Documents are ranked
based on the number of search elements found.
|
ALL
|
Selects documents that contain all of the
search elements that you specify. A score of 1.00 is assigned to
each retrieved document. ALL and AND retrieve the same results,
but queries using ALL are always assigned a score of 1.00.
|
ANY
|
Selects documents that contain at least
one of the search elements that you specify. A score of 1.00 is
assigned to each retrieved document. ANY and OR retrieve the same
results, but queries using ANY are always assigned a score of 1.00.
|
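The difference between ALL, ANY, and ACCRUE is mostly about scoring. The following toy Python model illustrates it. ALL and ANY follow the table (score 1.00 for every selected document); the ACCRUE score used here, the fraction of search elements found, is a hypothetical stand-in, since this section says only that ACCRUE ranks by the number of elements found.

```python
def concept_score(op: str, terms: list[str], doc_words: set[str]):
    """Toy model of ALL, ANY, and ACCRUE; returns None if the document is not selected."""
    found = sum(1 for t in terms if t in doc_words)
    if op == "ALL":  # same documents as AND, but always scored 1.00
        return 1.0 if found == len(terms) else None
    if op == "ANY":  # same documents as OR, but always scored 1.00
        return 1.0 if found > 0 else None
    if op == "ACCRUE":
        # Hypothetical score: fraction of search elements found, so documents
        # that contain more of the elements rank higher.
        return found / len(terms) if found > 0 else None
    raise ValueError(f"unsupported operator: {op}")


docs = {
    "d1": {"moses", "larry", "jerome"},
    "d2": {"moses"},
    "d3": {"larry", "jerome"},
}
terms = ["moses", "larry", "jerome"]

for op in ("ALL", "ANY", "ACCRUE"):
    scored = {name: concept_score(op, terms, words) for name, words in docs.items()}
    selected = {name: s for name, s in scored.items() if s is not None}
    # Highest score first; ties keep their original order.
    print(op, sorted(selected, key=selected.get, reverse=True))
# ALL ['d1']
# ANY ['d1', 'd2', 'd3']
# ACCRUE ['d1', 'd3', 'd2']
```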
Relational operators
Relational operators search document fields
(such as AUTHOR) that you defined in the collection. Documents that
contain specified field values are returned. Documents retrieved
using relational operators are not ranked by relevance, and you
cannot use the MANY modifier with relational operators.
You use the following operators for numeric and date comparisons:
Operator | Description
= | Equal
!= | Not equal
> | Greater than
>= | Greater than or equal to
< | Less than
<= | Less than or equal to
For example, to search for documents that contain values for
1999 through 2002, you perform either of the following searches:
A simple search for 1999,2000,2001,2002
An explicit search using the >= and <= operators: >=1999,<=2002
If a document field named PAGES is defined, you can search for
documents that have fewer than five pages by entering PAGES < 5 in
your search. Similarly, if a document field named DATE is defined,
you can search for documents dated on or before December
31, 1999 by entering DATE <= 12-31-99 in your search.
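The numeric and date comparisons can be modeled as a plain field filter. This Python sketch is a toy: the documents and their PAGES and DATE values are made up for illustration, and `field_search` is a hypothetical name. As with real relational searches, the results are not ranked.

```python
import operator
from datetime import date

# The numeric and date comparison operators from the table above.
RELATIONAL_OPS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def field_search(docs, field, op, value):
    """Toy field search: keep documents whose field satisfies `field op value`."""
    compare = RELATIONAL_OPS[op]
    # Documents without the field are never selected; results keep input order (unranked).
    return [d["id"] for d in docs if field in d and compare(d[field], value)]


docs = [
    {"id": "a", "PAGES": 3, "DATE": date(1999, 12, 31)},
    {"id": "b", "PAGES": 5, "DATE": date(2000, 1, 1)},
    {"id": "c", "PAGES": 12, "DATE": date(1998, 6, 1)},
]
print(field_search(docs, "PAGES", "<", 5))                   # ['a']
print(field_search(docs, "DATE", "<=", date(1999, 12, 31)))  # ['a', 'c']
```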
The following relational operators compare text and match words
and parts of words:
Operator
|
Description
|
Example
|
CONTAINS
|
Selects documents by matching the word or
phrase that you specify with the values stored in a specific document
field. Documents are selected only if the search elements specified
appear in the same sequential and contiguous order in the field
value.
|
In a document field named TITLE,
to retrieve documents whose titles contain music, musical, or musician,
search for TITLE <CONTAINS> Musi*.
To retrieve CFML and HTML pages whose meta tags contain Framingham
as a content word, search for KEYWORD <CONTAINS> Framingham.
|
MATCHES
|
Selects documents by matching the query
string with values stored in a specific document field. Documents
are selected only if the search elements specified match the field
value exactly. If a partial match is found, a document is not selected.
When you use the MATCHES operator, you specify the field name to
search, and the word, phrase, or number to locate. You can use ?
to represent a single character and * to represent any number of
characters within a string.
|
For examples, see the text immediately following this
table.
|
STARTS
|
Selects documents by matching the character
string that you specify with the starting characters of the values
stored in a specific document field.
|
In a document field named REPORTER, to retrieve documents
written by Clark, Clarks, and Clarkson, search for REPORTER <STARTS> Clark.
|
ENDS
|
Selects documents by matching the character
string that you specify with the ending characters of the values
stored in a specific document field.
|
In a document field named OFFICER, to retrieve arrest
reports written by Tanner, Garner, and Milner, search for OFFICER <ENDS> ner.
|
SUBSTRING
|
Selects documents by matching the query
string that you specify with any portion of the strings in a specific
document field.
|
In a document field named TITLE, to retrieve
documents whose titles contain words such as solution, resolution,
solve, and resolve, search for TITLE <SUBSTRING> sol.
|
For example, assume a document field named SOURCE includes the
following values:
Computer
Computerworld
Computer Currents
PC Computing
To locate documents whose source is Computer, enter the following:
SOURCE <MATCHES> computer
To locate documents whose source is Computer, Computerworld,
and Computer Currents, enter the following:
SOURCE <MATCHES> computer*
To locate documents whose source is Computer, Computerworld,
Computer Currents, and PC Computing, enter the following:
SOURCE <MATCHES> *comput*
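The four text-matching relational operators differ only in which part of the field value must match. The following Python functions are a toy model, not Verity's implementation; they assume that matching ignores case, as the lowercase SOURCE examples above suggest. Python's `fnmatch` also treats `[...]` as a character class, which goes beyond the ? and * wildcards described here.

```python
from fnmatch import fnmatchcase


def _fold(s: str) -> str:
    return s.lower()  # assumption: comparisons ignore case, as in the examples


def matches_op(value: str, pattern: str) -> bool:
    """MATCHES: the whole value must match; ? is one character, * is any run."""
    return fnmatchcase(_fold(value), _fold(pattern))


def starts_op(value: str, prefix: str) -> bool:
    """STARTS: the value begins with the string."""
    return _fold(value).startswith(_fold(prefix))


def ends_op(value: str, suffix: str) -> bool:
    """ENDS: the value ends with the string."""
    return _fold(value).endswith(_fold(suffix))


def substring_op(value: str, part: str) -> bool:
    """SUBSTRING: the string appears anywhere in the value."""
    return _fold(part) in _fold(value)


sources = ["Computer", "Computerworld", "Computer Currents", "PC Computing"]
for pattern in ("computer", "computer*", "*comput*"):
    print(pattern, [s for s in sources if matches_op(s, pattern)])
# computer ['Computer']
# computer* ['Computer', 'Computerworld', 'Computer Currents']
# *comput* ['Computer', 'Computerworld', 'Computer Currents', 'PC Computing']
```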
For an example of ColdFusion code that uses the CONTAINS relational
operator, see Field searches.
You can use the SUBSTRING operator to match a character string
with data that you indexed from a data source. In the example described
here, a data source called TEST1 contains the table YearPlaceText,
which contains three columns: Year, Place, and Text. Year and Place
make up the primary key. The following table shows the contents of YearPlaceText:
Year | Place | Text
1990 | Utah | Text about Utah 1990
1990 | Oregon | Text about Oregon 1990
1991 | Utah | Text about Utah 1991
1991 | Oregon | Text about Oregon 1991
1992 | Utah | Text about Utah 1992
The following application page matches records that have 1990
in the Text column and whose Place is Utah. The search operates
on the collection that contains the Text column and then narrows
the results by searching for the string Utah in the CF_TITLE
document field. Document fields such as CF_TITLE are defined by default in every collection
and correspond to the values that you specify for the URL, TITLE, and KEY attributes
of the cfindex tag.
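<!--- Retrieve the rows to index. Year || Place joins the two key columns
      into one identifier (for example, 1990Utah). || is the ANSI SQL
      concatenation operator; some databases use CONCAT() instead. --->
<cfquery name="GetText"
datasource="TEST1">
SELECT Year || Place AS Identifier, Text
FROM YearPlaceText
</cfquery>
<!--- Index the query results: Identifier becomes both the key and the
      title (stored in CF_TITLE); the Text column becomes the body. --->
<cfindex collection="testcollection"
action="Update"
type="Custom"
title="Identifier"
key="Identifier"
body="Text"
query="GetText">
<!--- Explicit search: 1990 in the body AND Utah anywhere in CF_TITLE. --->
<cfsearch name="GetText_Search"
collection="testcollection"
type="Explicit"
criteria="1990 and CF_TITLE <SUBSTRING> Utah">
<cfoutput>
Record Counts: <br>
#GetText.RecordCount# <br>
#GetText_Search.RecordCount# <br>
</cfoutput>
Query Results --- Should be 5 rows <br>
<cfoutput query="GetText">
#Identifier# <br>
</cfoutput>
Search Results -- should be 1 row <br>
<cfoutput query="GetText_Search">
#GetText_Search.Title# <br>
</cfoutput>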
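To see why the page reports 5 query rows but only 1 search result, the indexing and the explicit criteria can be mimicked in a few lines of Python. This is a toy model: it assumes Year || Place produces identifiers such as 1990Utah with no separator, and it treats the body as a simple list of words rather than a Verity index.

```python
rows = [
    (1990, "Utah", "Text about Utah 1990"),
    (1990, "Oregon", "Text about Oregon 1990"),
    (1991, "Utah", "Text about Utah 1991"),
    (1991, "Oregon", "Text about Oregon 1991"),
    (1992, "Utah", "Text about Utah 1992"),
]

# Mimic cfindex: Identifier (Year || Place) becomes the title, Text the body.
index = [{"title": f"{year}{place}", "body": text} for year, place, text in rows]

# Mimic the criteria: 1990 and CF_TITLE <SUBSTRING> Utah
hits = [doc["title"] for doc in index
        if "1990" in doc["body"].split() and "utah" in doc["title"].lower()]

print(len(rows), len(hits), hits)  # 5 1 ['1990Utah']
```

The 1990Oregon record matches the body term but not the title substring, and the 1991Utah and 1992Utah records match the title but not the body term, so only one record satisfies both conditions.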
Evidence operators
Evidence operators let you specify a basic word search or an intelligent
word search. A basic word search finds documents that contain
only the word or words specified in the query. An intelligent word search expands
the query terms to create an expanded word list so that the search
returns documents that contain variations of the query terms.
Documents retrieved using evidence operators are not ranked by
relevance unless you use the MANY modifier.
The following table describes the evidence operators:
Operator
|
Description
|
Example
|
STEM
|
Expands the search to include the word that
you enter and its variations. The STEM operator is automatically
implied in any simple query.
|
<STEM>believe retrieves matches
such as “believe,” “believing,” and “believer”.
|
WILDCARD
|
Matches wildcard characters included in
search strings. Certain characters automatically indicate a wildcard
specification, such as asterisk (*) and question mark (?).
|
spam* retrieves matches such as
spam, spammer, and spamming.
|
WORD
|
Performs a basic word search, selecting
documents that include one or more instances of the specific word
that you enter. The WORD operator is automatically implied in any
SIMPLE query.
|
<WORD> logic retrieves logic,
but not variations such as logical and logician.
|
THESAURUS
|
Expands the search to include the word that
you enter and its synonyms. Collections do not have a thesaurus
by default; to use this feature you must build one.
|
<THESAURUS> altitude retrieves
documents containing synonyms of the word altitude, such as height
or elevation.
|
SOUNDEX
|
Expands the search to include the word that
you enter and one or more words that “sound like,” or whose letter
pattern is similar to, the word specified. Collections do not have
sound-alike indexes by default; to use this feature you must build
sound-alike indexes.
|
<SOUNDEX> sale retrieves words
such as sale, sell, seal, shell, soul, and scale.
|
TYPO/N
|
Expands the search to include the word that
you enter plus words that are similar to the query term. This operator
performs “approximate pattern matching” to identify similar words.
The optional N variable in the operator name expresses the maximum
number of errors between the query term and a matched term, a value
called the error distance. If N is not specified, the default error
distance is 2.
|
<TYPO> swept retrieves kept.
|
The following example uses an evidence operator:
<cfsearch name = "quick_search"
collection="bbb"
type = "explicit"
criteria="<WORD>film">
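The error distance used by TYPO/N can be illustrated with an edit-distance calculation. This Python sketch assumes that Verity's error distance behaves like Levenshtein distance (single-character insertions, deletions, and substitutions); this section does not name the exact metric, so treat it as an approximation. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def typo_matches(query: str, word: str, n: int = 2) -> bool:
    """Toy TYPO/N: accept words within error distance n (default 2)."""
    return edit_distance(query.lower(), word.lower()) <= n


print(edit_distance("swept", "kept"))    # 2
print(typo_matches("swept", "kept"))     # True  (TYPO, default N of 2)
print(typo_matches("swept", "kept", 1))  # False (TYPO/1)
```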
Proximity operators
Proximity operators specify the relative location of specific words in the document.
To retrieve a document, the specified words must be in the same phrase,
paragraph, or sentence. In the case of NEAR and NEAR/N operators, retrieved
documents are ranked by relevance based on the proximity of the specified
words. Proximity operators can be nested; phrases or words can appear within
SENTENCE or PARAGRAPH operators, and SENTENCE operators can appear
within PARAGRAPH operators.
The following table describes the proximity operators:
Operator
|
Description
|
Example
|
NEAR
|
Selects documents containing specified search
terms. The closer the search terms are to one another within a document,
the higher the document’s score. The document with the smallest
possible region containing all search terms always receives the
highest score. Documents whose search terms are not within 1000
words of each other are not selected.
|
war <NEAR> peace retrieves
documents that contain stemmed variations of these words within close
proximity to each other (as defined by Verity). To control search
proximity, use NEAR/N.
|
NEAR/N
|
Selects documents containing two or more
search terms within N number of words of each other, where N is
an integer between 1 and 1024. NEAR/1 searches for two words that
are next to each other. The closer the search terms are within a
document, the higher the document's score.
You can specify
multiple search terms using multiple instances of NEAR/N as long
as the value of N is the same.
|
commute <NEAR/10> bicycle <NEAR/10> train retrieves
documents that contain stemmed variations of these words within
10 words of each other.
|
PARAGRAPH
|
Selects documents that include all of the
words you specify within the same paragraph. To search for three
or more words or phrases in a paragraph with infix notation, you must use the PARAGRAPH
operator between each word or phrase; with prefix notation, you list the operator once.
|
<PARAGRAPH> (mission, goal, statement) retrieves documents
that contain these terms within a paragraph.
|
PHRASE
|
Selects documents that include a phrase
you specify. A phrase is a grouping of two or more words that occur
in a specific order.
|
<PHRASE> (mission, oak) returns
documents that contain the phrase mission oak.
|
SENTENCE
|
Selects documents that include all of the
words you specify within the same sentence.
|
<SENTENCE> (jazz, musician) returns
documents that contain these words in the same sentence.
|
IN
|
Selects documents that contain specified
values in one or more document zones. A document zone represents
a region of a document, such as the document’s summary, date, or
body text. To search for a term only within the one or more zones
that have certain conditions, you qualify the IN operator with the
WHEN operator.
|
Chang <IN> author searches
document zones named author for the word Chang.
|
The following example uses a proximity operator:
<cfsearch name = "quick_search"
collection="bbb"
type = "explicit"
criteria="red<near>socks">
For an example using the IN proximity operator to search XML
documents, see Zone searches.
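The NEAR/N window, and the way the ORDER modifier constrains it, can be sketched as follows. This toy Python function matches exact words only (no stemming) and checks every combination of positions by brute force; it is an illustration of the rule, not Verity's algorithm, and `near_n` is a hypothetical name.

```python
import re
from itertools import product


def near_n(text: str, terms: list[str], n: int, ordered: bool = False) -> bool:
    """Toy NEAR/N: do all terms occur within n words of each other?

    With ordered=True, the terms must also appear in the given order,
    which models the ORDER modifier applied to NEAR/N.
    """
    words = re.findall(r"\w+", text.lower())
    positions = [[i for i, w in enumerate(words) if w == t.lower()] for t in terms]
    if any(not p for p in positions):
        return False  # some term does not occur at all
    for combo in product(*positions):
        if ordered and any(a >= b for a, b in zip(combo, combo[1:])):
            continue  # terms out of order
        if max(combo) - min(combo) <= n:
            return True
    return False


print(near_n("Boston red socks fans", ["red", "socks"], 1))          # True  (adjacent)
print(near_n("red hats and socks", ["red", "socks"], 1))             # False
print(near_n("red hats and socks", ["red", "socks"], 3))             # True
print(near_n("socks, red ones", ["red", "socks"], 1, ordered=True))  # False
```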
Score operators
Score operators control how the search engine calculates scores for retrieved documents.
The maximum score that a returned search element can have is 1.000.
You can set the score to display a maximum of four decimal places.
When you use a score operator, the search engine first calculates
a separate score for each search element found in a document, and
then performs a mathematical operation on the individual element
scores to arrive at the final score for each document.
The document’s score is available as a result column. You can
use the SCORE result column to get the relevancy score of each document
retrieved. For example, if Search1 is the name of the cfsearch result:
<cfoutput query="Search1">
<a href="#Search1.URL#">#Search1.Title#</a><br>
Document Score=#Search1.SCORE#<br>
</cfoutput>
The following table describes the score operators:
Operator
|
Description
|
Example
|
YESNO
|
Forces the score of an element to 1 if the
element’s score is nonzero.
|
<YESNO>mainframe. If the retrieval
result of the search on mainframe is 0.75, the YESNO operator forces
the result to 1. You can use YESNO to avoid relevance ranking.
|
PRODUCT
|
Multiplies the scores for the search elements
in each document matching a query.
|
<PRODUCT>(computers, laptops) takes
the product of the resulting scores.
|
SUM
|
Adds the scores for the search elements in
each document matching a query, up to a maximum value of 1.
|
<SUM>(computers, laptops) takes
the sum of the resulting scores.
|
COMPLEMENT
|
Calculates scores for documents matching
a query by taking the complement (subtracting from 1) of the scores
for the query’s search elements. The new score is 1 minus the search
element’s original score.
|
<COMPLEMENT>computers. If the
search element’s original score is .785, the COMPLEMENT operator
recalculates the score as .215.
|
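The arithmetic performed by each score operator is simple enough to show directly. The following Python functions are a toy model that takes element scores as given; the input scores are made up, and the sketch does not model how Verity computes the element scores themselves. Scores are formatted to four decimal places, the maximum display precision mentioned above.

```python
import math


def yesno(score: float) -> float:
    """YESNO: force any nonzero element score to 1."""
    return 1.0 if score != 0 else 0.0


def product_score(scores: list[float]) -> float:
    """PRODUCT: multiply the element scores."""
    return math.prod(scores)


def sum_score(scores: list[float]) -> float:
    """SUM: add the element scores, up to a maximum of 1."""
    return min(1.0, sum(scores))


def complement(score: float) -> float:
    """COMPLEMENT: subtract the element score from 1."""
    return 1.0 - score


def display(score: float) -> str:
    """Show a score with four decimal places."""
    return f"{score:.4f}"


print(display(yesno(0.75)))                # 1.0000
print(display(product_score([0.5, 0.8])))  # 0.4000
print(display(sum_score([0.6, 0.7])))      # 1.0000
print(display(complement(0.785)))          # 0.2150
```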
Modifiers
You combine modifiers with operators to change the standard behavior
of an operator in some way. The following table describes the available
modifiers:
Modifier
|
Description
|
Example
|
CASE
|
Specifies a case-sensitive search. Normally,
Verity searches are case-insensitive for search text entered in
all uppercase or all lowercase, and case-sensitive for mixed-case
search strings.
|
<CASE>Java OR <CASE>java retrieves
documents that contain Java or java, but not JAVA.
|
MANY
|
Counts the density of words, stemmed variations,
or phrases in a document and produces a relevance-ranked score for
retrieved documents. Use with the following operators:
WORD
WILDCARD
STEM
PHRASE
SENTENCE
PARAGRAPH
|
<PARAGRAPH><MANY>javascript <AND> vbscript.
You
cannot use the MANY modifier with the following operators:
AND
OR
ACCRUE
Relational operators
|
NOT
|
Excludes documents that contain the specified
word or phrase. Use only with the AND and OR operators.
|
Java <AND> programming <NOT> coffee retrieves
documents that contain Java and programming, but not coffee.
|
ORDER
|
Specifies that the search elements must
occur in the same order in which you specify them in the query.
Use with the following operators:
PARAGRAPH
SENTENCE
NEAR/N
Place the ORDER modifier before
the operator that it modifies.
|
<ORDER><PARAGRAPH> ("server", "Java") retrieves
documents that contain server before Java.
|
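The CASE and NOT examples in the table can be reproduced with a toy Python model. Ordinary terms here follow the case rule from the Case sensitivity section (mixed-case terms are already exact), while the CASE modifier always requires the exact spelling. The sample documents and function names are hypothetical, and real Verity matching also involves stemming, which this sketch ignores.

```python
import re


def words_in(text: str) -> list[str]:
    return re.findall(r"\w+", text)


def term_found(term: str, text: str) -> bool:
    """Default rule: all-lowercase or all-uppercase terms ignore case; mixed case is exact."""
    words = words_in(text)
    if term.islower() or term.isupper():
        return term.lower() in {w.lower() for w in words}
    return term in words


def case_match(term: str, text: str) -> bool:
    """Toy <CASE>: the term must appear with exactly the same capitalization."""
    return term in words_in(text)


def and_not(text: str, required: list[str], excluded: str) -> bool:
    """Toy 'A <AND> B <NOT> C': all required terms present, excluded term absent."""
    return all(term_found(r, text) for r in required) and not term_found(excluded, text)


docs = ["Java programming guide", "java coffee and programming", "JAVA news"]
# <CASE>Java OR <CASE>java
print([d for d in docs if case_match("Java", d) or case_match("java", d)])
# ['Java programming guide', 'java coffee and programming']
# Java <AND> programming <NOT> coffee
print([d for d in docs if and_not(d, ["Java", "programming"], "coffee")])
# ['Java programming guide']
```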