|
Validating form data with regular expressions
You can use regular expressions to
match and validate the text that users enter in cfinput and cftextinput tags. Ordinary
characters are combined with special characters to define the match
pattern. The validation succeeds only if the user input matches
the pattern.
Regular expressions let you check input text for a wide variety
of custom conditions for which the input must follow a specific
pattern. You can concatenate simple regular expressions into complex
search criteria to validate against complex patterns, such as any
of several words with different endings.
You can use ColdFusion variables and functions in regular expressions.
The ColdFusion server evaluates the variables and functions before
the regular expression is evaluated. For example, you can validate
against a value that you generate dynamically from other input data
or database values.
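The idea of generating a pattern from other data can be sketched as follows. This is a minimal illustration in Python, whose re module stands in for the ColdFusion validation engine (the two may differ in details). The build_pattern function and the department values are hypothetical and are not part of ColdFusion.

```python
import re

def build_pattern(allowed_words):
    """Build an anchored pattern that accepts exactly one of the given words."""
    # re.escape neutralizes special characters in the dynamic values,
    # so "(EU)" is treated as literal text and not as a group.
    alternatives = "|".join(re.escape(word) for word in allowed_words)
    return re.compile(f"^({alternatives})$")

# Hypothetical values, standing in for data read from a database.
departments = ["Sales", "R&D", "Support (EU)"]
pattern = build_pattern(departments)

print(pattern.search("R&D") is not None)           # True
print(pattern.search("Support (EU)") is not None)  # True
print(pattern.search("Marketing") is not None)     # False
```

Escaping the dynamic values before you insert them matters because any special character in the data would otherwise be interpreted as an operator.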
Special characters
Because special characters are the operators
in regular expressions, to represent a special character as an ordinary
one, escape it by preceding it with a backslash. For example, use
two backslash characters (\\) to represent a backslash character.
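The effect of escaping can be checked with a short sketch. It uses Python's re module as a stand-in for the validation engine, and the matches helper is a hypothetical name introduced only for this illustration.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# An unescaped period is an operator that matches any character.
print(matches(r"^3.14$", "3x14"))          # True
# An escaped period matches only a literal period.
print(matches(r"^3\.14$", "3x14"))         # False
print(matches(r"^3\.14$", "3.14"))         # True
# Two backslashes in the pattern match one literal backslash in the input.
print(matches(r"^C:\\temp$", "C:\\temp"))  # True
```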
Single-character regular expressions
The following rules govern regular
expressions that match a single character:
Special characters are: + * ? . [ ^ $ ( ) { | \
Any character matches itself if it is not a special character
or if a preceding backslash (\) escapes the character.
A backslash (\) followed by any special character matches
the literal character itself; that is, the backslash escapes the
special character.
A period (.) matches any character except newline.
A set of characters enclosed in brackets ([]) is a one-character
regular expression that matches any of the characters in that set.
For example, “[akm]” matches an a, k, or m.
If you include ] (closing square bracket) in square brackets, it
must be the first character. Otherwise, it does not work, even if
you use \].
A dash can indicate a range of characters. For example, [a-z]
matches any lowercase letter.
If the first character of a set of characters in brackets
is the caret (^), the expression matches any character except those
characters in the set. It does not match the empty string. For example:
“[^akm]” matches any character except a, k, or m.
The caret loses its special meaning if it is not the first character
of the set.
You can make regular expressions case insensitive by replacing
each letter with a character set that contains both cases; for example,
“[Nn][Ii][Cc][Kk]” is a case-insensitive pattern for the name Nick
(or NICK, or nick, or even nIcK).
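The single-character rules above can be exercised with a small sketch. Python's re module is used here only as a stand-in, so behavior in ColdFusion validation may differ in edge cases. The matches helper is a hypothetical name.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"[akm]", "k"))                  # True: k is in the set
print(matches(r"[akm]", "z"))                  # False
print(matches(r"^[^akm]$", "z"))               # True: any character except a, k, m
print(matches(r"^[^akm]$", "a"))               # False
print(matches(r"^[^akm]$", ""))                # False: the empty string does not match
print(matches(r"^[a-z]$", "q"))                # True: a dash indicates a range
print(matches(r"^[a-z]$", "Q"))                # False
print(matches(r"^[Nn][Ii][Cc][Kk]$", "nIcK"))  # True: case-insensitive by sets
print(matches(r"^[]x]$", "]"))                 # True: ] listed first is literal
```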
You can use the following escape sequences to match specific
characters or character classes:
Escape seq | Matches
[\b] | Backspace.
\b | A word boundary, such as the position between a letter and a space.
\B | A nonword boundary.
\cX | The control character Ctrl-x. For example, \cv matches Ctrl-v, the usual control character for pasting text.
\d | A digit character [0-9].
\D | Any character except a digit.
\f | Form feed.
\n | Line feed.
\r | Carriage return.
\s | Any of the following white-space characters: space, tab, form feed, and line feed.
\S | Any character except the white-space characters matched by \s.
\t | Tab.
\v | Vertical tab.
\w | An alphanumeric character or underscore. The equivalent of [A-Za-z0-9_].
\W | Any character not matched by \w. The equivalent of [^A-Za-z0-9_].
\n (where n is a digit) | Backreference to the nth expression in parentheses. See Backreferences.
\ooctal | The character represented in the ASCII character table by the specified octal number.
\xhex | The character represented in the ASCII character table by the specified hexadecimal number.
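A few of the escape sequences in the table can be tried out as follows. This sketch uses Python's re module as a stand-in. The re.ASCII flag is an assumption made so that \d and \w stay limited to the ASCII ranges that the table describes. The matches helper is a hypothetical name.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text (ASCII classes)."""
    return re.search(pattern, text, re.ASCII) is not None

print(matches(r"^\d\d\d$", "042"))          # True: three digits
print(matches(r"^\d\d\d$", "4a2"))          # False
print(matches(r"^\w+$", "user_01"))         # True: letters, digits, underscore
print(matches(r"^\w+$", "user-01"))         # False: the dash is not matched by \w
print(matches(r"\s", "two words"))          # True: contains a space
print(matches(r"^\S+$", "two words"))       # False
print(matches(r"\bcat\b", "a cat sat"))     # True: cat as a whole word
print(matches(r"\bcat\b", "concatenate"))   # False: no word boundary around cat
print(matches(r"^\x41$", "A"))              # True: hexadecimal 41 is A
```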
Multicharacter regular expressions
Use the following rules to build a multicharacter
regular expression:
Parentheses group parts of regular expressions into a
subexpression that can be treated as a single unit. For example,
“(ha)+” matches one or more instances of ha.
A one-character regular expression or grouped subexpression
followed by an asterisk (*) matches zero or more occurrences of
the regular expression. For example, “[a-z]*” matches zero or more
lowercase characters.
A one-character regular expression or grouped subexpression
followed by a plus sign (+) matches one or more occurrences of the
regular expression. For example, “[a-z]+” matches one or more lowercase
characters.
A one-character regular expression or grouped subexpression
followed by a question mark (?) matches zero or one occurrence of
the regular expression. For example, “xy?z” matches either xyz or xz.
The caret (^) at the beginning of a regular expression matches
the beginning of the field.
The dollar sign ($) at the end of a regular expression matches
the end of the field.
The concatenation of regular expressions creates a regular
expression that matches the corresponding concatenation of strings.
For example, “[A-Z][a-z]*” matches any capitalized word.
The OR character (|) allows a choice between two regular
expressions. For example, “jell(y|ies)” matches either jelly or jellies.
Curly brackets ({}) indicate a range of occurrences of a
regular expression. You use them in the form “{m,n}” where m is
an integer equal to or greater than zero, indicating the
start of the range, and n is an integer equal to or greater than m, indicating
the end of the range. For example, “(ba){0,3}” matches up to three consecutive
occurrences of ba. The form “{m,}” requires at least m occurrences
of the preceding regular expression. The form “{m}” requires exactly m occurrences
of the preceding regular expression. The form “{,n}” is not allowed.
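The multicharacter rules above can be checked with a short sketch. Python's re module serves as a stand-in for the validation engine, and the patterns are anchored with ^ and $ so that each check applies to the whole string. The matches helper is a hypothetical name.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"^(ha)+$", "hahaha"))        # True: one or more ha
print(matches(r"^(ha)+$", ""))              # False: + needs at least one
print(matches(r"^[a-z]*$", ""))             # True: * allows zero occurrences
print(matches(r"^xy?z$", "xz"))             # True: y is optional
print(matches(r"^xy?z$", "xyyz"))           # False: at most one y
print(matches(r"^[A-Z][a-z]*$", "Word"))    # True: a capitalized word
print(matches(r"^jell(y|ies)$", "jellies")) # True: either alternative
print(matches(r"^(ba){0,3}$", "bababa"))    # True: three occurrences
print(matches(r"^(ba){0,3}$", "babababa"))  # False: four is too many
print(matches(r"^a{2}$", "aa"))             # True: exactly two
print(matches(r"^a{2,}$", "a"))             # False: at least two required
```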
Backreferences
Backreferencing lets you match text in previously matched sets of
parentheses. A backslash followed by a digit n (\n) refers to the nth
parenthesized subexpression.
One example of how you can use backreferencing is searching for doubled
words; for example, to find instances of “the the” or “is is” in text.
The following example shows backreferencing in a regular expression:
(\b[A-Za-z]+)[ ]+\1
This code matches text that contains the same word twice in a row;
that is, it matches a word (specified by the \b word boundary
special character and “[A-Za-z]+”, grouped in parentheses), followed by
one or more spaces (specified by “[ ]+”), followed by a backreference (\1)
to the first parenthesized subexpression, which is the first word.
For example, it would match “is is”, but not “This is”.
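The doubled-word pattern can be tried out with the following sketch. It uses Python's re module as a stand-in, and first_doubled_word is a hypothetical helper that returns the repeated word, or None if there is no match.

```python
import re

# The backreference pattern from the text, unchanged.
DOUBLED_WORD = re.compile(r"(\b[A-Za-z]+)[ ]+\1")

def first_doubled_word(text):
    """Return the first word that is immediately repeated, or None."""
    found = DOUBLED_WORD.search(text)
    return found.group(1) if found else None

print(first_doubled_word("Paris in the the spring"))  # the
print(first_doubled_word("it is is done"))            # is
print(first_doubled_word("This is fine"))             # None
print(first_doubled_word("is island"))                # is
```

The last call shows a limitation of the pattern as written: because nothing follows the backreference, the second occurrence can be the start of a longer word. In this sketch, appending \b after \1 rejects that case.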
Exact and partial matches
ColdFusion validation normally considers a value to be valid if any
part of it matches the regular expression pattern. If you want to
ensure that the entire entry matches the pattern, “anchor” it to the
beginning and end of the field, as follows:
If a caret (^) is at the beginning of a pattern, the
field must begin with a string that matches the pattern.
If a dollar sign ($) is at the end of a pattern, the field
must end with a string that matches the pattern.
If the expression starts with a caret and ends with a dollar
sign, the field must exactly match the pattern.
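The difference between partial and exact matches can be sketched as follows. Python's re.search is used as a stand-in because it also accepts a match anywhere in the value. The is_valid helper and the five-digit pattern are hypothetical examples.

```python
import re

def is_valid(pattern, value):
    """Return True if any part of the value matches the pattern."""
    return re.search(pattern, value) is not None

# Unanchored: a match anywhere in the field is enough.
print(is_valid(r"[0-9]{5}", "abc12345xyz"))  # True
# Caret only: the field must begin with five digits.
print(is_valid(r"^[0-9]{5}", "12345xyz"))    # True
print(is_valid(r"^[0-9]{5}", "abc12345"))    # False
# Dollar sign only: the field must end with five digits.
print(is_valid(r"[0-9]{5}$", "abc12345"))    # True
# Both anchors: the field must be exactly five digits.
print(is_valid(r"^[0-9]{5}$", "abc12345"))   # False
print(is_valid(r"^[0-9]{5}$", "12345"))      # True
```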
Expression examples
The following examples show
some regular expressions and describe what they match:
Expression | Description
[\?&]value= | Any string containing a URL parameter named value.
^[A-Z]:(\\[A-Z0-9_]+)+$ | An uppercase Windows directory path that is not the root of a drive and has only letters, numbers, and underscores in its text.
^(\+|-)?[1-9][0-9]*$ | An integer that does not begin with a zero and has an optional sign.
^(\+|-)?[1-9][0-9]*(\.[0-9]*)?$ | A real number that does not begin with a zero and has an optional sign.
^(\+|-)?[1-9]\.[0-9]*E(\+|-)?[0-9]+$ | A real number in exponential (E) notation with one nonzero digit before the decimal point.
a{2,4} | A string containing two to four consecutive occurrences of a: aa, aaa, aaaa; for example, aardvark, but not automatic.
(ba){2,} | A string containing at least two consecutive ba pairs; for example, Ali baba, but not Ali Baba.
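The example expressions can be checked against sample input with a short sketch. Python's re module is a stand-in for the validation engine here. The EXAMPLES dictionary keys, the check helper, and the sample strings other than those quoted in the table are hypothetical.

```python
import re

# The expressions from the table, keyed by hypothetical short names.
EXAMPLES = {
    "url_param": r"[\?&]value=",
    "windows_path": r"^[A-Z]:(\\[A-Z0-9_]+)+$",
    "integer": r"^(\+|-)?[1-9][0-9]*$",
    "real": r"^(\+|-)?[1-9][0-9]*(\.[0-9]*)?$",
    "exponent": r"^(\+|-)?[1-9]\.[0-9]*E(\+|-)?[0-9]+$",
    "two_to_four_a": r"a{2,4}",
    "two_ba_pairs": r"(ba){2,}",
}

def check(name, text):
    """Return True if the named example expression matches the text."""
    return re.search(EXAMPLES[name], text) is not None

print(check("url_param", "page.cfm?value=3"))           # True
print(check("windows_path", "C:\\DOCS\\REPORTS_2004"))  # True
print(check("windows_path", "C:\\"))                    # False: drive root
print(check("integer", "-42"))                          # True
print(check("integer", "042"))                          # False: leading zero
print(check("real", "3.14"))                            # True
print(check("exponent", "1.5E-3"))                      # True
print(check("two_to_four_a", "aardvark"))               # True
print(check("two_to_four_a", "automatic"))              # False
print(check("two_ba_pairs", "Ali baba"))                # True
print(check("two_ba_pairs", "Ali Baba"))                # False
```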
Note: An
excellent reference on regular expressions is Mastering Regular
Expressions by Jeffrey E.F. Friedl, published by O'Reilly &
Associates, Inc.
|