About regular expressions



In traditional string matching, as used by the ColdFusion Find and Replace functions, you provide the string pattern to search for and the string to search. The following example searches a string for the pattern " BIG " and returns a string index if found. The string index is the location in the search string where the string pattern begins.

<cfset IndexOfOccurrence=Find(" BIG ", "Some BIG string")> 
<!--- The value of IndexOfOccurrence is 5 --->

You must provide the exact string pattern to match. If the exact pattern is not found, Find returns an index of 0. Because you must specify the exact string pattern to match, matches for dynamic data can be difficult, if not impossible, to construct.

The next example uses a regular expression to perform the same search. This example searches for the first occurrence in the search string of any string pattern that consists entirely of uppercase letters enclosed by spaces:

<cfset IndexOfOccurrence=REFind(" [A-Z]+ ", "Some BIG string")> 
<!--- The value of IndexOfOccurrence is 5 --->

The regular expression " [A-Z]+ " matches any string pattern consisting of a leading space, followed by one or more uppercase letters, followed by a trailing space. Therefore, this regular expression matches the string " BIG " and any other string of uppercase letters enclosed in spaces.
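As a minimal sketch of why this matters for dynamic data, the same pattern can be applied to several strings. The sample strings and variable names below are illustrative and are not part of the original example:

```cfm
<!--- One pattern, several inputs. Sample strings are hypothetical. --->
<cfset Pattern=" [A-Z]+ ">
<cfset IndexA=REFind(Pattern, "Some BIG string")>
<!--- The value of IndexA is 5 --->
<cfset IndexB=REFind(Pattern, "Some HUGE string")>
<!--- The value of IndexB is 5; Find(" BIG ", "Some HUGE string") would return 0 --->
<cfset IndexC=REFind(Pattern, "Some Big string")>
<!--- The value of IndexC is 0 because "Big" contains lowercase letters --->
```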

By default, the matching of regular expressions is case-sensitive. You can use the REFindNoCase and REReplaceNoCase functions for case-insensitive matching.
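The difference can be sketched with the same pattern and a lowercase search string. The search string and variable names here are assumptions chosen for illustration:

```cfm
<!--- Case-sensitive: [A-Z] does not match the lowercase letters in "big". --->
<cfset SensitiveIndex=REFind(" [A-Z]+ ", "Some big string")>
<!--- The value of SensitiveIndex is 0 --->

<!--- Case-insensitive: the same pattern now matches " big ". --->
<cfset InsensitiveIndex=REFindNoCase(" [A-Z]+ ", "Some big string")>
<!--- The value of InsensitiveIndex is 5 --->
```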

Because you often process large amounts of dynamic textual data, regular expressions are invaluable in writing complex ColdFusion applications.

Using ColdFusion regular expression functions

ColdFusion supplies four functions that work with regular expressions:

REFind and REFindNoCase use a regular expression to search a string for a pattern and return the string index where it finds the pattern. For example, the following function returns the index of the first instance of the string " BIG ":

<cfset IndexOfOccurrence=REFind(" BIG ", "Some BIG BIG string")> 
<!--- The value of IndexOfOccurrence is 5 --->

To find the next occurrence of the string " BIG ", you must call the REFind function a second time, using its optional start parameter to begin the search after the start of the first match. For an example of iterating over a search string to find all occurrences of the regular expression, see Returning matched subexpressions.

REReplace and REReplaceNoCase use regular expressions to search through a string and replace the string pattern that matches the regular expression with another string. You can use these functions to replace the first match, or to replace all matches.
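The two function families described above can be sketched as follows. The variable names and the replacement string "small" are illustrative. Note that REFind takes the regular expression first, whereas REReplace takes the string to search first:

```cfm
<cfset SearchString="Some BIG BIG string">

<!--- First match, then a second search that starts one character later. --->
<cfset FirstIndex=REFind(" BIG ", SearchString)>
<!--- The value of FirstIndex is 5 --->
<cfset SecondIndex=REFind(" BIG ", SearchString, FirstIndex+1)>
<!--- The value of SecondIndex is 9 --->

<!--- Replace only the first match (the default scope). --->
<cfset ReplaceOne=REReplace(SearchString, "BIG", "small")>
<!--- The value of ReplaceOne is "Some small BIG string" --->

<!--- Replace every match by passing "ALL" as the scope. --->
<cfset ReplaceAll=REReplace(SearchString, "BIG", "small", "ALL")>
<!--- The value of ReplaceAll is "Some small small string" --->
```

The second search starts at FirstIndex+1 rather than after the whole first match because the two occurrences of " BIG " share the space at index 9.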

For detailed descriptions of the ColdFusion functions that use regular expressions, see the CFML Reference.

Basic regular expression syntax

The simplest regular expression contains only literal characters. The literal characters must match exactly the text being searched. For example, you can use the regular expression function REFind to find the string pattern " BIG ", just as you can with the Find function:

<cfset IndexOfOccurrence=REFind(" BIG ", "Some BIG string")> 
<!--- The value of IndexOfOccurrence is 5 --->

In this example, REFind must match the exact string pattern " BIG ".

To use the full power of regular expressions, combine literal characters with character sets and special characters, as in the following example:

<cfset IndexOfOccurrence=REFind(" [A-Z]+ ", "Some BIG string")> 
<!--- The value of IndexOfOccurrence is 5 --->

The literal characters of the regular expression are the space characters at its beginning and end. The character set is the part of the regular expression in brackets; it matches a single uppercase letter from A to Z, inclusive. The plus sign (+) after the brackets is a special character that matches one or more occurrences of the character set.

If you remove the + from the regular expression in the previous example, the resulting expression, " [A-Z] ", matches a literal space, followed by any single uppercase letter, followed by a single space. This regular expression matches " B " but not " BIG ". Applied to the string "Some BIG string", the REFind function returns 0, meaning that it did not find a match.
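A short sketch of the effect of removing the plus sign follows. The second search string, "Plan B works", is a hypothetical input chosen because it contains a single uppercase letter enclosed in spaces:

```cfm
<!--- Without +, the character set matches exactly one uppercase letter. --->
<cfset NoMatchIndex=REFind(" [A-Z] ", "Some BIG string")>
<!--- The value of NoMatchIndex is 0 --->

<cfset OneLetterIndex=REFind(" [A-Z] ", "Plan B works")>
<!--- The value of OneLetterIndex is 5 --->
```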

You can construct complicated regular expressions containing literal characters, character sets, and special characters. As with any programming language, the more you work with regular expressions, the more you can accomplish with them. The examples here are fairly basic. For more examples, see Regular expression examples.