Using backreferences



You use parentheses to group components of a regular expression into subexpressions. For example, the regular expression “(ha)+” matches one or more occurrences of the string “ha”.
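As a minimal sketch of what grouping changes, the following code compares the grouped pattern with an ungrouped one. The variable names and input strings are illustrative only.

```cfml
<!--- With parentheses, + repeats the whole group "ha". --->
<cfset grouped = REReplace("ha haha hahaha", "(ha)+", "X", "ALL")>
<!--- Without parentheses, + applies only to the preceding "a". --->
<cfset ungrouped = REReplace("ha haa haha", "ha+", "X", "ALL")>
<cfoutput>#grouped# / #ungrouped#</cfoutput>
```

Here grouped is "X X X", because each run of "ha" is matched as a unit, while ungrouped is "X X XX", because "haha" is matched as two separate occurrences of "ha".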

ColdFusion performs an additional operation when using subexpressions; it automatically saves the characters in the search string matched by a subexpression for later use within the regular expression. Referencing the saved subexpression text is called backreferencing.

You can use backreferencing when searching for repeated words in a string, such as “the the” or “is is”. The following example uses backreferencing to find all repeated words in the search string and replace them with an asterisk:

REReplace("There is is coffee in the the kitchen", 
    "[ ]+([A-Za-z]+)[ ]+\1"," * ","ALL")

Using this regular expression, ColdFusion detects the two occurrences of “is” as well as the two occurrences of “the”, replaces them with an asterisk enclosed in spaces, and returns the following string:

There * coffee in * kitchen

You interpret the regular expression [ ]+([A-Za-z]+)[ ]+\1 as follows:

Match one or more spaces, [ ]+, then a word of one or more letters captured by the subexpression ([A-Za-z]+), then one or more spaces, [ ]+, and finally the same character string that matched the first subexpression, \1.
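To see what the subexpression saves, you can inspect the match with REFind and its optional returnsubexpressions argument. The following is a sketch only; the variable names are illustrative.

```cfml
<cfset text = "There is is coffee in the the kitchen">
<!--- With the fourth argument set to true, REFind returns a structure
      holding pos and len arrays. Element 1 describes the whole match;
      element 2 describes the first subexpression. --->
<cfset found = REFind("[ ]+([A-Za-z]+)[ ]+\1", text, 1, true)>
<cfset wholeMatch = Mid(text, found.pos[1], found.len[1])>
<cfset firstGroup = Mid(text, found.pos[2], found.len[2])>
<cfoutput>[#wholeMatch#] [#firstGroup#]</cfoutput>
```

For this input, wholeMatch is " is is" (including the leading space) and firstGroup is "is", the text that \1 must repeat.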

You reference the matched characters of a subexpression using a backslash followed by a digit n (\n), where the first subexpression in a regular expression is referenced as \1, the second as \2, and so on. The next section includes an example using multiple backreferences.
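Backreferences other than \1 can also appear in the search pattern itself. The following sketch, which uses a hypothetical date-like format chosen only for illustration, requires the second separator to be the same character that the second subexpression matched:

```cfml
<!--- \2 must repeat whatever separator the subexpression ([-/]) matched. --->
<cfset datePattern = "^([0-9]+)([-/])([0-9]+)\2([0-9]+)$">
<cfset consistent = REFind(datePattern, "12-05-2024")>
<cfset mixed = REFind(datePattern, "12-05/2024")>
<cfoutput>#consistent# #mixed#</cfoutput>
```

REFind returns the position of the match, so consistent is 1, while mixed is 0 because the second separator differs from the first.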

Using backreferences in replacement strings

You can use backreferences in the replacement string of both the REReplace and REReplaceNoCase functions. For example, to replace the first repeated word in a text string with a single word, use the following syntax:

REReplace("There is is a cat in in the kitchen", 
    "([A-Za-z ]+)\1","\1")

This results in the sentence:

"There is a cat in in the kitchen"

You can use the optional fourth parameter to REReplace, scope, to replace all repeated words, as in the following code:

REReplace("There is is a cat in in the kitchen", 
    "([A-Za-z ]+)\1","\1","ALL") 

This results in the following string:

"There is a cat in the kitchen"
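Because the character class [A-Za-z ] includes the space and the pattern is not anchored to word boundaries, it matches any run of characters that is immediately repeated, including doubled letters inside a word. The following sketch contrasts it with a stricter pattern. It assumes that the \b word-boundary escape is available in your ColdFusion version, and the sentence and variable names are illustrative.

```cfml
<cfset sentence = "The coffee is is in the the kitchen">
<!--- Pattern from the text: also collapses "ff" and "ee" in "coffee". --->
<cfset loose = REReplace(sentence, "([A-Za-z ]+)\1", "\1", "ALL")>
<!--- Stricter sketch: a whole word, followed by one or more repetitions
      of spaces plus the same word, ending at a word boundary. --->
<cfset strict = REReplace(sentence, "\b([A-Za-z]+)( +\1)+\b", "\1", "ALL")>
<cfoutput>#loose#<br>#strict#</cfoutput>
```

With this input, loose is "The cofe is in the kitchen", while strict is "The coffee is in the kitchen". Both patterns are case-sensitive; use REReplaceNoCase if "The the" should also count as a repeat.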

The next example uses two backreferences to reverse the order of the words "apples" and "pears" in a sentence:

<cfset astring = "apples and pears, apples and pears, apples and pears"> 
<cfset newString = REReplace(astring, "(apples) and (pears)", 
    "\2 and \1","ALL")>

In this example, you reference the subexpression (apples) as \1 and the subexpression (pears) as \2. The REReplace function returns the string:

"pears and apples, pears and apples, pears and apples"

Note: To use backreferences in either the search string or the replace string, you must use parentheses within the regular expression to create the corresponding subexpression. Otherwise, ColdFusion throws an exception.

Using backreferences to perform case conversions in replacement strings

The REReplace and REReplaceNoCase functions support special characters in replacement strings to convert replacement characters to uppercase or lowercase. The following table describes these special characters:

Special character    Description
\u                   Converts the next character to uppercase.
\l                   Converts the next character to lowercase.
\U                   Converts all characters to uppercase until encountering \E.
\L                   Converts all characters to lowercase until encountering \E.
\E                   End \U or \L.

To include a literal \u, or other code, in a replacement string, escape it with another backslash; for example \\u.

For example, the following statement matches the uppercase string "HELLO" and uses the backreference \1, preceded by \L, to insert it into the replacement string as lowercase "hello":

REReplace("HELLO", "([[:upper:]]*)", "Don't shout\scream \L\1")

The result of this example is the string "Don't shout\scream hello".
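The other case-conversion characters work the same way. The following sketch, with illustrative strings and variable names, uses \u to capitalize the first letter of each word and \U with \E to uppercase only part of the replacement:

```cfml
<!--- \u uppercases only the next replacement character, here the single
      letter captured by \2. \1 restores the leading space, if any. --->
<cfset titleCase = REReplace("using backreferences", "(^| )([a-z])", "\1\u\2", "ALL")>
<!--- \U uppercases everything up to \E; the colon in \2 is unaffected. --->
<cfset label = REReplace("error: disk full", "^([a-z]+)(:)", "\U\1\E\2")>
<cfoutput>#titleCase#<br>#label#</cfoutput>
```

Here titleCase is "Using Backreferences" and label is "ERROR: disk full".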

Escaping special characters in replacement strings

You use the backslash character, \, to escape backreference and case-conversion characters in replacement strings. For example, to include a literal "\u" in a replacement string, escape it, as in "\\u".
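As a sketch of combining an escaped code with a backreference, the following code prefixes a captured run of hexadecimal digits with a literal \u. The input value and variable name are illustrative only.

```cfml
<!--- "\\u" is intended to emit a literal \u rather than trigger an
      uppercase conversion; the \1 that follows is still a backreference. --->
<cfset escaped = REReplace("0041", "([0-9A-F]+)", "\\u\1")>
<cfoutput>#escaped#</cfoutput>
```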

Omitting subexpressions from backreferences

By default, a set of parentheses both groups the subexpression and captures its matched text for later reference by backreferences. However, if you insert "?:" as the first characters of the subexpression, ColdFusion performs all operations on the subexpression except that it does not capture the corresponding text for use with a backreference.

This is useful when alternating over subexpressions containing differing numbers of groups would complicate backreference numbering. For example, consider an expression that inserts "Mr." between Bonjour, Hi, or Hello and Bond, using a nested group to alternate between Hi and Hello:

<cfset regex = "(Bonjour|H(?:i|ello))( Bond)"> 
<cfset replaceString = "\1 Mr.\2"> 
<cfset string = "Hello Bond"> 
<cfoutput>#REReplace(string, regex, replaceString)#</cfoutput>

This example returns "Hello Mr. Bond". If the Hi/Hello group were a capturing group, the \2 backreference would refer to that group instead of " Bond", and the result would be "Hello Mr.ello".
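For comparison, the following sketch shows the same replacement written with a capturing inner group. The variable names are illustrative. Because the inner group becomes subexpression 2, " Bond" must be referenced as \3:

```cfml
<cfset greeting = "Hello Bond">
<!--- Capturing inner group: (i|ello) is \2, so ( Bond) shifts to \3. --->
<cfset capturing = REReplace(greeting, "(Bonjour|H(i|ello))( Bond)", "\1 Mr.\3")>
<!--- Non-capturing inner group: ( Bond) remains \2. --->
<cfset nonCapturing = REReplace(greeting, "(Bonjour|H(?:i|ello))( Bond)", "\1 Mr.\2")>
<cfoutput>#capturing#<br>#nonCapturing#</cfoutput>
```

Both calls return "Hello Mr. Bond". The non-capturing form keeps the numbering stable if you later add or remove nested alternatives.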