Understanding Regular Expressions

When the Regular expression checkbox is unchecked, Replace In Files will look for exact matches to the literal text you have entered into the Find field. That’s great if you want to find a simple string, such as ‘2010’. But what if you wanted to find ‘2008’, ‘2009’ and ‘2010’? You can construct searches like this using regular expressions.

Regular expressions enable you to search line-by-line for a pattern or template of characters, rather than simply finding exact matches to the text you specify. Replace In Files uses the standard Perl regex syntax to specify search patterns. This is the same syntax used by Araxis Merge and many other applications.
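As a rough illustration of line-by-line pattern searching, the following Python sketch finds lines that mention 2008, 2009 or 2010 with a single pattern. The pattern 20(08|09|10) and the sample lines are assumptions made for this example. Python's re module stands in for the Replace In Files engine here and may differ from it in details.

```python
import re

# Hypothetical pattern: "20" followed by one of 08, 09 or 10.
YEAR_PATTERN = re.compile(r"20(08|09|10)")

# Sample lines invented for this sketch.
lines = [
    "Released in 2008",
    "Revised 2009, reissued 2010",
    "Archived in 2011",
]

# Test each line separately, as a line-by-line search would.
matching_lines = [line for line in lines if YEAR_PATTERN.search(line)]
print(matching_lines)
```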

For a guide to regular expression syntax, please see Regular Expression Reference.

Example regular expressions for the ‘Find’ field

Simple matches

To match lines containing the text apple (including where it appears inside a longer word, such as pineapple):

apple

To match lines containing only the word apple:

^apple$
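The difference between the two expressions above can be tried out with a short Python sketch. The sample lines are invented, and Python's re module is only assumed to behave like the Replace In Files engine for these simple patterns.

```python
import re

# Sample lines invented for this sketch.
lines = ["apple", "an apple a day", "pineapple juice", "pear"]

# "apple" matches anywhere in a line, including inside "pineapple".
contains_apple = [line for line in lines if re.search(r"apple", line)]

# "^apple$" matches only when the whole line is exactly "apple".
only_apple = [line for line in lines if re.search(r"^apple$", line)]

print(contains_apple)
print(only_apple)
```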

Matching whitespace

To match lines that are either completely empty, or that only contain whitespace (spaces and tab characters):

^[ \t]*$

Breakdown:

^ Match the start of the line.
[ \t]* Match zero or more space or tab (\t) characters.
$ Match the end of the line.
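The breakdown above can be checked with a minimal Python sketch. The sample lines are hypothetical, and the sketch assumes Python's re module treats this pattern the same way as Replace In Files.

```python
import re

# The whitespace-only pattern from the text above.
BLANK_LINE = re.compile(r"^[ \t]*$")

# Sample lines invented for this sketch (without trailing newlines).
lines = ["", "    ", "\t \t", "  text  "]

# True where the line is empty or holds only spaces and tabs.
is_blank = [bool(BLANK_LINE.search(line)) for line in lines]
print(is_blank)
```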

Matching C++ comments

To match lines that contain only a C++ style comment (//, followed by any characters up to the end of the line), the following expression can be used:

^[ \t]*//.*$

Breakdown:

^ Match the start of the line.
[ \t]* Match zero or more space or tab (\t) characters.
// Match two consecutive / characters.
.* Match zero or more occurrences of any character.
$ Match the end of the line.
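As a sketch of how this expression behaves, the Python snippet below applies it to a few invented lines of source code. It assumes Python's re module handles the pattern the same way as Replace In Files. Note that a line with code before the comment is not matched.

```python
import re

# The comment-only pattern from the text above.
COMMENT_ONLY = re.compile(r"^[ \t]*//.*$")

# Sample source lines invented for this sketch.
lines = [
    "// a full-line comment",
    "\t  // an indented comment",
    "int x = 0; // a trailing comment",
    "/* a block comment */",
]

# Keep only lines that consist of optional indentation and a // comment.
comment_lines = [line for line in lines if COMMENT_ONLY.search(line)]
print(comment_lines)
```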

Combining expressions

Several expressions can be combined into one by using parentheses () and the | character:

(apple|^pear$)

Breakdown:

( Begins a group of expressions.
apple Match lines containing the word apple.
| Match lines that contain matches for the previous expression (apple) or the next one (^pear$).
^pear$ Match lines consisting of only the word pear.
) Ends the group.
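The grouped expression (apple|^pear$) can be sketched in Python as follows. The sample lines are invented, and the sketch assumes Python's re module treats alternation and anchors the same way as Replace In Files.

```python
import re

# The combined pattern from the text above.
COMBINED = re.compile(r"(apple|^pear$)")

# Sample lines invented for this sketch.
lines = ["apple pie", "pear", "pear tart", "green apple"]

# "pear tart" is rejected because ^pear$ requires the line to be exactly "pear".
matching_lines = [line for line in lines if COMBINED.search(line)]
print(matching_lines)
```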

This syntax enables larger expressions like the following to be constructed:

^[ \t]*(apple|pear|orange)-?flavour.*$

Referring back to the text matched by a regular expression when specifying replacement text

Let’s say you wish to update the copyright notices in a series of HTML files with a new finishing year. You might have notices such as:

<!-- Copyright 2008-2009 -->

and

<!-- Copyright 1993-2008 -->

You’d like these to become:

<!-- Copyright 2008-2010 -->

and

<!-- Copyright 1993-2010 -->

Now, you could run several different find and replace operations, one for each particular date range. Or you could match all copyright date-ranges with a regular expression such as:

^<!-- Copyright (\d\d\d\d)-\d\d\d\d -->$

The first four digits of the date range are placed into their own group by surrounding them with parentheses. In your replacement text, you can now refer back to this group by using a backslash and a group number, like this:

<!-- Copyright \1-2010 -->

The \1 in the replacement text is a back reference – it refers back to the first (\d\d\d\d) in the find regular expression. Using a back reference in this way, you can preserve the start date of the existing copyright date ranges, replacing only the end of the date range.

When searching using more complex expressions with several groups, \2 would refer to the second parenthetical grouping, \3 to the third, and so on.
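The copyright example can be reproduced with a short Python sketch. Python's re.sub happens to accept the same \1 back reference in its replacement text, but it is a different engine from the one used by Replace In Files, so treat this as an approximation. The variable names and the third sample line are invented for illustration.

```python
import re

# Find and replacement expressions from the text above.
FIND = r"^<!-- Copyright (\d\d\d\d)-\d\d\d\d -->$"
REPLACE = r"<!-- Copyright \1-2010 -->"

# Two notices from the text, plus one unrelated line.
lines = [
    "<!-- Copyright 2008-2009 -->",
    "<!-- Copyright 1993-2008 -->",
    "<p>No notice on this line</p>",
]

# \1 carries the captured start year over into the replacement.
updated_lines = [re.sub(FIND, REPLACE, line) for line in lines]
print(updated_lines)
```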

Example regular expressions for the ‘Replace’ field

The examples below show how to use regular expressions to specify replacement text. For additional information, see Boost perl regex format.

Swapping text

To swap the contents of two columns separated by a tab:

(.*?)\t(.*?)$

$2\t$1

Here $2 refers to the second parenthetical grouping in the find text and $1 to the first. Note that $1 and \1 are equivalent.
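A minimal Python sketch of the column swap follows. One difference to be aware of: Python's re.sub does not interpret $1 or $2 in replacement text, so the sketch uses the \1 and \2 forms instead. The sample line is invented.

```python
import re

# Find expression from the text above.
FIND = r"(.*?)\t(.*?)$"
# Python spelling of the replacement: \2, a tab, then \1.
REPLACE = r"\2\t\1"

# A two-column sample line invented for this sketch.
line = "name\tvalue"

# The two captured columns are emitted in reverse order.
swapped = re.sub(FIND, REPLACE, line)
print(repr(swapped))
```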

Prepending text

To prepend a currency symbol to prices that have two decimal places:

\d+\.\d{2}

$$$&

The $$ represents a single literal dollar symbol, and $& refers to the entire text matched by the find expression.
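The same substitution can be approximated in Python, with the caveat that its replacement syntax differs from the one described above: in Python's re.sub a dollar sign is already literal, and the whole match is written \g<0> rather than $&. The sample text is invented.

```python
import re

# Find expression from the text above: digits, a dot, exactly two digits.
FIND = r"\d+\.\d{2}"
# Python spelling of the replacement: a literal "$" followed by the whole match.
REPLACE = r"$\g<0>"

# Sample text invented for this sketch.
text = "Coffee 3.50, tea 12.00"

priced = re.sub(FIND, REPLACE, text)
print(priced)
```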

Replacing several alternatives with a single substitute

To replace a set of different words with a single word:

(blue|white|red)

colour
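This replacement can be sketched in Python as shown below, again assuming that Python's re module behaves like Replace In Files for this simple pattern. The sample sentence is invented.

```python
import re

# Find and replacement expressions from the text above.
FIND = r"(blue|white|red)"
REPLACE = "colour"

# Sample text invented for this sketch.
text = "a red door and a blue window"

# Every occurrence of any listed alternative becomes "colour".
result = re.sub(FIND, REPLACE, text)
print(result)
```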