Edit Expression

Use the controls in this window to edit the definition and description of a regular expression. For further information about the purpose of regular expressions, see Expressions.

Description

Use this field to enter a description of the behaviour of the expression that is in the field below.

Expression

This entry field contains the regular expression definition. The text below the entry field indicates whether the content of the entry field is syntactically correct.

Force entire line unchanged if any part matches the expression

Choose this option to force a line into the unchanged state if any part of it matches the regular expression. Merge then ignores the whole line for comparison purposes. Note that the expression does not need to match the entire line (unless the expression itself requires it, for example by using ^ and $).

Ignore sequences of characters that match the expression

Choose this option to make Merge ignore sequences of characters that match the regular expression. Use this option when lines may contain useful content in addition to matches for the regular expression. For example, you could use an expression to cause HTML mark-up elements to be ignored (<[^<]*>), and have Merge compare the remaining line-content.

Note that if the entire content of the line is ignored because it matches your list of regular expressions, the line is treated as if it were a blank line; it is not forced into the unchanged state.

Note also that the regular expression matching algorithm is greedy. For example, an expression <.*> will cause all of <b>Hello, world!</b> to be ignored, not just the individual <b> and </b> sequences.
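The difference between the greedy expression and the mark-up expression shown earlier can be tried out in a short script. This is only a sketch: it uses Python's re module as a stand-in, on the assumption that it treats these two simple expressions the same way as Merge's own engine.

```python
import re

sample = "<b>Hello, world!</b>"

# Greedy: .* runs on to the last '>' in the line, so a single match
# covers the whole sample and nothing is left to compare.
greedy = re.sub(r"<.*>", "", sample)

# [^<]* cannot run past the start of the next element, so each
# mark-up element is matched (and removed) separately.
per_element = re.sub(r"<[^<]*>", "", sample)

print(repr(greedy))
print(repr(per_element))
```

With the second expression, only the mark-up is removed and the text Hello, world! remains available for comparison.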

Sample line text

You can enter a sample line of text into this field to see how the regular expression you entered will cause sequences of characters to be ignored.

Remove sequences of characters matching these selected sub-expressions

By default, Merge will ignore the entire sequence of characters that matches a regular expression. You may want to ignore only part of the sequence that was matched. For example, if you want to ignore changes in a C++ class name, but show where a class has changed into a struct (or vice versa), you could use an expression like this:

(class|struct)[ \t]+([a-zA-Z0-9_]+)

This expression contains two sub-expressions enclosed in parentheses. When applied to a sample line:

class SomeClass : public BaseClass {

The sub-expression list will show three entries. The first (All) is the sequence of characters (class SomeClass) matched by the entire regular expression. The second (1) is the sequence of characters (class) matched by the first sub-expression. The third (2) is the sequence of characters (SomeClass) matched by the second sub-expression. If you wanted changes to the class name to be ignored and to see changes to the class/struct status, you would check the third (2) item in the list, and leave the second item unchecked.
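The three entries, and the effect of checking only item 2, can be reproduced with the sketch below. It uses Python's re module rather than Merge itself, and remove_group is a hypothetical helper written for this illustration; it handles only the first match in a line.

```python
import re

PATTERN = re.compile(r"(class|struct)[ \t]+([a-zA-Z0-9_]+)")

def remove_group(line, group):
    """Return line with one sub-expression of the first match removed.

    Hypothetical helper for illustration; group 0 means the whole match.
    """
    m = PATTERN.search(line)
    if m is None or m.group(group) is None:
        return line
    start, end = m.span(group)
    return line[:start] + line[end:]

line = "class SomeClass : public BaseClass {"
m = PATTERN.search(line)

# The same three entries that the sub-expression list shows.
entries = {"All": m.group(0), "1": m.group(1), "2": m.group(2)}
print(entries)

# Removing only sub-expression 2 hides the class name but keeps the keyword.
print(remove_group(line, 2))
```

Because the keyword survives, a line that changes from class to struct still differs after removal, while a line that changes only its class name does not.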

Sample line, with selected sub-expressions removed

This field displays the effect of applying the regular expression to the sample line you entered in the edit field above. Matching sequences of characters are stripped out of the sample line. What remains is what the comparison engine will use when comparing the line against other lines. If you have configured Merge to ignore whitespace within lines then this field will show the effects of that too.

Regular Expression Syntax

The regular expression syntax used by Araxis Merge is the same as that used by many applications on UNIX operating systems. A regular expression is a series of simple and special characters that can be used to search for sequences of characters within a piece of text.

The rest of this page contains example regular expressions. For more comprehensive information, please see the Regular Expression Reference.

Simple matches

To match lines containing the text apple anywhere in the line (including inside a longer word such as pineapple):

apple

To match lines containing only the word apple:

^apple$
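The two expressions above can be compared on a few sample lines. This sketch assumes Python's re module as a stand-in for Merge's engine; the sample lines are made up for illustration.

```python
import re

lines = ["apple", "pineapple pie", "an apple a day", "pear"]

# Unanchored: matches wherever the characters a-p-p-l-e appear in the line.
contains_apple = [s for s in lines if re.search(r"apple", s)]

# Anchored with ^ and $: the line must consist of apple and nothing else.
only_apple = [s for s in lines if re.search(r"^apple$", s)]

print(contains_apple)
print(only_apple)
```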

Matching whitespace

To match lines that are either completely empty, or that only contain whitespace (spaces and tab characters):

^[ \t]*$

Breakdown:

  • ^ Match the start of the line.
  • [ \t]* Match zero or more space or tab (\t) characters.
  • $ Match the end of the line.
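As a quick check of the breakdown above, the sketch below applies the expression to a few made-up sample lines, using Python's re module as an assumed stand-in for Merge's engine.

```python
import re

blank_line = re.compile(r"^[ \t]*$")

samples = ["", "   ", "\t \t", "  x  "]

# True for empty or whitespace-only lines, False once any other character appears.
is_blank = [bool(blank_line.search(s)) for s in samples]
print(is_blank)
```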

Matching C++ comments

To match lines that contain only a C++ style comment (//, followed by any characters up to the end of the line), the following expression can be used:

^[ \t]*//.*$

Breakdown:

  • ^ Match the start of the line.
  • [ \t]* Match zero or more space or tab (\t) characters.
  • // Match two consecutive / characters.
  • .* Match zero or more occurrences of any character.
  • $ Match the end of the line.
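The sketch below shows which kinds of line the expression accepts. It assumes Python's re module behaves like Merge's engine for this expression; the sample lines are invented.

```python
import re

comment_only = re.compile(r"^[ \t]*//.*$")

samples = [
    "// a comment",            # comment at the start of the line
    "\t  // indented comment", # only whitespace before the comment
    "int x = 0; // trailing",  # code before the comment, so no match
    "int y = 1;",              # no comment at all
]

matches = [bool(comment_only.search(s)) for s in samples]
print(matches)
```

A trailing comment after code is not matched, because ^[ \t]* allows only spaces and tabs before the // characters.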

Matching source code control keywords

Some version control products enable special keywords to be inserted into text files. Subversion, for example, will expand the text $Date$ so that it contains the date and time of the last check-in. When comparing different revisions of a file, lines containing these keywords will almost always be different and can be ignored. The following expression matches C++ comment lines that contain an expanded Date keyword:

^[ \t]*//.*\$Date:.*\$.*$

Breakdown:

  • ^ Match the start of the line.
  • [ \t]* Match zero or more space or tab (\t) characters.
  • // Match two consecutive / characters.
  • .* Match zero or more occurrences of any character.
  • \$ Match the character $, not the end of line. Putting \ before a character means that the character is treated as a literal. Any special meaning it would otherwise have in a regular expression is removed.
  • Date: Match Date:
  • .* Match zero or more occurrences of any character.
  • \$ Match the literal character $.
  • .* Match zero or more occurrences of any character.
  • $ Match the end of the line.
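To see the escaped \$ at work, the sketch below runs the expression over a few sample lines. Python's re module is assumed as a stand-in for Merge's engine, and the date text in the samples is invented purely for illustration.

```python
import re

date_keyword = re.compile(r"^[ \t]*//.*\$Date:.*\$.*$")

samples = [
    "// $Date: 2024-01-15 10:30:00 $",        # expanded keyword in a comment line
    "  // Last change: $Date: yesterday $.",  # other text around the keyword
    "// $Date$",                              # unexpanded keyword: no colon, no match
    "int x; // $Date: 2024-01-15 $",          # code before the comment, so no match
]

matches = [bool(date_keyword.search(s)) for s in samples]
print(matches)
```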

Related expressions:

  • ^[ \t]*//.*\$Archive:.*\$.*$
  • ^[ \t]*//.*\$Author:.*\$.*$
  • ^[ \t]*//.*\$Header:.*\$.*$
  • ^[ \t]*//.*\$JustDate:.*\$.*$
  • ^[ \t]*//.*\$Modtime:.*\$.*$
  • ^[ \t]*//.*\$Revision:.*\$.*$
  • ^[ \t]*//.*\$Workfile:.*\$.*$

Combining expressions

Several expressions can be combined into one by using parentheses () and the | character:

(apple|^pear$)

Breakdown:

  • ( Begins a group of expressions.
  • apple Match lines containing the word apple.
  • | Match lines that contain matches for the previous expression (apple) or the next one (^pear$).
  • ^pear$ Match lines consisting of only the word pear.
  • ) Ends the group.
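The sketch below illustrates that each alternative keeps its own anchors: apple may appear anywhere, but pear must be the whole line. It assumes Python's re module as a stand-in for Merge's engine.

```python
import re

combined = re.compile(r"(apple|^pear$)")

samples = ["apple pie", "pear", "pear tree", "a pear"]

matches = [bool(combined.search(s)) for s in samples]
print(matches)
```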

This syntax enables larger expressions like the following to be constructed:

^[ \t]*//.*\$(Date|Archive|Author|Header|JustDate|Modtime|Revision|Workfile):.*\$.*$

It is almost always better for comparison performance if expressions are made as short as possible. The example above is significantly better than the following:

(^[ \t]*//.*\$Date:.*\$.*$)|
(^[ \t]*//.*\$Archive:.*\$.*$)|
(^[ \t]*//.*\$Author:.*\$.*$)|
(^[ \t]*//.*\$Header:.*\$.*$)|
(^[ \t]*//.*\$JustDate:.*\$.*$)|
(^[ \t]*//.*\$Modtime:.*\$.*$)|
(^[ \t]*//.*\$Revision:.*\$.*$)|
(^[ \t]*//.*\$Workfile:.*\$.*$)
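A short script can confirm that the compact and the long forms accept the same lines. This is a sketch only: it uses Python's re module, builds the long form by joining the eight alternatives on one line, checks just a handful of invented sample lines, and makes no attempt to measure Merge's comparison performance.

```python
import re

keywords = ["Date", "Archive", "Author", "Header",
            "JustDate", "Modtime", "Revision", "Workfile"]

# Compact form: one expression with the keywords as alternatives.
compact = re.compile(
    r"^[ \t]*//.*\$(Date|Archive|Author|Header|JustDate|Modtime|Revision|Workfile):.*\$.*$"
)

# Long form: eight complete expressions joined with |.
long_form = re.compile(
    "|".join(r"(^[ \t]*//.*\$" + k + r":.*\$.*$)" for k in keywords)
)

samples = [
    "// $Revision: 42 $",
    "  // $Author: jane $",
    "// $Id: file.cpp 42 $",     # keyword not in the list
    "x = 1; // $Date: today $",  # code before the comment
]

compact_hits = [bool(compact.search(s)) for s in samples]
long_hits = [bool(long_form.search(s)) for s in samples]
print(compact_hits, long_hits)
```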