Working with Character Encodings

Merge can compare files stored in a variety of character encodings, including ASCII and Unicode. It also supports Input Method Editors (IMEs), such as those used to enter Chinese, Japanese, and Korean text, when you edit files.

Merge uses Unicode for its in-memory representation of the files that it opens. Other encodings are used at load and save time to translate files between their on-disk single-byte or multibyte character set representations and Unicode.
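The load/save translation described above can be illustrated with a minimal Python sketch. It is not Merge's implementation: Python's str type simply stands in for the in-memory Unicode representation, and the helper names load_text and save_text are hypothetical.

```python
def load_text(raw: bytes, encoding: str) -> str:
    """Translate on-disk bytes into an in-memory Unicode string."""
    return raw.decode(encoding)


def save_text(text: str, encoding: str) -> bytes:
    """Translate the in-memory Unicode string back into on-disk bytes."""
    return text.encode(encoding)


# In a Latin-1 file, 'é' is stored as the single byte 0xE9...
disk_bytes = b"caf\xe9"
text = load_text(disk_bytes, "latin-1")
print(text, len(text))           # café 4

# ...but saving the same text as UTF-8 writes 'é' as two bytes.
print(save_text(text, "utf-8"))  # b'caf\xc3\xa9'
```

Because the text is held as Unicode in between, the encoding used to save a file does not have to match the one used to open it.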

After you have opened a file in Merge, you can change the encoding that will be used when the file is saved by clicking the Set Encoding toolbar button, or by choosing File › Set Character Encoding… from the menu bar. The help topic for the Choose Encoding sheet describes how to use the sheet that appears.

Reopening a file with a different character encoding

If you open a file using an incorrect encoding, some (or all) characters may not appear correctly. In this case, you can use the Reopen File with Encoding… command in the file panel’s context menu (right-click) to reopen the file with the correct encoding. The command displays a sheet in which you can choose a different encoding. Other files displayed in the file comparison window are not affected and continue to use the encodings with which they were opened.
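As an illustration of why an incorrect encoding garbles text, this Python sketch (not Merge code) decodes the same UTF-8 bytes twice: once with the wrong encoding (Latin-1, chosen here only as an example) and once with the correct one.

```python
# The file on disk was saved as UTF-8: each accented letter takes two bytes.
raw = "naïve résumé".encode("utf-8")

# Opening it as Latin-1 maps every byte to its own character,
# so each two-byte sequence turns into two unrelated characters.
wrong = raw.decode("latin-1")
print(wrong)  # naÃ¯ve rÃ©sumÃ©

# Reopening the same bytes with the correct encoding restores the text.
right = raw.decode("utf-8")
print(right)  # naïve résumé
```

Reopening works because the bytes on disk are unchanged; only the interpretation applied to them differs.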

Non-Unicode encodings

Files that use non-Unicode encodings (e.g. ASCII) typically do not contain information about the encoding that they use.

When you open a non-Unicode file, Merge will default to opening the file using the encoding that you have chosen on the Encoding page in the Settings window. If you know that the file was saved using a different encoding, you can specify that encoding in the Open File… sheets.

Some newer text editors and other applications, including Araxis Merge, can use an extended attribute called com.apple.TextEncoding to store and retrieve encoding information for text files. If this attribute is present on a file, Araxis Merge uses it when opening the file, overriding the encoding setting on the Encoding page in the Settings window. Note, however, that the content of the file may override even this and cause a different encoding to be used: specifically, if the file begins with a Unicode byte-order mark, or if it is an XML, HTML, or Windows resource file that contains an encoding directive.

When saving a file, Merge will default to saving the file using the encoding that was used to open it, or an encoding that was subsequently chosen using the Choose Encoding sheet. You can use the Save File As… sheet to choose a different code page when saving a file. Araxis Merge will automatically set the com.apple.TextEncoding extended attribute when saving a file.
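A save step that also records the encoding might look like the following Python sketch. The function save_with_encoding and its lookup table are hypothetical and cover only three encodings; the numbers are CoreFoundation CFStringEncoding constants (kCFStringEncodingUTF8, kCFStringEncodingMacRoman, kCFStringEncodingWindowsLatin1), and the "<name>;<number>" value format is an assumption. The sketch only builds the attribute value; on macOS the value could then be written with, for example, xattr -w com.apple.TextEncoding 'UTF-8;134217984' file.txt.

```python
from typing import Dict, Tuple

# Illustrative subset: Python codec name -> (charset name, CFStringEncoding value).
_CF_ENCODINGS: Dict[str, Tuple[str, int]] = {
    "utf-8": ("UTF-8", 0x08000100),       # kCFStringEncodingUTF8
    "mac_roman": ("MACINTOSH", 0),        # kCFStringEncodingMacRoman
    "cp1252": ("WINDOWS-1252", 0x0500),   # kCFStringEncodingWindowsLatin1
}


def save_with_encoding(text: str, encoding: str) -> Tuple[bytes, str]:
    """Encode the in-memory text and build a matching com.apple.TextEncoding value."""
    data = text.encode(encoding)
    # Raises KeyError for encodings outside this illustrative table.
    name, cf_number = _CF_ENCODINGS[encoding]
    return data, f"{name};{cf_number}"


data, attr = save_with_encoding("café", "utf-8")
print(data)  # b'caf\xc3\xa9'
print(attr)  # UTF-8;134217984
```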

Unicode

Unicode files often start with a two-byte byte-order mark (BOM), stored as the bytes 0xFF 0xFE (little-endian) or 0xFE 0xFF (big-endian), which applications can use to determine whether a file uses Unicode and in which byte order. The mark is not normally visible in text editors. Merge checks for this mark when it opens a file. If the mark is present, Merge opens the file with the matching Unicode encoding. If the mark is not present, and the com.apple.TextEncoding extended attribute is not set on the file, Merge assumes that the file is not stored in Unicode and loads it using the default encoding, the specific encoding that you have chosen in the Open File… sheet, or an encoding specified in the file’s contents.
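The byte-order mark check can be sketched in Python as shown below. This is an illustration, not Merge's code: detect_bom and open_text are hypothetical names, and the sketch handles only the two-byte UTF-16 marks described above (a UTF-32 little-endian mark, which also begins with 0xFF 0xFE, would be misread).

```python
import codecs
from typing import Optional, Tuple


def detect_bom(raw: bytes) -> Optional[Tuple[str, int]]:
    """Return (codec name, mark length) if raw starts with a UTF-16 byte-order mark."""
    if raw.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE: little-endian
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    if raw.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF: big-endian
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    return None


def open_text(raw: bytes, fallback: str) -> str:
    found = detect_bom(raw)
    if found:
        codec, skip = found
        # Decode after the mark so that U+FEFF does not appear in the text.
        return raw[skip:].decode(codec)
    # No mark: treat the file as non-Unicode and use the fallback encoding.
    return raw.decode(fallback)


le = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
be = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")
print(le.hex(" "))  # ff fe 48 00 69 00
print(be.hex(" "))  # fe ff 00 48 00 69
print(open_text(le, "mac_roman"), open_text(be, "mac_roman"))  # Hi Hi
```

The two byte sequences differ only in the order of the bytes within each 16-bit unit, which is why the mark is needed to tell big-endian and little-endian files apart.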

Merge supports both big- and little-endian Unicode formats.