Working with Character Encodings

Merge can compare files in ASCII, MBCS (multi-byte character set), and Unicode formats. It also supports IMEs (Input Method Editors, commonly used for East Asian languages) for editing files.

Merge uses the Unicode format for its in-memory representation of the files that it opens. Codepages are used at load time and save time to translate files between their on-disk single-byte or multi-byte character set representations and Unicode.
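The load and save translation described above can be illustrated with a short Python sketch. This is not Merge's own code: the function names load_text and save_text are hypothetical, and Windows-1252 (cp1252) is only an example codepage.

```python
def load_text(raw: bytes, codepage: str) -> str:
    """Decode on-disk bytes into an in-memory Unicode string."""
    return raw.decode(codepage)


def save_text(text: str, codepage: str) -> bytes:
    """Encode an in-memory Unicode string back into on-disk bytes."""
    return text.encode(codepage)


# "café" as stored on disk in the single-byte Windows-1252 codepage.
on_disk = b"caf\xe9"

# Load: codepage bytes -> Unicode.
in_memory = load_text(on_disk, "cp1252")

# Save with the same codepage: the original bytes are reproduced.
round_tripped = save_text(in_memory, "cp1252")

# Save with a different encoding: same text, different bytes on disk.
as_utf8 = save_text(in_memory, "utf-8")
```

Because the text is held as Unicode in memory, choosing a different codepage at save time changes only the bytes written, not the characters themselves (provided the target codepage can represent them).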

After you have opened a file in Merge, you can use the Set encoding ribbon button to change the codepage that will be used when the file is saved. The help topic for the Choose Character Encoding window describes how to use the window that is opened by this command.

Reopening a file with a different encoding

If you open a file using the incorrect character encoding, some (or all) characters may not appear correctly. In this case, you can use the Reopen File with Encoding… command in the file panel’s context menu (right-click) to reopen the file with the correct codepage. The command will display a window that enables you to choose a different codepage to use. Other files displayed within the file comparison window are not affected, and will continue to use the codepages with which they were opened.
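As a rough illustration of why characters appear incorrectly, the Python sketch below decodes the same bytes with two different codepages. The sample text and the choice of cp932 (Shift-JIS) and cp1252 are assumptions made for the example, not details of how Merge works internally.

```python
# Bytes of a Japanese string as saved in the cp932 (Shift-JIS) codepage.
original_text = "日本語"
raw = original_text.encode("cp932")

# Opening with the wrong codepage: every byte is interpreted as a
# Western European character, so the result is unreadable.
# errors="replace" substitutes U+FFFD for any byte the codepage leaves undefined.
wrong = raw.decode("cp1252", errors="replace")

# Reopening with the correct codepage recovers the text. The bytes on
# disk were never changed; only their interpretation differs.
right = raw.decode("cp932")

print(repr(wrong))
print(repr(right))
```

Nothing on disk is modified by reopening with a different encoding; the same bytes are simply read again with another codepage.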

ASCII and MBCS

Files that use the ASCII or MBCS format do not typically contain information about which encoding they use.

When you open a non-Unicode file, Merge will default to opening the file using either the default codepage for your computer’s current language settings, or the codepage that you have chosen on the Text Comparisons page in the options dialog. If you know that the file was saved using a different codepage, you can specify that codepage in the Application menu > Open (Merge 2020 and later) or File > Open (Merge 2019 or earlier) dialogs.

When saving a file, Merge will default to saving the file using the codepage that was used to load it. You can use the Application menu > Save as… (Merge 2020 and later) or File > Save as… (Merge 2019 or earlier) dialogs to choose a different codepage when saving a file.

Unicode

Unicode files typically start with a two-byte identifier (0xFFFE or 0xFEFF), known as a byte order mark (BOM), that applications can use to determine that a file uses Unicode. The identifier is not visible in text editors. Merge checks for this identifier when a file is opened. If the identifier is present, Merge opens the file as Unicode. If it is not present, Merge assumes that the file is not stored in Unicode and loads it using either the default codepage, or a specific codepage or Unicode format that you have chosen in the Application menu > Open (Merge 2020 and later) or File > Open (Merge 2019 or earlier) dialogs.
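The identifier check can be sketched in Python as follows. This is a simplified illustration, not Merge's implementation: the function names and the cp1252 fallback are hypothetical, and the sketch handles only the two-byte identifiers mentioned above (it does not distinguish other Unicode encodings).

```python
import codecs


def detect_two_byte_identifier(raw: bytes):
    """Return the encoding implied by a leading two-byte identifier, or None."""
    if raw.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE on disk
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF on disk
        return "utf-16-be"
    return None


def load(raw: bytes, fallback_codepage: str = "cp1252") -> str:
    """Decode as Unicode if the identifier is present, else use a codepage."""
    encoding = detect_two_byte_identifier(raw)
    if encoding is not None:
        # Skip the two identifier bytes so they do not appear in the text.
        return raw[2:].decode(encoding)
    # No identifier: treat the file as non-Unicode (fallback is an assumption).
    return raw.decode(fallback_codepage)


little_endian = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
big_endian = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
no_identifier = b"caf\xe9"

texts = [load(little_endian), load(big_endian), load(no_identifier)]
```

A file without the identifier falls through to the codepage path, which is why a Unicode file saved without one may need its format specified explicitly when it is opened.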

Merge supports UTF-7, UTF-8, and both big-endian and little-endian Unicode formats.
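To show how these formats differ on disk, the Python sketch below encodes one character in each of them. The codec names are Python's, and treating the big-endian and little-endian formats as UTF-16 is an assumption made for the example.

```python
text = "é"  # U+00E9

encoded = {
    "utf-7": text.encode("utf-7"),          # ASCII-only representation
    "utf-8": text.encode("utf-8"),          # two bytes: C3 A9
    "utf-16-be": text.encode("utf-16-be"),  # big-endian: 00 E9
    "utf-16-le": text.encode("utf-16-le"),  # little-endian: E9 00
}

for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")
```

The big-endian and little-endian forms contain the same two bytes in opposite order, which is the difference the identifier at the start of a Unicode file is used to signal.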