Working with Character Encodings

Merge enables you to compare ASCII, MBCS (multi-byte character set), and Unicode format files. Merge also supports IMEs (Input Method Editors, used in the Far East) for editing files.

Merge uses the Unicode format for its in-memory representation of the files that it opens. Code pages are used at load time and save time to translate files from their disk-based single-byte or multi-byte character set representations into Unicode and vice versa.
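The load and save translation described above can be illustrated outside Merge. The following Python sketch is not Merge's implementation. It uses Python's standard codecs, and the function names (load_text, save_text) and the choice of code page 1252 are assumptions made purely for illustration.

```python
def load_text(raw: bytes, code_page: str) -> str:
    """Translate on-disk bytes into an in-memory Unicode string."""
    return raw.decode(code_page)


def save_text(text: str, code_page: str) -> bytes:
    """Translate an in-memory Unicode string back into on-disk bytes."""
    return text.encode(code_page)


# The word "café" as stored on disk in Windows code page 1252 (one byte per character).
on_disk = b"caf\xe9"

# Load: code page bytes become Unicode text in memory.
in_memory = load_text(on_disk, "cp1252")

# Save with the same code page: the original bytes are reproduced.
round_trip = save_text(in_memory, "cp1252")

# Save with a different encoding: same text, different bytes on disk.
as_utf8 = save_text(in_memory, "utf-8")
```

The point of the sketch is that the text in memory is independent of the bytes on disk; only the code page chosen at load or save time determines how one maps to the other.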

After you have opened a file in Merge, you can use the Set encoding ribbon button to change the code page that will be used when the file is saved. The help topic for the Choose Character Encoding window describes how to use the window that is opened by this command.

Reopening a file with a different encoding

If you open a file using the incorrect character encoding, some (or all) characters may not appear correctly. In this case, you can use the Reopen File with Encoding… command in the file panel’s context menu (right-click) to reopen the file with the correct code page. The command will display a window that enables you to choose a different code page to use. Other files displayed within the file comparison window are not affected, and will continue to use the code pages with which they were opened.
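To see why a file opened with the wrong code page displays incorrectly, consider the following Python sketch. It is an illustration only, not a description of Merge internals. The sample text and the two code pages (1251 for Cyrillic, 1252 for Western European) are assumptions chosen for the example.

```python
original = "Привет"

# Bytes as they would be stored on disk by a program using code page 1251.
on_disk = original.encode("cp1251")

# Opening with the wrong code page: every byte maps to an unrelated character.
wrong = on_disk.decode("cp1252", errors="replace")

# Reopening with the correct code page recovers the intended text.
right = on_disk.decode("cp1251")

print(wrong)
print(right)
```

The bytes on disk never change; reopening simply reinterprets them with a different code page.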

ASCII and MBCS

Files that use the ASCII or MBCS format do not typically contain information about which encoding they use.

When you open a non-Unicode file, Merge will default to opening the file using either the default code page for your computer’s current language settings, or the code page that you have chosen on the Text Comparisons page in the options dialog. If you know that the file was saved using a different code page, you can specify that code page in the Application menu > Open (Merge 2020 and later) or File > Open (Merge 2019 or earlier) dialogs.

When saving a file, Merge will default to saving the file using the code page that was used to load it. You can use the Application menu > Save as… (Merge 2020 and later) or File > Save as… (Merge 2019 or earlier) dialogs to choose a different code page when saving a file.

Unicode

Unicode files typically start with a two-byte identifier (0xFFFE or 0xFEFF) that can be used by applications to determine that a file uses Unicode. The identifier is not visible in text editors. Merge will check for this identifier when a file is opened. If the identifier is present, Merge will open the file as Unicode. If the identifier is not present, Merge will assume that the file is not stored in Unicode and will load it using either the default code page, or a specific code page/Unicode format that you have chosen in the Application menu > Open (Merge 2020 and later) or File > Open (Merge 2019 or earlier) dialogs.
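The identifier check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Merge's actual detection logic: the function names and the fallback code page (1252) are hypothetical, and only the two-byte identifier mentioned above is examined.

```python
import codecs
from typing import Optional


def detect_utf16_bom(raw: bytes) -> Optional[str]:
    """Return the UTF-16 codec implied by a leading two-byte identifier, or None."""
    if raw.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE: little-endian
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF: big-endian
        return "utf-16-be"
    return None


def load(raw: bytes, fallback_code_page: str = "cp1252") -> str:
    """Decode as Unicode if the identifier is present, else use a code page."""
    encoding = detect_utf16_bom(raw)
    if encoding is None:
        # No identifier: assume a non-Unicode file (fallback is an assumption).
        return raw.decode(fallback_code_page)
    # Skip the two identifier bytes so they do not appear in the text.
    return raw[2:].decode(encoding)


unicode_file = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
code_page_file = b"caf\xe9"
```

Here load(unicode_file) takes the Unicode path, while load(code_page_file) falls back to the code page because no identifier is present.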

Merge supports UTF-7, UTF-8, and both big-endian and little-endian Unicode formats.
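As a rough illustration of how these formats differ on disk, the Python sketch below encodes the same short string in each of them using Python's standard codecs. The sample string is an arbitrary choice, and the sketch says nothing about how Merge itself writes these formats.

```python
text = "A€"  # one ASCII character and one non-ASCII character (U+20AC)

encodings = ["utf-7", "utf-8", "utf-16-be", "utf-16-le"]

# Encode the same text in each format and keep the resulting bytes.
encoded = {name: text.encode(name) for name in encodings}

for name, raw in encoded.items():
    # Show the bytes in hexadecimal so the differences are visible.
    print(f"{name:10} {raw.hex(' ')}")

# Every format decodes back to the identical Unicode text.
decoded = {name: raw.decode(name) for name, raw in encoded.items()}
```

The byte sequences differ between formats, but each decodes back to the same text.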