Character sets, character encodings, and locales

Two topics are central to any discussion of globalization: the character sets or character encodings that the application recognizes, and the locales for which the application must format data.

A character set is a collection of characters. For example, the Latin alphabet is the character set that you use to write English, and it includes all of the lower- and uppercase letters from A to Z. A character set for French includes the character set used by English, plus special characters such as “é,” “à,” and “ç.”

The Japanese language uses three scripts: Hiragana, Katakana, and Kanji. Hiragana and Katakana are phonetic syllabaries that each contain 46 basic characters plus two diacritical marks. Kanji consists of Chinese ideographs adapted to the Japanese language. Japanese therefore uses a much larger character set than English: more than 10,000 different characters.

For a ColdFusion application to process text, the application must recognize the character set used by the text. The character encoding maps between the characters in a character set and the digital codes (byte values) used to represent them.
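The following minimal sketch illustrates why the encoding must be recognized correctly. It is written in Python rather than CFML, purely as an illustration, and the sample word and variable names are assumptions chosen for the example. The same stored bytes yield the right text when decoded with the encoding that produced them, and garbled text otherwise.

```python
# Illustrative sketch only: Python, not CFML. The sample text is an assumption.
original = "français"

# Store the text as bytes using the UTF-8 encoding.
stored = original.encode("utf-8")

# Decoding with the same encoding recovers the original text.
right = stored.decode("utf-8")

# Decoding with a different encoding misreads the two bytes of "ç"
# as two separate ISO-8859-1 characters.
wrong = stored.decode("iso-8859-1")

print(right)  # français
print(wrong)  # franÃ§ais
```

No error is raised in the second case, which is why a mismatched encoding usually shows up as corrupted characters on the page rather than as a failure.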

In general use, the terms character set (or charset) and character encoding are often used interchangeably, and most often a specific character encoding encodes one character set. However, this is not always true; for example, there are multiple encodings of the Unicode character set. For more information on character encodings, see About character encodings.
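As a rough illustration of multiple encodings of one character set, the Python sketch below (Python is used for illustration only; the helper name `encoded_lengths` and the sample string are hypothetical) encodes the same three Unicode characters with three Unicode encodings and reports how many bytes each one produces.

```python
# Illustrative sketch only: one Unicode string, three Unicode encodings.
def encoded_lengths(text: str) -> dict[str, int]:
    """Return the byte length of `text` under several Unicode encodings."""
    charsets = ("utf-8", "utf-16-be", "utf-32-be")
    return {charset: len(text.encode(charset)) for charset in charsets}


sample = "日本語"  # three Kanji characters
for charset, size in encoded_lengths(sample).items():
    print(f"{charset}: {size} bytes")
# utf-8: 9 bytes
# utf-16-be: 6 bytes
# utf-32-be: 12 bytes
```

The characters are identical in all three cases; only the byte representation differs, which is the distinction between a character set and a character encoding.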

Note: ColdFusion uses the term charset to indicate character encoding in some attribute names, structure field keys, and function parameter names.

A locale identifies the exact language and cultural settings for a user. The locale controls how dates, times, currency values, and numeric data are formatted for display. For example, the locale English (US) determines that a currency value displays as:

$100,000.00

while a locale of Portuguese (Brazilian) displays the currency as:

R$ 100.000,00

To correctly display date, time, currency, and numeric data to your customers, you must know the customer’s locale. For more information on locales, see Locales.
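To make the effect of a locale concrete, here is a minimal sketch in Python (for illustration only; it does not use ColdFusion's own formatting functions). The `CONVENTIONS` table and the `format_currency` helper are hypothetical, and the separator rules are hard-coded assumptions for two locales; a real application would rely on the platform's locale support instead. The sketch handles non-negative amounts only and always shows two decimal places.

```python
from decimal import Decimal

# Hypothetical, hard-coded conventions for illustration only.
CONVENTIONS = {
    "English (US)": {"symbol": "$", "space": "", "group": ",", "decimal": "."},
    "Portuguese (Brazilian)": {"symbol": "R$", "space": " ", "group": ".", "decimal": ","},
}


def format_currency(amount: Decimal, locale_name: str) -> str:
    """Format a non-negative amount using the named locale's conventions."""
    rules = CONVENTIONS[locale_name]
    # Format with US-style separators first, then swap in the locale's own.
    whole, fraction = f"{amount:,.2f}".split(".")
    whole = whole.replace(",", rules["group"])
    return f"{rules['symbol']}{rules['space']}{whole}{rules['decimal']}{fraction}"


value = Decimal("100000")
print(format_currency(value, "English (US)"))            # $100,000.00
print(format_currency(value, "Portuguese (Brazilian)"))  # R$ 100.000,00
```

The numeric value never changes; only its presentation does, which is why the locale has to be known at display time rather than when the data is stored.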