ColdFusion 9.0 Resources
About character encodings

A character encoding maps each character in a character set to a numeric value that a computer can represent. These numbers can be represented by a single byte or by multiple bytes. For example, the ASCII encoding uses 7 bits to represent the Latin alphabet, punctuation, and control characters. Japanese encodings, such as Shift-JIS, EUC-JP, and ISO-2022-JP, represent Japanese text. These encodings differ slightly, but they share a common set of approximately 10,000 characters used in Japanese. The following terms apply to character encodings:
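To make "maps each character to a numeric value" concrete, the following plain Java sketch prints the bytes that several encodings produce for the same characters. It is an illustration only, not a ColdFusion API: the class name `EncodingBytes` and its `hex` helper are hypothetical, and the charset names `Shift_JIS`, `EUC-JP`, and `ISO-2022-JP` are JDK charset names whose availability, like the encodings ColdFusion supports, depends on the JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the byte values an encoding assigns to characters.
class EncodingBytes {

    // Encodes text with the given charset and returns the bytes as
    // space-separated uppercase hex pairs, for example "82 A0".
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ASCII: one byte per character, values 0x00-0x7F (7 bits).
        System.out.println("A       US-ASCII:    " + hex("A", StandardCharsets.US_ASCII));

        // HIRAGANA LETTER A (U+3042) gets a different byte sequence in each Japanese encoding.
        String a = "\u3042";
        System.out.println("U+3042  Shift_JIS:   " + hex(a, Charset.forName("Shift_JIS")));
        System.out.println("U+3042  EUC-JP:      " + hex(a, Charset.forName("EUC-JP")));
        // ISO-2022-JP wraps the character in escape sequences that switch character sets.
        System.out.println("U+3042  ISO-2022-JP: " + hex(a, Charset.forName("ISO-2022-JP")));
    }
}
```

The same character is `82 A0` in Shift-JIS and `A4 A2` in EUC-JP, which is why text must be decoded with the encoding it was written in.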
The Java Unicode character encoding

ColdFusion uses the Java Unicode Standard to represent character data internally. This standard corresponds to the UCS-2 encoding of the Unicode character set (strictly, Java uses UTF-16, which is identical to UCS-2 for characters in the Basic Multilingual Plane). The Unicode character set can represent many languages, including all major European and Asian character sets, so ColdFusion can receive, store, process, and present text in any language that Unicode supports.

The Java Virtual Machine (JVM) that processes ColdFusion pages converts text from the character encoding of a ColdFusion page or other data source to UCS-2. The page and data encodings that ColdFusion supports depend on the specific JVM, but they include most encodings used on the web. Similarly, the JVM converts from its internal UCS-2 representation to the character encoding used to send the response to the client.

By default, ColdFusion uses UTF-8 to represent text data sent to a browser. UTF-8 represents the Unicode character set with a variable-length encoding: ASCII characters are sent as a single byte, most European and Middle Eastern characters as 2 bytes, and most Japanese, Korean, and Chinese characters as 3 bytes. One advantage of UTF-8 is that it sends ASCII data in a form that systems designed to process only single-byte ASCII characters recognize, while remaining flexible enough to handle multibyte characters.

Although ColdFusion returns text data as UTF-8 by default, you can have it return a page in any character set that Java supports. For example, you can return text in the Japanese Shift-JIS character set. Similarly, ColdFusion can handle incoming data in many different character sets. For more information, see Determining the page encoding of server output.
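The byte counts quoted for UTF-8 can be checked directly on the JVM. This sketch, which uses only standard `java.nio.charset` classes and a hypothetical class name `Utf8Lengths`, shows that each sample character occupies one 16-bit Java `char` internally but a different number of bytes once encoded as UTF-8. It also converts a CJK character to Shift-JIS and back, as happens when a page is produced in a non-default encoding; it does not show how ColdFusion itself selects the output encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: internal UTF-16 chars vs. encoded UTF-8 bytes.
class Utf8Lengths {

    // Number of bytes UTF-8 needs for the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes text to bytes in the target charset and decodes them with the
    // same charset, as when a page is written and read in one encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",      // ASCII letter
            "\u00E9", // LATIN SMALL LETTER E WITH ACUTE
            "\u05D0", // HEBREW LETTER ALEF
            "\u65E5"  // CJK ideograph meaning "sun" or "day"
        };
        for (String s : samples) {
            // Each of these BMP characters is a single 16-bit char inside the JVM.
            System.out.printf("U+%04X: %d Java char(s), %d UTF-8 byte(s)%n",
                    (int) s.charAt(0), s.length(), utf8Length(s));
        }

        // Shift_JIS can represent this ideograph, so the round trip is lossless.
        Charset sjis = Charset.forName("Shift_JIS");
        System.out.println(roundTrip("\u65E5", sjis).equals("\u65E5"));
    }
}
```

The ASCII letter takes 1 byte, the accented Latin and Hebrew letters 2 bytes, and the ideograph 3 bytes, matching the variable-length behavior described above.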
Character encoding conversion issues

Because different character encodings support different character sets, you can encounter errors if your application gets text in one encoding and presents it in another. For example, the Windows Latin-1 character encoding, Windows-1252, assigns printable characters to most of the hexadecimal range 80-9F, while ISO 8859-1 has no printable characters in that range (it reserves those values for control codes). As a result, under the following circumstances, characters in the range 80-9F, such as the euro symbol (€), are not displayed properly:
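The euro-sign problem can be reproduced in a few lines of plain Java. In this sketch (the class `EuroMismatch` and its `transcode` helper are hypothetical names, and `windows-1252` is the JDK's name for the Windows Latin-1 encoding), the euro sign is written as Windows-1252 and then read back either as ISO 8859-1 or as Windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the Windows-1252 / ISO 8859-1 mismatch for the euro sign.
class EuroMismatch {

    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Encodes text with one charset and decodes the resulting bytes with another,
    // mimicking data written in one encoding and displayed in a different one.
    static String transcode(String text, Charset written, Charset read) {
        return new String(text.getBytes(written), read);
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN

        // Windows-1252 stores the euro sign as the single byte 0x80.
        byte[] bytes = euro.getBytes(WINDOWS_1252);
        System.out.printf("windows-1252: %d byte, 0x%02X%n", bytes.length, bytes[0] & 0xFF);

        // Displaying that byte as ISO 8859-1 yields U+0080, a control code, not the euro sign.
        String misread = transcode(euro, WINDOWS_1252, StandardCharsets.ISO_8859_1);
        System.out.printf("read as ISO-8859-1: U+%04X%n", (int) misread.charAt(0));

        // ISO 8859-1 has no euro sign, so String.getBytes substitutes '?'.
        System.out.println("encoded as ISO-8859-1: "
                + transcode(euro, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));

        // Using the same encoding for input and display preserves the character.
        System.out.println(transcode(euro, WINDOWS_1252, WINDOWS_1252).equals(euro));
    }
}
```

Only the last case, where the reading encoding matches the writing encoding, returns the euro sign intact.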
Similar issues can arise when you convert between other character encodings; for example, if you read files encoded in the Japanese Windows default encoding (Windows-31J, also known as code page 932, a variant of Shift-JIS with additional characters and some different mappings) and display them using Shift-JIS. To prevent these problems, make sure that the display encoding is the same as the input encoding.
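A well-known instance of the Japanese case is the byte pair `0x81 0x60`. The following sketch decodes it with the JDK's `windows-31j` charset (its name for the Japanese Windows default encoding) and with its `Shift_JIS` charset. It assumes the mapping tables shipped with current OpenJDK releases, in which the two charsets turn these bytes into different Unicode characters (FULLWIDTH TILDE and WAVE DASH respectively); the class name `JapaneseMismatch` is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: the same bytes decoded with two Japanese encodings.
class JapaneseMismatch {

    // Decodes raw bytes with the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Byte pair as it might appear in a file saved with the Japanese Windows default encoding.
        byte[] bytes = { (byte) 0x81, (byte) 0x60 };

        String asWindows31j = decode(bytes, "windows-31j");
        String asShiftJis = decode(bytes, "Shift_JIS");

        // Assumption: the JDK maps these bytes to different code points in the two
        // charsets, so text read with one and written with the other can change.
        System.out.printf("windows-31j: U+%04X%n", (int) asWindows31j.charAt(0));
        System.out.printf("Shift_JIS:   U+%04X%n", (int) asShiftJis.charAt(0));
        System.out.println("same character: " + asWindows31j.equals(asShiftJis));
    }
}
```

Reading and displaying the file with the same charset avoids the discrepancy, which is the rule stated above.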