UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is backwards compatible with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.
UTF-8 encodes each character (code point) in one to four octets (8-bit bytes), with the 1-byte encoding used for the 128 US-ASCII characters. See the Description section below for details.
The Internet Engineering Task Force (IETF) requires all Internet protocols to identify the encoding used for character data, and the supported character encodings must include UTF-8. The Internet Mail Consortium (IMC) recommends that all email programs be able to display and create mail using UTF-8.