Character Encoding Systems: ASCII vs. Unicode Evolution

A character encoding system is a set of numerical values assigned to a group of alphanumeric characters, punctuation marks, and special characters. In effect, an encoding system resembles a substitution cipher, in which each element of the information corresponds to a specific code symbol. However, it is incorrect to assume that encryption systems and coding systems are the same. The difference is that a cipher contains a variable part called a key. A coding system has no variable part: each item of information is replaced with a code symbol according to a fixed table, or the replacement is defined by a function or coding algorithm.
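
The distinction between a fixed code table and a keyed cipher can be illustrated with a minimal Python sketch. The function names `encode` and `caesar` are hypothetical and chosen only for this illustration, and the Caesar shift stands in for ciphers in general; it is a toy, not a secure cipher.

```python
def encode(text: str) -> list:
    """Coding system: every character maps to a fixed number (its code value)."""
    return [ord(ch) for ch in text]


def caesar(text: str, key: int) -> str:
    """Toy cipher: the substitution depends on a variable key (uppercase A-Z only)."""
    return "".join(
        chr((ord(ch) - ord("A") + key) % 26 + ord("A")) if "A" <= ch <= "Z" else ch
        for ch in text
    )


print(encode("AB"))       # the same input always gives the same codes
print(caesar("ABC", 1))   # the result changes with the key
print(caesar("ABC", 3))
```

The encoder takes no key, so its output is fully determined by the table, whereas the cipher produces a different substitution for each key value.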

The reason underlying the creation of coding systems is the technical mismatch between human perception and computer memory. The letters and numbers that a person perceives are unreadable for a computer, so the pioneers of the computer era used the binary system to represent the entire English alphabet, the digits, and some additional symbols as combinations of zeros and ones (Zentgraf, 2015). Thus ASCII, one of the very first single-byte encoding systems, appeared: each character was encoded with seven bits of information and stored in an eight-bit byte. For example, the character “1” corresponds to the ASCII code 49, which is “0110001” in binary.
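
As a small illustration, the bit pattern of an ASCII character can be printed in Python. The helper name `ascii_bits` is an assumption made for this sketch; it relies only on the built-in `ord` and `format` functions.

```python
def ascii_bits(ch: str, width: int = 7) -> str:
    """Return the binary code of a single ASCII character, zero-padded to `width` bits."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, f"0{width}b")


print(ord("1"))            # decimal code value of the character "1"
print(ascii_bits("1"))     # seven-bit pattern
print(ascii_bits("1", 8))  # the same code stored in an eight-bit byte
```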

Historically, the ASCII system has undergone many modifications that have changed its structure (Zentgraf, 2015). Initially, the first seven bits defined the character, while the eighth bit, the parity bit, served a control function for error checking. Over time, the eighth bit came to be used for encoding additional characters, resulting in systems such as ISO-8859-1 (Zentgraf, 2015). Moreover, the evolution of encoding systems has led to a number of new systems, each designed for specific tasks. One of the most important is the Unicode standard, introduced in 1991 (Zentgraf, 2015). Unicode is not a particular encoding but rather a large table of characters, and it has given rise to a family of encodings.
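
The two uses of the eighth bit can be sketched in Python. The helper `with_even_parity` is hypothetical and assumes an even-parity scheme with the parity bit in the most significant position; real systems varied in both respects.

```python
def with_even_parity(ch: str) -> int:
    """Place an even-parity bit above the seven ASCII bits (assumed scheme)."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    parity = bin(code).count("1") % 2  # 1 if the number of set bits is odd
    return (parity << 7) | code


print(format(with_even_parity("A"), "08b"))  # "A" has two set bits, parity bit stays 0
print(format(with_even_parity("C"), "08b"))  # "C" has three set bits, parity bit becomes 1

# In ISO-8859-1 the eighth bit carries data instead: values 128-255 are extra characters.
e_acute = "\u00e9".encode("iso-8859-1")
print(e_acute[0])  # a single byte with a value above 127
```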

Because the first 128 positions of Unicode coincide with ASCII, it is sometimes loosely described as an extension of ASCII, although the term "extended ASCII" properly refers to eight-bit encodings such as ISO-8859-1. In fact, Unicode, unlike its ancestor, is a table of 1,114,112 positions, most of which are still free (Zentgraf, 2015). Unicode encodings, including UTF-8 and UTF-16, solve the problem of representing each position in the table as a specific sequence of bits. The most convenient way to compare different encoding systems is to use the table below.

Criterion for comparison	ASCII	ISO-8859-1	Unicode
Mechanism	1 character = 7 bits	1 character = 8 bits	A variable number of bits per character, depending on the encoding
Number of symbols	128	256	1,114,112
Intended use	English only	Some Western European languages	All national languages
Common features	The first 128 characters have identical code values in all three systems
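
The entries in the table can be checked with a short Python sketch using the standard `str.encode` method. The helper name `encoded_sizes` and the sample characters are assumptions chosen for illustration.

```python
# Unicode positions (code points) run from U+0000 to U+10FFFF.
UNICODE_POSITIONS = 0x10FFFF + 1


def encoded_sizes(ch: str) -> dict:
    """Return how many bytes one character occupies in UTF-8 and UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # "utf-16-be" is used so that no byte-order mark is counted.
        "utf-16": len(ch.encode("utf-16-be")),
    }


print(UNICODE_POSITIONS)
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# The first 128 characters yield the same single byte in ASCII, ISO-8859-1, and UTF-8.
same = all(
    bytes([i])
    == chr(i).encode("ascii")
    == chr(i).encode("iso-8859-1")
    == chr(i).encode("utf-8")
    for i in range(128)
)
print(same)
```

UTF-16 is left out of the final comparison because it uses at least two bytes per character, so only the code values, not the stored bytes, match ASCII there.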

Reference

Zentgraf, D. C. (2015). What every programmer absolutely, positively needs to know about encodings and character sets to work with text. Web.