What is Unicode?

We keep hearing the word 'Unicode', and people say that it is better if a font is 'Unicode'. What is this 'Unicode' thing, and why is it better?
Unicode provides a unique number for every character,
If we want to understand Unicode, we need to have some idea of how a computer uses characters ("letters" or "fonts").

How does a computer recognize and display characters?

A computer does not know anything about "letters" or "symbols". All a computer understands is numbers. We come to believe that the computer knows about letters because it can write words on the screen, but the computer is really only translating a sequence of codes (numbers) into drawings on the screen (the letters that you see). For example, if the computer wants to show the word "cat" on the screen, it finds the codes for the characters "c", "a", and "t" in its memory and replaces them on the screen with the drawings that look like "c", "a", and "t".

What are these codes? Originally it was decided to use the numbers from 0-127 (that's 7 bits, or 128 codes) to represent a different symbol each. So for example, the lower-case letters "c", "a", and "t" were given the codes 99, 97, and 116, respectively. (The upper-case letters "C", "A", and "T" have their own codes: 67, 65, and 84.) That worked well in the beginning, when the only language that computer scientists cared about was English. They didn't need more than 128 codes to show all the characters in the English language: 26 for lower-case letters, 26 for upper-case letters, 10 for digits, and plenty extra for symbols and "control characters".

What happens when we go beyond English?

But English isn't the only language in the world! As time went on, it became evident that people wanted to use languages other than English on their computers. The range of codes was expanded to 0-255 (8 bits), and most of the European symbols were added. This range is what you will most often see loosely referred to as "ASCII" today, although strictly speaking ASCII covers only 0-127 and the upper half is an extension.

Now that worked fine for European languages. But there are many more languages besides just the European ones, with many, many more characters! 256 codes are not enough to give each of the world's symbols (characters) a unique number.

This is where Unicode steps in ... It has room for more than a MILLION (!) codes: 1,114,112 of them, numbered 0 through 1,114,111. So every symbol in every script can be given its own, unique number. (Technically speaking: a Unicode code fits in 21 bits. One storage format, UTF-32, uses a full 32 bits for each symbol, and 32 bits could in principle count past 4 billion, but Unicode does not use any number above 1,114,111.)
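To make the idea of "characters are just numbers" concrete, here is a minimal Python sketch. It uses only built-in functions (`ord`, `chr`) and the standard `sys` and `unicodedata` modules. The variable names, and the choice of the Tibetan letter KA as a sample character, are ours for illustration and are not from the original article.

```python
import sys
import unicodedata

# ord() gives the number the computer stores for a character.
codes = [ord(ch) for ch in "cat"]        # lower-case letters
upper_codes = [ord(ch) for ch in "CAT"]  # upper-case letters have other codes

# chr() goes the other way: from a number back to the character.
rebuilt = "".join(chr(n) for n in codes)

# The largest code Unicode allows (0x10FFFF).
max_code = sys.maxunicode

# A character far outside the old 0-255 range still has its own number.
tibetan_ka = "\u0f40"  # sample character chosen for this sketch
tibetan_code = ord(tibetan_ka)
tibetan_name = unicodedata.name(tibetan_ka)

print(codes, upper_codes, rebuilt)
print(max_code, tibetan_code, tibetan_name)
```

Running it shows that "cat" and "CAT" are stored as different number sequences, and that a Tibetan letter simply has a larger number than any English letter.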
The Tibetan letters and symbols have been given the range
Why is it good that Tibetan (and other scripts) have their own numbers?

Remember that the computer stores the word "cat" as the numbers 99, 97, and 116. If you use a non-Unicode Tibetan font, the numbers 99, 97, and 116 would be drawn on the screen as some Tibetan characters and not as "c-a-t". Or, if I choose another language's non-Unicode font, the symbols drawn on the screen would be totally different again. These nonsense letters are what you see in a browser or other program when it does not recognize the font. This means that the user has to tell the computer what language the document is in, because the computer does not know whether the document is English, Tibetan, or Hindi ... it just draws the symbols from the selected font for the given numbers.

But if we now use Unicode fonts exclusively, we would find that the letters "c-a-t" are always drawn on the screen, because the computer uses different codes for the Tibetan, English, and Hindi characters. So the computer can now tell that the text is English, because it uses only the codes assigned to the English letters.

Why is all this important?
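One way to see why this matters is to watch the same stored number turn into different letters. The Python sketch below is an analogy, not the font mechanism itself: it uses three legacy 8-bit encodings (`latin-1`, `cp1251`, `iso-8859-7`) to stand in for the non-Unicode fonts described above, since both reuse the same small range of numbers for different scripts. The variable names are ours.

```python
# One stored byte with the value 233, which lies outside plain ASCII.
raw = bytes([0xE9])

# The same number, read under three different legacy code pages.
as_western = raw.decode("latin-1")      # Western European
as_cyrillic = raw.decode("cp1251")      # Cyrillic
as_greek = raw.decode("iso-8859-7")     # Greek

# In Unicode, each of those letters has its own, distinct number,
# so no extra hint about the language is needed to tell them apart.
unicode_codes = [ord(ch) for ch in (as_western, as_cyrillic, as_greek)]

# Plain English text is unaffected: 99, 97, 116 is still "cat".
english = bytes([99, 97, 116]).decode("ascii")

print(as_western, as_cyrillic, as_greek)
print(unicode_codes, english)
```

With the legacy encodings, the reader has to know in advance which one was intended. With Unicode numbers, the three letters can never be confused with each other.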
Some other questions
What is