What is Unicode

Wednesday, 15 July 2009 20:13

Over the course of the past year I have found myself explaining Unicode many times.  I have written various introductions for different groups.  While I was in Liberia recently I put together an introduction for LIBTRALO and decided to post it on my blog for all who may be interested.  You can also download a better designed version of this article from What is Unicode.pdf.

To understand what Unicode is, we need to understand how computers think. They only think in numbers. Letters are just a representation for humans of what the numbers are. These numbers are the codes for the corresponding letter.
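
The idea that letters are really numbers can be seen in a few lines of Python. This is only a small sketch: the sample word is chosen for illustration, and `ord` and `chr` are Python's built-in functions for going from a letter to its number and back.

```python
# Each letter is stored as a number (its code).
# ord() shows the number behind a letter.
word = "Liberia"  # sample word, chosen only for illustration
codes = [ord(letter) for letter in word]
print(codes)

# chr() goes the other way: from a number back to a letter.
restored = "".join(chr(code) for code in codes)
print(restored)
```

The computer stores and compares the numbers; the letters on screen are simply how those numbers are shown to us.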

Since the inception of computers there have been many standards for how letters should be coded. The problem is that each was designed for a specific major language. Each of these legacy standards could only handle a maximum of 256 characters.

People who worked in minority languages, as LIBTRALO does, had to develop their own standards and in so doing created uniquely modified fonts. This worked fine within a language group, but data could not be shared outside of that group without also passing along the necessary fonts and keyboards. Sometimes even two organizations working in the same language could not share their files.
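
The sharing problem can be sketched in Python. The two legacy standards used here, Latin-1 and Windows-1251, are my own choice of example and are not the ones any particular group in Liberia used. The point is only that the same stored number turns into a different letter depending on which standard the reader assumes.

```python
# A legacy one-byte standard has only 2**8 = 256 possible codes.
print(2 ** 8)

# One stored number, read under two different legacy standards.
stored = bytes([0xE9])
as_western = stored.decode("latin-1")   # a Western European standard
as_cyrillic = stored.decode("cp1251")   # a Cyrillic standard
print(as_western, as_cyrillic)

# The two readers see different letters for the same data.
print(as_western == as_cyrillic)
```

A custom-modified font causes the same kind of mismatch: without the matching font, the numbers in the file no longer show the letters the writer intended.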

To combat these problems, a new system was developed that assigns a distinct code to every character in every language in the world. Unicode has room for more than 1.1 million different characters (1,114,112 code points in all).
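
As a rough illustration in Python, characters from quite different writing systems each get their own code, and the range of available codes can be read from `sys.maxunicode`. The sample characters are my own picks.

```python
import sys

# Characters from different scripts, each with its own distinct code.
samples = ["a", "\u0254", "\u0439", "\u03b9"]  # a, open o, Cyrillic short i, Greek iota
codes = {char: ord(char) for char in samples}
for char, code in codes.items():
    print(char, f"U+{code:04X}")

# The highest code Unicode allows is U+10FFFF.
print(hex(sys.maxunicode))

# Counting from zero, that is the total number of available codes.
total = sys.maxunicode + 1
print(total)
```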

All of this is to say that, with Unicode, there is now a level playing field. Unicode is an international standard. It allows data to be shared among various organizations in Liberia and throughout the world. Once the linguistic analysis is done in a given language the information can then be shared with missionaries and other organizations working in similar languages.

Benefits of Unicode:

  • As a well-developed international standard, it works across almost all computer programs and operating systems throughout the world. Something that is typed in Microsoft Word on a Windows computer in Liberia can be read in OpenOffice on a Macintosh in China. Information can be easily shared. It even works on the internet, so web pages can be published in any language.
  • Because it is an international standard, there are more choices available to us when we develop materials using Unicode. We are not limited to one font, but now have many choices. The best ones are from SIL, but even companies like Microsoft and Adobe are now developing very nice fonts that display Liberian characters.
  • Unicode improves our ability to archive. It used to be that when you archived a document, you had to make sure that you saved the specific font with that document so that it could be read. Now, if the file is saved using Unicode you can be sure that when it is opened again by somebody else it will be readable.
  • Unicode is not simply about how a character looks. It also knows what a character is. Take for example the “ɔ”. In Unicode it is stored as character number U+0254, which is also known as “LATIN SMALL LETTER OPEN O”. Linguistic analysis tools know that this is a vowel and what sound it makes.
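
The last two benefits can be tried out with Python's standard `unicodedata` module. This is a minimal sketch: it shows the name and general category that Unicode records for “ɔ”, and then a save-and-reopen round trip using UTF-8, one common way of storing Unicode text. Whether a letter is a vowel is knowledge held by the analysis tools themselves and is not shown here.

```python
import unicodedata

char = "\u0254"  # the open o

# The character number in the usual U+ notation.
print(f"U+{ord(char):04X}")

# What Unicode records about the character itself.
name = unicodedata.name(char)          # official character name
category = unicodedata.category(char)  # "Ll" means lowercase letter
print(name, category)

# Saving as UTF-8 and reading it back gives the same character.
saved = char.encode("utf-8")
reopened = saved.decode("utf-8")
print(reopened == char)
```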

More Information:
The Unicode Consortium (http://www.unicode.org/standard/principles.html)
Non-Roman Script Initiative (http://scripts.sil.org/unicode)
Keyboard Manager (http://www.tavultesoft.com)

