What is UTF? A comparison of UTF-8, UTF-16, and UTF-32

Before discussing UTF, we need to cover a few basics.

Human-readable text has to be encoded into a form that a machine can store and process, and various encoding systems exist for this purpose.
A few well-known encoding systems are listed below:


  • ASCII: American Standard Code for Information Interchange (US English)
  • ISO 8859-1: Western European languages
  • KOI-8: Russian
  • GB 18030 and Big5: Chinese
These encoding systems define character sets for different languages. Most of them were developed before Unicode; GB 18030 is a later standard that covers the full Unicode range.

No system is perfect, and these encoding systems have a few flaws:

  • The same code value can correspond to different letters in different encoding standards.
  • Encodings for languages with large character sets have variable length: some common characters are encoded as a single byte, while others require two or more bytes.

To resolve these problems, a new standard was developed: the Unicode system.


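Unicode was originally designed as a 16-bit system, so each character took 2 bytes. Java adopted that design, which is why a Java char is 2 bytes wide. Unicode has since grown beyond 16 bits: code points now run from U+0000 to U+10FFFF, and Java represents characters above U+FFFF as a pair of char values (a surrogate pair).

Values for a Java char (a 16-bit UTF-16 code unit):
minimum value: \u0000
maximum value: \uFFFF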
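
The sketch below checks these limits in Java and shows what happens for a character above U+FFFF. The class name CharUnitsDemo and the helper unitsFor are made up for illustration; the Character and String methods are standard JDK calls.

```java
final class CharUnitsDemo {
    // Number of 16-bit char units Java needs to hold one code point.
    static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println("Bits in a char: " + Character.SIZE);            // 16
        System.out.println("Minimum value: " + (int) Character.MIN_VALUE);  // 0
        System.out.println("Maximum value: " + (int) Character.MAX_VALUE);  // 65535

        // U+1F600 lies above U+FFFF, so it does not fit in a single char.
        String grin = new String(Character.toChars(0x1F600));
        System.out.println("length(): " + grin.length());                           // 2
        System.out.println("code points: " + grin.codePointCount(0, grin.length())); // 1
    }
}
```

Note that length() counts char units, not characters, so it reports 2 for this single character.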

The ASCII standard is limited to 128 character definitions, whereas the Unicode standard defines over 100,000 characters.


Objective of Unicode System:
Its objective is to unify the different encoding schemes so that text can be exchanged between computers without confusion. Unicode has several character encoding forms.
UTF stands for Unicode Transformation Format.

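UTF-8: It uses 8-bit (1-byte) code units. A character takes 1 to 4 bytes; English (ASCII) characters take one byte each.

UTF-16: It uses 16-bit (2-byte) code units. A character takes 2 bytes if it is in the range U+0000 to U+FFFF, and 4 bytes (a surrogate pair) otherwise.

UTF-32: It always uses 4 bytes (32 bits) to encode a character.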
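
To see the sizes side by side, here is a minimal Java sketch that measures a few sample characters. EncodedSizeDemo and its helper methods are illustrative names, not part of any library. The UTF-32 figure is computed as four bytes per code point rather than by calling an encoder, because a UTF-32 charset is not among the charsets every Java platform is required to provide.

```java
import java.nio.charset.StandardCharsets;

final class EncodedSizeDemo {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE is used so that no byte order mark is added to the count.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 is fixed width: four bytes per code point (computed, not encoded).
    static int utf32Bytes(String s) {
        return 4 * s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                    // U+0041
            "\u00E9",                               // U+00E9
            "\u20AC",                               // U+20AC
            new String(Character.toChars(0x1F600))  // U+1F600
        };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d%n",
                    s.codePointAt(0), utf8Bytes(s), utf16Bytes(s), utf32Bytes(s));
        }
    }
}
```

The four samples need 1, 2, 3 and 4 bytes in UTF-8, and 2, 2, 2 and 4 bytes in UTF-16.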


Code Points:
Unicode values are written as hexadecimal numbers with the prefix "U+".
For example: "A" is U+0041 and "a" is U+0061.
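
As a small illustration, the following Java sketch prints code points in this notation. CodePointDemo and notation are hypothetical names chosen for this example.

```java
final class CodePointDemo {
    // Formats a code point in U+ notation, padded to at least four hex digits.
    static String notation(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(notation('A'));      // U+0041
        System.out.println(notation('a'));      // U+0061
        System.out.println(notation(0x1F600));  // U+1F600
    }
}
```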

The code points are divided into 17 sections called "planes", each holding 65,536 code points.

The first plane, which contains the most commonly used characters, is known as the "Basic Multilingual Plane" (BMP).
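
Since each plane holds 65,536 (0x10000) code points, the plane number of a code point is simply its value divided by 0x10000. A short Java sketch, with PlaneDemo and planeOf as made-up names:

```java
final class PlaneDemo {
    // Plane index (0 to 16) of a code point: divide by 0x10000, i.e. shift right by 16 bits.
    static int planeOf(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + codePoint);
        }
        return codePoint >>> 16;
    }

    public static void main(String[] args) {
        System.out.println(planeOf('A'));       // 0, the first plane
        System.out.println(planeOf(0xFFFF));    // 0, last code point of the first plane
        System.out.println(planeOf(0x1F600));   // 1
        System.out.println(planeOf(Character.MAX_CODE_POINT)); // 16, the 17th plane
    }
}
```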



Basic difference between UTF-8 and UTF-16

Today, UTF encodings are used across web development languages and the internet. Among them, UTF-8 and UTF-16 are the most commonly used.


UTF-8 encodes a character using 1 to 4 bytes. ASCII characters take 1 byte (8 bits); any other character is represented by a sequence of 2 to 4 bytes.
UTF-8 is backward compatible with ASCII, yet it can encode every Unicode character, not only the ASCII set.

UTF-16 uses 2 bytes (16 bits) for characters in the Basic Multilingual Plane and 4 bytes for all others. For text that is mostly ASCII, one byte of every 2-byte unit is zero, so the text takes twice the space it would in UTF-8.
UTF-16 can also encode every Unicode character; most Latin, Cyrillic, Chinese, and Japanese characters fit in a single 2-byte unit.
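
To make the 1 to 4 byte scheme of UTF-8 concrete, here is a hand-written encoder sketch in Java. It is for illustration only (real code should use the JDK's own encoder through String.getBytes), and Utf8ByHand and encode are hypothetical names. The first byte of a multi-byte sequence announces the length, and every following byte starts with the bits 10.

```java
final class Utf8ByHand {
    // Encodes one code point into its UTF-8 byte sequence.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not encodable in UTF-8: " + cp);
        }
        if (cp < 0x80) {            // 1 byte:  0xxxxxxx (the ASCII range)
            return new byte[] { (byte) cp };
        }
        if (cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        if (cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        return new byte[] {         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F)) };
    }

    public static void main(String[] args) {
        for (int cp : new int[] { 0x41, 0xE9, 0x20AC, 0x1F600 }) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(cp)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.printf("U+%04X -> %s%n", cp, hex.toString().trim());
        }
    }
}
```

For example, U+20AC comes out as the three bytes E2 82 AC, while U+0041 stays the single byte 41, identical to its ASCII encoding.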

UTF-16 and UTF-32 each come in three variants, which differ in byte order:

BE: big-endian byte serialization (most significant byte first)

LE: little-endian byte serialization (least significant byte first)

unmarked: the byte order is indicated by a byte order mark (BOM) at the start of the data; if there is no BOM, big-endian is assumed.

This gives six names: UTF-16, UTF-32, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE
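
The effect of byte order is easy to observe in Java by encoding the letter "A" (U+0041) with the three UTF-16 charsets from StandardCharsets. ByteOrderDemo and hex are illustrative names. When encoding with the unmarked UTF-16 charset, Java writes a big-endian byte order mark (FE FF) followed by big-endian data.

```java
import java.nio.charset.StandardCharsets;

final class ByteOrderDemo {
    // Renders bytes as space-separated, two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "A"; // U+0041
        System.out.println("UTF-16BE: " + hex(a.getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println("UTF-16LE: " + hex(a.getBytes(StandardCharsets.UTF_16LE))); // 41 00
        System.out.println("UTF-16:   " + hex(a.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
    }
}
```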


