
What is UTF? A comparison of UTF-8, UTF-16, and UTF-32










Before discussing UTF, we need to cover a few basics.

Human-readable text has to be encoded into a form that a machine can store and process. Various encoding systems exist for this purpose.
A few well-known encoding systems are listed below:


  • ASCII: American Standard Code for Information Interchange (for the United States)
  • ISO 8859-1: for Western European languages
  • KOI-8: for the Russian language
  • GB 18030 and Big5: for the Chinese language
These are different encoding systems that define character sets for various languages. Most of them were developed before Unicode (GB 18030 is a later standard that succeeded the older GB 2312 and GBK encodings).

No system is perfect, and these encoding systems have a few flaws:

  • The same code value can correspond to different letters in different standards.
  • Encodings for languages with large character sets have variable length. Some common characters are encoded as a single byte, while others require two or more bytes.

To resolve these problems, a new standard was developed:
the Unicode system.


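Unicode was originally designed as a 16-bit system in which each character took 2 bytes, and Java adopted that design, which is why a Java char is 2 bytes. Unicode has since grown beyond 16 bits (code points now run from U+0000 to U+10FFFF), so a Java char holds one 16-bit UTF-16 code unit, and some characters need two of them.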

Values for a 16-bit code unit (a Java char):
minimum value: \u0000
maximum value: \uFFFF
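
The range of a Java char can be checked directly from the constants on java.lang.Character. The class name JavaCharRange and its hex helper below are made up for this sketch; only the Character constants come from the JDK.

```java
// Hypothetical demo class; only the Character constants are part of the JDK.
class JavaCharRange {
    // Formats a char as four uppercase hexadecimal digits.
    static String hex(char c) {
        return String.format("%04X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println("bits per char: " + Character.SIZE);
        System.out.println("minimum value: " + hex(Character.MIN_VALUE));
        System.out.println("maximum value: " + hex(Character.MAX_VALUE));
    }
}
```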

ASCII was limited to only 128 character definitions, whereas the Unicode standard defines over 100,000 characters.


Objective of the Unicode System:
Its objective is to unify the different encoding schemes so that text is interpreted the same way on every computer. It has several character encoding forms, known as UTFs.
UTF stands for Unicode Transformation Format.

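UTF-8: It works in units of 1 byte (8 bits). It uses one byte to encode an English (ASCII) character and two to four bytes for other characters.

UTF-16: It works in units of 2 bytes (16 bits). Most commonly used characters take one unit (2 bytes); the rest take two units (4 bytes).

UTF-32: It uses 4 bytes (32 bits) to encode every character.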
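
To see how many bytes each form needs, here is a small sketch. The class EncodingLengths and its methods are names invented for this example. The UTF-32 figure is computed as 4 bytes per code point rather than by calling a UTF-32 charset, since not every Java version guarantees that one is available.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are for illustration only.
class EncodingLengths {
    // Number of bytes the string needs in UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian, so no byte order mark is added).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 always uses 4 bytes per code point, so we just count code points.
    static int utf32Bytes(String s) {
        return 4 * s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                    // U+0041, an ASCII letter
            "\u20AC",                               // U+20AC, the euro sign
            new String(Character.toChars(0x1F600))  // U+1F600, an emoji
        };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d%n",
                    s.codePointAt(0), utf8Bytes(s), utf16Bytes(s), utf32Bytes(s));
        }
    }
}
```

For these three samples, UTF-8 needs 1, 3, and 4 bytes, UTF-16 needs 2, 2, and 4 bytes, and UTF-32 needs 4 bytes each time.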


Code Points:
Each character in Unicode is assigned a number called a code point. Code points are written as hexadecimal numbers with the prefix "U+".
For example: "A" is written as U+0041 and "a" is written as U+0061.
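
The "U+" notation is easy to produce in Java. The class CodePointNotation and the method toUPlus below are hypothetical names for this sketch; String.format and codePointAt are standard JDK calls.

```java
// Hypothetical demo class; names are for illustration only.
class CodePointNotation {
    // Formats a code point in "U+" notation with at least four hex digits.
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(toUPlus('A'));                // U+0041
        System.out.println(toUPlus("a".codePointAt(0))); // U+0061
    }
}
```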

The code points are divided into 17 separate sections called "planes". Each plane holds 65,536 code points.

The first plane, which contains the most commonly used characters, is known as the "Basic Multilingual Plane" (BMP).
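
Since each plane holds 65,536 (0x10000) code points, the plane number is simply the code point shifted right by 16 bits. A minimal sketch, where UnicodePlanes and planeOf are names invented for illustration:

```java
// Hypothetical demo class; names are for illustration only.
class UnicodePlanes {
    // Returns the plane (0 to 16) that a code point belongs to.
    static int planeOf(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + codePoint);
        }
        return codePoint >>> 16;
    }

    public static void main(String[] args) {
        System.out.println(planeOf(0x0041));   // 0, the Basic Multilingual Plane
        System.out.println(planeOf(0x1F600));  // 1
        System.out.println(planeOf(0x10FFFF)); // 16, the last plane
    }
}
```

The JDK also offers Character.isBmpCodePoint(int) for the common "is it in plane 0?" check.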



Basic difference between UTF-8 and UTF-16

Today, nearly all web development languages and platforms support the UTF encodings. Among these, UTF-8 and UTF-16 are the most commonly used.


UTF-8 encodes a character using 1 to 4 bytes. It uses 1 byte (8 bits) for ASCII characters; characters that need more than 1 byte are represented by a sequence of 2 to 4 bytes.
UTF-8 is not limited to ASCII: it can represent every Unicode character. Its one-byte range is identical to ASCII, so any ASCII text is also valid UTF-8.
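
To make the 1-to-4-byte rule concrete, here is a hand-written sketch of the UTF-8 bit layout. Utf8Encoder and encode are hypothetical names; in real code you would normally just call String.getBytes(StandardCharsets.UTF_8).

```java
// Hypothetical helper class for illustration; real programs would normally
// rely on String.getBytes(StandardCharsets.UTF_8) instead.
class Utf8Encoder {
    // Encodes a single code point into its UTF-8 byte sequence.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not an encodable code point: " + cp);
        }
        if (cp < 0x80) {        // 1 byte:  0xxxxxxx (same as ASCII)
            return new byte[] { (byte) cp };
        }
        if (cp < 0x800) {       // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        if (cp < 0x10000) {     // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        return new byte[] {     // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F)) };
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0xE9, 0x20AC, 0x1F600 };
        for (int cp : samples) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(cp)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.printf("U+%04X -> %s%n", cp, hex.toString().trim());
        }
    }
}
```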

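UTF-16 uses 2 bytes (16 bits) for characters in the Basic Multilingual Plane and 4 bytes (a pair of 16-bit units called a surrogate pair) for all other characters. For text that is mostly ASCII, half of every 2-byte unit is empty, which wastes memory compared with UTF-8.
Like UTF-8, UTF-16 can represent every Unicode character. The common characters of Latin, Cyrillic, Chinese, and Japanese scripts each fit in a single 2-byte unit.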
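
Here is a rough sketch of how a character outside the 16-bit range is split into two 16-bit units in UTF-16. SurrogatePairs and toUtf16 are names made up for this example; the JDK already provides the same result through Character.toChars.

```java
// Hypothetical demo class; the JDK equivalent is Character.toChars(int).
class SurrogatePairs {
    // Converts a code point into one or two UTF-16 code units.
    static char[] toUtf16(int cp) {
        if (!Character.isValidCodePoint(cp)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + cp);
        }
        if (cp < 0x10000) {
            return new char[] { (char) cp };         // fits in one 16-bit unit
        }
        int offset = cp - 0x10000;                   // leaves a 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        String s = new String(toUtf16(0x1F600));
        System.out.println(s.length());                      // number of 16-bit units
        System.out.println(s.codePointCount(0, s.length())); // number of characters
    }
}
```

This is why String.length() in Java can be larger than the number of characters a user sees: it counts 16-bit units, not code points.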

There are three basic variants of UTF-16 and UTF-32, which are as follows:

BE: big-endian byte serialization (most significant byte first)

LE: little-endian byte serialization (least significant byte first)

Unmarked: the byte order may be indicated by a byte order mark (BOM) at the start of the data; if there is no BOM, big-endian is assumed by default.

This gives the following encoding schemes: UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE
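
The byte order difference is easy to see by encoding the letter "A" (U+0041) with Java's three UTF-16 charsets. ByteOrderDemo and hexOf are hypothetical names for this sketch. Note that Java's unmarked UTF-16 charset, when encoding, writes a big-endian byte order mark (FE FF) before the data.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are for illustration only.
class ByteOrderDemo {
    // Encodes the string with the given charset and returns the bytes as hex.
    static String hexOf(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-16BE: " + hexOf("A", StandardCharsets.UTF_16BE)); // 00 41
        System.out.println("UTF-16LE: " + hexOf("A", StandardCharsets.UTF_16LE)); // 41 00
        System.out.println("UTF-16:   " + hexOf("A", StandardCharsets.UTF_16));   // FE FF 00 41
    }
}
```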


