Online Handwritten Script Recognition
Automatic identification of handwritten script facilitates many important applications such as automatic transcription of multilingual documents and search for documents on the Web containing a particular script. The increase in usage of handheld devices which accept handwritten input has created a growing demand for algorithms that can efficiently analyze and retrieve handwritten data.
Under the guidance of:
Department of IT.
This project falls within the domain of handwritten script recognition.
Here, we focus on two aspects of this domain: signature verification and script recognition.
With the increase in popularity of portable computing devices
such as PDAs and handheld computers, non-keyboard-based
methods for data entry are receiving more attention in the
research communities and commercial sector. The most promising
options are pen-based and voice-based inputs. Digitizing devices
like SmartBoards and computing platforms such as the IBM
ThinkPad TransNote and Tablet PCs have a pen-based user
interface. Such devices, which generate handwritten documents
with online or dynamic (temporal) information, require efficient
algorithms for processing and retrieving handwritten data.
DATA COLLECTION AND PREPROCESSING
The data used in this paper was collected using the CrossPad.
The CrossPad has a pen and paper interface along with the ability
to digitally capture the (x, y) position of the pen tip using an
RF transmitter embedded in the pen. The pen position is sampled
at a constant rate of 132 samples per second and the device has a
resolution of 0.1 mm along the x and y axes. The data was collected
on ruled paper with an interline distance of 8.75 mm, although
some writers wrote on alternate lines. We must point out that the
actual device for data collection is not important as long as it can
generate a temporal sequence of x and y positions of the pen tip.
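To make the data format concrete, the following is a minimal sketch of how one pen stroke could be held in memory and how the sampling rate and resolution quoted above translate into physical quantities. The class name Stroke, its methods, and the constant names are hypothetical and chosen only for illustration. The sketch also assumes that one device unit corresponds to the 0.1 mm resolution, which the text does not state explicitly.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE_HZ = 132.0  # sampling rate quoted in the text
UNIT_MM = 0.1           # assumption: one device unit equals the 0.1 mm resolution


@dataclass
class Stroke:
    """Pen-down trace: (x, y) samples in device units, in temporal order."""
    points: list

    def duration_s(self):
        # n samples taken at a constant rate span n - 1 sampling intervals.
        return max(len(self.points) - 1, 0) / SAMPLE_RATE_HZ

    def length_mm(self):
        # Sum of straight-line distances between consecutive samples.
        pairs = zip(self.points, self.points[1:])
        return UNIT_MM * sum(math.dist(p, q) for p, q in pairs)
```

Because only the ordered sequence of positions is used, the same representation would apply to any digitizer that reports pen-tip coordinates at a fixed rate.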
Line and Word Detection
The data available to a script recognizer is usually a complete
handwritten page or a subset of it. To recognize the script of
individual lines or words in the page, we first need to segment the
page into lines and words. The problem of text line identification in
online documents has been attempted before. To identify the
individual lines, first the interline distance is estimated. The
interline distance, d, is defined as the distance between successive
peaks in the autocorrelation of the y-axis projection of the text.
Fig. 4a shows the y-axis projection of the document in Fig. 1, and
Fig. 4b shows the autocorrelation of the projection. The interline
distance estimate is indicated in Fig. 4b as d.
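The interline distance estimate described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the bin_size parameter, and the choice to remove the mean before correlating are all assumptions, and the spacing between successive peaks is approximated by the lag of the first local maximum after lag 0.

```python
def y_projection(points, bin_size=1.0):
    """Histogram of pen samples along the y axis (bin_size in units of y)."""
    ys = [y for _, y in points]
    y_min = min(ys)
    n_bins = int((max(ys) - y_min) / bin_size) + 1
    hist = [0] * n_bins
    for y in ys:
        hist[int((y - y_min) / bin_size)] += 1
    return hist


def autocorrelation(signal):
    """Autocorrelation of the mean-removed signal for lags 0..len(signal)-1."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag))
        for lag in range(n)
    ]


def estimate_interline_distance(points, bin_size=1.0):
    """Lag of the first positive local maximum after lag 0, in units of y.

    Returns None when no such peak exists (for example, a single text line).
    """
    ac = autocorrelation(y_projection(points, bin_size))
    for lag in range(1, len(ac) - 1):
        if ac[lag] > 0 and ac[lag - 1] < ac[lag] >= ac[lag + 1]:
            return lag * bin_size
    return None
```

The direct summation costs time quadratic in the number of bins, which keeps the sketch dependency-free. An FFT-based autocorrelation would be the usual choice for long pages.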
Each sample or pattern that we attempt to classify is either a word
or a set of contiguous words in a line. It is helpful to study the
general properties of each of the six scripts for feature extraction.
1. Arabic: Arabic is written from right to left within a line and
the lines are written from top to bottom. A typical Arabic
character contains a relatively long main stroke which is
drawn from right to left, along with one to three dots. The
character set contains three long vowels. Short markings
(diacritics) may be added to the main character to indicate
short vowels. Due to these diacritical marks and the dots
in the script, the length of the strokes varies considerably.
2. Cyrillic: Cyrillic script looks very similar to the cursive
Roman script. The most distinctive features of Cyrillic
script, compared to Roman script, are: 1) individual
characters, connected together in a word, form one long
stroke, and 2) the absence of delayed strokes (see Fig. 7).
Delayed strokes cause movement of the pen in the
direction opposite to the regular writing direction.
3. Devnagari: The most important characteristic of Devnagari
script is the horizontal line present at the top of each word,
called “Shirorekha” (see Fig. 6). These lines are usually
drawn after the word is written and hence are similar to
delayed strokes in Roman script. The words are written
from left to right in a line.
4. Han: Characters of Han script are composed of multiple
short strokes. The strokes are usually drawn from top to
bottom and left to right within a character. The direction of
writing of words in a line is either left to right or top to
bottom. The database used in this study contains Han
script text of the former type (horizontal text lines).