09 Sep 2016 |
Research article |
Intelligent and Autonomous Systems
Fast Arabic Handwriting Recognition
Arabic word recognition is an active field of research [1, 2, 3]. Most word recognition systems use a lexicon, a set of accepted words, to limit their output to valid words. Testing every word hypothesis in the lexicon improves the recognition rate, but it also increases the processing time. Lexicon reduction methods have been developed to alleviate this problem by dynamically reducing the lexicon based on the input image. Unfortunately, the reduction process is prone to error, in that it may discard the true label of an input image. Lexicon reduction methods must therefore manage a difficult trade-off: shrinking the lexicon as much as possible while keeping the true label among the retained word hypotheses, all with a low computational overhead. The word descriptor proposed here speeds up the recognition of both ancient and modern Arabic documents.
Unlike Latin script, Arabic script is written from right to left, and its alphabet is composed of 28 letters instead of 26 (Figure 1). The shape of a letter depends on its position in the word, and usually differs depending on whether the letter is at the beginning, middle, or end of a word. Six letters (‘ʾ’, ‘D’, ‘Ḏ’, ‘R’, ‘Z’, and ‘W’) can be connected only if they appear in a final position; if they appear in an initial or medial position, a space is inserted after them and the word is broken into subwords. Several letters share the same base shape and are distinguishable only by diacritics, in the form of one, two, or three dots appearing above or below the shape. The features of Arabic words are illustrated in Figure 2.
Arabic word descriptor
In this article, we propose to represent the shape of Arabic words using the Arabic word descriptor (AWD). It incorporates information about the shape and count of the subwords and diacritics into a single vector, without performing any word layout analysis. It is built in two steps: first, a structural descriptor (SD) is formed for each connected component (CC) of the image; then, the AWD is formed by sorting and normalizing the SDs of all the CCs.
Structural descriptor (SD)
The SD encodes the shape of each CC, based on the bag-of-words (BOW) model. Given a CC image (Fig. 3a), the local structure around each foreground pixel of the CC skeleton image is represented by a pixel descriptor (Fig. 3b). Each pixel descriptor is assigned to its nearest visual word in a predefined codebook of the feature space (Fig. 3c). The SD is then formed as a histogram counting the occurrences of each visual word (Fig. 3d). Fig. 3e illustrates the structure encoded by each visual word on the original shape: each shape pixel is shown in the color of the visual word (prototype) assigned to it.
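The histogram step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `structural_descriptor`, the two-dimensional toy pixel descriptors, the three-word codebook, and the use of Euclidean distance for the nearest-word assignment are all hypothetical and are not taken from the original work.

```python
from math import dist


def structural_descriptor(pixel_descriptors, codebook):
    """Build a bag-of-words histogram for one connected component.

    Each pixel descriptor is assigned to its nearest visual word
    (Euclidean distance is an assumption), and the histogram counts
    how many descriptors fall on each visual word.
    """
    histogram = [0] * len(codebook)
    for descriptor in pixel_descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: dist(descriptor, codebook[k]))
        histogram[nearest] += 1
    return histogram


# Toy data (hypothetical): 3 visual words, 4 skeleton-pixel descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
pixels = [(0.1, 0.0), (0.9, 0.1), (0.8, -0.1), (0.1, 0.9)]

sd = structural_descriptor(pixels, codebook)
print(sd)  # [1, 2, 1]
```

Because every skeleton pixel contributes exactly one count, the sum of the histogram equals the number of skeleton pixels in the CC.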
The Arabic word descriptor (AWD) further integrates information on the subword counts and diacritics. Typically, the subwords and diacritics correspond to CCs of the image. First, the SD of each CC is computed. Then, the SDs are sorted in descending order of the number of pixels in their CC skeleton. This ordering is expected to rank the largest subwords first and the diacritics last. The sorted descriptors are then concatenated into the Arabic word descriptor (see Figure 4 for an illustration). The AWD is well suited for lexicon reduction, because it allows efficient shape matching by vector comparison.
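A minimal sketch of the sorting and concatenation might look as follows. Several details are assumptions made for illustration, not the authors' specification: the name `arabic_word_descriptor`, the fixed number of component slots (`max_components`, with zero padding and truncation so that all vectors have the same length), and the normalization of the final vector to unit sum. The sketch also takes each SD to be a histogram with one count per skeleton pixel, so the sum of an SD gives the skeleton size used for sorting.

```python
def arabic_word_descriptor(sds, codebook_size, max_components=3):
    """Sort the SDs by skeleton size, concatenate, and normalize.

    sds: list of histograms (one per connected component).
    The sum of each histogram is used as its skeleton pixel count.
    Padding/truncation to max_components and unit-sum normalization
    are assumptions made so that descriptors are comparable vectors.
    """
    # Largest components (subwords) first, smallest (diacritics) last.
    ordered = sorted(sds, key=sum, reverse=True)[:max_components]
    # Pad with empty histograms when the word has fewer components.
    ordered += [[0] * codebook_size] * (max_components - len(ordered))
    flat = [count for sd in ordered for count in sd]
    total = sum(flat)
    if total == 0:
        return [0.0] * len(flat)
    return [count / total for count in flat]


# Toy SDs (hypothetical) for a word with one subword and two diacritics.
sds = [[0, 1, 0], [3, 2, 1], [1, 0, 1]]
awd = arabic_word_descriptor(sds, codebook_size=3)
print(len(awd))  # 9
```

With a fixed length, two descriptors can be compared directly, component by component.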
Lexicon reduction system
The lexicon reduction system is based on shape indexing. A reference database stores the AWDs of word images together with their labels. The set of labels forms the application lexicon. Having more images per lexicon word improves the modeling of handwriting variability. During the lexicon reduction phase, the AWD of a query image is compared to the AWDs of the reference database. The labels of the N most similar entries of the database form the reduced lexicon, where N is a parameter provided to the system. Finally, the reduced lexicon is fed to the word recognition system (Figure 5).
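The reduction phase can be illustrated with a brute-force nearest-neighbor search. This is a sketch only: the function name `reduce_lexicon`, the L1 distance between descriptors, the linear scan of the database, and the toy labels are all assumptions chosen for illustration.

```python
def reduce_lexicon(query_awd, reference_db, n):
    """Return the labels of the n database entries closest to the query.

    reference_db: list of (awd, label) pairs.
    The L1 distance is an assumption; any vector distance could be used.
    Several of the n entries may share a label, so the reduced lexicon
    can contain fewer than n distinct words.
    """
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    ranked = sorted(reference_db, key=lambda entry: l1(query_awd, entry[0]))
    reduced = []
    for _, label in ranked[:n]:
        if label not in reduced:
            reduced.append(label)
    return reduced


# Toy reference database (hypothetical descriptors and labels).
reference_db = [
    ([0.5, 0.5, 0.0], "kitab"),
    ([0.6, 0.4, 0.0], "kitab"),
    ([0.0, 0.2, 0.8], "bab"),
    ([0.3, 0.3, 0.4], "qalam"),
]
query = [0.55, 0.45, 0.0]

print(reduce_lexicon(query, reference_db, 3))  # ['kitab', 'qalam']
```

A smaller N gives the recognizer fewer hypotheses to test, at the risk of dropping the true label from the reduced lexicon.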
The effect of lexicon reduction was tested on two recognition systems (Figure 6). The first is an analytic recognition system for modern documents, based on structural descriptor Hidden Markov Models (HMMs). Its recognition rate and processing time both decrease log-linearly as N decreases (Fig. 6a). The second is a holistic recognition system for ancient documents, based on shape matching. Its processing time decreases linearly with N, while its recognition rate decreases only slightly (Fig. 6b).
Figure 6 (a and b): Influence of lexicon reduction on recognition systems.
In this work, we proposed an Arabic word descriptor for lexicon reduction. It encodes the shape of each connected component of the image through a structural descriptor (SD) based on the bag-of-words model. The sorting and normalization of the SDs emphasize the symbolic features of Arabic words, such as the subwords and the diacritics. Experiments on Arabic word databases show that the AWD is well suited to speeding up recognition.
For more information on this subject, see the following research article:
Chherawala, Youssouf and Cheriet, Mohamed. 2014. "Arabic word descriptor for handwritten word indexing and lexicon reduction". Pattern Recognition, vol. 47, no. 10, pp. 3477-3486.
Mohamed Cheriet is a professor in the Department of Systems Engineering at ÉTS and Director of Synchromedia. His research focuses on eco-cloud computing, knowledge acquisition and artificial intelligence systems and learning algorithms.
Program : Automated Manufacturing Engineering
Research chair : Canada Research Chair in Smart Sustainable Eco-Cloud
Research laboratories : SYNCHROMEDIA – Multimedia Communication in Telepresence; CIRODD – Centre interdisciplinaire de recherche en opérationnalisation du développement durable
Youssouf Chherawala is a software pattern recognition engineer at Apple. He received his M.A.Sc. and Ph.D. degrees in electrical engineering from ÉTS. He was also a postdoctoral fellow at the ÉTS Synchromedia Laboratory.
Research laboratories : SYNCHROMEDIA – Multimedia Communication in Telepresence
Field(s) of expertise :
Learning Algorithms & Classification Methods; Pattern Recognition; 2D Scene Analysis; Multilingual Document Processing & Understanding; Character Recognition; Handwritten Character Recognition; Language Models for Document Recognition & Understanding; Variational Models with Level-Sets for Image Segmentation; PDE Based Models for Image Enhancement, Denoising & Restoration; Intelligent Visual Interfaces to Sustain Collaborative Work & Research in Telepresence