Authors: Harshad Kadu, Jian Li, and C.-C. Jay Kuo

Detecting text regions in natural images is an important task for many computer vision applications, such as compound video compression, optical character recognition, reading text for visually impaired users, and robotic navigation. We address this text localization problem, also known as the compound image segmentation problem. Unlike text in scanned documents, text in natural images may vary in size, font, orientation, color, and foreground or background illumination. Cluttered backgrounds in natural images can also seriously degrade the accuracy of text localization algorithms. Compound image segmentation is therefore an inherently difficult problem.

In our research, we propose a novel text localization scheme based on the fusion of diverse local operators, such as the morphological detector, the maximally stable extremal regions (MSER) blob analyzer [3, 4], the distance transform, and the stroke-width transform [2]. Each operator probes a different characteristic of text to discover regions with possible textual content. An ensemble of trained SVM classifiers then labels these regions as text or non-text using local feature information. Finally, a fragment grouping mechanism merges the text candidates and carves out individual words. Refer to the figure below.
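The final grouping step can be illustrated with a minimal Python sketch. It is not the authors' mechanism: the `Box` type, the `group_into_words` function, and the gap and overlap thresholds are hypothetical choices made for illustration. The sketch assumes that candidate regions have already been classified as text and are available as axis-aligned bounding boxes, and it merges neighbors that lie on the same line and are separated by a small horizontal gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type, for illustration only)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def height(self) -> int:
        return self.y1 - self.y0


def vertical_overlap(a: Box, b: Box) -> float:
    """Fraction of the shorter box's height shared by both boxes."""
    shorter = min(a.height, b.height)
    if shorter <= 0:
        return 0.0  # degenerate boxes never overlap
    shared = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0, shared) / shorter


def group_into_words(boxes, max_gap_ratio=0.5, min_overlap=0.5):
    """Greedily merge character-level boxes into word-level boxes.

    Both thresholds are assumptions: a box joins a word when the horizontal
    gap is at most `max_gap_ratio` times the smaller height and the two
    share at least `min_overlap` of the shorter height.
    """
    words = []
    for box in sorted(boxes, key=lambda b: b.x0):  # scan left to right
        for i, word in enumerate(words):
            gap = box.x0 - word.x1
            limit = max_gap_ratio * min(box.height, word.height)
            if gap <= limit and vertical_overlap(box, word) >= min_overlap:
                # Grow the word so that it also covers the new fragment.
                words[i] = Box(min(word.x0, box.x0), min(word.y0, box.y0),
                               max(word.x1, box.x1), max(word.y1, box.y1))
                break
        else:
            words.append(box)  # no compatible word found: start a new one
    return words


candidates = [
    Box(0, 0, 10, 20), Box(12, 0, 22, 20), Box(24, 2, 34, 20),  # first word
    Box(60, 0, 70, 20), Box(72, 0, 82, 20),                     # second word
    Box(0, 40, 10, 60),                                         # next line
]
words = group_into_words(candidates)
```

Here the six fragments collapse into three word boxes: two on the first line, separated by a wide gap, and one on the second line. A practical system would likely also handle rotated text and compare color or stroke width before merging, which this sketch ignores.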

Our proposed fusion technique uses a novel three-tier framework to systematically separate the individual words in an image. Text regions have distinctive properties that set them apart from non-text regions, and we explore these properties using our novel morphological text detector. In addition to this operator, we use existing detectors, such as MSER [3, 4] and the stroke-width transform [2], to improve detection accuracy. We hope that this fusion framework will significantly improve performance.
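One property commonly associated with text is a nearly constant stroke width. The sketch below is only an illustration of how a distance transform can expose that cue, not the detector used in this work: it computes a two-pass city-block distance transform on a small binary mask in pure Python. The function names `distance_transform` and `max_stroke_radius` are hypothetical, and the sketch assumes that pixels outside the image are not treated as background.

```python
def distance_transform(mask):
    """City-block distance from each foreground pixel to the nearest
    background pixel, using the classic two-pass scan.

    `mask` is a list of rows; truthy entries are foreground.
    """
    h, w = len(mask), len(mask[0])
    inf = h + w  # larger than any possible city-block distance in the image
    dist = [[inf if mask[y][x] else 0 for x in range(w)] for y in range(h)]

    # Forward pass: propagate distances from the top and left neighbors.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                if y > 0:
                    dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
                if x > 0:
                    dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbors.
    for y in reversed(range(h)):
        for x in reversed(range(w)):
            if dist[y][x]:
                if y < h - 1:
                    dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
                if x < w - 1:
                    dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def max_stroke_radius(mask):
    """Largest distance value, a rough proxy for half the stroke width."""
    return max(max(row) for row in distance_transform(mask))


# A horizontal stroke three pixels thick, bounded by background rows.
stroke = [
    [0] * 7,
    [1] * 7,
    [1] * 7,
    [1] * 7,
    [0] * 7,
]
dist = distance_transform(stroke)
radius = max_stroke_radius(stroke)
```

For this stroke the distance values are 1 along the two edges and 2 along the center line, so the ridge of the distance map tracks half the stroke thickness. A region whose ridge values vary little is a more plausible text candidate than one whose values vary widely, although any threshold on that variation would have to be tuned on real data.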

The future [...]