Local Similarity based Document Layout Analysis using Improved ARLSA

INTERNATIONAL JOURNAL OF CONTENTS, (P)1738-6764; (E)2093-7504
2015, v.11 no.2, pp.15-19
https://doi.org/10.5392/IJoC.2015.11.2.015
GwangBok Kim (Chonnam National University)


Abstract

In this paper, we propose an efficient document layout analysis algorithm that includes table detection. Typical document layout analysis methods rely on the height of words or columns and the gaps between them. To handle documents of various styles and sizes, we propose an algorithm that uses the mean value of the distance transform, which represents thickness, and compares it with that of components in the local area. We combine this with a table detection algorithm that uses the same feature as the text classifier. Table candidates, separators, and large components are isolated from the image using Connected Component Analysis (CCA) and the distance transform. The key idea of text classification is that neighboring text components have similar thickness and height. To estimate local similarity, we detect text regions using an adaptive search window size. An improved adaptive run-length smoothing algorithm (ARLSA) is proposed to create proper boundaries between text zones and non-text zones. Experimental results on the ICDAR2009 page segmentation competition test set and our own dataset demonstrate the superiority of the proposed method through F-measure comparison with other algorithms.
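
For readers unfamiliar with run-length smoothing, the following is a minimal sketch of the classic fixed-threshold horizontal RLSA on which adaptive variants build. It is not the improved ARLSA proposed in the paper: the function names `rlsa_row` and `rlsa_horizontal`, the single global `max_gap` threshold, and the 0/1 list-of-lists image representation are all assumptions made for illustration. An adaptive variant would presumably choose the gap limit per pair of components rather than globally.

```python
def rlsa_row(row, max_gap):
    """Classic run-length smoothing on one binary row (1 = ink, 0 = background).

    Background runs of length <= max_gap that lie between two ink pixels
    are filled with ink. Leading and trailing background is left untouched.
    `max_gap` is a hypothetical fixed threshold, not the paper's adaptive one.
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, value in enumerate(row):
        if value:
            if last_ink is not None:
                gap = i - last_ink - 1
                if 0 < gap <= max_gap:
                    for j in range(last_ink + 1, i):
                        out[j] = 1
            last_ink = i
    return out


def rlsa_horizontal(image, max_gap):
    """Apply row-wise smoothing to every row of a binary image (list of lists)."""
    return [rlsa_row(row, max_gap) for row in image]


if __name__ == "__main__":
    page = [
        [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
        [0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    ]
    for smoothed in rlsa_horizontal(page, max_gap=2):
        print(smoothed)
```

With `max_gap=2`, the short gaps inside a word-like cluster are merged into one block, while the wider gap that follows is preserved as a boundary. This merging of nearby components is what lets smoothing-based methods group characters into text zones.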

Keywords

Document Layout Analysis, Page Segmentation, Table Detection, Adaptive RLSA