Document Region Image Classification via Feature Extraction and Machine Learning Algorithms

Li Yixin; Zou Yajun; Ma Jinwen

doi:10.16798/j.issn.1003-0530.2019.05.003

Li Yixin, Zou Yajun, Ma Jinwen. Document Region Image Classification via Feature Extraction and Machine Learning Algorithms[J]. JOURNAL OF SIGNAL PROCESSING, 2019, 35(5): 747-757. DOI: 10.16798/j.issn.1003-0530.2019.05.003

Abstract

Document region classification is a crucial task in understanding document images. With conventional machine learning algorithms, feeding an image directly as input yields a model with a large number of parameters, which is difficult to train. To overcome this difficulty, we design a group of effective features for document region images and propose a document region classification framework based on feature extraction and machine learning classifiers. To make the features discriminative, the 32-dimensional feature vector covers five aspects: geometry, grayscale, region, texture, and content. Based on these features, we conduct experiments with conventional machine learning algorithms, an AutoML method, and deep learning. The experimental results on a public dataset demonstrate that the proposed document region classification algorithm achieves higher classification accuracy while maintaining the same efficiency. In addition, we implement a simple stepwise page layout analysis algorithm to demonstrate the generalization ability of the proposed document region classification algorithm.
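The general idea of the framework, extracting a compact feature vector from each region and then classifying that vector instead of the raw pixels, can be illustrated with a minimal sketch. The five features below (aspect ratio, gray mean, gray standard deviation, dark-pixel ratio, transition density) and the nearest-centroid classifier are illustrative assumptions chosen for brevity; they are not the 32 features or the classifiers used in the paper, and the names `extract_features`, `fit_centroids`, and `predict` are hypothetical.

```python
from statistics import mean, pstdev


def extract_features(img, threshold=128):
    """Map a grayscale region (list of rows, values 0-255) to a small feature vector.

    Illustrative features only; the paper's 32-dimensional set is not reproduced here.
    """
    h, w = len(img), len(img[0])
    pixels = [p for row in img for p in row]
    dark = [[p < threshold for p in row] for row in img]
    # Geometry: width-to-height ratio of the region.
    aspect = w / h
    # Grayscale: mean and standard deviation of intensity, scaled to [0, 1].
    gray_mean = mean(pixels) / 255
    gray_std = pstdev(pixels) / 255
    # Region: fraction of dark (foreground) pixels.
    fg_ratio = sum(p < threshold for p in pixels) / (h * w)
    # Texture: dark/light transitions along rows, normalized by pixel count.
    transitions = sum(a != b for row in dark for a, b in zip(row, row[1:]))
    return [aspect, gray_mean, gray_std, fg_ratio, transitions / (h * w)]


def fit_centroids(samples):
    """Average the feature vectors of each label. samples: list of (vector, label)."""
    grouped = {}
    for vec, label in samples:
        grouped.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in grouped.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


# Toy regions: alternating stripes stand in for text, a smooth ramp for a picture.
text_like = [[0 if x % 2 == 0 else 255 for x in range(8)] for _ in range(4)]
picture_like = [[100 + 5 * x for x in range(8)] for _ in range(4)]

centroids = fit_centroids([
    (extract_features(text_like), "text"),
    (extract_features(picture_like), "picture"),
])

query = [[255 if x % 2 == 0 else 0 for x in range(8)] for _ in range(4)]
print(predict(centroids, extract_features(query)))
```

Because the classifier sees only a handful of numbers per region, its parameter count is independent of the image size, which is the motivation the abstract gives for extracting features first.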
