A PCA-based, character-level script identification method is proposed to identify Korean, Chinese, and English scripts in an image. First, an eigenvector space is constructed using PCA, and each segmented character is reconstructed by projecting it into that space. Second, the relative entropy between the vertical and horizontal histograms of the original image and those of the reconstructed image is calculated. Finally, the script is identified according to the Euclidean distance and the relative entropy between the original and reconstructed images. Experimental results show that the proposed method achieves 99.78% accuracy under fully correct character segmentation, indicating that it effectively addresses script identification in multilingual document images containing Korean, Chinese, and English text.
PIAO Mingji, CUI Rongyi.
An Approach to Script Identification in Image with Multi-lingual Texts. Journal of Chinese Information Processing. 2017, 31(2): 220-225
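
The three steps summarized in the abstract (PCA reconstruction, relative entropy of projection histograms, and a distance-based decision) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes one eigenspace per script, treats row and column sums as the horizontal and vertical histograms, and combines the Euclidean distance and the two relative entropies with a simple weighted sum. The function names (`fit_pca`, `reconstruct`, `relative_entropy`, `score`, `identify`) and the `weight` parameter are hypothetical; the abstract does not state the actual decision rule.

```python
import numpy as np


def fit_pca(samples, k):
    """Fit a k-dimensional eigenspace to flattened character images.

    samples: array of shape (n, d), one flattened image per row.
    Returns the mean vector (d,) and a basis of shape (d, k).
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k].T


def reconstruct(x, mean, basis):
    """Project a flattened image into the eigenspace and map it back."""
    coeffs = (x - mean) @ basis
    return mean + coeffs @ basis.T


def relative_entropy(p, q, eps=1e-10):
    """Relative entropy D(p || q) of two non-negative histograms."""
    # Clipping and eps are assumptions to keep the logarithm well defined.
    p = np.clip(p, 0.0, None) + eps
    q = np.clip(q, 0.0, None) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def score(img, mean, basis):
    """Return (Euclidean distance, horizontal KL, vertical KL) for one image."""
    x = img.ravel().astype(float)
    rec = reconstruct(x, mean, basis).reshape(img.shape)
    dist = float(np.linalg.norm(x - rec.ravel()))
    kl_h = relative_entropy(img.sum(axis=1), rec.sum(axis=1))  # row sums
    kl_v = relative_entropy(img.sum(axis=0), rec.sum(axis=0))  # column sums
    return dist, kl_h, kl_v


def identify(img, models, weight=1.0):
    """Pick the script whose eigenspace reconstructs the image best.

    models: dict mapping a script name to its (mean, basis) pair.
    The weighted sum below is an assumed decision rule, not the paper's.
    """
    def total(name):
        dist, kl_h, kl_v = score(img, *models[name])
        return dist + weight * (kl_h + kl_v)

    return min(models, key=total)
```

In use, one would call `fit_pca` on the training characters of each script, store the results in a dictionary such as `{"Korean": ..., "Chinese": ..., "English": ...}`, and pass each segmented character image to `identify`. The number of components `k` and the value of `weight` would need to be tuned on real data.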