Recognizing Chinese Person Names Based on Hybrid Models

MAO Ting-ting, LI Li-shuang, HUANG De-gen

Journal of Chinese Information Processing, 2007, Vol. 21, Issue 2: 22-28.
Review

Abstract

This paper proposes a method for automatically recognizing Chinese person names that combines a support vector machine (SVM) with a probabilistic statistical model. A training set is first built by extracting feature-vector attributes character by character, and an SVM recognition model is trained with a polynomial kernel function. For each test sample, the distance to the SVM optimal hyperplane is then computed in feature space. If the distance is greater than a given threshold, the sample is classified by the SVM; otherwise, the statistical method is used. Experiments show that applying different methods to samples according to their distribution in feature space classifies better than either the SVM or the statistical model alone. In the open test, the hybrid model reaches a recall of 91.96%, a precision of 94.62%, and an F-measure of 93.27%; the F-measure is 1.51% higher than that of the SVM alone.
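The decision rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the kernel degree, the feature encoding, or the form of the statistical model, so `poly_kernel`'s defaults, the `KernelSVM` container, and the `fallback` callable are all hypothetical stand-ins. The geometric distance to the hyperplane is taken as |f(x)| / ||w||, with ||w|| computed from the kernel values between support vectors.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def poly_kernel(x: Vector, z: Vector, degree: int = 2, coef0: float = 1.0) -> float:
    """Polynomial kernel (x . z + coef0) ** degree; degree and coef0 are assumed values."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


@dataclass
class KernelSVM:
    """Hypothetical container for an already trained binary SVM."""
    support_vectors: Sequence[Vector]
    dual_coefs: Sequence[float]  # alpha_i * y_i for each support vector
    bias: float
    kernel: Callable[[Vector, Vector], float]

    def decision_value(self, x: Vector) -> float:
        """f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b."""
        pairs = zip(self.support_vectors, self.dual_coefs)
        return sum(c * self.kernel(sv, x) for sv, c in pairs) + self.bias

    def weight_norm(self) -> float:
        """||w|| in feature space: sqrt(sum_ij c_i * c_j * K(sv_i, sv_j))."""
        total = 0.0
        for sv_i, c_i in zip(self.support_vectors, self.dual_coefs):
            for sv_j, c_j in zip(self.support_vectors, self.dual_coefs):
                total += c_i * c_j * self.kernel(sv_i, sv_j)
        return math.sqrt(total)


def hybrid_classify(
    x: Vector,
    svm: KernelSVM,
    threshold: float,
    fallback: Callable[[Vector], int],
) -> Tuple[int, str]:
    """Use the SVM far from the hyperplane, the fallback model near it."""
    f = svm.decision_value(x)
    distance = abs(f) / svm.weight_norm()
    if distance > threshold:
        return (1 if f > 0 else -1), "svm"
    # Sample lies close to the hyperplane: defer to the statistical model
    # (represented here by an arbitrary callable, an assumption of this sketch).
    return fallback(x), "statistical"
```

In practice the threshold would be tuned on held-out data, and `fallback` would wrap whatever probabilistic name model is available; the second element of the returned tuple only records which branch made the decision.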

Key words

computer application / Chinese information processing / support vector machines / statistical method / hybrid model / recognition of person names

Cite this article

MAO Ting-ting, LI Li-shuang, HUANG De-gen. Recognizing Chinese Person Names Based on Hybrid Models. Journal of Chinese Information Processing. 2007, 21(2): 22-28


Funding

Supported by the National Natural Science Foundation of China (60373095; 60373096)