Chinese Person Name Identification Combined with the Decision Tree Method

WANG Zhen-hua, KONG Xiang-long, LU Ru-zhan, LIU Shao-ming

Journal of Chinese Information Processing, 2004, Vol. 18, Issue 6: 11-16.

Chinese Name Identification Integrated Decision Tree Learning

  • WANG Zhen-hua¹, KONG Xiang-long¹, LU Ru-zhan¹, LIU Shao-ming²

Abstract

Chinese person name identification is an important subproblem of proper-name recognition in natural language processing. This paper divides the identification process into three stages: extraction, classification, and disambiguation. Using statistics on how likely each character is to appear in a Chinese surname or given name, candidate names are extracted from the text together with the lexical, syntactic, and semantic features of their context. Deciding whether a candidate is a real person name is treated as a binary classification problem, and a decision tree algorithm makes the preliminary decision. Finally, ambiguities in the preliminary classification results are resolved. Experiments show that both recall and precision can exceed 90% with this method.
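The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the probability tables, the threshold, the two context features, the training rows, and the "keep the longest candidate" disambiguation rule are all hypothetical stand-ins, and the tree learner is a plain ID3-style learner chosen for brevity.

```python
import math
from collections import Counter

# Hypothetical toy values; real probabilities would be estimated from a name corpus.
SURNAME_PROB = {"王": 0.07, "李": 0.07, "张": 0.06}
GIVEN_PROB = {"振": 0.01, "华": 0.02, "明": 0.02}
THRESHOLD = 1e-6            # assumed cutoff for keeping a candidate
SPEECH_VERBS = {"说", "道"}  # assumed stand-in for richer context features

def extract_candidates(text):
    """Stage 1: a surname followed by one or two likely given-name characters."""
    found = []
    for i, ch in enumerate(text):
        if ch not in SURNAME_PROB:
            continue
        for n in (1, 2):
            given = text[i + 1:i + 1 + n]
            if len(given) < n:
                continue
            score = SURNAME_PROB[ch]
            for g in given:
                score *= GIVEN_PROB.get(g, 0.0)
            if score > THRESHOLD:
                found.append((i, ch + given))
    return found

def context_features(text, start, name):
    """Hypothetical features: candidate length and whether a speech verb follows."""
    nxt = text[start + len(name):start + len(name) + 1]
    return {"length": str(len(name)), "next": "verb" if nxt in SPEECH_VERBS else "other"}

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def build_tree(rows, labels, features):
    """Stage 2 (training): ID3-style tree over categorical features."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    def gain(f):
        remainder = 0.0
        for v in set(r[f] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[f] == v]
            remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder
    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    branches = {}
    for v in set(r[best] for r in rows):
        keep = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = build_tree([rows[i] for i in keep], [labels[i] for i in keep], rest)
    return (best, branches, majority)

def classify(tree, row):
    """Stage 2 (prediction): walk the tree; unseen values fall back to the majority label."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        if row[feature] not in branches:
            return majority
        tree = branches[row[feature]]
    return tree

def disambiguate(accepted):
    """Stage 3 (assumed rule): per start offset, keep only the longest accepted candidate."""
    best = {}
    for start, name in accepted:
        if start not in best or len(name) > len(best[start]):
            best[start] = name
    return sorted(best.items())

def identify_names(text, tree):
    accepted = [(s, n) for s, n in extract_candidates(text)
                if classify(tree, context_features(text, s, n))]
    return disambiguate(accepted)

# Hypothetical training samples: True means the candidate is a real person name.
TRAIN_ROWS = [{"length": "3", "next": "verb"}, {"length": "2", "next": "verb"},
              {"length": "2", "next": "other"}, {"length": "3", "next": "other"}]
TRAIN_LABELS = [True, True, False, False]
TREE = build_tree(TRAIN_ROWS, TRAIN_LABELS, ["length", "next"])
```

With these toy tables, `extract_candidates("王振华说")` yields both 王振 and 王振华 as candidates, which shows why a classification step and a disambiguation step follow extraction.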

Key words

artificial intelligence / natural language processing / Chinese person name identification / decision tree

Cite this article

WANG Zhen-hua, KONG Xiang-long, LU Ru-zhan, LIU Shao-ming. Chinese Name Identification Integrated Decision Tree Learning. Journal of Chinese Information Processing. 2004, 18(6): 11-16

Funding

Supported by a Natural Science Foundation project (60496326) and by a project funded by Fuji Xerox Co., Ltd. (Japan)