Abstract
Named Entity Recognition (NER) is an important task in Information Extraction, mainly covering the recognition of person names, location names and organization names. Research on English and Chinese NER began relatively early and has mainly relied on rule-based or statistical methods, whereas Vietnamese NER has received far less attention, and studies on it in China remain scarce. This paper analyzes the linguistic characteristics of Vietnamese named entities, classifies them, gives them a formal representation, and on that basis proposes a rule-based method for recognizing Vietnamese named entities. Experimental results show that the method achieves relatively high recognition accuracy.
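The abstract does not spell out the rules themselves. As a rough illustration of what a rule-based recognizer of this kind can look like, the Python sketch below tags entities using hypothetical cue words: personal titles (ông, bà), administrative terms (tỉnh, huyện, thành phố) and organization heads (Công ty, Ngân hàng, Trường Đại học), each followed by a run of capitalized syllables. The rule table `CUE_RULES` and the function `recognize_entities` are illustrative assumptions, not the paper's rule set or formalization.

```python
import unicodedata
from typing import List, Tuple

# Hypothetical trigger-word rules, NOT the paper's rule set: a cue word followed
# by capitalized syllables is taken as an entity of the cue's type.
# Rules are tried in list order; the cue itself is not included in the entity.
CUE_RULES: List[Tuple[str, str]] = [
    ("trường đại học", "ORG"),  # university
    ("công ty", "ORG"),         # company
    ("ngân hàng", "ORG"),       # bank
    ("thành phố", "LOC"),       # city
    ("tỉnh", "LOC"),            # province
    ("huyện", "LOC"),           # district
    ("ông", "PER"),             # title "Mr."
    ("bà", "PER"),              # title "Mrs./Ms."
]
TRAILING_PUNCT = ".,;:!?\"')"


def recognize_entities(text: str) -> List[Tuple[str, str]]:
    """Return (entity, type) pairs found by the cue-word rules."""
    # Vietnamese text may mix precomposed and combining diacritics; unify to NFC.
    tokens = unicodedata.normalize("NFC", text).split()
    lowered = [t.lower() for t in tokens]
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        next_i = i + 1
        for cue, label in CUE_RULES:
            cue_tokens = cue.split()
            j = i + len(cue_tokens)
            if lowered[i:j] != cue_tokens:
                continue
            # Collect the run of capitalized syllables that follows the cue.
            name = []
            while j < len(tokens) and tokens[j][0].isupper():
                word = tokens[j].rstrip(TRAILING_PUNCT)
                name.append(word)
                j += 1
                if word != tokens[j - 1]:  # trailing punctuation ends the entity
                    break
            if name:
                entities.append((" ".join(name), label))
                next_i = j
                break
        i = next_i
    return entities
```

For example, `recognize_entities("Ông Nguyễn Văn An làm việc tại Công ty Viettel ở thành phố Hà Nội.")` yields `[("Nguyễn Văn An", "PER"), ("Viettel", "ORG"), ("Hà Nội", "LOC")]`. The sketch leans on the Vietnamese convention of capitalizing every syllable of person and place names; a practical system would also need name dictionaries and rules for entities that appear without a cue word.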
Key words
named entity recognition /
Vietnamese /
rule
Funding
Project funded by the China-ASEAN Research Center (No. 201205)