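Named Entity Recognition for the Tourism Domain Based on Cascaded Conditional Random Fields

GUO Jianyi, XUE Zhengshan, YU Zhengtao, ZHANG Zhikun, ZHANG Yihao, YAO Xianming

Journal of Chinese Information Processing, 2009, Vol. 23, Issue 5: 47-53.
Review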


Abstract

This paper proposes a named entity recognition method for the tourism domain based on a cascaded conditional random fields (CRF) model. The lower-layer CRF takes the Chinese character as its segmentation unit and, drawing on feature dictionaries such as a list of characters common in tourist attraction names, a list of common attraction suffixes, and a list of characters common in place names, recognizes simple tourism named entities. Its results are passed to the upper-layer model, which takes the word as its segmentation unit and combines complex features to recognize nested tourist attractions, local specialties and snacks, and locations. Two groups of experiments were conducted. In open testing, the cascaded CRF model improves the F-score by 8 percentage points over a single-layer model; compared with an HMM model, it improves precision by 8 percentage points, recall by 22 percentage points, and the F-score by 15 percentage points.
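The data flow between the two layers can be illustrated with a minimal Python sketch. It is not the authors' implementation and trains no CRF: it only shows how character-level dictionary features might be built for the lower layer, and how lower-layer output might be attached to word-level tokens for the upper layer. The dictionary contents, the label names (`LOC`, `ATT`), the BIO tagging scheme, and all function names are assumptions made for illustration; the hand-written labels stand in for the output of a trained lower-layer model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionaries; the paper's actual lists are not reproduced here.
ATTRACTION_CHARS: Set[str] = {"石", "林", "湖"}
ATTRACTION_SUFFIXES: Set[str] = {"寺", "山", "湖", "林"}
LOCATION_CHARS: Set[str] = {"昆", "明", "州"}


def char_features(sentence: str, i: int) -> Dict[str, object]:
    """Lower-layer features for the character at position i (assumed feature set)."""
    ch = sentence[i]
    return {
        "char": ch,
        "prev": sentence[i - 1] if i > 0 else "<BOS>",
        "next": sentence[i + 1] if i < len(sentence) - 1 else "<EOS>",
        "in_attraction_chars": ch in ATTRACTION_CHARS,
        "is_attraction_suffix": ch in ATTRACTION_SUFFIXES,
        "in_location_chars": ch in LOCATION_CHARS,
    }


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Convert per-character BIO labels into (start, end, type) spans, end exclusive."""
    spans: List[Tuple[int, int, str]] = []
    start, kind = None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if start is not None and not lab.startswith("I-"):
            spans.append((start, i, kind))
            start, kind = None, None
        if lab.startswith("B-"):
            start, kind = i, lab[2:]
    return spans


def upper_layer_tokens(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Pair each word with the lower-layer entity type covering its first character."""
    spans = bio_to_spans(labels)
    tokens: List[Tuple[str, str]] = []
    pos = 0  # character offset of the current word
    for word in words:
        tag = "O"
        for s, e, kind in spans:
            if s <= pos < e:
                tag = kind
        tokens.append((word, tag))
        pos += len(word)
    return tokens


sentence = "昆明石林风景区"
lower_features = [char_features(sentence, i) for i in range(len(sentence))]
# Hand-written stand-in for the lower-layer CRF's prediction.
lower_labels = ["B-LOC", "I-LOC", "B-ATT", "I-ATT", "O", "O", "O"]
upper_input = upper_layer_tokens(["昆明", "石林", "风景区"], lower_labels)
print(upper_input)
```

In this sketch the upper layer would see word tokens annotated with the simple entities found below, which is the kind of input from which a nested attraction name spanning several words could be labeled.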

Key words

computer application / Chinese information processing / tourism domain / named entity recognition / cascaded conditional random fields / feature template

Cite this article

GUO Jianyi, XUE Zhengshan, YU Zhengtao, ZHANG Zhikun, ZHANG Yihao, YAO Xianming. Named Entity Recognition for the Tourism Domain Based on Cascaded Conditional Random Fields. Journal of Chinese Information Processing. 2009, 23(5): 47-53

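Funding

Supported by the National Natural Science Foundation of China (60863011, 60663004); the Doctoral Program Foundation of the Ministry of Education of China (20050007023); the Yunnan Province Reserve Talent Fund for Young and Middle-aged Academic Leaders (2007PY01-11); the Key Fund of the Yunnan Provincial Department of Education (07Z11139); and the Doctoral Fund of Kunming University of Science and Technology (2006-12).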