Abstract
The growth of electronic business has made business information extraction and market content management an increasingly active research area in information science. Product named entity recognition (product NER), one of the key techniques in this area, is attracting growing attention. This paper defines product named entities for the purposes of business information extraction, systematically analyzes the characteristics and difficulties of the recognition task, and presents an approach based on the hierarchical hidden Markov model (HHMM). A prototype system that recognizes and tags product named entities in Chinese free text was implemented. Experiments in the digital electronics and mobile phone domains yield overall F-measures of 79.7%, 86.9%, and 75.8% for product name, product model, and product brand entities, respectively. A comparison with a maximum entropy model shows that the HHMM has stronger representational power for multi-scale nested sequences.
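The abstract reports F-measures without stating how they are computed. As a minimal sketch, assuming the usual balanced F-measure (harmonic mean of precision and recall) with exact matching on entity boundaries and type, the score for one test set could be computed as follows. The `Entity` tuple layout, the `f_measure` helper, and the labels `BRA`, `TYP`, and `PRO` are hypothetical and chosen only for illustration; the annotations are made up and are not data from the paper.

```python
from typing import Set, Tuple

# Hypothetical representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def f_measure(gold: Set[Entity], predicted: Set[Entity], beta: float = 1.0) -> float:
    """Entity-level F-measure; an entity counts as correct only if its
    boundaries and type both match a gold annotation exactly."""
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up annotations for illustration only.
gold = {(0, 2, "BRA"), (2, 5, "TYP"), (5, 7, "PRO"), (10, 12, "BRA")}
predicted = {(0, 2, "BRA"), (2, 5, "TYP"), (5, 8, "PRO")}

# Two of three predictions are correct and two of four gold entities are found.
score = f_measure(gold, predicted)
print(f"F-measure: {score:.3f}")
```

Under this assumption a prediction with a wrong boundary, such as `(5, 8, "PRO")` above, counts as both a precision error and a recall error, which is one reason nested product names are hard to score well on.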
Key words
computer application /
Chinese information processing /
product named entity recognition /
business information extraction /
hierarchical hidden Markov model (HHMM)
Funding
Supported by the National Natural Science Foundation of China (60372016) and the Beijing Natural Science Foundation (4052027)