Chinese Entity Mention Detection Based on Multi-level Feature Integration

ZHANG Hai-lei, CAO Fei-fei, CHEN Wen-liang, REN Fei-liang, WANG Hui-zhen, ZHU Jing-bo

Journal of Chinese Information Processing, 2007, Vol. 21, Issue 5: 126-130.
Review


Abstract

Entity Mention Detection (EMD) is the task of recognizing all mentions of entities in a text, including named mentions (proper names), nominal mentions (common nouns), and pronominal mentions. This paper proposes a Chinese EMD approach based on multi-level feature integration. It exploits the ability of Conditional Random Fields (CRFs) to integrate heterogeneous features, combining features at several levels to improve detection performance: characters, pinyin (phonetic transcriptions), words and their part-of-speech tags, various named-entity lists, and frequency statistics. All EMD subtasks are organized into a three-stage pipeline in which three different CRF classifiers label different attributes of the mentions sequentially, in a predefined order. A mention detection system based on this approach was submitted to the Chinese EMD task of the NIST 2007 Automatic Content Extraction (ACE07) evaluation, where its ACE Value ranked second.
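The abstract combines two ideas: per-character features drawn from several levels (character, pinyin, word and POS, name lists, frequency) and a pipeline that labels mention attributes one stage at a time. The following is a minimal Python sketch of both under stated assumptions, not the authors' implementation. The toy tables `PINYIN` and `NAME_LISTS`, the feature names, the cap of 3 on frequency counts, and the helpers `char_features` and `run_pipeline` are all hypothetical. The abstract does not say which attribute each stage labels or whether later stages see earlier labels; here each stage's output is assumed to be added as a feature for the stages after it, and each CRF is replaced by any callable that maps a feature sequence to a label sequence.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy resources; a real system would load a full pinyin
# dictionary and large name lists (person, place, organization, ...).
PINYIN: Dict[str, str] = {"北": "bei", "京": "jing", "大": "da", "学": "xue"}
NAME_LISTS: Dict[str, List[str]] = {"LOC": ["北京"], "ORG": ["北京大学"]}

Features = Dict[str, str]
Tagger = Callable[[List[Features]], List[str]]  # stand-in for a trained CRF


def char_features(words: Sequence[Tuple[str, str]]) -> List[Features]:
    """Turn a segmented, POS-tagged sentence into one feature dict per character."""
    chars: List[str] = []
    word_tags: List[Tuple[str, str]] = []  # (position in word, POS tag) per character
    for word, pos in words:
        for i, ch in enumerate(word):
            if len(word) == 1:
                place = "S"
            elif i == 0:
                place = "B"
            elif i == len(word) - 1:
                place = "E"
            else:
                place = "M"
            chars.append(ch)
            word_tags.append((place, pos))
    text = "".join(chars)

    # Name-list level: BIO tags from list matches. Longer names are applied
    # last, so they overwrite shorter names nested inside them.
    gaz = ["O"] * len(chars)
    entries = sorted(((n, lab) for lab, ns in NAME_LISTS.items() for n in ns),
                     key=lambda e: len(e[0]))
    for name, label in entries:
        start = text.find(name)
        while start != -1:
            gaz[start] = "B-" + label
            for k in range(start + 1, start + len(name)):
                gaz[k] = "I-" + label
            start = text.find(name, start + 1)

    # Frequency level: occurrences of the character in the text, capped at 3.
    counts = Counter(text)

    feats: List[Features] = []
    for i, ch in enumerate(chars):
        feats.append({
            "c0": ch,
            "c-1": chars[i - 1] if i > 0 else "<s>",
            "c+1": chars[i + 1] if i + 1 < len(chars) else "</s>",
            "py0": PINYIN.get(ch, "<unk>"),
            "wpos": word_tags[i][0],
            "pos": word_tags[i][1],
            "gaz": gaz[i],
            "freq": str(min(counts[ch], 3)),
        })
    return feats


def run_pipeline(feats: List[Features],
                 stages: Sequence[Tuple[str, Tagger]]) -> Dict[str, List[str]]:
    """Run taggers in a fixed order; each stage's labels become features for later stages."""
    feats = [dict(f) for f in feats]  # copy so the caller's features are untouched
    outputs: Dict[str, List[str]] = {}
    for name, tagger in stages:
        labels = tagger(feats)
        outputs[name] = labels
        for f, label in zip(feats, labels):
            f[name] = label
    return outputs
```

For example, `char_features([("北京大学", "nt"), ("在", "p"), ("北京", "ns")])` yields seven feature dicts whose name-list tags mark the first four characters as an organization and the last two as a location. Keeping each `Tagger` as a plain callable keeps the sketch independent of any particular CRF toolkit; in practice each stage would be a separately trained CRF model.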

Key words

computer application / Chinese information processing / entity mention detection / multi-task labeling / conditional random fields / ACE evaluation

Cite this article

ZHANG Hai-lei, CAO Fei-fei, CHEN Wen-liang, REN Fei-liang, WANG Hui-zhen, ZHU Jing-bo. Chinese Entity Mention Detection Based on Multi-level Feature Integration. Journal of Chinese Information Processing. 2007, 21(5): 126-130

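Funding

Supported by the National Natural Science Foundation of China (60473140); the National High Technology Research and Development Program of China (863 Program) (2006AA01Z154); the Program for New Century Excellent Talents in University, Ministry of Education (NCET-05-0287); and the National 985 Project (985-2-DB-C03).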