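A New Graph Clustering Algorithm for Chinese Noun Phrase Coreference Resolution (一种基于图划分的无监督汉语指代消解算法)

ZHOU Jun-sheng (周俊生), HUANG Shu-jian (黄书剑), CHEN Jia-jun (陈家骏), QU Wei-guang (曲维光)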

Journal of Chinese Information Processing, 2007, Vol. 21, Issue 2: 77-82.
Review


Abstract

Coreference resolution is an important problem in natural language processing. Because annotated Chinese training corpora for coreference resolution are scarce, this paper proposes an unsupervised clustering algorithm for noun phrase coreference resolution. Noun phrase coreference is modeled as a graph, which turns resolution into a graph partitioning problem, and a modularity function is used as the objective so that the graph is partitioned automatically, with the number of clusters selected by the objective itself. As a result, coreference decisions are not made independently for each pair of noun phrases: the dependencies among multiple candidate mentions are taken into account, and no threshold has to be chosen. Experiments on personal pronoun resolution and noun phrase resolution on the Chinese ACE corpus indicate that the proposed method is an effective and feasible unsupervised coreference resolution algorithm.
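The idea of casting coreference as modularity-driven graph partitioning can be illustrated with a small sketch. The abstract does not specify how edge weights are computed or which optimization procedure is used, so everything concrete below is an assumption: the toy mentions, the hand-picked compatibility weights, and the greedy agglomerative merging strategy are hypothetical stand-ins, and the objective is the standard weighted Newman-Girvan modularity, Q = sum over clusters of (internal weight / W) - (cluster degree / 2W)^2, where W is the total edge weight.

```python
from itertools import combinations


def modularity(weights, clusters):
    """Weighted Newman-Girvan modularity of a partition.

    weights:  dict mapping frozenset({i, j}) -> non-negative edge weight
    clusters: list of sets of node ids
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    q = 0.0
    for cluster in clusters:
        # Weight of edges with both endpoints inside the cluster.
        inside = sum(w for pair, w in weights.items() if pair <= cluster)
        # Sum of weighted degrees of the cluster's nodes.
        degree = sum(w for pair, w in weights.items() for n in pair if n in cluster)
        q += inside / total - (degree / (2 * total)) ** 2
    return q


def greedy_partition(nodes, weights):
    """Start from singletons; repeatedly apply the merge that raises
    modularity the most, and stop when no merge improves it.
    No similarity threshold and no preset cluster count are needed."""
    clusters = [{n} for n in nodes]
    best_q = modularity(weights, clusters)
    while len(clusters) > 1:
        best_merge = None
        for a, b in combinations(range(len(clusters)), 2):
            merged = [c for k, c in enumerate(clusters) if k not in (a, b)]
            merged.append(clusters[a] | clusters[b])
            q = modularity(weights, merged)
            if q > best_q + 1e-12:
                best_q, best_merge = q, merged
        if best_merge is None:
            break
        clusters = best_merge
    return clusters, best_q


# Hypothetical mentions and hand-picked compatibility scores (assumptions,
# not taken from the paper). Each cluster found is one coreference chain.
mentions = ["Zhang Wei", "he", "the manager", "the company", "it"]
weights = {
    frozenset({0, 1}): 0.9,
    frozenset({0, 2}): 0.8,
    frozenset({1, 2}): 0.7,
    frozenset({3, 4}): 0.9,
    frozenset({2, 3}): 0.1,
}

clusters, q = greedy_partition(range(len(mentions)), weights)
for cluster in clusters:
    print(sorted(mentions[i] for i in cluster))
```

On this toy graph the greedy procedure groups the first three mentions into one chain and the last two into another. Because each merge is scored against the whole partition, the weak link between "the manager" and "the company" is rejected without any pairwise threshold, which mirrors the motivation stated in the abstract.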

Key words

artificial intelligence / natural language processing / clustering / coreference resolution / modularity function

Cite this article

ZHOU Jun-sheng, HUANG Shu-jian, CHEN Jia-jun, QU Wei-guang. A New Graph Clustering Algorithm for Chinese Noun Phrase Coreference Resolution. Journal of Chinese Information Processing. 2007, 21(2): 77-82


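Funding

Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01Z143); the National Natural Science Foundation of China (60673043); and the Natural Science Foundation of Jiangsu Province (BK2006117)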