Abstract
This paper proposes a spectral clustering method for Chinese coreference resolution. First, features are extracted from mention pairs, and a maximum entropy model estimates the probability that the two mentions in each pair corefer. These probabilities are then used as the similarity values from which the similarity matrix for spectral clustering is built. Finally, the clusters produced by the clustering cuts are taken as entities, which completes coreference resolution. Because the method partitions mentions into entities from a global perspective, it effectively improves precision. Diagnostic experiments on the ACE 2007 standard dataset show that the method's ACE Value is 2.5% higher than the baseline's and that its Unweighted Precision is 5.4% higher.
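The clustering stage described in the abstract can be illustrated with a minimal sketch. It assumes the pairwise coreference probabilities are already available as a symmetric matrix; the values in `P` are made up for illustration and are not outputs of the paper's maximum entropy model. The abstract does not say which spectral clustering variant or stopping rule is used, so the sketch swaps in a single two-way split based on the normalized-cut relaxation of Shi and Malik, computed with power iteration in plain Python. The function names `two_way_spectral_split` and `group_entities` are hypothetical.

```python
import math

def two_way_spectral_split(W, iters=200):
    """Split mentions into two groups from a symmetric similarity matrix W.

    Assumes at least two mentions and strictly positive row sums.
    Returns one 0/1 label per mention; mention 0 always gets label 0.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetrically normalized affinity M = D^{-1/2} W D^{-1/2}.
    M = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of M is proportional to sqrt(deg) (eigenvalue 1).
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    v = [float(i + 1) for i in range(n)]  # fixed start keeps the run deterministic
    for _ in range(iters):
        # Deflation: remove the component along the leading eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        # Multiply by (M + I); the shift keeps every eigenvalue non-negative,
        # so the iteration converges to the second-largest eigenvector of M.
        v = [v[i] + sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]

    # Map back to the generalized eigenvector and split by sign.
    y = [v[i] / math.sqrt(deg[i]) for i in range(n)]
    return [0 if (a >= 0) == (y[0] >= 0) else 1 for a in y]

def group_entities(labels):
    """Collect mention indices that share a cluster label into entities."""
    entities = {}
    for mention, label in enumerate(labels):
        entities.setdefault(label, []).append(mention)
    return [entities[k] for k in sorted(entities)]

# Hypothetical pairwise coreference probabilities for five mentions.
P = [
    [1.00, 0.90, 0.80, 0.10, 0.20],
    [0.90, 1.00, 0.85, 0.15, 0.10],
    [0.80, 0.85, 1.00, 0.05, 0.10],
    [0.10, 0.15, 0.05, 1.00, 0.90],
    [0.20, 0.10, 0.10, 0.90, 1.00],
]

labels = two_way_spectral_split(P)
entities = group_entities(labels)
print(entities)
```

In this toy input, mentions 0 to 2 are strongly linked to each other and only weakly linked to mentions 3 and 4, so the split separates the two groups. A full system would need to apply such cuts repeatedly, or use a multi-way variant, with some criterion for when to stop splitting; that part is not specified in the abstract and is left out here.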
Keywords
coreference resolution /
spectral clustering /
maximum entropy model
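Funding
This research was supported by the Natural Science Foundation (Grant No. 60503070) and the University Technology Development Project (Grant No. GH0742002).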