Abstract
Coreference resolution is an important subtask of information extraction. In recent years, many researchers have applied statistical machine learning methods to the task and made some progress. Background knowledge, a new research focus, is increasingly used across the fields of natural language processing. This paper integrates several kinds of background semantic knowledge as features in the classical pairwise (binary) classification framework for coreference resolution. Background knowledge is extracted from WordNet and Wikipedia, and the model also uses shallow semantic relations within the sentence (semantic role labeling), common text patterns, and contextual features of the mention to be resolved. A feature selection algorithm automatically chooses the best feature combination, and the maximum entropy model and the SVM model are compared on the same features. Experimental results on the ACE dataset show that integrating the selected background semantic knowledge further improves coreference resolution.
Key words: computer application; Chinese information processing; coreference resolution; background knowledge; WordNet; Wikipedia
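The feature selection step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which selection algorithm is used, so this assumes greedy forward selection in the wrapper style, where candidate subsets are scored by an evaluation function. The feature names, `forward_select`, `toy_evaluate`, and the gain values are all hypothetical and invented for illustration; in practice the evaluation function would train a classifier (for example maximum entropy or SVM) and return a held-out score.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def forward_select(
    features: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection: keep adding the single feature that most
    improves the score, and stop when no addition improves it."""
    remaining = list(features)
    selected: List[str] = []
    best_score = evaluate(frozenset())
    while remaining:
        # Score every subset formed by adding one remaining feature.
        scored = [(evaluate(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no single feature improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical feature groups, named after the knowledge sources in the abstract.
FEATURES = ["wordnet", "wikipedia", "srl", "pattern", "context"]

# Invented gains in whole F-score points; these are NOT results from the paper.
TOY_GAIN = {"wordnet": 3, "wikipedia": 2, "srl": 1, "pattern": -1, "context": 0}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    """Stand-in for training a classifier and scoring it on held-out data."""
    return (60 + sum(TOY_GAIN[f] for f in subset)) / 100


chosen, score = forward_select(FEATURES, toy_evaluate)
print(chosen, score)
```

Because the evaluation function is passed in as a parameter, the same loop could compare two classifiers on the same candidate features by swapping only `evaluate`.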
Funding
Supported by the National Natural Science Foundation of China (60575042, 60503072) and the National 863 Program of China (2006AA01Z145)