Abstract
Semantic information plays an important role in extracting semantic relations between named entities. Taking “TongYiCi CiLin” as an example, this paper systematically investigates the effectiveness of lexical semantic information for tree kernel-based Chinese semantic relation extraction. It examines in depth how different levels of semantic information and phenomena such as polysemy affect relation extraction, and analyzes in detail the redundancy between lexical semantic information and entity type information. Relation extraction experiments on the ACE2005 Chinese corpus show that semantic information significantly improves extraction performance when entity types are unknown, and that even when entity types are known it noticeably improves performance for some relation types. This indicates a certain degree of complementarity between “CiLin” semantic information and entity type information in Chinese semantic relation extraction.
Key words
Chinese entity relation extraction /
tree kernel /
TongYiCi CiLin /
semantic information
Funding
National Natural Science Foundation of China (60873150, 90920004); Natural Science Foundation of Jiangsu Province (BK2010219, 11KJA520003)