Abstract
Distant supervision for relation extraction extracts relations from text automatically by aligning the relations in a database of facts with unlabeled text. Most existing distant supervision algorithms are feature-based. When these algorithms convert instances into features, however, the key information often fails to stand out and the resulting datasets are frequently not linearly separable. Both problems degrade extraction performance. In this paper, we propose a pattern-based algorithm for distantly supervised relation extraction. It introduces pattern-based vectors and uses a kernel-based machine learning method to overcome these problems. Experimental results show that the proposed algorithm effectively improves the precision of distantly supervised relation extraction.
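The abstract does not spell out the algorithm. The following is therefore only a minimal sketch, under stated assumptions, of how its three ingredients can fit together:

- **Labeling** uses the standard distant-supervision heuristic: a sentence that mentions both entities of a known fact is labeled with that fact's relation.
- **The "pattern-based vector"** is assumed here to hold one indicator per word pattern that occurs between the two entity mentions.
- **The learner** is a kernel perceptron with an RBF kernel. It stands in for the paper's unspecified kernel-based method.

`KB`, `label_by_distant_supervision`, `pattern_vector` and `KernelPerceptron` are hypothetical names, not the authors' code.

```python
import math

# Hypothetical toy knowledge base: (entity1, entity2) -> relation name.
KB = {("Obama", "Honolulu"): "born_in"}


def label_by_distant_supervision(sentences, kb):
    """Distant-supervision heuristic: a sentence that mentions both entities of a
    KB fact (entity1 before entity2) is labeled with that fact's relation."""
    labeled = []
    for tokens in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in tokens and e2 in tokens and tokens.index(e1) < tokens.index(e2):
                labeled.append((tokens, (e1, e2), relation))
    return labeled


def pattern_vector(tokens, pair, patterns):
    """Hypothetical pattern-based vector: component i is 1.0 when pattern i
    (a tuple of words) occurs contiguously between the two entity mentions."""
    start, end = tokens.index(pair[0]), tokens.index(pair[1])
    between = tokens[start + 1:end]
    vec = []
    for p in patterns:
        n = len(p)
        hit = any(tuple(between[k:k + n]) == p for k in range(len(between) - n + 1))
        vec.append(1.0 if hit else 0.0)
    return vec


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class KernelPerceptron:
    """Binary kernel perceptron (labels +1 / -1), an assumed stand-in for the
    paper's kernel-based learner (an SVM would be another common choice)."""

    def __init__(self, kernel=rbf_kernel, epochs=20):
        self.kernel, self.epochs = kernel, epochs

    def _score(self, x):
        return sum(a * yj * self.kernel(xj, x)
                   for a, yj, xj in zip(self.alpha, self.y, self.X))

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        self.alpha = [0] * len(self.X)
        for _ in range(self.epochs):
            mistakes = 0
            for i, (xi, yi) in enumerate(zip(self.X, self.y)):
                if yi * self._score(xi) <= 0:  # wrong side of (or on) the boundary
                    self.alpha[i] += 1
                    mistakes += 1
            if mistakes == 0:
                break
        return self

    def predict(self, x):
        return 1 if self._score(x) > 0 else -1


if __name__ == "__main__":
    sentences = [s.split() for s in ["Obama was born in Honolulu",
                                     "Obama visited Honolulu",  # also labeled: a noisy label
                                     "Merkel was born in Hamburg"]]
    labeled = label_by_distant_supervision(sentences, KB)
    patterns = [("born", "in"), ("visited",)]  # hypothetical pattern set
    for tokens, pair, rel in labeled:
        print(rel, pattern_vector(tokens, pair, patterns))
    # XOR over two pattern indicators: no linear separator exists, an RBF kernel fits it.
    X, y = [(0, 0), (0, 1), (1, 0), (1, 1)], [-1, 1, 1, -1]
    print([KernelPerceptron(rbf_kernel).fit(X, y).predict(x) for x in X])
    print([KernelPerceptron(linear_kernel).fit(X, y).predict(x) for x in X])
```

The XOR set is the smallest example of the linear inseparability the abstract mentions. With `linear_kernel` (and no bias term), the perceptron cannot classify all four points. The RBF kernel separates them in its implicit feature space. The second toy sentence also shows why distant-supervision labels are noisy: "Obama visited Honolulu" receives `born_in` even though it does not express that relation.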
Key words
distant supervision /
relation extraction /
pattern /
kernel method
Funding
National Natural Science Foundation of China (61402532)