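Open relation extraction acquires knowledge from massive data and is a key technology in natural language processing. It can extract many kinds of relations, but because Chinese has little annotated training data and comparatively complex semantics and sentence patterns, Chinese open relation extraction faces considerable difficulties. Existing Chinese open relation extraction methods suffer from low entity recognition coverage and a narrow range of extracted relation types, so they cannot meet the needs of applications such as knowledge graph expansion. This paper proposes a multi-strategy open relation extraction method. The method uses a knowledge graph to improve the coverage of entity recognition, relies on entity context information to extract relations between entity pairs, obtains full-element triples through dependency parsing, and extracts entity attributes from text. Experiments show that the proposed method achieves high accuracy, extracts diverse relation types, and can serve tasks such as knowledge graph expansion.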
Abstract
Open relation extraction, which obtains knowledge from massive text, is a challenging task in natural language processing. With little annotated data and complex sentence structure, Chinese open relation extraction is particularly difficult. This paper proposes a multi-strategy open relation extraction method that uses a knowledge graph to improve the coverage of entity recognition, extracts relations between entity pairs from entity context, obtains full-element triples through dependency parsing, and extracts entity attributes from text. Experiments show that the proposed method achieves high accuracy across various relation types.
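To make the dependency-parsing step concrete, the following is a minimal sketch of how a (subject, predicate, object) triple might be read off a dependency parse. It is not the paper's implementation: the `Token` type, the `extract_triples` function, the hand-written toy parse, and the use of LTP-style labels (`SBV` for subject, `VOB` for object, `HED` for the root) are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    text: str     # surface form
    head: int     # idx of the head token, 0 for the root
    deprel: str   # dependency label (assumed LTP-style: SBV, VOB, HED)


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return one triple for every predicate that has both a subject
    (SBV) dependent and an object (VOB) dependent."""
    triples = []
    for pred in tokens:
        subjects = [t for t in tokens if t.head == pred.idx and t.deprel == "SBV"]
        objects = [t for t in tokens if t.head == pred.idx and t.deprel == "VOB"]
        for s in subjects:
            for o in objects:
                triples.append((s.text, pred.text, o.text))
    return triples


# Hand-written toy parse (hypothetical, not produced by a real parser).
parse = [
    Token(1, "Jack Ma", 2, "SBV"),
    Token(2, "founded", 0, "HED"),
    Token(3, "Alibaba", 2, "VOB"),
]

print(extract_triples(parse))
```

A real system would obtain the parse from a dependency parser and would also need to handle modifiers, coordination, and the time or location elements that a full-element triple includes; this sketch covers only the subject-verb-object core.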
Keywords
open relation extraction /
multi-strategy /
knowledge graph
Key words
open relation extraction /
multi-strategy /
knowledge graph
Funding
National Natural Science Foundation of China (62006136); NSFC-General Technology Basic Research Joint Fund (U1736204)