Progress in Open Information Extraction

Journal of Chinese Information Processing, 2014, Vol. 28, Issue 4: 1-11.

Survey

YANG Bo¹, CAI Dongfeng¹, YANG Hua²

Abstract

Automatically extracting useful information from large-scale unstructured text has been a long-standing goal of NLP and AI, and open information extraction has become the prevailing approach to mining web text efficiently. According to the number of arguments in a relation, open information extraction can be divided into binary and n-ary entity relation extraction. Following this division, this paper analyzes the current state of several typical methods for open relation extraction, together with their shortcomings. Most current methods still amount to shallow semantic processing and rarely address implicit relations. Joint inference strategies, such as Markov logic and ontology-structure-based inference, can combine multiple features and effectively infer fine-grained, complete information, opening a promising prospect for deeper text understanding.
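As a minimal illustration of the division by argument count, the sketch below models an extraction as a relation phrase plus a tuple of arguments and separates the two-argument cases from those with more than two. The `Extraction` class, the `split_by_arity` helper, and the sample tuples are hypothetical; they are not taken from the paper or from any of the systems it surveys.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Extraction:
    """One open-IE extraction: a relation phrase plus its arguments (hypothetical type)."""
    relation: str
    arguments: Tuple[str, ...]

    @property
    def arity(self) -> int:
        # Number of arguments attached to the relation phrase.
        return len(self.arguments)


def split_by_arity(extractions: List[Extraction]):
    """Partition extractions into (two-argument, more-than-two-argument) lists."""
    two_arg = [e for e in extractions if e.arity == 2]
    n_arg = [e for e in extractions if e.arity > 2]
    return two_arg, n_arg


# Hand-written sample tuples, for illustration only.
examples = [
    Extraction("was born in", ("Albert Einstein", "Ulm")),
    Extraction("moved to", ("Albert Einstein", "Princeton", "in 1933")),
]

two_arg, n_arg = split_by_arity(examples)
print([e.relation for e in two_arg])
print([e.relation for e in n_arg])
```

Keeping the arguments in a single tuple, instead of fixed subject and object fields, lets the same structure hold extra arguments such as time or place without a schema change.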

Cite this article

YANG Bo, CAI Dongfeng, YANG Hua. Progress in Open Information Extraction. Journal of Chinese Information Processing. 2014, 28(4): 1-11

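Funding

National Science and Technology Support Program of the 12th Five-Year Plan (2012BAH14F00); National Natural Science Foundation of China (61073123)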

Received: 2012-07-19
Published: 2014-04-10
Issue Date: 2014-04-10
