A Study on Paraphrasing for Entity-Slot Provenance Acquisition

SONG Rui, CHEN Xin, HONG Yu

Journal of Chinese Information Processing, 2019, Vol. 33, Issue (7): 88-100.
Information Extraction and Text Mining


Abstract

Slot filling is the task of extracting the values (fillers) of specific attributes (slots) of a named entity. The provenance of a filler is its source: a passage or sentence that provides evidence that the filler correctly reflects the slot. Inspection of the corpus shows that provenance texts for entity slots frequently express the same meaning in different surface forms, that is, they are paraphrases of one another. This paper therefore combines paraphrasing techniques with an existing knowledge base and examines how effectively a paraphrase identification model, built from only a small set of seed "provenance" examples, classifies the provenance of entity slots. Experiments show that a paraphrase identification method based on the tree edit model captures the relevant "provenance" of entity slots well even when little prior knowledge is available.
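The idea of matching candidate sentences against a few seed provenance sentences by tree edit operations can be illustrated with a minimal sketch. It is not the paper's model: it assumes unit costs for insert, delete, and relabel, uses hand-written toy dependency trees with hypothetical placeholders (`PER`, `LOC`, `ORG`) for the entity and filler, and replaces any learned classifier with a hypothetical distance `threshold`.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for caching.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Ordered edit distance between two forests (unit costs, an assumption)."""
    if not f and not g:
        return 0
    if not f:
        return size(g)   # insert every node of g
    if not g:
        return size(f)   # delete every node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    # Delete the root of the rightmost tree in f; its children move up.
    delete = forest_distance(f[:-1] + v_kids, g) + 1
    # Insert the root of the rightmost tree in g.
    insert = forest_distance(f, g[:-1] + w_kids) + 1
    # Map the two rightmost roots onto each other, relabeling if needed.
    relabel = (forest_distance(f[:-1], g[:-1])
               + forest_distance(v_kids, w_kids)
               + (0 if v_label == w_label else 1))
    return min(delete, insert, relabel)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

def is_paraphrase(candidate, seeds, threshold=2):
    """Hypothetical rule: accept if any seed is within `threshold` edits."""
    return any(tree_edit_distance(candidate, s) <= threshold for s in seeds)

# Toy dependency trees (hand-written, not parser output).
# Seed provenance: "PER was born in LOC"
seed = ("born", (("PER", ()), ("was", ()), ("in", (("LOC", ()),))))
# Candidate 1: "PER, born in LOC, ..."
cand_close = ("born", (("PER", ()), ("in", (("LOC", ()),))))
# Candidate 2: "PER works for ORG"
cand_far = ("works", (("PER", ()), ("for", (("ORG", ()),))))

print(tree_edit_distance(seed, cand_close))
print(tree_edit_distance(seed, cand_far))
print(is_paraphrase(cand_close, [seed]), is_paraphrase(cand_far, [seed]))
```

In this sketch a small distance to any seed is treated as evidence that the candidate expresses the same slot. A real system would obtain the trees from a dependency parser and would likely learn from the edit sequences rather than apply a fixed threshold.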

Key words

slot filling / paraphrasing / tree edit model

Cite this article

SONG Rui, CHEN Xin, HONG Yu. A Study on Paraphrasing for Entity-Slot Provenance Acquisition. Journal of Chinese Information Processing. 2019, 33(7): 88-100


Funding

National Natural Science Foundation of China (61672367, 61672368, 61773276)