Abstract
Open relation extraction (OpenRE) aims to extract relational facts from open-domain corpora. Most OpenRE methods are unsupervised. They extract relation patterns between named entities and then cluster semantically equivalent patterns into relation clusters. However, the lack of supervision and the low clustering accuracy limit the final extraction performance. To further improve clustering performance, we propose an Unsupervised Ensemble Clustering (UEC) framework. UEC combines unsupervised ensemble learning with a multi-step, iterative clustering algorithm based on information measures to automatically create high-quality pseudo labels. These pseudo labels then serve as supervision for learning relational features. The improved features in turn guide the clustering process toward better labels. Finally, relation types in the text are discovered through multiple rounds of iterative clustering. Experimental results on the FewRel and NYT-FB datasets show that UEC outperforms other mainstream baseline OpenRE models, achieving F1 scores of 65.2% and 67.1%, respectively.
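The abstract describes UEC only at a high level. To illustrate the general idea of turning an unsupervised clustering ensemble into pseudo labels, we offer the sketch below. It is not the authors' algorithm, whose details are not given here. Several assumptions are made:

- Relation instances are assumed to be already encoded as feature vectors `X`.
- The base clusterers are k-means runs with farthest-first seeding.
- The "information measure" is approximated by weighting each base clustering by its mean normalized mutual information (NMI) with the others.
- Consensus is taken as the connected components of a weighted co-association graph.
- Small components are left unlabeled (`-1`).

The function names `kmeans`, `nmi` and `ensemble_pseudo_labels`, and the parameters `threshold` and `min_size`, are hypothetical.

```python
import numpy as np


def kmeans(X, k, seed, n_iter=100):
    """Lloyd's k-means with farthest-first seeding; returns one cluster id per row of X."""
    rng = np.random.default_rng(seed)
    # Seeding: one random point, then repeatedly the point farthest from all chosen centers.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep empty clusters in place.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def nmi(a, b):
    """Normalized mutual information of two labelings (arithmetic-mean normalization)."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])).sum()
    ha, hb = -(pa * np.log(pa)).sum(), -(pb * np.log(pb)).sum()
    denom = (ha + hb) / 2
    return mi / denom if denom > 0 else 1.0


def ensemble_pseudo_labels(X, ks, seeds, threshold=0.8, min_size=2):
    """Consensus of several k-means runs; returns pseudo labels, -1 = left unlabeled."""
    base = [kmeans(X, k, s) for k in ks for s in seeds]
    m = len(base)
    if m < 2:
        raise ValueError("need at least two base clusterings")
    # Quality weight of each base clustering: its mean NMI with all the others.
    weights = np.array([
        np.mean([nmi(base[i], base[j]) for j in range(m) if j != i]) for i in range(m)
    ])
    weights = weights / weights.sum() if weights.sum() > 0 else np.full(m, 1.0 / m)
    # Weighted co-association: ensemble agreement that samples i and j share a relation.
    n = len(X)
    co = np.zeros((n, n))
    for w, labels in zip(weights, base):
        co += w * (labels[:, None] == labels[None, :])
    # Consensus clusters = connected components of the graph with edges co >= threshold.
    comp = -np.ones(n, dtype=int)
    n_comp = 0
    for start in range(n):
        if comp[start] != -1:
            continue
        comp[start] = n_comp
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(co[i] >= threshold):
                if comp[j] == -1:
                    comp[j] = n_comp
                    stack.append(j)
        n_comp += 1
    # Only components with at least min_size members become pseudo-labelled clusters.
    pseudo = -np.ones(n, dtype=int)
    next_id = 0
    for c in range(n_comp):
        members = comp == c
        if members.sum() >= min_size:
            pseudo[members] = next_id
            next_id += 1
    return pseudo
```

Consider three well-separated 2-D blobs with `ks=(3,)` and `seeds=(0, 1, 2)`. Every base run agrees, so each blob becomes one pseudo-labelled cluster. In a framework like the one described, the pseudo labels would then supervise a relation feature encoder, and clustering would be repeated on the new features. That loop is not shown here.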
Key words
open relation extraction /
ensemble clustering /
pseudo label
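Funding
Key Research and Development Program of Shanxi Province (Key Project), High-Tech Field (201703D111027); Key Research and Development Program of Shanxi Province (201803D121048); Key Research and Development Program of Shanxi Province (201803D121055)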