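Entity relation extraction is an important research direction in information extraction. Its goal is to extract the semantic relations between entities expressed in unstructured or semi-structured natural language text and represent them as structured relation triples. Character relation extraction is a fine-grained branch of entity relation extraction. Previous entity relation extraction research has mostly targeted short English sentences from news or encyclopedia corpora, and research on character relation extraction from Chinese literary works has only just begun. Addressing the characteristics of medium-length and full-length Chinese literary works, this paper first introduces an adversarial learning framework to train a sentence-level noise classifier that reduces noise in the character relation data, and on this basis builds MF-CRC, a character relation classification model. The classification model first uses the pre-trained model BERT to extract basic semantic features of the text and a BiLSTM to obtain deeper semantic features. It then extracts Chinese surname, gender, and relation-indicator features according to Chinese usage conventions and embeds them. Finally, it trains the character relation classifier on the fusion of these multi-dimensional features. Taking the classic novels Ordinary World (《平凡的世界》), Life (《人生》), and White Deer Plain (《白鹿原》) as research subjects, this paper constructs, for the first time, three general-purpose labeled character relation datasets for Chinese literary works, and runs comparison and ablation experiments on them. The results show that MF-CRC outperforms the other compared models, exceeding the SOTA model by 1.92% in Micro-F1 and 2.14% in Macro-F1, which verifies the effectiveness of the method.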
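The abstract states that surname, gender, and relation-indicator features are embedded and fused with the BERT-BiLSTM semantic features, but it does not give the extraction rules or the fusion operator. The sketch below is a minimal illustration under stated assumptions: fusion is plain concatenation, the compound-surname list and indicator lexicon are small hypothetical samples, gender comes from a hypothetical annotation dictionary (`gender_of`), `EmbeddingTable` is a toy stand-in for a trainable embedding layer, and `context_vec` stands in for the BiLSTM output. It is not the MF-CRC implementation.

```python
# Hypothetical sketch: rule-based surname, gender, and relation-indicator features,
# embedded and concatenated with a contextual sentence vector. All lexicons, sizes,
# and the gender lookup are illustrative assumptions, not the MF-CRC configuration.
import random
from typing import Dict, Hashable, Iterable, List

# Illustrative subset of two-character Chinese compound surnames.
COMPOUND_SURNAMES = {"欧阳", "司马", "诸葛", "上官", "东方"}
# Illustrative relation-indicator lexicon (kinship and social terms).
RELATION_INDICATORS = ["父亲", "母亲", "妻子", "丈夫", "哥哥",
                       "姐姐", "儿子", "女儿", "朋友", "同学"]


def surname(name: str) -> str:
    """Return a listed compound surname, otherwise the first character."""
    return name[:2] if name[:2] in COMPOUND_SURNAMES else name[:1]


def same_surname(head: str, tail: str) -> int:
    """1 if both characters share a surname (a weak kinship cue), else 0."""
    return int(surname(head) == surname(tail))


def relation_indicator_vector(sentence: str) -> List[int]:
    """Multi-hot vector marking which indicator words occur in the sentence."""
    return [int(word in sentence) for word in RELATION_INDICATORS]


class EmbeddingTable:
    """Toy lookup table standing in for a trainable embedding layer.

    Vectors are randomly initialised and never trained in this sketch.
    """

    def __init__(self, keys: Iterable[Hashable], dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.table = {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}

    def __call__(self, key: Hashable) -> List[float]:
        return list(self.table[key])


SURNAME_EMB = EmbeddingTable([0, 1], dim=2, seed=1)
GENDER_EMB = EmbeddingTable(["male", "female", "unknown"], dim=4, seed=2)


def fuse_features(context_vec: List[float], head: str, tail: str,
                  sentence: str, gender_of: Dict[str, str]) -> List[float]:
    """Concatenate the sentence vector with the embedded handcrafted features."""
    head_gender = gender_of.get(head, "unknown")
    tail_gender = gender_of.get(tail, "unknown")
    return (list(context_vec)
            + SURNAME_EMB(same_surname(head, tail))
            + GENDER_EMB(head_gender)
            + GENDER_EMB(tail_gender)
            + [float(x) for x in relation_indicator_vector(sentence)])


if __name__ == "__main__":
    genders = {"孙少安": "male", "孙少平": "male"}  # hypothetical annotation
    sentence = "孙少安是孙少平的哥哥。"              # made-up example sentence
    context = [0.0] * 8                               # stand-in for the BiLSTM output
    fused = fuse_features(context, "孙少安", "孙少平", sentence, genders)
    print(len(fused))                                 # 28 (8 + 2 + 4 + 4 + 10)
    print(relation_indicator_vector(sentence))        # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```

Concatenation is the simplest fusion choice and keeps each feature group in its own slice of the vector; in a real model the embedding tables would be trained jointly with the classifier rather than left at random initialisation.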
Abstract
Entity relation extraction aims to extract structured relation triples between entities from unstructured or semi-structured natural language text. Character relation extraction is a finer-grained branch of entity relation extraction. Focusing on character relation extraction in Chinese literature, we present MF-CRC, a character relation extraction model. We first introduce an adversarial learning framework to build a sentence-level noise classifier that filters noise out of the dataset. We then employ BERT and BiLSTM to extract semantic features and design feature representations for Chinese surnames, gender, and relation indicators. Finally, the character relation extraction model is built by fusing these multi-dimensional features. Experiments on datasets built from three Chinese classic novels show that the proposed method outperforms the SOTA model by 1.92% in micro-F1 and 2.14% in macro-F1.
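The reported gains are given in both micro-F1 and macro-F1, which weight relation classes differently. The sketch below shows the standard definitions for single-label classification. The abstract does not say whether a null ("no relation") class is excluded from scoring, so the `labels` parameter is an assumption of this sketch that lets the caller choose which classes are scored; the relation names in the example are made up.

```python
# Sketch: micro- vs macro-averaged F1 for single-label relation classification.
from collections import Counter
from typing import Iterable, Sequence, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of P and R."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def micro_macro_f1(gold: Sequence[str], pred: Sequence[str],
                   labels: Iterable[str]) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) over the scored relation labels.

    Counts for classes outside `labels` (e.g. a null "no relation" class) are
    not summed, but confusing a scored class with such a class still counts
    as an error for the scored class.
    """
    labels = list(labels)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # gold class g was missed
    # Micro: pool counts over all scored classes, then compute one F1.
    micro = f1_from_counts(sum(tp[lab] for lab in labels),
                           sum(fp[lab] for lab in labels),
                           sum(fn[lab] for lab in labels))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1_from_counts(tp[lab], fp[lab], fn[lab]) for lab in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    gold = ["sibling"] * 4 + ["spouse", "parent"]
    pred = ["sibling"] * 6
    micro, macro = micro_macro_f1(gold, pred, ["sibling", "spouse", "parent"])
    print(round(micro, 3), round(macro, 3))  # 0.667 0.267
```

In the example, always predicting the frequent "sibling" relation still earns a micro-F1 of 0.667, while macro-F1 drops to 0.267 because the two rare relations each score 0. This is why gains in macro-F1 indicate better handling of infrequent relation types.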
Key words
entity relation extraction /
Chinese literature /
character relation extraction
Funding
National Key Research and Development Program of China and National Natural Science Foundation of China (2021QY2102, 62172089, 61972087, 62172090, 62106045, 62172458)