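Multiple Relationship Extraction from Unstructured Financial Announcements

ZHOU Yingtong, MENG Jian, GUO Yan, LIU Yue, HE Guangfu, DONG Lin, CHENG Xueqi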

Journal of Chinese Information Processing, 2022, Vol. 36, Issue 2: 76-84.
Information Extraction and Text Mining


Abstract

Financial announcements disclose key data about a company's operations and therefore have practical application value. Unstructured financial announcements involve complex financial relationships, namely multiple relationships. This paper designs TextMining, a vertical-domain multiple relationship extraction method based on dependency parse trees and frequent subgraph mining, which substantially reduces reliance on datasets. Furthermore, inspired by graph convolutional neural networks, the paper designs FTA-GCN, an algorithm optimized for the vertical domain. On a financial announcement dataset constructed for this work, the algorithm focuses strongly on multiple relationships centered on the noun entities common in financial announcements, and the experimental results indicate that it achieves good extraction performance.
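To make the idea of mining frequent structures from dependency trees more concrete, the following is a minimal sketch in Python. It is not the paper's TextMining method and not gSpan: it only counts one-edge and connected two-edge patterns across toy trees and keeps those that reach a minimum support. The tree encoding, the tag and relation names, and the function names `patterns_in` and `mine_frequent_patterns` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def labeled(tree, edge):
    """Replace token indices in an edge with the tokens' labels."""
    head, rel, dep = edge
    return (tree["labels"][head], rel, tree["labels"][dep])

def patterns_in(tree):
    """Collect one-edge and connected two-edge patterns found in a single tree."""
    found = set()
    for edge in tree["edges"]:
        found.add(("edge", labeled(tree, edge)))
    for a, b in combinations(tree["edges"], 2):
        if a[0] == b[0]:    # both edges leave the same head token
            pair = tuple(sorted([labeled(tree, a), labeled(tree, b)]))
            found.add(("siblings", pair))
        elif a[2] == b[0]:  # edge b hangs below edge a
            found.add(("chain", (labeled(tree, a), labeled(tree, b))))
        elif b[2] == a[0]:  # edge a hangs below edge b
            found.add(("chain", (labeled(tree, b), labeled(tree, a))))
    return found

def mine_frequent_patterns(trees, min_support):
    """Keep patterns that occur in at least min_support trees."""
    support = Counter()
    for tree in trees:
        # patterns_in returns a set, so each tree counts once per pattern
        support.update(patterns_in(tree))
    return {p: n for p, n in support.items() if n >= min_support}

# Hypothetical toy trees: edges are (head_index, relation, dependent_index).
trees = [
    {"labels": {0: "VERB", 1: "ORG", 2: "MONEY"},
     "edges": [(0, "nsubj", 1), (0, "obj", 2)]},
    {"labels": {0: "VERB", 1: "ORG", 2: "MONEY", 3: "DATE"},
     "edges": [(0, "nsubj", 1), (0, "obj", 2), (0, "obl", 3)]},
    {"labels": {0: "VERB", 1: "ORG", 2: "DATE"},
     "edges": [(0, "nsubj", 1), (0, "obl", 2)]},
]

frequent = mine_frequent_patterns(trees, min_support=2)
for pattern, count in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(count, pattern)
```

In this toy setting, a pattern such as a verb with an organization subject and a monetary object recurs across sentences and could serve as a template for a relation with more than two arguments. A real frequent subgraph miner would grow patterns beyond two edges and handle isomorphism checks, which this sketch deliberately omits.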

Key words

financial announcement / relation extraction / graph convolution

Cite this article

ZHOU Yingtong, MENG Jian, GUO Yan, LIU Yue, HE Guangfu, DONG Lin, CHENG Xueqi. Multiple Relationship Extraction from Unstructured Financial Announcements. Journal of Chinese Information Processing. 2022, 36(2): 76-84


Funding

National Key Research and Development Program of China (2017YFB0803302); National Natural Science Foundation of China (61802370)