Abstract
This paper investigates the automatic generation of term DEFs under the KDML formalism of HowNet and proposes a generation method based on a tree-structured decoder. The encoder takes a technical term together with external information (the term's definition, the sememes of its sub-words, etc.) as input. On the decoder side, a sememe decoder and a role decoder are applied alternately, with an attention mechanism attending to the encoder-side representations. The output, produced as "sememe-role-sememe" triples, is assembled into the sememe tree of the term, yielding its DEF representation and thereby supporting the construction of a domain-specific HowNet. Experimental results show that the proposed method achieves F1 scores of 74.13% for first-sememe generation, 53.92% for overall sememe generation, 53.33% for role generation, and 30.48% for overall triple generation.
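The final assembly step described above, combining "sememe-role-sememe" triples into a nested DEF expression, can be sketched in plain Python. The triple layout, the example sememes, and the brace/colon syntax below are illustrative assumptions modeled on HowNet-style DEFs, not the paper's exact KDML output.

```python
from collections import defaultdict

def build_def(triples):
    """Assemble (head_sememe, role, child_sememe) triples into a
    nested DEF-style string such as {head:role={child},...}."""
    children = defaultdict(list)
    child_set = set()
    heads = []
    for head, role, child in triples:
        children[head].append((role, child))
        child_set.add(child)
        if head not in heads:
            heads.append(head)
    # The root is the first head sememe that never appears as a child.
    root = next(h for h in heads if h not in child_set)

    def render(node):
        body = ",".join(f"{role}={render(child)}"
                        for role, child in children.get(node, []))
        return "{" + node + (":" + body if body else "") + "}"

    return render(root)

triples = [
    ("aircraft|飞行器", "modifier", "military|军"),
    ("aircraft|飞行器", "domain", "military|军事"),
]
print(build_def(triples))
# {aircraft|飞行器:modifier={military|军},domain={military|军事}}
```

In the paper's setting the triples come from the alternating sememe and role decoders; here they are hand-written purely to show how a flat triple sequence determines one sememe tree.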
Key words
HowNet /
DEF generation /
tree-structured decoder
Funding
National Natural Science Foundation of China (U1908216); Key Research and Development Program of Liaoning Province (2019JH2/10100020)