Abstract
Based on a detailed analysis of the usage and characteristics of the La case, and on a shallow semantic annotation specification developed for it, this paper proposes an end-to-end long short-term memory (LSTM) method for shallow semantic analysis of the Tibetan La case. First, borrowing from the design of the LSTM, the method installs a novel "gated high-speed connection" mechanism (GM) in the vertical direction of the LSTM to learn the temporal semantic features of the input sentence. The GM contains linear connections over the input and output inside each unit, so that information can propagate smoothly between layers. Then, softmax is used to compute a locally normalized distribution over semantic tags at each time step, which the output layer uses for constrained decoding. Finally, during Viterbi decoding, the BIO constraints and the La case shallow semantic annotation constraints defined in this paper are enforced, which regularizes the structural relations among the output semantic tags. Although the model is relatively simple and requires no additional input features, it achieves good experimental results: the accuracy of shallow semantic analysis of the Tibetan La case on the test set reaches 90.59%.
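The constrained decoding step can be illustrated with a minimal sketch. It assumes plain BIO rules only (an I-X tag may follow only B-X or I-X, and may not start a sentence); the La case annotation constraints defined in the paper are not given here and are not modeled. The tag names, the `ARG` label, and the probabilities are hypothetical, chosen so that a greedy per-token argmax would produce an invalid sequence.

```python
import math

NEG_INF = float("-inf")


def bio_allowed(prev, curr):
    """Return True if tag `curr` may follow tag `prev` under plain BIO rules."""
    if curr.startswith("I-"):
        # I-X may only continue a span of the same type X.
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def start_allowed(tag):
    """A sequence may not begin inside a span."""
    return not tag.startswith("I-")


def constrained_viterbi(log_probs, tags):
    """Best tag sequence under BIO constraints.

    log_probs[t][k] is the log of the locally normalized (softmax)
    probability of tags[k] at token t.
    """
    if not log_probs:
        return []
    k = len(tags)
    score = [log_probs[0][j] if start_allowed(tags[j]) else NEG_INF
             for j in range(k)]
    back = []  # back[t-1][j] = best predecessor of tag j at token t
    for t in range(1, len(log_probs)):
        new_score, pointers = [], []
        for j in range(k):
            best_i, best = 0, NEG_INF
            for i in range(k):
                if bio_allowed(tags[i], tags[j]) and score[i] > best:
                    best_i, best = i, score[i]
            new_score.append(best + log_probs[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    j = max(range(k), key=lambda idx: score[idx])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [tags[j] for j in path]


# Hypothetical tag set and per-token softmax outputs (illustration only).
tags = ["O", "B-ARG", "I-ARG"]
probs = [
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
log_probs = [[math.log(p) for p in row] for row in probs]

greedy = [tags[max(range(len(tags)), key=lambda j: row[j])] for row in probs]
decoded = constrained_viterbi(log_probs, tags)
print(greedy)   # ['O', 'I-ARG', 'I-ARG'] (invalid: I-ARG follows O)
print(decoded)  # ['B-ARG', 'I-ARG', 'I-ARG']
```

In this toy input the greedy choice starts a span with I-ARG directly after O, which BIO forbids. The constrained decoder instead picks the highest-scoring sequence among the valid ones, at the cost of a less likely first tag.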
Key words
natural language processing (NLP) /
La case /
shallow semantic analysis /
gated high-speed connection /
constrained decoding
Funding
National Natural Science Foundation of China (61866032, 619660316, 62206146); Key Research and Development Program of Qinghai Province (2022-GX-104)