In neural-network-based dependency parsing, the representation and use of the parsing stack and the decision layer still leave room for further study. Regarding the stack representation, prior work neither encodes each dependency subtree independently, so the local features of individual subtrees cannot be exploited, nor encodes the sequence of generated dependency arcs, so global arc information is lost. Regarding the decision layer, prior work predicts transition actions with an MLP, a structure that cannot exploit information from past decisions. This paper therefore proposes a neural dependency parsing model based on fused multi-feature encoding: the parsing stack is represented by dependency subtrees and the arcs generated so far, with a TreeLSTM network encoding subtree information and an LSTM network encoding the sequence of generated arcs, so that both local and global stack information are better represented. We further propose an LSTM-based structure for predicting the transition-action sequence, introducing past decision actions as features to aid the current decision. Taking Chinese as the concrete object of study, we validate the proposed multi-feature fused encoding model on the CTB5 Chinese dependency parsing data. Experimental results show improved Chinese dependency parsing performance, the best published result among transition-based systems, reaching 87.8% UAS and 86.8% LAS, which demonstrates the effectiveness of encoding subtree local features, historical arc information, and historical decision information for improving dependency parsing.
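To make the stack/buffer mechanism concrete, the following is a minimal, illustrative sketch of the arc-standard transition system that transition-based parsers of this kind build on (SHIFT, LEFT-ARC, RIGHT-ARC). The oracle below derives the action sequence from gold heads; all names are our own, not from the paper's implementation.

```python
def oracle_parse(heads):
    """Greedy arc-standard oracle for a projective tree.

    heads[i] is the index of token i's head (-1 for the root).
    Returns the (head, dependent) arcs in the order they are created.
    """
    n = len(heads)
    stack, buf, arcs = [], list(range(n)), []
    # remaining[i] = number of dependents of i not yet attached;
    # a token may only be attached once all its own dependents are.
    remaining = [sum(1 for h in heads if h == i) for i in range(n)]
    while buf or len(stack) > 1:
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            if heads[s1] == s0 and remaining[s1] == 0:
                arcs.append((s0, s1))        # LEFT-ARC: top heads second
                remaining[s0] -= 1
                stack.pop(-2)
                continue
            if heads[s0] == s1 and remaining[s0] == 0:
                arcs.append((s1, s0))        # RIGHT-ARC: second heads top
                remaining[s1] -= 1
                stack.pop()
                continue
        if buf:
            stack.append(buf.pop(0))         # SHIFT
        else:
            break
    return arcs
```

For the three-token sentence "He reads books" with heads `[1, -1, 1]`, the oracle emits the arcs `(1, 0)` then `(1, 2)`; the arc sequence produced this way is exactly the kind of history the paper encodes with an LSTM.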
Abstract
For neural-network-based dependency parsing, this paper presents a novel architecture for transition-based dependency parsing that leverages fused multi-feature encoding. We model stack states on subtree representations and encode structural dependency subtrees with a TreeLSTM. In particular, we propose an LSTM-based technique to encode the historically parsed dependency arcs and states as global features. Finally, based on fused multi-feature encoding, we combine the extracted local and global features to make parsing decisions. Experiments on the Chinese Penn Treebank (CTB5) show that our parser reaches 87.8% unlabeled and 86.8% labeled attachment accuracy with a greedy strategy, effectively improving neural transition-based dependency parsing.
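The subtree encoder the abstract refers to is a Child-Sum TreeLSTM (Tai et al., 2015), which composes a head word with the states of its dependents bottom-up. Below is a minimal numpy sketch of one such cell; the gate equations follow the original paper, while class and weight names and the random initialisation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ChildSumTreeLSTMCell:
    """Child-Sum TreeLSTM cell: composes a head word vector x with
    the (h, c) states of its already-encoded dependent subtrees."""

    def __init__(self, x_dim, h_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One (W, U, b) triple per gate: input, forget, output, update.
        self.W = {g: rng.normal(0, 0.1, (h_dim, x_dim)) for g in "ifou"}
        self.U = {g: rng.normal(0, 0.1, (h_dim, h_dim)) for g in "ifou"}
        self.b = {g: np.zeros(h_dim) for g in "ifou"}

    def __call__(self, x, children):
        """x: head word vector; children: list of (h, c) pairs."""
        h_dim = self.b["i"].shape[0]
        h_sum = sum((h for h, _ in children), np.zeros(h_dim))
        gate = lambda g, h: sigmoid(self.W[g] @ x + self.U[g] @ h + self.b[g])
        i = gate("i", h_sum)
        o = gate("o", h_sum)
        u = np.tanh(self.W["u"] @ x + self.U["u"] @ h_sum + self.b["u"])
        # One forget gate per child, conditioned on that child's own state.
        c = i * u + sum((gate("f", h_k) * c_k for h_k, c_k in children),
                        np.zeros(h_dim))
        h = o * np.tanh(c)
        return h, c
```

Each stack item's subtree is then represented by the `h` of its head node, giving the per-subtree local features that a flat stack encoding cannot provide.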
Keywords
dependency parsing /
multi-feature encoding /
dependency subtree /
TreeLSTM neural network
Funding
Fundamental Research Funds for the Central Universities (2018YJS025, 2015JBM033); National Natural Science Foundation of China (61370130, 61473294); International Science and Technology Cooperation Program of the Ministry of Science and Technology (K11F100010); National Natural Science Foundation of China (61876198)