Abstract
To improve accuracy in natural language processing, much prior work combines constituent parse trees with LSTM, producing a family of tree-structured LSTM models (collectively referred to here as C-TreeLSTM). These models suffer from low text-modeling accuracy because an important information source, the words themselves, is missing when the hidden states of internal nodes are computed. This paper proposes a hybrid neural network model over the constituent tree structure, named SC-TreeLSTM, which strengthens each node's memory of text semantics by injecting, during node encoding, the semantic vector of the phrase covered by that node. Experimental results show that SC-TreeLSTM achieves excellent performance on both sentiment classification and machine reading comprehension tasks.
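The core idea above — letting each internal tree node's update see not only its children's states but also a semantic vector for the phrase it covers — can be sketched as follows. This is a minimal hypothetical sketch, not the paper's implementation: the gating follows a standard binary TreeLSTM, and the phrase vector is assumed here to be a simple average of the covered word embeddings (the paper may compose it differently).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def phrase_vector(word_vecs):
    # Phrase semantic vector for a node: assumed here to be the mean
    # of the embeddings of the words the node's span covers.
    return np.mean(word_vecs, axis=0)

def sc_treelstm_node(h_l, c_l, h_r, c_r, s, W):
    # One internal-node update of a binary TreeLSTM whose gates also
    # see the phrase vector s, so the node retains its span's word
    # information (the missing source the abstract points out).
    x = np.concatenate([h_l, h_r, s])    # gate input: [h_left; h_right; s]
    i   = sigmoid(W["i"]  @ x)           # input gate
    f_l = sigmoid(W["fl"] @ x)           # forget gate for left child
    f_r = sigmoid(W["fr"] @ x)           # forget gate for right child
    o   = sigmoid(W["o"]  @ x)           # output gate
    u   = np.tanh(W["u"]  @ x)           # candidate cell state
    c = i * u + f_l * c_l + f_r * c_r    # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c

# Toy demo: one internal node covering a three-word phrase.
rng = np.random.default_rng(0)
dim = 4
W = {g: rng.normal(0.0, 0.1, (dim, 3 * dim)) for g in ("i", "fl", "fr", "o", "u")}
s = phrase_vector(rng.normal(size=(3, dim)))   # phrase vector of the span
zeros = np.zeros(dim)
h, c = sc_treelstm_node(zeros, zeros, zeros, zeros, s, W)
```

In a full model this update would be applied bottom-up over the parse tree, with leaf states initialized from word embeddings; the sketch shows only the single-node computation.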
Key words
constituent tree /
C-TreeLSTM /
phrase semantic vector /
hybrid model
Funding
National Natural Science Foundation of China (61003031); Shanghai Key Science and Technology Project (14511107902); Shanghai Engineering Research Center Project (GCZX14014); Shanghai First-class Discipline Construction Project (XTKX2012); Open Project of the Shanghai Key Laboratory of Data Science (201609060003); Hujiang Foundation Research Base Special Project (C14001)