Recurrent neural networks with attention mechanisms have recently achieved strong performance on text classification. However, when training data is limited, many domain entity mentions in the test set are rare or entirely absent from the training set, as in the Chinese utterance domain classification task. This paper proposes a distantly supervised utterance classification model that incorporates domain-specific named entity recognition. First, domain knowledge about the dataset is acquired by distant supervision, greatly reducing manual effort. Second, domain-specific named entity recognition and a locally built complementary knowledge base are used to complete the distantly supervised domain knowledge, giving the model more comprehensive coverage. Finally, a fine-grained concatenation mechanism is proposed for the two heterogeneous sources of information, context-based semantic features and knowledge features, fusing pretrained word-level semantic representations with domain knowledge representations and effectively improving classification performance. Comparative experiments with state-of-the-art text classification models show that the proposed model achieves high accuracy on the Chinese utterance domain classification benchmark dataset, with a particularly clear advantage over prior methods in knowledge-intensive domains.
Abstract
Recently, recurrent neural networks with an attention mechanism have achieved strong results on text classification. However, when labeled training data is scarce, as in the Chinese utterance domain classification (DC) task, the sparseness of domain entity mentions remains a significant challenge. To address this issue, this paper proposes knowledge-based neural DC (K-NDC) models that incorporate domain knowledge from external sources to enrich utterance representations. First, domain entities and their types are obtained by distant supervision from CN-Probase. Then, domain-specific named entity recognition (NER) and a complementary knowledge base are exploited to further extend the knowledge coverage. Finally, we design a novel mechanism for merging knowledge with utterance representations at a fine-grained (Chinese word) level. Experiments on the SMP-ECDT benchmark corpus show that, compared with state-of-the-art text classification models, the proposed method achieves better performance, especially in knowledge-intensive domains.
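The fusion step described above concatenates, for every Chinese word, a pretrained word embedding with a knowledge (entity-type) embedding obtained by distant supervision. The following is a minimal sketch of that word-level concatenation; all names, dimensions, and vectors are illustrative, and the paper's actual model feeds the fused vectors into an attention-based recurrent classifier:

```python
import numpy as np

# Toy "knowledge base": entity mention -> type, standing in for the
# (mention, type) pairs obtained by distant supervision from CN-Probase.
# All names, dimensions, and vectors here are hypothetical.
D_WORD, D_KNOW = 100, 50
rng = np.random.default_rng(0)
type_emb = {"music": rng.standard_normal(D_KNOW)}  # type -> embedding
kb = {"周杰伦": "music"}                            # mention -> entity type

def knowledge_vector(token: str) -> np.ndarray:
    """Type embedding for a linked entity mention, zeros otherwise."""
    t = kb.get(token)
    return type_emb[t] if t else np.zeros(D_KNOW)

def fuse(tokens, word_vectors):
    """Word-level fusion: concatenate each pretrained word vector with
    the token's knowledge vector -> shape (seq_len, D_WORD + D_KNOW)."""
    know = np.stack([knowledge_vector(t) for t in tokens])
    return np.concatenate([word_vectors, know], axis=-1)

tokens = ["播放", "周杰伦", "的", "歌"]  # "play a Jay Chou song"
word_vectors = rng.standard_normal((len(tokens), D_WORD))
fused = fuse(tokens, word_vectors)
print(fused.shape)  # (4, 150)
```

Tokens with no linked entity keep a zero knowledge part, so the classifier can still read the pretrained semantic half of the vector unchanged.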
Key words
domain classification /
external knowledge /
distant supervision /
utterance representation /
neural classifier
Funding
National Natural Science Foundation of China (71472068); Guangdong Provincial Undergraduate Innovation Training Program (201810564094, 201910564164)