Abstract
Question classification, which automatically assigns a type to each question, is a fundamental task in question answering research. This paper proposes a semi-supervised question classification method based on jointly learning question and answer representations. Its distinguishing feature is that a question and its answer are treated as a shared context for learning distributed word representations, so that the classification information implicit in unlabeled questions and answers can be exploited. Specifically, first, a neural network language model is introduced to learn word vectors jointly from questions and their answers, which enriches the information carried by the question word vectors. Second, a large number of unlabeled question-answer samples are added to word vector learning, which further strengthens the representational capacity of the question word vectors. Finally, the labeled questions are represented as word vectors and used as training samples for a convolutional neural network question classifier. Experimental results show that the proposed semi-supervised method makes effective use of word vector representations and large amounts of unlabeled samples to improve performance, and clearly outperforms the other semi-supervised baseline methods.
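The idea of using a question and its answer as a shared context can be illustrated with a minimal sketch. It assumes a skip-gram-style context window, which the abstract does not specify, and the function name `joint_skipgram_pairs` and the toy question-answer pair are hypothetical. It only shows how appending the answer gives question words additional context words for word vector training; it is not the paper's implementation.

```python
from typing import List, Tuple


def joint_skipgram_pairs(
    question: List[str], answer: List[str], window: int = 2
) -> List[Tuple[str, str]]:
    """Build (center, context) pairs from a question joined with its answer.

    Hypothetical helper: the skip-gram-style window is an assumption made
    for illustration, not a detail taken from the paper.
    """
    # Concatenating the answer after the question lets words near the end of
    # the question see answer words inside their context window.
    tokens = question + answer
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Toy example (made-up data).
question = ["who", "wrote", "hamlet"]
answer = ["william", "shakespeare"]

joint_pairs = joint_skipgram_pairs(question, answer, window=2)
question_only_pairs = joint_skipgram_pairs(question, [], window=2)
```

With the answer appended, the question word "hamlet" gains "william" and "shakespeare" as context words, which the question-only variant cannot provide.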
Key words
question classification /
joint representations /
semi-supervised classification
Funding
National Natural Science Foundation of China (61331011); National Natural Science (61375073, 61273320)