Abstract
Dialogue act classification for online forum posts identifies the role each post plays in its thread, which helps reconstruct the conversational relations within a thread and improves forum retrieval performance. This paper proposes a weakly supervised learning method for forum post dialogue act classification that treats the task as a sequence labeling problem over threads. The proposed approach can learn a dialogue act classification model from unlabeled data once reasonable feature constraints are specified. It achieves accuracies of 75.6% and 60.7% on the CNET and edX data sets, respectively, outperforming the supervised conditional random field (CRF) model.
Key words
weakly supervised learning /
feature constraints /
dialogue act classification /
forum thread structure analysis
Funding
National Natural Science Foundation of China (61100094, 61300114)