Implicit Discourse Relation Recognition for Imbalanced Data

ZHU Shanshan, HONG Yu, DING Siyuan, YAO Jianmin, ZHU Qiaoming

Journal of Chinese Information Processing, 2015, Vol. 29, Issue (6): 110-118.
Review



Abstract

Implicit discourse relation recognition is an important subtask of discourse analysis. Most existing studies assume that the positive and negative samples involved in classification are equal in number, and use imbalanced-data techniques such as random under-sampling to keep the training data balanced. In real corpora, however, positive and negative samples are imbalanced, and this imbalance often limits further gains in implicit discourse relation recognition. To address this problem, we propose an implicit discourse relation recognition method based on frame semantic vectors. Using the FrameNet frame semantic knowledge base, the method represents each argument as a frame semantic vector. On the basis of this representation, it mines effective discourse relation samples from external data resources and adds them to the original training data, alleviating the imbalance. Experiments on the Penn Discourse Treebank (PDTB) show that, compared with current mainstream imbalanced-data methods, the proposed method clearly improves implicit discourse relation recognition performance.
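Random under-sampling, the balancing step that the abstract attributes to most prior work, can be illustrated with a minimal sketch. This is generic code, not taken from the paper. The function name `random_undersample`, the fixed seed and the toy one-vs-others data are hypothetical. The function keeps every minority-class sample and randomly discards majority-class samples until all classes are equal in size.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Plain random under-sampling: shrink every class to the size of the
    smallest class by sampling without replacement."""
    rng = random.Random(seed)  # fixed seed for reproducibility (illustrative)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        kept = rng.sample(xs, n_min)  # minority class is kept whole
        out_x.extend(kept)
        out_y.extend([y] * n_min)
    return out_x, out_y


if __name__ == "__main__":
    # Hypothetical one-vs-others split: 3 positive vs. 12 negative argument pairs.
    xs = [f"pair{i}" for i in range(15)]
    ys = [1] * 3 + [0] * 12
    bx, by = random_undersample(xs, ys)
    print(sorted(Counter(by).items()))  # → [(0, 3), (1, 3)]
```

This design has a cost. With 3 positives and 12 negatives, three quarters of the negative samples are discarded, so the data is balanced by throwing training data away rather than adding to it.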
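The abstract states only that arguments are represented as frame semantic vectors via FrameNet and that this representation is used to mine relation samples from external data. It does not give the vector definition or the selection rule. The sketch below therefore makes several assumptions:

- Each argument is represented as a bag-of-frames count vector.
- The Arg1 and Arg2 vectors are concatenated into a single pair vector.
- An external candidate pair is kept if its cosine similarity to some seed pair (for example, a minority-class training sample) reaches a hypothetical threshold of 0.8.

Frame labels are assumed to come from an upstream frame-semantic parser. The function names, frame lists and threshold are all illustrative.

```python
import math
from collections import Counter


def frame_vector(frames, vocab):
    """Count how often each frame in `vocab` is evoked in one argument.
    `frames` (FrameNet frame names) is assumed to come from an upstream
    frame-semantic parser, which is not shown here."""
    counts = Counter(frames)
    return [counts.get(f, 0) for f in vocab]


def pair_vector(arg1_frames, arg2_frames, vocab):
    # Concatenate the Arg1 and Arg2 vectors so argument order is preserved.
    return frame_vector(arg1_frames, vocab) + frame_vector(arg2_frames, vocab)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mine_samples(seed_pairs, candidate_pairs, vocab, threshold=0.8):
    """Return external candidate pairs whose frame vector is close to at least
    one seed pair. The max-cosine rule and the 0.8 threshold are illustrative
    assumptions, not the paper's criterion."""
    seeds = [pair_vector(a1, a2, vocab) for a1, a2 in seed_pairs]
    if not seeds:
        return []
    mined = []
    for a1, a2 in candidate_pairs:
        v = pair_vector(a1, a2, vocab)
        if max(cosine(v, s) for s in seeds) >= threshold:
            mined.append((a1, a2))
    return mined


if __name__ == "__main__":
    vocab = ["Causation", "Emotion_directed", "Cause_harm", "Weather", "Commerce_buy"]
    seeds = [(["Weather"], ["Cause_harm"])]
    candidates = [
        (["Weather"], ["Cause_harm", "Emotion_directed"]),  # cosine ~0.816, kept
        (["Commerce_buy"], ["Emotion_directed"]),           # cosine 0.0, dropped
    ]
    print(mine_samples(seeds, candidates, vocab))
    # → [(['Weather'], ['Cause_harm', 'Emotion_directed'])]
```

In this reading, mined pairs would take the seed's relation label and be appended to the training set. This enlarges the minority class instead of shrinking the majority class. How the paper actually labels and filters mined samples is not described here.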
   
   
   

Key words

implicit discourse relation recognition / imbalanced data / frame semantic vectors

Cite this article

ZHU Shanshan, HONG Yu, DING Siyuan, YAO Jianmin, ZHU Qiaoming. Implicit Discourse Relation Recognition for Imbalanced Data. Journal of Chinese Information Processing, 2015, 29(6): 110-118.


Funding

National Natural Science Foundation of China (61373097, 61272259, 61272260)