Abstract
To address the shortage of training data for textual entailment, this paper proposes a co-training approach to recognizing textual entailment, in which a small labeled entailment dataset and a large unlabeled dataset are used together for co-training. Two views, a rewriting view and an assessing view, examine entailment relations from a structural and a non-structural perspective respectively, and two classifiers, a semantic tree kernel based classifier and a statistical feature based classifier, are trained under the two views. The results of co-training are then used to train a global classifier that makes predictions on new data. Experiments show that the co-training based approach achieves good recognition performance with only a small amount of training data.
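The co-training procedure summarized above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation: the abstract does not state how confident examples are selected, how many are added per round, or when training stops, so those choices are assumptions. Each text pair is represented by two hypothetical numeric feature vectors, one per view (index 0 stands in for the rewriting view, index 1 for the assessing view), and both the semantic tree kernel classifier and the statistical-feature classifier are replaced by a simple nearest-centroid classifier (`CentroidClassifier`). The names `co_train`, `train_global`, `rounds`, and `per_round` are likewise illustrative, and labels use 1 for entailment and 0 for non-entailment.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
# Hypothetical pair representation: (rewriting-view features, assessing-view features)
Example = Tuple[Vector, Vector]


class CentroidClassifier:
    """Hypothetical stand-in for either view classifier: nearest centroid,
    with a softmax over negative squared distances used as confidence."""

    def fit(self, xs: Sequence[Vector], ys: Sequence[int]) -> "CentroidClassifier":
        self.centroids: Dict[int, Vector] = {}
        for label in set(ys):
            members = [x for x, y in zip(xs, ys) if y == label]
            # Column-wise mean of all vectors carrying this label
            self.centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))
        return self

    def predict_proba(self, x: Vector) -> Dict[int, float]:
        scores = {
            label: -sum((a - b) ** 2 for a, b in zip(x, c))
            for label, c in self.centroids.items()
        }
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        total = sum(exps.values())
        return {label: e / total for label, e in exps.items()}

    def predict(self, x: Vector) -> int:
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)


def co_train(labeled: List[Tuple[Example, int]], unlabeled: List[Example],
             rounds: int = 5, per_round: int = 1) -> List[Tuple[Example, int]]:
    """Co-training sketch: in each round, the classifier of each view labels the
    unlabeled pairs it is most confident about and adds them to a shared labeled
    set, which is then used to retrain both views."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled
            clf = CentroidClassifier().fit(
                [x[view] for x, _ in labeled], [y for _, y in labeled]
            )
            # Most confident unlabeled pairs (under this view) first
            pool.sort(key=lambda x: max(clf.predict_proba(x[view]).values()), reverse=True)
            picked, pool = pool[:per_round], pool[per_round:]
            labeled.extend((x, clf.predict(x[view])) for x in picked)
    return labeled


def train_global(labeled: List[Tuple[Example, int]]) -> CentroidClassifier:
    """Global classifier over the concatenated views, trained on the labeled set
    enlarged by co-training and used to predict new pairs."""
    return CentroidClassifier().fit(
        [a + b for (a, b), _ in labeled], [y for _, y in labeled]
    )
```

For example, calling `co_train` with two labeled pairs and four unlabeled ones (default `rounds=5`, `per_round=1`) consumes the whole pool and returns six labeled pairs, after which `train_global` fits a single classifier on the concatenated view vectors. In the paper's setting, the two view classifiers would instead be a tree kernel classifier and a statistical-feature classifier, each of which would need to expose some confidence score for the selection step.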
Key words
recognizing textual entailment /
co-training /
semantic tree kernel
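Funding
National Natural Science Foundation of China (61402341, 61373108, 61173062); China Postdoctoral Science Foundation (2014M552073, 2013M540594); Fundamental Research Funds for the Central Universities (2012GSP017)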