Annotation and Boundary Identification of Chinese Syntactically Heterogeneous Entailment Chunks

JIN Tianhua, JIANG Shan, YU Dong, ZHAO Meiqian, LIU Lu

Journal of Chinese Information Processing ›› 2019, Vol. 33 ›› Issue (2): 17-25.
Language Analysis and Computation

Chinese Chunked-based Heterogeneous Entailment Parser and Boundary Identification

  • JIN Tianhua1, JIANG Shan1, YU Dong1,2, ZHAO Meiqian1, LIU Lu1

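Abstract (Chinese version, translated)

Textual entailment is a difficult problem in natural language processing: its forms and types are complex, and the knowledge it requires is hard to generalize. Early work mostly recognized entailment using lexical entailment and logical inference knowledge, but this approach is effective only for specific types of entailment. In recent years, deep learning models trained on large-scale data have achieved excellent performance on sentence-level entailment recognition, but these models are not interpretable; in particular, they cannot locate the specific language fragments that give rise to the entailment. This paper studies the forms that cause textual entailment and groups them into three categories: lexical, syntactic heterogeneity, and common sense and social experience. It takes syntactically heterogeneous entailment as its object of study. To address the two problems above, it proposes the concept of the syntactically heterogeneous entailment chunk and defines the task of identifying its boundaries. The paper draws up an annotation specification for syntactically heterogeneous entailment chunks and builds an annotated dataset. On this basis, it builds a rule-based model and a deep learning model to explore methods for automatically identifying such chunks. Experimental results show that the proposed deep learning model can effectively detect entailment chunks and provides a reliable baseline for further research.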

Abstract

Recognizing textual entailment (RTE) is a challenging problem in natural language processing. This paper categorizes textual entailment into three types: lexical entailment, chunk-based heterogeneous entailment, and common-sense entailment. Focusing on chunk-based heterogeneous entailment, we present a chunk annotation standard and a labeled dataset. We then explore a rule-based model and a deep learning model for automatically detecting entailment chunks. The experimental results show that the deep learning model adopted in this paper can discover entailment fragments effectively.

Keywords

textual entailment / syntactic heterogeneity / chunk annotation

Cite this article

JIN Tianhua, JIANG Shan, YU Dong, ZHAO Meiqian, LIU Lu. Chinese Chunked-based Heterogeneous Entailment Parser and Boundary Identification. Journal of Chinese Information Processing. 2019, 33(2): 17-25

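Funding

Beijing Advanced Innovation Center for Language Resources, Beijing Language and Culture University (TYR17001J); National Social Science Fund of China (16AYY007); Fundamental Research Funds for the Central Universities (Beijing Language and Culture University Wutong Innovation Project: 17PT05)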