Macro Discourse Relation Classification Based on Macro Semantics Representation

ZHOU Yi, CHU Xiaomin, ZHU Qiaoming, JIANG Feng, LI Peifeng

Journal of Chinese Information Processing ›› 2019, Vol. 33 ›› Issue (3): 1-7, 24.
Language Analysis and Computation


Abstract

Macro discourse analysis aims to analyze the semantic relations between adjacent paragraphs or paragraph groups, and is a less-addressed fundamental task in natural language processing. This paper studies relation recognition in macro discourse analysis and proposes a classification model for macro-level discourse relations. The model introduces a distributed representation of macro discourse semantics built on word vectors, together with a set of structural features suited to macro discourse relation recognition, improving the model's ability to distinguish macro discourse relations on both levels. Experimental results on the Macro Chinese Discourse Treebank (MCDTB) show that the model reaches an F1 of 68.22% on the coarse-grained classes, a 4.17% improvement over the baseline system.
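The representation scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact method: the toy vocabulary, the averaging composition of word vectors, and the length-based structural cues are all assumptions standing in for pretrained embeddings and the feature set the paper actually uses.

```python
import numpy as np

# Toy word-vector table standing in for pretrained embeddings (e.g. word2vec);
# the real model would load vectors trained on a large corpus.
EMBED_DIM = 4
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=EMBED_DIM) for w in
         ["economy", "growth", "policy", "report", "however", "decline"]}

def paragraph_vector(tokens):
    """Compose a macro semantic representation for one discourse unit
    by averaging the vectors of its known tokens."""
    vecs = [vocab[t] for t in tokens if t in vocab]
    if not vecs:
        return np.zeros(EMBED_DIM)
    return np.mean(vecs, axis=0)

def pair_features(left_tokens, right_tokens):
    """Concatenate the two unit vectors with simple structural cues
    (here, unit lengths) as input to a relation classifier."""
    left = paragraph_vector(left_tokens)
    right = paragraph_vector(right_tokens)
    structure = np.array([len(left_tokens), len(right_tokens)], dtype=float)
    return np.concatenate([left, right, structure])

feats = pair_features(["economy", "growth", "report"], ["however", "decline"])
print(feats.shape)  # (10,)
```

The resulting feature vector would then be fed to a classifier over the relation label set; the choice of structural features here is purely illustrative.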

Key words

macro discourse-level relation classification / structure features of macro discourse / representation of macro discourse semantics

Cite this article

ZHOU Yi, CHU Xiaomin, ZHU Qiaoming, JIANG Feng, LI Peifeng. Macro Discourse Relation Classification Based on Macro Semantics Representation. Journal of Chinese Information Processing. 2019, 33(3): 1-7,24

Funding

National Natural Science Foundation of China (61772354, 61773276, 61836007); National Defense Science and Technology Pilot Program (17-ZLXDXX-02-06-02-04)