Cohesion-driven Discourse Coherence Modeling

XU Fan, ZHU Qiaoming, ZHOU Guodong, WANG Mingwen

Journal of Chinese Information Processing, 2014, Vol. 28, Issue (3): 11-21.
Language Analysis and Generation

Abstract

This paper systematically explores the role of cohesion theory in Discourse Coherence Modeling (DCM). Unlike current supervised models built on entity grids and discourse-relation grids, the proposed unsupervised model shows the importance to DCM of the theme-rheme structure from systemic-functional grammar, and the suitability of two filtering mechanisms, one based on themes and one based on coreference resolution. Experiments on sentence ordering and summary coherence rating, run on three publicly available international benchmark corpora covering three different genres, show that theme-rheme structure and coreference resolution information significantly improve coherence detection accuracy. They also show that the system significantly outperforms state-of-the-art ones.

Key words

discourse cohesion / discourse coherence / theme-rheme structure / coreference resolution

Cite this article

XU Fan, ZHU Qiaoming, ZHOU Guodong, WANG Mingwen. Cohesion-driven Discourse Coherence Modeling. Journal of Chinese Information Processing. 2014, 28(3): 11-21

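Funding

National Natural Science Foundation of China (61272260; 61273320; 61272212), National High Technology Research and Development Program of China (863 Program) (2012AA011102), Natural Science Foundation of Jiangsu Province (BK2011282), and Major Research Project of the Natural Science Foundation of Jiangsu Higher Education Institutions (11KJA520003)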