A Transition-based Framework for Chinese Discourse Structure Parsing

SUN Cheng (孙成), KONG Fang (孔芳)

Journal of Chinese Information Processing (中文信息学报), 2018, Vol. 32, Issue 12: 48-56.
Section: Language Analysis and Computation

Abstract

As a subtask of discourse analysis, discourse structure parsing is critical for discourse comprehension and for downstream discourse applications. Working from a Chinese discourse corpus annotated under the connective-driven dependency tree (CDT) scheme, this paper combines a transition system with deep learning to build a complete framework that automatically parses discourse structure from plain text into a tree. The paper reports statistics on the basic characteristics of the Chinese discourse corpus, proposes an evaluation method for tree-shaped discourse structures, and compares parsing performance when the discourse substructures built during parsing are given distributed representations by different methods.
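
To make the transition-based idea concrete, here is a minimal Python sketch of a generic shift-reduce transition system over elementary discourse units (EDUs). It is not the authors' implementation. It assumes the text is already segmented into EDUs, builds only unlabeled binary trees (connectives and relation types, which CDT annotates, are omitted), and replaces the paper's deep-learning action classifier with a hypothetical `policy` callable. The `spans` and `span_f1` helpers are also assumptions: a common unlabeled span F1, not necessarily the evaluation method the paper proposes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Set, Tuple


@dataclass
class Node:
    """A discourse (sub)tree spanning EDUs start..end (inclusive)."""
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)


# Hypothetical policy: sees the stack and queue, returns "SHIFT" or "REDUCE".
# In a learned parser this would be a classifier over encoded states.
Policy = Callable[[List[Node], Deque[int]], str]


def parse(num_edus: int, policy: Policy) -> Node:
    """Build a binary tree over EDUs 0..num_edus-1 with SHIFT/REDUCE actions."""
    if num_edus < 1:
        raise ValueError("need at least one EDU")
    stack: List[Node] = []
    queue: Deque[int] = deque(range(num_edus))
    while queue or len(stack) > 1:
        shift = policy(stack, queue) == "SHIFT"
        # Override illegal actions so any policy yields a well-formed tree.
        if shift and not queue:
            shift = False
        if not shift and len(stack) < 2:
            shift = True
        if shift:
            i = queue.popleft()
            stack.append(Node(i, i))  # leaf covering a single EDU
        else:
            right = stack.pop()
            left = stack.pop()
            # Merge the top two subtrees into one larger discourse unit.
            stack.append(Node(left.start, right.end, [left, right]))
    return stack[0]


def spans(tree: Node) -> Set[Tuple[int, int]]:
    """Collect the (start, end) span of every internal node."""
    if not tree.children:
        return set()
    out = {(tree.start, tree.end)}
    for child in tree.children:
        out |= spans(child)
    return out


def span_f1(pred: Node, gold: Node) -> float:
    """Unlabeled span F1 between a predicted and a gold tree."""
    p, g = spans(pred), spans(gold)
    if not p and not g:
        return 1.0
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


def always_shift(stack: List[Node], queue: Deque[int]) -> str:
    return "SHIFT"  # shift every EDU first, giving a right-branching tree


def reduce_first(stack: List[Node], queue: Deque[int]) -> str:
    return "REDUCE"  # reduce whenever legal, giving a left-branching tree


if __name__ == "__main__":
    gold = parse(4, always_shift)
    pred = parse(4, reduce_first)
    print(sorted(spans(gold)))  # [(0, 3), (1, 3), (2, 3)]
    print(round(span_f1(pred, gold), 3))  # 0.333
```

With four EDUs the right-branching and left-branching trees share only the root span, so their span F1 is 1/3. A learned parser would replace the two fixed policies with a model that scores SHIFT and REDUCE from representations of the subtrees on the stack and the EDUs in the queue.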

Key words

discourse parsing / Chinese discourse structure / transition-based system

Cite this article

SUN Cheng, KONG Fang. A Transition-based Framework for Chinese Discourse Structure Parsing. Journal of Chinese Information Processing, 2018, 32(12): 48-56.

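Funding

National Natural Science Foundation of China (61472264, 61751206); sub-project of the National Key R&D Program of China (2017YFB1002101); National Natural Science Foundation of China (61502149)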