A Study on Quotation Recognition Based on Sequence Labeling

JIA Honghao, LUO Zhiyong

Journal of Chinese Information Processing, 2019, Vol. 33, Issue 2: 1-7
Language Analysis and Computation



Abstract

Automatic recognition of inter-sentence quotation relations is an important task in discourse analysis. Quotation relations between sentences affect the analysis of sentence groups and of discourse as a whole, yet they have received little attention in natural language processing. An inter-sentence quotation relation is realized mainly in the quoted sentence of a quotation. A quotation consists of a leading sentence and a quoted sentence, and quotations are generally divided into direct and indirect quotations, of which indirect quotations are the hardest to recognize. Automatic recognition is made harder still because the relative position of the leading and quoted sentences is not fixed, and because the ratio of quotations to non-quotations is highly imbalanced in corpora from different domains. This paper makes a preliminary exploration of the quotation relation between sentences. It recognizes quotations automatically with conditional random fields (CRF) and with a bidirectional long short-term memory network combined with CRF (BLSTM-CRF), and compares results with and without a feature for the governor words (管领词) in the leading sentence. Experimental results show that the CRF and BLSTM-CRF models reach a precision of 85.49% and 80.19% and an F-value of 78.75% and 79.60%, respectively: CRF gives the higher precision, and BLSTM-CRF the higher F-value.
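The abstract reports precision and F-value for each model but not recall. Assuming the F-value is the balanced F1, F = 2PR/(P + R) (the abstract does not state which F-measure is used), recall can be recovered by rearranging: F(P + R) = 2PR gives FP = R(2P - F), so R = FP/(2P - F). The sketch below only does this arithmetic on the reported figures; the helper names `f1` and `implied_recall` are illustrative, and because the inputs are rounded to two decimals, the recovered recall is approximate.

```python
def f1(p: float, r: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def implied_recall(p: float, f: float) -> float:
    """Solve F = 2PR / (P + R) for R, giving R = F * P / (2P - F).

    Assumes the reported F-value is the balanced F1 (an assumption,
    not stated in the abstract).
    """
    return f * p / (2 * p - f)


if __name__ == "__main__":
    # Precision and F-value as reported in the abstract (percent).
    for name, p, f in [("CRF", 85.49, 78.75), ("BLSTM-CRF", 80.19, 79.60)]:
        print(f"{name}: implied recall ≈ {implied_recall(p, f):.2f}%")
    # CRF: implied recall ≈ 73.00%
    # BLSTM-CRF: implied recall ≈ 79.02%
```

Under that assumption, recall is roughly 73.0% for CRF and 79.0% for BLSTM-CRF, which suggests that the BLSTM-CRF model's higher F-value comes from its recall rather than its precision.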

Key words

quotation recognition / sequence labeling / conditional random field (CRF) / bidirectional long short-term memory network (BLSTM)
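CRF and BLSTM-CRF are both linear-chain sequence labelers: every position receives an emission score per tag, every pair of adjacent tags receives a transition score, and the output is the tag sequence with the highest total score, found with the Viterbi algorithm. In a feature-based CRF the emission scores come from weighted hand-crafted features (such as a governor-word indicator); in BLSTM-CRF they come from the BLSTM outputs, while the CRF layer supplies the transitions. The sketch below shows only this decoding step. Everything specific in it is a hypothetical assumption rather than the paper's setup: the unit of labeling (a clause), the tag set `O`/`L`/`Q` (outside, leading sentence, quoted sentence), the governor-word list `GOVERNORS`, and all scores, which are hand-set rather than learned.

```python
from typing import List, Sequence, Tuple

# Hypothetical tag set: O = outside any quotation,
# L = leading sentence, Q = quoted sentence.
TAGS = ["O", "L", "Q"]

# Hypothetical governor words (reporting verbs); not the paper's list.
GOVERNORS = {"说", "表示", "认为", "指出"}

# Hand-set transition scores TRANSITIONS[prev][cur]; a trained CRF learns these.
TRANSITIONS = [
    [0.5, 0.0, -1.0],   # from O
    [-0.5, -1.0, 1.5],  # from L: a quoted sentence usually follows
    [0.0, 0.3, 0.5],    # from Q: a leading sentence may also follow the quote
]


def clause_emissions(clauses: Sequence[str]) -> List[List[float]]:
    """Toy emission scores from one indicator feature: contains a governor word."""
    rows = []
    for clause in clauses:
        has_governor = any(word in clause for word in GOVERNORS)
        rows.append([1.0, 3.0 if has_governor else 0.0, 0.5])
    return rows


def path_score(emissions, transitions, path: Sequence[int]) -> float:
    """Unnormalized CRF score: emission plus transition scores along the path."""
    score = emissions[0][path[0]]
    for i in range(1, len(path)):
        score += transitions[path[i - 1]][path[i]] + emissions[i][path[i]]
    return score


def viterbi(emissions, transitions) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score by dynamic programming."""
    n_tags = len(transitions)
    best = list(emissions[0])  # best[k]: best score of a prefix ending in tag k
    backpointers = []
    for i in range(1, len(emissions)):
        new_best, pointers = [], []
        for k in range(n_tags):
            # Pick the previous tag that maximizes the score of reaching tag k.
            prev = max(range(n_tags), key=lambda j: best[j] + transitions[j][k])
            new_best.append(best[prev] + transitions[prev][k] + emissions[i][k])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Follow back-pointers from the best final tag to recover the path.
    last = max(range(n_tags), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


def tag_clauses(clauses: Sequence[str]) -> List[str]:
    """Decode a clause sequence into tag names (hypothetical pipeline)."""
    path, _ = viterbi(clause_emissions(clauses), TRANSITIONS)
    return [TAGS[k] for k in path]


if __name__ == "__main__":
    print(tag_clauses(["天气很冷", "他说", "明天会下雨"]))  # ['O', 'L', 'Q']
```

In this toy example the governor word 说 pulls the second clause toward `L`, and the strong `L` to `Q` transition then labels the following clause as quoted. Because the quoted sentence can also precede the leading sentence, the `Q` to `L` transition is left non-negative rather than forbidden.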

Cite this article

JIA Honghao, LUO Zhiyong. A Study on Quotation Recognition Based on Sequence Labeling. Journal of Chinese Information Processing, 2019, 33(2): 1-7


Funding

Beijing Philosophy and Social Science Planning Research Base Project (13JDZHB005)