Chinese Chunk-based Dependency Grammar and Its Treebank

QIAN Qingqing, WANG Chengwen, XUN Endong, WANG Guirong, RAO Gaoqi

Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (7): 50-58.
Language Resources Construction and Application

Abstract

This paper proposes a predicate-centered Chinese Chunk-Based Dependency Grammar (CCDG). The grammar takes the predicate as its core and the chunk as its unit of analysis: it identifies the chunks governed by each predicate both within and across sentences, uses the dependency relations between chunks to recover omitted components, and makes the predicate's government relations explicit. The paper outlines the principles of CCDG and defines its chunks and their dependency relations. Following this scheme, 2,199 texts from two domains, encyclopedia and news, have been annotated so far, totaling about 1.8 million Chinese characters. The annotation procedure, annotation consistency, and data distribution are described in detail. Based on the current treebank, about 25% of Chinese clauses are found not to be self-sufficient, and about 88% of core predicates govern one to three dependents.

Key words

chunk / Chinese chunk-based dependency grammar / treebank

Cite this article

QIAN Qingqing, WANG Chengwen, XUN Endong, WANG Guirong, RAO Gaoqi. Chinese Chunk-based Dependency Grammar and Its Treebank. Journal of Chinese Information Processing. 2022, 36(7): 50-58


Funding

State Language Commission of China project (ZDI135-114)