Abstract
This paper proposes a Chinese chunk-based dependency grammar (CCDG). Taking the predicate as the core and the chunk as the unit of analysis, CCDG identifies the chunks governed by each predicate both within and across sentences, establishing a syntactic analysis framework at the sentence-group level. This approach enlarges the linguistic unit at the leaf nodes and introduces analysis methods and rules tailored to the semantic characteristics of Chinese. The framework handles logical-structure knowledge at the micro level reasonably well and lays a foundation for argument knowledge at the meso level and discourse knowledge at the macro level. The paper presents the concept, representation, analysis method, and characteristics of CCDG, and briefly describes the construction of the corresponding treebank. As of August 2020, the treebank contained 1.87 million characters (40,000 complex sentences and 100,000 clauses), of which 67% is news text and 32% is encyclopedia text.
Key words
chunk /
dependency /
dependency grammar /
predicate
QIAN Qingqing, WANG Chengwen, WANG Guirong, RAO Gaoqi, XUN Endong.
Chinese Chunk-Based Dependency Grammar. Journal of Chinese Information Processing. 2022, 36(8): 20-28
Funding
National Natural Science Foundation of China (62076038)