QIAN Qingqing1, WANG Chengwen1,2, WANG Guirong1, RAO Gaoqi1,3, XUN Endong1
1. School of Information Science, Beijing Language and Culture University, Beijing 100083, China; 2. MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China; 3. Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083, China
This paper proposes a Chinese chunk-based dependency grammar (CCDG), which focuses on the chunks governed by predicates within and between sentences. As an effort to establish a syntactic analysis framework at the level of the sentence group, CCDG proposes enlarging the linguistic granularity of leaf nodes. It addresses logical structure knowledge at the micro level and lays a foundation for argument knowledge at the meso level and textual knowledge at the macro level. This paper presents the concept, representation, analysis method, and characteristics of CCDG, as well as the development of the corresponding treebank. By August 2020, the treebank had grown to 1.87 million tokens (including 40,000 complex sentences and 100,000 sub-sentences), consisting of 67% news texts and 32% encyclopedia texts.
QIAN Qingqing, WANG Chengwen, WANG Guirong, RAO Gaoqi, XUN Endong.
Chinese Chunk-Based Dependency Grammar. Journal of Chinese Information Processing. 2022, 36(8): 20-28