A non-projective structure is one in which the word nodes of a dependency tree are out of order with respect to the word sequence of the original sentence, that is, at least one arc spans a word that its head does not dominate. Such structures have a considerable impact on syntactic parsers and are also of substantial theoretical interest. Sentences containing non-projective structures have been found in the dependency treebanks and graphbanks of many of the world's languages, and comparative studies of them have been carried out. Non-projective structures in Chinese, however, have received little attention: Chinese dependency corpora have been built following the projectivity principle and therefore lack annotation of non-projective structures. Using the concept-aligned version of the Chinese abstract meaning representation (AMR) corpus, we find that 31.62% of its 10 149 sentences contain non-projective structures. We identify their three main types (modal-word raising, topicalization, and component separation) and propose corresponding solutions for automatic analysis, aimed at improving Chinese AMR parsing.
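The corpus statistic above amounts to testing each sentence for projectivity and counting the failures. The following is a minimal Python sketch of that test, assuming each sentence is given as a plain dependency tree in a head-array encoding (`heads[i]` is the head of token `i+1`, with 0 as the artificial root). The paper itself works on concept-aligned AMR graphs, which can contain reentrancies, so this only illustrates the projectivity criterion and is not the authors' procedure; the function names are hypothetical.

```python
from typing import Sequence


def is_projective(heads: Sequence[int]) -> bool:
    """Return True if the tree encoded by `heads` is projective.

    Hypothetical encoding: heads[i] is the head of token i+1 (tokens are
    1-indexed) and 0 marks the artificial root. The input is assumed to be
    a well-formed tree (single root, no cycles).
    """
    n = len(heads)

    def dominates(h: int, t: int) -> bool:
        # Walk up from token t toward the root; h dominates t if we meet h.
        while t != 0:
            if t == h:
                return True
            t = heads[t - 1]
        # The artificial root dominates every token.
        return h == 0

    # An arc h -> d is projective when h dominates every token strictly
    # between h and d; the tree is projective when every arc is.
    for d in range(1, n + 1):
        h = heads[d - 1]
        lo, hi = min(h, d), max(h, d)
        for t in range(lo + 1, hi):
            if not dominates(h, t):
                return False
    return True


def non_projective_ratio(corpus: Sequence[Sequence[int]]) -> float:
    """Fraction of sentences in `corpus` whose tree is non-projective."""
    if not corpus:
        return 0.0
    bad = sum(1 for heads in corpus if not is_projective(heads))
    return bad / len(corpus)


if __name__ == "__main__":
    # [2, 0, 2]: token 2 is the root and heads tokens 1 and 3 (projective).
    # [3, 0, 2, 1]: the arc 3 -> 1 spans token 2, which token 3 does not
    # dominate, so the tree is non-projective.
    corpus = [[2, 0, 2], [3, 0, 2, 1]]
    print(non_projective_ratio(corpus))  # 0.5
```

The head-array encoding matches the HEAD column of CoNLL-style files, which keeps the check independent of any particular parser; an AMR-based count would first need a way to project aligned concepts onto token positions.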
Key words: abstract meaning representation / concept-to-word alignment / non-projective / semantic parsing / Chinese information processing
