Abstract
A non-projective structure is one in which the word nodes of a dependency tree are out of alignment with the word order of the original sentence. Such structures have a considerable impact on syntactic parsers and are of considerable interest to linguistic theory. Sentences containing non-projective structures have been found in dependency treebanks and graph banks for many of the world's languages, and related studies have been carried out on them. In Chinese, however, non-projective structures have received little attention, and because corpus construction has followed the projectivity principle, they have gone unannotated. Based on the concept-aligned Chinese abstract meaning representation (AMR) corpus, this paper finds that 31.62% of 10,149 sentences contain non-projective structures, and that the three main types are modal raising, topicalization, and constituent separation. It also proposes corresponding automatic analysis solutions for these structures in order to improve Chinese AMR parsing.
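The definition of non-projectivity above can be illustrated on a plain dependency tree. The following is a minimal sketch, not the paper's method: it assumes a sentence is given as a list of 1-based head positions (0 marks the root), and the function names `crossing_arcs` and `is_projective` are hypothetical. It does not model AMR graphs or concept-to-word alignment; it only checks whether any two dependency arcs cross when drawn above the sentence, with an artificial root placed before the first word.

```python
from itertools import combinations


def crossing_arcs(heads):
    """Return the pairs of dependency arcs that cross each other.

    heads[i] is the 1-based position of the head of word i+1; 0 marks the root.
    This input format is an assumption made for the sketch.
    """
    # Store every arc as a (left, right) span over word positions.
    # The root arc starts at the artificial position 0 and is kept on purpose:
    # an arc that spans over the root word then counts as a crossing.
    arcs = [
        (min(head, dep), max(head, dep))
        for dep, head in enumerate(heads, start=1)
    ]
    crossings = []
    for (lo1, hi1), (lo2, hi2) in combinations(arcs, 2):
        # Two arcs cross when exactly one endpoint of one arc
        # lies strictly inside the span of the other.
        if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
            crossings.append(((lo1, hi1), (lo2, hi2)))
    return crossings


def is_projective(heads):
    """A tree is treated as projective when no two arcs cross."""
    return not crossing_arcs(heads)
```

For example, under these assumptions `is_projective([2, 0, 2])` holds for a three-word sentence whose middle word is the root, while `[3, 4, 0, 3]` is non-projective because the arc between words 1 and 3 crosses the arc between words 2 and 4.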
Key words
abstract meaning representation /
concept-to-word alignment /
non-projective /
semantic parsing /
Chinese information processing
Funding
National Social Science Fund of China (18BYY127)