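Vietnamese Dependency Treebank Construction via a Chinese-Vietnamese Word-Aligned Bilingual Corpus

LI Fajie, YU Zhengtao, GUO Jianyi, LI Ying, ZHOU Lanjiang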

Journal of Chinese Information Processing, 2015, Vol. 29, Issue (6): 69-74.
Survey



Abstract

Because Vietnamese has received relatively little research attention, no reasonably large dependency treebank exists for it, and Vietnamese dependency parsing is considerably harder than parsing Chinese, which already has rich, mature corpora. This paper therefore proposes a method for building a Vietnamese dependency treebank from a Chinese-Vietnamese bilingual corpus with word alignments. The method first word-aligns the Chinese-Vietnamese sentence pairs and then runs dependency parsing on the Chinese sentences. Finally, drawing on the linguistic characteristics of Vietnamese and the relevant grammar rules, it maps the Chinese dependency relations onto the Vietnamese sentences through the word alignments, producing the Vietnamese dependency treebank. Experiments show that the method simplifies the manual collection and annotation of a Vietnamese dependency treebank, saving labor and construction time, and that its accuracy is markedly higher than that of machine-learning methods.
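The projection step described in the abstract can be illustrated with a minimal sketch. It assumes one-to-one word alignments that are already given and a Chinese parse that is already available; the function name `project_dependencies`, the `(head, label)` data layout, and the `ATT`/`HED` labels are hypothetical choices for illustration, not the authors' implementation. The Vietnamese-specific grammar rules mentioned in the abstract are not modeled here.

```python
from typing import Dict, List, Optional, Tuple

# One arc per token: (index of the head token, relation label).
# A head index of -1 marks the root of the sentence.
Arc = Tuple[int, str]


def project_dependencies(
    zh_arcs: List[Arc],
    alignment: Dict[int, int],
    vi_len: int,
) -> List[Optional[Arc]]:
    """Copy Chinese arcs onto Vietnamese tokens through word alignments.

    zh_arcs[i] is the arc of Chinese token i; alignment maps a Chinese
    token index to a Vietnamese token index (assumed one-to-one).
    Vietnamese tokens that receive no arc stay None, so a later
    rule-based or manual pass can fill them in.
    """
    vi_arcs: List[Optional[Arc]] = [None] * vi_len
    for zh_dep, (zh_head, label) in enumerate(zh_arcs):
        vi_dep = alignment.get(zh_dep)
        if vi_dep is None:
            continue  # the dependent has no Vietnamese counterpart
        if zh_head == -1:
            vi_arcs[vi_dep] = (-1, label)  # the root maps to the root
            continue
        vi_head = alignment.get(zh_head)
        if vi_head is None or vi_head == vi_dep:
            continue  # head unaligned, or the arc would be a self-loop
        vi_arcs[vi_dep] = (vi_head, label)
    return vi_arcs


# Chinese "红 花" (red flower): 红 modifies 花, and 花 is the root.
zh_arcs = [(1, "ATT"), (-1, "HED")]
# Vietnamese "hoa đỏ" (flower red) reverses the word order.
alignment = {0: 1, 1: 0}
print(project_dependencies(zh_arcs, alignment, vi_len=2))
# prints [(-1, 'HED'), (0, 'ATT')]
```

The example shows why projecting through alignments, rather than copying arcs by position, matters: the modifier follows the noun in Vietnamese, yet the projected arc still attaches đỏ to hoa.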

Key words

Vietnamese dependency treebank / Chinese dependency parsing / Chinese-Vietnamese word alignment

Cite this article

LI Fajie, YU Zhengtao, GUO Jianyi, LI Ying, ZHOU Lanjiang. Vietnamese Dependency Treebank Construction Via Chinese-Vietnamese Bilingual Corpus. Journal of Chinese Information Processing. 2015, 29(6): 69-74


Funding

National Natural Science Foundation of China (61262041, 61472168); Natural Science Foundation of Yunnan Province (2013FA030)