Uyghur Dependency Treebank Based on Chinese-Uyghur Mapping

TURGHUN Osman, YANG Yating, WANG Lei, ZHOU Xi, CHENG Li

Journal of Chinese Information Processing ›› 2019, Vol. 33 ›› Issue (1) : 103-110.
Information Processing for Ethnic Minority, Cross-Border, and Neighboring Languages

Uyghur Dependency Treebank Based on Chinese-Uyghur Mapping

  • TURGHUN Osman1,2,3, YANG Yating1,2,3, WANG Lei1,2,3, ZHOU Xi1,2,3, CHENG Li1,2,3

Abstract

This paper proposes a method for constructing a Uyghur dependency treebank from Chinese dependency information, using a Chinese-Uyghur bilingual corpus. The Uyghur text first undergoes morphological analysis; the Chinese and Uyghur sentences are then word-aligned, and the Chinese side is dependency-parsed. Uyghur dependency information is derived by mapping the Chinese dependencies onto the Uyghur sentences through the word alignment, and the result is then optimized with morphological constraints to yield the Uyghur dependency treebank. A dependency parser trained on this treebank, evaluated on the CoNLL 2017 Shared Task test set, achieves a labeled attachment score (LAS) of 34.38% and an unlabeled attachment score (UAS) of 52.53%.
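The mapping step and the two evaluation metrics can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `project_dependencies` and `attachment_scores`, the dictionary-based data layout, the relation labels, and the toy sentence are all assumptions made for illustration, and the morphological analysis and optimization stages are omitted. It assumes each Chinese token is aligned to at most one Uyghur token.

```python
from typing import Dict, Tuple

# An arc is (head index, relation label). Tokens are numbered from 1;
# head index 0 stands for the artificial ROOT node.
Arc = Tuple[int, str]


def project_dependencies(zh_arcs: Dict[int, Arc],
                         alignment: Dict[int, int]) -> Dict[int, Arc]:
    """Project Chinese arcs onto Uyghur tokens through a word alignment.

    zh_arcs   maps a Chinese token index to its (head, relation).
    alignment maps a Chinese token index to a Uyghur token index
              (hypothetical layout; at most one target per source token).
    """
    ug_arcs: Dict[int, Arc] = {}
    for zh_dep, (zh_head, rel) in zh_arcs.items():
        if zh_dep not in alignment:
            continue  # unaligned dependent: nothing to project
        ug_dep = alignment[zh_dep]
        if zh_head == 0:
            ug_arcs[ug_dep] = (0, rel)  # the root stays the root
        elif zh_head in alignment and alignment[zh_head] != ug_dep:
            ug_arcs[ug_dep] = (alignment[zh_head], rel)
        # Arcs with an unaligned head, or that would become self-loops,
        # are skipped here; a real pipeline would repair them later.
    return ug_arcs


def attachment_scores(gold: Dict[int, Arc],
                      pred: Dict[int, Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions of gold tokens."""
    total = len(gold)
    uas_hits = sum(1 for tok, (head, _) in gold.items()
                   if tok in pred and pred[tok][0] == head)
    las_hits = sum(1 for tok, arc in gold.items() if pred.get(tok) == arc)
    return uas_hits / total, las_hits / total


# Toy example: a three-token Chinese sentence in subject-verb-object order,
# aligned to a Uyghur sentence in subject-object-verb order.
zh_arcs = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "obj")}
alignment = {1: 1, 2: 3, 3: 2}
projected = project_dependencies(zh_arcs, alignment)
print(projected)
```

In the toy example the verb moves from position 2 to position 3, so the projected subject and object arcs both point to Uyghur token 3. UAS counts a token as correct when its head matches the gold head; LAS additionally requires the relation label to match, which is why LAS can never exceed UAS.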

Key words

Uyghur / dependency grammar / mapping

Cite this article

TURGHUN Osman, YANG Yating, WANG Lei, ZHOU Xi, CHENG Li. Uyghur Dependency Treebank Based on Chinese-Uyghur Mapping. Journal of Chinese Information Processing. 2019, 33(1): 103-110

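Funding

"West Light" Foundation of the Chinese Academy of Sciences (2017-XBQNZ-A-005, 2017-XBZG-BR-001); National Thousand Talents Program (Y32H251201); National Natural Science Foundation of China (U1703133); Major Science and Technology Special Project of the Xinjiang Autonomous Region (2016A03007-3)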