Chinese-Tibetan Base Noun Phrase Alignment Based on Head-Phrase Extension

NUO Minghua, LIU Huidan, MA Longlong, WU Jian, DING Zhiming

Journal of Chinese Information Processing, 2013, Vol. 27, Issue 4: 63-70.
Review

Abstract

This paper presents a framework for aligning Chinese-Tibetan base noun phrases (BaseNPs). It is a two-phase procedure: Chinese base noun phrases are identified first, and their Tibetan correspondences are then located. Drawing on English-Chinese phrase alignment methods and taking the characteristics of Tibetan into account, we propose a Tibetan base noun phrase identification method based on head-phrase extension. Two approaches are proposed for extracting the Tibetan head-phrase: one combines a dictionary with automatic word alignment results, and the other uses a sequence intersection operation. A head-phrase extension confidence is then defined and used to extend the head-phrase and determine the boundary of the correspondence. Experimental results indicate that sequence intersection outperforms the other methods in head-phrase extension. The Chinese-Tibetan base noun phrase pairs extracted with the sequence intersection method reduce the subsequent manual checking effort and facilitate the construction of a phrase-level Chinese-Tibetan base noun phrase lexicon.
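The abstract gives no formulas, so the following is only a minimal Python sketch of the two ideas it names, under explicit assumptions. It assumes that "sequence intersection" can be approximated by the longest contiguous token run shared by all Tibetan sentences whose Chinese side contains the phrase, and that "extension confidence" can be approximated by the share of head-phrase occurrences that also carry the extra neighbouring token. The function names, the confidence definition, and the 0.7 threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Tokens = Tuple[str, ...]


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> Tokens:
    """Longest contiguous token run shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return tuple(a[best_end - best_len:best_end])


def head_phrase(tibetan_sents: List[Sequence[str]]) -> Tokens:
    """Assumed 'sequence intersection': fold the common run over all sentences."""
    if not tibetan_sents:
        return ()
    common: Tokens = tuple(tibetan_sents[0])
    for sent in tibetan_sents[1:]:
        common = longest_common_run(common, sent)
    return common


def find_all(sent: Sequence[str], phrase: Tokens) -> List[int]:
    """Start indices of every occurrence of phrase in sent."""
    n = len(phrase)
    if n == 0:
        return []
    return [i for i in range(len(sent) - n + 1) if tuple(sent[i:i + n]) == phrase]


def extension_confidence(sents: List[Sequence[str]], head: Tokens, extended: Tokens) -> float:
    """Hypothetical confidence: occurrences of extended / occurrences of head."""
    head_count = sum(len(find_all(s, head)) for s in sents)
    ext_count = sum(len(find_all(s, extended)) for s in sents)
    return ext_count / head_count if head_count else 0.0


def extend(sents: List[Sequence[str]], head: Sequence[str], threshold: float = 0.7) -> Tokens:
    """Grow the head-phrase one token at a time while confidence >= threshold."""
    phrase: Tokens = tuple(head)
    changed = True
    while changed and phrase:
        changed = False
        for side in ("left", "right"):
            # Count the tokens adjacent to the current phrase on this side.
            neighbours: Counter = Counter()
            for s in sents:
                for i in find_all(s, phrase):
                    j = i - 1 if side == "left" else i + len(phrase)
                    if 0 <= j < len(s):
                        neighbours[s[j]] += 1
            if not neighbours:
                continue
            # Only the most frequent neighbour is tried as an extension.
            tok, _ = neighbours.most_common(1)[0]
            cand = (tok,) + phrase if side == "left" else phrase + (tok,)
            if extension_confidence(sents, phrase, cand) >= threshold:
                phrase = cand
                changed = True
    return phrase
```

On a toy corpus of placeholder tokens in which `x y` appears in four sentences and is preceded by `m` in three of them, `head_phrase` yields `("x", "y")`; `extend` with threshold 0.7 grows it to `("m", "x", "y")`, while a threshold of 0.8 leaves it unchanged. Real input would be segmented Tibetan tokens rather than placeholders.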
Key words

Tibetan information processing / BaseNP / head-phrase extension

Cite this article

NUO Minghua, LIU Huidan, MA Longlong, WU Jian, DING Zhiming. Chinese-Tibetan Base Noun Phrase Alignment Based on Head-Phrase Extension. Journal of Chinese Information Processing. 2013, 27(4): 63-70

Funding

National Science and Technology Major Project of China (2010ZX01036-001-002, 2010ZX01037-001-002); National Natural Science Foundation of China (61202219, 61202220)