Abstract
Sentence alignment is an essential step in bilingual corpus processing and the first step in constructing a bilingual lexicon. This paper models bilingual biomedical abstracts with a maximum-weight matching model on a weighted bipartite graph. Without relying on a bilingual dictionary, the method combines length-based sentence alignment with sentence position information and makes full use of the anchor information found in bilingual biomedical abstracts: paragraphs and sentences are first classified, and similarity is then computed for each class. The method achieves sentence alignment of bilingual biomedical abstracts, and the experimental results are good.
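The idea of aligning sentences by maximum-weight matching on a weighted bipartite graph can be illustrated with a minimal sketch. It is not the paper's implementation: the edge weight below (an equal mix of a length score and a relative-position score), the names `similarity`, `max_weight_matching` and `align`, and the parameters `ratio` and `alpha` are all assumptions made for illustration. The anchor information and the paragraph/sentence classification mentioned in the abstract are not modelled. The matching is found by exhaustive search, which is only practical for a handful of sentences; an efficient algorithm such as Kuhn-Munkres would replace it in practice.

```python
from itertools import permutations


def similarity(i, j, src_lens, tgt_lens, ratio=1.0, alpha=0.5):
    """Hypothetical edge weight between source sentence i and target sentence j.

    Assumes all sentence lengths are positive. `ratio` is an assumed average
    target/source length ratio; `alpha` is an assumed mixing weight.
    """
    # Length score: 1.0 when the scaled source length equals the target length.
    a, b = src_lens[i] * ratio, tgt_lens[j]
    length_score = min(a, b) / max(a, b)
    # Position score: 1.0 when both sentences sit at the same relative position.
    pos_i = i / max(len(src_lens) - 1, 1)
    pos_j = j / max(len(tgt_lens) - 1, 1)
    position_score = 1.0 - abs(pos_i - pos_j)
    return alpha * length_score + (1 - alpha) * position_score


def max_weight_matching(weights):
    """Exhaustive maximum-weight matching; weights[i][j] joins left i and right j.

    Every vertex on the smaller side is matched. Returns (left, right) pairs.
    """
    n, m = len(weights), len(weights[0])
    if n > m:
        # Search from the smaller side, then swap the pairs back.
        transposed = [[weights[i][j] for i in range(n)] for j in range(m)]
        return sorted((i, j) for j, i in max_weight_matching(transposed))
    best_pairs, best_total = [], float("-inf")
    for cols in permutations(range(m), n):
        total = sum(weights[i][cols[i]] for i in range(n))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_pairs


def align(src_lens, tgt_lens, ratio=1.0, alpha=0.5):
    """Build the weighted bipartite graph and return one-to-one sentence pairs."""
    weights = [
        [similarity(i, j, src_lens, tgt_lens, ratio, alpha)
         for j in range(len(tgt_lens))]
        for i in range(len(src_lens))
    ]
    return max_weight_matching(weights)


# Made-up sentence lengths in characters: three source and three target sentences.
pairs = align([20, 45, 30], [48, 110, 70], ratio=2.4)
print(pairs)
```

A one-to-one matching like this cannot express 1:2 or 2:1 alignments; handling those would need merged-sentence vertices or a different formulation, which this sketch leaves out.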
Key words
computer application /
Chinese information processing /
sentence alignment /
bipartite graph /
bilingual corpora /
similarity computation
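Funding
Supported by the National Natural Science Foundation of China (60373095, 60673039); the National High Technology Research and Development Program of China (863 Program) (2006AA01Z151); and the Scientific Research Foundation for Returned Overseas Scholars, Ministry of Education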