Abstract: This paper proposes an algorithm for automatically extracting a bilingual term lexicon from English-Chinese parallel corpora. The parallel corpora are first aligned with an improved statistical method based on character length, and each side is then tagged with part-of-speech categories. The candidate term set is produced by collecting statistics on the nouns and noun phrases in both corpora. Next, the translation probability between every English candidate term and each of its candidate Chinese translations is calculated. Finally, the Chinese translation of each English term is selected using a threshold that varies with word frequency. In term-extraction experiments on real corpora, the method achieves better performance.
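The last two steps of the pipeline (scoring translation candidates and applying a frequency-dependent threshold) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the probability estimate or the threshold schedule, so the sketch assumes a simple relative co-occurrence estimate over aligned sentence pairs, and the names `extract_lexicon` and `threshold_for`, along with the cutoff values 0.9 and 0.5, are hypothetical.

```python
from collections import Counter, defaultdict


def threshold_for(freq):
    # Hypothetical schedule: demand stronger evidence for rare terms.
    return 0.9 if freq < 3 else 0.5


def extract_lexicon(aligned_pairs, threshold_fn=threshold_for):
    """Pick one Chinese translation per English candidate term.

    aligned_pairs: list of (english_terms, chinese_terms), one entry per
    aligned sentence pair, each holding the candidate terms (nouns and
    noun phrases) already found on that side.
    """
    en_count = Counter()          # sentence pairs containing each English term
    cooc = defaultdict(Counter)   # co-occurrence counts per English term

    for en_terms, zh_terms in aligned_pairs:
        for e in set(en_terms):
            en_count[e] += 1
            for c in set(zh_terms):
                cooc[e][c] += 1

    lexicon = {}
    for e, freq in en_count.items():
        if not cooc[e]:
            continue  # no Chinese candidate ever co-occurred with this term
        best, best_count = cooc[e].most_common(1)[0]
        # Assumed estimate: share of e's sentence pairs that also contain best.
        prob = best_count / freq
        if prob >= threshold_fn(freq):
            lexicon[e] = (best, prob)
    return lexicon


pairs = [
    (["parallel corpus", "alignment"], ["平行语料库", "对齐"]),
    (["parallel corpus"], ["平行语料库"]),
    (["parallel corpus", "term"], ["平行语料库", "术语"]),
    (["alignment"], ["对齐"]),
    (["term"], ["对齐"]),
]
lexicon = extract_lexicon(pairs)
```

In this toy input, "term" occurs only twice and no Chinese candidate co-occurs with it in more than half of its sentence pairs, so the stricter low-frequency threshold rejects it, while the other two terms are kept.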