Abstract: Identifying translingual equivalents of named entities is essential to multilingual natural language processing. Several approaches to named entity translation, such as bilingual dictionary lookup, word/sub-word translation, and transliteration, have been explored in recent years. Another promising approach is to extract named entity translingual equivalents automatically from a parallel corpus, which usually requires the named entities to be annotated, manually or automatically, in both languages. In this paper, we propose a new approach that extracts named entity equivalents from a parallel corpus using only source-language annotation and the result of HMM word alignment. The experiment is carried out on a Chinese-English parallel corpus, with Chinese as the source language and English as the target language. The results show that the new approach extracts named entity pairs with relatively high precision, even when the word alignment is only partially correct.
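To make the setting concrete, the following is a minimal sketch of projecting a source-side named entity onto the target sentence through word alignment links. It is not the method proposed in the paper. The function name `project_entity`, the link format, and the toy sentence pair and alignment are all assumptions made for illustration; it uses a plain "smallest covering span" heuristic.

```python
from typing import List, Optional, Tuple


def project_entity(
    src_span: Tuple[int, int],
    alignment: List[Tuple[int, int]],
    tgt_tokens: List[str],
) -> Optional[str]:
    """Project a source-side entity span onto the target sentence.

    src_span is an inclusive (start, end) range of source token indices.
    alignment is a list of (source_index, target_index) links.
    Returns the smallest contiguous target span covering every linked
    target token, or None if the entity has no alignment links.
    """
    start, end = src_span
    linked = [t for s, t in alignment if start <= s <= end]
    if not linked:
        return None
    return " ".join(tgt_tokens[min(linked): max(linked) + 1])


# Toy example: the sentence pair and the links are made up for illustration.
src = ["江泽民", "主席", "访问", "了", "美国"]
tgt = ["President", "Jiang", "Zemin", "visited", "the", "United", "States"]
# "United" is deliberately left unaligned to mimic a partially correct alignment.
links = [(0, 1), (0, 2), (1, 0), (2, 3), (4, 6)]

person = project_entity((0, 0), links, tgt)   # entity "江泽民"
country = project_entity((4, 4), links, tgt)  # entity "美国"
print(person)
print(country)
```

With these toy links the person name projects to "Jiang Zemin", while the country name projects only to "States" because one link is missing. This is the kind of partially correct alignment the abstract refers to, and a plain projection like this one does not recover from it.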