Abstract: Named entity translation equivalents play a critical role in cross-language information processing. Traditional methods usually rely on large-scale parallel or comparable corpora and are therefore limited by the size and quality of the available corpus resources. For Japanese-Chinese translation, bilingual corpus resources are relatively scarce; a Chinese Hanzi-Japanese Kanji mapping table is often adopted to handle Chinese named entities, and an SMT model to handle Japanese named entities written purely in kana. In this paper, we propose an approach based on monolingual corpora. First, a conditional random field model is used to extract Japanese and Chinese names from monolingual corpora. Then a Japanese-Chinese transliteration rule base is built by instance-based inductive learning, in an iterative process that employs feedback learning. Experimental results show that the proposed method is simple and efficient, and that it alleviates the heavy dependence of classical methods on bilingual resources.
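The iterative, feedback-driven construction of the rule base can be illustrated with a small sketch. This is not the paper's algorithm: it assumes single-character rules (one Japanese character maps to one Chinese character), a seed rule table, and name lists already extracted from monolingual text. The function names `translate` and `learn_rules`, the `max_rounds` parameter, and the toy data are all hypothetical.

```python
def translate(name, rules):
    """Apply character-level rules; return None if any character is uncovered."""
    if any(ch not in rules for ch in name):
        return None
    return "".join(rules[ch] for ch in name)


def learn_rules(ja_names, zh_names, seed_rules, max_rounds=10):
    """Grow a rule base from monolingual name lists (illustrative assumption only).

    Each round does two things:
      1. pairs every Japanese name whose rule-based translation appears
         in the Chinese name list;
      2. for names with exactly one uncovered character, induces a new rule
         when exactly one Chinese name agrees on all the covered positions.
    Newly induced rules feed back into the next round.
    """
    rules = dict(seed_rules)
    zh_set = set(zh_names)
    pairs = {}
    for _ in range(max_rounds):
        new_rules = {}
        for ja in ja_names:
            if ja in pairs:
                continue
            candidate = translate(ja, rules)
            if candidate is not None:
                if candidate in zh_set:
                    pairs[ja] = candidate
                continue
            unknown = [i for i, ch in enumerate(ja) if ch not in rules]
            if len(unknown) != 1:
                continue  # too ambiguous for this simple sketch
            gap = unknown[0]
            matches = [
                zh for zh in zh_set
                if len(zh) == len(ja)
                and all(rules[ja[k]] == zh[k] for k in range(len(ja)) if k != gap)
            ]
            if len(matches) == 1:
                # If two names propose different targets, the later one wins here.
                new_rules[ja[gap]] = matches[0][gap]
        if not new_rules:
            break  # nothing new was learned, so stop iterating
        rules.update(new_rules)
    return rules, pairs


# Toy data: seed rules cover two characters; the third is induced.
seed = {"田": "田", "中": "中"}
ja_names = ["田中", "中沢", "沢田"]
zh_names = ["田中", "中泽", "泽田", "王伟"]
rules, pairs = learn_rules(ja_names, zh_names, seed)
print(rules)
print(pairs)
```

In this toy run the first round pairs the fully covered name and induces a rule for the uncovered character; the second round uses that rule to pair the remaining names. A realistic system would need multi-character and kana-to-Hanzi rules, plus a way to resolve conflicting rule candidates, none of which is modelled here.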