Abstract: A bilingual lexicon of biomedical terms plays an important role in biomedical cross-language information retrieval, and sentence alignment is the first step in building such a lexicon. This paper applies the Gaussian mixture model (GMM) and transfer learning to sentence alignment. The basic idea is to treat sentence alignment as a classification task, solved by GMM classifiers that use the anchor information contained in medical literature abstracts. In addition, the sentence alignment model is built by combining biomedical literature abstracts with New Concept English corpora: transfer learning is used to train the length features on this combined data and transfer them to the alignment model. Experiments show that this approach improves the performance of the sentence alignment model.
Key words: computer application; Chinese information processing; sentence alignment; Gaussian mixture model; transfer learning; anchor information
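The abstract frames alignment as classifying candidate sentence pairs with GMM classifiers over length and anchor features. The sketch below illustrates that idea under stated assumptions. The two features (a log character-length ratio and the Jaccard overlap of "anchor" tokens such as numbers and Latin abbreviations), the regular expression `ANCHOR_RE`, the diagonal-covariance EM fit, and the helper names `pair_features`, `DiagGMM`, `GMMPairClassifier` and `fit_with_source_domain` are all hypothetical, not the paper's definitions. The transfer step is approximated by simply pooling general-domain (e.g. New Concept English) pairs with in-domain biomedical pairs. This is the most naive form of instance transfer and not necessarily the method used in the paper.

```python
import re
import numpy as np

# Hypothetical anchor tokens: numbers (e.g. "25", "0.05") and Latin
# abbreviations/gene names (e.g. "DNA", "IL-6") that usually survive
# translation unchanged in Chinese biomedical abstracts.
ANCHOR_RE = re.compile(r"\d+(?:\.\d+)?|[A-Z][A-Za-z0-9\-]*[A-Z0-9]")


def pair_features(en: str, zh: str) -> np.ndarray:
    """Assumed 2-D feature vector for a candidate (English, Chinese) pair."""
    # Length feature: log ratio of character counts (+1 avoids log(0)).
    length = np.log((len(zh) + 1) / (len(en) + 1))
    # Anchor feature: Jaccard overlap of anchor tokens on both sides.
    a_en, a_zh = set(ANCHOR_RE.findall(en)), set(ANCHOR_RE.findall(zh))
    union = a_en | a_zh
    anchor = len(a_en & a_zh) / len(union) if union else 0.0
    return np.array([length, anchor])


def logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


class DiagGMM:
    """Gaussian mixture with diagonal covariances, fitted by EM."""

    def __init__(self, k: int = 2, iters: int = 50, var_floor: float = 1e-3):
        self.k, self.iters, self.var_floor = k, iters, var_floor

    def _log_joint(self, X):
        # log w_k + log N(x | mu_k, diag(var_k)), shape (n, k)
        diff2 = (X[:, None, :] - self.means[None]) ** 2
        log_pdf = -0.5 * (np.log(2 * np.pi * self.vars)[None]
                          + diff2 / self.vars[None]).sum(axis=2)
        return np.log(self.weights)[None] + log_pdf

    def fit(self, X):
        n = X.shape[0]
        # Deterministic init: spread the means along the first feature.
        order = np.argsort(X[:, 0])
        self.means = X[order[np.linspace(0, n - 1, self.k).astype(int)]].copy()
        self.vars = np.tile(X.var(axis=0) + self.var_floor, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.iters):
            # E-step: posterior responsibility of each component for each row.
            log_r = self._log_joint(X)
            r = np.exp(log_r - logsumexp(log_r, axis=1)[:, None])
            # M-step: re-estimate weights, means and (floored) variances.
            nk = r.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = (r.T @ X) / nk[:, None]
            diff2 = (X[:, None, :] - self.means[None]) ** 2
            self.vars = np.einsum("nk,nkd->kd", r, diff2) / nk[:, None] + self.var_floor
        return self

    def score(self, X):
        """Log-likelihood log p(x) of each row under the mixture."""
        return logsumexp(self._log_joint(X), axis=1)


class GMMPairClassifier:
    """One GMM per class: 1 = aligned pair, 0 = not aligned."""

    def fit(self, X, y):
        self.models = {c: DiagGMM().fit(X[y == c]) for c in (0, 1)}
        self.log_prior = {c: np.log(np.mean(y == c)) for c in (0, 1)}
        return self

    def predict(self, X):
        # Bayes rule: pick the class with the larger log prior + log-likelihood.
        s = np.stack([self.log_prior[c] + self.models[c].score(X) for c in (0, 1)], axis=1)
        return s.argmax(axis=1)


def fit_with_source_domain(X_src, y_src, X_tgt, y_tgt):
    """Naive instance transfer (assumption): pool general-domain pairs
    (e.g. New Concept English) with in-domain biomedical pairs."""
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    return GMMPairClassifier().fit(X, y)
```

With one GMM per class, a candidate pair is labeled aligned when log P(aligned) + log p(x | aligned) exceeds the corresponding score for the non-aligned class. A complete aligner would also have to generate candidate pairings (1-1, 1-2, 2-1 and so on) and search for the best overall alignment, which this sketch leaves out.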