Sentence Alignment for Biomedicine Texts Based on Gaussian Mixture Model

CHEN Xiang, LIN Hongfei, YANG Zhihao

Journal of Chinese Information Processing ›› 2010, Vol. 24 ›› Issue (4): 68-74.
Review


Abstract

Bilingual term lexicons play an important role in biomedical cross-language information retrieval systems, and bilingual sentence alignment is the first step in building such a lexicon. To construct a bilingual lexicon for the biomedical domain, this paper brings classification and transfer learning to Chinese-English sentence alignment. Sentence alignment is treated as a multi-class classification task, which Gaussian mixture model classifiers solve using the anchor information contained in bilingual biomedical abstracts. During training, transfer learning is applied: the model's sentence-length features are trained on the noise-free bilingual New Concept English corpus and then transferred to the alignment model. Experiments show that this considerably improves sentence alignment accuracy on the test corpus.
Key words: computer application; Chinese information processing; sentence alignment; Gaussian mixture model; transfer learning; anchor information
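
The classification view described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes, purely for illustration, that the classes are the alignment patterns 1-1, 1-2 and 2-1, that the only feature is the English-to-Chinese character length ratio of a candidate pair, and that the training ratios come from a clean auxiliary corpus. The numbers below are synthetic stand-ins, not figures from the paper, and the anchor information the paper relies on is not modelled. One 1-D Gaussian mixture is fitted per class with EM, and a candidate is assigned to the class with the highest prior-weighted likelihood.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture to xs with plain EM."""
    rng = random.Random(seed)
    mu = sum(xs) / len(xs)
    var0 = max(sum((x - mu) ** 2 for x in xs) / len(xs), 1e-4)
    weights = [1.0 / k] * k
    means = rng.sample(xs, k)  # initialise component means from the data
    variances = [var0] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-8:
                continue  # leave an (almost) empty component unchanged
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-4
            )
    return weights, means, variances


def log_likelihood(x, gmm):
    """Log density of x under a fitted mixture (floored to avoid log(0))."""
    weights, means, variances = gmm
    p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(max(p, 1e-300))


def train(labelled):
    """labelled maps a class label to a list of length ratios for that class."""
    n = sum(len(xs) for xs in labelled.values())
    return {
        label: (math.log(len(xs) / n), fit_gmm_1d(xs))
        for label, xs in labelled.items()
    }


def classify(ratio, models):
    """Return the class with the highest log prior + log likelihood."""
    return max(models, key=lambda c: models[c][0] + log_likelihood(ratio, models[c][1]))


def length_ratio(zh, en):
    """Hypothetical feature: English characters per Chinese character."""
    return len(en) / max(len(zh), 1)


# Synthetic ratios standing in for a clean auxiliary corpus (assumed values).
_rng = random.Random(42)
auxiliary = {
    "1-1": [_rng.gauss(3.0, 0.3) for _ in range(200)],
    "1-2": [_rng.gauss(1.5, 0.2) for _ in range(200)],
    "2-1": [_rng.gauss(6.0, 0.6) for _ in range(200)],
}
models = train(auxiliary)

# Apply the length model learned on the auxiliary data to new candidate ratios.
for ratio in (3.1, 1.4, 6.2):
    print(ratio, classify(ratio, models))
```

Training the length model on auxiliary data and reusing it on target-domain candidates mirrors, in a very reduced form, the transfer idea in the abstract; a fuller system would combine this score with anchor-based features.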


Cite this article

CHEN Xiang, LIN Hongfei, YANG Zhihao. Sentence Alignment for Biomedicine Texts Based on Gaussian Mixture Model. Journal of Chinese Information Processing. 2010, 24(4): 68-74


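Funding

National Natural Science Foundation of China (60373095, 60673039); National High Technology Research and Development Program of China (863 Program) (2006AA01Z151); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education (Jiao Wai Si Liu [2007] No. 118); National Social Science Foundation of China (08BTQ025)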