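The immediate goal of drug name recognition is to find drug names in biomedical text. Drug-related research is now published faster than maintainers can update drug information databases, so a technique for extracting drug information automatically is urgently needed. This paper uses a semi-supervised learning method based on Feature Coupling Generalization (FCG) to generate a drug name dictionary, and then combines that dictionary with conditional random fields (CRF) for drug named entity recognition. We first build a drug name dictionary with a template-based method, then denoise the dictionary with FCG, and finally apply the denoised dictionary to the test set for drug named entity recognition, obtaining an F-score of 76.73%.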
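The abstract does not say how the denoised dictionary is fed to the CRF. One common approach, assumed here and not confirmed by the paper, is to run a greedy longest-match dictionary lookup over each tokenized sentence and expose the result as a per-token feature (with a small context window) next to ordinary lexical features. The helper names, the `B-DICT`/`I-DICT` tags, the `max_len` limit, and the specific lexical features are all hypothetical illustrations. The output (one feature dict per token) is the shape that dictionary-based CRF toolkits such as sklearn-crfsuite accept, but no library is called here.

```python
from typing import Dict, List, Set, Tuple

def dictionary_bio_features(tokens: List[str], dictionary: Set[Tuple[str, ...]],
                            max_len: int = 5) -> List[str]:
    """Greedy longest-match lookup (hypothetical helper).

    Dictionary entries are tuples of lowercased tokens. Each token receives
    B-DICT (first token of a match), I-DICT (inside a match), or O.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                match_len = n
                break
        if match_len:
            tags[i] = "B-DICT"
            for j in range(i + 1, i + match_len):
                tags[j] = "I-DICT"
            i += match_len
        else:
            i += 1
    return tags

def token_features(tokens: List[str], i: int, dict_tags: List[str]) -> Dict[str, object]:
    """Lexical features plus the dictionary tag of the token and its neighbours."""
    tok = tokens[i]
    return {
        "word.lower": tok.lower(),
        "suffix3": tok[-3:].lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "dict": dict_tags[i],
        "prev_dict": dict_tags[i - 1] if i > 0 else "BOS",
        "next_dict": dict_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }

def sentence_to_features(tokens: List[str],
                         dictionary: Set[Tuple[str, ...]]) -> List[Dict[str, object]]:
    """One feature dict per token, ready to pair with gold BIO labels for CRF training."""
    dict_tags = dictionary_bio_features(tokens, dictionary)
    return [token_features(tokens, i, dict_tags) for i in range(len(tokens))]
```

For example, with the toy dictionary `{("aspirin",), ("valproic", "acid")}`, the sentence `Patients received Valproic acid and aspirin .` is tagged `O O B-DICT I-DICT O B-DICT O`. The CRF can then learn how far to trust a dictionary hit, which matters when the dictionary still contains some noise after filtering.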
Abstract
Drug name recognition aims to find drug names in biomedical texts, a technology in high demand given the overwhelming volume of drug research. We adopt a semi-supervised learning method to build a dictionary and then combine the dictionary with the Conditional Random Field (CRF) method to recognize drug name entities. First, we extract a drug name dictionary using a template matching method; then Feature Coupling Generalization (FCG) is used to filter the dictionary. Finally, we combine the filtered dictionary with the CRF to recognize drug entities. Our method achieved an F-score of 0.7673 on the drug name recognition corpus.
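In named entity recognition the F-score is usually computed at the entity level: a predicted drug name counts as correct only when its boundaries match a gold annotation, and F = 2PR / (P + R) combines precision P and recall R. The abstract does not state the exact matching criterion, so the sketch below assumes exact-span matching on single-type BIO tags; `bio_to_spans` and `prf1` are hypothetical helper names, not the evaluation script used in the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, int]  # (sentence_id, start_token, end_token_exclusive)

def bio_to_spans(tags: List[str], sent_id: int = 0) -> Set[Span]:
    """Convert one sentence of BIO tags (e.g. B-DRUG, I-DRUG, O) into entity spans."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        if start is not None and not tag.startswith("I-"):
            spans.add((sent_id, start, i))
            start = None
        if tag.startswith("B-"):
            start = i
        elif tag.startswith("I-") and start is None:
            start = i  # tolerate an I- tag with no preceding B-
    return spans

def prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, if the gold tags mark two drug names and the system finds only the first one exactly, precision is 1.0, recall is 0.5, and F1 is 2/3. Partial-overlap or type-aware matching would need a different comparison than set intersection.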
Keywords
drug name recognition /
machine learning /
Feature Coupling Generalization /
CRF
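Funding
National Natural Science Foundation of China (61070098, 60973068, 61272373); Specialized Research Fund for the Doctoral Program of Higher Education (20090041110002); Fundamental Research Funds for the Central Universities (DUT10JS09); Liaoning Provincial Doctoral Start-up Foundation (20091015)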