从医疗社交网站的用户评论中挖掘药物副作用时,由于人们可能采用不同的表述方式来描述副作用,而新药的上市与用药者的差异性也会造成新的副作用出现,因此从评论中识别新的副作用名称并进行标准化十分重要。该文利用条件随机场模型识别评论中的副作用,对识别出的副作用名称进行标准化,最后得到药物的副作用。通过将挖掘出的药物已知的副作用与数据库记录进行对比验证了本文方法的有效性,同时得到一个按评论中的发生频率排序的药物潜在副作用列表。实验结果显示,条件随机场模型可以识别出已知的与新的副作用名称,而标准化技术将副作用名称进行聚合与归并,有利于药物副作用的发现。
Abstract
When mining adverse drug reactions (ADRs) from the user comments on healthcare social networks, it is very important to recognize novel ADR expressions from comments and normalize them, since people probably adopt different expressions to describe adverse reactions and new adverse reactions may emerge with the listing of new drugs as well as the diversity of drug users. This paper utilizes Conditional Random Field (CRF) model to recognize adverse reaction entities, and proposes a normalization method applied to the recognized entities. The effectiveness of this mining method is verified by comparing the mined results of known ADRs with database records, and a list of potential ADRs sorted by occurrence frequency in comments is obtained. Experimental results indicate that CRF model is capable of identifying both known and novel adverse reaction entities, and the standardization aggregates and merges the entities, which benefits the ADR discovery.
Key words adverse drug reaction; user comment; text mining; entity normalization
关键词
药物副作用 /
用户评论 /
文本挖掘 /
实体标准化
{{custom_keyword}} /
Key words
adverse drug reaction /
user comment /
text mining /
entity normalization
{{custom_keyword}} /
{{custom_sec.title}}
{{custom_sec.title}}
{{custom_sec.content}}
参考文献
[1] Yang C C, Jiang L, Yang H, et al. Detecting signals of adverse drug reactions from health consumer contributed content in social media[C]//Proceedings of ACM SIGKDD Workshop on Health Informatics. 2012.
[2] Leaman R, Wojtulewicz L, Sullivan R, et al. Towards internet-age pharmacovigilance: extracting adverse drug reactions from user posts to health-related social networks[C]//Proceedings of the 2010 workshop on biomedical natural language processing. Association for Computational Linguistics, 2010: 117-125.
[3] Chee B W, Berlin R, Schatz B. Predicting adverse drug events from personal health messages[C]//Proceedings of the AMIA Annual Symposium Proceedings. American Medical Informatics Association, 2011: 217.
[4] Nikfarjam A, Gonzalez G H. Pattern mining for extraction of mentions of adverse drug reactions from user comments[C]//Proceedings of the AMIA Annual Symposium Proceedings. American Medical Informatics Association, 2011: 1019.
[5] Zeng Q T, Tse T. Exploring and developing consumer health vocabularies[J]. Journal of the American Medical Informatics Association. 2006, 13(1): 24-29.
[6] Wu H, Fang H, Stanhope S J. Exploiting online discussions to discover unrecognized drug side effects[J]. Nervenheilkunde. 2007, 26(11): 969-980.
[7] Online Support Groups and Forums at DailyStrength. Available[DB]. www.dailystrength.org. Accessed March 28, 2014.
[8] Kuhn M, Campillos M, Letunic I, et al. A side effect resource to capture phenotypic effects of drugs[J]. Molecular systems biology. 2010, 6(1):343-348.
[9] 唐旭日,陈小荷,许超,等. 基于篇章的中文地名识别研究[J]. 中文信息学报,2010,24(02): 24-32.
[10] Settles B. Biomedical named entity recognition using conditional random fields and rich feature sets[C]//Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Association for Computational Linguistics, 2004: 104-107.
[11] 刘凯,周雪忠,于剑,等. 基于条件随机场的中医临床病历命名实体抽取[J]. 计算机工程,2014(9): 312-316.
[12] Toutanova K, Klein D, Manning C D, et al. Feature-rich part-of-speech tagging with a cyclic dependency network[C]//Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1. Association for Computational Linguistics, 2003: 173-180.
[13] Miller G A. WordNet: a lexical database for English[J]. Communications of the ACM. 1995, 38(11): 39-41.
{{custom_fnGroup.title_cn}}
脚注
{{custom_fn.content}}
基金
国家自然科学基金(661572102,61277370);辽宁省自然科学基金(201202031,201402003)
{{custom_fund}}