Abstract: Extracting interactions between chemicals and proteins plays an important role in research on precision medicine and drug discovery. This paper proposes a bidirectional LSTM (Bi-LSTM) model based on the shortest dependency path and an attention mechanism, and applies it to chemical-protein relation extraction. The features combine the part-of-speech tags, positions, and dependency relation types of the words on the shortest dependency path. Experiments on the BioCreative VI CHEMPROT task show that the proposed method achieves a comparatively good F1 score among systems based on dependency information. Ensemble learning further improves chemical-protein relation extraction performance.
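To make the notion of a shortest dependency path concrete, the following is a minimal sketch of how such a path between a chemical mention and a protein mention might be extracted from a dependency parse. It is not the implementation used in this paper. The function name `shortest_dependency_path`, the edge format, and the example parse of "Aspirin inhibits COX-1 activity" are all assumptions made for illustration.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the shortest path between two tokens in a dependency tree.

    edges: iterable of (head, dependent, relation) triples (assumed format).
    The tree is searched as an undirected graph with breadth-first search.
    Each step is (from_token, relation, direction, to_token), where direction
    is "up" (dependent to head) or "down" (head to dependent).
    Returns None if the two tokens are not connected.
    """
    graph = {}
    for head, dep, rel in edges:
        graph.setdefault(head, []).append((dep, rel, "down"))
        graph.setdefault(dep, []).append((head, rel, "up"))

    # previous[node] stores how the node was first reached.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor, rel, direction in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, rel, direction)
                queue.append(neighbor)

    if target not in previous:
        return None

    # Walk back from the target to the source, then reverse.
    path = []
    node = target
    while previous[node] is not None:
        parent, rel, direction = previous[node]
        path.append((parent, rel, direction, node))
        node = parent
    path.reverse()
    return path


# Hypothetical parse of "Aspirin inhibits COX-1 activity".
# Tokens are plain strings here; a real pipeline would use token indices
# so that repeated words stay distinct.
example_edges = [
    ("inhibits", "Aspirin", "nsubj"),
    ("inhibits", "activity", "obj"),
    ("activity", "COX-1", "compound"),
]
example_path = shortest_dependency_path(example_edges, "Aspirin", "COX-1")
for step in example_path:
    print(step)
```

Each step on the path carries a dependency relation type and a direction, which is the kind of information the abstract lists as features alongside part-of-speech and position. In a model of the kind described, the resulting token sequence would then be embedded and passed to the Bi-LSTM.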