Automatic Term Extraction in TCM Acupuncture Domain
SUN Shuihua 1,2, HUANG Degen 1, NIU Ping 1
1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China; 2. College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian 350118, China
Abstract: A term extraction model based on linguistic rules is established for the TCM acupuncture domain. First, a seed set of TCM acupuncture terms is iterated a finite number of times to generate a component set. Second, treating the component set as the domain dictionary, the model applies the forward maximum matching algorithm to segment sentences and extract term candidates. Finally, the term candidates are filtered by rules. In the open test, the F-measures are 76.96% and 35.59% with keywords and a traditional Chinese medicine dictionary as the seed set, respectively.
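The segmentation step can be illustrated with a minimal sketch of forward maximum matching against a component dictionary. This is not the authors' implementation: the function names, the tiny component set, and the sample sentence are hypothetical, and forming a candidate by joining consecutive dictionary matches is an assumption, since the abstract does not say how candidates are assembled or which filtering rules are applied.

```python
def forward_maximum_match(sentence, dictionary, max_len=None):
    """Segment `sentence` greedily, left to right, preferring the longest
    dictionary entry at each position; unknown characters become
    single-character tokens."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        matched = None
        # Try the longest window first, then shrink it.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if piece in dictionary:
                matched = piece
                break
        if matched is None:
            matched = sentence[i]  # fallback: one character
        tokens.append(matched)
        i += len(matched)
    return tokens


def candidate_terms(tokens, dictionary):
    """Assumption: a term candidate is a maximal run of consecutive
    tokens that are all dictionary components, joined together."""
    candidates, run = [], []
    for tok in tokens:
        if tok in dictionary:
            run.append(tok)
        elif run:
            candidates.append("".join(run))
            run = []
    if run:
        candidates.append("".join(run))
    return candidates


# Hypothetical component set and sentence, for illustration only.
components = {"针刺", "足三里", "穴"}
tokens = forward_maximum_match("针刺足三里穴可以止痛", components)
print(tokens)
print(candidate_terms(tokens, components))
```

In a full pipeline, the candidates produced this way would still have to pass the rule-based filtering stage that the abstract mentions.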