Abstract
Based on the characteristics of the relation judgment task, this paper applies active learning to the assisted judgment of relations between ontology concepts and compares several active learning query generation strategies, including margin sampling, entropy sampling, and least confident sampling. From a practical point of view, it then discusses how active learning can be applied under three different initial sample conditions. When the initial samples contain sufficient positive and negative examples, queries are generated with entropy sampling and margin sampling. When the initial samples contain only positive examples, candidate negative examples are generated by an active learning strategy based on sample similarity. When no initial samples are available, statistical information such as the distance between concepts in the samples is used to generate candidate positive and candidate negative examples at the same time. In this way, user feedback is used effectively in the process of judging concept relations.
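The three query generation strategies named in the abstract can be illustrated with a minimal sketch. It assumes that some classifier has already assigned class probabilities to each unlabeled candidate concept pair; the function names, the `pool` structure, and the example probabilities are hypothetical and are not taken from the paper.

```python
import math


def least_confident(probs):
    """Uncertainty as 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes (smaller means more uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def entropy(probs):
    """Shannon entropy of the predicted distribution (larger means more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_query(pool, strategy):
    """Return the key of the candidate to show to the user next.

    pool: dict mapping a candidate id to its predicted class probabilities
          (hypothetical structure, assumed for this sketch).
    strategy: "margin", "entropy", or "least_confident".
    """
    if strategy == "margin":
        # Margin sampling queries the candidate with the smallest margin.
        return min(pool, key=lambda k: margin(pool[k]))
    scorers = {"entropy": entropy, "least_confident": least_confident}
    # The other two strategies query the candidate with the largest score.
    return max(pool, key=lambda k: scorers[strategy](pool[k]))
```

For a two-class decision (a relation holds or does not hold) all three measures rank candidates in the same order, because each depends only on the larger of the two probabilities. They can disagree when there are more than two classes, for example when several relation types are distinguished.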
Key words: ontology; concept relation; assistant judgment; active learning
Funding
National Natural Science Foundation of China (61073123); Innovation Team Project of the Education Department of Liaoning Province (LT2010084)