Research on Entities Similarity Calculation in Knowledge Graph

LI Yang; GAO Daqi

Journal of Chinese Information Processing, 2017, Vol. 31, Issue 1: 140-146.
Information Extraction and Text Mining


Abstract

Entity similarity has many applications, such as recommending similar products on e-commerce platforms and grouping similar patients when analyzing treatment outcomes in healthcare. In the entity similarity task for a given knowledge graph, the attribute values of every entity are provided and a subset of entity pairs is labeled with similarity scores; the goal is to predict the similarity of the remaining entity pairs. This paper treats the task as a supervised learning problem and proposes a general method for computing entity similarity: noisy data is cleaned; numeric, list, and text attributes are each preprocessed according to their data type; and the pairs are then modeled with classification models such as SVM and Logistic Regression, ensemble models such as Random Forest, and learning-to-rank models. The method achieves good results.
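
The pipeline summarized in the abstract (type-specific preprocessing of attributes, then a supervised model over labeled entity pairs) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the attribute names (price, tags, desc), the three similarity functions, the toy entities and labels, and the small pure-Python logistic regression that stands in for the SVM, Random Forest, and ranking models the authors tested.

```python
import math

def numeric_sim(a, b):
    """Similarity in [0, 1] for two numbers; 1.0 when they are equal."""
    denom = abs(a) + abs(b)
    return 1.0 if denom == 0 else 1.0 - abs(a - b) / denom

def jaccard(xs, ys):
    """Jaccard overlap of two lists treated as sets."""
    sx, sy = set(xs), set(ys)
    return 1.0 if not (sx or sy) else len(sx & sy) / len(sx | sy)

def text_sim(s, t):
    """Token-level Jaccard overlap of two lowercased strings."""
    return jaccard(s.lower().split(), t.lower().split())

def pair_features(e1, e2):
    """One feature per attribute, chosen by its data type (hypothetical schema)."""
    return [
        numeric_sim(e1["price"], e2["price"]),  # numeric attribute
        jaccard(e1["tags"], e2["tags"]),        # list attribute
        text_sim(e1["desc"], e2["desc"]),       # text attribute
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict_similarity(model, e1, e2):
    """Predicted probability that the two entities are similar."""
    w, b = model
    x = pair_features(e1, e2)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Toy entities and labeled pairs (invented for this sketch).
entities = {
    "a": {"price": 10.0, "tags": ["phone", "android"], "desc": "budget android phone"},
    "b": {"price": 12.0, "tags": ["phone", "android"], "desc": "cheap android phone"},
    "c": {"price": 900.0, "tags": ["laptop"], "desc": "high end gaming laptop"},
    "d": {"price": 950.0, "tags": ["laptop", "gaming"], "desc": "gaming laptop with fast gpu"},
    "e": {"price": 11.0, "tags": ["phone", "android"], "desc": "budget android phone deal"},
}
labeled = [("a", "b", 1), ("c", "d", 1), ("a", "c", 0), ("b", "d", 0)]

X = [pair_features(entities[i], entities[j]) for i, j, _ in labeled]
y = [label for _, _, label in labeled]
model = train_logistic(X, y)

# Score two unlabeled pairs: one that looks similar, one that does not.
hi = predict_similarity(model, entities["a"], entities["e"])
lo = predict_similarity(model, entities["b"], entities["c"])
print(f"a-e: {hi:.3f}  b-c: {lo:.3f}")
```

Turning each attribute into its own similarity feature keeps the learned model independent of any particular entity schema, which is one plausible reading of the "general" method the abstract describes; a real system would swap in a library classifier and a richer text representation.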

Key words

entity similarity / supervised learning / classification model / ensemble learning

Cite this article

LI Yang; GAO Daqi. Research on Entities Similarity Calculation in Knowledge Graph. Journal of Chinese Information Processing. 2017, 31(1): 140-146


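Funding

Research on the Processing, Analysis, and Application of Clinical Big Data from Chinese and Western Medicine for Cardiovascular and Tumor Diseases (2015AA020107)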