Wenmai: A Probabilistic-Like Association Reliable Chinese Knowledge Graph

LI Wenhao, LIU Wenchang, SUN Maosong, YI Xiaoyuan

Journal of Chinese Information Processing, 2022, Vol. 36, Issue 12: 67-73.
Knowledge Representation and Knowledge Acquisition


Wenmai: A Probabilistic-Like Association Reliable Chinese Knowledge Graph

  • LI Wenhao1,2,3, LIU Wenchang2,3,4, SUN Maosong1,2,3,5, YI Xiaoyuan6

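Abstract (Chinese version)

Existing Chinese knowledge graphs in China are typically extracted from collaboratively built knowledge bases such as Wikipedia and Baidu Baike, but they mainly use the entity infobox information and the category system of these encyclopedias. These encyclopedias also contain a large number of internal links, which carry a great deal of knowledge. This article therefore uses Wikipedia's internal links to construct edges, counts how often the target entity appears in the definition text of the source entity, and uses the corresponding TF-IDF value as the edge weight, yielding a probabilistic Chinese knowledge graph. It also proposes a reliable link screening algorithm that removes incidental links, making the knowledge graph more reliable. Based on these methods, the article mines a probabilistic-like association reliable Chinese knowledge graph, named "Wenmai", and releases it as open source on GitHub, in the hope of supporting knowledge-guided natural language processing and other downstream tasks.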

Abstract

Existing Chinese knowledge graphs are derived from Wikipedia and Baidu Baike by leveraging the entity infobox and the category system. In contrast, this article proposes a Chinese knowledge graph with probabilistic links by treating the hyperlinks in these resources as entity relations, weighted by the TF-IDF value of the mention frequency of the target entity in the entry article of the source entity. A reliable link screening algorithm is further designed to remove occasional links and make the knowledge graph more reliable. Based on these methods, this article constructs a probabilistic-like association reliable Chinese knowledge graph named "Wenmai", which is publicly available on GitHub as a support for knowledge-guided natural language processing.
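The edge-weighting idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact TF-IDF formula or the details of the screening algorithm, so everything below is an assumption: the function names `build_weighted_edges` and `drop_low_weight_edges`, the toy `articles` data, the common `tf * log(N / df)` variant of TF-IDF, and the plain weight threshold, which is only a stand-in for the article's reliable link screening algorithm and not a reproduction of it.

```python
import math
from collections import Counter


def build_weighted_edges(articles):
    """Build TF-IDF weighted edges from link mentions.

    `articles` maps a source entity to the list of target entities
    mentioned in its article text (one list item per mention).
    Returns a dict mapping (source, target) to an edge weight.
    """
    n_docs = len(articles)

    # Document frequency: how many source articles mention each target.
    doc_freq = Counter()
    for mentions in articles.values():
        doc_freq.update(set(mentions))

    edges = {}
    for source, mentions in articles.items():
        counts = Counter(mentions)
        total = sum(counts.values())
        for target, count in counts.items():
            tf = count / total                         # share of mentions in this article
            idf = math.log(n_docs / doc_freq[target])  # rarer targets get a larger factor
            edges[(source, target)] = tf * idf
    return edges


def drop_low_weight_edges(edges, min_weight):
    """Hypothetical stand-in for link screening: keep edges at or above a threshold."""
    return {edge: w for edge, w in edges.items() if w >= min_weight}


# Toy data (hypothetical), not taken from the Wenmai graph.
articles = {
    "Beijing": ["China", "China", "Forbidden City"],
    "Shanghai": ["China", "Huangpu River"],
    "Li Bai": ["Tang dynasty", "Du Fu", "Du Fu"],
}

edges = build_weighted_edges(articles)
reliable = drop_low_weight_edges(edges, min_weight=0.25)

for (source, target), weight in sorted(reliable.items()):
    print(f"{source} -> {target}: {weight:.3f}")
```

In this sketch a target that is linked from many articles, such as "China" here, receives a smaller IDF factor, so its edges are the first to fall below the threshold. A real pipeline would extract the mention lists from the Wikipedia dump rather than from a hand-written dictionary.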

Key words

Wikipedia / knowledge graph construction / reliable link screening

Cite this article

LI Wenhao, LIU Wenchang, SUN Maosong, YI Xiaoyuan. Wenmai: A Probabilistic-Like Association Reliable Chinese Knowledge Graph. Journal of Chinese Information Processing. 2022, 36(12): 67-73

Funding

National Social Science Fund of China (18ZDA238)