Abstract
Existing Chinese knowledge graphs are typically extracted from crowd-sourced knowledge bases such as Wikipedia and Baidu Baike, but they draw mainly on the infobox and category-system information of these encyclopedias. The encyclopedias also contain a large number of internal links, which carry a great deal of knowledge. This article therefore builds edges from the internal links of Wikipedia, counts how often the target entity is mentioned in the definition text of the source entity, and uses the corresponding TF-IDF value as the edge weight, yielding a probabilistic Chinese knowledge graph. A reliable link screening algorithm is further designed to remove occasional links, making the knowledge graph more reliable. Based on these methods, the article constructs a probabilistic, reliable Chinese knowledge graph of entity associations named "Wenmai" (文脉), which is publicly available on GitHub in the hope of supporting knowledge-guided natural language processing and other downstream tasks.
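The edge-weighting idea can be illustrated with a minimal sketch. The abstract does not give the exact TF-IDF formula, so the code below assumes the textbook variant (term frequency normalized by the total mentions in the source entry, inverse document frequency as the log of the number of source entries over the number of entries mentioning the target). The function name `tfidf_edge_weights` and the input layout are hypothetical and are not taken from the Wenmai release; the reliable link screening step is not shown because the abstract does not describe it.

```python
import math
from collections import defaultdict


def tfidf_edge_weights(mention_counts):
    """Compute TF-IDF weights for links between entities.

    mention_counts maps each source entity to a dict of
    {target entity: number of mentions in the source's definition text}.
    Returns a dict {(source, target): weight}.
    Assumption: textbook TF-IDF, not necessarily the paper's exact formula.
    """
    num_sources = len(mention_counts)

    # Document frequency: how many source entries mention each target.
    doc_freq = defaultdict(int)
    for targets in mention_counts.values():
        for target, count in targets.items():
            if count > 0:
                doc_freq[target] += 1

    weights = {}
    for source, targets in mention_counts.items():
        total = sum(targets.values())
        if total == 0:
            continue  # no mentions, so no outgoing edges
        for target, count in targets.items():
            if count == 0:
                continue
            tf = count / total
            idf = math.log(num_sources / doc_freq[target])
            weights[(source, target)] = tf * idf
    return weights


# Toy input with made-up counts, for illustration only.
example_counts = {
    "Beijing": {"China": 3, "Forbidden City": 1},
    "Shanghai": {"China": 2},
}
example_weights = tfidf_edge_weights(example_counts)
```

Under this variant, a target mentioned by every source entry receives a weight of zero, which is one reason a real system might prefer a smoothed IDF.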
Key words
Wikipedia /
knowledge graph construction /
reliable link screening
Funding
National Social Science Fund of China (18ZDA238)