OpenConcepts: An Open Fine-Grained Chinese Concept Knowledge Graph

YE Hongbin, ZHANG Ningyu, CHEN Huajun, DENG Shumin, BI Zhen, CHEN Xiang

Journal of Chinese Information Processing, 2023, Vol. 37, Issue 1: 46-53.
Knowledge Representation and Knowledge Acquisition


OpenConcepts: A Public Available Fine-Grained Chinese Concept Knowledge Graph

  • YE Hongbin, ZHANG Ningyu, CHEN Huajun, DENG Shumin, BI Zhen, CHEN Xiang

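Abstract (translated from the Chinese)

A knowledge graph describes the entities of the world and the relations among them in symbolic form; it is a large-scale semantic network with strong knowledge-processing capabilities. A concept knowledge graph is a special kind of knowledge graph with broad application value in scenarios such as semantic search and automatic question answering. Previous concept graphs struggle to cover long-tail entities, and their concepts are coarse-grained and difficult to update. To address these problems, this paper proposes a new automated method for concept graph construction that builds a fine-grained Chinese concept hierarchy from massive text and semi-structured data. It also releases OpenConcepts, an open fine-grained Chinese concept knowledge graph containing 4.4 million core concept instances, more than 50,000 fine-grained concepts, and 13 million concept-instance triples, together with corresponding access interfaces.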

Abstract

A knowledge graph is a large-scale semantic network that uses graph models to describe knowledge. A concept knowledge graph is a special kind of knowledge graph with a wide range of applications in semantic search, question answering, and other scenarios. In this paper, we propose a concept graph construction approach that automatically builds a fine-grained Chinese concept hierarchy from massive texts. We also release an open, fine-grained Chinese concept graph called OpenConcepts, which includes 4.4 million concept instances, more than 50,000 fine-grained concepts, and 13 million concept-instance triples, with APIs to access the data.
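To make the notion of concept-instance triples and a concept hierarchy concrete, here is a minimal Python sketch. It is not the OpenConcepts API or the authors' construction method: the relation names `isA` and `subClassOf`, the helper functions, and the toy entries are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy triples for illustration; not taken from OpenConcepts.
INSTANCE_OF = [
    ("Hangzhou", "isA", "provincial capital"),
    ("Hangzhou", "isA", "tourist city"),
]
SUBCLASS_OF = [
    ("provincial capital", "subClassOf", "city"),
    ("tourist city", "subClassOf", "city"),
    ("city", "subClassOf", "place"),
]


def build_index(triples):
    """Map each head to the set of tails it is linked to."""
    index = defaultdict(set)
    for head, _relation, tail in triples:
        index[head].add(tail)
    return index


def concepts_of(instance, instance_index, parent_index):
    """Return the direct concepts of an instance plus all ancestor concepts."""
    seen = set()
    stack = list(instance_index.get(instance, ()))
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue  # already visited; also guards against cycles
        seen.add(concept)
        stack.extend(parent_index.get(concept, ()))
    return seen


instance_index = build_index(INSTANCE_OF)
parent_index = build_index(SUBCLASS_OF)
print(sorted(concepts_of("Hangzhou", instance_index, parent_index)))
# → ['city', 'place', 'provincial capital', 'tourist city']
```

In this sketch, fine-grained concepts sit at the bottom of the hierarchy and coarser ones are reached by following `subClassOf` links upward, so a single instance lookup returns concepts at every level of granularity.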


Key words

knowledge graph / triple extraction / relation classification

Cite this article

YE Hongbin, ZHANG Ningyu, CHEN Huajun, DENG Shumin, BI Zhen, CHEN Xiang. OpenConcepts: A Public Available Fine-Grained Chinese Concept Knowledge Graph. Journal of Chinese Information Processing. 2023, 37(1): 46-53


Funding

National Natural Science Foundation of China (91846204, U19B2027)