Abstract
Coal mining enterprises are moving from informatization toward intelligent operation, and new network technologies such as big data and artificial intelligence have already advanced intelligent development in the mining domain. However, because data in the coal mining domain is complex and heterogeneous, it is difficult to collect and mine it in a unified and efficient way, which in turn hinders deeper domain-specific research and applications. In this paper, knowledge graph technology is introduced into the coal mine safety domain as a first step: the relevant knowledge concepts are classified and modeled, stored in a graph database, and presented as an entity-relation graph that shows the concepts and the relations between them. Then, on top of this preliminary knowledge graph, a natural-language knowledge query method is proposed, in which a question classification step identifies the most suitable query type for a given question. Experimental results show that the proposed entity extraction method achieves high recall and precision, and that the Spark-based parallel naive Bayes question classification method significantly improves training efficiency while maintaining accuracy. This work is a preliminary exploration of knowledge graph construction and intelligent querying for coal mine safety.
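The question classification step described in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's Spark-based parallel implementation: it is a plain multinomial naive Bayes classifier with add-one smoothing, and the training questions, the query-type labels (`cause`, `measure`), and the Cypher templates with their node labels and relation names are all hypothetical, chosen only for illustration. Questions are assumed to be tokenized already; real Chinese questions would first need a word segmenter.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesQuestionClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()             # number of questions per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()

    def fit(self, questions, labels):
        for tokens, label in zip(questions, labels):
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n / total)  # log prior
            for t in tokens:
                if t in self.vocab:      # out-of-vocabulary tokens are ignored
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training questions and query types (not from the paper).
TRAIN = [
    ("what causes gas explosion", "cause"),
    ("why does roof collapse happen", "cause"),
    ("how to prevent gas explosion", "measure"),
    ("what measures prevent water inrush", "measure"),
]

# Hypothetical Cypher templates; node labels and relation names are assumptions.
QUERY_TEMPLATES = {
    "cause": "MATCH (a:Accident {name: $entity})-[:CAUSED_BY]->(c) RETURN c.name",
    "measure": "MATCH (a:Accident {name: $entity})-[:PREVENTED_BY]->(m) RETURN m.name",
}

clf = NaiveBayesQuestionClassifier().fit(
    [q.split() for q, _ in TRAIN], [label for _, label in TRAIN]
)

query_type = clf.predict("what causes roof collapse".split())
print(query_type, "->", QUERY_TEMPLATES[query_type])
```

Once a question is assigned a query type, the matching template would be filled with the entity recognized in the question and sent to the graph database. The counting in `fit` is a sum over independent records, which is what makes naive Bayes training straightforward to parallelize on a framework such as Spark.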
Key words
coalmine safety /
knowledge graph /
entity recognition /
knowledge query /
Spark /
naive Bayes
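Funding
National Key R&D Program of China, 13th Five-Year Plan (2017YFCO804401); National Natural Science Foundation of China (61801198); Natural Science Foundation of Jiangsu Province (BK20180174); Zibo Mining Group Smart Mine Open Fund (2019lh10)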