Real-time Name Disambiguation with Subgraph Enhancement

HAN Tianyi, CHENG Xinyu, ZHANG Fanjin, CHEN Bo

Journal of Chinese Information Processing, 2024, Vol. 38, Issue (1): 45-56.
Language Analysis and Computational Models


Abstract

Real-time name disambiguation aims to associate newly added papers whose author names are ambiguous with the correct author among same-name candidates, accurately and in real time. Existing name disambiguation algorithms mainly address the cold-start setting, and few explore how to solve the real-time setting both efficiently and effectively. This paper proposes RND-all, a subgraph-enhanced real-time name disambiguation model that improves accuracy by efficiently fusing the structural features between the paper to be disambiguated and the candidate authors. The model constructs subgraphs from the attributes of the paper to be disambiguated and from the profiles of the same-name candidate authors, respectively, and then uses a subgraph structural feature extraction framework to compute graph-correlation features. Finally, it computes semantic matching features through feature engineering and text embedding, and uses ensemble learning to fuse the semantic and structural information. Experimental results show that incorporating structural information effectively improves the accuracy of real-time name disambiguation, and RND-all ranks first on the test set of WhoIsWho, a million-scale name disambiguation benchmark.
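The pipeline summarized in the abstract (structural features from subgraphs, semantic matching features from text, and a fusion step) can be illustrated with a toy sketch. This is not the RND-all implementation: the `Paper` and `Candidate` classes, the Jaccard-based scores, and the fixed weight `w_struct` are all hypothetical stand-ins chosen for brevity. In particular, a weighted average replaces ensemble learning, and set overlap replaces both the subgraph feature extraction framework and the text embeddings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Paper:
    title: str
    coauthors: set[str] = field(default_factory=set)
    venue: str = ""


@dataclass
class Candidate:
    author_id: str
    papers: list[Paper]


def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def neighbors(papers: list[Paper]) -> set[tuple[str, str]]:
    """Typed neighbor nodes (coauthors, venues) of a set of papers.

    Assumption: a crude proxy for the subgraph built around a paper or profile.
    """
    nodes: set[tuple[str, str]] = set()
    for p in papers:
        nodes |= {("author", a) for a in p.coauthors}
        if p.venue:
            nodes.add(("venue", p.venue))
    return nodes


def structural_score(paper: Paper, cand: Candidate) -> float:
    # Stand-in for a graph-correlation feature: overlap of neighbor sets.
    return jaccard(neighbors([paper]), neighbors(cand.papers))


def semantic_score(paper: Paper, cand: Candidate) -> float:
    # Stand-in for embedding similarity: title token overlap with the profile.
    profile = set().union(*(tokens(p.title) for p in cand.papers))
    return jaccard(tokens(paper.title), profile)


def rank_candidates(paper: Paper, candidates: list[Candidate],
                    w_struct: float = 0.5) -> list[tuple[float, str]]:
    """Fuse both scores with a fixed weight (assumption) and sort descending."""
    scored = [
        (w_struct * structural_score(paper, c)
         + (1 - w_struct) * semantic_score(paper, c), c.author_id)
        for c in candidates
    ]
    return sorted(scored, reverse=True)


# Hypothetical data: one new paper and two same-name candidate profiles.
new_paper = Paper("graph neural networks for name disambiguation",
                  {"li wei", "chen bo"}, "KDD")
cand_a = Candidate("a1", [Paper("name disambiguation in digital libraries",
                                {"chen bo"}, "KDD")])
cand_b = Candidate("a2", [Paper("protein folding simulation",
                                {"wang fang"}, "Nature")])

ranking = rank_candidates(new_paper, [cand_a, cand_b])
print(ranking[0][1])  # candidate with the highest fused score
```

In this toy data, candidate `a1` shares a coauthor, a venue, and title terms with the new paper, so it is ranked above `a2`. In a real system, the fixed weight would presumably be replaced by a model trained on labeled paper-author pairs.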

Key words

real-time name disambiguation / graph neural network / structural information / ensemble learning

Cite this article

HAN Tianyi, CHENG Xinyu, ZHANG Fanjin, CHEN Bo. Real-time Name Disambiguation with Subgraph Enhancement. Journal of Chinese Information Processing. 2024, 38(1): 45-56
