Poet Code: Authorship Attribution for Poetry in Tang Dynasty

ZHOU Ai, SANG Chen, ZHANG Yijia, LU Mingyu

Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (6): 162-170.
Natural Language Processing Applications


Abstract

Authorship attribution is the analysis of an individual's writing style. Although the task has been studied extensively across many languages, for Chinese it has not yet reached classical poetry. Tang poetry is at once disjunctive in its imagery and holistic in its meaning; to accommodate both characteristics, this paper proposes a dual-channel Cap-Transformer ensemble model. The capsule model in the upper channel extracts features while reducing information loss, better capturing the semantics of each image in a Tang poem. The Transformer model in the lower channel uses multi-head self-attention to learn the deep semantic information reflected jointly by all of the poem's imagery. Experimental results show that the model is well suited to authorship attribution for Tang poetry. An error analysis, grounded in the particularities of Tang poetry as text, further discusses the open problems in this task along with future research directions and challenges.
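The paper's exact layer configuration is not given in the abstract; as a minimal NumPy sketch (all shapes and the identity Q/K/V projections are assumptions for illustration), the two building blocks it names can be summarized as the capsule "squash" non-linearity of the upper channel and the multi-head self-attention of the lower channel, with a simple pooled fusion of the two:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule squashing non-linearity (Sabour et al., 2017):
    preserves a vector's orientation, scales its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def multi_head_self_attention(X, num_heads=2):
    """Scaled dot-product self-attention per head (identity projections,
    for illustration only), heads concatenated as in the Transformer."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    outputs = []
    for h in range(num_heads):
        Xh = X[:, h * d_head:(h + 1) * d_head]
        scores = Xh @ Xh.T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
        outputs.append(weights @ Xh)
    return np.concatenate(outputs, axis=-1)

# Toy "poem": 4 tokens with 8-dim embeddings (random stand-ins).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))

upper = squash(X)                     # capsule-style channel
lower = multi_head_self_attention(X)  # attention channel
fused = np.concatenate([upper.mean(axis=0), lower.mean(axis=0)])
print(fused.shape)  # (16,)
```

In the actual model the fused representation would feed a classifier over candidate poets; the mean-pooling fusion above is a placeholder for whatever ensemble strategy the paper uses.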

Keywords

authorship attribution / classical poetry / capsule network / Transformer

Cite this article

ZHOU Ai, SANG Chen, ZHANG Yijia, LU Mingyu. Poet Code: Authorship Attribution for Poetry in Tang Dynasty. Journal of Chinese Information Processing, 2022, 36(6): 162-170.
