Abstract
Quote attribution in novels identifies which character in the novel speaks each quote. It is a foundation for automatically synthesizing audio novels, where each quote must be assigned an appropriate voice. To represent both the differences between quote types and the semantic features of the surrounding context, this paper proposes Rule-BertAtten, a quote attribution method for Chinese novels. Quotes are first divided into four categories: quotes whose subject is an explicit character name, quotes whose subject is a personal pronoun whose gender matches exactly one candidate, quotes whose subject is a personal pronoun whose gender matches several candidates, and quotes with none of these features (an implicit speaker). Depending on the category, the method applies either rule-based judgment or BERT word-embedding semantic representations with an attention mechanism. Experiments show that the method achieves higher accuracy than previous approaches.
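The four-way split of quotes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `QuoteCategory`, `PRONOUN_GENDER`, and `classify_quote` are hypothetical, and the sketch assumes that the grammatical subject of the speech clause has already been extracted and that each candidate speaker has a known gender. The abstract does not state which categories are handled by rules and which by the BERT model with attention, so the sketch stops at classification.

```python
from enum import Enum
from typing import Dict, Optional


class QuoteCategory(Enum):
    """The four quote categories named in the abstract (labels are hypothetical)."""
    EXPLICIT_NAME = 1        # subject is an explicit character name
    PRONOUN_ONE_MATCH = 2    # pronoun whose gender matches exactly one candidate
    PRONOUN_MULTI_MATCH = 3  # pronoun whose gender matches several candidates
    IMPLICIT = 4             # no usable subject feature


# Assumption: only the two gendered third-person singular pronouns are handled.
PRONOUN_GENDER = {"他": "male", "她": "female"}


def classify_quote(subject: Optional[str], candidates: Dict[str, str]) -> QuoteCategory:
    """Classify a quote from its extracted subject and the candidate speakers.

    `candidates` maps each candidate character name to "male" or "female".
    """
    if subject is None:
        return QuoteCategory.IMPLICIT
    if subject in candidates:
        return QuoteCategory.EXPLICIT_NAME
    gender = PRONOUN_GENDER.get(subject)
    if gender is None:
        # Subject is neither a known name nor a gendered pronoun.
        return QuoteCategory.IMPLICIT
    matches = [name for name, g in candidates.items() if g == gender]
    if len(matches) == 1:
        return QuoteCategory.PRONOUN_ONE_MATCH
    if len(matches) > 1:
        return QuoteCategory.PRONOUN_MULTI_MATCH
    # Assumption: a pronoun that matches no candidate is treated as implicit.
    return QuoteCategory.IMPLICIT
```

In this sketch, a category with a single unambiguous candidate could be resolved directly, while ambiguous categories would need the surrounding context; how the paper actually divides that work is not specified here.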
Key words
novel quote /
BERT /
quote attribution /
rule-based
Funding
National Natural Science Foundation of China (61771068, 61671079)