Double-Weighted Disambiguation Algorithm for the Long-Tail Polyphone Problem

GAO Yu, XIONG Yijin, YE Jiancheng

Journal of Chinese Information Processing, 2022, 36(11): 169-176.
Speech Information Processing




Abstract

The problem of long-tail distributed data is common in NLP practice. Taking the polyphone disambiguation task in the text-to-speech (TTS) front end as an example, the extreme imbalance of polyphone data and the scarcity of tail data degrade industrial TTS systems. Observing that Chinese polyphones are long-tail distributed along both the "character" and the "pronunciation" dimension, this paper proposes a double-weighted (DW) algorithm. DW can be combined with each of two long-tail algorithms, MARC and Decouple-cRT, to further improve model performance. On both open-source data and industrial data, DW achieves accuracy gains over the baseline model and the two original algorithms, offering a solution and a point of reference for multi-dimensional long-tail problems.
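The paper's exact weighting formula is not given in this excerpt, but the core idea of the abstract, combining a weight for each of the two long-tail dimensions (character and pronunciation) into one per-sample weight, can be sketched as follows. The inverse-log-frequency weighting used here is a placeholder assumption, not the paper's scheme:

```python
import math
from collections import Counter

def double_weights(samples):
    """Hypothetical sketch of 'double weighting': each sample's weight
    combines the rarity of its polyphonic character with the rarity of
    its pronunciation, so samples that are tail cases on either dimension
    are up-weighted in the training loss.

    samples: list of (character, pronunciation) pairs.
    """
    char_freq = Counter(ch for ch, _ in samples)
    pron_freq = Counter(pr for _, pr in samples)

    def inv_log(count):
        # Rare classes (small count) receive larger weights.
        return 1.0 / math.log(1.0 + count)

    return [inv_log(char_freq[ch]) * inv_log(pron_freq[pr])
            for ch, pr in samples]

# Toy example: "行" read as hang2 is rarer here than as xing2,
# so its sample gets a larger weight.
samples = [("行", "xing2"), ("行", "xing2"), ("行", "hang2"),
           ("得", "de5"), ("得", "dei3")]
weights = double_weights(samples)
```

In a real training loop these weights would scale the per-sample cross-entropy terms, which is the standard way re-weighting schemes address class imbalance.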


Key words

polyphone disambiguation / long-tail distribution / re-weighting / decoupling representation and classifier

Cite This Article

GAO Yu, XIONG Yijin, YE Jiancheng. Double-Weighted Disambiguation Algorithm for Long-tail Polyphone Problem. Journal of Chinese Information Processing. 2022, 36(11): 169-176
