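Chinese Parallelism Generation Based on Part-of-Speech Alignment and Dependency Relation

ZHONG Maosheng, LIU Lei, WU Ruping, GAN Jiaqi, ZHOU Xinyu

Journal of Chinese Information Processing, 2025, Vol. 39, Issue 2: 131-142.
Section: Natural Language Understanding and Generation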

Abstract

Parallelism is a widely used rhetorical device that strengthens momentum, adds emphasis, and makes the layers of an argument clear. Generating parallelism matters for text generation: it can enrich the style and form of generated text and improve the quality of educational, advertising, and literary writing. However, there is currently no dedicated generation model and no publicly available parallelism corpus. To address this, the paper collects and constructs a parallelism dataset and, drawing on the linguistic characteristics of parallelism, proposes a Chinese parallelism generation model based on part-of-speech alignment and dependency relations, named CPG-PosDep. Following a linguistic approach, the model first annotates each parallelism globally with three kinds of predefined special symbols (word segmentation markers, intra-sentence word order markers, and inter-sentence position markers) together with a random sampling strategy, and learns the part-of-speech alignment features of parallelism through a modified Transformer attention mechanism. It then uses BERT and an attention module to incorporate the dependency relation information of the given clause into the model, and fuses the two to generate the parallelism. Experiments on the parallelism dataset show that the model generates fluent clauses whose parts of speech match those of the given clause position by position and whose dependency relations are the same. Compared with parallelism produced by existing couplet or poetry generation models, the output of the proposed model is of higher quality.
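The position-wise part-of-speech consistency mentioned in the abstract can be made concrete with a small check. The sketch below is only an illustration and not the paper's evaluation code: the function name `pos_alignment_rate`, the example clauses, and the hand-assigned tags (`n`, `v`, `m`, `q`, `d`, `a`) are all assumptions introduced here, and a real pipeline would obtain the tags from a Chinese word segmenter and tagger.

```python
from typing import List, Tuple

# A token is a (word, part-of-speech tag) pair. The tags are assigned by hand
# for illustration; they are not the output of any real tagger.
Token = Tuple[str, str]


def pos_alignment_rate(given: List[Token], generated: List[Token]) -> float:
    """Return the fraction of positions whose POS tags match.

    Positions that exist in only one clause count as mismatches, so a
    generated clause of a different length is penalized.
    """
    longest = max(len(given), len(generated))
    if longest == 0:
        return 1.0  # two empty clauses are trivially aligned
    matches = sum(1 for (_, a), (_, b) in zip(given, generated) if a == b)
    return matches / longest


# Hypothetical example clauses, already segmented and tagged by hand.
given = [("爱心", "n"), ("是", "v"), ("一", "m"), ("片", "q"), ("阳光", "n")]
aligned = [("爱心", "n"), ("是", "v"), ("一", "m"), ("泓", "q"), ("清泉", "n")]
misaligned = [("爱心", "n"), ("很", "d"), ("温暖", "a")]

print(pos_alignment_rate(given, aligned))     # every position matches
print(pos_alignment_rate(given, misaligned))  # only the first position matches
```

A score of 1.0 means every word in the generated clause has the same part of speech as the word at the same position in the given clause. This check covers only the part-of-speech side; comparing dependency relations would additionally need a dependency parser, which is not shown here.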

Key words

Chinese parallelism generation / part-of-speech alignment / random sampling strategy / dependency relation

Cite this article

ZHONG Maosheng, LIU Lei, WU Ruping, GAN Jiaqi, ZHOU Xinyu. Chinese Parallelism Generation Based on Part-of-Speech Alignment and Dependency Relation. Journal of Chinese Information Processing. 2025, 39(2): 131-142

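ZHONG Maosheng (b. 1974), Ph.D., professor. Main research interests: natural language processing, intelligent education and software, machine learning and data mining.
E-mail: zhongmaosheng@sina.com
LIU Lei (b. 2000), master's student. Main research interests: natural language processing, examination software and systems.
E-mail: 1419816401@qq.com
WU Ruping (b. 1998), corresponding author, master's student. Main research interests: natural language processing, intelligent education and software.
E-mail: 1455791641@qq.com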

Funding

National Natural Science Foundation of China (32460214, 62366022)