Abstract
Code summarization aims to automatically generate natural language descriptions of source code, which facilitates software maintenance and program comprehension. Mainstream Transformer-based methods consider only the textual and structural semantic features of source code and ignore closely related external semantic features such as API documentation. Moreover, on large-scale data, the self-attention module of the Transformer computes similarity scores between all token pairs, which leads to high computational cost and memory consumption. To address these problems, this paper proposes a code summarization method that fuses multiple semantic features on top of an improved Transformer architecture. The method uses three independent encoders to fully learn the textual, structural, and external API-documentation features of source code, and replaces the self-attention layers in the encoders with a non-parametric Fourier transform, reducing the computation time and memory usage of the Transformer through a linear transformation. Experimental results on public datasets demonstrate the effectiveness of the method.
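To make the encoder modification concrete, the sketch below shows an FNet-style Fourier token-mixing block used in place of the self-attention sub-layer of a Transformer encoder. It is an illustrative reconstruction rather than the authors' released implementation; the layer sizes, the three-layer stack, and the per-modality encoder arrangement are assumptions.

```python
# Minimal sketch: replacing self-attention with parameter-free Fourier mixing
# (FNet-style). Illustrative only; hyperparameters are assumptions.
import torch
import torch.nn as nn


class FourierMixing(nn.Module):
    """Parameter-free token mixing: FFT over the hidden and sequence
    dimensions, keeping only the real part. Substitutes for the
    self-attention sub-layer, avoiding the O(n^2) similarity matrix."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real


class FourierEncoderLayer(nn.Module):
    """One encoder block: Fourier mixing + feed-forward network, each with a
    residual connection and layer normalization."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.mixing = FourierMixing()
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm1(x + self.dropout(self.mixing(x)))
        return self.norm2(x + self.dropout(self.ffn(x)))


if __name__ == "__main__":
    # Toy usage: in the described architecture, one such encoder stack would be
    # instantiated per input modality (code tokens, structure, API documentation).
    batch, seq_len, d_model = 2, 128, 512
    tokens = torch.randn(batch, seq_len, d_model)
    encoder = nn.Sequential(*[FourierEncoderLayer(d_model) for _ in range(3)])
    print(encoder(tokens).shape)  # torch.Size([2, 128, 512])
```

Because the Fourier mixing layer has no learnable parameters and runs in O(n log n) time, it trades some modeling flexibility for the reduced computation time and memory footprint that the abstract attributes to the improved encoder.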
Key words
code summarization /
Transformer /
API documentation /
Fourier transform
Funding
National Natural Science Foundation of China (62376062); Humanities and Social Sciences Research Project of the Ministry of Education (23YJAZH220); Guangdong Provincial Philosophy and Social Sciences "14th Five-Year Plan" Project (GD23CTS03); Natural Science Foundation of Guangdong Province (2023A1515012718); Natural Science Foundation of Hunan Province (2022JJ30020, 2021JJ30274)