Code Search Method Based on Program Dependency Graph Representation Learning

  • Abstract: Code search is an important task at the intersection of software engineering and natural language processing: given a developer's natural-language query, it retrieves semantically similar program fragments from a large codebase. Existing code search methods mainly represent code fragments with abstract syntax trees, which lack control-flow and data-flow information and therefore cannot fully express the structural semantics of a fragment. To address this shortcoming, this paper proposes a code search method based on program dependency graph representation learning. The method first converts a code fragment into a program dependency graph whose edges are labeled as control dependencies or data dependencies, and initializes the graph nodes with a Transformer encoder. A graph neural network module combined with self-attention hierarchical pooling then extracts the structural features of the fragment, which addresses the incomplete structural semantics of representations based on abstract syntax trees. The method is evaluated on the Java and Python subsets of the public CodeSearchNet dataset, and the experimental results show that it outperforms the advanced code search method GraphSearchNet.
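To make the first step of the method concrete, the following is a minimal sketch of how a code fragment might be turned into a graph with labeled control-dependency and data-dependency edges. It is not the paper's implementation: the function name `build_toy_pdg`, the sample fragment, and the edge format are all assumptions made for illustration, and the sketch handles only straight-line code and `if`/`else` branches (loops, calls, and aliasing are ignored). It uses only Python's standard `ast` module and requires Python 3.9 or later for `ast.unparse`.

```python
import ast


def build_toy_pdg(source):
    """Build a toy program dependency graph (hypothetical helper).

    Returns (nodes, edges): nodes is a list of source strings, one per
    statement or branch condition; edges is a set of (src, dst, label)
    tuples with label "control" or "data".
    """
    nodes = []
    edges = set()

    def names(node, ctx):
        # Variable names read (ast.Load) or written (ast.Store) under node.
        return {n.id for n in ast.walk(node)
                if isinstance(n, ast.Name) and isinstance(n.ctx, ctx)}

    def visit(stmts, parent, defs):
        # defs maps a variable name to the node indices that may have
        # defined its current value.
        for stmt in stmts:
            idx = len(nodes)
            is_branch = isinstance(stmt, ast.If)
            head = stmt.test if is_branch else stmt
            nodes.append(ast.unparse(head))
            used = names(head, ast.Load)
            written = names(head, ast.Store)
            if isinstance(stmt, ast.AugAssign):
                used |= written  # "x += 1" also reads x
            if parent is not None:
                edges.add((parent, idx, "control"))
            for var in used:
                for src in defs.get(var, ()):
                    edges.add((src, idx, "data"))
            for var in written:
                defs[var] = {idx}
            if is_branch:
                # Analyze both branches separately, then merge definitions.
                then_defs = visit(stmt.body, idx, dict(defs))
                else_defs = visit(stmt.orelse, idx, dict(defs))
                defs = {var: then_defs.get(var, set()) | else_defs.get(var, set())
                        for var in then_defs.keys() | else_defs.keys()}
        return defs

    visit(ast.parse(source).body, None, {})
    return nodes, edges


# Hypothetical sample fragment, chosen only to exercise both edge types.
SAMPLE = """
x = read()
y = 0
if x > 0:
    y = x * 2
else:
    y = -x
print(y)
"""

nodes, edges = build_toy_pdg(SAMPLE)
for src, dst, label in sorted(edges):
    print(f"{nodes[src]!r} -[{label}]-> {nodes[dst]!r}")
```

In this sketch the two assignments inside the branch are control dependent on the condition `x > 0`, and `print(y)` is data dependent on both of them but not on `y = 0`, which is overwritten on every path. An abstract syntax tree of the same fragment would not expose these relations directly, which is the gap the abstract describes.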

     
