%0 Journal Article
%A HOU Danyang
%A PANG Liang
%A DING Hanxing
%A LAN Yanyan
%A CHENG Xueqi
%T Automatic Evaluation Method for Aggressiveness of Language Model
%D 2022
%J Journal of Chinese Information Processing
%P 12-20
%V 36
%N 1
%X Language models trained on large-scale corpora have achieved strong performance in text generation. However, these language models may unexpectedly generate aggressive text when perturbed. In this paper, we propose a method that automatically evaluates the aggressiveness of language models. The method has two stages: induction and evaluation. In the induction stage, building on controllable text generation techniques, we update the parameters in the activation layers of the language model along the gradient of a trained text classification model, increasing the probability that the language model generates aggressive text. In the evaluation stage, we use the text classification model to estimate the proportion of aggressive text among the induced generations, and this proportion serves as the measure of the language model's aggressiveness. Experimental results show that this approach can effectively evaluate the aggressiveness of language models under different experimental settings, and we use it to analyze how aggressiveness relates to the scale of model parameters, the training corpus, and the prefix.
%U http://jcip.cipsc.org.cn/EN/abstract/article_3242.shtml