|
|
Automatic Evaluation Method for the Aggressiveness of Language Models |
HOU Danyang1,3, PANG Liang1,3, DING Hanxing1,3, LAN Yanyan2,3, CHENG Xueqi2,3 |
1.Data Intelligence System Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2.CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 3.University of Chinese Academy of Sciences, Beijing 100049, China |
|
|
Abstract Language models trained on large-scale corpora have achieved remarkable performance in text generation. However, such models may generate aggressive text when perturbed. In this paper, we propose a method that automatically evaluates the aggressiveness of a language model in two stages: induction and evaluation. In the induction stage, drawing on controllable text generation techniques, we update the activations of the language model along the gradient of a trained text classification model so as to increase the probability of generating aggressive text. In the evaluation stage, we use the text classification model to estimate the proportion of aggressive text among the induced generations, which serves as the aggressiveness score of the language model. Experimental results show that this approach effectively evaluates the aggressiveness of language models under different experimental settings, and we analyze how aggressiveness relates to the scale of model parameters, the training corpus, and the prefix.
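The induction stage perturbs the model's hidden activations along the gradient of an attribute classifier, in the plug-and-play spirit of controllable generation described above. The sketch below is a minimal illustration, assuming GPT-2 from Hugging Face Transformers as the language model and a hypothetical linear head `aggression_clf` standing in for the trained text classification model; the step size, number of updates, and greedy decoding are illustrative choices, not the paper's exact configuration.

```python
# Illustrative sketch of the two stages. Assumptions (not from the paper):
# GPT-2 as the base LM, and a linear head `aggression_clf` over the last
# hidden state scoring P(aggressive); in practice this head would be trained
# on labeled aggressive / non-aggressive text.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Hypothetical attribute classifier: hidden state -> {neutral, aggressive}.
aggression_clf = torch.nn.Linear(model.config.n_embd, 2)

def induced_step(input_ids, step_size=0.02, n_updates=3):
    """Induction stage, one decoding step: nudge the final hidden state
    along the classifier gradient toward the 'aggressive' class, then
    decode the next token from the perturbed state."""
    hidden = model.transformer(input_ids).last_hidden_state.detach()
    delta = torch.zeros_like(hidden, requires_grad=True)
    for _ in range(n_updates):
        logits = aggression_clf((hidden + delta)[:, -1, :])
        loss = F.cross_entropy(logits, torch.tensor([1]))  # class 1 = aggressive
        loss.backward()
        with torch.no_grad():
            # Normalized gradient descent on the perturbation only.
            delta -= step_size * delta.grad / (delta.grad.norm() + 1e-8)
            delta.grad.zero_()
    next_logits = model.lm_head(hidden[:, -1, :] + delta[:, -1, :].detach())
    return torch.argmax(next_logits, dim=-1, keepdim=True)

@torch.no_grad()
def aggressiveness_score(texts):
    """Evaluation stage: fraction of induced generations that the
    classifier labels aggressive."""
    n_aggressive = 0
    for t in texts:
        ids = tokenizer(t, return_tensors="pt").input_ids
        h = model.transformer(ids).last_hidden_state[:, -1, :]
        n_aggressive += int(aggression_clf(h).argmax(-1).item() == 1)
    return n_aggressive / len(texts)

# Usage: generate an induced continuation from a prefix, then score a batch.
prefix = tokenizer("The weather today", return_tensors="pt").input_ids
for _ in range(20):
    prefix = torch.cat([prefix, induced_step(prefix)], dim=-1)
print(tokenizer.decode(prefix[0]))
```

For simplicity this sketch perturbs only the final hidden state at the last position; plug-and-play approaches of this kind more commonly perturb the cached key-value activations at every layer, which is closer to what the induction stage describes.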
|
Received: 08 October 2020
|
|
|
|
|
|
|
|