Adversarial Text Attack and Defense: A Review

DU Xiaohu, WU Hongming, YI Zibo, LI Shasha, MA Jun, YU Jie

Journal of Chinese Information Processing, 2021, Vol. 35, Issue 8: 1-15.
Review


Abstract

Adversarial example attack and defense has become a popular research topic in recent years. Attackers make small modifications to inputs to generate adversarial examples that cause deep neural networks to mispredict. The generated adversarial examples reveal the vulnerability of neural networks, and these weaknesses can then be repaired to improve the security and robustness of the model. Adversarial attacks target either images or text; most existing methods and results address the image domain, and because text differs from images in essential ways, the attack and defense methods differ considerably. This paper gives a detailed introduction to the current mainstream adversarial text attack and defense methods, describes the datasets and the target neural networks of mainstream attacks, and compares the differences among attack methods. Finally, it summarizes the challenges facing adversarial text research and offers an outlook on future work.
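As a minimal illustration of the kind of attack surveyed here, the Python sketch below flips the prediction of a deliberately brittle stand-in classifier with a single-character edit. Both the classifier and the attack are hypothetical toy constructions for illustration only, not any specific method discussed in this survey.

```python
def toy_sentiment_classifier(text):
    """A deliberately brittle keyword matcher standing in for a deep model."""
    positive = {"good", "great", "excellent"}
    tokens = text.lower().split()
    return "positive" if any(t in positive for t in tokens) else "negative"

def character_deletion_attack(text, classifier):
    """Greedy character-level attack: try deleting one character at a time
    and return the first perturbed text whose predicted label flips."""
    original = classifier(text)
    for i in range(len(text)):
        perturbed = text[:i] + text[i + 1:]
        if classifier(perturbed) != original:
            return perturbed  # a small edit changed the prediction
    return None  # no single-character deletion fooled the classifier

# A one-character change is enough to flip this fragile classifier.
adv = character_deletion_attack("a great movie", toy_sentiment_classifier)
```

Real word-level and character-level attacks described in this survey replace the exhaustive deletion loop with saliency-guided token selection and semantics-preserving substitutions, but the goal function is the same: a minimal perturbation that changes the model's output.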

Keywords

natural language processing / adversarial example / deep neural network

Cite This Article

DU Xiaohu, WU Hongming, YI Zibo, LI Shasha, MA Jun, YU Jie. Adversarial Text Attack and Defense: A Review. Journal of Chinese Information Processing, 2021, 35(8): 1-15.


Funding

National Key Research and Development Program of China (2018YFB1004502)