Abstract
Deep neural networks (DNNs) have been widely applied in fields such as image recognition and natural language processing. Recent studies have shown that feeding a DNN inputs containing small, deliberate perturbations can easily corrupt its output; such crafted inputs are known as adversarial examples. Chinese adversarial example generation, however, has long faced a serious trade-off: a high attack success rate and good readability of the adversarial examples are difficult to achieve at the same time. This paper proposes MCGC, an adversarial attack method that constrains both the visual similarity and the semantic similarity of adversarial examples at different stages of their generation. The examples generated by MCGC are highly readable while achieving success rates of around 90% in both targeted and untargeted attacks against multiple models, including Text-CNN, Bi-LSTM, and BERT-Chinese. In addition, this paper studies the differences in robustness between masked language models (MLMs), represented by BERT-Chinese, and traditional natural language processing models.
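To make the two-stage constraint concrete, here is a minimal sketch of how such checks might look in code: candidate character substitutions are filtered by a glyph-similarity score (computed here from four-corner codes, a classic shape-based encoding of Chinese characters), and a perturbed sentence is accepted only if its embedding stays close to the original's. The FOUR_CORNER table, the thresholds, and all function names are illustrative assumptions, not the paper's actual MCGC implementation.

```python
import numpy as np

# Hypothetical excerpt of a four-corner-code table; a real system would
# load codes for the full character set.
FOUR_CORNER = {
    "王": "10104",
    "玉": "10103",
    "主": "00104",
}

def glyph_similarity(a: str, b: str) -> float:
    """Share of matching digits in two characters' four-corner codes."""
    ca, cb = FOUR_CORNER.get(a), FOUR_CORNER.get(b)
    if ca is None or cb is None:
        return 0.0
    return sum(x == y for x, y in zip(ca, cb)) / len(ca)

def semantic_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two sentence embeddings (from any
    sentence encoder)."""
    denom = np.linalg.norm(emb_a) * np.linalg.norm(emb_b) + 1e-9
    return float(emb_a @ emb_b / denom)

def accept_substitution(orig_char: str, cand_char: str,
                        orig_emb: np.ndarray, cand_emb: np.ndarray,
                        glyph_min: float = 0.6, sem_min: float = 0.85) -> bool:
    """Keep a candidate perturbation only if both constraints hold:
    the substitute character must look like the original (readability),
    and the perturbed sentence must still mean roughly the same thing."""
    return (glyph_similarity(orig_char, cand_char) >= glyph_min
            and semantic_similarity(orig_emb, cand_emb) >= sem_min)
```

Gating on both scores reflects the abstract's framing: visual plausibility (readability) and meaning preservation are enforced as explicit constraints during generation rather than traded off against the attack success rate afterwards.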
Keywords
glyph similarity evaluation /
semantic similarity control /
black-box adversarial attack
Funding
NSFC-Xinjiang Joint Fund (U2003206); National Natural Science Foundation of China (61972255); Fundamental Research Funds for the Central Universities (GK2060260303); Heilongjiang Provincial Key R&D Program (GA21C020)