Frontiers in Neural Architecture Search: A Literature Review
(神经网络结构搜索前沿综述)

YANG Murun, CAO Runzhe, DU Quan, LI Yinqiao, XIAO Tong, ZHU Jingbo
(杨木润, 曹润柘, 杜权, 李垠桥, 肖桐, 朱靖波)

Journal of Chinese Information Processing (中文信息学报), 2023, Vol. 37, Issue (10): 1-15.

Review

Abstract

Deep learning has been widely used in many fields and has achieved remarkable results. However, the design of high-performing network architectures still depends largely on researchers' prior knowledge and on extensive experimental validation, a process that consumes enormous human and computational resources. Enabling computers to automatically find the neural network architecture best suited to a given task has therefore become a prominent research topic. In recent years, researchers have proposed a wide variety of improvements to Neural Architecture Search (NAS), and the related literature is complex and abundant. To give readers a clearer picture of NAS methods, this paper analyzes existing work along the three dimensions of neural architecture search: the search space, the search strategy, and the performance estimation strategy. Possible future research directions are also discussed.
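The decomposition into search space, search strategy, and performance estimation strategy can be illustrated with a minimal random-search sketch. All names, the toy macro search space, and the synthetic accuracy proxy below are hypothetical illustrations, not any specific method from the surveyed literature:

```python
import random

# Search space: each candidate architecture is a sequence of layers,
# where every layer picks an operation and a channel width (a toy
# macro search space; cell-based spaces are structured similarly).
OPS = ["conv3x3", "conv5x5", "skip"]

def sample_architecture(rng):
    depth = rng.randint(2, 6)
    return [(rng.choice(OPS), rng.choice([16, 32, 64])) for _ in range(depth)]

# Performance estimation strategy: a cheap synthetic proxy score instead
# of full training. Real NAS systems would train each candidate, or use a
# low-fidelity estimate (few-epoch training, weight sharing, predictors).
def estimate_performance(arch):
    score = 0.0
    for op, width in arch:
        score += {"conv3x3": 1.0, "conv5x5": 1.2, "skip": 0.3}[op]
        score += 0.01 * width
    return score / (1 + 0.1 * len(arch))  # penalize very deep candidates

# Search strategy: plain random search over the space; evolutionary,
# reinforcement-learning, and gradient-based strategies slot in here.
def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = estimate_performance(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best_arch, best_score = random_search(100)
print(len(best_arch), round(best_score, 2))
```

Swapping out any one of the three components (a richer space, a smarter strategy, a cheaper estimator) while keeping the other two fixed is exactly how the survey organizes the existing work.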

Key words

neural architecture search / search space / search strategy / performance estimation strategy / automatic machine learning

Cite This Article

YANG Murun, CAO Runzhe, DU Quan, LI Yinqiao, XIAO Tong, ZHU Jingbo. Frontiers in Neural Architecture Search: A Literature Review. Journal of Chinese Information Processing. 2023, 37(10): 1-15

References

[1] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint arXiv: 1409.0473, 2014.
[2] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of Advances in Neural Information Processing Systems, 2017: 5998-6008.
[3] LUONG M T, PHAM H, MANNING C D. Effective approaches to attention-based neural machine translation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015: 1412-1421.
[4] HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of IEEE International Conference on Computer Vision, 2017: 2980-2988.
[5] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2818-2826.
[6] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]//Proceedings of the Computer Vision-ECCV European Conference, 2016: 21-37.
[7] ARK S , CHRZANOWSKI M, COATES A, et al. Deep voice: Realtime neural text-to-speech[C]//Proceedings of the 34th International Conference on Machine Learning, 2017: 195-204.
[8] AMODEI D, ANANTHANARAYANAN S, ANUBHAI R, et al. Deepspeech 2: End-to-end speech recognition in English and mandarin[C]//Proceedings of the 33nd International Conference on Machine Learning, 2016: 173-182.
[9] MILLER G F, TODD P M, HEGDE S U. Designing neural networks using genetic algorithms[C]//Proceedings of the 3rd International Conference on Genetic Algorithms, 1989: 379-384.
[10] KOZA J R, RICE J P. Genetic generation of both the weights and architecture for a neural network[C]//Proceedings of IJCNN-91-Seattle International Joint Conference on Neural Networks, 1991: 397-404.
[11] HARP S, SAMAD T, GUHA A. Designing application specific neural networks using the genetic algorithm[C]//Proceedings of Advances in Neural Information Processing Systems, 1989: 447-454.
[12] KITANO H. Designing neural networks using genetic algorithms with graph generation system[J]. Complex Systems, 1990, 4(4): 461-476.
[13] ZOPH B, VASUDEVAN V, SHLENS J, et al. Learning transferable architectures for scalable image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 8697-8710.
[14] REAL E, AGGARWAL A, HUANG Y, et al. Regularized evolution for image classifier architecture search[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019: 4780-4789.
[15] CHEN L C, COLLINS M, ZHU Y, et al. Searching for efficient multi-scale architectures for dense image prediction[C]//Proceedings of Advances in Neural Information Processing Systems, 2018: 8713-8724.
[16] LIU H, SIMONYAN K, YANG Y. DARTS: Differentiable architecture search[J]. arXiv preprint arXiv: 1806.09055, 2018.
[17] LI Y, HU C, ZHANG Y, et al. Learning architectures from an extended search space for language modeling[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 6629-6639.
[18] JIANG Y, HU C, XIAO T, et al. Improved differentiable architecture search for language modeling and named entity recognition[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 3583-3588.
[19] REAL E, MOORE S, SELLE A, et al. Large-scale evolution of image classifiers[C]//Proceedings of the 34th International Conference on Machine Learning, 2017: 2902-2911.
[20] LIU H, SIMONYAN K, VINYALS O, et al. Hierarchical representations for efficient architecture search[J]. arXiv preprint arXiv: 1711.00436, 2017.
[21] ZOPH B, LE Q V. Neural architecture search with reinforcement learning[J]. arXiv preprint arXiv: 1611.01578, 2016.
[22] BAKER B, GUPTA O, NAIK N, et al. Designing neural network architectures using reinforcement learning[J]. arXiv preprint arXiv: 1611.02167, 2016.
[23] ZELA A, KLEIN A, FALKNER S, et al. Towards automated deep learning: Efficient joint neural architecture and hyperparameter search[J]. arXiv preprint arXiv: 1807.06906, 2018.
[24] FALKNER S, KLEIN A, HUTTER F. BOHB: Robust and efficient hyperparameter optimization at scale[C]//Proceedings of the 35th International Conference on Machine Learning, 2018: 1436-1445.
[25] KLEIN A, FALKNER S, BARTELS S, et al. Fast bayesian optimization of machine learning hyperparameters on large datasets[C]//Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017: 528-536.
[26] CHRABASZCZ P, LOSHCHILOV I, HUTTER F. A downsampled variant of ImageNet as an alternative to the CIFAR datasets[J]. arXiv preprint arXiv: 1707.08819, 2017.
[27] CAI H, YANG J, ZHANG W, et al. Path-level network transformation for efficient architecture search[C]//Proceedings of the 35th International Conference on Machine Learning, 2018: 677-686.
[28] ELSKEN T, METZEN J H, HUTTER F. Simple and efficient architecture search for convolutional neural networks[J]. arXiv preprint arXiv: 1711.04528, 2017.
[29] ELSKEN T, METZEN J H, HUTTER F. Efficient multi-objective neural architecture search via lamarckian evolution[J]. arXiv preprint arXiv: 1804.09081, 2018.
[30] KLEIN A, FALKNER S, SPRINGENBERG J T, et al. Learning curve prediction with bayesian neural networks[C]//Proceedings of the 4th International Conference on Learning Representations, 2016.
[31] BAKER B, GUPTA O, RASKAR R, et al. Accelerating neural architecture search using performance prediction[J]. arXiv preprint arXiv: 1705.10823, 2017.
[32] DOMHAN T, SPRINGENBERG J T, HUTTER F. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves[C]//Proceedings of the 24th International Joint Conference on Artificial Intelligence, 2015: 3460-3468.
[33] HU H, LANGFORD J, CARUANA R, et al. Macro neural architecture search revisited[C]//Proceedings of the 2nd Workshop on Meta-Learning at NeurIPS, 2018.
[34] PHAM H, GUAN M Y, ZOPH B, et al. Efficient neural architecture search via parameter sharing[C]//Proceedings of the 35th International Conference on Machine Learning, 2018: 4092-4101.
[35] SO D, LE Q, LIANG C. The evolved transformer[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 5877-5886.
[36] CHEN D, LI Y, QIU M, et al. Adabert: Task adaptive bert compression with differentiable neural architecture search[C]//Proceedings of the 29th International Joint Conference on Artificial Intelligence, 2020: 2463-2469.
[37] WANG H, WU Z, LIU Z, et al. Hat: Hardware aware transformers for efficient natural language processing[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 7675-7688.
[38] HU C, WANG C, MA X, et al. Ranknas: Efficient neural architecture search by pairwise ranking[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021: 2469-2480.
[39] XIE L, YUILLE A. Genetic CNN[C]//Proceedings of IEEE International Conference on Computer Vision, 2017: 1379-1388.
[40] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1-9.
[41] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 4700-4708.
[42] TAN M, CHEN B, PANG R, et al. Mnasnet: Platform aware neural architecture search for mobile[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2019: 2820-2828.
[43] REAL E, LIANG C, SO D, et al. Automl-zero: Evolving machine learning algorithms from scratch[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 8007-8019.
[44] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1251-1258.
[45] FAN Y, TIAN F, XIA Y, et al. Searching better architectures for neural machine translation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1574-1585.
[46] KANDASAMY K, NEISWANGER W, SCHNEIDER J, et al. Neural architecture search with bayesian optimisation and optimal transport[C]//Proceedings of Advances in Neural Information Processing Systems, 2018: 2020-2029.
[47] CHEN X, XIE L, WU J, et al. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2019: 1294-1303.
[48] DONG X, YANG Y. Searching for a robust neural architecture in four gpu hours[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2019: 1761-1770.
[49] NAYMAN N, NOY A, RIDNIK T, et al. Xnas: Neural architecture search with expert advice[C]//Proceedings of Advances in Neural Information Processing Systems, 2019: 1975-1985.
[50] LI G, QIAN G, Delgadillo I C, et al. Sgas: Sequential greedy architecture search[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1620-1630.
[51] CHU X, WANG X, ZHANG B, et al. DARTS-: Robustly stepping out of performance collapse without indicators[J]. arXiv preprint arXiv: 2009.01027, 2020.
[52] CHEN Y C, HSU J Y, LEE C K, et al. DARTS-ASR: Differentiable architecture search for multilingual speech recognition and adaptation[C]//Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020: 1803-1807.
[53] LUO R, TIAN F, QIN T, et al. Neural architecture optimization[J]. arXiv preprint arXiv: 1808.07233, 2018.
[54] RAMACHANDRAN P, ZOPH B, LE Q V. Searching for activation functions[J]. arXiv preprint arXiv: 1710.05941, 2017.
[55] ANGELINE P J, SAUNDERS G M, POLLACK J B. An evolutionary algorithm that constructs recurrent neural networks[J]. IEEE Transactions on Neural Networks, 1994, 5(1): 54-65.
[56] STANLEY K O, MIIKKULAINEN R. Evolving neural networks through augmenting topologies[J]. Evolutionary Computation, 2002, 10(2): 99-127.
[57] SUGANUMA M, SHIRAKAWA S, NAGAO T. A genetic programming approach to designing convolutional neural network architectures[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018: 5369-5373.
[58] WISTUBA M. Deep learning architecture search by neuro-cell-based evolution with function-preserving mutations[C]//Proceedings of Machine Learning and Knowledge Discovery in Databases-European Conference, 2018: 243-258.
[59] GAO Y, YANG H, ZHANG P, et al. Graph neural architecture search[C]//Proceedings of the 29th International Joint Conference on Artificial Intelligence, 2020: 1403-1409.
[60] GONG X, CHANG S, JIANG Y, et al. Autogan: Neural architecture search for generative adversarial networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 3224-3234.
[61] XIAO T, ZHU J B. 机器翻译:基础与模型 (Machine Translation: Foundations and Models)[M]. Beijing: Publishing House of Electronics Industry, 2021.
[62] ZHONG Z, YAN J, WU W, et al. Practical block-wise neural network architecture generation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 2423-2432.
[63] WU B, DAI X, ZHANG P, et al. Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2019: 10734-10742.
[64] MADDISON C J, MNIH A, TEH Y W. The concrete distribution: A continuous relaxation of discrete random variables[J]. arXiv preprint arXiv: 1611.00712, 2016.
[65] CAI H, ZHU L, HAN S. Proxylessnas: Direct neural architecture search on target task and hardware[J]. arXiv preprint arXiv: 1812.00332, 2018.
[66] XIE S, ZHENG H, LIU C, et al. SNAS: Stochastic neural architecture search[J]. arXiv preprint arXiv: 1812.09926, 2018.
[67] BERGSTRA J, YAMINS D, COX D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures[C]//Proceedings of the 30th International Conference on Machine Learning, 2013: 115-123.
[68] SWERSKY K, DUVENAUD D, SNOEK J, et al. Raiders of the lost architecture: Kernels for bayesian optimization in conditional parameter spaces[J]. arXiv preprint arXiv: 1409.4011, 2014.
[69] LI L, TALWALKAR A. Random search and reproducibility for neural architecture search[C]//Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence, 2020: 367-377.
[70] XIE S, KIRILLOV A, GIRSHICK R, et al. Exploring randomly wired neural networks for image recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 1284-1293.
[71] YANG A, ESPERANA P M, CARLUCCI F M. Nas evaluation is frustratingly hard[J]. arXiv preprint arXiv: 1912.12522, 2019.
[72] BENDER G, LIU H, CHEN B, et al. Can weight sharing outperform random architecture search? an investigation with tunas[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 14323-14332.
[73] DEVRIES T, TAYLOR G W. Improved regularization of convolutional neural networks with cutout[J]. arXiv preprint arXiv: 1708.04552, 2017.
[74] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[75] HAN D, KIM J, KIM J. Deep pyramidal residual networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 5927-5935.
[76] YANG Z, WANG Y, CHEN X, et al. Cars: Continuous evolution for efficient neural architecture search[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1829-1838.
[77] CUI J, CHEN P, LI R, et al. Fast and practical neural architecture search[C]//Proceedings of IEEE/CVF International Conference on Computer Vision, 2019: 6509-6518.
[78] XU Y, XIE L, ZHANG X, et al. PC-DARTS: Partial channel connections for memory efficient architecture search[J]. arXiv preprint arXiv: 1907.05737, 2019.
[79] DONG X, YANG Y. One-shot neural architecture search via self evaluated template network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 3681-3690.
[80] ZHANG M, LI H, PAN S, et al. Overcoming multi-model forgetting in one-shot NAS with diversity maximization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 7809-7818.
[81] BROCK A, LIM T, RITCHIE J M, et al. Smash: One-shot model architecture search through hypernetworks[J]. arXiv preprint arXiv: 1708.05344, 2017.
[82] ZHANG C, REN M, URTASUN R. Graph hypernetworks for neural architecture search[J]. arXiv preprint arXiv: 1810.05749, 2018.
[83] LIU C, ZOPH B, NEUMANN M, et al. Progressive neural architecture search[C]//Proceedings of Computer Vision-ECCV European Conference, 2018: 19-34.
[84] CHEN Y, MENG G, ZHANG Q, et al. Reinforced evolutionary neural architecture search[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2019: 4787-4796.
[85] HU J, SHEN L, SUN G. Squeeze and excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[86] SANDLER M, HOWARD A, ZHU M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4510-4520.
[87] TAN M, LE Q. Efficientnet: Rethinking model scaling for convolutional neural networks[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 6105-6114.
[88] YING C, KLEIN A, CHRISTIANSEN E, et al. Nas-bench-101: Towards reproducible neural architecture search[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 7105-7114.
[89] DONG X, YANG Y. Nas-bench-201: Extending the scope of reproducible neural architecture search[J]. arXiv preprint arXiv: 2001.00326, 2020.
[90] MA A, WAN Y, ZHONG Y, et al. SceneNet: Remote sensing scene classification deep learning network using multiobjective neural evolution architecture search[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 172: 171-188.
[91] WU F, FAN A, BAEVSKI A, et al. Pay less attention with lightweight and dynamic convolutions[J]. arXiv preprint arXiv: 1901.10430, 2019.
[92] REAL E, AGGARWAL A, HUANG Y, et al. Regularized evolution for image classifier architecture search[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019: 4780-4789.
[93] CAI H, CHEN T, ZHANG W, et al. Efficient architecture search by network transformation[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018: 2787-2794.
[94] GUO Z, ZHANG X, MU H, et al. Single path one-shot neural architecture search with uniform sampling[C]//Proceedings of Computer Vision-ECCV European Conference, 2020: 544-560.
[95] YU J, JIN P, LIU H, et al. Bignas: Scaling up neural architecture search with big single-stage models[C]//Proceedings of Computer Vision-ECCV European Conference, 2020: 702-717.
[96] CHU X, ZHANG B, XU R. Fairnas: Rethinking evaluation fairness of weight sharing neural architecture search[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 12219-12228.
[97] BENDER G, KINDERMANS P J, ZOPH B, et al. Understanding and simplifying one-shot architecture search[C]//Proceedings of the 35th International Conference on Machine Learning, 2018: 550-559.
[98] WEI T, WANG C, RUI Y, et al. Network morphism[C]//Proceedings of the 33rd International Conference on Machine Learning, 2016: 564-572.
[99] FANG J, SUN Y, PENG K, et al. Fast neural network adaptation via parameter remapping and architecture search[J]. arXiv preprint arXiv: 2001.02525, 2020.
[100] JIN H, SONG Q, HU X. Auto-keras: An efficient neural architecture search system[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019: 1946-1956.
[101] ASHOK A, RHINEHART N, BEAINY F, et al. N2N learning: Network to network compression via policy gradient reinforcement learning[J]. arXiv preprint arXiv: 1709.06030, 2017.
[102] ZHENG X, JI R, TANG L, et al. Multinomial distribution learning for effective neural architecture search[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 1304-1313.
[103] REAL E, LIANG C, SO D, et al. Automl-zero: Evolving machine learning algorithms from scratch[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 8007-8019.
[104] GAO J H, XU H, SHI H, et al. Autobert-zero: Evolving bert backbone from scratch[C]//Proceedings of the 36th AAAI Conference on Artificial Intelligence, 2022: 10663-10671.
[105] DEVLIN J, CHANG M W, LEE K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019: 4171-4186.
[106] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018: 2227-2237.
[107] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[EB/OL]. https://www.mikecaptain.com/resources/pdf/GPT-1.pdf.
[108] YANG Z, DAI Z, YANG Y, et al. Xlnet: Generalized autoregressive pretraining for language understanding[C]//Proceedings of Advances in Neural Information Processing Systems, 2019: 5754-5764.
[109] YU K, SCIUTO C, JAGGI M, et al. Evaluating the search phase of neural architecture search[J]. arXiv preprint arXiv: 1902.08142, 2019.
[110] CAI H, GAN C, WANG T, et al. Once-for-all: Train one network and specialize it for efficient deployment[J]. arXiv preprint arXiv: 1908.09791, 2019.
[111] LI C, PENG J, YUAN L, et al. Block-wisely supervised neural architecture search with knowledge distillation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1989-1998.

Funding

National Natural Science Foundation of China (Grants 61876035, 61732005)