Abstract
In recent years, deep learning has brought substantial progress to semantic dependency parsing. However, annotating semantic dependency data is expensive, and a parser that performs well in a single domain degrades sharply when transferred to other domains, so the domain adaptation problem must be solved before such parsers become practical. To address this, this paper proposes a new semi-supervised, domain-adaptive dependency parsing model based on adversarial learning: a shared dual-encoder structure trained adversarially, extended with domain-private auxiliary tasks and orthogonal constraints. We also examine the effectiveness and performance of a variety of pre-trained models on the cross-domain dependency parsing task.
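The abstract only names the architectural components. As a rough illustration of how a shared/private dual encoder, an adversarial domain classifier, and an orthogonality penalty can fit together, the following PyTorch sketch may help; the encoder types, dimensions, pooling, and names (DualEncoderDA, GradReverse) are illustrative assumptions, not the paper's implementation.

# Minimal sketch (assumptions, not the paper's implementation) of a shared/private
# dual encoder with adversarial domain classification and an orthogonality penalty.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; scaled, sign-flipped gradient in the backward
    # pass (Ganin & Lempitsky-style gradient reversal for adversarial training).
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DualEncoderDA(nn.Module):
    def __init__(self, input_dim=768, hidden_dim=256, num_domains=2):
        super().__init__()
        # Shared encoder: pushed toward domain-invariant features by the adversary.
        self.shared = nn.LSTM(input_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Private encoder: keeps domain-specific features for auxiliary tasks.
        self.private = nn.LSTM(input_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # Domain classifier fed through the gradient-reversal layer.
        self.domain_clf = nn.Linear(2 * hidden_dim, num_domains)

    def forward(self, x, lambd=1.0):
        shared_out, _ = self.shared(x)       # (B, T, 2H) domain-invariant stream
        private_out, _ = self.private(x)     # (B, T, 2H) domain-specific stream
        shared_pooled = shared_out.mean(dim=1)
        private_pooled = private_out.mean(dim=1)
        # Adversarial branch: the classifier learns to predict the domain while
        # reversed gradients push the shared encoder to confuse it.
        domain_logits = self.domain_clf(GradReverse.apply(shared_pooled, lambd))
        # Orthogonality penalty ||H_s^T H_p||_F^2 keeps the two spaces apart.
        ortho_loss = (shared_pooled.t() @ private_pooled).pow(2).sum()
        return shared_out, private_out, domain_logits, ortho_loss


if __name__ == "__main__":
    model = DualEncoderDA()
    x = torch.randn(4, 12, 768)              # a fake batch of contextual embeddings
    shared, private, dom_logits, ortho = model(x)
    print(shared.shape, dom_logits.shape, ortho.item())

In training, the cross-entropy loss on dom_logits and a weighted ortho_loss would be added to the parser's own objective (e.g., a biaffine arc/label loss over the shared and private states); the exact combination and weighting are left open here, since the abstract does not specify them.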
Key words: semantic dependency parsing / domain adaptation / adversarial learning
Funding
National Natural Science Foundation of China (61872402); Humanities and Social Sciences Planning Fund of the Ministry of Education (17YJAZH068); Beijing Language and Culture University research project supported by the Fundamental Research Funds for the Central Universities (18ZDJ03); Open Project Fund of the National Laboratory of Pattern Recognition