Sequential Graph Neural Networks for Multi-Label Sequence Labeling

WANG Shaojing, LIU Pengfei, QIU Xipeng

Journal of Chinese Information Processing ›› 2020, Vol. 34 ›› Issue (6): 18-26.
Language Analysis and Computation



Abstract

To address the practical problem of annotating the same sentence with multiple sets of sequence labels, this paper defines the multi-label sequence labeling task and proposes a novel sequential graph model. The model captures two kinds of dependencies: the temporal relationship between different words, and the dependencies of the same word across different tasks. Temporal information flow is modeled with an LSTM or a module adapted from the Transformer, while an attention mechanism models the interactions of each word across tasks, producing a better hidden representation for each word that is fed into the next recurrent step. Experiments show that the model not only achieves better results on the OntoNotes 5.0 dataset, but also recovers interpretable dependency structures between the labels of different tasks.
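The cross-task interaction described above can be illustrated with a minimal sketch: for a single word, each of the T tasks holds its own hidden state, and one scaled dot-product attention step lets each task attend to the others, yielding updated states plus a T×T attention map (the interpretable task-dependency structure). This is a hedged illustration, not the paper's exact architecture; the function name `cross_task_attention` and the projection matrices `Wq`, `Wk`, `Wv` are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_task_attention(H, Wq, Wk, Wv):
    """One attention step across tasks for a single word.

    H:  (T, d) hidden states of the same word under T tasks.
    Returns the (T, d) updated states and the (T, T) attention
    map, whose rows show how much each task attends to the others.
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, T) task-to-task affinities
    A = softmax(scores, axis=-1)
    return A @ V, A

rng = np.random.default_rng(0)
T, d = 3, 8                                   # e.g. POS, NER, chunking
H = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H_new, A = cross_task_attention(H, Wq, Wk, Wv)
assert H_new.shape == (T, d)
assert np.allclose(A.sum(axis=-1), 1.0)       # each row is a distribution over tasks
```

In the full model, a step like this would be interleaved with the temporal module (LSTM or Transformer-style) at each recurrent round, and the attention map `A` is what makes the learned task dependencies inspectable.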

Key words

multi-label sequence labeling / multi-task learning / graph model

Cite this article

WANG Shaojing, LIU Pengfei, QIU Xipeng. Sequential Graph Neural Networks for Multi-Label Sequence Labeling. Journal of Chinese Information Processing, 2020, 34(6): 18-26.


Funding

National Natural Science Foundation of China (61672162)