郝永彬,周兰江,刘畅. 一种基于LSTM的端到端多任务老挝语分词方法[J]. 中文信息学报, 2021, 35(9): 75-81.
HAO Yongbin, ZHOU Lanjiang, LIU Chang. An End-to-end Multi-task Method for Laotian Word Segmentation via LSTM[J]. Journal of Chinese Information Processing, 2021, 35(9): 75-81.
An End-to-end Multi Task Method for Laotian Word Segmentation via LSTM
HAO Yongbin¹, ZHOU Lanjiang¹, LIU Chang²
1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650504, China; 2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
Abstract: Laotian is an alphabetic language written without spaces between words. Existing Laotian segmentation algorithms mainly use rules to segment syllables first and then segment words based on the syllable segmentation results. This paper proposes an end-to-end Laotian word segmentation method based on neural networks. Using multi-task joint learning, syllable segmentation and word segmentation are handled jointly by a BiLSTM. Experiments show that the precision of the proposed method reaches 89.02%, outperforming previous word segmentation models.
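To make the joint formulation concrete, the sketch below derives two aligned character-level label sequences, one for syllable boundaries and one for word boundaries, from a single gold segmentation. This is only an illustration of how two tagging tasks can share one input sequence: the abstract does not specify the tagging scheme, so the B/I labels, the function name `joint_labels`, and the romanized toy input are all assumptions rather than the authors' setup.

```python
from typing import List, Tuple


def joint_labels(words: List[List[str]]) -> Tuple[List[str], List[str], List[str]]:
    """Return (characters, syllable_tags, word_tags) for one sentence.

    `words` is a gold segmentation: each word is a list of syllables and
    each syllable is a non-empty string. The B/I scheme used here is an
    assumption for illustration, not taken from the paper.
    """
    chars: List[str] = []
    syl_tags: List[str] = []
    word_tags: List[str] = []
    for word in words:
        word_start = True  # the next character opens a new word
        for syllable in word:
            for i, ch in enumerate(syllable):
                chars.append(ch)
                # "B" marks the first character of a syllable
                syl_tags.append("B" if i == 0 else "I")
                # "B" marks the first character of a word
                word_tags.append("B" if word_start else "I")
                word_start = False
    return chars, syl_tags, word_tags


# Hypothetical romanized toy input: two words, the first with two syllables.
chars, syl_tags, word_tags = joint_labels([["sa", "bai"], ["dee"]])
```

In a multi-task setup of this kind, a shared encoder such as a BiLSTM would read `chars` once, and two output layers would be trained against `syl_tags` and `word_tags` respectively. Note that iterating over a Python string yields Unicode code points, so real Lao text with combining vowel and tone marks would produce one label per code point under this sketch.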