Lin Songkai, Mao Cunli, Yu Zhengtao, Guo Jianyi, Wang Hongbin, Zhang Jiafu. A Method of Myanmar Word Segmentation Based on Convolution Neural Network[J]. Journal of Chinese Information Processing, 2018, 32(6): 62-70,79.
A Method of Myanmar Word Segmentation Based on Convolution Neural Network
Lin Songkai, Mao Cunli, Yu Zhengtao, Guo Jianyi, Wang Hongbin, Zhang Jiafu
School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Abstract: This paper proposes a Burmese word segmentation method based on a convolutional neural network (CNN). First, the syllable structure features of Burmese are incorporated into the distributed vector representation of Burmese syllables. Then, a CNN fuses the features of each syllable with those of its context to obtain an effective feature representation, and the deep network refines these features layer by layer so that feature vectors useful for Burmese word segmentation are learned automatically. Finally, a softmax classifier predicts the label of each syllable in the sequence. Experimental results show that the proposed segmentation method achieves good results.
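The pipeline outlined in the abstract (syllable vectors, convolution over each syllable and its context, softmax labelling) can be illustrated with a minimal sketch. This is not the authors' implementation. The B/M/E/S tag set, the window size, the dimensions, the random untrained weights, and the Latin placeholder syllables are all assumptions made for illustration. The sketch uses a single convolution layer and does not model the syllable structure features mentioned in the abstract.

```python
import math
import random

# Assumed tag set (Begin / Middle / End / Single); the abstract does not name one.
TAGS = ["B", "M", "E", "S"]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conv_features(embs, filters, window):
    """1-D convolution over the syllable sequence (zero padding, ReLU)."""
    dim = len(embs[0])
    pad = window // 2
    zero = [0.0] * dim
    padded = [zero] * pad + embs + [zero] * pad
    out = []
    for i in range(len(embs)):
        # Concatenate the vectors of the syllable and its context window.
        ctx = [x for vec in padded[i:i + window] for x in vec]
        out.append([max(0.0, sum(w * x for w, x in zip(f, ctx))) for f in filters])
    return out

def tag_syllables(syllables, emb_table, filters, out_weights, window=3):
    """Predict one tag per syllable: embed, convolve, then softmax."""
    embs = [emb_table[s] for s in syllables]
    tags = []
    for h in conv_features(embs, filters, window):
        probs = softmax([sum(w * x for w, x in zip(row, h)) for row in out_weights])
        tags.append(TAGS[probs.index(max(probs))])
    return tags

def tags_to_words(syllables, tags):
    """Join syllables into words; a word closes on an E or S tag."""
    words, current = [], []
    for syl, tag in zip(syllables, tags):
        current.append(syl)
        if tag in ("E", "S"):
            words.append("".join(current))
            current = []
    if current:  # flush a trailing unfinished word
        words.append("".join(current))
    return words

# Hypothetical toy setup: placeholder syllables and random (untrained) weights.
rng = random.Random(0)
syllables = ["ka", "la", "ma", "ya"]
dim, n_filters, window = 4, 5, 3
emb_table = {s: [rng.uniform(-1, 1) for _ in range(dim)] for s in syllables}
filters = [[rng.uniform(-1, 1) for _ in range(dim * window)] for _ in range(n_filters)]
out_weights = [[rng.uniform(-1, 1) for _ in range(n_filters)] for _ in TAGS]

tags = tag_syllables(syllables, emb_table, filters, out_weights, window)
print(tags, tags_to_words(syllables, tags))
```

Because the weights are random, the predicted tags here are meaningless; in a trained model the embeddings, filters, and output weights would be learned from a segmented corpus. The decoding step works independently of the network: for example, tags_to_words(["ka", "la", "ma"], ["B", "E", "S"]) groups the first two syllables into one word and leaves the third as a single-syllable word.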