Abstract: Identifying noun phrases is fundamental to natural language processing tasks such as syntactic analysis. Research on the identification of Lao noun phrases is still in its infancy. Compared with other languages, Lao poses particular difficulties: fuzzy phrase boundaries, ambiguous definitions, a limited corpus, and excessively long sentences. This paper studies the structure of Lao noun phrases and builds a multi-channel model to identify them. The model forms different channels by combining character, word, and POS features, and extracts more hidden information from different aspects with multiple BiLSTM networks, so as to alleviate the problem of out-of-vocabulary noun phrases in a low-resource corpus. To deal with the excessively long sentences in Lao, the model introduces an attention mechanism that assigns higher weights to important features, effectively reducing interference from useless information. The experimental results show that the model achieves an F1 value of up to 85.25% on a limited annotated corpus, outperforming the other models and methods compared.
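The attention step described above can be illustrated with a minimal sketch. It assumes a plain dot-product score followed by a softmax; the abstract does not state the scoring function actually used, so this is a generic illustration rather than the paper's formulation. The names `attention_pool`, `hidden_states`, and `query` are hypothetical, and the toy vectors stand in for BiLSTM outputs.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each position's feature vector by its dot-product score
    against a query vector, then return the weights and the weighted sum.

    Assumption: dot-product scoring; the paper may use a different form.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy sequence of three positions with 2-dimensional features (made-up values).
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden_states, query)
```

In this toy input the second position has the largest score, so it receives the largest weight and dominates the pooled vector, which is the sense in which attention lets a few informative positions outweigh the rest of a long sentence.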