Nickname Recognition in Full-length Knight-errant Novels (长文本武侠小说外号识别研究)

Journal of Chinese Information Processing (中文信息学报), 2019, Vol. 33, Issue 8: 132-142.

Natural Language Processing Applications

TANG Feng, LIANG Xun, ZHAO Xiaolei, ZHANG Xuan, CHENG Hengchao (唐锋, 梁循, 赵晓磊, 张旋, 程恒超)

Abstract

The protagonists of full-length knight-errant (wuxia) novels are mostly swordsmen and righteous heroes with sharply drawn personalities, and a character's nickname typically captures his or her most salient trait. Traditional named entity recognition concentrates on person, place, and organization names, and nickname recognition has not previously been studied. Nicknames are nevertheless an indispensable element of these novels, and recognizing them can inform related research directions such as synonym identification. This paper therefore proposes a method for recognizing the nicknames attached to character names in knight-errant novels that combines out-of-vocabulary (OOV) word extension, recognition, and screening with fixed sentence-pattern rules. The OOV extension method expands and screens the left-neighbor strings of a character name, and defines the concepts of competing nickname substrings and candidate nickname substrings. The fixed sentence-pattern rules use nickname indicator words to screen the candidate nickname substrings inside an observation window. A high-frequency word list for knight-errant novels and a low-frequency indicator dictionary, both derived through statistics and classification, are used to screen competing nickname substrings. Experiments show that the method is feasible and effective.
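
The two components named in the abstract, left-neighbor expansion with screening and indicator-word sentence patterns, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the algorithm's details, so the word lists (`HIGH_FREQ_WORDS`, `INDICATORS`), the function names, the window size, and the thresholds below are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

# Hypothetical stand-ins; the paper's actual lists are not given in the abstract.
PUNCTUATION = set(",。!?;:、“”‘’,.!?;: \n")
HIGH_FREQ_WORDS = ("只见", "原来", "忽然", "说道")  # assumed high-frequency words
INDICATORS = ("人称", "外号", "绰号")              # assumed nickname indicator words


def occurrences(text, name):
    """Yield every start index of `name` in `text`."""
    pos = text.find(name)
    while pos != -1:
        yield pos
        pos = text.find(name, pos + 1)


def clause_before(text, pos):
    """Return the text between the nearest punctuation left of `pos` and `pos`."""
    start = pos
    while start > 0 and text[start - 1] not in PUNCTUATION:
        start -= 1
    return text[start:pos]


def strip_after_last(segment, words):
    """Drop everything up to and including the last occurrence of any word."""
    cut = 0
    for word in words:
        j = segment.rfind(word)
        if j != -1:
            cut = max(cut, j + len(word))
    return segment[cut:]


def left_neighbor_candidates(text, name, max_len=6):
    """Expand leftwards from each mention of `name`, then screen the result
    with the high-frequency words and indicator words."""
    counts = Counter()
    for pos in occurrences(text, name):
        clause = clause_before(text, pos)
        cand = strip_after_last(clause, HIGH_FREQ_WORDS + INDICATORS)
        if 2 <= len(cand) <= max_len:
            counts[cand] += 1
    return counts


def pattern_candidates(text, name, window=12):
    """Fixed pattern '<indicator><nickname><name>' inside an observation window."""
    counts = Counter()
    for pos in occurrences(text, name):
        observed = text[max(0, pos - window):pos]
        idx, word = max((observed.rfind(w), w) for w in INDICATORS)
        if idx == -1:
            continue
        cand = observed[idx + len(word):]
        if cand and not (set(cand) & PUNCTUATION):
            counts[cand] += 1
    return counts


def recognize_nickname(text, name, min_count=2):
    """Accept a candidate that recurs often enough or matches a pattern."""
    expanded = left_neighbor_candidates(text, name)
    by_pattern = pattern_candidates(text, name)
    accepted = {c: n for c, n in expanded.items()
                if n >= min_count or c in by_pattern}
    for cand, n in by_pattern.items():
        accepted.setdefault(cand, n)
    if not accepted:
        return None
    # Prefer the most frequent candidate; break ties by length.
    return max(accepted, key=lambda c: (accepted[c], len(c)))
```

In this sketch, strings such as "那老者" that precede a name only once and never follow an indicator word lose out to a string that recurs or is confirmed by a pattern. A real system would also need the low-frequency indicator dictionary mentioned in the abstract, which is omitted here because its contents are not described.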

Citation

TANG Feng, LIANG Xun, ZHAO Xiaolei, ZHANG Xuan, CHENG Hengchao. Nickname Recognition in Full-length Knight-errant Novels. Journal of Chinese Information Processing. 2019, 33(8): 132-142.


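Funding

Open Project of the State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Co., Ltd.; National Natural Science Foundation of China (71531012, 71271211); Beijing Natural Science Foundation (4172032); Research Funds of Renmin University of China, supported by the Fundamental Research Funds for the Central Universities (19XNH120)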


Received: 2018-07-05
Published: 2019-08-20
Issue date: 2019-08-20
