A Method of Chinese Character Glyph Similarity Calculation
LIU Mengdi, LIANG Xun
School of Information, Renmin University of China, Beijing 100872, China
Abstract This paper proposes a method for calculating the glyph similarity of Chinese characters, with the aim of identifying visually similar characters. First, we construct a radical knowledge graph based on the composition of each character. Then, using the knowledge graph together with structural features, we propose 2CTransE to learn semantic representations of the entities. Finally, we calculate character similarity from the entity vectors. Results show that the method is effective in identifying similar characters, and that the component library can be reused in subsequent related research. The method also offers a new approach to character similarity calculation for Japanese and other languages with similar writing systems.
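The last two steps of the pipeline (embedding entities, then comparing entity vectors) can be illustrated with a minimal sketch. The abstract does not specify how 2CTransE works or which similarity measure is used, so this sketch substitutes the plain TransE distance of Bordes et al. and assumes cosine similarity. The function names and the three-dimensional toy vectors are hypothetical and are not taken from the paper.

```python
import math


def transe_score(head, relation, tail):
    """Plain TransE distance ||h + r - t||_2 (a stand-in, not 2CTransE).

    A lower score means the triple (head, relation, tail) is more plausible.
    """
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def cosine_similarity(u, v):
    """Cosine similarity between two entity vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


# Toy embeddings invented for illustration; real vectors would come from
# training on the radical knowledge graph.
embeddings = {
    "未": [0.9, 0.1, 0.3],
    "末": [0.8, 0.2, 0.3],
    "国": [-0.2, 0.9, -0.4],
}


def most_similar(char, vectors):
    """Rank every other character by cosine similarity to `char`."""
    query = vectors[char]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != char
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With these toy vectors, `most_similar("未", embeddings)` ranks 末 ahead of 国, which is the kind of ordering a glyph similarity model is expected to produce for near-identical characters.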
Received: 01 August 2020