Abstract
The meaning-bearing nature of Chinese characters is a major feature that distinguishes them from phonetic scripts. As the units from which characters are built, components are closely related to character meaning. However, how strong the semantic ability of Chinese character components actually is remains an open question. To address it, this paper starts from character components and proposes a distributed representation model for characters and words that incorporates components (a multi-granularity Chinese word embedding). The model yields some improvement on intrinsic embedding evaluation tasks, and on a character motivation measurement task its results correlate significantly with human ratings. Based on this model, we further propose a method for calculating the semantic ability of components, evaluate the semantic ability of Chinese character components as a whole, and establish a grading system for modern Chinese character components by combining semantic ability with character-forming ability. The measurements show that modern Chinese character components have a certain degree of semantic ability, but that it is low overall. Finally, we apply the results to teaching Chinese as a foreign language: we determine the set of components suited to the component-based teaching method and propose a corresponding teaching sequence for Chinese characters.
Key words
Chinese character component /
semantic ability measurement /
distributed representation
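Funding
Research Project of the State Language Commission (ZDI135-42); National Social Science Fund of China (18CYY029); Humanities and Social Sciences Fund of the Ministry of Education (18YJAZH112)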