杨冰冰,赵慧周,王治敏. 基于词语聚类的汉语口语自动推送素材研究[J]. 中文信息学报, 2022, 36(6): 155-161.
YANG Bingbing, ZHAO Huizhou, WANG Zhimin. Automatic Push of Spoken Chinese Learning Materials Based on Word Clustering. Journal of Chinese Information Processing, 2022, 36(6): 155-161.
Automatic Push of Spoken Chinese Learning Materials Based on Word Clustering
YANG Bingbing1, ZHAO Huizhou2, WANG Zhimin1
1. Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083, China; 2. School of Information Science, Beijing Language and Culture University, Beijing 100083, China
Abstract: The COVID-19 pandemic has made online teaching an inevitable trend. This paper presents materials suitable for the automatic pushing of Chinese textbooks. First, we analyze the overall characteristics of the vocabulary in a spoken Chinese corpus of 10341 items. Then, using the Chinese word vectors published by Tencent AI Lab, we cluster the spoken words with the K-means algorithm. Drawing on the clustering results and on a survey of the topics and scenes in the spoken corpus, we construct a topic-scene material library for spoken Chinese. The library contains 15 primary topics, 102 secondary topics, and 81 communicative scenes. We also summarize the common words for topics at each level. This work can provide resource support for material libraries used in the automatic customization of teaching materials.
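The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors below are invented stand-ins for the Tencent AI Lab embeddings, the names `TOY_VECTORS`, `kmeans`, and `cluster_words` are hypothetical, and the farthest-point initialisation is an assumption chosen only to make the toy run deterministic.

```python
import math
from collections import defaultdict

# Invented 2-D vectors for illustration only; real word embeddings have
# many more dimensions and would be loaded from a file.
TOY_VECTORS = {
    "米饭": [0.90, 0.10],    # rice
    "面条": [0.80, 0.20],    # noodles
    "饺子": [0.85, 0.15],    # dumplings
    "地铁": [0.10, 0.90],    # subway
    "公交": [0.20, 0.80],    # bus
    "出租车": [0.15, 0.85],  # taxi
}


def kmeans(vectors, k, max_iterations=50):
    """Lloyd's K-means; returns one cluster index per input vector."""
    # Assumed initialisation: start from the first vector, then repeatedly
    # add the vector farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(math.dist(v, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels


def cluster_words(word_vectors, k):
    """Group words by the K-means cluster of their vectors."""
    words = list(word_vectors)
    labels = kmeans([word_vectors[w] for w in words], k)
    clusters = defaultdict(list)
    for word, label in zip(words, labels):
        clusters[label].append(word)
    return dict(clusters)


if __name__ == "__main__":
    for label, words in sorted(cluster_words(TOY_VECTORS, 2).items()):
        print(label, words)
```

On this toy data the food words and the transport words fall into separate clusters. In a setting like the one the abstract describes, such clusters would serve only as a reference for topic design, alongside a manual survey of topics and scenes.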