Benjamin K. Tsou, Oi Yee Kwong, LU Bin, Wing Fu Tsoi. Chinese Synchronous Corpus and Monitoring Corpus: A New Direction of Corpus Linguistics [J]. Journal of Chinese Information Processing, 2011, 25(6): 38-46.
Chinese Synchronous Corpus and Monitoring Corpus: A New Direction of Corpus Linguistics
Benjamin K. Tsou1,2, Oi Yee Kwong2, LU Bin1,2, Wing Fu Tsoi1
1. Research Centre on Linguistics and Language Information Sciences, Hong Kong Institute of Education, Hong Kong, China; 2. Department of Chinese, Translation and Linguistics, City University of Hong Kong, Hong Kong, China
Abstract: Advances in information technology and the Internet have offered important solutions to many classical problems in Chinese natural language processing. They have also opened up new opportunities for corpus linguistics, particularly the cultivation and use of large corpora to monitor and track language phenomena from a linguistic perspective, and to investigate such language developments in relation to the underlying social and cultural implications traditionally studied in the humanities and social sciences. Over the past 17 years, the LIVAC corpus has grown into a very large corpus of its kind, containing the results of analyzing about 400 million Chinese characters drawn from news media in seven communities across the pan-Chinese region. The long-term effort behind LIVAC has enabled it to function as a series of time capsules, which provide a solid foundation for scientifically tracking and monitoring language change together with the associated social and cultural developments within and across pan-Chinese communities. This paper describes how the LIVAC synchronous corpus has evolved into a monitoring corpus of Chinese communities.
Key words: corpus linguistics; LIVAC corpus; synchronous corpus; monitoring corpus