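As the Internet has entered everyday social life, online chat has gradually become a new channel of communication, and network chat language has emerged along with it. The growing richness of this language poses new challenges for language information processing. We find that the difficulties stem mainly from the anomalous and dynamic nature of network chat language. Using real network chat text, this paper analyzes and summarizes these two properties in detail and designs methods for recognizing and converting network chat text that address them. We first build a network chat language model and a language conversion model from a network chat language corpus, and use a source-channel model to convert network chat language into standard language. However, this method depends too heavily on the chat language corpus: it handles the anomalous nature well but cannot handle the dynamic nature. We therefore build a character-to-phonetic mapping model from a standard Chinese corpus and use it to improve the source-channel model, which effectively addresses the dynamic nature of network chat language.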
Abstract
Network chat language has become ubiquitous, largely owing to the rapid proliferation of Internet applications. Online chat now plays an important role in human communication, which in turn makes network chat language popular. Processing network chat language is important but difficult. The challenges come mainly from the anomalous and dynamic nature of this new text genre. This paper investigates and analyzes these two distinctive features of Chinese network chat language and proposes methods that address them. We first develop a source-channel model to convert chat language into standard language. This method, however, relies too heavily on the chat language corpus, which makes it poor at handling the dynamic nature. We therefore introduce a phonetic mapping model, constructed from a standard language corpus, into the source-channel model. Our experiments show that the extended method is effective in addressing the dynamic nature.
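The two-stage idea in the abstract can be illustrated with a toy sketch. It assumes a word-level source-channel decoder that picks the standard word c maximizing P(t | c) · P(c) for a chat token t, and it stands in for the phonetic mapping model with a simple string similarity over pinyin. All probabilities, the tables `SOURCE`, `CHANNEL` and `PINYIN`, the threshold and weight, and the example terms 偶 and 稀饭 are hypothetical choices made for illustration; none of them are taken from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical source model P(c) over standard-language words.
SOURCE = {"我": 0.5, "喜欢": 0.3, "你": 0.2}

# Hypothetical channel model P(t | c), limited to pairs seen in a chat corpus.
CHANNEL = {("偶", "我"): 0.3}

# Hypothetical pinyin lookup; it needs only standard-language resources.
PINYIN = {"我": "wo", "偶": "ou", "喜欢": "xihuan", "稀饭": "xifan", "你": "ni"}

IDENTITY = 0.7            # assumed P(t | c) when the token is already standard
PHONETIC_WEIGHT = 0.1     # assumed weight given to the phonetic backoff
PHONETIC_THRESHOLD = 0.6  # assumed minimum pinyin similarity


def corpus_channel(token, cand):
    """Channel probability estimated from the chat corpus only."""
    if token == cand:
        return IDENTITY
    return CHANNEL.get((token, cand), 0.0)


def phonetic_channel(token, cand):
    """Corpus channel, backed off to pinyin similarity for unseen pairs."""
    if token == cand or (token, cand) in CHANNEL:
        return corpus_channel(token, cand)
    p_token, p_cand = PINYIN.get(token), PINYIN.get(cand)
    if p_token is None or p_cand is None:
        return 0.0
    # String similarity is a crude stand-in for a real phonetic mapping model.
    similarity = SequenceMatcher(None, p_token, p_cand).ratio()
    return PHONETIC_WEIGHT * similarity if similarity >= PHONETIC_THRESHOLD else 0.0


def normalize(token, channel):
    """Return the candidate maximizing P(t | c) * P(c), or the token itself."""
    best, best_score = token, 0.0
    for cand, prior in SOURCE.items():
        score = channel(token, cand) * prior
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    for tok in ["偶", "稀饭", "你"]:
        print(tok, normalize(tok, corpus_channel), normalize(tok, phonetic_channel))
```

With these toy values, the corpus-only channel normalizes 偶 to 我 but leaves 稀饭 unchanged because that pair never appears in `CHANNEL`, while the phonetic backoff maps 稀饭 to 喜欢 without any new chat-corpus evidence. This mirrors, in miniature, why a model built from a standard-language corpus can cope with newly coined chat terms.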
Key words
computer application / Chinese information processing / network chat language / anomalous nature / dynamic nature / language and information processing
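Funding
Supported by the National Key Basic Research Program of China (973 Program) (2002CB312103); the National Natural Science Foundation of China (60503054); and a Major Project of the Innovation Program of the Institute of Software, Chinese Academy of Sciences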