Review
MA Xu, XU Weiran, GUO Jun, HU Rile
(1. Peking University Health Science Center, Beijing 0008, China;
2. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications,
Beijing 00876, China; 3. Nokia Research Center (China), Beijing 000, China)
2009, 23(4): 22-27.
As short messages have become ubiquitous, users, operators and government departments urgently need intelligent SMS processing tools. However, for technical, copyright, privacy and other reasons, no open, standard SMS corpus exists, even though such a corpus is an indispensable resource for algorithm research, system development and performance testing. SMS-2008, an annotated Chinese SMS corpus, is a pioneering multi-purpose Chinese text message corpus comprising an original corpus, a privacy-tagged corpus, a content-tagged corpus and an error-tagged corpus. The corpus can be applied to research on SMS language, SMS classification, privacy-protection algorithms and automatic error-correction systems.
Key words: computer application; Chinese information processing; Chinese short message; tagged corpus