Extracting Sentimental Lexicons from Chinese Microblog: a Classification Method using N-Gram Features
LIU Dexi1, NIE Jianyun2, ZHANG Jing3, LIU Xiaohua2, WAN Changxuan1, LIAO Guoqiong1
1. School of Information Technology, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013, China; 2. Department of Computer Science and Operations Research, University of Montreal, Montreal, H3C 3J7, Canada; 3. School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510641, China
Abstract Sentiment analysis relies heavily on resources such as sentiment dictionaries, but such resources are difficult to build manually with satisfactory coverage. A promising alternative is to extract sentiment lexicons automatically from microblog data. In this paper, we address the problem of identifying new sentiment words in a Chinese microblog collection provided at COAE 2014. We observe that traditional co-occurrence-based measures, such as pointwise mutual information, are not effective in identifying new sentiment words. We therefore propose a group of context-based features, N-Gram features, which capture the lexical surroundings and lexical patterns of sentiment words. A classifier trained on known sentiment words is then used to classify the candidate words. We show that this method works better than the traditional approaches. We also observe that, unlike in English, many sentiment words in Chinese are nouns; these cannot be discriminated using co-occurrence-based measures but are better identified by our classification method.
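To make the idea of context-based N-Gram features concrete, the following is a minimal sketch of one plausible reading, not the authors' implementation: the abstract does not give the exact feature definition. It assumes the microblog text is already word-segmented, and it counts every n-gram that covers a candidate word, with the candidate replaced by a placeholder so that the feature describes the surrounding pattern rather than the word itself. The function name `context_ngram_features`, the placeholder `<W>`, the default `n=2`, and the sample sentences are all hypothetical.

```python
from collections import Counter


def context_ngram_features(sentences, target, n=2, placeholder="<W>"):
    """Count n-grams covering `target`, with the target masked by `placeholder`.

    `sentences` is an iterable of token lists (assumed to be pre-segmented).
    """
    features = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            # Slide over every n-token window that includes position i.
            first = max(0, i - n + 1)
            last = min(i, len(tokens) - n)
            for start in range(first, last + 1):
                gram = list(tokens[start:start + n])
                gram[i - start] = placeholder  # mask the candidate word
                features["_".join(gram)] += 1
    return features


# Hypothetical, pre-segmented example sentences.
sentences = [
    ["这", "电影", "太", "给力", "了"],
    ["服务", "太", "给力", "了"],
]
feats = context_ngram_features(sentences, "给力", n=2)
```

Feature counts of this kind, computed for known sentiment words and for known non-sentiment words, could serve as training input to any standard classifier, which would then be applied to the candidate words; the choice of classifier is left open here because the abstract does not name one.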
Received: 15 September 2014