Review
LIU Liangliang1,2, WANG Shi1, WANG Dongsheng1,2, WANG Pingze1,2, CAO Cungen1
2013, 27(3): 77-84.
Automatic text proofreading is an important research issue in NLP and remains a challenge. This paper analyzes the types and causes of errors in Chinese text and proposes a method for automatically detecting typos based on user query logs in a domain question answering system. First, word segmentation is performed on the corpus; then the fragments in the segmentation result are merged. After clustering the multi-character words and the merged strings, the approach automatically extracts typo pairs through contextual analysis of similar strings. Experiments show that the method achieves a recall of 71.32% and an accuracy of 82.6% on actual question answering system logs.
Key words: automatic text proofreading; question answering system; non-word error; real-word error; typo pairs
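The pipeline summarized in the abstract (segment, merge fragments, group similar strings, compare contexts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the queries are already segmented, it replaces the clustering step with a simple "same length, one character different" test, and the function names, the `min_ratio` threshold, and the sample queries are all hypothetical.

```python
from collections import Counter


def merge_fragments(tokens):
    """Merge runs of adjacent single-character tokens into one string.

    Assumption: a misspelled word tends to be split into single
    characters by the segmenter, so such runs are rejoined.
    """
    merged, run = [], []
    for tok in tokens:
        if len(tok) == 1:
            run.append(tok)
            continue
        if run:
            merged.append("".join(run))
            run = []
        merged.append(tok)
    if run:
        merged.append("".join(run))
    return merged


def differs_by_one(a, b):
    """Stand-in for clustering: same length, exactly one character differs."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def contexts(queries, word):
    """Collect the (left neighbor, right neighbor) pairs seen around word."""
    found = set()
    for q in queries:
        for i, tok in enumerate(q):
            if tok == word:
                left = q[i - 1] if i > 0 else "<s>"
                right = q[i + 1] if i + 1 < len(q) else "</s>"
                found.add((left, right))
    return found


def find_typo_pairs(segmented_queries, min_ratio=3):
    """Return (typo, correction) pairs; min_ratio is a hypothetical threshold."""
    queries = [merge_fragments(q) for q in segmented_queries]
    freq = Counter(tok for q in queries for tok in q if len(tok) > 1)
    words = sorted(freq)
    pairs = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if not differs_by_one(a, b):
                continue
            # Require at least one shared context between the two strings.
            if not (contexts(queries, a) & contexts(queries, b)):
                continue
            # Treat the rarer string as the typo, the more frequent as correct.
            typo, correct = sorted((a, b), key=lambda w: freq[w])
            if freq[correct] >= min_ratio * freq[typo]:
                pairs.append((typo, correct))
    return pairs


# Invented sample log: three correct queries and one containing a typo
# that the (assumed) segmenter broke into single characters.
queries = [["糖尿病", "怎么", "治疗"]] * 3 + [["唐", "尿", "病", "怎么", "治疗"]]
print(find_typo_pairs(queries))
```

Using relative frequency to decide which string of a pair is the typo is a simplification chosen for this sketch; the abstract states only that typo pairs are obtained from contextual analysis of similar strings.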