Information Retrieval and Question Answering
ZHUAN Yue, XIONG Jinhua, MA Hongyuan, CHENG Shuyang, CHENG Xueqi
2016, 30(2): 99-106.
Queries in Chinese information retrieval systems often mix Chinese characters, Pinyin (the Chinese phonetic alphabet), and English. Existing methods cannot handle mixed-language queries or long Chinese queries. To address these problems, we propose a parallel query correction method for mixed-language queries. The method builds a language model over mixed-language text and constructs a heterogeneous character dictionary tree according to the corresponding edit rules to process the query words. For long Chinese queries, we put forward a two-way parallel spelling correction model. To support parallel processing, we introduce the concepts of a reverse character dictionary tree and a reverse language model. The training corpus for the model is extracted from user query logs, click logs, web links, and other information. Experiments show that, compared with single-pass query correction, the parallel query correction method for mixed-language queries increases accuracy by 9%, reduces recall by 3%, and, notably, speeds up processing by 40%.
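The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the character dictionary tree is an ordinary trie searched with plain edit distance, that the reverse tree stores the same entries reversed, and that "two-way" means one pass corrects the first half of the query from the left while the other corrects the second half from the right. The names `Trie`, `correct`, and `correct_query` are hypothetical, the query is assumed to be already segmented into tokens, and the forward and reverse language models are omitted entirely.

```python
from concurrent.futures import ThreadPoolExecutor


class Trie:
    """Character trie over a mixed vocabulary (Chinese, Pinyin, English)."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word  # the key None marks the end of an entry

    def search(self, word, max_dist=1):
        """Return sorted (distance, entry) pairs within max_dist edits."""
        results = []
        first_row = list(range(len(word) + 1))
        for ch, child in self.root.items():
            if ch is not None:
                self._walk(child, ch, word, first_row, results, max_dist)
        return sorted(results)

    def _walk(self, node, ch, word, prev_row, results, max_dist):
        # One row of the Levenshtein table per trie level.
        row = [prev_row[0] + 1]
        for i in range(1, len(word) + 1):
            cost = 0 if word[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1, prev_row[i] + 1, prev_row[i - 1] + cost))
        if None in node and row[-1] <= max_dist:
            results.append((row[-1], node[None]))
        if min(row) <= max_dist:  # prune branches that cannot match
            for nxt, child in node.items():
                if nxt is not None:
                    self._walk(child, nxt, word, row, results, max_dist)


def correct(token, trie, reverse=False):
    """Replace token with its closest entry, or keep it if none is close."""
    key = token[::-1] if reverse else token
    hits = trie.search(key, max_dist=1)
    if not hits:
        return token
    best = hits[0][1]
    return best[::-1] if reverse else best


def correct_query(tokens, vocab):
    """Correct the left half front-to-back and the right half back-to-front."""
    forward = Trie(vocab)
    backward = Trie(w[::-1] for w in vocab)  # assumed form of the reverse tree
    mid = len(tokens) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(lambda: [correct(t, forward) for t in tokens[:mid]])
        right = pool.submit(
            lambda: [correct(t, backward, reverse=True) for t in reversed(tokens[mid:])]
        )
        return left.result() + right.result()[::-1]
```

Calling `correct_query` with a list of query tokens and a small mixed vocabulary returns the tokens in their original order, each replaced by the nearest vocabulary entry within one edit. Threads are used here only to show the two independent passes; in CPython they do not run CPU-bound work simultaneously, so any real speedup would need processes or a different runtime.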