Automatic Text Error Detection in Domain Question Answering

LIU Liangliang1,2, WANG Shi1, WANG Dongsheng1,2, WANG Pingze1,2, CAO Cungen1

Journal of Chinese Information Processing, 2013, Vol. 27, Issue 3: 77-84.
Review

Abstract

Automatic text proofreading is an important and still challenging research problem in natural language processing. This paper analyzes the types and causes of errors in Chinese text and proposes a method for automatically detecting typos based on the user query logs of a domain question answering system. The method first performs word segmentation on the corpus and then merges the scattered fragments that appear in the segmentation result. Next, it clusters the multi-character words and the merged strings into groups of similar strings. Finally, it performs a statistical analysis of the contexts in which the similar strings occur and extracts typo pairs automatically. Experiments on actual question answering system logs show that the method achieves a recall of 71.32% and an accuracy of 82.6%.
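The four steps named in the abstract (segmentation, fragment merging, similar-string grouping, contextual statistics) can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure, the clustering procedure, or the statistics used, so everything concrete here is an assumption made for illustration: the function names, the "same length, one differing character" similarity test, the use of left and right neighbours as context, and the `min_ratio` and `min_shared_contexts` thresholds. The input is assumed to be already segmented, and fragment merging is done naively by joining every run of consecutive single-character tokens.

```python
from collections import Counter, defaultdict
from itertools import combinations


def merge_fragments(tokens):
    """Join each run of consecutive single-character tokens into one string."""
    merged, run = [], []
    for tok in tokens:
        if len(tok) == 1:
            run.append(tok)
            continue
        if run:
            merged.append("".join(run))
            run = []
        merged.append(tok)
    if run:
        merged.append("".join(run))
    return merged


def differs_by_one_char(a, b):
    """Assumed similarity test: equal length, exactly one differing character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def find_typo_pairs(segmented_queries, min_ratio=3.0, min_shared_contexts=1):
    """Return (suspected_typo, suspected_correction) pairs.

    Both thresholds are hypothetical and are not taken from the paper.
    """
    freq = Counter()
    contexts = defaultdict(set)
    for tokens in segmented_queries:
        padded = ["<s>"] + merge_fragments(tokens) + ["</s>"]
        for i in range(1, len(padded) - 1):
            tok = padded[i]
            if len(tok) < 2:  # only multi-character strings are candidates
                continue
            freq[tok] += 1
            contexts[tok].add((padded[i - 1], padded[i + 1]))

    pairs = []
    for a, b in combinations(sorted(freq), 2):
        if not differs_by_one_char(a, b):
            continue
        # Similar strings must also appear in at least one identical context.
        if len(contexts[a] & contexts[b]) < min_shared_contexts:
            continue
        # Treat the much rarer string as the typo of the more frequent one.
        wrong, right = (a, b) if freq[a] < freq[b] else (b, a)
        if freq[right] >= min_ratio * freq[wrong]:
            pairs.append((wrong, right))
    return pairs


# Toy query log: three correct queries and one where the misspelled word
# was split into single characters by the segmenter.
sample_log = [["糖尿病", "怎么", "治疗"]] * 3 + [["糖", "尿", "并", "怎么", "治疗"]]
print(find_typo_pairs(sample_log))  # [('糖尿并', '糖尿病')]
```

In this toy log the misspelled string only becomes a candidate after its single-character pieces are merged, which is why fragment merging precedes the similarity step. A realistic system would need a more careful merge, since genuine single-character words next to a fragment would otherwise be absorbed into it.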
Key words

automatic text proofreading / question answering system / non-word error / real-word error / typo pair

Cite this article

LIU Liangliang1,2, WANG Shi1, WANG Dongsheng1,2, WANG Pingze1,2, CAO Cungen1. Automatic Text Error Detection in Domain Question Answering. Journal of Chinese Information Processing. 2013, 27(3): 77-84

Funding

National Natural Science Foundation of China (60573063, 60573064, 60773059, 61035004); Key Project of the National Social Science Foundation of China (10AYY003)