Information Retrieval and Question Answering
KE Wenjun, GAO Jinhua, SHEN Huawei, LIU Yue, CHENG Xueqi
2020, 34(10): 76-84.
For domain-specific question answering (QA) systems, question retrieval via template matching has proven effective and stable. However, existing template extraction methods are usually supervised, so they depend heavily on manually annotated data and extend poorly to new domains. To address this issue, this paper proposes an unsupervised template extraction method based on an improved Apriori algorithm. Given a sample of question utterances, frequently occurring phrases are first extracted in order as the frame words of candidate templates. The information carried by each candidate template is measured via TF-IDF, and candidates with low information are filtered out. In particular, to allow longer templates, an adaptive updating mechanism for the support threshold is proposed. Finally, named entity recognition (NER) methods are adopted to locate slots, and question templates are obtained by combining frame words with their corresponding slots. Experimental results show that the method effectively extracts question templates for specific domains and outperforms the baseline models.
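The frame-word mining step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the support threshold is updated, so the sketch assumes a simple geometric decay per phrase length. The function name mine_frame_phrases and the parameters min_support, decay, and max_len are hypothetical, and the TF-IDF filtering and NER slot location steps are omitted.

```python
from collections import Counter


def mine_frame_phrases(questions, min_support=0.75, decay=0.8, max_len=3):
    """Apriori-style mining of frequent contiguous word sequences.

    Support is the fraction of questions containing a phrase. The threshold
    for length-k phrases is min_support * decay**(k - 1); this decay rule is
    an assumption standing in for the paper's adaptive update mechanism.
    """
    tokenized = [q.split() for q in questions]
    n = len(tokenized)
    # Lowest threshold any level will use. Pruning against this floor keeps
    # the Apriori property exact even though the threshold shrinks with k.
    floor = min_support * decay ** (max_len - 1)
    frequent = {}
    survivors = None  # phrases of length k-1 with support >= floor
    threshold = min_support
    for k in range(1, max_len + 1):
        counts = Counter()
        for tokens in tokenized:
            grams = set()  # count each phrase at most once per question
            for i in range(len(tokens) - k + 1):
                gram = tuple(tokens[i:i + k])
                # Apriori pruning: both (k-1)-word sub-phrases must survive.
                if survivors is not None and (
                    gram[:-1] not in survivors or gram[1:] not in survivors
                ):
                    continue
                grams.add(gram)
            counts.update(grams)
        support = {g: c / n for g, c in counts.items()}
        survivors = {g for g, s in support.items() if s >= floor}
        if not survivors:
            break
        frequent.update({g: s for g, s in support.items() if s >= threshold})
        threshold *= decay  # relax the threshold so longer phrases can pass
    return frequent
```

With the default parameters, a phrase such as "how do I" that appears in three of four sample questions is kept at every length, while a single word that appears in only two of them is rejected at length one yet can still contribute to a longer phrase once the threshold has relaxed.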