Language-model-based IR systems, proposed within the past five years, have brought the language modeling approach from speech recognition into the IR community and have effectively improved retrieval performance. However, the assumption underlying these methods, that all indexed words are independent of one another, does not hold in practice. Although the statistical MT approach alleviates this problem by taking synonymy into account, it cannot distinguish the different meanings of the same word in different contexts. In this paper we propose a trigger-language-model-based IR system to address this problem. First, we compute the association ratio between words from a training corpus and derive, for each query word, the set of words it triggers; this set is used to identify the word's actual meaning in a specific textual context. We then introduce the corresponding parameters into the document language model to form the trigger-language-model-based IR system. Experiments show that the trigger-language-model-based IR system substantially improves performance: compared with a classical language-model-based IR system, precision increases by almost 12% and recall by 10.8%.
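The abstract does not give the paper's formulas, so the Python sketch below only illustrates the general idea under stated assumptions. The association ratio is assumed to be the Church and Hanks measure log2(P(x, y) / (P(x) P(y))) over a co-occurrence window counted in both directions. The helpers `association_ratios`, `trigger_sets` and `score`, the parameters `window`, `top_k`, `lam` and `mu`, and the way the trigger component is mixed into a Jelinek-Mercer-smoothed document model are all hypothetical choices, not the authors' method. Setting `mu=0` recovers a classical query-likelihood model.

```python
import math
from collections import Counter, defaultdict

def association_ratios(docs, window=5, min_count=1):
    """Assumed Church-and-Hanks-style association ratio
    I(x, y) = log2(P(x, y) / (P(x) P(y))), estimated from co-occurrence of
    x and y within `window` tokens (both orders counted: an assumption)."""
    word_counts, pair_counts, total = Counter(), Counter(), 0
    for tokens in docs:
        word_counts.update(tokens)
        total += len(tokens)
        for i, x in enumerate(tokens):
            for y in tokens[i + 1 : i + 1 + window]:
                if y != x:
                    pair_counts[(x, y)] += 1
                    pair_counts[(y, x)] += 1
    # P(x,y) / (P(x) P(y)) simplifies to c(x,y) * N / (c(x) * c(y)); rare pairs skipped
    return {
        (x, y): math.log2(c * total / (word_counts[x] * word_counts[y]))
        for (x, y), c in pair_counts.items()
        if c >= min_count
    }

def trigger_sets(ratios, top_k=10, min_ratio=0.0):
    """Hypothetical helper: for each word, keep up to top_k words whose
    association ratio with it exceeds min_ratio (its 'triggered words')."""
    candidates = defaultdict(list)
    for (x, y), r in ratios.items():
        if r > min_ratio:
            candidates[x].append((r, y))
    return {x: [y for _, y in sorted(c, reverse=True)[:top_k]]
            for x, c in candidates.items()}

def score(query, doc, coll_counts, coll_len, triggers, lam=0.2, mu=0.3):
    """log P(Q | D) under a Jelinek-Mercer-smoothed unigram model plus an
    assumed trigger component weighted by mu (mu=0: classical model).
    Assumes a non-empty document."""
    if not (0 < lam and 0 <= mu and lam + mu <= 1):
        raise ValueError("need lam > 0, mu >= 0 and lam + mu <= 1")
    doc_counts, dlen = Counter(doc), len(doc)
    total = 0.0
    for q in query:
        p_coll = coll_counts[q] / coll_len
        if p_coll == 0:
            continue  # term unseen in the collection: ignored in this sketch
        p_ml = doc_counts[q] / dlen
        trig = triggers.get(q, [])
        # mean in-document probability of q's triggered words; falls back to p_ml
        p_trig = (sum(doc_counts[t] for t in trig) / (dlen * len(trig))
                  if trig else p_ml)
        p = (1 - lam - mu) * p_ml + mu * p_trig + lam * p_coll
        total += math.log(p)
    return total

# Toy usage: "bank" is ambiguous; the words triggered by "loan" favour d1.
train = [s.split() for s in ("bank loan interest rate",
                             "bank loan interest money",
                             "river bank water fish")]
d1 = "bank interest rate report".split()
d2 = "bank river water trip".split()
triggers = trigger_sets(association_ratios(train))
coll = Counter(w for doc in train + [d1, d2] for w in doc)
coll_len = sum(coll.values())
query = ["bank", "loan"]
print(score(query, d1, coll, coll_len, triggers, mu=0.0) ==
      score(query, d2, coll, coll_len, triggers, mu=0.0))  # True
print(score(query, d1, coll, coll_len, triggers) >
      score(query, d2, coll, coll_len, triggers))  # True
```

In this toy example the classical model (`mu=0`) cannot separate the two documents, because each contains "bank" once and neither contains "loan". The trigger component rewards d1 for containing "interest" and "rate", which co-occur with "loan" in the training text. This mirrors, in simplified form, the kind of context-sensitive disambiguation the abstract describes.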
ZHANG Jun-lin, QU Wei-min, SUN Le, SUN Yu-fang.
An Improved Language Model-based Chinese IR System. Journal of Chinese Information Processing, 2004, 18(2): 24-30, 44.
References
[1] J. Ponte and W. B. Croft. A Language Modeling Approach to Information Retrieval[A]. In: Proceedings of the 1998 ACM SIGIR Conference on Research and Development in Information Retrieval[C]. 1998, 275-281.
[2] D. H. Miller, T. Leek and R. Schwartz. A hidden Markov model information retrieval system[A]. In: Proceedings of the 1999 ACM SIGIR Conference on Research and Development in Information Retrieval[C]. 1999, 214-221.
[3] A. Berger and J. Lafferty. Information retrieval as statistical translation[A]. In: Proceedings of the 1999 ACM SIGIR Conference on Research and Development in Information Retrieval[C]. 1999, 222-229.
[4] T. Hofmann. Probabilistic latent semantic indexing[A]. In: Proceedings of the 1999 ACM SIGIR Conference on Research and Development in Information Retrieval[C]. 1999, 50-57.
[5] S. Deerwester, S. T. Dumais, et al. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
[6] M. Srikanth and R. Srihari. Biterm Language Models for Document Retrieval[A]. In: Proceedings of the 2002 ACM SIGIR Conference on Research and Development in Information Retrieval[C]. 2002.
[7] C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval[A]. In: Proceedings of the 2001 ACM SIGIR Conference on Research and Development in Information Retrieval[C]. 2001.
[8] Stanley F. Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling[R]. Harvard University, August 1998.
[9] NTCIR Workshop (research.nii.ac.jp/ntcir/index-en.html)[Z].