Dirichlet Process and Its Applications in Natural Language Processing

XU Qian1, ZHOU Junsheng1,2, CHEN Jiajun1

Journal of Chinese Information Processing ›› 2009, Vol. 23 ›› Issue (5): 25-33.

Review

Abstract

The Dirichlet process is a well-known nonparametric Bayesian model whose key advantage is a flexible number of components, determined automatically from the model and the data rather than fixed in advance. In recent years it has become an active research topic in both the machine learning and natural language processing communities. This paper systematically reviews the origin and development of the Dirichlet process, focusing on methods for model computation, and analyzes in detail its application to concrete natural language processing problems. Finally, future research directions and development trends of the Dirichlet process are discussed.
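The "flexible number of components" described above can be made concrete through the Chinese restaurant process view of the Dirichlet process. The following is an illustrative sketch, not code from the paper: items are assigned to clusters one at a time, and the number of clusters emerges from the data and the concentration parameter alpha instead of being fixed beforehand.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Assign n items to clusters via the Chinese restaurant process."""
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items already in cluster k
    labels = []   # labels[i] = cluster index of item i
    for i in range(n):
        # Item i joins existing cluster k with prob counts[k] / (i + alpha),
        # or opens a new cluster with prob alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                labels.append(k)
                break
        else:
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels, counts

labels, counts = crp_partition(100, alpha=1.0)
print(len(counts))   # number of clusters that emerged from the data
print(counts)        # "rich-get-richer" cluster sizes
```

Raising alpha makes new clusters more likely, so the expected number of components grows with both alpha and the amount of data, which is exactly the nonparametric behavior the abstract highlights.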

Key words

computer application / Chinese information processing / nonparametric Bayesian model / Dirichlet process / Dirichlet process mixture model / Markov chain Monte Carlo

Cite this article

XU Qian, ZHOU Junsheng, CHEN Jiajun. Dirichlet Process and Its Applications in Natural Language Processing. Journal of Chinese Information Processing, 2009, 23(5): 25-33.


Funding

Supported by the National Natural Science Foundation of China (60673043), the National High-Tech R&D Program of China (863 Program) (2006AA010109), and the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (07KJB520057).