Abstract: For storyline analysis of Chinese-Vietnamese bilingual news events, a generative storyline model is proposed based on the co-occurrence distribution of global/local word pairs. First, the word distribution of the detected news topic is used as the set of global words that characterize the event as a whole. Next, the news is divided into segments at a chosen time granularity, and the event elements in each segment, such as time, person, and place, are used as local words. The co-occurrence of global and local words is then analyzed and used as supervised information; together with the RCRP algorithm and bilingual aligned words, it is integrated into a bilingual topic model to obtain the sub-topic distribution for each time slice. Finally, a generative storyline model is constructed in which the sub-topic distributions represent the development of the event. Comparative storyline-generation experiments on a mixed Chinese-Vietnamese news set crawled from the Internet show that the proposed bilingual news storyline model performs better than the other methods.
Key words: Chinese-Vietnamese; news event storyline; global/local co-occurrence words; sub-topic distribution; bilingual topic model
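The co-occurrence step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `cooccurrence_by_slice`, the segment layout (time slice, tokens, local words), and the toy data are all assumptions made for illustration, and the sketch only counts global/local word pairs per time slice without any topic modeling.

```python
from collections import Counter, defaultdict


def cooccurrence_by_slice(segments, global_words):
    """Count global/local word pairs per time slice.

    segments: iterable of (slice_id, tokens, local_words) tuples, where
        tokens are the words of one news segment and local_words are its
        event elements (time, person, place, ...). This layout is an
        assumption for illustration, not taken from the paper.
    global_words: words from the detected topic's word distribution.
    """
    global_set = set(global_words)
    counts = defaultdict(Counter)
    for slice_id, tokens, local_words in segments:
        # Global words that actually appear in this segment.
        present = global_set & set(tokens)
        for g in present:
            for loc in set(local_words):
                counts[slice_id][(g, loc)] += 1
    return counts


# Hypothetical toy data: two time slices, three segments.
segments = [
    ("slice-1", ["earthquake", "rescue", "team"], ["Hanoi", "Monday"]),
    ("slice-1", ["earthquake", "damage"], ["Hanoi"]),
    ("slice-2", ["rescue", "aid"], ["Kunming"]),
]
counts = cooccurrence_by_slice(segments, ["earthquake", "rescue"])
```

In this toy input, the pair ("earthquake", "Hanoi") is counted twice in `slice-1` because both segments of that slice contain it. Per-slice pair counts of this kind are the sort of signal the abstract describes feeding into the bilingual topic model as supervised information.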