SUN Maosong1, LIU Ting2, JI Donghong3, SUI Zhifang4, ZHAO Jun5, ZHANG Bo1, WUSHOUER Silamu6, YU Shiwen4, ZHU Jun1, LI Jianmin1, LIU Yang1, WANG Houfeng4, TURGUN Ibrahim6, LIU Qun7, LIU Zhiyuan1
1. Department of Computer Science, Tsinghua University, Beijing 100084, China;
2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China;
3. School of Computer Science, Wuhan University, Wuhan, Hubei 430072, China;
4. School of Information Science and Technology, Peking University, Beijing 100871, China;
5. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
6. School of Information Science and Technology, Xinjiang University, Urumqi, Xinjiang 830046, China;
7. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
This paper surveys the research frontiers of language computing in the context of Web-scale text processing. It covers fundamental computational models, language analysis algorithms, linguistic resource construction, machine translation, content understanding, and question answering. Several key issues in these areas are discussed, along with their significance for Chinese information processing in the near future.
SUN Maosong, LIU Ting, JI Donghong, SUI Zhifang, ZHAO Jun, ZHANG Bo, WUSHOUER Silamu, YU Shiwen, ZHU Jun, LI Jianmin, LIU Yang, WANG Houfeng, TURGUN Ibrahim, LIU Qun, LIU Zhiyuan.
Frontiers of Language Computing. Journal of Chinese Information Processing. 2014, 28(1): 1-8