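Data-Oriented Parsing (DOP) was first proposed by Scha (1990). The approach embodies the assumption that human perception and production of language rely on concrete past language experiences rather than on abstract grammar rules. The DOP framework can be divided into three components: (1) building an annotated corpus that contains past, successfully analyzed language experiences; (2) extracting fragment units from the corpus to construct the analysis of a new utterance; (3) computing the probability of an analysis. A DOP model is built on a corpus covering a large number of linguistic phenomena, and it treats the annotated corpus as a grammar. Given a new input utterance, the system builds its analyses by combining fragment units from the corpus, and it uses the occurrence frequencies of all fragment units to estimate the most probable analysis. This paper discusses in detail corpus annotation, the definition of fragment units, combination parsing, and probability computation.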
Abstract
This paper presents a data-oriented syntactic parsing (DOP) technique. The data-oriented parsing (DOP) method was suggested by Scha in 1990. Data-oriented models of language processing embody the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract grammar rules. The data-oriented syntactic parsing framework can be described by three components: (1) building an annotated corpus consisting of representations of past language experiences; (2) extracting fragment units from the annotated corpus to construct the parse of a new utterance; (3) defining the way in which the probability of an analysis of a new utterance is computed. A data-oriented syntactic parsing model maintains a large corpus of linguistic representations of previously occurring utterances and uses the annotated corpus as a grammar. When a new input utterance is processed, analyses of this utterance are constructed by combining fragment units from the corpus; the occurrence frequencies of the fragment units are used to estimate which analysis is the most probable one. This paper discusses the annotated corpus, fragment units, combination parsing, and the probability model in detail.
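The three components above can be illustrated with a minimal Python sketch. It is not the implementation described in this paper: it assumes the relative-frequency estimate used in Bod's DOP formulation (a fragment's probability is its count divided by the total count of fragments with the same root label), and the two-sentence treebank, the tuple encoding of trees, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical toy treebank. A tree is a tuple (label, child, ...);
# a child is either a word (str) or another tree.
TREEBANK = [
    ("S", ("NP", "John"), ("VP", ("V", "likes"), ("NP", "Mary"))),
    ("S", ("NP", "Peter"), ("VP", ("V", "hates"), ("NP", "Susan"))),
]

def fragments_at(node):
    """All fragments rooted at `node`. Each nonterminal child is either
    cut off, leaving an open substitution site written (label,), or
    expanded by one of its own fragments. Words are always kept."""
    label, children = node[0], node[1:]
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])
        else:
            options.append([(child[0],)] + fragments_at(child))
    return [(label,) + combo for combo in product(*options)]

def all_fragments(tree):
    """Fragments rooted at every nonterminal node of `tree`."""
    frags = fragments_at(tree)
    for child in tree[1:]:
        if not isinstance(child, str):
            frags.extend(all_fragments(child))
    return frags

# Component (2): extract fragment units and count their occurrences.
counts = Counter(f for tree in TREEBANK for f in all_fragments(tree))

# Total number of fragment occurrences per root label.
root_totals = Counter()
for frag, n in counts.items():
    root_totals[frag[0]] += n

def fragment_prob(frag):
    """Assumed estimate: count(frag) / count of fragments sharing its root."""
    return Fraction(counts[frag], root_totals[frag[0]])

def derivation_prob(frags):
    """Component (3), for one derivation: product of fragment probabilities."""
    p = Fraction(1)
    for frag in frags:
        p *= fragment_prob(frag)
    return p

# One derivation of the unseen sentence "John hates Mary": a sentence
# fragment with an open V site, combined with the fragment V -> hates.
derivation = [
    ("S", ("NP", "John"), ("VP", ("V",), ("NP", "Mary"))),
    ("V", "hates"),
]
print(derivation_prob(derivation))
```

The sketch scores a single derivation only. Under the same assumption, the probability of an analysis would be the sum over all derivations that produce it, and enumerating fragments this way grows exponentially with tree size, so it is suitable only for toy corpora.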
Keywords
data-oriented parsing technique /
fragment unit /
combination parsing /
probability model
Key words
data-oriented syntactic parsing /
fragment unit /
combination parsing /
probability model