Tashigyal, GAO Dingguo. A Study on the TEI Standard Annotation for Tibetan Corpus[J]. Journal of Chinese Information Processing, 2011, 25(4): 66-71.
A Study on the TEI Standard Annotation for Tibetan Corpus
Tashigyal1, GAO Dingguo2
1. Computer Science of Engineering School, Tibet University, Lhasa, Tibet 850000, China; 2. Tibetan Information Technology Research Center, Tibet University, Lhasa, Tibet 850000, China
Abstract: Large-scale processing of real text has become a hotspot in language information processing. Annotating the Tibetan corpus is very important for research on Chinese-Tibetan machine translation, information retrieval, text data mining, and dictionary compilation. To facilitate data exchange and sharing, this paper studies the adoption of TEI encoding for Tibetan corpus annotation, covering both text attribute information and structural information.
Key words: Tibetan; corpus; TEI mark