Abstract: "Imperial Collection of Four" is a classic and representative collection of ancient Chinese books, so the work of digitizing it will build experience that can be applied to other ancient books. This system is the pre-processing stage of a customized OCR system for the digitized publication of "Imperial Collection of Four". Its main function is to analyze and understand the page images scanned from the Collection, to separate the Chinese characters in them for recognition and statistics, and to extract the layout structure for rebuilding and publishing. The design combines top-down approaches with bottom-up ones, and automatic processing with manual correction. In application, the system has been used to process a large number of page images and has shown efficient and satisfactory performance. It provides a stable foundation for pre-processing and creates good conditions for the learning and recognition procedures of the recognition system.
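The abstract does not say which segmentation method the system uses. As an illustration only, the following minimal sketch shows one common top-down technique, projection-profile cutting, applied to a binarized page: the page is first cut into vertical text columns, and each column is then cut into character boxes. The names `runs` and `segment`, the list-of-lists image format, and the right-to-left column order are assumptions made for this example, not details taken from the system described.

```python
from typing import List, Tuple

# Assumed image format: a list of rows, 1 = ink pixel, 0 = background.
Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), half-open


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges where the profile is non-zero."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # a blank gap ends the run
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(page: Image) -> List[Box]:
    """Cut a page into text columns, then cut each column into character boxes.

    Columns are visited right to left (assumed reading order for
    vertically typeset Chinese text); characters go top to bottom.
    """
    if not page:
        return []
    width = len(page[0])
    # Ink count per pixel column; blank columns separate text columns.
    col_profile = [sum(row[x] for row in page) for x in range(width)]
    boxes: List[Box] = []
    for left, right in reversed(runs(col_profile)):
        # Ink count per pixel row inside this text column only.
        row_profile = [sum(row[left:right]) for row in page]
        for top, bottom in runs(row_profile):
            boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    demo = [
        [1, 1, 0, 1, 1],
        [1, 1, 0, 1, 1],
        [0, 0, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
    ]
    print(segment(demo))
```

A purely top-down cut like this splits any character whose strokes are separated by a blank row (for example 二), which is one reason a design might pair it with bottom-up merging of fragments and with manual correction, as the abstract describes.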