Creating Digital Libraries: Content Generation and Re-Mastering
Simske, Steven J.; Lin, Xiaofan
HPL-2003-259
Keyword(s): zoning analysis; quality assurances; TIFF; OCR; PDF; meta-algorithmics
Abstract: This paper has two main goals: to describe the automatic creation of a digital library and to provide an overview of the meta-algorithmic patterns that can be applied to increase the accuracy of its creation. Automating the creation of useful digital libraries, that is, digital libraries affording searchable text and reusable ("re-purposable") output, is a complicated process, whether the original library is paper-based or already available in electronic form. In this paper, we outline the steps involved in the creation of a deployable digital library (> 1.2 × 10^6 pages) for MIT Press, as well as its implications to other aspects of digital library creation, management, use and repurposing. Input, transformation, information extraction, and output processes are considered in light of their utility in creating layers of content. Interestingly, in some aspects, scanning directly from paper offers extra opportunities for error-checking through feedback-feedforward combination. Strategies for quality assurance (QA) at the document, chapter and book level are also discussed. We emphasize the use of meta-algorithmic design patterns for application towards improving the content generation, extraction and re-mastering. This approach also increases the ease with which modules and algorithms are added to and deprecated from the system.
Notes: Copyright IEEE. To be published in and presented at the International Workshop on Document Image Analysis for Libraries (DIAL'04), 23-24 January 2004, Palo Alto, California
16 Pages