Chronos: a System for Digitally Recapturing Content from Scanned Magazines
Abad, Jose; Burns, John; Faraboschi, Paolo; Ortega, Daniel; Sanchez, Jose Antonio; Yacoub, Sherif
HPL-2005-100
External - Copyright Consideration
Keyword(s): document analysis and recognition
Abstract: The conversion of large collections of paper-based documents into digital forms suitable for electronic archiving and digital libraries has recently attracted considerable interest from publishers, government offices and other sectors such as healthcare. To meet this demand, high-resolution images of the scanned paper documents must be analyzed to extract the meaningful information they contain, with an accuracy that fits the intended application. In such a process, and especially for large collections, both an automated document analysis system and a manual correction process are needed. The automated system must perform a set of analysis and recognition tasks accurately enough to minimize the manual correction effort downstream. This document gives an overview of the automated document analysis and understanding system, which we called Chronos, used at HPL to recapture 80 years of TIME Magazine. It covers the overall organization of the system, the IT infrastructure, the most important components and the content representation. We show the major phases of the processing pipeline, the high-level interactions among the components and some of the tools that were developed to monitor the quality of the system's results.
31 Pages