

Technical Reports



HP Labs


Quality Assurance for Document Understanding Systems

Yacoub, Sherif

HPL-2002-116

Keyword(s): quality assurance; document understanding; content remastering

Abstract: Document understanding is a field concerned with the semantic analysis of documents to extract human-understandable information and codify it into machine-readable form. Document understanding systems provide the means to automatically extract meaningful information from a raster image of a document and to create information-rich content that is usable in many end-user applications, such as search and retrieval. To process a large volume of data, such as the collection of books and journals produced by a publisher, such systems should run non-stop in an automated, unattended mode. Ensuring the quality of the output of such a system is challenging due to several factors, including the unattended nature of the system and the large amount of data (terabytes), which can give rise to a considerable number of exceptions. Automated quality assurance (QA) techniques are essential to the successful operation of a large-scale document understanding system. In this paper, we propose QA techniques that a document understanding system needs, along with ways to automate them.

19 Pages

© 2009 Hewlett-Packard Development Company, L.P.