Text-mining based journal splitting
  Lin, Xiaofan
 HPL-2001-137R1
 Keyword(s): table of contents; OCR; journal splitting; text mining; text chunking; document understanding
 Abstract: This paper introduces a novel journal splitting algorithm. It takes full advantage of various kinds of information such as text match, layout and page numbers. The core procedure is a highly efficient text-mining algorithm, which detects the matched phrases between the content pages and the title pages of individual articles. Experiments show that this algorithm is robust and able to split a wide range of journals, magazines and books.
  5 Pages