Technical Reports

HPL-2009-345

High Speed Raster Image Streaming For Digital Presses Using the Hadoop File System

Perry, Russell
HP Laboratories

Keyword(s): Print, VDP, Hadoop, Binary Integer Programming

Abstract: This report describes an application of the Hadoop distributed file system to very high rate variable data printing. The raster image processing of a large variable data document is represented as a MapReduce process. The key challenge addressed is how to stream the resulting raster images off the Hadoop file system to a digital press at multi-gigabit data rates. To achieve this, it helps to schedule efficiently the order in which the client reads file blocks. An approach to scheduling based on binary integer programming is described that generates more efficient schedules than a naive approach. The scheduling model allows the exploration of system design choices and helps to identify file block distributions that are problematic to read at high rates. Measured stream rates approaching 4 Gb/s were achieved, which is close to the rate required for streaming pages containing rich designs to a digital press. This required only a minor extension to the Hadoop client to allow file blocks to be read in parallel from the Hadoop data nodes.
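
The scheduling idea can be illustrated with a toy example. The sketch below is not the model from the report, whose formulation is not given in this abstract. It assumes a simplified objective (minimise the largest number of blocks served by any one data node), a hypothetical replica layout with made-up node names, and an exhaustive search standing in for a real binary integer programming solver. Each block must be read from exactly one of the nodes holding a replica, which is the binary choice being optimised.

```python
from itertools import product


def naive_schedule(replicas):
    """Read every block from its first listed replica (the naive approach)."""
    return [nodes[0] for nodes in replicas]


def max_load(schedule):
    """Largest number of blocks that any single data node must serve."""
    load = {}
    for node in schedule:
        load[node] = load.get(node, 0) + 1
    return max(load.values())


def balanced_schedule(replicas):
    """Pick exactly one replica per block so that the peak node load is minimal.

    This enumerates every binary assignment, which is only feasible for tiny
    inputs; a real system would hand the same constraints to a BIP solver.
    """
    best = None
    for choice in product(*replicas):
        if best is None or max_load(choice) < max_load(best):
            best = choice
    return list(best)


# Hypothetical layout: block i is replicated on the nodes in replicas[i].
replicas = [
    ["dn1", "dn2"],
    ["dn1", "dn3"],
    ["dn1", "dn2"],
    ["dn1", "dn3"],
]

naive = naive_schedule(replicas)
balanced = balanced_schedule(replicas)
print("naive peak load:", max_load(naive))
print("balanced peak load:", max_load(balanced))
```

In this layout the naive schedule sends every read to the same node, while the searched schedule spreads the reads so that parallel block reads from several data nodes can proceed at once. Layouts where no assignment lowers the peak load correspond to the problematic block distributions mentioned in the abstract.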

11 Pages

External Posting Date: October 21, 2009 [Fulltext]. Approved for External Publication
Internal Posting Date: October 21, 2009 [Fulltext]
