Technical Reports

HPL-2010-71

A Distributed Parallel Image Analysis Platform with MapReduce Integration

Liu, Keyan; Zhang, Tong; Wang, Lei; Wang, Qinlong; Ma, Yue
HP Laboratories

Abstract: This paper presents a novel distributed parallel platform for large-scale image analysis that supports tasks such as image feature extraction, clustering, and similarity matching. The platform targets large-scale image processing, which usually involves compute-intensive and data-intensive tasks. It is built upon a two-level distributed and parallel computing architecture, i.e., the multi-server level and the multi-core level, and a parallel scheduler is proposed to dynamically dispatch images to different servers and different cores. NFS (Network File System) is employed as the main storage system for images. A lightweight MapReduce scheme that can utilize computing resources on multi-core servers is implemented by extending Phoenix through MPI (Message Passing Interface). A face clustering application has been built on the platform to demonstrate its effectiveness and efficiency. Experiments with the parallel K-means algorithm also show that the platform can achieve significant speedup and good scalability. These results indicate that the platform framework is a good fit for large-scale image analysis.
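
The abstract gives no implementation details for the parallel K-means experiment, so the following is only a minimal sketch of how a K-means iteration can be phrased as a map step and a reduce step. It is not the platform's code and does not use Phoenix or MPI. The function names (`nearest`, `map_chunk`, `reduce_partials`, `kmeans`) and the `workers` and `iterations` parameters are hypothetical, and a thread pool merely stands in for the cores or servers that a scheduler would dispatch work to.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    # Index of the centroid with the smallest squared Euclidean distance.
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_chunk(chunk, centroids):
    # Map step: per-cluster partial sums and counts for one chunk of points.
    partial = {}
    for point in chunk:
        k = nearest(point, centroids)
        total, count = partial.get(k, ([0.0] * len(point), 0))
        partial[k] = ([t + p for t, p in zip(total, point)], count + 1)
    return partial


def reduce_partials(partials, centroids):
    # Reduce step: merge the partial sums, then average them per cluster.
    sums = {}
    for partial in partials:
        for k, (total, count) in partial.items():
            acc_total, acc_count = sums.get(k, ([0.0] * len(total), 0))
            sums[k] = ([a + t for a, t in zip(acc_total, total)], acc_count + count)
    updated = []
    for k, old in enumerate(centroids):
        if k in sums:
            total, count = sums[k]
            updated.append([t / count for t in total])
        else:
            # An empty cluster keeps its previous centroid (an assumption).
            updated.append(list(old))
    return updated


def kmeans(points, centroids, workers=2, iterations=10):
    # Split the points into one chunk per worker (hypothetical static split;
    # the report describes a dynamic scheduler instead).
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            partials = list(pool.map(lambda ch: map_chunk(ch, centroids), chunks))
            centroids = reduce_partials(partials, centroids)
    return centroids
```

Each worker only returns per-cluster sums and counts, so the data exchanged per iteration is proportional to the number of clusters rather than the number of points. In CPython the thread pool illustrates the structure only; real speedup would require processes or a distributed runtime.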

9 Pages

External Posting Date: June 21, 2010 [Abstract Only]. Approved for External Publication - External Copyright Consideration
Internal Posting Date: June 21, 2010 [Fulltext]
