Technical Reports
HPL-2008-30R2
Efficient Detection of Large Scale Redundancy in Enterprise File Systems
Forman, George; Eshghi, Kave; Suermondt, Jaap
HP Laboratories
Keyword(s): data mining, min-hashing, set sketches, directory similarity and deduplication, file systems, scalability, storage management.
Abstract: To catch and reduce waste amid the exponentially growing demand for disk storage, we have developed a technology based on set sketches that enables enterprise storage managers to efficiently detect approximate duplication of large directory hierarchies, e.g. unnecessary mirroring by uncoordinated employees or departments. Identifying these duplicate or near-duplicate hierarchies allows appropriate action to be taken at a high level, e.g. coordinating and consolidating multiple copies in one location.
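The set-sketch idea named in the abstract and keywords can be illustrated with a minimal min-hashing sketch. It is not the authors' implementation. It assumes each directory is represented as a non-empty set of strings (for example relative file names or content hashes), and it uses salted BLAKE2b as a stand-in for a family of independent hash functions. The function names `minhash_sketch`, `estimate_similarity`, and `exact_jaccard` are hypothetical. The fraction of positions where two sketches agree estimates the Jaccard similarity of the underlying sets, so directories can be compared through small fixed-size sketches rather than through their full file lists.

```python
import hashlib


def minhash_sketch(items, k=128):
    """Return a k-element min-hash sketch of a non-empty set of strings.

    Assumption: hash function i is simulated by BLAKE2b salted with i,
    a stand-in for k independent hash functions.
    """
    if not items:
        raise ValueError("cannot sketch an empty set")
    sketch = []
    for i in range(k):
        salt = i.to_bytes(8, "little")  # BLAKE2b accepts salts up to 16 bytes
        # Keep the minimum 64-bit hash value of the set under function i.
        sketch.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return sketch


def estimate_similarity(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching sketch slots."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity |A & B| / |A | B|, for comparison."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Hypothetical directories: the second shares 80 of its 100 files with the first.
    dir_a = {f"file{i}.txt" for i in range(100)}
    dir_b = {f"file{i}.txt" for i in range(20, 120)}
    est = estimate_similarity(minhash_sketch(dir_a, k=256), minhash_sketch(dir_b, k=256))
    print("exact:", exact_jaccard(dir_a, dir_b))  # 80 shared / 120 total
    print("estimate:", est)
```

With k slots, the estimate's standard error is roughly sqrt(J(1 - J)/k), so a few hundred slots are typically enough to flag near-duplicate hierarchies for closer inspection.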
8 Pages
Additional Publication Information: To be published in Operating Systems Review (journal), January 2009, vol. 31(1)
External Posting Date: December 18, 2008 [Fulltext]. Approved for External Publication
Internal Posting Date: December 18, 2008 [Fulltext]