
HPL-2009-323

DataSeries: An Efficient, Flexible Data Format for Structured Serial Data

Anderson, Eric; Arlitt, Martin; Morrey III, Charles B.; Veitch, Alistair
HP Laboratories

Keyword(s): data series, data format, structured serial data, data sets

Abstract: Structured serial data is used in many scientific fields; such data sets consist of a series of records, and are typically written once, read many times, chronologically ordered, and read sequentially. In this paper we introduce DataSeries, an on-disk format, run-time library and set of tools for storing and analyzing structured serial data. We identify six key properties of a system to store and analyze this type of data, and describe how DataSeries was designed to provide these properties. We quantify the benefits of DataSeries through several experiments. In particular, we demonstrate that DataSeries exceeds the performance of common trace formats by at least a factor of two.
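
The abstract defines structured serial data as a series of records that are written once, read many times, chronologically ordered, and read sequentially. The Python sketch that follows illustrates only that access pattern. It is not the DataSeries on-disk format or its run-time library API. The record schema (`IORecord`, with a timestamp, an offset, and a byte count) and the helper names `write_series` and `read_series` are hypothetical. They were chosen to resemble a simple I/O trace.

```python
import struct
from typing import Iterable, Iterator, NamedTuple


# Hypothetical fixed-schema record; NOT the DataSeries format.
class IORecord(NamedTuple):
    timestamp_ns: int  # event time, used for chronological ordering
    offset: int        # byte offset of the I/O
    nbytes: int        # size of the I/O in bytes


# Little-endian, no padding: int64, int64, int32 -> 20 bytes per record.
_RECORD = struct.Struct("<qqi")


def write_series(records: Iterable[IORecord]) -> bytes:
    """Serialize records once, rejecting any that break chronological order."""
    out = bytearray()
    last_ts = None
    for rec in records:
        if last_ts is not None and rec.timestamp_ns < last_ts:
            raise ValueError("records must be chronologically ordered")
        last_ts = rec.timestamp_ns
        out += _RECORD.pack(*rec)
    return bytes(out)


def read_series(buf: bytes) -> Iterator[IORecord]:
    """Read records sequentially from front to back; may be called many times."""
    for fields in _RECORD.iter_unpack(buf):
        yield IORecord(*fields)
```

For example, `data = write_series([IORecord(100, 0, 4096), IORecord(250, 4096, 8192)])` produces 40 bytes, and `sum(r.nbytes for r in read_series(data))` returns 12288. Because every record has the same fixed layout, a reader can make one linear pass over the buffer without any per-record parsing decisions. This is the kind of write-once, sequential-read workload the abstract describes.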

6 Pages

Additional Publication Information: Published in ACM SIGOPS Operating Systems Review (OSR), HPL Special Issue, January 2009, Volume 43, Issue 1.

External Posting Date: September 21, 2009 [Fulltext]. Approved for External Publication
Internal Posting Date: September 21, 2009 [Fulltext]