Technical Reports
HPL-2010-164R1
Improving the efficiency of information collection and analysis in widely-used IT applications
 Blagodurov, Sergey; Arlitt, Martin
HP Laboratories
 
Keyword(s): efficiency, collection, analysis, DataSeries, Apache, Bro, Web server, Intrusion Detection System
Abstract: Modern IT environments collect and analyze increasingly large volumes of data for a growing number of purposes (e.g., automated management, security, regulatory compliance). Simultaneously, such environments are challenged by the need to minimize their environmental footprints. A general solution to this problem is to utilize IT resources more efficiently. The goals of this paper are to systematically evaluate the inefficiencies in the information collection and analysis of several widely used IT applications, to implement a more efficient solution, and to quantify the improvements. In particular, the logging of HTTP transactions by the Apache Web server and of network events by the Bro intrusion detection system is converted from text files to DataSeries [1]. The costs of recording, storing and analyzing the information in the different formats are thoroughly evaluated and compared. We converted the text logs to DataSeries online, with no discernible overhead on the logging applications. We achieved a 7x decrease in the logfile sizes relative to the default text logs, and speedups of almost 8x in analyzing the logfiles.
25 Pages
Additional Publication Information: A shorter version of the paper will appear in the 2nd ACM/SPEC International Conference on Performance Engineering (ICPE), March 14-16, 2011, Karlsruhe, Germany.
External Posting Date: January 6, 2011. Approved for External Publication.

Internal Posting Date: January 6, 2011
 


