We develop models and algorithms for analyzing server workload data and recommending the best consolidation and workload assignment plan. These algorithms are integrated with system discovery and performance data collection software tools. Our objective is to develop a set of consolidation analysis tools for servers, storage, and applications that HP consultants can use to analyze a customer's IT environment and optimize it continuously for best efficiency.
Driven by the business focus on Total Cost of Ownership (TCO) and Return on Investment (ROI) for IT assets, fueled by rapid advances in hardware technology (which have delivered more powerful servers and storage systems at lower cost), and enabled by the emergence of virtualization software, IT consolidation has become a pressing concern for many companies.
We are developing models and algorithms that help IT consultants, administrators, and planners manage their IT infrastructure assets (servers and storage) more efficiently. Our models and algorithms have a number of features:
- Our algorithms are based on actual workload trace data collected over a period of time (typically a month) that captures the workload's time patterns (peaks and valleys). Unlike most existing capacity sizers, calculators, and "light-weight" consolidation tools, which simply perform capacity look-ups and translations between an existing system and a target system for a hypothetical workload (and usually for a single point in time), our tools are based on actual time series of workload metrics (such as CPU utilization, memory, disk I/O, and network usage).
- Our models and algorithms accept user inputs on consolidation constraints that represent the user's business/IT operating policies and priorities. In particular, we allow probabilistic guarantees and limits similar to SLAs (such as requiring that CPU utilization not exceed 60% with probability 0.99); a sketch of this check appears after this list. These flexible and realistic constraints give the user more "knobs" to fine-tune their consolidated IT environment.
- Our algorithms allow user designation of existing servers to "remove", "re-use" or "decide". If a server is marked as "decide", our algorithm will optimize its use and will recommend to the user whether the server should be removed or be re-used.
- Data quality assurance: Since our algorithms are based on actual workload data, we implement data quality assurance rules and models to verify, diagnose, and, in limited cases, repair the collected data before it is fed to our algorithms (see the data-quality sketch after this list). This prevents "garbage in, garbage out". Data assurance also alerts the user to potential server configuration and performance issues.
- Graphical visualization of server workloads, automatic report generation, and a host of other productivity aids for the user to better understand the current IT environment.
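As one illustration of how a probabilistic limit of the kind described above could be evaluated against workload traces, the following minimal Python sketch checks whether the combined CPU utilization of workloads placed on one target server stays at or below a limit with a required probability. The function name, the synthetic traces, and the 60%/0.99 values are illustrative assumptions, not part of the actual tools.

```python
# A minimal sketch of the probabilistic-constraint check. It assumes each workload's
# CPU demand has been collected as a time series (one sample per interval, already
# normalized to the target server's capacity); the limit and probability are illustrative.
import numpy as np

def satisfies_probabilistic_limit(traces, limit=0.60, probability=0.99):
    """Return True if the consolidated CPU utilization stays at or below
    `limit` in at least `probability` of the sampled intervals."""
    combined = np.sum(traces, axis=0)  # element-wise sum across co-located workloads
    # "Utilization <= limit with probability p" holds when the p-quantile
    # of the combined trace does not exceed the limit.
    return np.quantile(combined, probability) <= limit

# Hypothetical month of 5-minute samples for three workloads placed on one target.
rng = np.random.default_rng(0)
traces = rng.beta(2, 8, size=(3, 30 * 24 * 12)) * 0.4  # synthetic utilization traces
print(satisfies_probabilistic_limit(traces))
# True if the 0.99-quantile of the combined trace stays at or below 60%
```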
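The data-quality rules could, for example, include checks like the following minimal sketch, which validates a single utilization trace and repairs short gaps by interpolation. The specific thresholds, the pandas-based implementation, and the repair strategy are assumptions for illustration only.

```python
# A minimal sketch of data-quality rules applied to one workload trace before it is
# passed to the consolidation algorithm. Thresholds and the interpolation-based
# repair of short gaps are illustrative assumptions, not the tools' exact rules.
import pandas as pd

def check_and_repair(trace: pd.Series, max_gap: int = 3):
    """Validate a utilization trace (values in [0, 1]) and repair short gaps."""
    issues = []
    if trace.isna().any():
        issues.append(f"{trace.isna().sum()} missing samples")
    out_of_range = (trace < 0) | (trace > 1)
    if out_of_range.any():
        issues.append(f"{out_of_range.sum()} out-of-range samples")
        trace = trace.mask(out_of_range)  # treat out-of-range values as missing
    # Repair only short gaps; longer gaps are reported rather than guessed at.
    repaired = trace.interpolate(limit=max_gap, limit_direction="both")
    if repaired.isna().any():
        issues.append("gaps too long to repair; manual review needed")
    return repaired, issues

# Hypothetical 5-minute samples with a spike above 100% and a short gap.
trace = pd.Series([0.31, 0.28, None, 0.35, 1.40, 0.33])
repaired, issues = check_and_repair(trace)
print(issues)  # ['1 missing samples', '1 out-of-range samples']
```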
Our research on consolidation involves both practical issues (such as data-related issues) and theoretical modeling and algorithm design work. Currently we are actively pursuing research on consolidation at a larger scope: servers, storage, and applications. We are tackling practical problems in automated discovery, data collection, and monitoring. We also build and test application performance models to better predict application performance, thereby allowing us to optimize the servers and storage supporting the applications.
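As a purely illustrative example of what a simple application performance model might look like (not the models used in this work), the following sketch fits an M/M/1-style response-time relationship R = S / (1 - U) to hypothetical measurements and uses it to predict response time at a new utilization level.

```python
# A minimal sketch of one simple form an application performance model could take:
# predicting mean response time from CPU utilization via R = S / (1 - U), where the
# service time S is fit from measurements. Model form and data are illustrative.
import numpy as np

utilization = np.array([0.20, 0.35, 0.50, 0.65, 0.80])  # measured CPU utilization
response_ms = np.array([12.0, 15.5, 20.1, 28.6, 50.3])  # measured response times (ms)

# Estimate the service time S from R * (1 - U) = S, averaged over the samples.
service_ms = np.mean(response_ms * (1 - utilization))

def predicted_response(u):
    """Predicted response time (ms) at utilization u under the fitted model."""
    return service_ms / (1 - u)

print(round(predicted_response(0.70), 1))  # about 33 ms for this synthetic data
```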