HPL-2009-164

An Integrated Power, Area, and Timing Modeling Framework and its Application to Scaling and Clustering Tradeoffs in Future Manycore Architectures

Ahn, Jung Ho; Jouppi, Norm; Li, Sheng
HP Laboratories

Keyword(s): Modeling, Power, Area, Timing, Multicore

Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap, including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess subtle tradeoffs of different architectures using metrics like energy-delay-area product (EDAP). In this paper, we explore the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering brings interesting tradeoffs between area and performance: the crossbars needed to group cores into clusters incur area overhead, but many applications can make good use of clusters because of the synergies of cache sharing among the clustered cores. By incorporating the power, area, and timing results of McPAT into performance simulation at the 22nm technology node, we find that clusters of 4 cores give the best EDAP for multithreaded workloads and clusters of 1 core give the best EDAP for consolidated workloads, whereas clustering 8 cores together gives the best energy-delay product (EDP) on all workloads, because EDP does not take cost (area) into account.
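The abstract contrasts EDAP with energy-delay product (EDP), noting that the two metrics favor different cluster sizes because EDP ignores area. As a minimal sketch, assuming each metric is the plain product of its terms (EDP = energy × delay, EDAP = energy × delay × area, lower is better), the Python below shows how adding area to the metric can change which configuration wins. The `Config` type, the `best` helper, and every number in it are hypothetical illustrations, not McPAT output or results from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate design point (hypothetical inputs, not McPAT results)."""
    name: str
    energy_j: float   # total energy for the workload, in joules
    delay_s: float    # execution time, in seconds
    area_mm2: float   # die area, in mm^2 (a proxy for cost)


def edp(c: Config) -> float:
    """Energy-delay product: ignores area, so cost does not affect the ranking."""
    return c.energy_j * c.delay_s


def edap(c: Config) -> float:
    """Energy-delay-area product: EDP additionally weighted by die area."""
    return c.energy_j * c.delay_s * c.area_mm2


def best(configs, metric):
    """Return the configuration that minimizes the given metric (lower is better)."""
    return min(configs, key=metric)


if __name__ == "__main__":
    # Made-up numbers chosen only to illustrate the effect of the area term.
    candidates = [
        Config("1 core/cluster", energy_j=10.0, delay_s=2.0, area_mm2=100.0),
        Config("4 cores/cluster", energy_j=9.0, delay_s=1.8, area_mm2=105.0),
        Config("8 cores/cluster", energy_j=8.5, delay_s=1.7, area_mm2=130.0),
    ]
    print("best EDP: ", best(candidates, edp).name)
    print("best EDAP:", best(candidates, edap).name)
```

With these invented inputs, the 8-core cluster has the lowest EDP (14.45 versus 16.2 and 20.0), but its larger area pushes its EDAP (about 1878.5) above that of the 4-core cluster (about 1701), so the two metrics pick different designs. Multiplying by area is one simple way to fold cost into the metric; the real tradeoff depends on the modeled energy, delay, and area of each configuration.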

20 Pages

External Posting Date: July 21, 2009 [Abstract Only]. Approved for External Publication - External Copyright Consideration
Internal Posting Date: July 21, 2009 [Fulltext]