Power and thermal limitations make it impossible to run all cores on a multicore system at their maximum frequency. Therefore, modern systems require careful power management. These systems must manage complex tradeoffs between energy, power, and frequency, choosing which cores to accelerate to achieve good performance while maintaining energy efficiency or operating under a power budget. Navigating these tradeoffs is especially hard with multi-threaded applications, where performance depends on the relative progress of parallel worker threads between synchronization points. Prior work on chip-level power management for multi-threaded applications has largely relied on indirect heuristics and metrics calculated from low-level performance counters to estimate each thread’s progress. However, these indirect metrics are often inaccurate. Instead, we propose to gather progress information directly from software itself. We present ThreadBeats, a simple application-level annotation framework that directly and accurately conveys thread progress information to hardware. We design DVFS controllers that exploit ThreadBeats information for two purposes: (i) improving performance by equalizing thread progress and (ii) minimizing runtime under a power budget constraint. These controllers reduce wait time at barriers by 77% on average and improve energy-delay product under a power budget by 23% over prior work.
This paper described recent improvements to the Graphite simulator designed to help explore current and emerging research topics. With these improvements, Graphite is ideally suited to explore both power and performance in future multicore and manycore processors, especially those incorporating dynamic runtime monitoring and adaptation. Separate validation of Graphite has shown performance results within about 6% on average (18% worst case) of a cycle-level simulator and normalized power trends are predicted to within 10%. This makes Graphite accurate enough for medium- to long-term studies while maintaining very high performance. Graphite is freely available for anyone to use: https://github.com/mit-carbon/Graphite.
This paper presents a self-aware processor with energy monitoring circuits that can measure actual energy consumption of the key blocks. The monitors are embedded into on-chip DC/DC converters and generate results within 10% of accuracy with minimal power (<0.1%) and area (<1%) overhead. Our system, which is implemented in 0.18um technology, is designed to be voltage scalable from 1.8V down to 0.6V. Low-voltage SRAM operation is made possible through the use of 8T bit-cells and write-assists. The d-caches are designed to be re-configurable in associativity and size to adapt to compute- versus cache-bound phases of applications. Cache configuration is performed in < 3 clock cycles including tag invalidation. These hardware features enable a software self-aware computation engine (SEEC) to dynamically adapt the processor to meet performance and energy goals. Measurement results show that up to 8.4x energy savings can be achieved with DVFS and self-adaptation.
Henry Hoffmann, Jim Holt, George Kurian, Eric Lau, Martina Maggio, Jason E Miller, Sabrina M. Neuman, Mahmut Sinangil, Yildiz Sinangil, Anant Agarwal, Anantha P Chandrakasan, and Srinivas Devadas. 2012. “Self-aware computing in the Angstrom processor.” In Proceedings of the 49th Annual Design Automation Conference (DAC), Pp. 259–264. Full TextAbstract
Addressing the challenges of extreme scale computing requires holistic design of new programming models and systems that support those models. This paper discusses the Angstrom processor, which is designed to support a new Self-aware Computing (SEEC) model. In SEEC, applications explicitly state goals, while other systems components provide actions that the SEEC runtime system can use to meet those goals. Angstrom supports this model by exposing sensors and adaptations that traditionally would be managed independently by hardware. This exposure allows SEEC to coordinate hardware actions with actions specified by other parts of the system, and allows the SEEC runtime system to meet application goals while reducing costs (e.g., power consumption).