Boost Your Performance Longer with Phase Change Material-Based Cooling
Phase Change Materials (PCMs) have been used as passive coolers to boost processor performance while maintaining safe temperatures. This article presents a technique that increases the benefits of PCM by monitoring the PCM state and using this information to utilize PCM more efficiently.
Maximizing performance while operating under safe temperatures is a major challenge for today’s computers. Due to the current trends in technology scaling, on-chip power density is increasing at a rate that exceeds the ability to remove heat. This implies that, even if the number of transistors per area increases with each generation, we cannot activate them all simultaneously while sustaining safe temperatures (also known as the dark silicon phenomenon). On the other hand, obeying thermal constraints is highly important, as high on-chip temperatures limit the energy efficiency and shorten processor lifetime significantly.
Recently, the placement of Phase Change Materials (PCMs) has been proposed as a passive cooling alternative. PCMs are compounds that can store a very large amount of heat at near-constant temperature during a phase transition, such as from solid to liquid. Owing to its heat storage capability, PCM delays the rise of temperature during high-activity periods. Thus, PCM has been used in cooperation with performance-boosting algorithms.
One of the major performance-boosting algorithms is computational sprinting, which refers to exceeding the thermal design power (i.e., TDP, the amount of power that the chip can dissipate while sustaining safe temperature) of the chip for short durations of high computational demand.[1,2] The main idea of sprinting is to utilize the dark cores for a short time and exploit the thread-level parallelism for performance speedup. Once the thermal limit is reached, the sprint ends and normal operation continues. When used together with computational sprinting, PCM extends the sprinting duration as it provides additional thermal headroom, leading to higher speedup.
Various sprinting policies have been proposed in recent work. However, there is unexplored potential for further improvement. In this article, we present a new sprinting strategy, Adaptive Sprinting, which improves the performance of multithreaded applications in systems with PCM-enhanced cooling. Adaptive Sprinting monitors the PCM state at runtime, and based on this information, it decides on the number of CPU cores to sprint with, the location of the sprinting cores, and their voltage/frequency (V/f) settings. By making “PCM-aware” decisions, Adaptive Sprinting utilizes the available PCM heat storage capacity more effectively and saves significant energy compared to the existing strategies.
PCM AS A PASSIVE COOLING ALTERNATIVE
Phase change materials have been widely used in microelectronics cooling for their thermal properties. Drawing upon the well-known analogy between electrical and thermal phenomena, we can think of PCM as acting like a very large thermal capacitor during phase transition. Similar to how electrical capacitors resist a change of voltage, PCM resists a change of temperature during the phase transition and hence provides a close-to-constant temperature. Likewise, PCM charges while melting from solid to liquid (storing heat) and discharges while freezing (releasing heat). PCM-based cooling is advantageous in that it is a passive cooling solution, meaning it does not require additional power for cooling. Moreover, it is very suitable for platforms in which the area dedicated to cooling is highly limited, such as mobile phones and tablets.
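To make the thermal capacitor analogy concrete, here is a minimal single-node sketch. It is not the validated PCM model used later in this article: it lumps the chip and PCM into one node, tracks stored heat with a simple enthalpy method, and uses explicit time stepping. The function names and every parameter value (thermal resistance, heat capacity, melting point, latent heat) are illustrative assumptions.

```python
def state_from_enthalpy(h_j, c_th, t_amb, t_melt, latent_j):
    """Map stored heat (J, relative to ambient) to (temperature in C, melted fraction)."""
    h_melt_start = c_th * (t_melt - t_amb)
    if h_j < h_melt_start:                    # solid: sensible heating only
        return t_amb + h_j / c_th, 0.0
    if h_j < h_melt_start + latent_j:         # melting: temperature plateaus
        return t_melt, (h_j - h_melt_start) / latent_j
    # fully melted: temperature starts rising again
    return t_melt + (h_j - h_melt_start - latent_j) / c_th, 1.0


def simulate(power_w, steps, dt=0.01, r_th=2.0, c_th=0.05,
             t_amb=45.0, t_melt=60.0, latent_j=2.0):
    """Return a list of (temperature, melted fraction), one entry per time step.

    All default values are made-up placeholders, not measured data.
    """
    h_j, trace = 0.0, []
    for _ in range(steps):
        temp, melted = state_from_enthalpy(h_j, c_th, t_amb, t_melt, latent_j)
        trace.append((temp, melted))
        # heat in from the chip minus heat leaving to ambient through r_th
        h_j += (power_w - (temp - t_amb) / r_th) * dt
    return trace


with_pcm = simulate(15.0, 60)
without_pcm = simulate(15.0, 60, latent_j=0.0)  # same node with no latent heat storage
for step in (5, 20, 59):
    print(step, round(with_pcm[step][0], 1), round(without_pcm[step][0], 1))
```

With latent heat available, the temperature holds at the melting point while the PCM "charges"; with `latent_j=0.0` the same node heats straight through that point. The melted fraction reported for the no-PCM run carries no meaning and is ignored here.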
The application of PCM to electronics cooling can be categorized into two main groups. The first group of work uses PCM as a processor package enhancer. Hybrid heat spreader or heatsink designs incorporating PCM have been shown to provide lower temperatures for applications with high peak loads. Moreover, PCM increases the energy efficiency of air-cooled systems that use fans.
Another group of work focuses on using PCM on chip (i.e., placed on top of the silicon layer) to aid the performance boosting strategies, such as computational sprinting, as mentioned earlier. In the context of sprinting, PCM allows longer sprints and this way, it helps achieve higher performance speedup. However, the heat storage capability of PCM is limited, that is, once the PCM finishes melting, its temperature will rise much faster. Before we can reuse it, PCM has to release the stored heat and freeze back. Therefore, when and at what rate we exhaust the available PCM capacity will strongly affect the benefit we get from sprinting. To this end, we present a novel sprinting strategy, Adaptive Sprinting, which utilizes the PCM capacity more efficiently to maximize its benefits.
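As a rough illustration of why the rate of PCM use matters, the sketch below estimates how long a sprint lasts before the latent heat budget is exhausted. It assumes a first-order model in which only the power above the sustainable level flows into the PCM; the function name and all numbers are hypothetical, and real behavior also depends on spatial effects that this estimate ignores.

```python
def sprint_duration_s(latent_budget_j, sprint_power_w, sustainable_power_w):
    """First-order estimate of the time until the PCM is fully melted."""
    excess_w = sprint_power_w - sustainable_power_w
    if excess_w <= 0:
        return float("inf")  # no excess heat, so the PCM is never exhausted
    return latent_budget_j / excess_w


# Hypothetical numbers: the same 10 J budget at two sprint intensities.
full_sprint = sprint_duration_s(10.0, sprint_power_w=30.0, sustainable_power_w=10.0)
paced_sprint = sprint_duration_s(10.0, sprint_power_w=15.0, sustainable_power_w=10.0)
print(full_sprint, paced_sprint)
```

Under these assumptions, halving the sprint power does more than double the sprint duration, because the budget is drained only by the excess over the sustainable power.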
ADAPTIVE SPRINTING TO MAXIMIZE PCM BENEFITS
Our Adaptive Sprinting technique is driven by observations that are not captured in prior techniques. Firstly, different parts of the PCM melt at different rates depending on the spatial heat distribution across the chip. For example, the center cores typically get hotter and force the center part of the PCM to melt faster than the part above the side cores. Similarly, caches consume much less power than cores and stay cooler, as illustrated in Figure 1. Thus, when a center core exhausts its corresponding PCM capacity and hits a temperature limit, the side cores may still have available PCM capacity (i.e., unmelted PCM above them) to continue sprinting. However, prior techniques assume that PCM melts uniformly and follow an all-or-nothing approach: they either sprint by activating all the cores or do not sprint at all, switching to idle mode or single-core operation. This approach wastes the as-yet-unused PCM capacity and leads to suboptimal performance. Secondly, the power consumption during sprinting is not a fixed value; it is application dependent and may change over time based on the phase of an application. Assuming fixed power consumption during sprinting leads to reduced gains for applications that consume lower power. Thirdly, if we monitor the remaining unmelted PCM at various locations of the chip, we can utilize the PCM storage capability much more efficiently and extend the sprinting duration.
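The first and third observations can be illustrated with a small per-core bookkeeping sketch. It assumes that the heat flowing into the PCM above a core is proportional to the difference between the core's sensor reading and the melting point, divided by a thermal resistance. This is a simplified stand-in rather than the PCM monitor from our prior work, and `PcmCell`, `update_remaining`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PcmCell:
    capacity_j: float   # latent heat the PCM above this core can absorb
    remaining_j: float  # latent heat capacity still unused (unmelted PCM)


def update_remaining(cell, core_temp_c, dt_s, t_melt_c=60.0, r_th=1.5):
    """Integrate the estimated heat flow into (or out of) the PCM above one core."""
    # Positive flow melts PCM; negative flow lets it freeze back.
    heat_flow_w = (core_temp_c - t_melt_c) / r_th
    remaining = cell.remaining_j - heat_flow_w * dt_s
    cell.remaining_j = min(cell.capacity_j, max(0.0, remaining))
    return cell.remaining_j


# Hypothetical readings: a hot center core and a cooler side core.
center, side = PcmCell(12.0, 12.0), PcmCell(12.0, 12.0)
update_remaining(center, core_temp_c=75.0, dt_s=1.0)
update_remaining(side, core_temp_c=66.0, dt_s=1.0)
print(center.remaining_j, side.remaining_j)
```

In this made-up scenario the center cell drains faster than the side cell, which is the non-uniform melting that an all-or-nothing policy cannot exploit.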
The Adaptive Sprinting strategy exploits those observations to operate in sprinting mode as long as possible. By monitoring PCM state, it determines how much sprinting capability is left for each core. Based on this information, the policy decides on the number, the location, and the V/f settings of the cores at runtime. We change the number of sprinting cores by applying thread packing (unpacking), which refers to binding the threads of an application to a lower (higher) number of cores. This way, when a core uses up its PCM capacity and reaches the thermal limit, we can still continue sprinting with the remaining cores, instead of switching to single-core operation.
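Thread packing and unpacking can be sketched as a mapping from threads to the cores that are currently allowed to sprint. The round-robin assignment below is an assumption chosen for brevity; the article does not prescribe a particular assignment, and `pack_threads` is a hypothetical helper.

```python
def pack_threads(thread_ids, cores):
    """Bind each thread to one of the given cores, round-robin."""
    cores = sorted(cores)
    if not cores:
        raise ValueError("at least one core is required")
    return {tid: cores[i % len(cores)] for i, tid in enumerate(thread_ids)}


threads = [0, 1, 2, 3]
unpacked = pack_threads(threads, {0, 1, 2, 3})  # one thread per core
packed = pack_threads(threads, {0, 3})          # same threads on two cores
print(unpacked)
print(packed)
```

On a Linux system, such a mapping could be applied with a CPU affinity call such as Python's `os.sched_setaffinity`; that deployment detail is an assumption and is not part of the evaluation described in this article.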
Figure 2 summarizes the flow of the Adaptive Sprinting policy. Sprinting cores (S-cores) are the currently active cores. Available cores (A-cores) are the cores that have some remaining unmelted PCM above them and whose temperature is below the critical temperature. A-cores can be active or idle. Our policy monitors the temperature of each core and the remaining PCM capacity above it. If a core hits a temperature threshold or if its remaining unmelted PCM falls to zero, a warning is raised. In case of a warning, the policy checks the number of available cores. Depending on that number, it either migrates the threads to A-cores or continues sprinting with fewer cores by packing the threads onto A-cores. Over time, the PCM freezes back above the idle cores. In that case, the policy unpacks the threads and sprinting continues with more cores. To decide on the V/f setting of the cores, we follow an offline analysis approach. We generate a lookup table, which holds the most efficient V/f setting for a given number of S-cores, and the policy polls this table at runtime. The remaining unmelted PCM can be tracked by using thermal sensors and thermal resistance values. We describe the details of how such a soft PCM monitor can be implemented in real life and in a simulation environment in our prior work.[3,4]
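The decision flow described above can be sketched as a single policy step. This is one reading of the prose, not a transcription of Figure 2: the function name, the rule for choosing among idle A-cores, the behavior when no A-core is left, and the lookup table entries are all assumptions made for illustration.

```python
# Hypothetical lookup table: number of S-cores -> V/f setting (GHz shown as a label).
VF_TABLE = {1: 2.0, 2: 2.0, 3: 1.6, 4: 1.2}


def policy_step(s_cores, temps_c, remaining_pcm_j, n_threads, t_crit_c, vf_table):
    """Return (new set of S-cores, V/f setting) for the next interval."""
    # A-cores: below the critical temperature and with unmelted PCM left.
    a_cores = {c for c in temps_c
               if temps_c[c] < t_crit_c and remaining_pcm_j[c] > 0.0}
    new_s = set(s_cores) & a_cores           # S-cores that raised no warning
    wanted = min(n_threads, len(a_cores))
    for core in sorted(a_cores - set(s_cores)):
        if len(new_s) >= wanted:
            break
        new_s.add(core)                      # migrate or unpack onto idle A-cores
    # Assumption: with no A-cores left, nothing sprints until PCM freezes back.
    vf = vf_table[len(new_s)] if new_s else None
    return new_s, vf


temps = {0: 70.0, 1: 85.0, 2: 72.0, 3: 60.0}
pcm = {0: 5.0, 1: 0.0, 2: 3.0, 3: 10.0}
# Core 1 raises a warning; idle core 3 is available, so its threads migrate there.
migrated = policy_step({0, 1, 2}, temps, pcm, 4, 80.0, VF_TABLE)
# If core 3 has no PCM left either, the threads are packed onto cores 0 and 2.
packed_step = policy_step({0, 1, 2}, temps, {**pcm, 3: 0.0}, 4, 80.0, VF_TABLE)
print(migrated, packed_step)
```

The same step also covers unpacking: once the PCM above idle cores has frozen back and they cool down, they reappear in the A-core set and are added to the S-cores, up to the number of threads.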
STATE-OF-THE-ART SPRINTING TECHNIQUES
Before we dive into the evaluation of our technique, let’s first briefly consider the existing sprinting mechanisms from prior work. The truncated sprints strategy activates all cores of a system at the highest V/f level during sprinting. When any core reaches a thermal threshold, the sprint is truncated by putting all cores but one into idle mode. Upon sprint truncation, execution continues on a single core until the application finishes. We implement an improved version of this strategy, which allows re-sprinting when a portion of the PCM freezes back over time. Fixed duty cycle sprinting alternates between sprint (all cores active at the highest V/f) and rest (all cores idle) modes based on a predetermined duty cycle. The duty cycle is determined for the worst-case power consumption scenario; thus, this policy results in suboptimal performance for lower-power applications. Sprint pacing keeps all the cores active throughout the execution, but lowers the pace of the sprint when 50% of the PCM is melted. The pace is lowered by switching the V/f levels of the cores from the highest to the lowest level. It is not clear how the policy behaves if a thermal violation occurs during low-pace sprinting. Thus, we also implement a thermal-aware version of it, modified sprint pacing, which puts the cores into idle mode if a thermal violation occurs during a low-pace sprint. Lastly, we implement reactive DVFS, which represents the dynamic voltage and frequency scaling (DVFS) policies in current processors. If a core hits a temperature threshold, this policy decreases the V/f setting in steps, and after the core cools down it increases the setting again.
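For comparison, three of these baseline policies can be reduced to small decision functions. This is a simplified sketch of the descriptions above, not the implementations used in the evaluation. V/f levels are modeled as integer indices from 0 (lowest) to `top_level` (highest), and the function names and signatures are hypothetical.

```python
def fixed_duty_cycle(t_s, period_s, duty, n_cores, top_level):
    """Sprint with all cores for the first `duty` fraction of each period, then rest."""
    sprinting = (t_s % period_s) < duty * period_s
    return (n_cores, top_level) if sprinting else (0, None)


def sprint_pacing(melted_fraction, n_cores, top_level):
    """Keep all cores active; drop to the lowest V/f once half the PCM has melted."""
    return (n_cores, top_level if melted_fraction < 0.5 else 0)


def reactive_dvfs(level, temp_c, t_threshold_c, top_level):
    """Step the V/f level down at the threshold and back up after cooling."""
    if temp_c >= t_threshold_c:
        return max(0, level - 1)
    return min(top_level, level + 1)


print(fixed_duty_cycle(0.25, period_s=1.0, duty=0.3, n_cores=4, top_level=3))
print(sprint_pacing(0.6, n_cores=4, top_level=3))
print(reactive_dvfs(3, temp_c=82.0, t_threshold_c=80.0, top_level=3))
```

None of these functions looks at where the PCM has melted, which is the information Adaptive Sprinting uses to keep a subset of cores sprinting.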
EVALUATION OF ADAPTIVE SPRINTING
We evaluate the benefits of Adaptive Sprinting by comparing it against the prior sprinting strategies. For this purpose, we use a full system simulation framework that models the thermal behavior of PCM as well as the impact of sprinting strategies on the performance of applications. Our PCM thermal model was validated both against a computational fluid dynamics model and on a hardware/software testbed. We run a subset of benchmarks from the PARSEC multithreaded benchmark suite.
Figure 3 compares the running times of the individual benchmarks normalized to the no management case. In the no management case, no thermal management policy is applied and benchmarks run with all available cores at the highest V/f level, which represents ideal performance. As indicated in Figure 3, the truncated sprints and fixed duty cycle policies result in significantly longer running times (up to 4.2× the ideal). The main reason for the poor performance of these policies is that, by switching to single-core operation or idling all cores, they lose the benefit of thread-level parallelism.
Sprint pacing and Adaptive Sprinting appear to give similar performance, the best among the policies. However, as discussed earlier, sprint pacing is not thermally aware, and it results in temperature violations for up to 60% of the execution time. On the other hand, when compared to modified sprint pacing (the temperature-aware version of sprint pacing), Adaptive Sprinting gives 42% higher performance on average. Finally, Adaptive Sprinting provides 29% better performance than reactive DVFS without exceeding the temperature limit. The reason is that reactive DVFS tries to mitigate the temperature problem merely by applying DVFS without deactivating cores. However, in a system where a large portion of the cores are ‘dark cores’, even operating the cores at the lowest V/f level means exceeding the TDP. Eventually, reactive DVFS exhausts the thermal headroom and results in more frequent idling of the cores. In contrast, Adaptive Sprinting allows extended sprints with fewer cores; hence, it exploits some level of parallelism while avoiding complete idling of the cores.
We also analyze the resulting energy consumption and energy-delay product (EDP) for each sprinting policy. In Figure 4, we compare the energy and EDP for the thermally aware sprinting policies, averaged across the benchmarks and normalized to the no management case. As shown, Adaptive Sprinting reduces energy by 22% and EDP by 43% in comparison to the best-performing strategy.
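For readers unfamiliar with the metric, EDP is the product of the energy consumed and the running time, so it penalizes a policy that saves energy only by running much longer. The sketch below shows how energy, delay, and EDP are normalized to a baseline run; the function name is hypothetical and the input numbers are invented for illustration, not taken from Figure 4.

```python
def normalized_metrics(energy_j, runtime_s, base_energy_j, base_runtime_s):
    """Return (energy, delay, EDP), each normalized to the baseline run."""
    energy = energy_j / base_energy_j
    delay = runtime_s / base_runtime_s
    return energy, delay, energy * delay


# Made-up example: a policy that uses 20% less energy but runs 20% longer.
energy, delay, edp = normalized_metrics(80.0, 12.0, base_energy_j=100.0,
                                        base_runtime_s=10.0)
print(round(energy, 2), round(delay, 2), round(edp, 2))
```

Because the normalized EDP is the product of the two normalized terms, a 20% energy saving paired with a 20% slowdown leaves the EDP only slightly below the baseline.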
In this article, we have presented a novel sprinting strategy that improves the performance of multithreaded applications on systems with PCM-based cooling. The Adaptive Sprinting policy tracks the PCM state over time and uses this information for runtime sprinting decisions. Being “PCM-aware,” Adaptive Sprinting is able to utilize the available PCM heat storage capability much more efficiently, allow longer sprints, and provide significant performance gains. Experimental evaluation demonstrates a 29% performance improvement, 22% energy savings, and a 43% EDP reduction compared to the state-of-the-art sprinting schemes.
[1] A. Raghavan, Y. Luo, A. Chandawalla, M. Papaefthymiou, K. Pipe, T. Wenisch, and M. Martin, “Computational Sprinting,” International Symposium on High Performance Computer Architecture (HPCA), 2012.
[2] A. Raghavan, L. Emurian, L. Shao, M. Papaefthymiou, K. Pipe, T. Wenisch, and M. Martin, “Computational Sprinting on a Hardware/Software Testbed,” International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2013.
[3] F. Kaplan and A. Coskun, “Adaptive Sprinting: How to Get the Most Out of Phase Change Based Passive Cooling,” International Symposium on Low Power Electronics and Design (ISLPED), 2015.
[4] C. De Vivero, F. Kaplan, and A. Coskun, “Experimental Validation of a Detailed Phase Change Model on a Hardware Testbed,” International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems (InterPACK), 2015.
[5] F. Kaplan, C. De Vivero, S. Howes, M. Arora, H. Homayoun, W. Burleson, D. Tullsen, and A. Coskun, “Modeling and Analysis of Phase Change Materials for Efficient Thermal Management,” International Conference on Computer Design (ICCD), 2014.
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • NOVEMBER 2016 #316
Fulya Kaplan is a PhD candidate in the Electrical and Computer Engineering Department at Boston University. In 2011, she earned a BS degree in Electrical and Electronics Engineering from the Middle East Technical University in Turkey. Fulya’s research focuses on thermal modeling and runtime management in multicore processors and data centers, with a particular emphasis on advanced cooling techniques. She is a student member of the IEEE.