In my May 2014 Circuit Cellar article, “Data Centers in the Smart Grid” (Issue 286), I discussed the growing data center energy challenge and a novel potential solution that modulates data center power consumption based on the requests from the electricity provider. In the same article, I elaborated on how the data centers can provide “regulation service reserves” by tracking a dynamic power regulation signal broadcast by the independent service operator (ISO).
Demand-side provision of regulation service reserves is one of several capacity-reserve mechanisms gaining traction in US energy markets. Frequency control reserves and operating reserves are other examples. These reserves are similar to one another in that the demand side, such as a data center, modulates its power consumption in reaction to local measurements and/or to signals broadcast by the ISO. The time scale of modulation, however, differs by reserve type: modulation can be done in real time, every few seconds, or every few minutes.
In addition to the emerging mechanisms of providing capacity reserves in the grid, there are several other options for a data center to manage its electricity cost. For example, the data center operators can negotiate electricity pricing with the ISO such that the electricity cost is lower when the data center consumes power below a given peak value. In this scenario, the electricity cost is significantly higher if the center exceeds the given limit. “Peak shaving,” therefore, refers to actively controlling the peak power consumption using data center power-capping mechanisms. Other mechanisms of cost and capacity management include load shedding, referring to temporary load reduction in a data center, load shifting, which delays executing loads to a future time, and migration of a subset of loads to other facilities, if such an option is available.
All of these mechanisms require the data center to be able to dynamically cap its power within a tolerable error margin. Even in the absence of advanced cost management strategies, a data center generally needs to operate under a predetermined maximum power consumption level, as the electricity distribution infrastructure of the data center needs to be built accordingly.
Most data centers today run a diverse set of workloads (applications) at a given time. Therefore, an interesting sub-problem of the power capping problem is how to distribute a given total power cap efficiently among the computational, cooling, and other components in a data center. For example, if there are two types of applications running in a data center, should one give equal power caps to the servers running each of these applications, or should one favor one of the applications?
Even when the loads have the same level of urgency or priority, designating equal power to different types of loads does not always lead to efficient operation. This is because the power-performance trade-offs of applications vary significantly. One application may meet user quality-of-service (QoS) expectations or service level agreements (SLAs) while consuming less power compared to another application.
Another reason that makes the budgeting problem interesting is the temperature and cooling related heterogeneity among the servers in a data center. Even when servers in a data center are all of the same kind (which is rarely the case), their physical location in the data center, the heat recirculation effects (which refer to some of the heat output of servers being recirculated back into the center and affecting the thermal dynamics), and the heat transfer among the servers create differences in temperatures and cooling efficiencies of servers. Thus, while budgeting, one may want to dedicate larger power caps to servers that are more cooling-efficient.
As the computational units in a data center need to operate at safe temperatures below manufacturer-provided limits, the budgeting policy in the data center needs to make sure a sufficient power budget is saved for the cooling elements. On the other hand, if there is over-cooling, then the overall efficiency drops because there is a smaller power budget left for computing.
I refer to the problem of how to efficiently allocate power to each server and to the cooling units as the “power budgeting” problem. The rest of the article elaborates on how this problem can be formulated and solved in a practical scenario.
For distributing a total computational power budget in an application-aware manner, one needs to have an estimate of the relationship between server power and application performance. In my lab at Boston University, my students and I studied the relationship between application throughput and server power on a real-life system, and constructed empirical models that mimic this relationship.
Figure 1 demonstrates how the relationship between the instruction throughput and power consumption of a specific enterprise server changes depending on the application. Another interesting observation from this figure is that the performance of some applications saturates beyond a certain power value. In other words, even when a larger power budget is given to such an application by letting it run with more threads (or, in other cases, letting the processor operate at a higher speed), the application throughput does not improve further.
Figure 1: The plot demonstrates billions of instructions per second (BIPS) versus server power consumption as measured on an Oracle enterprise server containing two SPARC T3 processors.
Estimating the slope of the throughput-power curve and the potential performance saturation point helps make better power budgeting decisions. In my lab, we constructed a model that estimates the throughput given server power and hardware performance counter measurements. In addition, we analyzed the potential performance bottlenecks resulting from a high number of memory accesses and/or the limited number of software threads in the application. We were able to predict the saturation point for each application via a regression-based equation constructed based on this analysis. Predicting the maximum server power using this empirical modeling approach gave a mean error of 11 W for our 400-to-700-W enterprise server.
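To make the idea concrete, the sketch below fits a throughput-versus-power model of the kind described above. The linear fit, the saturating-cap form, and all numbers are illustrative assumptions; the actual model in our work used hardware performance counters and a regression-based saturation predictor, not this simplified form.

```python
# Hypothetical sketch: estimate application throughput (BIPS) from server
# power using a simple linear fit with a saturation cap. The coefficients,
# data points, and saturation power are illustrative, not measured values.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (pure Python)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

def make_throughput_model(power_samples, bips_samples, p_saturate):
    """Return a function mapping a server power cap to estimated BIPS.

    p_saturate is the predicted saturation power: beyond it, more power
    does not improve throughput (e.g., due to memory-bound behavior or
    a limited number of software threads).
    """
    a, b = fit_linear(power_samples, bips_samples)
    def estimate(p_cap):
        p_eff = min(p_cap, p_saturate)
        return a * p_eff + b
    return estimate

# Example calibration data for one application (illustrative numbers):
power = [400, 450, 500, 550, 600]
bips = [20.0, 24.0, 28.0, 32.0, 36.0]
model = make_throughput_model(power, bips, p_saturate=580)
print(round(model(500), 1))  # 28.0 -- within the linear region
print(round(model(650), 1))  # 34.4 -- capped at the saturation point
```

A budgeting policy can query such a per-application model to decide whether granting a server more power would actually buy more throughput.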
Such methods for power-performance estimations highlight the significance of telemetry-based empirical models for efficient characterization of future systems. The more detailed measurement capabilities newer computing systems can provide—such as the ability to measure power consumption of various sub-components of a server—the more accuracy one can achieve in constructing models to help with the data center management.
Temperature, Once Again
In several of my earlier articles this year, I emphasized the key role of temperature awareness in improving computing energy efficiency. This key role is a result of the high cost of cooling, the fact that server energy dynamics also depend substantially on temperature (consider, for example, the interactions among temperature, fan power, and leakage power), and the impact of processor thermal management policies on performance.
Solving the budgeting problem efficiently, therefore, relies on having good estimates for how a given power allocation among the servers and cooling units would affect the temperature. The first step is estimating the CPU temperature for a given server power cap. In my lab, we modeled the CPU temperature as a function of the CPU junction-to-air thermal resistance, CPU power, and the inlet temperature to the server. CPU thermal resistance is determined by the hardware and packaging choices, and can be characterized empirically. For a given total server power, CPU power can be estimated using performance counter measurements in a similar way to estimating the performance given a server cap, as described above (see Figure 1). Our simple empirical temperature model was able to estimate temperature with a mean error of 2.9°C in our experiments on an Oracle enterprise server.
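The first-step temperature estimate described above can be sketched in a few lines; the resistance and power values below are illustrative assumptions, not the characterized parameters of the article's server.

```python
# Minimal sketch of the steady-state CPU temperature estimate:
#   T_cpu = T_inlet + R_ja * P_cpu
# where R_ja is the junction-to-air thermal resistance (degC/W),
# characterized empirically for the server's hardware and packaging.
# The example values are illustrative, not measured parameters.

def estimate_cpu_temp(t_inlet_c, p_cpu_w, r_ja_c_per_w):
    """Estimate CPU temperature (degC) for a given CPU power and inlet temperature."""
    return t_inlet_c + r_ja_c_per_w * p_cpu_w

# e.g., 25 degC inlet air, 120 W CPU power, 0.35 degC/W thermal resistance
print(estimate_cpu_temp(25.0, 120.0, 0.35))  # 67.0
```

In practice, the CPU power fed into this model is itself estimated from performance counters for a given server-level cap, as described above.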
Heat distribution characteristics of a data center depend strongly on the cooling technology used. For example, traditional data centers use a hot aisle-cold aisle configuration, where the cold air from the computer room air conditioners (CRAC) and the hot air coming out of the servers are separated by the rows of racks that contain the servers. The second step in thermal estimation, therefore, has to do with estimating the impact of servers on one another and the overall impact of the cooling system.
In a traditional hot-cold aisle setting, the inlet server temperatures can be estimated based on a heat distribution matrix, the power consumption of all the servers, and the CRAC air temperature (the cold air input to the data center). The heat distribution matrix can be considered a lumped model representing the impact of heat recirculation and the air flow properties together in a single N × N matrix, where N is the number of servers.
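As a concrete illustration of this model, each server's inlet temperature is the CRAC supply temperature plus a recirculation term given by the heat distribution matrix applied to the vector of server powers. The matrix and power values below are made-up examples, not characterized data from a real facility.

```python
# Illustrative sketch of the inlet-temperature model for a hot-cold aisle
# layout. D is the N x N heat distribution matrix; D[i][j] captures how
# much of server j's power raises server i's inlet temperature. All
# numbers here are invented for illustration.

def inlet_temps(t_crac_c, D, server_powers_w):
    """T_inlet[i] = T_crac + sum_j D[i][j] * P[j] (pure-Python matvec)."""
    return [t_crac_c + sum(d_ij * p_j
                           for d_ij, p_j in zip(row, server_powers_w))
            for row in D]

# Three servers; the server modeled in row 2 ingests more recirculated
# heat (e.g., it sits near the end of an aisle), so its D entries are larger.
D = [
    [0.002, 0.001, 0.000],
    [0.001, 0.002, 0.001],
    [0.001, 0.002, 0.004],
]
power = [500.0, 550.0, 600.0]
print(inlet_temps(18.0, D, power))  # roughly [19.55, 20.2, 22.0]
```

Combined with the CPU temperature model above, this gives the budgeting policy an end-to-end estimate from power caps to CPU temperatures.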
Recently, some newer data centers have adopted in-row coolers that leverage liquid cooling to improve cooling efficiency. In such settings, the heat recirculation effects are expected to be less significant, as most of the heat output of the servers is immediately removed from the data center.
In my lab, my students and I used low-cost data center temperature models to enable fast dynamic decisions. Detailed thermal simulation of data centers is possible through computational fluid dynamics tools. Such tools, however, typically require prohibitively long simulation times.
What should the goal be during power budgeting? Maximizing overall throughput in the data center may seem like a reasonable goal. However, such a goal would favor allocating larger power caps to applications with higher throughput, and absolute throughput does not necessarily give an idea on whether the application QoS demand is met. For example, an application with a lower BIPS may have a stricter QoS target.
Consider, as an example of a better budgeting metric, the fair speedup metric: it computes the harmonic mean of per-server speedup, where per-server speedup is the ratio of measured BIPS to the maximum BIPS for an application. The purpose of this metric is to ensure none of the applications starve while maximizing overall throughput.
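The metric is simple to compute; in the sketch below, the BIPS numbers are illustrative, and the two budget splits show why the harmonic mean discourages starving any one application.

```python
# Sketch of the fair speedup metric: the harmonic mean of per-server
# speedups, where each speedup is measured BIPS divided by that
# application's maximum BIPS. Sample numbers are illustrative.
from statistics import harmonic_mean

def fair_speedup(measured_bips, max_bips):
    """Fair speedup across servers (harmonic mean of per-server speedups)."""
    speedups = [m / mx for m, mx in zip(measured_bips, max_bips)]
    return harmonic_mean(speedups)

# Two ways to split the same budget across two servers. The balanced
# split scores higher because the harmonic mean heavily penalizes the
# starved application in the unbalanced split.
print(round(fair_speedup([30.0, 10.0], [40.0, 40.0]), 3))  # 0.375
print(round(fair_speedup([20.0, 20.0], [40.0, 40.0]), 3))  # 0.5
```

Note that the arithmetic mean would score both splits identically (0.5), which is exactly why the harmonic mean is the better fairness measure here.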
It is also possible to impose constraints on the budgeting optimization such that a specific performance or throughput level is met for one or more of the applications. Ability to meet such constraints strongly relies on the ability to estimate the power-vs.-performance trends of the applications. Thus, empirical models I mentioned above are also essential for delivering more predictable performance to users.
Figure 2 demonstrates how the hill-climbing strategy my students and I designed for optimizing fair speedup evolves. The algorithm starts by setting the CRAC temperature to its last known optimal value, which is 20.6°C in this example. The CRAC power consumption corresponding to providing air input to the data center at 20.6°C can be computed using the relationship between CRAC temperature and the ratio of computing power to cooling power. This relationship can often be derived from datasheets for the CRAC units and/or for the data center cooling infrastructure.
Figure 2: The budgeting algorithm starts from the last known optimal CRAC temperature value, and then iteratively aims to improve on the objective.
Once the cooling power is subtracted from the overall cap, the algorithm allocates the remaining power among the servers with the objective of maximizing the fair speedup. Other constraints in the optimization formulation prevent any server from exceeding manufacturer-given redline temperatures and ensure each server receives a feasible power cap that falls between the server's minimum and maximum power consumption levels.
The algorithm then iteratively searches for a better solution, as demonstrated in steps 2 to 6 in Figure 2. Once the algorithm detects that the fair speedup is decreasing (e.g., the fair speedup in step 6 is less than the speedup in step 5), it converges to the solution computed in the previous step (e.g., it converges to step 5 in the example). Note that setting cooler CRAC temperatures typically implies a larger amount of cooling power, so the fair speedup drops. However, as the CRAC temperature increases beyond a point, the performance of the hottest servers is degraded to maintain CPU temperatures below the redline; thus, a further increase in the CRAC temperature is no longer useful (as in step 6).
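The outer loop of this search can be sketched as follows. The cooling model, the throttling penalty, and all numbers below are illustrative stand-ins: the inner `budget_and_score` stub takes the place of the actual constrained fair-speedup optimization (with redline-temperature and min/max power constraints) from our work.

```python
# Hedged sketch of the hill-climbing loop over CRAC temperature. The
# models below are invented so the objective is unimodal: cooler supply
# air costs more cooling power, while supply air hotter than a threshold
# forces the hottest servers to throttle.

def cooling_power(t_crac_c, total_cap_w):
    """Toy CRAC model: cooler supply air requires more cooling power."""
    return 0.02 * total_cap_w * (30.0 - t_crac_c)

def budget_and_score(t_crac_c, total_cap_w):
    """Stub for the per-step fair-speedup optimization at a fixed CRAC setting."""
    compute_budget = total_cap_w - cooling_power(t_crac_c, total_cap_w)
    # Above ~22 degC, hot servers throttle to stay under the redline:
    throttle_penalty = max(0.0, t_crac_c - 22.0) * 0.05
    return compute_budget / total_cap_w - throttle_penalty

def hill_climb(t_start_c, total_cap_w, step_c=0.5, max_iters=20):
    """Raise the CRAC temperature until the objective stops improving,
    then converge to the last improving setting (as in Figure 2)."""
    best_t = t_start_c
    best_score = budget_and_score(t_start_c, total_cap_w)
    for _ in range(max_iters):
        t_next = best_t + step_c
        score = budget_and_score(t_next, total_cap_w)
        if score <= best_score:
            break  # objective decreased: keep the previous step's solution
        best_t, best_score = t_next, score
    return best_t, best_score

t_opt, score = hill_climb(20.6, total_cap_w=100_000.0)
print(round(t_opt, 1))  # 22.1 -- where raising CRAC temp stops paying off
```

With these toy models, the loop walks the CRAC setting up from 20.6°C and stops once the throttling penalty outweighs the cooling savings, mirroring the convergence behavior in Figure 2.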
In our experiments, this iterative algorithm took less than a second of running time using Matlab CVX for a small data center of 1,000 servers, running on an average desktop computer. This result indicates that the algorithm could run in much shorter time with an optimized implementation, allowing frequent real-time re-budgeting of power in a modern data center with a larger number of servers. Our algorithm improved fair speedup and BIPS per watt by 10% to 20% compared to existing budgeting techniques.
The initial methods and results I discussed above demonstrate promising energy efficiency improvements; however, there are many open problems for data center power budgeting.
First, the above discussion does not consider loads with dependences on each other. For example, high-performance computing applications often have heavy communication among server nodes. This means that the budgeting method needs to account for the impact of inter-node communication both in performance estimates and in job allocation decisions in data centers.
Second, especially for data centers with a non-negligible amount of heat recirculation, thermally-aware job allocation significantly affects CPU temperature. Thus, job allocation should be optimized together with budgeting.
In data centers, there are elements other than the servers that consume significant amounts of power, such as storage units. In addition, data centers typically contain a heterogeneous set of servers. Thus, a challenge lies in budgeting power among heterogeneous computing, storage, and networking elements.
Finally, the discussion above focuses on budgeting a total power cap among servers that are actively running applications. One can, however, also adjust the number of servers actively serving the incoming loads (by putting some servers into sleep mode/turning them off) and also consolidate the loads if desired. Consolidation often decreases performance predictability. The server provisioning problem needs to be solved in concert with the budgeting problem, taking the additional overheads into account. I believe all these challenges make the budgeting problem an interesting research problem for future data centers.
Ayse K. Coskun ([email protected]) is an assistant professor in the Electrical and Computer Engineering Department at Boston University. She received MS and PhD degrees in Computer Science and Engineering from the University of California, San Diego. Coskun's research interests include temperature and energy management, 3-D stack architectures, computer architecture, and embedded systems. She worked at Sun Microsystems (now Oracle) in San Diego, CA, prior to her current position at BU. Coskun serves as an associate editor of the IEEE Embedded Systems Letters.
 O. Tuncer, K. Vaidyanathan, K. Gross, and A. K. Coskun, “CoolBudget: Data Center Power Budgeting with Workload and Cooling Asymmetry Awareness,” in Proceedings of IEEE International Conference on Computer Design (ICCD), October 2014.
Q. Tang, T. Mukherjee, S. K. S. Gupta, and P. Cayton, "Sensor-Based Fast Thermal Evaluation Model for Energy-Efficient High-Performance Datacenters," in ICISIP-06, October 2006.
 J. Moore, J. Chase, P. Ranganathan, and R. Sharma, “Making Scheduling ‘Cool’: Temperature-Aware Workload Placement in Data Centers,” in USENIX ATC-05, 2005.
 CVX Research, “CVX: Matlab Software for Disciplined Convex Programming,” Version 2.1, September 2014, http://cvxr.com/cvx/.