Datacenter power consumption has become a major concern in recent years, as utilities struggle to keep up with growing demand and operators are forced to seek alternative means to keep the lights on.
According to Uptime Institute, curbing energy consumption – and by extension lowering operating costs – could be as simple as flipping the switch on any one of the performance- and power-management mechanisms built into modern systems.
We’re not talking about a trivial amount of power either. In a blog post this week, Uptime analyst Daniel Bizo wrote that simply enabling OS-level governors and power profiles could result in a 25 to 50 percent reduction in energy consumption. Scaled across a whole datacenter, those savings add up pretty quickly.
Additionally, enabling processor C-states can lead to a nearly 20 percent reduction in idle power consumption. In a nutshell, C-states dictate which aspects of the chip can be turned off during idle periods.
The problem, according to Bizo, is that these features are disabled by default on most server platforms today, and enabling them is often associated with performance instability and added latency.
That’s because whether you’re talking about C- or P-states, the transition from a low-performance state like P6 to full power at P0 takes time. For some workloads, that can have a negative effect on observed performance.
However, Bizo argues that outside of a select few latency-sensitive workloads – like technical computing, financial transactions, high-speed analytics, and real-time operating systems – enabling these features will have negligible, if any, impact on performance while offering a substantial reduction in power consumption.
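To get a feel for the wake-up cost being discussed, here is a minimal sketch that lists the idle states (C-states) a Linux host exposes, along with the exit latency the kernel reports for each. It assumes the standard Linux cpuidle sysfs layout under /sys/devices/system/cpu/cpu0/cpuidle; the function name list_idle_states is ours, and nothing here comes from Uptime's report.

```python
from pathlib import Path

# Standard Linux cpuidle location (an assumption about the host);
# other operating systems expose this information differently.
CPUIDLE_ROOT = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def list_idle_states(root=CPUIDLE_ROOT):
    """Return (name, exit_latency_us, disabled) for each idle state under root."""
    states = []
    # Sort numerically so state10 does not land before state2.
    state_dirs = sorted(root.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        latency_us = int((state_dir / "latency").read_text())
        disable_file = state_dir / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((name, latency_us, disabled))
    return states


if __name__ == "__main__":
    if CPUIDLE_ROOT.is_dir():
        for name, latency_us, disabled in list_idle_states():
            flag = "disabled" if disabled else "enabled"
            print(f"{name:10s} exit latency {latency_us:>6d} us  ({flag})")
    else:
        print("cpuidle sysfs directory not found on this system")
```

Deeper states generally report longer exit latencies, which is the trade-off behind the latency concerns above. Whether any states show up at all depends on firmware settings, which is where many servers ship with them switched off.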
Do you really need all that perf anyway?
Uptime’s argument is rooted in the belief that modern chips are capable of delivering far more performance than is required to maintain an acceptable quality of service.
“If a second for a database query is still within tolerance, there is, by definition, limited value to having a response under one tenth of a second just because the server can process a query that fast when loads are light. And, yet, it happens all the time,” Bizo wrote.
Citing benchmark data published by Standard Performance Evaluation Corp. and The Green Grid, Uptime reports that modern servers typically achieve their best energy efficiency when their performance is limited to something like P2.
Making matters more difficult, over-performance isn’t something that’s typically tracked, even though there are numerous tools out there for maintaining SLAs and QoS.
There’s an argument to be made that the faster the computation is completed, the lower the energy consumption will be. For example, using 500 watts to complete a task in a minute will require less energy as a whole than consuming 300 watts for two minutes.
However, Bizo points out, the gains aren’t always that clear cut. “The energy consumption curve for semiconductors gets steeper the closer the chip pushes to the top of its performance envelope.”
In other words, there’s often a point of diminishing returns, after which you’re burning more power for minimal gains. In this case, running a chip at 500 watts just to shave off an extra two or three seconds compared to running at 450 watts probably isn’t worth it.
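Both comparisons above come down to energy being power multiplied by time. The sketch below runs the numbers. The 500 W and 300 W cases use the figures from the text; the runtimes in the second comparison (60 seconds at 500 W versus 63 seconds at 450 W) are our own assumption for illustration, not figures from Uptime.

```python
def energy_joules(power_watts, seconds):
    """Energy used at a constant power draw: watts x seconds = joules."""
    return power_watts * seconds


# Figures from the text: finishing fast at higher power can use less energy.
fast = energy_joules(500, 60)    # 500 W for one minute
slow = energy_joules(300, 120)   # 300 W for two minutes
print(f"500 W x 60 s  = {fast / 1000:.1f} kJ")
print(f"300 W x 120 s = {slow / 1000:.1f} kJ")

# Assumed runtimes (not from the source): the task takes 60 s at 500 W
# and 63 s at 450 W, so the extra 50 W buys back three seconds.
full_tilt = energy_joules(500, 60)
capped = energy_joules(450, 63)
saving_pct = 100 * (full_tilt - capped) / full_tilt
print(f"450 W x 63 s  = {capped / 1000:.2f} kJ ({saving_pct:.1f}% less than full tilt)")
```

The first pair works out to 30 kJ versus 36 kJ, so racing to finish wins. Under the assumed runtimes, the capped run uses 28.35 kJ, about 5.5 percent less energy for a three-second delay. Real servers also draw power while idle, which this simple model ignores.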
Plenty of knobs and levers to turn
The good news is CPU vendors have developed all manner of techniques for managing power and performance over the years. Many of these are rooted in mobile applications, where energy consumption is a far more important metric than in the datacenter.
According to Uptime, these controls can have a major impact on system power consumption and don’t necessarily have to kneecap the chip by limiting its peak performance.
The most power-efficient of these regimes, according to Uptime, are software-based controls, which have the potential to cut system power consumption by anywhere from 25 to 50 percent – depending on how sophisticated the operating system governor and power plan are.
However, these software-level controls also have the potential to impart the biggest latency hit. This potentially makes these controls impractical for bursty or latency-sensitive jobs.
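As one concrete example of an OS-level governor, Linux exposes the active CPU frequency governor for each logical CPU through the cpufreq sysfs interface. The sketch below simply reports which governors are in use. It assumes a Linux host with a cpufreq driver loaded; the function name governor_summary is ours, and Uptime's post does not prescribe any particular tool.

```python
from collections import Counter
from pathlib import Path

# Standard Linux location (an assumption about the host); the cpufreq
# directories only exist when a frequency-scaling driver is loaded.
CPU_ROOT = Path("/sys/devices/system/cpu")


def governor_summary(cpu_root=CPU_ROOT):
    """Count how many logical CPUs are using each scaling governor."""
    counts = Counter()
    for gov_file in cpu_root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        counts[gov_file.read_text().strip()] += 1
    return dict(counts)


if __name__ == "__main__":
    summary = governor_summary()
    if not summary:
        print("no cpufreq governors exposed on this system")
    for governor, cpus in sorted(summary.items()):
        print(f"{governor}: {cpus} logical CPUs")
```

A fleet reporting nothing but "performance" is a likely candidate for the kind of savings described above. The governors a given machine supports are listed in the neighboring scaling_available_governors file, and changing the setting requires root privileges.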
By comparison, Uptime found that hardware-only implementations designed to set performance targets tend to be far faster when switching between states – which means a lower latency hit. The trade-off is the power savings aren’t nearly as impressive, topping out around ten percent.
A combination of software and hardware offers something of a happy medium, allowing the software to give the underlying hardware hints as to how it should respond to changing demands. Bizo cites power savings of between 15 and 20 percent when utilizing performance management features of this nature.
While there are still performance implications associated with these tools, the actual impact may not be as bad as you might think. “Arguably, for most use cases, the main concern should be power consumption, not performance,” Bizo wrote.