Sunday, December 6, 2009

Data center efficiency measure

There is a lot of talk about green data centers and data center efficiency, but how does one measure data center efficiency? We would have to define efficiency first. Most Chief Financial Officers (CFOs) would define efficiency as something along the lines of getting the needed work done with the least money spent or, better yet, getting the most work completed with the least possible money spent. Seems rather simple, doesn't it?

So what goes into the data center? Well, the CFO would say almost every dollar that IT spends or that is spent in support of IT, as well as all the money spent on behalf of the data center by facilities and security. That's actually the easy part. How do you define the work done by the data center? This is not as easily defined. Do we use the number of bytes generated? Hardly, since many processes perform considerable work just to reduce data.

For something as complex and flexible as a data center, defining efficiency is very complex. This is why simple rules of thumb such as PUE (Power Usage Effectiveness) and DCiE (Data Center infrastructure Efficiency) are so commonly used. PUE is not too difficult to measure: it is the ratio of total energy consumption (servers + cooling + power distribution + UPS) to 'useful' energy consumption (servers only). The ultimate efficient data center would have a PUE of 1, whereas the average enterprise data center is at about 2. In general, PUE is a good start and seems pretty straightforward: a reduction in power used on the infrastructure side gives a lower PUE. However, there are some anomalies. For example, increasing server power usage reduces PUE, even though the data center is clearly less efficient. PUE and DCiE (which is simply the reciprocal of PUE, expressed as a percentage) do not actually relate to the work done; they simply measure the power lost on the infrastructure side of the data center (UPS, cooling, power distribution).
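As a rough illustration of the ratio described above, here is a minimal Python sketch. The function names and all the kilowatt readings are made up for the example, and it uses power readings rather than energy, which gives the same ratio as long as both figures cover the same period. The second calculation reproduces the anomaly: it assumes, as a simplification, that infrastructure overhead stays fixed while server load grows.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie(total_facility_kw, it_equipment_kw):
    """DCiE: the reciprocal of PUE, expressed as a percentage."""
    return 100.0 / pue(total_facility_kw, it_equipment_kw)


# Hypothetical readings: 500 kW of servers plus 500 kW of infrastructure overhead.
baseline = pue(total_facility_kw=1000.0, it_equipment_kw=500.0)

# The anomaly: add 100 kW of server load while overhead stays at 500 kW
# (fixed overhead is an assumption made only for this example).
heavier = pue(total_facility_kw=1100.0, it_equipment_kw=600.0)

print(f"baseline PUE {baseline:.2f}, DCiE {dcie(1000.0, 500.0):.0f}%")
print(f"after adding server load, PUE {heavier:.2f}")
```

The second PUE comes out lower than the first even though the data center now draws more power in total, which is why PUE alone says nothing about the work being done.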

DCPE (Data Center Power Efficiency), which measures useful work relative to total facility power, is much better in theory but more difficult to measure in reality. How does one define useful work for the entire data center? Until the Green Grid group comes up with a better definition of DCPE and "useful work", I suggest we use SWaP (Space, Watts and Performance), which uses the potential for work (CPU performance benchmarks). SWaP is simply PERFORMANCE / (SPACE x POWER). In most data centers space is less important than power, so a simple weight constant can be added to the equation. So what does "Performance" mean in the equation? You simply define it for your data center and systems:

  • for storage, for example, you might define it as capacity in GB or maybe TB
  • for switch gear, as bandwidth, maybe in Gb/s
  • for computers, as a performance metric relevant to how the system is used.

SWaP does not measure anything about the data center power or cooling plant, just the efficiency of the IT equipment itself. With SWaP measuring the potential efficiency of the IT hardware and PUE measuring the energy efficiency of the infrastructure (power distribution, UPS, and cooling), we come close to a total data center efficiency measure.
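To make the SWaP formula concrete, here is a small Python sketch. Everything in it beyond PERFORMANCE / (SPACE x POWER) is an assumption for illustration: space is counted in rack units, the two servers and their figures are invented, and the space weight is implemented as an exponent on space (1.0 is the plain formula, 0.0 ignores space entirely), which is only one possible reading of "a simple weight constant".

```python
def swap(performance, space_ru, power_w, space_weight=1.0):
    """SWaP score: performance / (space x power).

    space_weight is a hypothetical exponent on space: 1.0 gives the plain
    formula, values toward 0.0 make space matter less.
    """
    if space_ru <= 0 or power_w <= 0:
        raise ValueError("space and power must be positive")
    return performance / ((space_ru ** space_weight) * power_w)


# Two invented servers: (benchmark score, rack units, watts).
server_a = (1000, 2, 400)
server_b = (1500, 4, 500)

plain_a = swap(*server_a)                        # 1000 / (2 * 400)
plain_b = swap(*server_b)                        # 1500 / (4 * 500)
power_only_a = swap(*server_a, space_weight=0.0)  # 1000 / 400
power_only_b = swap(*server_b, space_weight=0.0)  # 1500 / 500

print(f"plain SWaP: A={plain_a:.2f}  B={plain_b:.2f}")
print(f"space ignored: A={power_only_a:.2f}  B={power_only_b:.2f}")
```

With these made-up numbers, server A scores higher under the plain formula, while server B scores higher once space is discounted, so the choice of weight can change which system looks more efficient.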

To evaluate the total efficiency of a data center, first measure the infrastructure efficiency with PUE, then measure the efficiency of all the major systems in the data center with SWaP. Then devise a plan to increase the efficiency of BOTH. Often the systems measured by PUE are maintained and managed by the facilities department, while the systems measured with SWaP are maintained by IT. The two plans need to be developed together: there is no point designing a new infrastructure around the existing IT equipment if that equipment is about to be replaced, and existing facilities are often not appropriate for new, optimized IT equipment.

For example, suppose you have a large data center that is mostly tied up with storage systems, and storage capacity has grown 50% every year for the past decade. The facilities group would have seen this growth in power and cooling demand, and a data center efficiency initiative on their part using PUE would likely involve more efficient systems as well as an increase in capacity. The IT side, however, would be measuring the efficiency of the storage system itself. Using SWaP, they decide to adopt one of the new hybrid storage systems and move from thousands of 72GB 15K RPM Fibre Channel disks to a system using hundreds of 2TB drives with some SSDs for cache. The result is a drastic reduction in power and cooling requirements, which changes what the facilities group actually needs to build.
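The arithmetic behind that storage example can be sketched in Python. The drive counts (2000 and 200) and the per-drive wattages are placeholders chosen only to match "thousands" and "hundreds"; they are not measurements, and the sketch ignores the SSD cache, controllers, and cooling overhead.

```python
def array_totals(drive_count, capacity_gb_per_drive, watts_per_drive):
    """Return (total capacity in TB, total drive power in kW) for an array."""
    capacity_tb = drive_count * capacity_gb_per_drive / 1000.0
    power_kw = drive_count * watts_per_drive / 1000.0
    return capacity_tb, power_kw


# Hypothetical old array: 2000 x 72 GB drives at an assumed 15 W each.
old_tb, old_kw = array_totals(2000, 72, 15.0)

# Hypothetical new array: 200 x 2 TB drives at an assumed 10 W each.
new_tb, new_kw = array_totals(200, 2000, 10.0)

# Storage "performance" defined as capacity, per unit of power.
old_tb_per_kw = old_tb / old_kw
new_tb_per_kw = new_tb / new_kw

print(f"old: {old_tb:.0f} TB at {old_kw:.0f} kW ({old_tb_per_kw:.1f} TB/kW)")
print(f"new: {new_tb:.0f} TB at {new_kw:.0f} kW ({new_tb_per_kw:.1f} TB/kW)")
```

Under these assumptions the new array holds more data on a fraction of the drive power, which is the kind of shift that would change the sizing of any new power and cooling plant.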