Sunday, December 6, 2009

Data center efficiency measure

There is a lot of talk about green data centers and data center efficiency, but how does one measure data center efficiency? We would have to define efficiency first. Most Chief Financial Officers (CFOs) would define efficiency as something along the lines of getting the needed work done with the least money spent, or better yet, getting the most work completed for the least possible money. Seems rather simple, doesn't it?

So what goes into the data center? Well, the CFO would say nearly every dollar that IT spends, or that is spent in support of IT, as well as all the money spent on behalf of the data center by facilities and security. That's actually the easy part. How do you define the work done by the data center? This is not as easily defined. Do we use the number of bytes generated? Many processes perform considerable work just to reduce data.

For something as complex and flexible as a data center, defining efficiency is very difficult. This is why simple rules of thumb such as PUE (Power Usage Effectiveness) and DCiE (Data Center infrastructure Efficiency) are so commonly used. PUE is not too difficult to measure: it is the ratio of total facility energy consumption (servers + cooling + power distribution + UPS) to 'useful' energy consumption (servers only). The ultimately efficient data center would have a PUE of 1, where the average enterprise data center is about 2. In general PUE is a good start and seems pretty straightforward: a reduction in power used on the infrastructure side gives a lower PUE. However, there are some anomalies; for example, increasing server power usage reduces PUE, even though the data center is clearly less efficient. PUE and DCiE (which is a simple mathematical conversion of PUE) do not actually relate to the work done; they simply measure the loss of power on the infrastructure side of the data center (UPS, cooling, power distribution).
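
The PUE and DCiE arithmetic above can be sketched in a few lines. The wattages here are illustrative, not measurements from any real facility:

```python
def pue(it_power_w, cooling_w, distribution_w, ups_w):
    """Total facility power divided by IT ('useful') power."""
    total = it_power_w + cooling_w + distribution_w + ups_w
    return total / it_power_w

def dcie(pue_value):
    """DCiE is simply the reciprocal of PUE, expressed as a percentage."""
    return 100.0 / pue_value

# A hypothetical enterprise data center near the average PUE of 2:
p = pue(it_power_w=500_000, cooling_w=400_000, distribution_w=50_000, ups_w=50_000)
print(f"PUE  = {p:.2f}")         # PUE  = 2.00
print(f"DCiE = {dcie(p):.0f}%")  # DCiE = 50%
```

Note the anomaly described above: raising `it_power_w` while holding the infrastructure constant lowers the PUE, even though more power is being burned.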

DCPE (Data Center Power Efficiency), which measures useful work against total facility power, is much better in theory but more difficult to measure in practice. How does one define useful work for the entire data center? Until the Green Grid comes up with a better definition of DCPE and "useful work", I suggest we use SWaP, which uses the potential for work (CPU performance benchmarks). SWaP is simply PERFORMANCE / (SPACE x POWER). In most data centers space is less important than power, so a simple weighting constant can be added to the equation. So what does "Performance" mean in the equation? You simply define it for your data center and systems:

  • for storage, for example, you might define it as capacity in GB or TB
  • for switch gear, as bandwidth, perhaps in Gb/s
  • for computers, as a performance metric relevant to the use of the system.
SWaP does not measure anything about the data center power or cooling plant, just the efficiency of the server itself. With SWaP measuring the efficiency potential of the IT hardware, and PUE measuring the energy efficiency of the infrastructure (power distribution, UPS, and cooling), we come close to a total data center efficiency measure.
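
The SWaP formula, with the optional space weight mentioned above, looks like this. The performance number is whatever metric you chose for the system class (GB for storage, Gb/s for switches, a benchmark score for servers); the figures below are illustrative only:

```python
def swap(performance, space_u, power_w, space_weight=1.0):
    """SWaP = PERFORMANCE / (SPACE x POWER), with an optional weight
    to discount rack space relative to power."""
    return performance / ((space_u * space_weight) * power_w)

# Hypothetical comparison of two servers with the same benchmark score:
old = swap(performance=100, space_u=4, power_w=800)
new = swap(performance=100, space_u=2, power_w=500)
print(new > old)  # the denser, lower-power box has the better SWaP
```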

To evaluate the total efficiency of a data center, first measure the infrastructure efficiency with PUE, then measure the efficiency of all the major systems in the data center with SWaP. Then devise a plan to increase the efficiency of BOTH. Often the systems measured by PUE are maintained and managed by the facilities department, while the systems measured with SWaP are maintained by IT. There is no point designing new infrastructure around the existing IT equipment, and existing facilities are often not appropriate for newly optimized IT equipment.
For example, suppose you have a large data center that is mostly tied up with storage systems, and storage capacity has grown 50% every year for the past decade. The facilities group would have seen this growth in power and cooling demand, and a data center efficiency initiative on their part using PUE would likely involve more efficient infrastructure as well as an increase in capacity. The IT side, however, would use a measure of the efficiency of the storage system itself; using SWaP, they might decide to adopt one of the new hybrid storage systems and move from thousands of 72GB 15K RPM fiber disks to a system using hundreds of 2TB drives with some SSDs for cache, resulting in a drastic reduction in power and cooling requirements.
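
A back-of-the-envelope version of that storage example. The capacity and per-drive wattages are rough assumptions (15K RPM fiber channel drives draw far more than large 7.2K SATA drives), not vendor specifications:

```python
CAPACITY_GB = 144_000  # 144 TB of raw capacity, chosen for illustration

fc_drives   = CAPACITY_GB // 72     # 72 GB 15K RPM fiber channel disks
sata_drives = CAPACITY_GB // 2_000  # 2 TB drives (plus a few SSDs for cache)

fc_power_w   = fc_drives * 15   # assume ~15 W per 15K FC drive
sata_power_w = sata_drives * 8  # assume ~8 W per 2 TB drive

print(fc_drives, "drives vs.", sata_drives, "drives")
print(fc_power_w, "W vs.", sata_power_w, "W at the disks")
```

Even before counting the facility-side multiplier, the drive power drops from roughly 30 kW to under 1 kW for the same raw capacity.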

Tuesday, November 24, 2009

Server power

Many IT shops are looking to reduce power, and the current solution is virtualization and moving toward local clouds. Little effort is put into the servers themselves. Certainly, virtualizing 10 servers onto 2 physical systems is likely to give a good deal of power reduction. However, there are still the 2 physical servers, along with other servers that may not be appropriate for virtualization, and possibly a computational GRID system.
Is there much savings in the servers themselves? Of course, and possibly quite a bit. There are two things to consider here: the internal components (CPU, RAM, disk, power supply, etc.) and the chassis itself (2 CPU system, 4 CPU system, etc.). The chassis is often dictated by the requirements. Try to avoid purchasing a system with future expansion capabilities: if 2 CPUs are required, purchase a system that can handle just two CPUs. You will waste a good deal of energy on the expansion capability, and worse, by the time you are ready to expand the system, Moore's law generally means it is cheaper to purchase a new one than to expand.

For the internal components, though, you find different situations. It is well known to look at the CPU for power differences. Most CPUs have known power draws, though the draw at any given time is based on load, and modern CPUs have many power saving features for when there is not a heavy load. Server CPUs often range from 45 to 120 watts, and performance can vary with this power consumption. Sometimes the choice can be pretty easy to judge. Say a server is needed with the performance capabilities of two Nehalem CPUs at 2.26 GHz. This could be met by either two E5520 CPUs at 160 watts combined or two L5520 CPUs at 120 watts combined. The L5520 CPUs will likely cost a little more, but save nearly 1 kWh per day with the same performance.
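
The arithmetic behind that "nearly 1 kWh per day" figure:

```python
e5520_pair_w = 160  # two E5520s (80 W TDP each)
l5520_pair_w = 120  # two L5520s (60 W TDP each)

saved_w = e5520_pair_w - l5520_pair_w
kwh_per_day = saved_w * 24 / 1000
print(kwh_per_day)  # 0.96 kWh/day at full TDP
```

Actual draw will be lower at partial load, but the ratio between the two parts holds.
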
On a basic server system today, server processors consume the most power, followed by memory, then disks or PCI slots, the motherboard, and lastly the fans and networking interconnects.

For example, say specifications require a dual CPU system with 24GB RAM. With the Nehalem architecture the RAM could be configured at least three different ways, including 12x2GB and 6x4GB. Further, the RAM can be a mix of UDIMMs and RDIMMs, as well as single, dual, and quad ranked. These configurations are hard enough to work out just to get a working system, but there can be a huge difference in power consumption based on the RAM configured, and worse, even on the manufacturer of the RAM. It is pretty easy to configure two seemingly identical systems of the same make and model with different RAM configurations such that one draws nearly twice the power of the other. Some server manufacturers provide web based tools that give an indication of how much power is required for a given configuration. Generally, the more ranks on a DIMM the more power efficient it is; also, prefer higher density DIMMs for the overall system.
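
To see why DIMM count and density matter, consider a rough comparison. The per-DIMM wattages below are assumptions for DDR3 DIMMs of the era, not datasheet values; real numbers vary by manufacturer, rank, and voltage:

```python
def ram_power(dimm_count, watts_per_dimm):
    """Total memory power for a configuration (rough estimate)."""
    return dimm_count * watts_per_dimm

config_12x2 = ram_power(12, 5)  # 12 x 2 GB DIMMs, assume ~5 W each
config_6x4  = ram_power(6, 7)   # 6 x 4 GB DIMMs, assume ~7 W each

print(config_12x2, "W vs.", config_6x4, "W for the same 24 GB")
```

Even with the denser DIMMs drawing a bit more each, the six-DIMM configuration comes out well ahead, and it leaves slots free without paying idle power for them.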

This savings will be slightly more than doubled at the meter. For every watt saved in the computer, the average data center saves an additional:
  • 0.04W in power distribution
  • 0.14W in UPS
  • 1.07W in cooling
  • 0.10W in building transformer and switch gear
The savings could be much more significant than just the electric cost. It is common to fit 40 1U servers or 64 blade servers in a rack. Even with just modest changes, a savings of 100 watts per server is possible, easily achieved with a combination of CPU and RAM choices. With 64 blades per rack this comes to 6.4kW. This could be the difference between the infrastructure being able to support the servers and needing to upgrade the power distribution, UPS, and cooling in the data center.
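
Putting the facility-side multipliers above together with the 100 W per-server figure:

```python
FACILITY_MULTIPLIERS = {
    "power distribution": 0.04,
    "UPS": 0.14,
    "cooling": 1.07,
    "transformer/switch gear": 0.10,
}

def total_saving_w(it_saving_w):
    """Each watt saved at the server saves ~1.35 W more in the facility."""
    return it_saving_w * (1 + sum(FACILITY_MULTIPLIERS.values()))

per_server_w = total_saving_w(100)     # ~235 W at the meter per server
rack_it_kw   = 64 * 100 / 1000         # 6.4 kW of IT load per blade rack
print(f"{per_server_w:.0f} W per server at the meter; {rack_it_kw} kW IT load per rack")
```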

The hardware used for virtualization, local clouds, and compute intensive environments is taking an increasingly large portion of the data center power budget. The configuration of servers for these systems has to balance initial cost, licensing, support, and now power use. The SPECpower_ssj2008 benchmark can assist, though the systems measured are often not optimally configured for virtualization or GRID work. What is needed is a merger of SPECpower with the (currently unfinished) SPEC virtualization benchmark and/or SPEC MPI2007.
Anandtech has an excellent article on server performance per watt.

Monday, November 23, 2009

Data center power

A recent study by Lawrence Berkeley National Lab, the Self-benchmarking Guide for Data Center Energy Performance, revealed that in a typical data center installation, an average of 33 percent of total power goes to IT equipment. The rest is consumed by cooling (50%), the power system (9%), and lighting (8%). The most efficient data centers can achieve 80% power utilization for IT equipment. This measurement, however, does not account for the efficiency of the computer systems at doing the desired work; it only captures the ratio of electric power for computers vs. power for support equipment.

The real goal is to do the same or more electronic work using less power. Many data centers have expanded to the point where there just isn’t enough power or cooling to allow new projects. Expanding data center power and cooling infrastructure can be very costly, and will only result in increased annual costs. Spending the money to make the center more efficient solves the same problem and reduces expenses while setting the data center on the GreenIT path.

The simplest and most direct way to reduce power consumption in a data center is to reduce the power used by the equipment. In the average data center, for every watt reduced on direct electronic equipment (computers, network equipment, and storage), at least one more watt will be saved on the facility side (HVAC, UPS, power distribution). Further, in the average data center the facility equipment is already at or near capacity. Therefore the most direct path to savings is reducing IT equipment power needs with methods such as consolidation, virtualization, use of larger disks in storage systems, etc. Effort spent on infrastructure without corresponding IT effort is wasted, as any savings will eventually be re-absorbed by continued wasteful growth on the IT side.

Successful efforts must include both facilities and IT systems. This is the crux of the problem, as these two groups generally have little to do with each other and have very different goals and needs. Facilities often carries the electric and infrastructure maintenance costs, with goals like reducing the electric bill and maintaining data center temperature and power. IT usually doesn't carry the electric costs for the computers it maintains, and has goals such as maintaining uptime and reducing equipment and IT maintenance costs. None of these goals are in opposition, but the two groups likely do not talk to each other, and many of their terms sound like a different language to each other.