Thursday, March 4, 2010

Is a Bloom Box in your data center's future?

Since the announcement of Bloom Energy's Bloom Box there has been considerable news on all fronts. I even hear that people are planning to use the units to "save energy". What exactly can a Bloom Box do for you and your data center?
It is quite simply a form of fuel cell. Basically, it converts a fuel to electricity through an electrochemical process, whereas a generator converts fuel to electricity via a mechanical process. The Bloom Box does indeed consume fuel (natural gas) in its process, so it can't be used to "save energy". It does, however, use a fairly clean process to generate that energy.

How can a company utilize one of these units, and what is the ROI? As an example, one could use the units in a corporate data center in place of the UPS and generator. The ES-5000 unit is rated at 100 kW and can be grouped with other units for more capacity.

A typical corporate data center might be rated at 200 kW, with two 200 kW UPS units for redundancy and a 500 kW generator (the extra capacity covers cooling). The two UPS units will require battery replacement every 3 to 4 years. The UPS units are also located indoors, where they take up space and require significant cooling. Most UPS units are less than 95% efficient as well, losing energy to battery charging and heat.
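To get a rough sense of what that inefficiency costs, here is a back-of-the-envelope sketch. The 200 kW load and the ~95% efficiency come from the example above; the electricity rate is an assumption for illustration only.

```python
# Illustrative estimate of the annual energy lost in a double-conversion UPS.
# The 200 kW IT load and ~95% efficiency are from the example above; the
# $0.10/kWh electricity rate is an assumption for illustration only.
it_load_kw = 200.0        # data center IT load
ups_efficiency = 0.95     # assumed double-conversion UPS efficiency
rate_per_kwh = 0.10       # assumed blended electricity rate ($/kWh)

hours_per_year = 8760
input_kw = it_load_kw / ups_efficiency      # power drawn to deliver 200 kW
loss_kw = input_kw - it_load_kw             # ~10.5 kW dissipated as heat
annual_loss_kwh = loss_kw * hours_per_year  # ~92,000 kWh per year
annual_loss_cost = annual_loss_kwh * rate_per_kwh

print(f"UPS loss: {loss_kw:.1f} kW, {annual_loss_kwh:,.0f} kWh/yr, "
      f"${annual_loss_cost:,.0f}/yr")
# Note: this loss also becomes heat the cooling system must remove, so the
# true cost is somewhat higher once cooling energy is included.
```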

The Bloom ES-5000 units are outdoor rated, so removing the UPS units would mean considerably less cooling load and a gain in indoor space, along with some increase in electrical efficiency. Each ES-5000 only produces 100 kW, though, so four would be required to cover the facility load (IT plus cooling, as with the generator sizing above), with a fifth for redundancy. There is no need to keep the fifth unit idle: it can provide power to the rest of the building or to the grid until it is needed in the data center. That captures some cost savings while still providing the needed redundancy for the data center in case of a unit failure.

So that is our comparison: two traditional 200 kW UPS units and one 500 kW diesel generator versus five Bloom ES-5000 units. Up-front costs are difficult to pin down, since there are many incentives for green, locally generated energy that the Bloom ES-5000 would qualify for, and more of these incentives become available all the time. Without the incentives, however, the five Bloom ES-5000 units would certainly cost more (around $2.5 million more, based on Bloom articles).

The units can run on commercial natural gas or locally generated natural gas. Assuming no local supply of natural gas, the operating cost for the facility would roughly be cut in half at current prices of electricity and natural gas, which for our example would likely mean savings of around $225k annually, yielding a payback of about 10 years (without incentives). With the incentives and possible carbon trading, these figures would move further in favor of the Bloom technology.
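A simple payback sketch, using only the figures quoted above (the ~$2.5 million premium and ~$225k annual savings), lands in the same ballpark as the ~10 years mentioned; the 30% incentive used in the second case is purely an assumption.

```python
# Simple payback sketch for the Bloom ES-5000 example above.
# The premium and savings come from the figures quoted in this post; treat
# them as rough order-of-magnitude numbers, not vendor pricing.
upfront_premium = 2_500_000   # extra capital cost vs. UPS + generator ($)
annual_savings = 225_000      # estimated annual operating savings ($)

simple_payback_years = upfront_premium / annual_savings
print(f"Simple payback: {simple_payback_years:.1f} years")   # ~11 years

# Incentives shorten this considerably; the 30% figure is an assumption.
incentive = 0.30
payback_with_incentive = upfront_premium * (1 - incentive) / annual_savings
print(f"With a 30% incentive: {payback_with_incentive:.1f} years")  # ~7.8 years
```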

Friday, February 5, 2010

Data Center Temperature

There are presently many articles and posts encouraging increased data center temperatures. ASHRAE has widened its temperature and humidity recommendations for data centers, and many blogs are recommending and discussing drastically higher temperatures. Lately the discussions have been about working environments of over 110F in data centers. Note that these high temperatures are in new or redesigned data centers with hot/cold aisle separation (hot aisle containment or cold aisle containment).

Existing data centers should definitely investigate possible temperature increases as well as a broader humidity range. However, any such temperature change should be made slowly and with considerable monitoring. In traditional data centers it is easy to get upside down with temperature changes.

Keep in mind that the recommendations are for equipment inlet temperatures; the temperatures over 90F that are being talked about are rack outlet temperatures. Invest in at least some moderate temperature monitoring for the rack inlets before making any changes, gather some trending data, and change temperatures gradually. You want to make sure that you are not getting a lot of hot-air recirculation through the equipment, particularly at top-of-rack and end-of-aisle trouble spots. If your data center doesn't already have them, invest in rack blanking panels to prevent recirculation within the rack.
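As a sketch of what that monitoring might look like, the snippet below flags racks whose inlet temperature runs well above the supply air temperature, which is a reasonable proxy for hot-air recirculation. The sensor readings and the 5F threshold are hypothetical.

```python
# Minimal recirculation check: compare rack inlet temperatures to the
# supply (cold aisle) temperature. A large gap suggests hot air is being
# pulled back around or over the racks. All readings below are hypothetical.
supply_temp_f = 68.0
recirculation_threshold_f = 5.0   # assumed: flag inlets >5F above supply

rack_inlet_temps_f = {
    "rack-A01-top": 70.2,
    "rack-A01-mid": 68.9,
    "rack-A12-top": 79.4,   # end-of-aisle, top-of-rack trouble spot
    "rack-B07-top": 75.8,
}

for rack, inlet in sorted(rack_inlet_temps_f.items()):
    delta = inlet - supply_temp_f
    if delta > recirculation_threshold_f:
        print(f"{rack}: inlet {inlet:.1f}F is {delta:.1f}F above supply - "
              f"check for recirculation or missing blanking panels")
```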

Another point to watch for: in some traditional data centers with CRAC units against the walls, you might find a problem with the location of the CRAC temperature sensor. Many of these units are designed for open returns, usually an open top on downdraft units, and some have their temperature sensor right on top, in the return air stream. These units are therefore measuring, and controlling on, air that is not the rack inlet supply temperature.
In other words, the CRAC unit is being controlled by the hot aisle temperature rather than the cold aisle temperature, but what actually matters is the cold aisle (equipment inlet) temperature. To get the cold aisle temperature up to 75-77F, the CRAC setpoint might have to be set at 80F or 85F. The best results will come from relocating the temperature sensor to the cold aisle, or from using multiple remote sensors in the cold aisles.
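One way to look at the required setpoint: when the CRAC controls on return air, the setpoint has to sit above the target cold aisle temperature by however much warmer the return air runs. A minimal sketch of that arithmetic, with assumed offsets (measure your own before touching any setpoints):

```python
# Rough setpoint arithmetic for a CRAC that senses return air rather than
# cold aisle supply. In an open-return room the return air is a mix of hot
# aisle exhaust and bypass air, so it typically runs several degrees above
# the cold aisle. The offsets below are assumptions, not measurements.
desired_cold_aisle_f = 76.0             # target equipment inlet temperature
measured_return_offsets_f = (5.0, 9.0)  # assumed return-air minus cold-aisle gap

for offset in measured_return_offsets_f:
    required_setpoint_f = desired_cold_aisle_f + offset
    print(f"If return air runs {offset:.0f}F above the cold aisle, a "
          f"return-air setpoint near {required_setpoint_f:.0f}F is needed "
          f"to hold a {desired_cold_aisle_f:.0f}F cold aisle")
```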

You may not see much savings, though, depending on the type of cooling involved. If you are using chilled water, you might want to experiment with raising the water temperature once the data center temperature has stabilized, while continuing to monitor the rack inlet temperatures as you do so. The biggest savings will come from economizers, either chilled water or air-side systems: as the data center temperature is increased, the number of hours an economizer can be used increases.

Ultimately, you should have a real-time power monitoring system in the data center, in addition to the temperature monitoring system, before making any changes. This will help ensure that there are actual savings: there will be a temperature that proves most efficient for your data center and equipment, above which efficiency will start to decrease.

Use total data center energy use (including cooling) to find the best temperature for your data center. Do not use PUE for this; PUE can be deceptive when changing data center temperature. The problem comes from the fans in the IT equipment. They are usually variable speed and temperature controlled, so fan speed starts to ramp up as the inlet temperature rises above roughly 78F. Most servers have 5 to 10 fans, which can mean around 300 fans in a full rack. What happens is that you decrease the infrastructure side of PUE (cooling) at the same time you increase the IT side (IT equipment fans), shifting much of the cooling work to the IT equipment fans, which move air much less efficiently. The result is a lower PUE but higher overall energy use. Using overall energy use avoids this trap.
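Here is a small sketch of why PUE can improve while total energy gets worse: raising the supply temperature cuts cooling power, but once the server fans ramp up, that extra load lands on the IT side of the ratio. All of the power figures below are invented purely for illustration.

```python
# Illustrative only: why a higher setpoint can lower PUE while raising
# total energy. All power figures below are made up for the example.
def pue(it_kw, cooling_kw, other_kw=20.0):
    total_kw = it_kw + cooling_kw + other_kw
    return total_kw / it_kw, total_kw

# Before: cooler supply air, server fans idling low
pue_before, total_before = pue(it_kw=200.0, cooling_kw=80.0)

# After: warmer supply air, cooling load drops, but server fans ramp up
# and add power to the IT side of the meter
pue_after, total_after = pue(it_kw=215.0, cooling_kw=70.0)

print(f"Before: PUE {pue_before:.2f}, total {total_before:.0f} kW")
print(f"After : PUE {pue_after:.2f}, total {total_after:.0f} kW")
# PUE drops (1.50 -> 1.42) even though total facility power rose
# (300 kW -> 305 kW), which is why total energy, not PUE, should drive
# the temperature decision.
```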

You might also investigate aligning the CRAC units with the hot aisles if they are not already aligned that way. Another option is to install ducted returns from the hot aisles, which may be more efficient and/or cheaper to implement; it may even be possible to use the data center drop ceiling as a return plenum. These changes will let you raise the temperature further while maintaining more consistent inlet temperatures.

Wednesday, January 20, 2010

Software efficiency IS part of data center efficiency

A few companies are putting effort into optimizing hardware and processes to reduce energy use in the data center. Some companies have facilities-based projects, others IT-based projects, and a few are doing both. The graphic below, from a white paper at Emerson, depicts the current thinking: the further to the left (closer to the IT equipment) you save a watt, the more total savings you receive through the cascade of supporting infrastructure (assuming the infrastructure is scaled accordingly). Conversely, the further to the right of the graph you save a watt, the less additional savings you receive through this cascade.

Graphic from white paper at http://emerson.com/  




What is missing from this graph, and not often spoken of in data center efficiency conversations, is software efficiency. I am not talking about virtualization, which lets you make better use of hardware and is thus part of the "server component" in the cascade chart above. I am talking about actual algorithm optimization.
There are still code shops out there that attempt to optimize their code, but as most of us know, most code is rarely optimized these days. Code optimization has long since fallen under the Moore's Law knife of accounting: "It's cheaper to buy faster hardware than to pay for developer time." Often time to market pushes aside code optimization (and sometimes even debugging).

If converting an application from, say, Perl to C++, or simply turning on some compiler options, allows a piece of code to finish in 30 minutes instead of 60 and/or to use less RAM, then you likely have a measurable power difference that can then trickle down through the cascade effect. I am not by any means saying that all Perl or Java code should be converted to C or C++ (or any other language); just that if you have a piece of code that takes a significant amount of time to run, or is run a significant number of times, spending some effort to optimize it can result in significant savings.

Here is another example. Say there is a large web application written in some fictional interpreted language (say Perava ;). The bulk of this code is infrequently hit and performs perfectly well, but there is one function that is hit for every web page. This function takes 4 seconds to complete, and to meet performance requirements for the number of web customers, the company deploys 10 redundant servers. Each server uses 300 watts, for a combined 3 kW at the servers, or, following the cascade back up, about 8.5 kW in the data center.
Suppose optimizing that one segment of code, or translating it to some fictional compiled language, cut the run time in half (overly simplistic, I know), so that only 5 servers were needed. That would mean just 1.5 kW at the servers, or just 4.25 kW in the data center.
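A sketch of that arithmetic, using the figures in the example and the cascade multiplier implied above (8.5 kW of facility load for 3 kW at the servers works out to roughly 2.8x):

```python
# Cascade arithmetic for the fictional "Perava" web application example.
# Server counts and wattage come from the example above; the ~2.8x cascade
# multiplier is implied by the 3 kW -> 8.5 kW figures quoted earlier.
watts_per_server = 300
cascade_multiplier = 8.5 / 3.0   # facility kW per kW at the server

def facility_kw(server_count):
    server_kw = server_count * watts_per_server / 1000.0
    return server_kw, server_kw * cascade_multiplier

before = facility_kw(10)   # original fleet
after = facility_kw(5)     # halving the runtime halves the fleet

print(f"Before: {before[0]:.1f} kW at the servers, ~{before[1]:.1f} kW in the data center")
print(f"After : {after[0]:.1f} kW at the servers, ~{after[1]:.1f} kW in the data center")
# Roughly 4.25 kW of continuous load saved by optimizing one hot code path.
```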

I have seen the same thing in high-performance grid computing. Just turning on optimization flags when compiling programs that are run 100k times a day, for minutes at a time, eliminated the need to expand the compute cluster.

We are starting to see a real push toward compiler optimizations, particularly around auto-parallelization, which is proving hugely successful on modern multi-core chips. Because of the cascade multiplier effect, software optimization efforts can yield some real gains.

There is another area for optimization, at the boundary between software and hardware. We are used to buying the right hardware to meet the software's specs, but what I am thinking about is building the software toward the hardware's specs. For example, instead of building a single-threaded process that requires a very fast processor, build a multithreaded process that can take advantage of a lower-wattage multi-core processor (yes, this is already the norm for commercial software, but in-house software can do the same).
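As a minimal illustration of that design choice, here is a sketch of restructuring a CPU-bound loop so the work can spread across the cores of a slower, lower-wattage processor. The work function, record count, and worker count are hypothetical placeholders; a process pool is used rather than threads because the work is CPU-bound.

```python
# Sketch: restructure a single-threaded, CPU-bound job so it can use all
# the cores of a lower-clocked, lower-wattage processor. The work function
# and inputs are hypothetical placeholders.
from concurrent.futures import ProcessPoolExecutor

def process_record(record):
    # placeholder for the real per-record computation
    return sum(i * i for i in range(record % 1000))

def run_single_threaded(records):
    return [process_record(r) for r in records]

def run_multi_core(records, workers=8):
    # same work, spread across processes so every core contributes
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_record, records, chunksize=256))

if __name__ == "__main__":
    records = list(range(100_000))
    results = run_multi_core(records)
    print(f"processed {len(results)} records")
```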

Probably a better, more direct example is an in-house application that, to meet performance requirements, currently keeps all of its data in RAM for a particular job. This might be rewritten to use a combination of much less RAM, multiprocessing, and SSDs (solid state disks) to reach even better performance than the original. I am not talking about just putting SSDs in a server, or using them for swap and running the application unchanged, but rather altering the application to take advantage of the much faster access times of the SSD. Bioinformatics, geo-imaging, weather simulation, and many other large-data-set research programs currently use large-RAM systems, MPI clusters, or other methods to handle their data sets.
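A minimal sketch of the idea: instead of loading the whole data set into RAM, process it in chunks from fast local storage, keeping only a small working set in memory. The file path, record format, and chunk size below are hypothetical.

```python
# Sketch: process a large data set in chunks from fast local storage (SSD)
# instead of holding it all in RAM. The path, record format, and chunk size
# are hypothetical placeholders for an in-house application's real data.
def process_chunk(lines):
    # placeholder for the real analysis on one slice of the data set
    return sum(len(line) for line in lines)

def run(path="/ssd/dataset.txt", chunk_lines=100_000):
    total = 0
    chunk = []
    with open(path) as fh:
        for line in fh:
            chunk.append(line)
            if len(chunk) >= chunk_lines:
                total += process_chunk(chunk)
                chunk = []   # keep the in-memory working set small
    if chunk:
        total += process_chunk(chunk)
    return total
```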

Here is a great article on how Facebook is drastically increasing performance by cross-compiling PHP to C++.

Thursday, January 7, 2010

Resolve to Measure PUE in 2010

Measure PUE in 2010! Why measure PUE after I blogged, just last month, about it not being perfect?
Well, it is better than nothing. Actually it is more than that: it is the minimum level of insight into your data center.
Think of driving your car late at night in an unfamiliar location. You have no idea where the next open gas station will be:
How comfortable would you be without a gas gauge?
This auto analogy isn't too far off. Many corporate data centers have some monitoring system to let someone know:
  • when the temperature gets too high
  • when there is a water leak
  • generator status
  • UPS (battery) status
And then on the IT side there is often some monitoring system or other keeping track of most of the critical servers, storage, backups, firewalls, even critical applications. Often there are all kinds of analyses of these systems, with trends and pretty graphs over time showing how much work has been done in the data center.
Yet with all this monitoring and trending, often in real time, few corporate IT managers have any idea what their data center's PUE is at any point, much less over time. (PUE is really more closely related to MPG than to a gas gauge.) PUE gives you a baseline. As an IT or facilities director, hopefully you are looking at reducing costs in this economy for 2010, and the data used to calculate PUE can easily be used to calculate the energy cost of the data center.
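The calculation itself is trivial once you have the two meter readings. The sketch below assumes a reading for the whole facility (utility or switchgear meter) and a reading of the IT load (UPS output or PDUs); the numbers shown are made up.

```python
# PUE is total facility power divided by IT equipment power. The readings
# below are made-up examples; in practice they come from the utility or
# switchgear meter and from the UPS output / PDU meters.
total_facility_kw = 310.0   # whole data center: IT + cooling + losses
it_equipment_kw = 200.0     # measured at the UPS output or PDUs
rate_per_kwh = 0.10         # assumed blended electricity rate ($/kWh)

pue = total_facility_kw / it_equipment_kw
annual_cost = total_facility_kw * 8760 * rate_per_kwh

print(f"PUE: {pue:.2f}")                           # 1.55
print(f"Annual energy cost: ${annual_cost:,.0f}")  # ~$272,000
```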
IT directors are used to showing costs of new projects amortized over time showing costs for:
  • computers
  • storage
  • network
  • cables
  • support
  • replacement hardware
  • helpdesk
  • even data center space
but usually not electricity. Some directors are starting to include anticipated electric costs in these projections, but most still don't consider it because the costs are not in their budget. Getting electric cost projections for an individual project can be much more difficult than measuring the data center as a whole. Planning a major new project for 2010, maybe some virtualization or new storage? Measure PUE before and after implementation.
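One way a measured PUE helps here: it turns a project's IT load into a facility-level electricity line item for the kind of projection described above. In the sketch below, the project load, PUE, rate, and amortization period are all assumptions for illustration.

```python
# Sketch: estimate the electricity line item for a new IT project using a
# measured PUE. The project load, PUE, rate, and amortization period below
# are assumptions for illustration only.
project_it_load_kw = 15.0   # new servers/storage added by the project
measured_pue = 1.8          # from your own data center measurements
rate_per_kwh = 0.10         # assumed blended electricity rate ($/kWh)
amortization_years = 3

annual_kwh = project_it_load_kw * measured_pue * 8760
annual_cost = annual_kwh * rate_per_kwh
print(f"Electricity: {annual_kwh:,.0f} kWh/yr, ${annual_cost:,.0f}/yr, "
      f"${annual_cost * amortization_years:,.0f} over {amortization_years} years")
```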

Getting back to the gas gauge theme, my first cars were all older than me. They all had pretty much the same instruments in the dash: speedometer, gas gauge, odometer, and check engine light. My dad was a master mechanic from the Navy, and he had a term for that check engine light: he called it an "idiot light," because you were an idiot to drive with it on. If you don't measure PUE, then you are relying on the "idiot light," which for many data centers comes in the form of high-temperature alerts, and by then it is too late.

I have several vehicles now, from my motorcycle to a Prius. That Prius puts the old gas gauge and "idiot light" to shame. There is a row of lights to tell me what is going on with the engine: check engine, change oil, change filter, tire pressure, rotate tires. Right in the middle of the car is a computer display showing instantaneous MPG, MPG since the last reset, a graph of MPG for the last 30 minutes, and a gas gauge.

This is the kind of information you want for your data center: a nice little graph showing how much energy is being used every 5 minutes. If you don't measure the PUE of your data center, then it is worse than driving a 1955 Cadillac with a broken gas gauge (you do not want to guess how much gas you have in a car that gets less than 10 MPG).
Let's get started in 2010 by measuring PUE.