Wednesday, January 20, 2010

Software efficiency IS part of data center efficiency

A few companies are spending effort to optimize hardware and processes to reduce energy use in the data center.  Some companies have facilities-based projects, others IT-based, and a few do both.  The graphic below, from an Emerson white paper, depicts the current thinking: the further to the left you save a watt, the more actual savings you receive (assuming right-scaling). Conversely, the further to the right of the graph you save a watt, the less additional savings you receive through this cascade.

Graphic from white paper at  

What is missing from this graph, and not often spoken of in data center efficiency conversations, is software efficiency.  I am not talking about virtualization, which allows you to make better use of hardware and is thus part of the "server component" in the cascade chart above.  I am talking about actual algorithm optimization.
There are still code shops out there that attempt to optimize their code, but as most of us know, most code is rarely optimized these days.  Code optimization has long since fallen under the Moore's Law knife of accounting: "It's cheaper to buy faster hardware than to pay for developer time."  Often time to market pushes aside code optimization (and sometimes even debugging).

If converting an application from, say, Perl to C++, or simply turning on some compiler options, allows a bit of code to finish in 30 minutes instead of 60 and/or to use less RAM, then you likely have a significant, measurable power difference that can then trickle down the cascade effect.  I am not by any means saying that all Perl or Java code should be converted to C or C++ (or any other language), just that if you have a piece of code that takes a significant amount of time to run OR is run a significant number of times, spending some effort to optimize it can result in significant savings.

Here is another example.  Say there is a large web application written in some fictional interpreted language (say Perava ;).  The bulk of this code is infrequently hit and performs perfectly well.  But there is one function that is hit repeatedly for every web page.  This function takes 4 seconds to complete, and to meet performance requirements for the number of web customers etc., the company deploys 10 redundant servers.  Each server uses 300 watts, for a combined 3 kW for servers or, following the above cascade up, about 8.5 kW in the data center.
If optimizing that one segment of code, or translating it to some fictional compiled language, managed to cut the run time in half (overly simplistic, I know), only 5 servers would be needed, and thus just 1.5 kW for servers or just 4.25 kW in the data center.
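
The arithmetic in this example can be sketched in a few lines of Python. This is only a rough model: the function names are made up for illustration, the cascade multiplier is simply the 8.5 kW / 3 kW ratio quoted above, and it assumes the number of servers scales linearly with the run time of the hot function, which real systems rarely do exactly.

```python
import math

SERVER_WATTS = 300                 # per-server draw used in the example above
CASCADE_MULTIPLIER = 8.5 / 3.0     # facility watts per server watt, from the figures above


def servers_needed(baseline_servers, baseline_seconds, new_seconds):
    """Servers required after a speedup, assuming load scales linearly with run time."""
    return math.ceil(baseline_servers * new_seconds / baseline_seconds)


def server_kw(servers):
    """Power drawn by the servers themselves, in kW."""
    return servers * SERVER_WATTS / 1000


def facility_kw(servers):
    """Power drawn at the data center level once the cascade is applied, in kW."""
    return server_kw(servers) * CASCADE_MULTIPLIER


if __name__ == "__main__":
    before = 10
    after = servers_needed(before, baseline_seconds=4.0, new_seconds=2.0)
    for label, count in (("before", before), ("after", after)):
        print(f"{label}: {count} servers, "
              f"{server_kw(count):.2f} kW servers, "
              f"{facility_kw(count):.2f} kW facility")
```

Plugging in a different run time or a different multiplier for your own facility gives a quick first estimate of what an optimization effort might be worth.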

I have seen this same thing in high performance grid computing as well.  Just turning on optimization flags when compiling programs that are run 100k times a day for minutes at a time eliminated the need to expand the compute cluster.

We are starting to see a real push towards compiler optimizations, particularly around auto-parallelization, which is proving hugely successful on modern chips.  Because of this cascade multiplier effect, we can see some real gains from software optimization efforts.

There is another area for optimization, which sits between the software and hardware layers.  We are used to getting the right hardware to meet the software specs, but what I am thinking about is building the software towards hardware specs.  For example, instead of building a single-threaded process that would require a very fast processor, build a multithreaded process that can take advantage of a lower-watt multi-core processor (yes, this is already the case for commercial software, but in-house software can do the same).
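
As a minimal sketch of that idea, here is a CPU-bound job split into chunks that run on separate cores. Everything in it is illustrative: the function names are hypothetical and the sum of squares is just a stand-in for real work. It uses worker processes rather than threads because standard CPython does not run CPU-bound threads in parallel; in a language like C++ or Java the same split would normally be done with threads.

```python
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sum_squares(bounds):
    """CPU-bound stand-in for the real work done on one chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n, workers):
    """Run one chunk per worker process and combine the partial results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, split_range(n, workers)))


if __name__ == "__main__":
    # Four workers is an arbitrary choice; match it to the cores available.
    print(parallel_sum_squares(1_000_000, 4))
```

Whether several slower cores actually use less power than one fast core for a given job depends on the hardware, so this is something to measure rather than assume.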

Probably a much better and more direct example is an in-house application that currently, to meet performance requirements, keeps all the data for a particular job in RAM.  This might be rewritten to use a combination of much less RAM, multiprocessing, and SSDs (solid state disks) to reach even better performance than the original.  I am not talking about just putting SSDs in a server, or using SSDs for swap, and running the application unchanged, but rather about altering the application to take advantage of the much faster access times of the SSD.  Bioinformatics, geo-imaging, weather simulations, and many other large data set research programs currently use large RAM systems, MPI clusters, or other methods to handle their large data sets.

Here is a great article on how Facebook is drastically increasing performance by cross-compiling PHP to C++.
