Friday, February 24, 2012

As another customer learned, temperature can rise very quickly in a data center, causing your servers to shut down

Today I got to visit a brand new data center that is still under construction. It is for a casino in Macau. Their setup is interesting, with about 5 rooms of around 60 racks each. Judging from the state of the building, I was surprised to learn they have to be up and running by April! At least they have their data center operational while the building itself is still under construction.

Recently, they had an episode where the chiller water supply was apparently shut off, and IT was not notified of the event. As the temperature rose in the data center, servers began to overheat and shut down. Only when end users started complaining did IT realize there was an issue in the data center. And the interesting part is, their data center is only at 20% capacity!

As we learned with another client after installing the DC Insight monitoring system in their data center, shutting down just one CRAC unit raises the temperature very quickly!

During the exercise, we shut down 3 of the 4 CRAC units in the data center, one at a time. As the chart above shows, the placement of the first and third CRAC units makes them crucial to the operation of the data center. When either the first or the third CRAC was shut down, the data center temperature rose quickly, and within 8 minutes we had rack intake temperature going over 30 deg C. Since this is an operational site, the facility manager quickly restored the CRAC to avoid an issue. I guess they don't really have an N+1 cooling design after all :)
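
To put a number on "rises quickly", here is a rough Python sketch that fits a straight line to a few rack intake readings and estimates how many minutes remain before the 30 deg C mark. The function names and the sample readings are made up for illustration; this is not how DC Insight works, and real temperature curves are rarely this linear.

```python
from typing import List, Optional, Tuple

# Each sample is (minutes since the CRAC went off, rack intake temperature in deg C).
Sample = Tuple[float, float]


def rise_rate_c_per_min(samples: List[Sample]) -> float:
    """Least-squares slope of temperature against time, in deg C per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


def minutes_to_limit(samples: List[Sample], limit_c: float = 30.0) -> Optional[float]:
    """Estimated minutes from the latest sample until limit_c is reached.

    Returns 0.0 if the limit is already reached, and None if the
    temperature is flat or falling (assumes a linear trend).
    """
    latest_c = samples[-1][1]
    if latest_c >= limit_c:
        return 0.0
    rate = rise_rate_c_per_min(samples)
    if rate <= 0:
        return None
    return (limit_c - latest_c) / rate


# Hypothetical readings, one per minute, after a CRAC unit is shut down.
readings = [(0.0, 22.0), (1.0, 23.0), (2.0, 24.0), (3.0, 25.0)]
print(rise_rate_c_per_min(readings))  # deg C per minute
print(minutes_to_limit(readings))     # minutes left before 30 deg C
```

With these invented readings the intake climbs 1 deg C per minute, so only about 5 minutes remain after the last sample. That is not much time for someone to walk over and restart a CRAC.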

Now imagine all the CRAC units going down at once. Temperature can rise VERY quickly in a data center.