Brief power interruption for some servers (Resolved)
This afternoon at 3:49 PM (Pacific time), one of the cabinets at our data center tripped a circuit breaker, causing all of the servers in that cabinet to lose power. Power was restored nine minutes later.
Customer Web sites on the calculon, lrrr, and zapp Web servers were unavailable during this time. The ability to send and receive e-mail was also interrupted (no mail was lost, of course). Other servers were not affected.
We pay close attention to the power load in each cabinet to avoid this sort of problem. The previously measured peak load of that cabinet had been 12 amps. Since the circuit allows 15 amps, this issue surprised us (we’ve been using the same setup in the same data center for seven years and this has never happened before). It appears that a combination of several servers experiencing unusually high CPU loads led to power usage beyond what we previously considered possible.
We will take immediate steps to make sure the problem doesn’t happen again, and we sincerely apologize to customers who were affected by this incident.
Update 7:26 PM: We have removed a server from the cabinet in question, lowering the power use.
Update 10:38 PM: We have removed a second server from the cabinet, ensuring that power use is well below any level that could cause further trouble. The problem will not recur.