Mom server temporarily unavailable (resolved)
The “mom” server experienced high load starting around 10:45 AM (Pacific time) this morning. We restarted it just before 11:00 AM, and it’s now working normally.
The high load occurred because the Linux kernel was limiting how much file system data it cached. The server had close to 2 GB of free RAM that would normally be used for file caching, but the kernel was caching only a few hundred MB. As a result, many more reads had to go to the disks instead of being served from memory. This put heavy load on the disk system, making the server sluggish and pushing the “load” far higher than we consider acceptable.
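For readers who want to spot a similar symptom on their own Linux machines, here is a minimal sketch that compares free RAM with the size of the file cache as reported in /proc/meminfo. It is an illustration only, not the tool we used in our investigation. The function names and the 1024 MB threshold are hypothetical choices made for this example.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts and are stored as-is.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_report(fields):
    """Return (free_mb, cached_mb) from the MemFree and Cached fields."""
    free_mb = fields.get("MemFree", 0) // 1024
    cached_mb = fields.get("Cached", 0) // 1024
    return free_mb, cached_mb


def looks_underused(fields, min_free_mb=1024):
    """Heuristic (assumption): lots of free RAM but a smaller file cache.

    min_free_mb is a hypothetical threshold, not a kernel setting.
    """
    free_mb, cached_mb = cache_report(fields)
    return free_mb >= min_free_mb and cached_mb < free_mb


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    free_mb, cached_mb = cache_report(info)
    print(f"free: {free_mb} MB, file cache: {cached_mb} MB")
    if looks_underused(info):
        print("warning: plenty of free RAM, but the file cache is small")
```

A large amount of free memory alongside a small cache is not always a problem (for example, shortly after a reboot), so treat the result as a hint to look closer rather than a diagnosis.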
We are investigating the root cause of the caching problem to ensure that it doesn’t happen again. We don’t consider this acceptable, and we sincerely apologize for the inconvenience this caused customers on that server (other servers were not affected).
Update: Our investigation has identified the probable cause of the caching problem, and it is not one that will recur.