Between 10:00 PM and 10:59 PM Pacific time on Wednesday, May 22, 2013, the “web05” and “web07” servers will be restarted. This will cause an eight-minute interruption of service for each server at some point during that hour.
A couple of days ago, one of our Web servers became unstable for an unknown reason and needed to be restarted. This is rare: on average, it happens less than once every five years of uptime per server, so we took it very seriously and launched an investigation.
We found that the owner of one of the sites on that server had made a mistake that allowed attackers to run their own scripts. That’s all too common, unfortunately, but this kind of thing usually affects only the single site. What was surprising in this case was that the script used a previously unknown method of causing problems for other sites running on the server.
As a result of this investigation, we’ve made several changes to our systems to ensure the problem won’t recur. The rest of this post has a detailed technical description of the problem in case it’s useful for others.
Our “web07” server needed restarting at 11:36 AM Pacific time on February 1, 2012, because it had been intermittently unable to run some PHP scripts for 22 minutes.
The restart resolved the immediate problem, and a follow-up post explains what happened and the changes we made to prevent it from happening again.
- Brief scheduled maintenance on web05 & web07 servers May 22, 2013
- PHP 5.3.25 and 5.4.15
- High load on web04 server May 9 2013 (resolved)
- WP Super Cache and W3 Total Cache security
- WordPress login rate limiting (again)
- Slow performance on web04 server April 11, 2013 (resolved)
- Outage on web12 server April 9, 2013 (resolved)
- Network outage March 23 2013 (resolved)
- PHP 5.3 upgraded to 5.3.22; PHP 5.4.12 also available
- Brief performance problem on web12 server March 4, 2013 (resolved)