Brief performance problem on web12 server March 4, 2013 (resolved)

There was a brief but severe performance problem on the web12 server today between 9:59 and 10:07 AM Pacific time. During this time, many Web server requests were very slow to load or even “timed out” completely. All services are now operating normally again. Other servers were not affected.

Brief scheduled maintenance February 26, 2013 (completed)

Between 11:00 PM and 11:59 PM Pacific time on February 26, 2013, each of our servers will be restarted for a “kernel upgrade”. This will cause an interruption of service of approximately four minutes for each customer at some point during this hour.

Outage on web12 server (resolved)

There was a brief outage on the web12 server today starting at about 6:22 PM Pacific time. This was caused by a “SYN flood” attack, which effectively blocked all other connections with the server.

We took steps to work around the attack, which we completed by 7:08 PM Pacific time (46 minutes after the start of the attack). Furthermore, the attack itself seems to have stopped; the steps we took should help in case it starts again.

We sincerely apologize for the interruption in service for those affected customers; we know that reliable service is a primary concern for all of our customers.

web03 server restarted (resolved)

At 9:45 PM Pacific time on February 6, 2013, our “web03” server experienced a “kernel panic” and needed to be restarted. This led to an 11-minute outage of Web sites and e-mail hosted on that server.

All services are now working normally, and other servers were not affected. We apologize for the trouble this caused customers on the web03 server.

Denial of service attack February 5, 2013 (resolved)

Beginning at 3:00 PM Pacific time on February 5, a server on our network was the target of an extremely high-volume DNS amplification denial of service attack. The inbound network data exceeded 11.6 Gbps, which is an extremely large amount: large enough to exceed the 10 Gbps capacity of our upstream Ethernet switches and cause our entire network to slow down dramatically.
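To put those two figures side by side, the arithmetic can be sketched as below. This is only an illustration built from the numbers in this post; the function name `link_utilization` is made up for the example, and 11.6 Gbps is treated as a lower bound because the post says the traffic exceeded it.

```python
def link_utilization(inbound_gbps: float, capacity_gbps: float) -> float:
    """Return offered load as a fraction of link capacity (1.0 means exactly full)."""
    if capacity_gbps <= 0:
        raise ValueError("capacity must be positive")
    return inbound_gbps / capacity_gbps


# Figures from the incident: at least 11.6 Gbps inbound, 10 Gbps switch capacity.
INBOUND_GBPS = 11.6
CAPACITY_GBPS = 10.0

utilization = link_utilization(INBOUND_GBPS, CAPACITY_GBPS)
excess_gbps = INBOUND_GBPS - CAPACITY_GBPS

print(f"Offered load: at least {utilization:.0%} of capacity")
print(f"Traffic that cannot fit: at least {excess_gbps:.1f} Gbps")
```

Any load above 100% means some traffic has to be dropped or delayed, and legitimate traffic sharing that link suffers along with the attack traffic, which is consistent with the whole network slowing down rather than a single server.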

This affected all servers for about 19 minutes, until we and our network partners began discarding (“null routing”) all traffic targeted at that server. This fixed the problem for the rest of our network, but still left sites on the “web11” server unavailable.

To solve that, the IP addresses of all sites on the web11 server have been changed to new IP addresses that are working correctly and are not under attack. This was completed by 3:44 PM, and all sites on all servers are now working properly.

If the attackers target another IP address, we’re ready to immediately block that one, too. If that does happen, the way we’ve redistributed the IP addresses, in combination with previous analysis we’ve done on this attack, will allow us to immediately know which site is under attack. (It’s otherwise hard to determine which IP address is involved, because the type of attack we’re seeing targets only an IP address and not a specific Web site name.) That site will then be moved off our main network to prevent a recurrence.
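The post does not describe the actual redistribution scheme, so the following is only a rough sketch of the general idea: if each IP address is known to serve a small set of sites, then seeing which address is attacked narrows down the target. The table `SITES_BY_IP`, the function `candidate_sites`, and all addresses and site names are hypothetical (documentation-only values), not real assignments.

```python
from typing import Dict, List

# Hypothetical assignments. 192.0.2.0/24 is reserved for documentation,
# and the ".example" names are placeholders, not real customer sites.
SITES_BY_IP: Dict[str, List[str]] = {
    "192.0.2.10": ["site-a.example"],
    "192.0.2.11": ["site-b.example"],
    "192.0.2.12": ["site-c.example", "site-d.example"],
}


def candidate_sites(attacked_ip: str) -> List[str]:
    """Return the sites served from the attacked address (empty if unknown)."""
    return list(SITES_BY_IP.get(attacked_ip, []))


# An attack aimed at one address points directly at the sites behind it.
print(candidate_sites("192.0.2.10"))
print(candidate_sites("192.0.2.12"))
```

The fewer sites share an address, the less additional analysis is needed to identify the one being targeted.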

We sincerely apologize for the inconvenience this caused our customers; we know you count on us for reliable service, and we’re committed to doing everything possible to avoid problems.

Brief outage on web11 server February 2, 2013 (resolved)

There was a brief outage on the web11 server today at about 2:42 PM Pacific time.

This was caused by a “denial of service” attack that increased the incoming network traffic to that server from the usual 5 Mbps or so to over 350 Mbps. Servers other than web11 were not affected.
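For a sense of scale, the jump described above works out to roughly 70 times the usual traffic. The sketch below shows that calculation plus a simple way a spike of this size could be flagged. It is an illustration only and not a description of the monitoring actually in use; the function names and the 10x threshold are assumptions chosen for the example.

```python
def traffic_multiple(current_mbps: float, baseline_mbps: float) -> float:
    """Return how many times larger the current traffic is than the baseline."""
    if baseline_mbps <= 0:
        raise ValueError("baseline must be positive")
    return current_mbps / baseline_mbps


def looks_like_flood(current_mbps: float, baseline_mbps: float,
                     threshold: float = 10.0) -> bool:
    """Flag traffic at or above `threshold` times the baseline (assumed cutoff)."""
    return traffic_multiple(current_mbps, baseline_mbps) >= threshold


# Figures from the incident: about 5 Mbps normally, over 350 Mbps during the attack.
print(traffic_multiple(350, 5))
print(looks_like_flood(350, 5))
```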

This appears to be very similar to the attacks that occurred last Monday morning (January 28).

We are closely monitoring all systems so that we can see exactly how to block future attacks.

Brief outage on web11 server January 28, 2013 (resolved)

There were two brief outages on the web11 server on January 28, 2013, at about 8:09 AM and 8:46 AM Pacific time.

Brief scheduled maintenance on web04 server January 18, 2013 (completed)

Update: The maintenance described below was completed with less than 5 minutes of downtime.

At 11:00 PM Pacific time on January 18, 2013, the “web04” server will be restarted.

Brief MySQL scheduled maintenance December 22, 2012 (completed)

Between 11:00 PM and 11:59 PM Pacific time on Saturday, December 22, 2012, the MySQL database software on each of our servers will be upgraded to version 5.1.66 and restarted. This will cause an interruption of service of approximately 30 seconds on each customer Web site at some point during this hour.

This upgrade is necessary for security reasons. We apologize for the inconvenience this causes.

Update December 22, 11:17 PM: The maintenance was completed with less than 30 seconds of downtime per server.

web10 server restarted (resolved)

At 9:45 PM Pacific time on November 15, 2012, our “web10” server became unstable, and we eventually decided to restart it to resolve the problem. This caused a period of about 20 minutes during which the server was working only intermittently, followed by a four-minute outage while it restarted.
