web12 server restarted (resolved)

At 11:00 PM Pacific time on October 26, 2012, our “web12” server experienced a “kernel panic” and needed to be restarted. This led to an 8-minute outage of Web sites and e-mail hosted on that server.

All services are now working normally, and other servers were not affected.


AOL mail delivery delayed (resolved)

Update 6:20 PM October 11: AOL has resolved the problem described below. All delayed mail has been delivered, and all services are operating normally.


web04 server restarted (resolved)

At 3:33 PM Pacific time on September 12, 2012, our “web04” server became unstable and needed to be restarted. This led to an 8-minute outage of Web sites and e-mail hosted on that server.

All services are now working normally, and other servers were not affected.

This isn’t normal or acceptable. We take server reliability seriously, and we’re investigating the underlying cause to avoid a recurrence of this problem.

Server “web08” restarted (resolved)

At 12:50 AM Pacific time, the “web08” server experienced extremely high disk load and needed to be restarted, causing approximately 5 minutes of downtime for sites on that server. Other servers were not affected, and the server is now working normally.

Network attacks July 26, 2012 (resolved)

Today at about 12:26 PM and 12:34 PM (Pacific time), our network was briefly attacked by an extremely high volume of data: a “distributed denial of service” (DDoS) attack using forged (“spoofed”) source addresses. The volume of the attack was more than 50 times greater than the usual peak inbound data rate to all our servers combined. This caused Web sites and e-mail we host to be very slow or to time out completely for a few minutes. (All services are working normally now.)

The same attack happened a week ago. Based on what we learned previously, we were able to trace the attack in more detail, and we have identified a specific controversial site that the attackers are targeting. We have moved that site to a different section of our network that can fail without affecting other sites, and we will work with the site owner to move it to a dedicated DDoS protection service.

We apologize for the problems caused by this incident. We know that achieving maximum uptime and availability is important for all of our customers.

Scattered network problem reports July 19, 2012 (resolved)

Update 1:41 PM Pacific time: A contact at Level 3 Communications confirms that their San Francisco Bay Area network was disrupted by a “configuration error”, causing problems for a great deal of Internet traffic that passes through Level 3 (not related to us in particular). Level 3 has corrected the problem, so we’re marking this issue as “closed”.


System status update – brief connectivity problems (resolved)

We had a couple of brief network interruptions today (at about 8:50 AM and 12:49 PM Pacific time). We are investigating them, and will update this post with more details later.

Updated 2012-07-19 2:23 PM Pacific: One of our upstream network providers has traced this to what appear to be some very brief attacks with a very high packet-per-second rate. The attacks have not recurred, and we are continuing to monitor all systems.

Brief service interruption on web11 server (resolved)

The “web11” server became very slow and needed to be restarted at approximately 2:00 AM Pacific time this morning (June 25). This caused a brief outage for Web sites on that server. Other servers were not affected.

MySQL scheduled maintenance June 23, 2012 (completed)

Between 11:00 PM and 11:59 PM Pacific time on Saturday, June 23, 2012, the MySQL database software on each of our servers will be upgraded to version 5.1.63 and restarted. This will cause an approximately 30-second interruption of service on each customer Web site at some point during this hour.

This upgrade is necessary for security reasons. We apologize for the inconvenience this causes.

Update 11:12 PM June 23: The maintenance was completed as planned.

Network connectivity problems June 15 (resolved)

Between 5:10 and 5:22 AM Pacific time this morning (June 15), one of our upstream network providers experienced a large distributed denial of service (DDoS) attack targeted at one of their other customers, which overwhelmed the provider’s core network routers. This resulted in many people being unable to connect to our network during this period.

The problem has been resolved (the provider has blocked the attack), and they tell us it should not recur. We sincerely apologize for the inconvenience this caused.