Outage at primary data center (resolved)

Between 6:00 AM and 6:29 AM Pacific time on August 7, 2011, all services were unavailable due to a power failure at our primary data center.

The problem was resolved for most servers by 6:29 AM, and for all servers except the “amy” server by 6:53 AM. The “amy” server needed extra manual intervention, and was working by 7:55 AM. All services are now operating normally.

Any e-mail that arrived during the outage was queued at our secondary data center and delivered as soon as the outage ended.

We sincerely apologize for this problem. We know you count on us for reliability, and we don’t consider this acceptable, especially since the data center has had previous power problems this year. However, this incident had a different root cause. It was not a utility power failure that the redundant UPS systems failed to handle; instead, a circuit breaker incorrectly “tripped”, preventing the power output of the UPS systems from reaching the server cabinets.

Update 4:15 PM: We have received an incident report from the data center indicating that they are working to replace the affected part of the UPS system to prevent further problems.

PHP 5 updated

We’ve installed a PHP 5 security update. Customers should not notice any changes; the update just fixes several security issues in PHP 5.

Comcast routing problems (resolved)

Some Comcast users in California had trouble connecting to some of our servers beginning at around 7 AM Pacific time on May 21, 2011. Non-Comcast users were not affected at all, and even affected users were able to reach some of our servers with no trouble.

This was caused by a technical problem at Comcast, and not related to us specifically. It appears Comcast was incorrectly filtering some combinations of IP addresses and ports in one of their California network routers, preventing their customers from reaching some sites.

The issue was apparently resolved by Comcast at 10:07 AM Pacific time, and we are not aware of any ongoing problems. As always, don’t hesitate to contact us if you have any trouble.

High packet loss for some connections (resolved)

A router failure at an upstream Internet “peer” that we connect to caused high packet loss for some Internet connections between 5:18 PM and 5:28 PM Pacific time.

The packet loss grew worse through that period until it exceeded 25%, which is enough to cause pages to fail to load within a browser’s timeout period if your connection was one of the affected ones. (Connections that go through different routers were not affected.)

Network engineers have routed all connections around the failed hardware until it’s replaced, so the problem is resolved. If your part of the Internet was one of the affected ones, please accept our apologies for the problem.

Service outage May 6, 2011 (resolved)

May 6, 4:43 AM Pacific time: An outage at our primary data center caused a complete service interruption for all customers.

Update 5:08 AM: All services have been restored and are working normally.


Network issues April 10, 2011

Our primary data center experienced network routing problems between 2:06 PM and 2:49 PM Pacific time today (April 10, 2011).

During this time, packets from some (but not all) places on the Internet were unreliable, causing connection problems. The data center technicians have resolved the issue, and all services are now working normally.

We don’t consider this normal or acceptable, and we sincerely apologize for the inconvenience this caused. (We do not yet have a full explanation from the data center about the root cause, but have requested one so that we can be sure it won’t recur.)

Service outage Nov. 23, 2010 (resolved, updated)

Our primary data center had another power interruption this morning at 7:28 AM Pacific time. All of our servers lost power and then had it restored, which rebooted them. All customer Web sites were unavailable during this time. Incoming e-mail was only delayed during the downtime, not lost. When the servers came back online, e-mail may have seemed sluggish to some customers for a while, but this should also be fixed now.

This incident follows another power incident the previous Saturday night. We are working with the data center to get more details, including an estimate of when they will have replaced any faulty equipment. We will update this post as more information becomes available.

Update Nov. 29: The final data center report is that on the night of November 20, lightning strikes damaged both of the redundant UPS systems, interrupting data center power for a few seconds. The UPS manufacturer scheduled replacements for November 23, but another PG&E utility power interruption lasting a few seconds occurred that morning before the work was finished. The UPS manufacturer has since replaced all damaged parts, restoring full redundancy. In addition, the UPS manufacturer has overhauled each unit, replacing and upgrading other parts to increase robustness. We take this very seriously, since it’s at the core of what we do, and we will continue to work with the data center to ensure that their infrastructure meets our high standards.

Service outage Nov. 20, 2010 (resolved)

A major power failure at our primary data center in Fremont, California, caused a complete outage for nearly all services beginning at 8:32 PM Pacific time Saturday night. It lasted between six and 13 minutes, depending on the server. Only our blog and our redundant DNS infrastructure were unaffected.

All services are now fully operational; please don’t hesitate to contact us if you have any questions. We sincerely apologize for the inconvenience this caused our customers.


Brief scheduled maintenance Saturday, August 28 (completed)

Between 10:00 PM and 11:59 PM Pacific time this Saturday, August 28, all our hosting servers will be restarted. As a result, Web site service and the ability to read incoming e-mail will be unavailable for approximately five minutes at some point during this maintenance “window”.


Brief scheduled maintenance Saturday, May 22 (completed)

Between 10:00 PM and 11:59 PM Pacific time this Saturday, May 22, all our hosting servers will be restarted. As a result, Web site service and the ability to read incoming e-mail will be unavailable for approximately five minutes at some point during this maintenance “window”.
