Network problem earlier today (resolved)

Some of our customers may have noticed “high packet loss” today from about noon to 12:25 PM (Pacific time). This could make it seem like Web sites hosted on our servers were loading slowly, or even timing out.

Our upstream provider has resolved the problem, and we are working with them to make sure it doesn’t recur.

Routing problem for some Comcast customers (resolved)

Between 5:11 and 5:46 PM Pacific time today, some people who reach our servers via an “Internet backbone” called Global Crossing (including some Comcast cable customers) were unable to connect to our data center. Other users weren’t affected.

Global Crossing has apparently corrected the problem, and everything is now operating normally. We’ll continue to monitor this issue closely.

Network outage followup

This is a followup to last night’s post about a network outage.

The root cause of the problem was the failure of an Ethernet switch at our data center. This is the switch that our network cables physically plug into to connect to the Internet. Unfortunately, it’s one of the few pieces of the network infrastructure that’s not automatically redundant: although the “other side” of the switch is connected to multiple fully redundant upstream paths to the Internet, the side that goes to our server cabinets effectively has a single connection for each group of servers.

When the switch failed, the data center staff replaced it with a new spare. Because the faulty hardware was completely replaced, the problem is properly solved, and we don’t expect it to be an ongoing issue.

Unscheduled network outage (resolved)

Between 9:52 and 11:06 PM Pacific time on January 10, a complete network failure at our primary data center caused an unscheduled outage that made all services (all Web sites and e-mail) unreachable from the Internet.

This problem has been resolved and all services are now available. We are waiting for a full report from the data center personnel so that we can determine the cause and ensure that it won’t recur.

We sincerely apologize to our customers who were affected by this. This kind of outage is not normal (it’s the longest outage we’ve experienced in more than four years), and we know it’s not acceptable to our customers who rely on our services. We’ll post a followup message with more details when they become available.

Update Friday 10 AM: As a clarification, we should also have originally mentioned that no e-mail is lost during this kind of outage: it’s delivered after the issue is resolved. While some messages were certainly delayed, they were all properly delivered afterward.

Slow server response for some customers (resolved)

Since about 9:00 AM (Pacific time) today, we’ve been seeing network routing problems to some destinations on the Internet that use the “xo.net” backbone. For affected customers, access to your Web site may be extremely slow, or even so slow that the site seems completely unresponsive. Most customers will have no problems.

Our data center technicians are working on this problem. We’ll update this post as soon as the issue is resolved.

Update: This issue was resolved at approximately 10:20 AM, and all systems are operating normally.

Temporary routing problem (resolved)

Between 4:33 and 4:41 PM Pacific time, we experienced a short-lived problem where users who reach our servers via an “Internet backbone” called Global Crossing (including Comcast and Charter cable customers) were unable to connect. Other users weren’t affected.

The problem lasted for less than ten minutes, and everything is now operating normally.

Packet loss to some destinations (resolved)

We’re currently seeing about 15% “packet loss” from our data center to a handful of locations on the Internet (notably connections that go through the above.net backbone). Most people aren’t affected by this, but for those that are, this can cause connections to be slower than normal. We have a ticket open with the data center for this issue, and we’ll update this page when it’s resolved.

Update May 20: The packet loss problem was effectively resolved on Friday. We’ve been monitoring the above.net backbone connection closely since then to ensure that there is no ongoing problem. We’ve seen a couple of brief latency issues that we’re still following up on with the data center, but customers are not experiencing any problems.