Comcast routing problems (resolved)

Some Comcast users in California had trouble connecting to some of our servers beginning at around 7 AM Pacific time on May 21, 2011. Non-Comcast users were not affected at all, and even people who were affected were able to reach some of our servers with no trouble.

This was caused by a technical problem at Comcast, and was not related to us specifically. It appears Comcast was incorrectly filtering some combinations of IP addresses and ports in one of their California network routers, preventing their customers from reaching some sites.

The issue was apparently resolved by Comcast at 10:07 AM Pacific time, and we are not aware of any ongoing problems. As always, don’t hesitate to contact us if you have any trouble.

Sites on “farnsworth” server moved to “zapp”

All Web sites on the “farnsworth” Web server have been moved to a new server named “zapp”.

This change was made for reliability; our monitoring systems detected potential hardware problems with the “farnsworth” server earlier today, and the sites were moved so it can be replaced before it causes any problems.

This doesn’t cause any downtime, and customers shouldn’t notice any change — but as always, don’t hesitate to contact us if you have any questions.

Brief scheduled maintenance on pazuzu server (completed)

At approximately 11:00 PM Pacific time tonight, May 14, the “pazuzu” Web server will be restarted.

As a result, for customers on the “pazuzu” server (only), Web site service and the ability to read incoming e-mail will be unavailable for approximately five minutes. Customers on other servers will not be affected.

High packet loss for some connections (resolved)

A router failure at an upstream Internet “peer” that we connect to caused high packet loss for some Internet connections between 5:18 PM and 5:28 PM Pacific time.

The packet loss grew worse through that period until it exceeded 25%, which is enough to cause pages to fail to load within a browser’s timeout period if your connection was one of the affected ones. (Connections that go through different routers were not affected.)
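To see why 25% loss is enough to stall a page load, here is a rough back-of-the-envelope sketch. It assumes each packet is lost independently with the same probability, which real networks only approximate, and the figure of 20 packets per page is a hypothetical chosen purely for illustration. Each repeated loss of the same packet forces the sender to wait longer before retrying, so a packet dropped several times in a row can push a request past a browser's timeout.

```python
def prob_all_arrive_first_try(loss_rate, packets):
    """Probability that every packet gets through on its first attempt,
    assuming independent losses (a simplifying assumption)."""
    return (1 - loss_rate) ** packets


def prob_some_packet_lost_repeatedly(loss_rate, packets, consecutive_losses):
    """Probability that at least one packet is dropped `consecutive_losses`
    times in a row, again assuming independent losses."""
    per_packet = loss_rate ** consecutive_losses
    return 1 - (1 - per_packet) ** packets


# Hypothetical page load of 20 packets at the 25% loss rate mentioned above.
LOSS_RATE = 0.25
PACKETS = 20

clean = prob_all_arrive_first_try(LOSS_RATE, PACKETS)
stalled = prob_some_packet_lost_repeatedly(LOSS_RATE, PACKETS, 3)

print(f"chance of no retransmissions at all: {clean:.4f}")
print(f"chance some packet is lost 3 times running: {stalled:.4f}")
```

Under these assumptions almost every page load needs at least one retransmission, and a sizeable share hit a packet that is dropped several times in succession.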

Network engineers have routed all connections around the failed hardware until it’s replaced, so the problem is resolved. If your part of the Internet was one of the affected ones, please accept our apologies for the problem.

Service outage May 6, 2011 (resolved)

May 6, 4:43 AM Pacific time: An outage at our primary data center caused a complete service interruption for all customers.

Update 5:08 AM: All services have been restored and are working normally.

Brief scheduled maintenance on “fry” and “bender” servers (completed)

The “fry” and “bender” Web servers will be restarted between 11:00 and 11:15 PM Pacific time tonight (Friday, April 29, 2011). This will cause a five-minute interruption of Web and e-mail service for customers on those servers.

Other servers will not be affected, and incoming mail will only be delayed, not lost.

Problem with “fry” server (resolved)

8:52 PM Pacific time: We’re investigating a problem with the “fry” hosting server that’s requiring us to restart it; further details in a few minutes.

Update 9:42 PM Pacific time: The “fry” server was restarted, but a technician will be doing some maintenance on the server for approximately an hour. This will require a reboot, meaning the server will be unavailable for approximately 5 to 10 minutes. Web service will be unavailable during that time. E-mail service on that server will also be unavailable: delivery of new incoming mail will be suspended during that time and will resume when the server comes back. No e-mail will be lost.

All other servers are unaffected.

Update 10:50 PM Pacific time: The “fry” web server will be rebooted in about 10 minutes, at approximately 11:00 PM Pacific time.

Update 11:10 PM Pacific time: The “fry” web server was successfully rebooted as planned. There may be more maintenance on the server this weekend; watch our blog or follow us on Twitter for updates.

Network issues April 10, 2011

Our primary data center experienced network routing problems between 2:06 PM and 2:49 PM Pacific time today (April 10, 2011).

During this time, packets from some (but not all) places on the Internet were delivered unreliably, causing connection problems. The data center technicians have resolved the issue, and all services are now working normally.

We don’t consider this normal or acceptable, and we sincerely apologize for the inconvenience this caused. (We do not yet have a full explanation from the data center about the root cause, but have requested one so that we can be sure it won’t recur.)

Brief MySQL load problems (resolved)

We had a couple of instances of MySQL queries overloading the bender server today. The first one happened at about 3:41 AM (Pacific time) and the second one happened at about 7:48 AM. Each occurrence lasted about 20 minutes. The problem each time was that a database was running extremely inefficient queries. Each time we fixed the problem by creating indexes so that the queries could then run in a fraction of the time previously required.
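As a rough illustration of why adding an index helps, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The table, column, and index names are hypothetical and are not the ones involved in this incident. It asks the database how it plans to run the same query before and after an index is created.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[3] for row in rows)


# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10000)],
)

sql = "SELECT total FROM orders WHERE customer_id = 42"

# Without an index, the database has to examine every row in the table.
plan_before = query_plan(conn, sql)

# With an index on the filtered column, it can jump to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a full table scan and the second reports a search using the new index. In MySQL the equivalent check would be running EXPLAIN on the slow query before and after CREATE INDEX.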

We apologize for any inconvenience caused by this problem. Visitors to your Web site (on the bender server) might have seen reduced performance (or, in rare cases, 503 errors). E-mail was not affected. We don’t consider this type of problem to be acceptable. These problems should not recur since the indexes have been created.

Brief scheduled maintenance on elzar server (completed)

The “elzar” Web server will be restarted at 10 PM Pacific time tonight (February 25). This will cause a five-minute interruption of Web and e-mail service for customers on that server.

Other servers will not be affected, and incoming mail will only be delayed, not lost.

This restart is necessary to fix a memory problem. We apologize for the inconvenience.

Update 10:03 PM: The maintenance was completed with less than 3 minutes of downtime.