High load on some servers (resolved)

Three of our Web hosting servers (amy, flexo, and leela) experienced high load earlier today that caused some customers to see “503 errors” on their Web sites for a few minutes.

This was caused by an upgrade to the eAccelerator PHP caching system that removed all the cached files at once, which doesn’t normally happen.

The problem has been permanently resolved and will not recur.


Brief scheduled maintenance Saturday, August 28 (completed)

Between 10:00 PM and 11:59 PM Pacific time this Saturday, August 28, all our hosting servers will be restarted. As a result, Web site service and the ability to read incoming e-mail will be unavailable for approximately five minutes at some point during this maintenance “window”.


Comcast network problems August 12 (resolved)

Our monitoring systems are showing that some people who reach our servers via an “Internet backbone” company called Global Crossing, including some Comcast cable customers, have been intermittently unable to connect over the last hour or so.

This isn’t an outage on our end; these visitors are also unable to reach other sites, unrelated to us, that Comcast routes through Global Crossing, such as www.globalcrossing.com. It’s something Comcast and Global Crossing need to address.

We’ll continue to monitor this issue closely and post an update when we’re confident that it’s been resolved.

By the way, if you ever find that you’re unable to connect to our servers (or anyone else’s), a very useful site is CheckSite.us. It shows you whether the destination servers are down, or whether the problem is just a local routing problem that isn’t affecting most other people.

Update 9 AM PDT August 13: According to our monitoring systems, Comcast resolved this shortly after our post, and the problem has not recurred in the ten hours since then.

Brief scheduled maintenance Monday, August 2 on some servers (completed)

Between 11:00 PM and 11:59 PM Pacific time tonight (Monday August 2), several of our hosting servers will be restarted: bender, elzar, farnsworth, lrrr, mom, and seymour.

As a result, Web site service and the ability to read incoming e-mail for some customers will be unavailable for approximately five minutes at some point during this maintenance “window”.


Brief maintenance on calculon server (completed)

The “calculon” Web server will be restarted at 9 PM Pacific time tonight (July 5). This will cause a five-minute interruption of Web and e-mail service for customers on that server.

Other servers will not be affected, and incoming mail will only be delayed, not lost.


Brief scheduled maintenance Saturday, May 22 (completed)

Between 10:00 PM and 11:59 PM Pacific time this Saturday, May 22, all our hosting servers will be restarted. As a result, Web site service and the ability to read incoming e-mail will be unavailable for approximately five minutes at some point during this maintenance “window”.


Network issues (resolved)

We’re receiving reports of network connectivity problems from a couple of customers using the “Global Crossing” Internet backbone to reach our primary data center, although most customers are unaffected. We’re investigating this issue.

Update 12:35 PM: Our upstream provider reports that an 8-minute network interruption for some connections, beginning at 11:11 AM Pacific time, was caused by a router failure at Global Crossing. The problem has been resolved.

Network slowness for some customers (resolved)

Between 7:00 and 7:45 PM Pacific time Thursday night (March 11), we received two reports of slow or nonexistent network connections to sites on our servers.

Our automated monitoring systems didn’t detect any general problems, so the majority of customers were certainly unaffected. However, we suspect that one of the “Internet backbones” between the affected customers and our data center had high packet loss during that period.

Both customers reported that the problem resolved itself by 7:45, and we haven’t received similar reports since, so there does not appear to be an ongoing problem. We’ll continue to monitor it closely.

Brief maintenance on calculon server (completed)

The “calculon” Web server will be restarted at 11 PM Pacific time tonight (February 19). This will cause a five-minute interruption of Web and e-mail service for customers on that server.

Other servers will not be affected, and incoming mail will only be delayed, not lost.

We apologize for the problem and for the short notice: the restart is necessary to replace a disk in the RAID array.

Update 11:03 PM Pacific time: The restart was completed with less than three minutes of “downtime”.

Bender server load problem 2010-02-18 (resolved)

The “bender” Web server experienced intermittently high load between about 7:40 and 10:15 AM Pacific time this morning, February 18. This resulted in slow or even inaccessible Web sites on that server. (Some e-mail was also delayed before being properly delivered.) Other servers were not affected.

This server had similar high load symptoms (but much more briefly) earlier this week. We took some steps to reduce the load then, but it appears those weren’t sufficient. We’re now taking much stronger action to ensure that this does not happen again.

We sincerely apologize to customers affected by this problem. We don’t consider it normal or acceptable, and we will make sure this isn’t a recurring issue.