High load on the “elzar” server (resolved)

The “elzar” Web hosting server experienced very high load between 9:07 and 9:14 AM Pacific time this morning (September 27, 2011), causing sites on that server to load slowly during those seven minutes. Other servers were not affected.

This was caused by a distributed denial of service (“DDoS”) attack against a site on that server. We manually blocked the attackers to resolve it, and we’re continuing to monitor the server closely to make sure the attack doesn’t recur.

Brief scheduled maintenance on elzar server (completed)

The “elzar” Web server will be restarted at 10 PM Pacific time tonight (February 25). This will cause a five-minute interruption of Web and e-mail service for customers on that server.

Other servers will not be affected, and incoming mail will only be delayed, not lost.

This restart is necessary to fix a memory problem. We apologize for the inconvenience.

Update 10:03 PM: The maintenance was completed with less than three minutes of downtime.

Brief scheduled maintenance Monday, August 2 on some servers (completed)

Between 11:00 PM and 11:59 PM Pacific time tonight (Monday, August 2), several of our hosting servers will be restarted: bender, elzar, farnsworth, lrrr, mom, and seymour.

As a result, Web site service and the ability to read incoming e-mail for some customers will be unavailable for approximately five minutes at some point during this maintenance “window”.

Problem affecting two servers (resolved)

We posted earlier about a problem affecting the “elzar” Web server. While we were investigating the cause, the same thing happened on another Web server, “calculon”, causing a separate outage for customers on that server from 2:34 PM to 2:43 PM Pacific time this afternoon.

During this period, Web sites on that server were unavailable and incoming e-mail was delayed. (The Web server was slow for about six minutes after it was restarted, too.)

On both servers, high disk and memory usage caused the load to skyrocket to the point where they effectively stopped responding.

The good news is that we have narrowed down the cause, so it shouldn’t happen again. A bug in one of our maintenance programs that runs on each server was almost certainly responsible. The bug has been fixed.

We sincerely apologize for this issue, and regret the inconvenience it caused for customers hosted on these servers. Other servers were not affected.

Elzar server temporarily unavailable (resolved)

The “elzar” Web server experienced high load between 5:40 and 6:00 AM Pacific time this morning, April 15. This resulted in slow Web sites and some interruption of service. (Some e-mail activity was delayed, but no e-mail was lost.)

We sincerely apologize for this problem. We consider this type of failure unacceptable, and we are investigating the cause so that we can take the appropriate steps to prevent it from happening again.

Brief scheduled maintenance Friday, April 3 (completed)

At approximately 11:00 PM Pacific time on Friday, April 3, the “flexo”, “mom” and “elzar” servers will be restarted. As a result, Web site and e-mail service for some customers will be unavailable for approximately five minutes.

No e-mail will be lost, of course; incoming mail will just be slightly delayed.

We apologize for any inconvenience this may cause. This maintenance is necessary to install an updated “kernel” on our servers, as described in an earlier post.

Update: We’re also going to include the “zapp” server in this maintenance to replace a disk in the RAID array.

Update 2: The maintenance was completed with less than five minutes of “downtime”.

Brief scheduled maintenance Saturday, January 31 (completed)

At approximately 11:00 PM Pacific time on Saturday, January 31, all of our Web hosting servers (except the “hypnotoad” and “mom” servers) will be restarted. As a result, Web site and e-mail service for some customers will be unavailable for approximately five minutes.

No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.

We apologize for any inconvenience this may cause. This maintenance is necessary to install an updated “kernel” on our servers, as described in an earlier maintenance announcement.

Update: The maintenance was successfully completed on all servers with less than five minutes of “downtime”.

Brief scheduled maintenance for elzar and farnsworth servers (completed)

At 11:00 PM Pacific time tonight (November 24), the “elzar” and “farnsworth” servers will be restarted. As a result, Web sites and e-mail service for customers using those servers will be unavailable for approximately five minutes.

Temporary overload on “elzar” server (resolved)

Starting at 10:14 this morning, our “elzar” server experienced an unexpectedly high load that effectively made some processes on the server unusable for about 10 minutes.

Web sites using scripts or databases on the elzar server may have seemed unresponsive during that time. Also, any customer hosted on elzar who was reading their e-mail during this time may have felt the system was slow or unresponsive (no e-mail was lost, of course).

Customers on other servers were not affected.

PHP problem on “elzar” server (resolved)

As a result of an error on our part, a small number of PHP 4 scripts on the “elzar” Web server may have displayed an error message or a blank page for up to 14 minutes today (from 12:48 to 1:02 PM Pacific time in the worst case). The problem has been resolved for any customers who were affected.

This happened because of a mistake we made in an upgrade to our sitewide PHP 4/FastCGI configuration file, which our pre-upgrade testing failed to detect. We have added a new check to our automated testing system to ensure this cannot happen again.

We sincerely apologize to any customers affected by this problem.