We posted earlier about a problem affecting the “elzar” Web server. While we were investigating the cause, the same thing happened on another Web server, “calculon”, causing a separate outage for customers on that server from 2:34 PM to 2:43 PM Pacific time this afternoon.
During this period, Web sites on that server were unavailable and incoming e-mail was delayed. (The Web server was slow for about six minutes after it was restarted, too.)
On both servers, high disk and memory usage caused the load to skyrocket to the point where the servers effectively stopped responding.
The good news is that we have narrowed down the cause, so it shouldn’t happen again. A bug in one of our maintenance programs that runs on each server was almost certainly responsible. The bug has been fixed.
We sincerely apologize for this issue, and regret the inconvenience it caused for customers hosted on these servers. Other servers were not affected.
The “elzar” Web server experienced high load between 5:40 and 6:00 AM Pacific time this morning, April 15. This resulted in slow Web sites and some interruption of service. (Some e-mail activity was delayed, but no e-mail was lost.)
We sincerely apologize for this problem. We consider this type of failure to be unacceptable, and are looking into the cause of the problem so that we can take the appropriate steps to prevent it from happening again.
As we’ve already posted, some of our Web servers will be restarted tonight at 11 PM Pacific time.
We’re adding the “zapp” Web server to that list so we can replace a RAID array disk that caused a problem on that server earlier today.
Update: The maintenance was completed with less than five minutes of “downtime”.
The “zapp” Web server was unavailable between 3:43 and 3:53 AM Pacific time this morning, April 4. This resulted in an interruption of service for Web sites on that server. (Some e-mail activity was delayed, but no e-mail was lost.)
We sincerely apologize for this problem. We consider this type of failure to be unacceptable, and are looking into the cause of the problem so that we can take the appropriate steps to prevent it from happening again.
At approximately 11:00 PM Pacific time on Friday, April 3, the “flexo”, “mom” and “elzar” servers will be restarted. As a result, Web site and e-mail service for some customers will be unavailable for approximately five minutes.
No e-mail will be lost, of course; incoming mail will just be slightly delayed.
We apologize for any inconvenience this may cause. This maintenance is necessary to install an updated “kernel” on our servers, as described in an earlier post.
Update: We’re also going to include the “zapp” server in this maintenance to replace a disk in the RAID array.
Update 2: The maintenance was completed with less than five minutes of “downtime”.
We recently had a server that twice “crashed” and needed to be restarted manually. We’ve identified the cause of that problem (an apparent bug in Linux kernel version 2.6.26) and made some changes to ensure that it doesn’t affect our customers again.
However, we didn’t find any information about this problem when searching the Internet, so we’re describing the details here in the hope that it helps someone else.
The “flexo” Web server was unavailable between 9:54 and 10:02 PM Pacific time tonight, March 28. This resulted in an interruption of service for Web sites on that server. (Some e-mail activity was delayed, but no e-mail was lost.)
We sincerely apologize for this problem. We consider this type of failure to be unacceptable, and are looking into the cause of the problem so that we can take the appropriate steps to prevent it from happening again.
Update: The problem happened a second time on March 31 from 6:22 to 6:31 AM. However, the second incident gave our engineers enough details to determine the cause (which we’ve reported in a subsequent blog post), and we have made a technical change that will prevent it from happening again.
At approximately 11:00 PM Pacific time on Saturday, January 31, all of our Web hosting servers (except the “hypnotoad” and “mom” servers) will be restarted. As a result, Web site and e-mail service for some customers will be unavailable for approximately five minutes.
No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.
We apologize for any inconvenience this may cause. This maintenance is necessary to install an updated “kernel” on our servers, as described in an earlier maintenance announcement.
Update: The maintenance was successfully completed on all servers with less than five minutes of “downtime”.
At 11:00 PM Pacific time tonight (January 26), the “mom” server will be restarted. As a result, Web sites and e-mail service for customers using that server will be unavailable for approximately five minutes.
Other servers will not be affected. And no e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.
We apologize for any inconvenience this may cause.
Between 10:30 PM and 11:59 PM Pacific time this Saturday night (December 6), all Tiger Technologies servers will be restarted. As a result, customer Web sites and e-mail service will be unavailable for about five minutes at some point during this period.
No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.
This brief maintenance is necessary to upgrade the operating system’s Linux kernel to a newer version for security reasons. We apologize for the inconvenience this causes.
Update: the maintenance was completed with less than five minutes of downtime.