We posted earlier about a problem affecting the “elzar” Web server. While we were investigating its cause, the same thing happened on another Web server, “calculon”, causing a separate outage for customers on that server from 2:34 PM to 2:43 PM Pacific time this afternoon.
During this period, Web sites on that server were unavailable and incoming e-mail was delayed. (The Web server was slow for about six minutes after it was restarted, too.)
On both servers, high disk and memory usage caused the load to skyrocket to the point where they effectively stopped responding.
The good news is that we have narrowed down the cause, so it shouldn’t happen again. A bug in one of our maintenance programs that runs on each server was almost certainly responsible. The bug has been fixed.
We sincerely apologize for this issue, and regret the inconvenience it caused for customers hosted on these servers. Other servers were not affected.
The “elzar” Web server experienced high load between 5:40 and 6:00 AM Pacific time this morning, April 15. This resulted in slow Web sites and some interruption of service. (Some e-mail activity was delayed, but no e-mail was lost.)
We sincerely apologize for this problem. We consider this type of failure to be unacceptable, and are looking into the cause of the problem so that we can take the appropriate steps to prevent it from happening again.
At approximately 11:00 PM Pacific time on Friday, April 3, the “flexo”, “mom” and “elzar” servers will be restarted. As a result, Web site and e-mail service for some customers will be unavailable for approximately five minutes.
No e-mail will be lost, of course; incoming mail will just be slightly delayed.
We apologize for any inconvenience this may cause. This maintenance is necessary to install an updated “kernel” on our servers, as described in an earlier post.
Update: We’re also going to include the “zapp” server in this maintenance to replace a disk in the RAID array.
Update 2: The maintenance was completed with less than five minutes of “downtime”.
At approximately 11:00 PM Pacific time on Saturday, January 31, all of our Web hosting servers (except the “hypnotoad” and “mom” servers) will be restarted. As a result, Web site and e-mail service for some customers will be unavailable for approximately five minutes.
No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.
We apologize for any inconvenience this may cause. This maintenance is necessary to install an updated “kernel” on our servers, as described in an earlier maintenance announcement.
Update: The maintenance was successfully completed on all servers with less than five minutes of “downtime”.
At 11:00 PM Pacific time tonight (November 24), the “elzar” and “farnsworth” servers will be restarted. As a result, Web sites and e-mail service for customers using those servers will be unavailable for approximately five minutes.
Starting at 10:14 AM today, our elzar server experienced an unexpectedly high load that made some processes on the server effectively unusable for about 10 minutes.
Web sites using scripts or databases on the elzar server may have seemed unresponsive during that time. Any customer hosted on elzar who was reading e-mail during this time may also have found the system slow or unresponsive. (No e-mail was lost, of course.)
Customers on other servers were not affected.
As a result of an error on our part, a small handful of PHP 4 scripts on the “elzar” Web server may have displayed an error message or a blank page for up to 14 minutes today (from 12:48 to 1:02 PM Pacific time in the worst case). The problem has been resolved for any customers who were affected.
This happened because of a mistake we made in an upgrade to our sitewide PHP4/FastCGI configuration file, which our pre-upgrade testing failed to detect. We have added a new check to our automated testing system to ensure this cannot happen again.
We sincerely apologize to any customers affected by this problem.
Starting around 9:03 PM (Pacific time) tonight, our elzar server had an extremely high load placed upon it. The result was that many users may have had problems connecting to Web sites running on elzar. The problem occurred intermittently for about 30 minutes, at which point we managed to restore normal service.
Please be assured that we do our best to make sure that our server loads always stay within reasonable limits to avoid just this kind of problem. The load spike was unexpected and was well outside of the generous safety limits that we keep on each server. We will continue to monitor the situation, and will take corrective or preventive action if appropriate. We appreciate your business, and apologize for any inconvenience.
The “elzar” Web server stopped responding a few minutes ago under a heavy load on the MySQL database server, and had to be restarted. This resulted in an interruption of service for Web sites on that server.
We apologize for this problem; we’ll be investigating the issue further and monitoring the server closely to make sure it doesn’t recur.
Update 10:00 PM: The NFS network connection between ftp.tigertech.net and elzar wasn’t working properly even after the Web server was restarted, causing additional problems for customers publishing files. This problem has also been corrected.