Between 9:52 and 11:06 PM Pacific time on January 10, a complete network failure at our primary data center caused an unscheduled outage that made all services (all Web sites and e-mail) unreachable from the Internet.
This problem has been resolved and all services are now available. We are waiting for a full report from the data center personnel so that we can determine the cause and ensure that it won’t recur.
We sincerely apologize to our customers who were affected by this. This kind of outage is not normal (it’s the longest outage we’ve experienced in more than four years), and we know it’s not acceptable to our customers who rely on our services. We’ll post a follow-up message with more details when they become available.
Update Friday 10 AM: As a clarification, we should also have originally mentioned that no e-mail is lost during this kind of outage: it’s delivered after the issue is resolved. While some messages were certainly delayed, they were all properly delivered afterward.
From 12:51 to 12:54 PM Pacific time today, one of our inbound mail servers (mx2.tigertech.net) incorrectly rejected some incoming mail that wouldn’t normally have been rejected, due to a configuration problem. This resulted in a small handful of messages being returned to the sender instead of properly delivered.
A problem with the NFS automounter software on our main FTP server (which also serves our own www.tigertech.net Web site) caused us to restart that server at about 2:30 PM Pacific time. This caused a brief interruption in service for customers uploading files via FTP or using our control panel. (The problem didn’t directly affect any customer Web sites or e-mail service.)
Everything is now working normally. Again, we apologize to anyone who was inconvenienced by this issue — this just hasn’t been our day.
As a result of an error on our part, a small handful of PHP 4 scripts on the “elzar” Web server may have displayed an error message or a blank page for up to 14 minutes today (from 12:48 to 1:02 PM Pacific time in the worst case). The problem has been resolved for any customers who were affected.
This happened because of a mistake we made in an upgrade to our sitewide PHP4/FastCGI configuration file, which our pre-upgrade testing failed to detect. We have added a new check to our automated testing system to ensure this cannot happen again.
We sincerely apologize to any customers affected by this problem.
The “farnsworth” and “lrrr” Web hosting servers will be restarted at approximately 11:00 PM Pacific time on Saturday October 13, and customer Web sites on those two servers will be unavailable for approximately five minutes. (See “Which server is my account on?” if you aren’t sure.) E-mail service, and customers on all other Web servers, will not be affected at all.
The restart is necessary so we can increase the memory (RAM) on those two servers to 4 GB, as we described here. After this, all our hosting servers will have 4 GB of memory.
By comparison with many other reasonably priced hosting companies, we keep the load on our servers pretty low to start with, so 4 GB is “overkill” probably 99.99% of the time, but we want to cover the other 0.01%. (Our unofficial motto should probably be something like “Tiger Technologies: We’re paranoid so you don’t have to be.”)
We apologize for any inconvenience this may cause.
Update: Note that the maintenance time has been changed from 10:00 PM to 11:00 PM.
In an earlier post, we talked about how we use a monitoring system that forwards test e-mail to other large ISPs, then checks to make sure the message was promptly delivered.
We already check delivery to AOL, Comcast, Gmail and Verizon, and we’ve now added AT&T/SBCGlobal and Yahoo mail. We’ll continue to expand the list in the future.
An extensive monitoring and alert system is at the heart of our reliability, really; it’s what allows us to know that things are working properly. We can guarantee that if our customers send mail to those ISPs right now, it’s being delivered. That’s something few other companies even bother to check.
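As a rough illustration of the kind of check described above, here is a minimal sketch that flags providers whose test message arrived late or has not arrived at all. It is not our actual monitoring code: the function name, the provider labels, the five-minute threshold and the sample timestamps are all assumptions made up for the example, and the sending and receiving of the test mail itself is left out.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: how long a test message may take before it counts as late.
MAX_DELAY = timedelta(minutes=5)

def find_late_probes(sent, delivered, now, max_delay=MAX_DELAY):
    """Return the providers whose test message is late or missing.

    sent:      {provider: time the test message was sent}
    delivered: {provider: time it arrived}; a provider is absent if nothing has arrived yet
    """
    late = []
    for provider, sent_at in sent.items():
        arrived_at = delivered.get(provider)
        if arrived_at is None:
            # Nothing received yet: only a problem once the deadline has passed.
            if now - sent_at > max_delay:
                late.append(provider)
        elif arrived_at - sent_at > max_delay:
            # Received, but it took longer than allowed.
            late.append(provider)
    return sorted(late)

# Made-up sample data for illustration only.
t0 = datetime(2007, 1, 1, 12, 0)
sent = {"aol": t0, "comcast": t0, "yahoo": t0}
delivered = {
    "aol": t0 + timedelta(seconds=40),
    "comcast": t0 + timedelta(minutes=9),
}
print(find_late_probes(sent, delivered, now=t0 + timedelta(minutes=10)))
```

In a real system, a non-empty result would be what triggers an alert to the on-call staff.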
Between 11:00 PM and 11:59 PM Pacific time on Saturday August 11, all Tiger Technologies Web hosting servers will be restarted. As a result, customer Web sites, as well as the Tiger Technologies Web site, will be unavailable for approximately five minutes. E-mail service will not be affected.
This brief maintenance is necessary for two reasons. First, we’re upgrading the core of the operating system (the “Linux kernel”) to a newer version for security reasons. Second, we’re adding more memory to our hosting servers, so that each server will have 4 GB of RAM instead of the current 2 GB.
Since about 9:00 AM Pacific time today, we’ve been seeing network routing problems to some destinations on the Internet that use the “xo.net” backbone. For some customers, this makes access to their Web site extremely slow; it may even be slow enough to seem completely unresponsive. Most customers will have no problems.
Our data center technicians are working on this problem. We’ll update this post as soon as the issue is resolved.
Update: This issue was resolved at approximately 10:20 AM, and all systems are operating normally.
Starting around 9:03 PM (Pacific time) tonight, our elzar server came under extremely high load. As a result, many users may have had problems connecting to Web sites running on elzar. The problem occurred intermittently for about 30 minutes, at which point we managed to restore normal service.
Please be assured that we do our best to make sure that our server loads always stay within reasonable limits to avoid just this kind of problem. The load spike was unexpected and was well outside of the generous safety limits that we keep on each server. We will continue to monitor the situation, and will take corrective or preventive action if appropriate. We appreciate your business, and apologize for any inconvenience.
Between 4:33 and 4:41 PM Pacific time, we experienced a short-lived problem where users who reach our servers via an “Internet backbone” called Global Crossing (including Comcast and Charter cable customers) were unable to connect. Other users weren’t affected.
The problem lasted for less than ten minutes, and everything is now operating normally.