Brief scheduled maintenance on Saturday, August 11

Between 11:00 PM and 11:59 PM Pacific time on Saturday, August 11, all Tiger Technologies Web hosting servers will be restarted. As a result, customer Web sites, as well as the Tiger Technologies Web site, will be unavailable for approximately five minutes. E-mail service will not be affected.

This brief maintenance is necessary for two reasons. First, we’re upgrading the operating system’s “Linux kernel” to a newer version for security reasons. Second, we’re adding more memory to our hosting servers, so that each server will have 4 GB of RAM instead of the current 2 GB.

Slow server response for some customers (resolved)

Since about 9:00 AM Pacific time today, we’ve been seeing network routing problems to some destinations on the Internet that use the “xo.net” backbone. For affected customers, access to their Web sites may be extremely slow, possibly slow enough to seem completely unresponsive. Most customers will have no problems.

Our data center technicians are working on this problem. We’ll update this post as soon as the issue is resolved.

Update: This issue was resolved at approximately 10:20 AM, and all systems are operating normally.

Elzar server temporarily unavailable (resolved)

Starting around 9:03 PM (Pacific time) tonight, our “elzar” server came under an extremely high load. As a result, many users may have had problems connecting to Web sites running on elzar. The problem occurred intermittently for about 30 minutes, after which we restored normal service.

Please be assured that we do our best to make sure that our server loads always stay within reasonable limits to avoid just this kind of problem. The load spike was unexpected and was well outside of the generous safety limits that we keep on each server. We will continue to monitor the situation, and will take corrective or preventive action if appropriate. We appreciate your business, and apologize for any inconvenience.

Temporary routing problem (resolved)

Between 4:33 and 4:41 PM Pacific time, we experienced a short-lived problem where users who reach our servers via an “Internet backbone” called Global Crossing (including Comcast and Charter cable customers) were unable to connect. Other users weren’t affected.

The problem lasted for less than ten minutes, and everything is now operating normally.

E-mail, zapp, and lrrr servers temporarily unavailable (resolved)

Due to a failure of the power distribution unit (essentially a fancy power strip) in one of the cabinets at our data center, the e-mail servers and the “zapp” and “lrrr” Web servers became unavailable at 05:52 AM Pacific time.

(Other Web servers are not affected.) A data center technician is replacing the power unit in that cabinet and all systems should be back online within 15 minutes; we’ll update this post when that happens.

Update: The faulty hardware has been completely replaced. All servers are back online and functioning normally, and all queued e-mail has been delivered and is available for retrieval. The total outage for these servers was from 05:52 AM to 06:15 AM (Pacific time).

In addition, the FTP service on the “zapp” server was not fully working after it was restarted, so FTP publishing on that server was unavailable until shortly after 7:00 AM. This has been corrected, and we have also fixed the underlying problem that could cause the service to start up incorrectly.

We sincerely apologize to customers affected by this outage. This kind of issue has happened to us only once before in the last seven years (and that was with a different brand of power unit). Since the replacement power unit is brand new, we don’t expect the problem to recur.

Elzar server restarted

The “elzar” Web server stopped responding a few minutes ago under a heavy load on the MySQL database server, and had to be restarted. This resulted in an interruption of service for Web sites on that server.

We apologize for this problem; we’ll be investigating the issue further and monitoring the server closely to make sure it doesn’t recur.

Update 10:00 PM: The NFS network connection between ftp.tigertech.net and elzar wasn’t working properly even after the Web server was restarted, causing additional problems for customers publishing files. This problem has also been corrected.

A defense against some MySQL connection problems

A couple of times in the last week, we’ve seen one of our MySQL database servers have an unusually high number of connections. That’s a serious issue: if there are too many connections to a MySQL server, customer scripts won’t be able to connect to their databases. We’ve spent some time finding the cause and fixing it.
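To illustrate why a high connection count matters, here is a minimal Python sketch of the kind of headroom check an administrator might run. It is not the fix described in this post. It assumes you have already read MySQL's Threads_connected status value and max_connections setting by some other means; the function names and the 80% warning threshold are hypothetical choices made for illustration.

```python
def connection_headroom(threads_connected: int, max_connections: int) -> float:
    """Return the fraction of connection slots still free, from 0.0 to 1.0."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    # Clamp so an out-of-range reading never produces a value outside 0..1.
    used = min(max(threads_connected, 0), max_connections)
    return 1.0 - used / max_connections


def is_near_limit(threads_connected: int, max_connections: int,
                  warn_fraction: float = 0.8) -> bool:
    """True when connections in use reach warn_fraction of the limit.

    warn_fraction is an assumed threshold, not a MySQL setting.
    """
    return threads_connected >= warn_fraction * max_connections
```

Once the number of connections in use reaches max_connections, new connection attempts are refused, which is why customer scripts fail in that situation.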

Brief scheduled maintenance (May 30)

The “farnsworth” Web server locked up and needed restarting again last night at about 9:04 PM Pacific time, causing another short outage for some customers. (A similar problem happened Monday night.) To make sure this doesn’t happen again, we’ll be replacing the entire server (switching it with a spare server) at about 11 PM (Pacific) tonight, which will result in about 5 minutes of downtime.

We’re also taking this opportunity to upgrade the hardware on one of our mail servers to allow for future growth; customers (including those with accounts on Web servers other than farnsworth) may see a short (approximately 5 minute) interruption in their ability to retrieve e-mail between 11 PM and midnight.

We apologize for any inconvenience this causes — as always, we’re committed to the highest possible levels of reliability.

Web server outage for some customers

One of our Web servers (the “farnsworth” server) stopped responding at 7:07 PM Pacific time today, and needed to be forcibly restarted. This resulted in a Web server and FTP server outage of about 15 minutes for some customers, although most sites were unaffected.

After being restarted, the server is responding properly, but it is still showing a problem with one of the disks in its RAID array. Because of that, we plan to replace the disk to prevent future problems, which means we’ll restart that server again later tonight (after 11 PM Pacific time).

We apologize to all customers affected; we strive hard to avoid this kind of problem.

Packet loss to some destinations (resolved)

We’re currently seeing about 15% “packet loss” from our data center to a handful of locations on the Internet (notably connections that go through the above.net backbone). Most people aren’t affected by this, but for those who are, connections can be slower than normal. We have a ticket open with the data center for this issue, and we’ll update this page when it’s resolved.

Update May 20: The packet loss problem was effectively resolved on Friday, although we’ve been monitoring the above.net backbone connection closely to ensure that there is no ongoing problem. Although we’ve seen a couple of short latency issues that we’re still following up with the data center about, customers are not experiencing any problems.
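For readers unfamiliar with the term, a packet loss percentage like the 15% figure above is simply the share of packets sent that never arrived. The following is a minimal Python sketch of that arithmetic; the function name and the sample counts are hypothetical and are not taken from our monitoring.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Return the percentage of sent packets that were not received."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


# Hypothetical example: 100 probes sent, 85 replies received.
loss = packet_loss_percent(100, 85)
```

With those sample counts, 15 of 100 packets are missing, which corresponds to the level of loss described in this post.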