Calculon server restarted (resolved)

The “calculon” Web server needed to be restarted at 10:14 AM Pacific time, resulting in a five-minute interruption of service for Web sites and e-mail on that server.


Webmail “Thread View” is now a preference

One of the features of our new(ish) Webmail system is “thread view”. This groups similar messages together based on their “Subject” and other headers, which can occasionally be useful if you’re trying to see all the replies to a particular message and you want them grouped together.

However, thread view has a potential downside: if you have several active threads going with several messages each, new messages can sometimes appear on the second page of the incoming mail screens, instead of the first page.

That’s not a problem if you’re expecting it. However, since we introduced the new Webmail system, we’ve had several complaints from customers who accidentally clicked “Switch to Thread View” without realizing what it does, then thought some of their incoming mail was missing because they aren’t used to looking for new mail on other pages. Since thread view is “remembered” even after you log out and log in again, this caused some people a great deal of heartache.
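For the technically curious, here is a small Python sketch of why a new message can end up on the second page in thread view. It is only an illustration, not the actual Webmail code: the page size, the sample messages, the function names, and the assumption that threads are ordered by when they started (with replies listed under the original message) are all made up for this example.

```python
import re

PAGE_SIZE = 5  # hypothetical number of messages shown per page

# Hypothetical inbox: "received" is just a counter, higher means newer.
MESSAGES = [
    {"id": 1, "subject": "Budget", "received": 1},
    {"id": 2, "subject": "Lunch", "received": 2},
    {"id": 3, "subject": "Trip", "received": 3},
    {"id": 4, "subject": "Re: Budget", "received": 4},
    {"id": 5, "subject": "Re: Lunch", "received": 5},
    {"id": 6, "subject": "Re: Trip", "received": 6},
    {"id": 7, "subject": "Re: Budget", "received": 7},
    {"id": 8, "subject": "RE: Lunch", "received": 8},
    {"id": 9, "subject": "Re: Trip", "received": 9},
    {"id": 10, "subject": "Re: Budget", "received": 10},  # the newest message
]


def normalize_subject(subject):
    """Strip leading "Re:" / "Fwd:" prefixes so replies group with the original."""
    stripped = re.sub(r"^\s*((re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


def flat_view(messages):
    """Ordinary view: newest message first."""
    return sorted(messages, key=lambda m: m["received"], reverse=True)


def thread_view(messages):
    """Thread view (assumed ordering): most recently started thread first,
    with each thread's messages listed oldest to newest."""
    threads = {}
    for message in messages:
        threads.setdefault(normalize_subject(message["subject"]), []).append(message)
    ordered = sorted(
        threads.values(),
        key=lambda thread: min(m["received"] for m in thread),
        reverse=True,
    )
    result = []
    for thread in ordered:
        result.extend(sorted(thread, key=lambda m: m["received"]))
    return result


def page_of(view, message_id):
    """Return the 1-based page number on which a message appears."""
    index = next(i for i, m in enumerate(view) if m["id"] == message_id)
    return index // PAGE_SIZE + 1


if __name__ == "__main__":
    print("flat view page:", page_of(flat_view(MESSAGES), 10))
    print("thread view page:", page_of(thread_view(MESSAGES), 10))
```

In this made-up inbox, the newest message is a reply in the oldest thread, so under the assumed ordering it is listed after every message in the two newer threads and falls past the first five rows.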

From our logs, we’ve found that very few people actually use thread view. Because it seems to cause frequent problems and few people use it, we’ve made it an optional feature instead of being always enabled.

If (like most people) you don’t use thread view, you don’t need to do anything. If you do want to use thread view, it’s still available: just click “Preferences”, then click “Display Preferences”, then change “Show ‘Thread View’ Link” to “Yes”.

Zapp server temporarily unavailable (resolved)

The “zapp” Web server was unavailable between 8:20 and 8:40 Pacific time this morning. This resulted in an interruption of service for Web sites and e-mail on that server.

The problem was caused by a faulty hard disk in the RAID array (which theoretically shouldn’t cause a server to stop responding, but did). The hard disk has been removed from the array and will be replaced tonight at 10 PM. The server will be restarted at that time, resulting in about four minutes of additional downtime.

We sincerely apologize for this problem. We will be investigating the root cause: it’s normal for hard drives to fail — we expect that occasionally — but it shouldn’t cause such negative effects (normally the RAID array would prevent the failure of any single drive from causing the entire machine to fail).

Mail problem this morning (resolved)

Between 5:58 and 6:26 AM Pacific time today (March 12), a network problem on one of our mail servers prevented some customers from being able to read and send e-mail.

The issue has been resolved and everything is working normally. Although incoming mail was delayed, no mail was lost. Web site service was not affected.

The cause of the problem was that a debugging tool used by one of our technicians (“tcpdump”), when used with certain options, can apparently cause network interface failures. This was not an issue we were previously aware of. We will avoid using the tool in that manner in the future, so the problem should not recur.

We regret the problem and sincerely apologize to our customers who were affected by this issue.

Brief scheduled maintenance on Saturday, March 1 (completed)

At approximately 11:00 PM Pacific time this Saturday night (March 1), all Tiger Technologies servers will be restarted. As a result, customer Web sites and e-mail service will be unavailable for three to five minutes.

No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.

This brief maintenance is necessary to upgrade the operating system “Linux kernel” to a newer version for security reasons. This was also done two weeks ago; unfortunately our operating system vendor has released an even newer kernel since then — it doesn’t usually happen this often.

We apologize for the inconvenience this causes.

(This maintenance was also successfully completed with less than four minutes of downtime per server.)

Brief scheduled maintenance on Saturday, February 16 (completed)

At approximately 11:00 PM Pacific time this Saturday night (February 16), all Tiger Technologies servers will be restarted. As a result, customer Web sites and e-mail service will be unavailable for three to five minutes.

No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.

This brief maintenance is necessary to upgrade the operating system “Linux kernel” to a newer version for security reasons. We apologize for the inconvenience this causes.

(This maintenance was successfully completed with less than four minutes of downtime per server.)

MySQL and PHP 5 Security Updates

We’ve installed MySQL and PHP 5 security updates. Customers should not notice any changes; the updates just fix several security issues in PHP 5 and MySQL.

The updates were performed in such a way that new Web server connections were delayed during the 30 seconds or so that PHP and MySQL were unavailable on each server. That should mean that as far as scripts on your Web site were concerned, there was zero downtime.


Network outage followup

This is a followup to last night’s post about a network outage.

The root cause of the problem was the failure of an Ethernet switch at our data center. The switch was the one that our network cables actually plug into to connect to the Internet. Unfortunately, it’s one of the few pieces of the network infrastructure that’s not automatically redundant: although the “other side” of the switch is connected to multiple fully redundant upstream paths to the Internet, the side of it that goes to our server cabinets effectively has a single connection for each group of servers.

When the switch failed, the data center staff replaced it with a new spare one. Because the faulty hardware was completely replaced, the problem is properly solved, and we don’t expect it to recur.


Unscheduled network outage (resolved)

Between 9:52 and 11:06 PM Pacific time on January 10, a complete network failure at our primary data center caused an unscheduled outage that resulted in all services (all Web sites and e-mail) being unreachable from the Internet.

This problem has been resolved and all services are now available. We are waiting for a full report from the data center personnel so that we can determine the cause and ensure that it won’t recur.

We sincerely apologize to our customers who were affected by this. This kind of outage is not normal (it’s the longest outage we’ve experienced in more than four years), and we know it’s not acceptable to our customers who rely on our services. We’ll post a followup message with more details when they become available.

Update Friday 10 AM: As a clarification, we should also have originally mentioned that no e-mail is lost during this kind of outage: it’s delivered after the issue is resolved. While some messages were certainly delayed, they were all properly delivered afterward.

New locales available for scripts

A customer pointed out that our servers didn’t have many “locales” installed. A “locale” is a set of rules that apply to a language, region or culture — things like the language’s words for “January” and “Monday”, the way that dates are displayed, and the currency symbol used.
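As a rough illustration of what a locale changes, here is a small sketch that asks the system for the words for “January” and “Monday” in a given locale. It is written in Python purely for illustration (your own scripts may use another language), the function name is made up for this example, and the locale names in the usage section are examples only: which ones work depends on what is installed on the machine where the script runs.

```python
import datetime
import locale


def january_and_monday(locale_name):
    """Return the words for "January" and "Monday" in the given locale,
    or None if that locale is not installed on this machine."""
    saved = locale.setlocale(locale.LC_TIME)  # remember the current setting
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
    except locale.Error:
        return None
    try:
        monday = datetime.date(2008, 1, 7)  # January 7, 2008 was a Monday
        return monday.strftime("%B"), monday.strftime("%A")
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # restore the previous setting


if __name__ == "__main__":
    # Example locale names only; any of them may be missing on a given server.
    for name in ("C", "de_DE.UTF-8", "fr_FR.UTF-8"):
        print(name, january_and_monday(name))
```

A locale that is not installed simply yields None here, which is the situation the customer ran into.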
