Brief scheduled maintenance for elzar and farnsworth servers (completed)

At 11:00 PM Pacific time tonight (November 24), the “elzar” and “farnsworth” servers will be restarted. As a result, Web sites and e-mail service for customers using those servers will be unavailable for approximately five minutes.

Mom server temporarily unavailable (resolved)

The “mom” server experienced high load starting at around 10:45 AM Pacific time today. We restarted it just before 11:00 AM, and it’s now working normally.

Brief power interruption for some servers (resolved)

At 12:11 AM Pacific time today, one of the cabinets at our data center tripped a circuit breaker, causing all of the servers in that cabinet to lose power. Power was restored at 12:18 AM.

Customer Web sites and e-mail on the bender, calculon, lrrr, and zapp Web servers were unavailable during this seven-minute period. The ability to send and receive e-mail was also interrupted (no mail was lost, of course).

We are investigating the root cause of this problem to prevent it from happening again.

Brief scheduled maintenance for calculon server (completed)

At approximately 11:00 PM Pacific time tonight (October 18), the “calculon” Web server will be restarted. As a result, Web sites and e-mail service for customers using that server will be unavailable for approximately five minutes.

Calculon server temporarily unavailable (resolved)

The “calculon” Web server was unavailable between approximately 5:00 and 5:08 PM Pacific time today. This resulted in an interruption of service for Web sites on that server. (Some e-mail activity was delayed, but no e-mail was lost.)

We sincerely apologize for this problem! We consider this type of failure to be unacceptable, and are looking into the cause of the problem so that we can take the appropriate steps to prevent it from happening again.

Mail with blank contents for some customers (resolved)

Due to a problem with our spam filtering system, some customers received blank incoming messages between 11:35 and 11:53 AM Pacific time today.

Non-blank copies of these messages were also delivered properly (although with a delay), so no mail is missing.

We have permanently fixed the underlying cause of the problem, and we apologize for the concern and annoyance this caused.

Brief scheduled maintenance on Saturday, September 20 (completed)

At approximately 11:00 PM Pacific time this Saturday night (September 20), all Tiger Technologies servers will be restarted. As a result, customer Web sites and e-mail service will be unavailable for three to five minutes.

No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.

This brief maintenance is necessary to upgrade the operating system’s “Linux kernel” to a newer version for security reasons. We apologize for the inconvenience this causes.

Update: the maintenance was completed with less than three minutes of downtime per server.

Farnsworth server restarted (resolved)

The network interface on the “farnsworth” Web server stopped responding at 6:38 AM Pacific time today, and the server needed to be manually restarted by our data center staff. The server was unavailable for 17 minutes, causing an interruption of service for Web sites on that server. It also prevented users of that server from reading incoming e-mail (such e-mail was delayed and delivered after the outage).

Other servers were not affected.

We sincerely apologize to anyone affected by this problem.

Calculon server problem (resolved)

The “calculon” Web server needed to be restarted at 12:40 AM Pacific time this morning due to extremely high load.

However, the server did not come back online immediately, because it performed a time-consuming disk file system check (“fsck”) during startup. This caused an interruption in Web service and a delay in mail delivery for customers on that server (other servers were not affected).

The server finished its file system check at 3:45 AM and is now working normally.

This is by far the longest outage we’ve experienced on a server in several years. I want to personally apologize to every affected customer: we don’t consider this kind of problem acceptable at all, and we deeply regret the downtime. We’ll be carefully reviewing this incident to see what we can learn from it to avoid similar outages in the future.

Routing problem for some Comcast customers (resolved)

Between 5:11 and 5:46 PM Pacific time today, some people who reach our servers via an “Internet backbone” called Global Crossing (including some Comcast cable customers) were unable to connect to our data center. Other users weren’t affected.

Global Crossing has apparently corrected the problem, and everything is now operating normally. We’ll continue to monitor this issue closely.
