farnsworth hosting server restarted

The “farnsworth” server was restarted at 11:45 PM Pacific time tonight, causing a brief two-minute interruption in Web and e-mail service for customers on that server. Incoming mail was queued and delivered after the interruption.

New feature: Live error logs

We’ve added a new feature to hosting accounts: live, real-time access to the Apache Web server “error log”, both in the “My Account” control panel and as raw files you can access through FTP, SSH and similar tools.

To view the most recent 200 lines of the error log, log in to the control panel (having trouble?), click “Statistics and Logs”, and look at the new “Web site error logs” section.

To download the full raw error log files, see this page.
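
If you've downloaded a raw error log file and want the same "most recent 200 lines" view on your own computer, something like the following Python sketch should work. It isn't how the control panel does it; the function name `tail_lines` is just one we made up for illustration, and you'd pass it whatever name your downloaded log file has.

```python
from collections import deque


def tail_lines(path, count=200):
    """Return the last `count` lines of a text file as a list of strings."""
    # errors="replace" keeps stray non-UTF-8 bytes in a log from raising an error.
    with open(path, "r", encoding="utf-8", errors="replace") as log_file:
        # A deque with maxlen discards older lines as newer ones are read,
        # so only `count` lines are ever held in memory.
        newest = deque(log_file, maxlen=count)
    return [line.rstrip("\n") for line in newest]
```

Calling `tail_lines("my-error-log.txt")` (a hypothetical file name) returns the newest 200 lines, oldest first, which you can then print or search.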

We hope you find this useful!

Protection against viruses that steal FTP passwords

Recently, several customers have told us that pages on their Web sites have been modified without their knowledge. Upon investigation, the customers found their computers had been infected with a virus that steals saved FTP passwords, such as the “Gumblar” or Trojan.PWS.Tupai.A virus.

We’ve taken a step to protect you against this problem (described below), but it’s wise to protect yourself, too.

Problem affecting two servers (resolved)

We posted earlier about a problem affecting the “elzar” Web server. While we were investigating its cause, the same thing happened on another Web server, “calculon”, causing a separate outage for customers on that server from 2:34 PM to 2:43 PM Pacific time this afternoon.

During this period, Web sites on that server were unavailable and incoming e-mail was delayed. (The Web server was slow for about six minutes after it was restarted, too.)

On both servers, high disk and memory usage caused the load to skyrocket to the point where they effectively stopped responding.

The good news is that we have narrowed down the cause, so it shouldn’t happen again. A bug in one of our maintenance programs that runs on each server was almost certainly responsible. The bug has been fixed.

We sincerely apologize for this issue, and regret the inconvenience it caused for customers hosted on these servers. Other servers were not affected.

Zapp server added to brief scheduled maintenance (completed)

As we’ve already posted, some of our Web servers will be restarted tonight at 11 PM Pacific time.

We’re adding the “zapp” Web server to that list so we can replace a RAID array disk that caused a problem on that server earlier today.

Update: The maintenance was completed with less than five minutes of “downtime”.

Brief scheduled maintenance Friday, April 3 (completed)

At approximately 11:00 PM Pacific time on Friday, April 3, the “flexo”, “mom” and “elzar” servers will be restarted. As a result, Web site and e-mail service for some customers will be unavailable for approximately five minutes.

No e-mail will be lost, of course; incoming mail will just be slightly delayed.

We apologize for any inconvenience this may cause. This maintenance is necessary to install an updated “kernel” on our servers, as described in an earlier post.

Update: We’re also going to include the “zapp” server in this maintenance to replace a disk in the RAID array.

Update 2: The maintenance was completed with less than five minutes of “downtime”.

Avoiding a Linux kernel 2.6.26 cgroup bug

We recently had a server that twice “crashed” and needed to be restarted manually. We’ve identified the cause of that problem (an apparent bug in Linux kernel version 2.6.26) and made some changes to ensure that it doesn’t affect our customers again.

However, we didn’t find any information about this problem when searching the Internet, so we’re describing the details here in the hope that it helps someone else.

Flexo server temporarily unavailable (resolved)

The “flexo” Web server was unavailable between 9:54 and 10:02 PM Pacific time tonight, March 28. This resulted in an interruption of service for Web sites on that server. (Some e-mail activity was delayed, but no e-mail was lost.)

We sincerely apologize for this problem. We consider this type of failure to be unacceptable, and are looking into the cause of the problem so that we can take the appropriate steps to prevent it from happening again.

Update: The problem happened a second time on March 31 from 6:22 to 6:31 AM. However, the second incident gave our engineers enough details to determine the cause (which we’ve reported in a subsequent blog post), and we have made a technical change that will prevent it from happening again.

favicon.ico files and WordPress

We host some pretty high-volume WordPress sites, and one of the questions that occasionally comes up is “How can I make WordPress faster?”. That’s really just another way of saying “What part of my WordPress site is slow?”, which translates to “What requests are using a lot of CPU time?”

This question is surprisingly difficult to answer, particularly because we encourage customers who run busy WordPress sites to use FastCGI and caching. A single FastCGI process can handle lots of different PHP requests, so it’s hard to break down which individual request used what amount of server resources.

To solve this problem, we recently patched our version of PHP to optionally log the CPU time used by each request, even under FastCGI, so we could see what was really happening (patch available here).

What we found was unexpected. On some busy WordPress sites, 20–30% of the CPU time was being used to handle requests for “favicon.ico”. What the deuce?!

Change in secure SSL ciphers

We’ve made a technical change to the way our servers handle SSL connections (we’ve disabled 40-bit and 56-bit encryption ciphers). The change shouldn’t affect anyone, but we’re describing it here just for the record.
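
If you're curious whether software on your own computer would ever offer ciphers that weak, here's a small Python sketch. It is not the change we made on our servers; it only filters a list of cipher descriptions in the shape that Python's `ssl.SSLContext.get_ciphers()` returns, using the `strength_bits` field. The function name `weak_cipher_names` and the 56-bit threshold parameter are our own choices for illustration.

```python
def weak_cipher_names(ciphers, max_bits=56):
    """Return the names of ciphers whose key strength is `max_bits` or less.

    `ciphers` is a list of dicts shaped like the entries returned by
    ssl.SSLContext.get_ciphers(), each with "name" and "strength_bits" keys.
    """
    return [c["name"] for c in ciphers if c["strength_bits"] <= max_bits]
```

For example, `weak_cipher_names(ssl.create_default_context().get_ciphers())` (after `import ssl`) lists any ciphers of 56 bits or less that a default Python client would offer; on a reasonably current installation we'd expect that list to be empty.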
