Problem on web03 server (resolved)

Web sites on the web03 server suffered an interruption in service between 7:32 AM and 7:45 AM this morning (Tuesday, February 21).

This was caused by a “hung” process that prevented a routine Apache Web server reload from completing. Other servers were not affected. Our staff restarted the server to stop the “hung” process, and the problem was resolved.

We sincerely apologize to customers affected by this incident. We’re considering possible underlying causes to prevent a recurrence.

Brief scheduled maintenance February 18, 2012 (completed)

On Saturday, February 18, 2012 between 10:00 and 11:00 PM Pacific time, we’ll be upgrading the Apache Web server software on each of our Web servers.

Most customers will not notice anything, but the upgrade will cause approximately 30 seconds of slow Web page loading at some point during that hour as we delay incoming connections at the network level.

This maintenance is necessary to apply security and reliability fixes released by the Apache developers. (We’ve been using the upgraded version on our Webmail servers for several days, so it’s well tested.)

Update: The maintenance was completed at 10:03 PM Pacific time.

President’s Day 2012 holiday hours

Our business offices will be closed on Monday, February 20 to observe the US legal holiday. As always, we’ll provide same-day support for time-sensitive issues via our ticket and e-mail systems. However, questions that aren’t time-sensitive (including most billing matters) may not be answered until the next day, and telephone support (via callbacks) will be available only for urgent problems.

Beware of strangers asking you to install software

Over the past week, we’ve seen customers falling victim to two separate scams that allowed strangers to gain access to their site by installing malicious software.

One of these involves a fake ad agency, and the other involves offers to upgrade outdated software on your site. Don’t fall for these!


web05 server high load (resolved)

The disk load on the “web05” server was extremely high between 2:30 and 2:42 AM Pacific time on Saturday, February 4, causing some downtime during that period for sites using that server. Other servers were not affected.


Stability improvements for a server memory problem

A couple of days ago, one of our Web servers became unstable for an unknown reason and needed to be restarted. This is rare: on average, this happens less than once every five years of uptime per server, so we took it very seriously and launched an investigation.

What we found was that the owner of one of the sites on that server made a mistake that allowed attackers to run their own scripts. That’s all too common, unfortunately, but usually only the single site is affected by this kind of thing. What was surprising in this case was that the script used a previously unknown method of causing problems for other sites running on the server.

As a result of this investigation, we’ve made several changes to our systems to ensure the problem won’t recur. The rest of this post has a detailed technical description of the problem in case it’s useful for others.


web07 server restart on February 1, 2012 (resolved)

Our “web07” server needed restarting at 11:36 AM Pacific time on February 1, 2012, because it had been intermittently unable to run some PHP scripts for 22 minutes.

The restart resolved the immediate problem, and a follow-up post explains what happened and the changes we made to prevent it from happening again.