High load on the “elzar” server (resolved)

The “elzar” Web hosting server experienced very high load between 9:07 and 9:14 AM Pacific time this morning (September 27, 2011), causing sites on that server to load slowly during those seven minutes. Other servers were not affected.

This was caused by a distributed denial of service (“DDoS”) attack against a site on that server. We manually blocked the attackers to resolve it, and we’re continuing to monitor the server closely to make sure the problem doesn’t recur.

2011 server upgrades

Over the next four weeks, we’ll be migrating customer Web sites to upgraded servers. The servers have updated software (and upgraded hardware in some cases), and are also located in a data center with increased power reliability.

For most customers, these changes will be completely unnoticeable. However, a very small number of customers might notice software differences or experience up to five minutes total of “downtime” at some point. We recommend reading through this entire post for details.

Read the rest of this entry »

Outage at primary data center (resolved)

Between 6:00 AM and 6:29 AM Pacific time on August 7, 2011, all services were unavailable due to a power failure at our primary data center.

The problem was resolved for most servers by 6:29 AM, and for all servers except the “amy” server by 6:53 AM. The “amy” server needed extra manual intervention, and was working by 7:55 AM. All services are now operating normally.

Any e-mail that arrived during the outage was queued at our secondary data center and delivered as soon as the outage ended.

We sincerely apologize for this problem. We know you count on us for reliability, and we don’t consider this acceptable, especially since the data center has had previous power problems this year. However, this incident had a different root cause: it wasn’t a utility power failure that the redundant UPS systems failed to handle. Instead, a circuit breaker incorrectly “tripped”, preventing the power output of the UPS systems from reaching the server cabinets.

Update 4:15 PM: We have received an incident report from the data center indicating that they are working to replace the affected part of the UPS system to prevent further problems.

Brief scheduled maintenance on pazuzu server (completed)

At approximately 11:00 PM Pacific time on July 26, 2011, the “pazuzu” Web server will be restarted.

As a result, for customers on the “pazuzu” server (only), Web site service and the ability to read incoming e-mail will be unavailable for approximately five minutes. Customers on other servers will not be affected.

Read the rest of this entry »

Perl software updated to fix security bug

We’ve updated our servers with a Perl security bug fix. This won’t affect most customers, but read on if you know you use Perl scripts on your site.

Read the rest of this entry »

DDoS attack on fry server (resolved)

The “fry” server was the victim of a high-bandwidth Distributed Denial of Service (DDoS) attack beginning at 3:11 AM Pacific time this morning. On that server, Web pages were intermittently slow to load or generated timeout errors. (Other servers were not affected.)

We’ve blocked the large number of IP addresses from the “botnet” attacking the server, and the issue was completely resolved by 4:19 AM Pacific time. Please accept our apologies if you noticed any problems with your site loading slowly during this period.
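For readers curious how attacking addresses can be picked out of normal traffic, here is a minimal sketch. It is not a description of how this particular attack was handled: the function name, the threshold, and the sample addresses (taken from reserved documentation ranges) are all assumptions made for illustration. It simply counts requests per client IP address and flags any address that exceeds a chosen limit.

```python
from collections import Counter


def flag_heavy_hitters(client_ips, threshold):
    """Return the set of IP addresses seen more than `threshold` times.

    `client_ips` is any iterable of address strings, for example one
    entry per request taken from a Web server access log.
    """
    counts = Counter(client_ips)
    return {ip for ip, hits in counts.items() if hits > threshold}


# Hypothetical sample data: two noisy addresses and one ordinary visitor.
sample = (
    ["203.0.113.5"] * 500
    + ["198.51.100.7"] * 3
    + ["203.0.113.9"] * 450
)

blocked = flag_heavy_hitters(sample, threshold=100)
print(sorted(blocked))
```

A real botnet often spreads its requests across many addresses that each stay below a simple limit, so a per-address count like this is only a starting point.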

Comcast routing problems (resolved)

Some Comcast users in California had trouble connecting to some of our servers beginning at around 7 AM Pacific time on May 21, 2011. Non-Comcast users were not affected at all, and even people who were affected were able to reach some of our servers with no trouble.

This was caused by a technical problem at Comcast, and was not related to us specifically. It appears Comcast was incorrectly filtering some combinations of IP addresses and ports in one of their California network routers, preventing their customers from reaching some sites.
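Because this kind of filtering affects only certain address and port combinations, one way to see which ones are reachable from your own connection is to try a TCP connection to each. The sketch below is only an illustration: the function names are hypothetical, and the three-second timeout is an arbitrary assumption.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_targets(targets, timeout=3.0):
    """Map each (host, port) pair to True or False depending on reachability."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}
```

Calling `check_targets` with a list of `(host, port)` pairs returns a dictionary showing which combinations connected. A failure from one network combined with a success from another suggests that the problem lies somewhere along the path rather than at the server.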

The issue was apparently resolved by Comcast at 10:07 AM Pacific time, and we are not aware of any ongoing problems. As always, don’t hesitate to contact us if you have any trouble.

Brief scheduled maintenance on pazuzu server (completed)

At approximately 11:00 PM Pacific time tonight, May 14, the “pazuzu” Web server will be restarted.

As a result, for customers on the “pazuzu” server (only), Web site service and the ability to read incoming e-mail will be unavailable for approximately five minutes. Customers on other servers will not be affected.

Read the rest of this entry »

High packet loss for some connections (resolved)

A router failure at an upstream Internet “peer” that we connect to caused high packet loss for some Internet connections between 5:18 PM and 5:28 PM Pacific time.

The packet loss grew worse through that period until it exceeded 25%, which is enough to cause pages to fail to load within a browser’s timeout period if your connection was one of the affected ones. (Connections that go through different routers were not affected.)
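To give a rough sense of why that level of loss is so disruptive, the sketch below uses a deliberately simplified model. It assumes each packet is lost independently, and the ten-packet transfer size is an arbitrary choice for illustration. Under those assumptions it computes the chance that every packet in a small transfer arrives on the first attempt, with no retransmission needed.

```python
def clean_transfer_probability(loss_rate, packets):
    """Chance that all `packets` arrive first try, assuming independent loss."""
    return (1.0 - loss_rate) ** packets


# Compare light, moderate, and heavy loss for a hypothetical 10-packet transfer.
for loss in (0.01, 0.05, 0.25):
    chance = clean_transfer_probability(loss, packets=10)
    print(f"{loss:.0%} loss: {chance:.1%} chance that all 10 packets arrive first try")
```

At 25% loss, the chance of a clean ten-packet transfer is only about 6% in this model. Nearly every transfer needs retransmissions, and the waiting time before each retry typically grows with repeated losses, which is how delays add up to a browser timeout.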

Network engineers have routed all connections around the failed hardware until it’s replaced, so the problem is resolved. If your part of the Internet was one of the affected ones, please accept our apologies for the problem.

Service outage May 6, 2011 (resolved)

May 6, 4:43 AM Pacific time: An outage at our primary data center caused a complete service interruption for all customers.

Update 5:08 AM: All services have been restored and are working normally.

Read the rest of this entry »