Due to software updates on our servers, most Web hosting customers will experience about ten minutes of scheduled maintenance downtime between 11 PM and 1 AM Pacific time starting on one of the following nights, depending on which server your site is on:
- Friday, August 22 (servers whose names begin with the letters “l” through “z”)
- Saturday, August 23 (servers whose names begin with the letters “a” through “k”)
(The servers named “bender” and “lrrr” have already been upgraded, and those customers are not affected.)
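For anyone who wants to check a server name against the schedule above, here is a minimal sketch of the rule in Python. The function name `maintenance_night` and the returned labels are our own illustration, not part of any tool we provide; the year is omitted because the schedule above does not state it.

```python
# Hypothetical helper illustrating the schedule above; not an official tool.
ALREADY_UPGRADED = {"bender", "lrrr"}  # named in the announcement as already done


def maintenance_night(server_name):
    """Return the scheduled night for a server, or None if it is already upgraded."""
    name = server_name.strip().lower()
    if not name:
        raise ValueError("server name is empty")
    if name in ALREADY_UPGRADED:
        return None
    first = name[0]
    if "a" <= first <= "k":
        return "Saturday, August 23"
    if "l" <= first <= "z":
        return "Friday, August 22"
    raise ValueError(f"unexpected server name: {server_name!r}")


if __name__ == "__main__":
    for server in ("calculon", "zapp", "bender"):
        print(server, "->", maintenance_night(server))
```

Note that the exceptions are checked first, so “lrrr” is reported as already upgraded even though its name falls in the “l” through “z” range.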
Around 11:26 AM Pacific time today, one of our mail servers encountered an unusual load, became unresponsive, and needed to be restarted. For several minutes, this affected our users’ ability to read e-mail and to use our Webmail system.
The “calculon” Web server needed to be restarted at 1:36 Pacific time today, resulting in a five-minute interruption of service for Web sites and e-mail on that server.
Due to what appears to be a DNS issue at a third party, a small number of messages that weren’t actually spam may have been incorrectly blocked by our mail filters over the last few hours.
We’ve made changes to our system to ignore these errors, making sure no other messages will be blocked.
The number of affected messages was small enough that this wasn’t an issue for most customers. However, if someone tells you they sent a message that was initially blocked with an error message about “red.uribl.com”, but which later went through without problems, this problem was the cause.
We sincerely apologize to anyone who had trouble.
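For readers curious what "ignoring these errors" can look like in a filter, here is a rough sketch of the general idea (often called failing open), not our actual filter code. The function `is_listed` and the `resolve` callable are hypothetical; the zone name is taken from the error message quoted above, and the `domain.zone` query form is the usual convention for DNS-based blocklists. In this sketch, `resolve` is assumed to return a list of addresses (empty when the name is not listed) and to raise `OSError` when the lookup itself fails.

```python
# Hypothetical sketch: treat a failed blocklist lookup as "not listed".
def is_listed(domain, resolve, zone="red.uribl.com"):
    """Return True only when the lookup succeeds and returns at least one answer."""
    query = f"{domain}.{zone}"
    try:
        answers = resolve(query)
    except OSError:
        # The lookup failed (timeout, server error, and so on).
        # Fail open: a broken third-party service should not block mail.
        return False
    return len(answers) > 0


if __name__ == "__main__":
    def broken_resolver(name):
        raise OSError("simulated DNS failure")

    print(is_listed("example.com", broken_resolver))  # lookup error is ignored
```

The trade-off is that a message that really should be blocked may get through while the third-party service is failing, which is usually preferable to rejecting legitimate mail.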
This afternoon at 3:49 PM (Pacific time), one of the cabinets at our data center tripped a circuit breaker, causing all of the servers in that cabinet to lose power. Power was restored nine minutes later.
Customer Web sites on the calculon, lrrr, and zapp Web servers were unavailable during this time. The ability to send and receive e-mail was also interrupted (no mail was lost, of course). Other servers were not affected.
We pay close attention to the power load in each cabinet to avoid this sort of problem. The previously measured peak load of that cabinet had been 12 amps. Since the circuit allows 15 amps, this issue surprised us (we’ve been using the same setup in the same data center for seven years and this has never happened before). It appears that a combination of several servers experiencing unusually high CPU loads led to power usage beyond what we previously considered possible.
We will take immediate steps to make sure the problem doesn’t happen again, and we sincerely apologize to customers who were affected by this incident.
Update 7:26 PM: We have removed a server from the cabinet in question, lowering the power use.
Update 10:38 PM: We have removed a second server from the cabinet, ensuring that power use is well below any level that could cause further trouble. The problem will not recur.
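To make the power figures in this post concrete, here is a small sketch of the headroom arithmetic. The 15 amp limit and 12 amp measured peak come from the post; the per-server numbers in the example are purely hypothetical and only illustrate how several servers peaking together can exceed a limit that a previously measured aggregate peak stayed under.

```python
CIRCUIT_LIMIT_AMPS = 15.0   # circuit limit stated in the post
MEASURED_PEAK_AMPS = 12.0   # previously measured cabinet peak stated in the post


def headroom_amps(limit_amps, load_amps):
    """Amps remaining before the breaker limit is reached."""
    return limit_amps - load_amps


def utilization(limit_amps, load_amps):
    """Fraction of the circuit capacity in use."""
    return load_amps / limit_amps


def exceeds_limit(per_server_amps, limit_amps):
    """True if the servers' simultaneous draw would exceed the limit."""
    return sum(per_server_amps) > limit_amps


if __name__ == "__main__":
    print(headroom_amps(CIRCUIT_LIMIT_AMPS, MEASURED_PEAK_AMPS))
    print(utilization(CIRCUIT_LIMIT_AMPS, MEASURED_PEAK_AMPS))
    # Hypothetical per-server peaks, invented for illustration only.
    hypothetical_peaks = [3.5, 4.0, 4.5, 4.0]
    print(exceeds_limit(hypothetical_peaks, CIRCUIT_LIMIT_AMPS))
```

Removing a server from the cabinet lowers the worst-case sum, which is the effect the two updates above describe.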
Starting at 10:14 AM today, our elzar server experienced an unexpectedly high load that effectively made some processes on the server unusable for about 10 minutes.
Web sites using scripts or databases on the elzar server may have seemed unresponsive during that time. Also, any customer hosted on elzar who was reading their e-mail during this time may have felt the system was slow or unresponsive (no e-mail was lost, of course).
Customers on other servers were not affected.
The “farnsworth” Apache Web server had an outage lasting approximately five minutes at 11:36 AM Pacific time today, resulting in an interruption of service for Web sites and e-mail on that server. Other servers were not affected.
The problem occurred when the Apache Web server process failed to restart gracefully after a new SSL certificate was added. We have discovered why this happened and will take steps to prevent it in the future.
We sincerely apologize to anyone affected by this.
Update: We have added a new automated step to our certificate installation process that checks for problems with SSL certificates, guaranteeing that this problem will not recur.
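As an illustration of the kind of pre-restart check described in the update above, here is a minimal sketch using Python's standard `ssl` module. It is not our actual installation script: the function names and the `restart` callable are hypothetical, and the check shown (confirming that a certificate and private key load together) is only one of several problems such a step might look for.

```python
import ssl


def certificate_pair_loads(cert_path, key_path):
    """Return True if the certificate and private key can be loaded together."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except (ssl.SSLError, OSError):
        # Missing file, unreadable file, malformed PEM, or mismatched key.
        return False
    return True


def restart_if_certificate_ok(cert_path, key_path, restart):
    """Call restart() only when the certificate check passes.

    `restart` is a caller-supplied function (hypothetical here) that
    triggers the Web server's graceful restart.
    """
    if not certificate_pair_loads(cert_path, key_path):
        return False
    restart()
    return True
```

The point of the design is ordering: the check runs before the restart is requested, so a bad certificate leaves the running server untouched.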
Between 4:58 and 5:39 AM Pacific time today (March 23), the server that runs our Mailman mailing list software encountered an internal problem. During most of this time, all Mailman-related functionality was unavailable.
Since Mailman mostly works via e-mail, no data was lost. Some messages might have been slightly delayed, but not for any longer than might normally be noticed with mail delivery via the Internet.
We apologize for any inconvenience that this might have caused!
The “calculon” Web server needed to be restarted at 10:14 AM Pacific time, resulting in a five-minute interruption of service for Web sites and e-mail on that server.
The “zapp” Web server was unavailable between 8:20 and 8:40 Pacific time this morning. This resulted in an interruption of service for Web sites and e-mail on that server.
The problem was caused by a faulty hard disk in the RAID array (which theoretically shouldn’t cause a server to stop responding, but did). The hard disk has been removed from the array and will be replaced tonight at 10 PM. The server will be restarted at that time, resulting in about four minutes of additional downtime.
We sincerely apologize for this problem. We will be investigating the root cause. Hard drives fail occasionally, and we expect that, but a single failure shouldn’t have such negative effects (normally the RAID array would prevent the failure of any single drive from causing the entire machine to fail).