<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Tiger Technologies Blog &#187; System Status</title>
	<atom:link href="http://blog.tigertech.net/category/system-status/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.tigertech.net</link>
	<description>Behind the scenes at tigertech.net</description>
	<pubDate>Thu, 15 May 2008 05:40:55 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>Brief power interruption for some servers (Resolved)</title>
		<link>http://blog.tigertech.net/posts/brief-power-interruption/</link>
		<comments>http://blog.tigertech.net/posts/brief-power-interruption/#comments</comments>
		<pubDate>Thu, 15 May 2008 00:15:20 +0000</pubDate>
		<dc:creator>Ken</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/?p=116</guid>
		<description><![CDATA[This afternoon at 3:49 PM (Pacific time), one of the cabinets at our data center tripped a circuit breaker, causing all of the servers in that cabinet to lose power. Power was restored nine minutes later.
Customer Web sites on the calculon, lrrr, and zapp Web servers were unavailable during this time. The ability to send [...]]]></description>
			<content:encoded><![CDATA[<p>This afternoon at 3:49 PM (Pacific time), one of the cabinets at our data center tripped a circuit breaker, causing all of the servers in that cabinet to lose power. Power was restored nine minutes later.</p>
<p>Customer Web sites on the calculon, lrrr, and zapp Web servers were unavailable during this time. The ability to send and receive e-mail was also interrupted (no mail was lost, of course). Other servers were not affected.</p>
<p>We pay close attention to the power load in each cabinet to avoid this sort of problem. The previously measured peak load of that cabinet had been 12 amps. Since the circuit allows 15 amps, this issue surprised us (we&#8217;ve been using the same setup in the same data center for seven years and this has never happened before). It appears that a combination of several servers experiencing unusually high CPU loads led to power usage beyond what we previously considered possible.</p>
<p>We will take immediate steps to make sure the problem doesn&#8217;t happen again, and we sincerely apologize to customers who were affected by this incident.</p>
<p><em>Update 7:26 PM: We have removed a server from the cabinet in question, lowering the power use.</em></p>
<p><em>Update 10:38 PM: We have removed a second server from the cabinet, ensuring that power use is well below any level that could cause further trouble. The problem will not recur.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/brief-power-interruption/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Temporary overload on “elzar” server (resolved)</title>
		<link>http://blog.tigertech.net/posts/temporary-overload-on-%e2%80%9celzar%e2%80%9d-server-resolved/</link>
		<comments>http://blog.tigertech.net/posts/temporary-overload-on-%e2%80%9celzar%e2%80%9d-server-resolved/#comments</comments>
		<pubDate>Fri, 11 Apr 2008 18:30:31 +0000</pubDate>
		<dc:creator>Ken</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/?p=114</guid>
		<description><![CDATA[Starting at 10:14 AM this morning, our elzar server experienced an unexpectedly high server load that effectively made some processes on the server unusable for about 10 minutes.
Web sites using scripts or databases on the elzar server may have seemed unresponsive during that time. Also, any customer hosted on elzar who was reading their e-mail [...]]]></description>
			<content:encoded><![CDATA[<p>Starting at 10:14 AM this morning, our <a href="http://blog.tigertech.net/posts/which-server/">elzar</a> server experienced an unexpectedly high server load that effectively made some processes on the server unusable for about 10 minutes.</p>
<p>Web sites using scripts or databases on the elzar server may have seemed unresponsive during that time. Also, any customer hosted on elzar who was reading their e-mail during this time may have felt the system was slow or unresponsive (no e-mail was lost, of course).</p>
<p>Customers on other servers were not affected.</p>
<p><span id="more-114"></span></p>
<p>The problem was related to an extremely high number of Apache Web server processes running. We&#8217;ve changed some Apache configuration settings (lowering the maximum number of Apache processes and reducing some timeout settings so that fewer processes are required in the first place) on all our servers. We expect those changes will prevent this symptom from happening again.</p>
<p>We apologize for any inconvenience that this may have caused.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/temporary-overload-on-%e2%80%9celzar%e2%80%9d-server-resolved/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Short outage on &#8220;farnsworth&#8221; server (resolved)</title>
		<link>http://blog.tigertech.net/posts/2008-04-03-farnworth-outage/</link>
		<comments>http://blog.tigertech.net/posts/2008-04-03-farnworth-outage/#comments</comments>
		<pubDate>Thu, 03 Apr 2008 18:48:19 +0000</pubDate>
		<dc:creator>Robert Mathews</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/?p=112</guid>
		<description><![CDATA[The &#8220;farnsworth&#8221; Apache Web server had an outage lasting approximately five minutes at 11:36 AM Pacific time today, resulting in an interruption of service for Web sites and e-mail on that server. Other servers were not affected.
The problem occurred when the Apache Web server process failed to gracefully restart when a new SSL certificate was [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://blog.tigertech.net/posts/which-server/">&#8220;farnsworth&#8221;</a> Apache Web server had an outage lasting approximately five minutes at 11:36 AM Pacific time today, resulting in an interruption of service for Web sites and e-mail on that server. Other servers were not affected.</p>
<p>The problem occurred when the Apache Web server process failed to gracefully restart when a new SSL certificate was added. We have discovered why this happened and will take steps to prevent it in the future.</p>
<p>We sincerely apologize to anyone affected by this.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/2008-04-03-farnworth-outage/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Mailman server problem this morning (resolved)</title>
		<link>http://blog.tigertech.net/posts/mailman-server-problem-this-morning-resolved/</link>
		<comments>http://blog.tigertech.net/posts/mailman-server-problem-this-morning-resolved/#comments</comments>
		<pubDate>Mon, 24 Mar 2008 18:00:33 +0000</pubDate>
		<dc:creator>Ken</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<category><![CDATA[e-mail]]></category>

		<category><![CDATA[mailman]]></category>

		<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/posts/mailman-server-problem-this-morning-resolved/</guid>
		<description><![CDATA[Between 4:58 and 5:39 AM Pacific time today (March 23), our server which runs the Mailman mailing list software encountered an internal problem. During most of this time, all Mailman-related functionality was unavailable.
Since Mailman mostly works via e-mail, no data was lost. Some messages might have been slightly delayed, but not for any longer than [...]]]></description>
			<content:encoded><![CDATA[<p>Between 4:58 and 5:39 AM Pacific time today (March 23), our server which runs the Mailman mailing list software encountered an internal problem. During most of this time, all Mailman-related functionality was unavailable.</p>
<p>Since Mailman mostly works via e-mail, no data was lost. Some messages might have been slightly delayed, but not for any longer than might normally be noticed with mail delivery via the Internet.</p>
<p>We apologize for any inconvenience that this might have caused!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/mailman-server-problem-this-morning-resolved/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Calculon server restarted (resolved)</title>
		<link>http://blog.tigertech.net/posts/calculon-server-restarted-resolved/</link>
		<comments>http://blog.tigertech.net/posts/calculon-server-restarted-resolved/#comments</comments>
		<pubDate>Thu, 20 Mar 2008 17:44:37 +0000</pubDate>
		<dc:creator>Robert Mathews</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/posts/calculon-server-restarted-resolved/</guid>
		<description><![CDATA[The &#8220;calculon&#8221; Web server needed to be restarted at 10:14 AM Pacific time, resulting in a five-minute interruption of service for Web sites and e-mail on that server.

The problem was caused while debugging high bandwidth usage on that server; running the Linux &#8220;iptraf&#8221; command caused the server to instantly lose network connectivity for some reason. [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="/posts/which-server/">&#8220;calculon&#8221;</a> Web server needed to be restarted at 10:14 AM Pacific time, resulting in a five-minute interruption of service for Web sites and e-mail on that server.</p>
<p><span id="more-109"></span></p>
<p>The problem was caused while debugging high bandwidth usage on that server; running the Linux &#8220;iptraf&#8221; command caused the server to instantly lose network connectivity for some reason. We will not use that command again, obviously.</p>
<p>We apologize to anyone affected by this problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/calculon-server-restarted-resolved/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Zapp server temporarily unavailable (resolved)</title>
		<link>http://blog.tigertech.net/posts/zapp-server-temporarily-unavailable-resolved/</link>
		<comments>http://blog.tigertech.net/posts/zapp-server-temporarily-unavailable-resolved/#comments</comments>
		<pubDate>Tue, 18 Mar 2008 16:32:48 +0000</pubDate>
		<dc:creator>Robert Mathews</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/posts/zapp-server-temporarily-unavailable-resolved/</guid>
		<description><![CDATA[The &#8220;zapp&#8221; Web server was unavailable between 8:20 and 8:40 Pacific time this morning. This resulted in an interruption of service for Web sites and e-mail on that server.
The problem was caused by a faulty hard disk in the RAID array (which theoretically shouldn&#8217;t cause a server to stop responding, but did). The hard disk [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="/posts/which-server/">&#8220;zapp&#8221;</a> Web server was unavailable between 8:20 and 8:40 Pacific time this morning. This resulted in an interruption of service for Web sites and e-mail on that server.</p>
<p>The problem was caused by a faulty hard disk in the RAID array (which theoretically shouldn&#8217;t cause a server to stop responding, but did). The hard disk has been removed from the array and will be replaced tonight at 10 PM. The server will be restarted at that time, resulting in about 4 minutes additional downtime.</p>
<p>We sincerely apologize for this problem. We will be investigating the root cause: it&#8217;s normal for hard drives to fail &#8212; we expect that occasionally &#8212; but it shouldn&#8217;t cause such negative effects (normally the RAID array would prevent the failure of any single drive from causing the entire machine to fail).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/zapp-server-temporarily-unavailable-resolved/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Mail problem this morning (resolved)</title>
		<link>http://blog.tigertech.net/posts/mail-problem-2008-03-12/</link>
		<comments>http://blog.tigertech.net/posts/mail-problem-2008-03-12/#comments</comments>
		<pubDate>Wed, 12 Mar 2008 14:06:37 +0000</pubDate>
		<dc:creator>Robert Mathews</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<category><![CDATA[e-mail]]></category>

		<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/posts/mail-problem-2008-03-12/</guid>
		<description><![CDATA[Between 5:58 and 6:26 AM Pacific time today (March 12), a network problem on one of our mail servers prevented some customers from being able to read and send e-mail.
The issue has been resolved and everything is working normally. Although incoming mail was delayed, no mail was lost. Web site service was not affected.
The cause [...]]]></description>
			<content:encoded><![CDATA[<p>Between 5:58 and 6:26 AM Pacific time today (March 12), a network problem on one of our mail servers prevented some customers from being able to read and send e-mail.</p>
<p>The issue has been resolved and everything is working normally. Although incoming mail was delayed, no mail was lost. Web site service was not affected.</p>
<p>The cause of the problem was that a debugging tool used by one of our technicians (&#8220;tcpdump&#8221;), when used with certain options, can apparently cause network interface failures. This was not an issue we were previously aware of. We will avoid using the tool in that manner in the future, so the problem should not recur.</p>
<p>We regret the problem and sincerely apologize to our customers who were affected by this issue.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/mail-problem-2008-03-12/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Brief scheduled maintenance on Saturday, March 1 (completed)</title>
		<link>http://blog.tigertech.net/posts/2008-03-01-maintenance/</link>
		<comments>http://blog.tigertech.net/posts/2008-03-01-maintenance/#comments</comments>
		<pubDate>Fri, 29 Feb 2008 21:26:20 +0000</pubDate>
		<dc:creator>Robert Mathews</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<category><![CDATA[linux]]></category>

		<category><![CDATA[maintenance]]></category>

		<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/posts/2008-03-01-maintenance/</guid>
		<description><![CDATA[At approximately 11:00 PM Pacific time this Saturday night (March 1), all Tiger Technologies servers will be restarted. As a result, customer Web sites and e-mail service will be unavailable for three to five minutes.
No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.
This brief maintenance is necessary [...]]]></description>
			<content:encoded><![CDATA[<p>At approximately 11:00 PM Pacific time this Saturday night (March 1), all Tiger Technologies servers will be restarted. As a result, customer Web sites and e-mail service will be unavailable for three to five minutes.</p>
<p>No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.</p>
<p>This brief maintenance is necessary to upgrade the operating system “Linux kernel” to a newer version for security reasons. This was also done two weeks ago; unfortunately our operating system vendor has released an even newer kernel since then &#8212; it doesn&#8217;t usually happen this often.</p>
<p>We apologize for the inconvenience this causes.</p>
<p><em>(This maintenance was also successfully completed with less than four minutes of downtime per server.)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/2008-03-01-maintenance/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Brief scheduled maintenance on Saturday, February 16 (completed)</title>
		<link>http://blog.tigertech.net/posts/2008-02-16-maintenance/</link>
		<comments>http://blog.tigertech.net/posts/2008-02-16-maintenance/#comments</comments>
		<pubDate>Fri, 15 Feb 2008 00:20:34 +0000</pubDate>
		<dc:creator>Robert Mathews</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<category><![CDATA[linux]]></category>

		<category><![CDATA[maintenance]]></category>

		<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/posts/2008-02-16-maintenance/</guid>
		<description><![CDATA[At approximately 11:00 PM Pacific time this Saturday night (February 16), all Tiger Technologies servers will be restarted. As a result, customer Web sites and e-mail service will be unavailable for three to five minutes.
No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.
This brief maintenance is necessary [...]]]></description>
			<content:encoded><![CDATA[<p>At approximately 11:00 PM Pacific time this Saturday night (February 16), all Tiger Technologies servers will be restarted. As a result, customer Web sites and e-mail service will be unavailable for three to five minutes.</p>
<p>No e-mail will be lost, of course; incoming mail will just be delayed for a few minutes.</p>
<p>This brief maintenance is necessary to upgrade the operating system &#8220;Linux kernel&#8221; to a newer version for security reasons. We apologize for the inconvenience this causes.</p>
<p><em>(This maintenance was successfully completed with less than four minutes of downtime per server.)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/2008-02-16-maintenance/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Network outage followup</title>
		<link>http://blog.tigertech.net/posts/network-outage-followup/</link>
		<comments>http://blog.tigertech.net/posts/network-outage-followup/#comments</comments>
		<pubDate>Fri, 11 Jan 2008 22:46:24 +0000</pubDate>
		<dc:creator>Robert Mathews</dc:creator>
		
		<category><![CDATA[System Status]]></category>

		<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://blog.tigertech.net/posts/network-outage-followup/</guid>
		<description><![CDATA[This is a followup to last night&#8217;s post about a network outage.
The root cause of the problem was the failure of an Ethernet switch at our data center. The switch was the one that our network cables actually plug into to connect to the Internet. Unfortunately, it&#8217;s one of the few pieces of the network [...]]]></description>
			<content:encoded><![CDATA[<p>This is a followup to <a href="/posts/unscheduled-network-outage-resolved/">last night&#8217;s post</a> about a network outage.</p>
<p>The root cause of the problem was the failure of an Ethernet switch at our data center. The switch was the one that our network cables actually plug into to connect to the Internet. Unfortunately, it&#8217;s one of the few pieces of the network infrastructure that&#8217;s not automatically redundant: although the &#8220;other side&#8221; of the switch is connected to multiple fully redundant upstream paths to the Internet, the side of it that goes to our server cabinets effectively has a single connection for each group of servers.</p>
<p>When the switch failed, the data center staff replaced it with a new spare one. Because the faulty hardware was completely replaced, the problem is properly solved, and this won&#8217;t be something that&#8217;s an ongoing problem.</p>
<p><span id="more-99"></span></p>
<p>Just so it&#8217;s clear, we own and operate all our own servers, but we house these servers in a professional data center that provides uninterruptible electrical power, cooling, and extremely fast network connectivity. The data center has engineers on site 24 hours a day, 365 days a year to handle this kind of issue, and they started working on it the minute the outage started. That said, we&#8217;re disappointed that the problem wasn&#8217;t resolved sooner.</p>
<p>We&#8217;ve used the same data center for many years, and the small number of problems we&#8217;ve experienced have usually been taken care of very quickly. We expect that generally good performance to continue, but we will take appropriate remedial action if it does not.</p>
<p>Again, we apologize to our customers for the inconvenience caused by this outage.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tigertech.net/posts/network-outage-followup/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

