Brief MySQL problem on web12 server (resolved)

Between 10:41 and 10:49 AM Pacific time today (March 25, 2014), some MySQL database queries on the web12 server ran very slowly and caused “timeouts” for scripts. (No other servers were affected.)

The problem is fixed, and we’ve added an automated check to prevent it from recurring.

For those interested in the MySQL technical details: We traced this problem to a query that remained in the “end” state for several minutes, causing other MySQL queries that perform INSERT or UPDATE operations to queue behind it in the “Waiting for query cache lock” state.

This is surprising: it should never happen, for two separate reasons. First, no query should stay in the “end” state for minutes like this unless there’s a bug somewhere in MySQL. However, other people have reported similar behavior.

Second, MySQL contains code that should prevent queries from remaining in the “Waiting for query cache lock” state for more than 1/20th of a second, even if another query holds the lock. However, that code contains a bug of its own and simply doesn’t work.

Because of all this, we’ve added code to our monitoring systems that quickly detects this condition if it happens again and kills any blocking query in the “end” state that holds the query cache lock for more than a few seconds. This should prevent it from causing trouble again.
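
As a rough illustration of that kind of check (a minimal sketch, not the actual monitoring code), the Python below takes rows shaped like the output of SHOW FULL PROCESSLIST and picks out threads that have sat in the “end” state too long while other threads wait on the query cache lock. The function names, the dictionary row format, and the 5-second threshold are assumptions made for the example; the only MySQL features it relies on are the Id, Time and State columns of the process list and the KILL statement.

```python
from typing import Iterable, List, Mapping

BLOCKING_STATE = "end"
WAITING_STATE = "Waiting for query cache lock"
MAX_SECONDS_IN_END = 5  # assumed threshold; "a few seconds" is all that is specified


def find_blockers(rows: Iterable[Mapping], max_seconds: int = MAX_SECONDS_IN_END) -> List[int]:
    """Return thread ids stuck in the "end" state while others wait on the cache lock.

    Each row is a mapping with the SHOW PROCESSLIST columns "Id", "Time"
    (seconds spent in the current state) and "State" (which may be None).
    """
    rows = list(rows)
    # Only act when at least one thread is actually queued behind the lock.
    if not any(r.get("State") == WAITING_STATE for r in rows):
        return []
    return [
        int(r["Id"])
        for r in rows
        if r.get("State") == BLOCKING_STATE and int(r.get("Time") or 0) > max_seconds
    ]


def kill_statements(thread_ids: Iterable[int]) -> List[str]:
    """Build the KILL statements a monitor could send for the given thread ids."""
    return [f"KILL {int(tid)}" for tid in thread_ids]


if __name__ == "__main__":
    # Hypothetical snapshot of the process list during an incident.
    snapshot = [
        {"Id": 101, "Time": 140, "State": "end"},
        {"Id": 102, "Time": 12, "State": "Waiting for query cache lock"},
        {"Id": 103, "Time": 0, "State": None},
    ]
    for statement in kill_statements(find_blockers(snapshot)):
        print(statement)
```

Requiring at least one waiting thread before killing anything keeps the sketch from touching a query that merely passes through the “end” state without blocking anyone.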

We apologize for the inconvenience this problem caused.