Having trouble with special characters in your PHP scripts?

After recent server updates, two customers contacted us with a problem: pages that used to correctly display non-ASCII “Unicode” characters (like “©” or “curly quotes”) started showing invalid characters like “�” instead.

If this happens to you, it’s likely to be caused by a bug in your script that’s only now visible because of a security change in recent MySQL database versions. For example, the problem happened to the two customers we mentioned because they were using old versions of the Joomla and TextPattern software. Updating each of those fixed it, so if you have trouble, be sure you’re using the latest versions of any software like that.

What’s the technical problem?

If you’re interested in the details, the problem was that even though the databases were set to use “utf8” character encoding, the scripts weren’t telling PHP that MySQL is storing characters in utf8 format.

In older versions of the PHP MySQL code (which uses a software library called “libmysqlclient”), this often seemed to work, even though it shouldn’t have. Even if the script didn’t tell PHP it was using utf8, the output might look okay anyway. That’s because MySQL and PHP didn’t care much about the format of the bytes in the database, and often sent them to the browser without modification.

But recent versions of the MySQL and MariaDB database software don’t allow that. If a database row contains bytes that don’t match the character encoding that the PHP script says it’s using, the “invalid” characters will be replaced with characters that actually exist in that character set. (That change is because security problems have been found that are caused by accepting invalid character bytes.) The default character set, “latin1”, encodes characters like “©” or “curly quotes” differently from utf8 and can’t represent many other characters at all, so on a page that the browser reads as utf8, the replaced characters don’t display as you expect.
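To see why a mismatch ends up as “�”, it can help to look at the raw bytes. The short sketch below is only an illustration, not something taken from the affected scripts: it assumes PHP's mbstring extension is available, and the variable names are made up for the example. It compares the two-byte utf8 form of “©” with a single high byte that is not valid utf8 on its own.

```php
<?php
// "©" encoded as utf8 is two bytes: 0xC2 0xA9.
$utf8Copyright = "\xC2\xA9";

// A lone high byte, which is not a valid utf8 sequence by itself.
$loneByte = "\xA9";

echo bin2hex($utf8Copyright), "\n"; // c2a9
echo bin2hex($loneByte), "\n";      // a9

// A browser that expects utf8 can decode the first string but not the
// second, and shows the replacement character for the byte it can't decode.
var_dump(mb_check_encoding($utf8Copyright, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($loneByte, 'UTF-8'));      // bool(false)
```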

If this is a new problem with well-known software like TextPattern or Joomla, updating to the current version will probably fix it, because the authors have had several years to address it since it became known as a potential issue. If you wrote your own PHP script that connects to a utf8 database, you can fix this by making sure the script calls mysqli_set_charset (or its equivalent in whatever database library you use) right after it connects.
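For a script you wrote yourself, the fix might look something like the sketch below. It assumes the script uses the mysqli extension; the function name open_utf8_connection and the connection details in the usage line are hypothetical placeholders, so adapt them to your own code.

```php
<?php
// Hypothetical helper: the name and parameters are illustrative only.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = mysqli_connect($host, $user, $pass, $db);
    if ($link === false) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }

    // Tell both PHP and the database server that this connection uses utf8.
    // Do this before running any queries.
    if (!mysqli_set_charset($link, 'utf8')) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($link));
    }

    return $link;
}
```

You would then connect with something like `$link = open_utf8_connection('localhost', 'example_user', 'example_password', 'example_db');`. If your tables use the “utf8mb4” character set instead, pass 'utf8mb4' to mysqli_set_charset. Scripts that use PDO rather than mysqli can usually get the same effect by adding charset=utf8 to the connection string.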

As always, our customers can contact us for help if they have trouble related to this that they can’t solve.