Converting a Few MySQL Columns to UTF8

Sometimes things just break.

Today, I ran across a bit of a situation on a client’s website. For some odd reason, out of nowhere, special characters were converted to question mark symbols — anything from quotation marks to diacritical marks. Using the W3 Validator, it spat out:

Sorry! This document can not be checked.

Strange. Well, not the part about not being able to be checked by a validator, but the part about a seemingly random change in the client’s website. Validator’s tend to give up at the first sight of invalid values in character encoding. Which is totally fine, it does tell us where the first instance of a no-no is. So, keyboard in hand (or on desk, rather), I did a bit of research. I couldn’t really find much and the only thing that set this website apart from the rest on the server was that it’s powered by the Textpattern CMS. Importing back ups of the database yielded no result; the database was perfectly fine at first glance. Digging further, I decided I’d take a look at the tables and columns within the database. Long story short, here’s what I did to fix the special character problem:

USE txpdatabase;
ALTER TABLE txp_textpattern CHARACTER SET utf8;

SHOW COLUMNS FROM txp_textpattern;

ALTER TABLE txp_textpattern CHANGE Body_html Body_html mediumtext character set utf8;
ALTER TABLE txp_textpattern CHANGE Excerpt_html Excerpt_html mediumtext character set utf8;
ALTER TABLE txp_textpattern CHANGE Excerpt Excerpt mediumtext character set utf8;

The above is showing me use a database called txpdatabase. I then take a look at the tables that store the blog posts (where the special characters were popping out of) and find the one I need to edit. Then, I look at the columns within that table, which is txp_textpattern, and see which of these columns are giving me the problem. As shown above, I manually set the Body_html, Excerpt_html and Excerpt columns to use the UTF-8 character set.

As I read in the helpful WordPress Codex, sometimes that will “mangle” the text and you’ll need to convert the columns to their binary formats and then back to their original state. While I didn’t need to do that, it’s a good idea to check that link and see what additional steps you may need to do to fix the problem. Of course, WordPress and Textpattern tables differ but the process is essentially the same. Now onto some new issues!