Elliott C. Back: In Aere Aedificare

Wordpress 2.0 Upgrade Problems: Character Encoding + Transformation

Posted in Blogging, My Blog, Code, Internationalization, Wordpress, Encodings by Elliott Back on April 3rd, 2006.

The Wordpress 2.0 upgrade.php script in the wp-admin folder is designed to migrate a database created by an older version of Wordpress. However, something goes wrong along the way with the character encoding or the collation (latin1_swedish_ci). Changing the collation after the fact doesn’t fix the garbled characters being passed back, so I assume it would need to be set before the migration. Here’s an example of what it looks like:

[Image: whoah-weird-wordpress-chars.jpg]

After the damage has occurred, the best way to clean it up seems to be running a series of SQL queries in the phpMyAdmin console, emulating find/replace on the wp_posts table:

UPDATE wp_posts SET post_content = REPLACE(post_content, 'bad', 'good');

This has to be done once for each affected character. So far, I’ve noticed that apostrophes, left and right quotation marks, all kinds of dashes, and ellipses are affected. It’s as if the content was run through the WP text filters before it went back into the database.
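To avoid typing each pair by hand, the bad/good strings can be generated. Here is a minimal Python sketch, assuming the damage is UTF-8 bytes being read back as latin1 (which MySQL treats as Windows-1252). The character list and the helper names mojibake and update_statement are made up for illustration and are not part of Wordpress. Back up wp_posts before running anything it prints.

```python
# Sketch only: assumes the garbling is UTF-8 bytes read back as latin1/cp1252.

# Characters noticed so far (illustrative list; extend as needed).
AFFECTED = {
    "apostrophe / right single quote": "\u2019",
    "left single quote": "\u2018",
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "ellipsis": "\u2026",
}


def mojibake(good: str) -> str:
    """Return what `good` looks like when its UTF-8 bytes are read as cp1252."""
    out = []
    for byte in good.encode("utf-8"):
        try:
            out.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # cp1252 leaves a few bytes (e.g. 0x9D) undefined; fall back to latin-1.
            out.append(bytes([byte]).decode("latin-1"))
    return "".join(out)


def update_statement(bad: str, good: str) -> str:
    """Build one find/replace query for wp_posts, doubling any single quotes."""
    bad_sql = bad.replace("'", "''")
    good_sql = good.replace("'", "''")
    return (
        "UPDATE wp_posts SET post_content = "
        f"REPLACE(post_content, '{bad_sql}', '{good_sql}');"
    )


if __name__ == "__main__":
    for name, good in AFFECTED.items():
        print(f"-- {name}")
        print(update_statement(mojibake(good), good))
```

Whether the generated literals actually match the stored text also depends on the character set of the phpMyAdmin connection, so it is worth trying one query on a copy of the table first.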

It’s weird that my UTF-8 encoding has switched over to ISO-8859-1 all by itself…
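One way to check that guess is to reverse the switch on a sample of the garbled text: if turning the characters back into bytes and decoding them as UTF-8 yields the original punctuation, the content really is UTF-8 that was read as a single-byte encoding. A minimal Python sketch, assuming cp1252 as the stand-in for what MySQL calls latin1; repair is a made-up helper name:

```python
# Sketch only: undoes "UTF-8 read as cp1252/latin1" garbling on a sample string.

def repair(garbled: str) -> str:
    """Turn garbled characters back into bytes, then decode them as UTF-8."""
    raw = bytearray()
    for ch in garbled:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Characters cp1252 cannot encode (e.g. U+009D) are tried as latin-1.
            raw += ch.encode("latin-1")
    return raw.decode("utf-8")


if __name__ == "__main__":
    sample = "don\u00e2\u20ac\u2122t"  # how a curly apostrophe shows up when garbled
    print(repair(sample))  # prints: don’t
```

If the final decode raises UnicodeDecodeError, the text was damaged some other way and this explanation does not apply.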

This entry was posted on Monday, April 3rd, 2006 at 5:55 pm and is tagged with sql queries, character encoding, collation, ellipsis, utf 8, older versions, dashes, quotation, migration, phpmyadmin, transformation, wp.

3 Responses to 'Wordpress 2.0 Upgrade Problems: Character Encoding + Transformation'

  1. Isaac Bythewood said:

    on April 4th, 2006 at 1:51 pm

    I have upgraded 3 blogs now to WP2 and had none of these problems… maybe they did something to the code since I last downloaded it but from what I have seen the upgrade works just fine from the last 1.whatever release to 2. Check out the blog I did at blog.overshard.com…. no problems at all… or none I have noticed.

  2. Character Encoding + Transformation « Netlex Toolbox said:

    on October 9th, 2006 at 7:24 am

    […] Wordpress 2.0 Upgrade Problems: Character Encoding + Transformation […]

  3. Cynthia Blue said:

    on July 9th, 2007 at 8:33 pm

    I am trying to verify my wordpress 2.1 and each time when I save a URL with &, it fails the validation. Says I need the & amp ; instead. This is only in the widgets (that are now standard). When I save it, it doesn’t take, and displays again as & and fails validation.

    How can I get the & amp ; to stick?
