Monday, July 18, 2011

Python mysqldb UnicodeDecodeError: 'ascii' codec can't decode byte


If you run into the error in the title of this post while using Python's MySQLdb module version 1.2.1 or earlier, decode your data/query first:

mydata = mydata.decode('utf8')

(changing 'utf8' to whatever encoding your data happens to be in)


So I was writing some code in Python on Ubuntu, and it was working just fine. When I went to run it on RHEL, I got this error:

Traceback (most recent call last):
  File "", line 50, in ?
  File "/usr/lib64/python2.4/site-packages/MySQLdb/", line 146, in execute
    query = query.encode(charset)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 223: ordinal not in range(128)

My first thought was that it was due to the incredibly old version of Python that ships with RHEL 5 (Python 2.4), but it didn't take me long to realize the problem was with the MySQLdb module itself. Ubuntu 10.10 ships with version 1.2.2 of that module, while RHEL 5 ships with version 1.2.1. A minor difference, but apparently this bug was fixed between those two releases.

Apparently MySQLdb 1.2.1 indiscriminately encodes the query to utf8 (well, at least when you specify utf8 as the database character set), without checking whether the string is already a utf8-encoded byte string. In Python 2, calling .encode() on a byte string makes Python decode it with the default ascii codec first, which is why an encode call ends up raising a UnicodeDecodeError. My solution was just to decode my data from utf8 (to unicode) before passing it to my MySQL query, at which point the encoding works just fine.
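To make the failure mode concrete, here is a minimal sketch of what I believe is going on. It is not MySQLdb's actual code, and the sample value is made up. It is written so it also runs on a modern Python, where the implicit ascii decode that Python 2 performs has to be spelled out by hand.

```python
# Sketch of the failure mode; not MySQLdb's actual code.
# The sample bytes are an assumption chosen for illustration:
# "cafe" with an accented e, encoded as UTF-8 (0xc3 0xa9).
raw = b'caf\xc3\xa9'

# In Python 2, raw.encode('utf8') on a byte string behaves like
# raw.decode('ascii').encode('utf8'), so the decode step fails
# on the first non-ASCII byte.
try:
    raw.decode('ascii').encode('utf8')
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding with the real encoding first yields a unicode object,
# which then encodes back to UTF-8 without complaint.
text = raw.decode('utf8')
roundtrip = text.encode('utf8')

print(error)
print(roundtrip == raw)
```

The error text printed by the first step has the same shape as the one in the traceback above, just with a different byte position.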

Like so (the first line's the relevant one):

mydata = mydata.decode('utf8')
query = ('INSERT INTO %(database)s (%(column)s) VALUES (%(value)s)' % {'database': database, 'column': column, 'value': mydata})

Of course, you should modify the 'utf8' part to whatever encoding your data is in.
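If some of your values may already be unicode, a guarded decode avoids decoding twice. The helper below is only a sketch: to_unicode is a hypothetical name of my own, not part of MySQLdb, and it assumes Python 2.6 or later (on Python 2.4 you would check against str instead of bytes).

```python
def to_unicode(value, encoding='utf8'):
    """Hypothetical helper: decode byte strings, pass unicode through untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Sample values are assumptions for illustration.
mydata = to_unicode(b'caf\xc3\xa9')          # UTF-8 bytes get decoded
already = to_unicode(mydata)                 # unicode is returned as-is
latin = to_unicode(b'caf\xe9', 'latin-1')    # other encodings: pass them explicitly
```

All three calls end up with the same unicode value, which is what you want to hand to the query.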

Edit: If you're using MySQLdb.escape_string(), make sure you run it before doing the decode, like so:

mydata = MySQLdb.escape_string(mydata).decode('utf8')
