First dump initial, non-utf database:
mysqldump -uUSER -pPASSWORD --default-character-set=latin1 --skip-set-charset dbname > dumpfile.sql
then (linux) change all latin1 in dumpfile.sql to utf8
sed 's/latin1/utf8/g' dumpfile.sql > dumpfile_collate_utf8.sql
Optionally gzip and send to localpc
gzip -d dumpfile_collate_utf8.sql.gz
Recreate same database with utf8 collation and import dumpfile_collate_utf8:
mysql -uUSER -pPASSWORD --execute="DROP DATABASE dbname;
CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"
mysql -uUSER dbname < dumpfile_collate_utf8.sql
Now the next part, get on Rails
I've set RAILS_GEM_VERSION = '1.1.6' to be sure we are in non unicode.
1. Create controller with:
require 'iconv'
require 'singleton'
class ConverterController < ApplicationController
end
Inside create two actions:
def convertme
rows = Model.find(:all)
for row in rows
row.title = iconvert(row.title, "latin1", "utf8")
# replace latin1 with appropriate encoding
row.save!
end
end
def iconvert(str, encoding_from, encoding_to = "utf8")
i = Iconv.new encoding_to, encoding_from
utf_str = ""
begin
utf_str << i.iconv(str)
rescue Exception => e
utf_str << e.success
ch, str = e.failed.split(//, 2)
utf_str << "?"
logger.info "FAIL !! char:#{ch}"
retry
end
return utf_str
end
Point browser/console to /converter !
Thats IT !
What doing iconvert: its taking string, trying to convert it to utf-8, if conversion fails (on some japanese/korean) it put "?" instead. Thats because some non-utf encoding cannot be converted to utf-8(needs to be utf-16 or utf-32).
Also in case of large database conversion, i suggest to add column to database table, like "utf_converted" and do conditionally find(:all) and then row.utf_converted = true before save, just to be safe if something fail, to do not convert twice.