Migrating a wordpress blog – mysql charset problems and backup script


The, now previous, hosting company of my wife’s blog had a major data corruption and completely lost a years worth of database entries and files. There was no communication before we found the problem ourselves, so we were very unhappy and decided to reconstruct the host on another site.

Luckily I had set up WordPress to send a complete database dump weekly as tar.gz balls, so no database entries were lost. All uploaded images and such was permanently lost, but reconstructing this is much easier than reconstructing posts and comments.

Charset problems moving the site to another webhotel

After creating a backup of the files left on the old host I made a local copy to my computer and another copy to the new webhotel. After the DNS changes had gone through and had imported the database dump on the new hosts the only thing left was to edit wp-config.php with the new database settings… or so I thought. It turned out that the all the tables of the database were in charset latin1_swedish_ci, but some of the posts contained utf8 characters as well. The result was the all Danish letters and many special characters in english looked garbled on the blog.

After searching the web for hours through variations over simple search and replace, which I did not find feasible, I finally found the holy grail – the ‘replace’ command as part of the mysql (now mariadb) project. The following command corrected all entries in the sql file from the mix of different charsets to a consistent utf8 output that rendered beautifully on the website:

replace "CHARSET=latin1" "CHARSET=utf8" "SET NAMES latin1" "SET NAMES utf8" < database.sql > database_uft8.sql

To get the replace command on my Fedora 20 desktop I had to install the “mariadb-server” package (this could have been mysqldb-server on older installations before mysql was forked to mariadb):

tjansson@kelvin:~$ sudo yum install mariadb-server

Weekly full database dumps

Since the site is hosted on a webhotel I have no shell access and I am only allowed FTP and web interface access to the site. Luckily I can create full database dumps from within wordpress’s interface. In the administration menu under Tools->Backup a weekly dump can be mailed to an address. Since the dumps are compress text they are very small (few MB).

Weekly full FTP backup from web hotel

Beside the dump of the database this incident showed me that I cannot trust web hotels at all and I needed a completely FTP dump as well. For now I have created a complete dump of the website and a tar ball from this on a weekly basis. On a long perspective I might do the same using rsnapshot or incremental backups in tar, but for now the following is my solution. I have a shell script that updates a “current” local copy and creates a tar ball of this on a weekly basis. As only new files are transferred the whole script is done in a few minutes (excluding the initial transfer of course).

lftp -e "mirror --parallel=4 --use-cache --verbose=0 --continue --delete / $backuppath/current; exit" -u username,password ftpsite.com
tar czf $backuppath/ftpsite_files_$(date --iso).tar.gz $backuppath/current

I run the this script at 3:00 in the morning on the first day of every week in cron (crontab -e):

00 3 * * 1  /home/tjansson/bin/website-backup-ftpsite.sh


stackoverflow.com – why-does-mysql-use-latin1-swedish-ci-as-the-default

This entry was posted in Articles, Computer. Bookmark the permalink.

Leave a Reply