Data portability and data serialization in WordPress
Over at illuminea, we’ve been migrating WordPress sites for years. So we thought we’d seen it all when it came to WordPress migration. And yet, recently we encountered an issue that we had never dealt with before, that puts a snag in the ease with which you can move WordPress sites: data serialization.
One of the reasons we love WordPress is because it’s so future-proof. What I mean by that is that once you have created a WordPress site, you aren’t stuck where you are for the following reasons:
- Content is separate from form so at any point you can completely redesign your site while retaining content and link structure. This is important to ensure you don’t lose traffic from search engines and inbound links.
- As long as the WordPress software continues to be developed, it will continue to support modern web technologies so users are not left with an archaic system that, for example, adores IE7 but doesn’t know what HTML5 is.
- Your site is portable. You need to move a site to another server, or even another domain name? No problem. A few steps related to find-and-replace, exporting and importing the database, and setting up global 301 redirects (if you’ve changed domains) and you’re there.
All the above is true, except the data portability point. It’s not quite as true as I thought it was.
Data serialization and its impact on data portability
(I am a big newbie when it comes to data serialization, so the following is based on what I’ve managed to learn over the while and could very well be inaccurate/wrong.)
It seems that developers use data serialization in order to improve performance or efficiency in a database. Instead of creating a new table for data, or storing data in a table which could end up having wasteful empty fields, you can keep all the data in an array in one field. This leads to saved space and simplified SQL statements. Serialized data is stored and recovered using PHP’s serialize() and unserialize() functions. Terri Swallow explains how using serialized data for her multi-user plugin helped save her from creating hundreds of new options for every new user created.
Here’s an example of serialized data from Andrew Nacin:
// Result is a string we can unserialize into an array:
But there are serious drawbacks to using serialized data. One problem is that you can’t run a MySQL query on serialized data. The data would have to be pulled out of the table via PHP, unserialized, and then searched using PHP instead of MySQL (thanks to Catch My Fame for explaining this). So let’s say you are storing user data in serialized format, but you want to find out how many of your users are from Australia, you can’t get that info easily.
A serious problem, in my opinion, is the impact serialization has on data portability. It makes it near-impossible to port a WordPress site from one domain to another without losing data. Another possible, yet unlikely issue, is if you need to port data from WordPress into another programming language – admittedly a rare occurrence, but a possible one that needs to be considered.
There are developers out there who are very frustrated by the use of serialized data:
An anonymous comment left on the PHP manual entry for serialize():
Please! please! please! DO NOT serialize data and place it into your database. Serialize can be used that way, but that’s missing the point of a relational database and the datatypes inherent in your database engine. Doing this makes data in your database non-portable, difficult to read, and can complicate queries. If you want your application to be portable to other languages, like let’s say you find that you want to use Java for some portion of your app that it makes sense to use Java in, serialization will become a pain in the buttocks. You should always be able to query and modify data in the database without using a third party intermediary tool to manipulate data to be inserted…That’s not to say serialize() is useless. It’s not…Just don’t abuse serialize because the next guy who comes along will have a maintenance or migration nightmare.
Alexander Scammon: “WordPress does a silly thing. It has serialized data inside its database.”
Serialized data usage that can have a serious impact on WordPress portability:
- Some widget data (though for some reason we’ve never had issues retaining widgets via a database dump)
- Some theme settings
- Plugin data
It was related to serialized plugin data that we had our first encounter with data serialization. We had installed a plugin to help a client manage a technical issue on their site without having to get too technical. They entered hundreds of entries in the plugin, but upon moving the site from development to production, the data disappeared. By that point, there was no way to recover it unfortunately, but we had definitely learned a painful, important lesson.
So what do we do?
- Theme and plugin developers: please seriously consider whether the benefits of serializing your data outweigh the potential future problems it could cause.
- Theme and plugin developers: if you must use serialized data, please try to make sure there’s some way to export and import the data. Even a simple .csv export is better than nothing.
- WordPress users: try to ascertain before using a theme or plugin whether the data is stored in a serialized format.
But let’s say you find yourself with a database that includes serialized data. Here are some tools created by generous souls that might be able to help you port your data without losing it:
Peach – an online utility that aims to deal with the fact that WordPress stores URLs in serialized format, which makes it hard to move a site from one domain to another, or from staging to production. To use this tool (which works best in Chrome) ou drag-and-drop an .sql file into the square in the middle of the page, and set a new domain.
WordPress Search and Replace Tool – this script can also handle multiply nested serializations and multi-byte Unicode changes, which is important now that internationalized domain names are allowed. The code will work for any platform that stores PHP serialized arrays in a MySQL database (Drupal, Joomla, etc.). You must delete the script from your server once you’re finished with it.
WP Migrate DB – this is a plugin that exports your database as a MySQL data dump, does a find and replace on URLs and file paths, then allows you to save it to your computer. It even takes into account serialized data and updates the string length values. (thanks Matt Wiebe)
After trying these tools, we wrote an update on data serialization.