elegant themes

Data portability and data serialization in WordPress

| April 10, 2012 | 26 Comments

Over at illuminea, we’ve been migrating WordPress sites for years. So we thought we’d seen it all when it came to WordPress migration. And yet, recently we encountered an issue that we had never dealt with before, that puts a snag in the ease with which you can move WordPress sites: data serialization.

One of the reasons we love WordPress is because it’s so future-proof. What I mean by that is that once you have created a WordPress site, you aren’t stuck where you are for the following reasons:

  • Content is separate from form so at any point you can completely redesign your site while retaining content and link structure. This is important to ensure you don’t lose traffic from search engines and inbound links.
  • As long as the WordPress software continues to be developed, it will continue to support modern web technologies so users are not left with an archaic system that, for example, adores IE7 but doesn’t know what HTML5 is.
  • Your site is portable. You need to move a site to another server, or even another domain name? No problem. A few steps related to find-and-replace, exporting and importing the database, and setting up global 301 redirects (if you’ve changed domains) and you’re there.

All the above is true, except the data portability point. It’s not quite as true as I thought it was.

Data serialization and its impact on data portability

(I am a big newbie when it comes to data serialization, so the following is based on what I’ve managed to learn over the while and could very well be inaccurate/wrong.)

It seems that developers use data serialization in order to improve performance or efficiency in a database. Instead of creating a new table for data, or storing data in a table which could end up having wasteful empty fields, you can keep all the data in an array in one field. This leads to saved space and simplified SQL statements. Serialized data is stored and recovered using PHP’s serialize() and unserialize() functions. Terri Swallow explains how using serialized data for her multi-user plugin helped save her from creating hundreds of new options for every new user created.

Here’s an example of serialized data from Andrew Nacin:

$data = array( 'apple', 'banana', 'orange' );
echo serialize( $data );
// Result is a string we can unserialize into an array:
// a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}

But there are serious drawbacks to using serialized data. One problem is that you can’t run a MySQL query on serialized data. The data would have to be pulled out of the table via PHP, unserialized, and then searched using PHP instead of MySQL (thanks to Catch My Fame for explaining this). So let’s say you are storing user data in serialized format, but you want to find out how many of your users are from Australia, you can’t get that info easily.

A serious problem, in my opinion, is the impact serialization has on data portability. It makes it near-impossible to port a WordPress site from one domain to another without losing data. Another possible, yet unlikely issue, is if you need to port data from WordPress into another programming language – admittedly a rare occurrence, but a possible one that needs to be considered.

There are developers out there who are very frustrated by the use of serialized data:

An anonymous comment left on the PHP manual entry for serialize():

Please! please! please! DO NOT serialize data and place it into your database. Serialize can be used that way, but that’s missing the point of a relational database and the datatypes inherent in your database engine. Doing this makes data in your database non-portable, difficult to read, and can complicate queries. If you want your application to be portable to other languages, like let’s say you find that you want to use Java for some portion of your app that it makes sense to use Java in, serialization will become a pain in the buttocks. You should always be able to query and modify data in the database without using a third party intermediary tool to manipulate data to be inserted…That’s not to say serialize() is useless. It’s not…Just don’t abuse serialize because the next guy who comes along will have a maintenance or migration nightmare.

Alexander Scammon: “WordPress does a silly thing.  It has serialized data inside its database.”

Serialized data usage that can have a serious impact on WordPress portability:

  • Some widget data (though for some reason we’ve never had issues retaining widgets via a database dump)
  • Some theme settings
  • Plugin data

It was related to serialized plugin data that we had our first encounter with data serialization. We had installed a plugin to help a client manage a technical issue on their site without having to get too technical. They entered hundreds of entries in the plugin, but upon moving the site from development to production, the data disappeared. By that point, there was no way to recover it unfortunately, but we had definitely learned a painful, important lesson.

So what do we do?

Some suggestions:

  1. Theme and plugin developers: please seriously consider whether the benefits of serializing your data outweigh the potential future problems it could cause.
  2. Theme and plugin developers: if you must use serialized data, please try to make sure there’s some way to export and import the data. Even a simple .csv export is better than nothing.
  3. WordPress users: try to ascertain before using a theme or plugin whether the data is stored in a serialized format.

But let’s say you find yourself with a database that includes serialized data. Here are some tools created by generous souls that might be able to help you port your data without losing it:

Peach – an online utility that aims to deal with the fact that WordPress stores URLs in serialized format, which makes it hard to move a site from one domain to another, or from staging to production. To use this tool (which works best in Chrome) ou drag-and-drop an .sql file into the square in the middle of the page, and set a new domain.

WordPress Search and Replace Tool – this script can also handle multiply nested serializations and multi-byte Unicode changes, which is important now that internationalized domain names are allowed. The code will work for any platform that stores PHP serialized arrays in a MySQL database (Drupal, Joomla, etc.). You must delete the script from your server once you’re finished with it.

fix_serialization.php – Not much description provided with this tool, but Alex Scammon provides instructions in his comment below.

WP Migrate DB – this is a plugin that exports your database as a MySQL data dump, does a find and replace on URLs and file paths, then allows you to save it to your computer.  It even takes into account serialized data and updates the string length values. (thanks Matt Wiebe)

After trying these tools, we wrote an update on data serialization.

Tags: , , ,

Category: Tips

About Miriam Schwab: Miriam is the friendly CEO of illuminea, a WordPress design and development agency. Miriam is a huge fan of WordPress and has been using it for over five years now. In addition, Miriam and her team have been organizing the local Israeli WordCamp conferences for the past few years. View author profile.


Comments (26)

Trackback URL | Comments RSS Feed

  1. Alex Scammon says:

    How I use fix_serialization.php:

    1. copy the script to the machine that has the db you need to fix
    2. change three things:

    ‘db1′ to the actual database name
    username
    password

    3. type ‘php fix_serialization.php’ on the command line

    The only real reason why I need to run this is because migrating the site can change the site’s url length.  For example: beta.alexscammon.com -> http://www.alexscammon.com.  The length changes by one.  The wp_options table holds the site’s url in serialized data in a couple different places, which means that if you change the url the serialized data is no longer correct.  The script merely goes through that data and corrects the math in those couple of places and you’re back in business.

    Of course, if I know I’m going to migrate from beta -> www, I could always plan ahead and migrate from new.alexscammon.com -> http://www.alexscammon.com instead. The length of the site’s url would be the same.  The serialization wouldn’t know the difference and you’d avoid this whole thing.
     

    • Miriam Schwab says:

      Alex, thanks for the instructions on how to use fix_serialization! That’s a great idea about creating a beta site on new.site.com, though I guess that won’t work if the target URL will be no-www.

  2. Reuven says:

    From my experience, I use serialization when i store multiple values as post meta, and am to lazy to separate them, it also makes it easier to save more values in the future. That being said, it’s pure laziness and not the best idea

    • Post meta is one of the places where it really doesn’t make much sense to serialize. One of the quirks of the post meta functions (which is sometimes a feature), is that you can store multiple entries, but with the same key. Then you can get all the values back as an array with just one get_post_meta() call. This can sometimes be useful. :)

      The reason it’s done that way is that at the time I added the post meta functions to WP, the maybe_serialize() and maybe_unserialize() functions used by the options system weren’t in place yet.
       

  3. [...] has a reputation of being very portable but after reading a recent article on WPGarage.com, there is a certain condition in which the data in WordPress can become non-portable. It has to do [...]

  4. Steve Taylor says:

    The only problem with serialized data, for WP portability, is the issue of the domain (and for some plugins the installation server path) being stored in the database. Because of the serialization, you can’t do a simple search/replace on the SQL dump file. This is because a serialized array stores the length of each string element, and if the domain you’re moving to is a different length from the one you’re moving from, the serialized data will be corrupted because the stored string length will no longer correspond to the stored string.

    One interesting workaround for this for the common situation of moving from dev to staging to production is to fudge the dev and staging domains so they’re the same length as the production domain. Then you don’t have to worry about instances of the domain stored in serialized arrays. I don’t think WP core stores server paths, but you might need to check on that for your plugins – and that will be harder, if not impossible, to make the same length as on the live environment.

    Past that, I use InterconnectIT’s WordPress Search and Replace Tool – obviously back up before you do it, but it’s never failed for me.

    The issue you mention where someone entered loads of data for a plugin, which was serialized in the DB and then “disappeared” when you moved – I’m guessing the plugin stored URLs etc. and your plain search/replace corrupted the serialized data there. The data won’t have disappeared (unless you hit some unrelated and very scary issue!), and it could have been fixed manually. But that’s tedious – just use a scriptm :)

    • Chris Jean says:

      Steve mirrors the exact kind of comments that I was going to make. The issue isn’t serialized data, it is the type of data being serialized. If the data being serialized contains specific file paths, domains, and other installation/server-specific data, then it will have data portability issues. However, simply having serialized data doesn’t present a data portability issue as far as server/domain changes go.
      Serialization is a great thing to do and is how most solid WP devs store options. It is much preferable over creating dozens to thousands of options entries, thus cluttering up the options table with a huge number of entries.
      While the comment in the post about being unable to directly query serialized data is true, most options data doesn’t need to be queried. SEO plugins store title structures which don’t need to be used in conditional queries but do need to be read on page load. Themes typically store information such as color values, user-customized strings, and whether or not to show a header graphic; none of these need to be directly queried for, just known when needed.
      With most WordPress development, you have to assume the worst-case scenario. This means that you have to assume that the database is going to be incredibly slow. This is why most WordPress development guides recommend minimizing the number of queries that your code runs and to avoid creating tables unless absolutely necessary. So, if you have data that needs to be retrieved frequently and doesn’t need to be queried directly, storing it in a serialized data structure in the options table is the way to go.
      I find the language portability issue to be an extremely low priority. You are seriously advocating for dropping an extremely-important tool from the WordPress developer’s toolbag just because you can imagine someone at some point wanting to use another language to access the data without having to bother with a converter? Sorry, I won’t be signing up for that movement.
      So, rather than advocating for not using serialized data, advocate for proper use of serialized data:
      1. Don’t store installation or system specific data in serialized data. This includes but isn’t limited to domain names and file paths.
      2. Don’t store data that needs to be part of conditional queries in serialized options. If possible, use the postmeta or usermeta tables. If necessary, create a new table.

      • Miriam Schwab says:

        Chris, thanks for your useful explanation of when and how serialized data is the better option. Very interesting. I’m just wondering: are there not potentially any other scenarios, aside from URL-related data, where serializing data could lead to issues down the line? I can’t think of any, but it might be possible.

        As for the possibility that a site may be ported to another language – I just mentioned that as another issue that could pop up in the future. As I said, it’s very rare, but it’s there, and some developers like to prepare for those types of scenarios too. But for me personally, the most serious issue has to do with moving WP to WP. 

    • Excellent point about finding and replacing corrupting the serialization.  I wrote the article references a long, long time ago before I knew anything about JSON; which is now my preferred method of serialization of data.  The only advantage serialize() has is that it will serialize functions/methods and can have magic methods like  __sleep() and __wakeup() attached to it.  For storing simple data rather than complicated objects JSON is much more flexible, portable and plays nicely with PHP & JavaScript with no additional scripting effort!
      Using any serialization to store information in a single field in a MySQL database table does prevent the use of relationships and will make  query more complicated but if the serialized (or json_encode()’d) data is something you’re not going to query on then I still think it’s a perfectly viable *simple* storage solution.  Not everything needs to be over engineered, or well engineered for that matter; it all depends on the data and the use-case.

      • Miriam Schwab says:

        Thanks for your input Terri! Interesting to hear your perspective on using JSON over serialize(). Something to look into.

    • Miriam Schwab says:

      Steve, Alex also suggested making the beta site URL the same length as the live site, and I think that’s great advice! Thanks.

      As for the scenario described in the post – yes, the plugin was storing URL info, and yes, I found the info in the database in the wp_options table. I meant the data had disappeared from the plugin management area. But as you said – manually fixing the data would have been insane. I’ll definitely use a script next time :)

    • BTW, AFAIK, PHP is actually pretty resilient when it comes to changing the string lengths in serialized data. In actual usage, I think it will deserialize fine, even when the string length given doesn’t match the actual string. I have not tried a test yet, though.
       

  5. Steve Taylor says:

    Or just a script :-/

  6. Matt Wiebe says:

    Data serialization in WP options is considered a performance best practice, so it’s not going anywhere.

    Try using WP Migrate DB for handling your serialization woes. 

  7. This is one of my major gripes with wordpress as well.  Our deployment scripts use a php script to migrate the database but we shouldn’t have to do this.  It’s an oversight by the wordpress team and likely not something that will change any time soon, as the settings API utilizes functions that add serialized data to the database.  JSON would make it more portable.

    Based on blogs I’ve read for deploying wordpress sites, I’d imagine there are many sites in the wild with corrupted serialized PHP where users were taught to find/replace on their SQL.  Major issue.

    • Miriam Schwab says:

      Tyler, you’re for sure right that there are lots of sites out there with corrupted serialized data. I mean, for years we thought find-and-replace was the way to go, and now we’ve learned it so isn’t. Oh well, you live and learn.

  8. In most cases it is better to serialize data when saving to posts’ meta, and as @Matt Wieb mention, it considered best practice.
    In WP, you don’t really have to serialize an array, since WP handles this by her own – when you pass an (associative) array to update_post_meta() function, WP will auto serialize the array and save it to the db (since you can’t save a php array as it is in the database w/o serializng it first).
     
    But you gain performance, since you can with one db call to retrieve all the data from the database of a certain metadata, this is much better than saving and retrieving a lot of new rows each time.
     
    It is part of WP Meta API.
     
    It is the same with update_option(), can’t remember now.
     
     

  9. Amaury says:

    For move/rename a WordPress network and manage seralized datas, i share a tools on github :
    https://raw.github.com/herewithme/wordpress-cli-tools/master/move-wordpress-ms.php 

    Try it ;) 

  10. In most cases, the auto serialization/deserialization of options doesn’t matter much. If a plugin (or theme) has a good reason to need to store lots of data and select subsets of it, they should use separate table entries (with an identifiable prefix on the keys), or store the data in a different table.

    Calling serialized data in the database a “serious problem” is kind of overstating things. If the problem is data portability when changing domain names, that’s just not a use-case worth optimizing for, because 99.9% of sites will not change their domain. If the problem is efficiency of selecting just certain portions of the data, it’s up to the developer to know when to switch from serialized values to separate values.
     

  11. ???? says:

    great aritcle. I am a wordpress user for many years ,and I found that it’s impossible for a wordpresss blog to miagrate to another blog system without any data loose.

  12. It seems that data saved in your blog database was not in UNICDOE collection .My nick name just became ???? on your blog.
    I’ve repost this article on my blog and put a link to your post.I hope this would be permitted.

  13. Ryan Hellyer says:

    The need to serialize data in WordPress is too ingrained to be stopped. Avoiding serialization now would be very problematic IMHO.

    I don’t think the negatives are very great either. I can’t even think of any situations in which data portability would become a problem. I assumed people would be more concerned about performance and inability to query across specific bits of data.

  14. [...] my new best friend Rebecca Markowitz | May 3, 2012 | 0 Comments Recently, Miriam wrote about Data Serialization and what a pain in the tuchus(tush/bottom) it can be to migrate a site that has seralized links in [...]

  15. Cory says:

    Hi Miriam,

    Thanks for the detailed write-up on this topic!

    I authored a plugin called WordPress Duplicator (http://wordpress.org/extend/plugins/duplicator/) and many times I have to educate users on the complexities of PHP serialization and its usually painful…

    I stumbled across your article after looking for something to point user at as I was getting tired of trying to explain the subject. Kudos as this is exactly what I was looking for…

    Trying to explain to users why a simple move is more complex than just moving files and a database is difficult when this topic enters the conversation. I’m just going to point them to this article :)

    Thxs much!!

Leave a Reply




If you want a picture to show with your comment, go get a Gravatar.

More in Tips (12 of 93 articles)