Migrating from Blogger to WordPress

published at 4:08am on 08/10/06

As a birthday present to our dear friend Dawn, I helped her move her Blogspot blog over to her very own WordPress installation on her very own domain (hosted by the friendly folks over at FictCo, who happen to include me). For the most part, this was a fairly easy operation, thanks to two fantastic resources I found that cover just such a migration, as well as the Blogger importing tool that is built into the latest version of WordPress.

There is one little trick to the WordPress importer, however: apparently Blogger posts don’t have titles, so the importer set the title of each post to its Blogger post ID, which just looked silly. So I hacked up a little script that, given a tab-delimited text file of post IDs and post bodies, grabs the first line of each post and, with some magic, spits out a line of SQL that updates that post with a new title (and the appropriate post slug).
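As a rough illustration of what "the appropriate post slug" means here, the sketch below lowercases a title, turns runs of whitespace into hyphens, and drops everything that is not a letter, digit, or hyphen. The function name slug_for is made up for this example, and this is not WordPress's own slug sanitizer, just the simple version this post relies on.

```perl
use strict;
use warnings;

# Hypothetical helper: build a URL slug from a post title.
sub slug_for {
    my ($title) = @_;
    my $slug = lc $title;          # lowercase everything
    $slug =~ s/^\s+|\s+$//g;       # trim leading and trailing whitespace
    $slug =~ s/\s+/-/g;            # each run of whitespace becomes one hyphen
    $slug =~ s/[^-a-z0-9]//g;      # drop everything else (punctuation, accents)
    return $slug;
}

print slug_for("Hello, World!"), "\n";   # hello-world
```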

Some rules that it uses to build the title from the first line of the post:
– Lines that are in all caps are assumed to be titles already, so keep them in their entirety
– Other lines should be limited to 9 words (an arbitrary number that seemed right)
– If there is sentence-ending punctuation (a period, exclamation point, or question mark) in that first line, everything up to and including that punctuation is the title
– There need to be at least 5 characters before the first punctuation mark (to avoid things like a “D.C.” at the start of the line getting truncated to “D.”)
– Strip out all HTML from the first line
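The rules above can be sketched as a single function. This is only an illustration under a few assumptions: the name title_from_first_line is hypothetical, "all caps" is taken to mean "contains no lowercase letters", and the sample sentence is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a title from the first line of a post.
sub title_from_first_line {
    my ($line) = @_;
    $line =~ s/<.*?>//g;              # strip HTML tags
    $line =~ s/^\s+|\s+$//g;          # trim surrounding whitespace

    # A line with no lowercase letters counts as "all caps": keep it whole.
    return $line unless $line =~ /[a-z]/;

    # Cut after the first run of . ! or ? that has at least
    # five characters in front of it.
    $line =~ s/^(.{5,}?[.!?]+).*/$1/;

    # Keep at most nine words.
    my @words = split ' ', $line;
    splice @words, 9 if @words > 9;
    return join ' ', @words;
}

print title_from_first_line("D.C. is humid in August. Bring water."), "\n";
# D.C. is humid in August.
```

Note that the five-character rule only protects an abbreviation at the very start of the line; one that appears later would still end the title early.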

Anyway, this code is really hacky, and I take no responsibility for it, but I thought someone else out there might be able to use it.

First, the code to get the posts out of the database (the -N flag leaves the column-header row out of the output):

% mysql -u DB_USER -pDB_PASSWORD -N -e 'SELECT ID, post_content FROM wp_posts' DB_NAME > posts.txt
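It helps to know what ends up in posts.txt. When its output is redirected, the mysql client writes one row per line with a tab between columns, and a newline inside a value is written as a backslash followed by the letter n. That is why the script hunts for a literal \n rather than a real line break. Here is a small sketch of pulling one row apart; the function name id_and_first_line and the sample row are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of posts.txt into ID and first line.
sub id_and_first_line {
    my ($row) = @_;
    chomp $row;
    my ($id, $body) = split /\t/, $row, 2;   # a real tab separates the columns
    my ($first) = split /\\n/, $body, 2;     # a literal backslash-n ends line one
    return ($id, $first);
}

# Made-up sample row: ID 42, then a two-line post body.
my $row = "42\tFirst line of the post\\nSecond line\n";
my ($id, $first) = id_and_first_line($row);
print "$id: $first\n";   # 42: First line of the post
```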

Next, the script itself, saved as “build_post_titles.pl”:

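#!/usr/local/bin/perl -w

use strict;

# posts.txt holds one post per line: ID, a tab, then the post body,
# with newlines inside the body written as a literal \n
open(my $posts, '<', 'posts.txt') or die("Could not open posts.txt: $!");

while (<$posts>) {
        chomp;
        my ($id, $body) = split /\t/, $_, 2;

        # Skip the column-header row and anything else without a numeric ID
        next unless defined $body && $id =~ /^\d+$/;

        # Strip out all HTML
        $body =~ s/<.*?>//g;

        # Take everything up to the first \n; if that is empty or there
        # is no \n at all, fall back to the whole body
        my ($firstline) = $body =~ /^(.*?)\\n/;
        $firstline = $body unless defined $firstline && length $firstline;

        $firstline =~ s/\\n/ /g;
        $firstline =~ s/^\s+//;
        $firstline =~ s/\s+$//;

        # Lines with no lowercase letters are treated as ready-made titles
        if ($firstline =~ /[a-z]/) {
                # Cut after the first . ! or ? that follows at least 5 characters
                $firstline =~ s/^(.{5,}?)([.!?]+)(.*)/$1$2/;

                # Keep at most 9 words
                my @words = split ' ', $firstline;
                my $last = $#words > 8 ? 8 : $#words;
                $firstline = join ' ', @words[0 .. $last];
        }

        # Double up single quotes for SQL
        $firstline =~ s/'/''/g;

        # Slug: lowercase, hyphens for spaces, nothing but a-z, 0-9 and hyphens
        my $slug = lc $firstline;
        $slug =~ s/\s+/-/g;
        $slug =~ s/[^-a-z0-9]//g;

        if ($firstline ne '') {
                printf("UPDATE wp_posts SET post_title='%s', post_name='%s' WHERE ID=%d;\n", $firstline, $slug, $id);
        }
}

close($posts);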

Next, running the script:

% perl build_post_titles.pl > update_posts.sql
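Before pointing the result at a live database, it is worth a quick look through update_posts.sql. As one possible sanity check, the sketch below flags any line that does not have the shape the script is meant to produce. The function name looks_like_update and the file name check_updates.pl are made up for this example, and the pattern is a rough guess at "well-formed", not a real SQL parser.

```perl
use strict;
use warnings;

# Hypothetical check: does a line have the shape the script is meant to emit?
# Inside the quoted title, a single quote must appear doubled ('').
sub looks_like_update {
    my ($line) = @_;
    return $line =~ /^UPDATE wp_posts SET post_title='(?:[^']|'')+', post_name='[-a-z0-9]*' WHERE ID=\d+;$/ ? 1 : 0;
}

# Report suspicious lines from any file named on the command line.
if (@ARGV) {
    while (my $line = <>) {
        chomp $line;
        print "line $.: $line\n" unless looks_like_update($line);
    }
}
```

Saved as check_updates.pl, it would be run as "perl check_updates.pl update_posts.sql"; no output means every line passed.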

Finally, running the code against the database:

% mysql -u DB_USER -pDB_PASSWORD DB_NAME < update_posts.sql

And that's that. Hope this is somewhat useful.

And for anyone else looking for musings on life and such, more of that is coming soon.

Filed under: Technology
