Migrating from Blogger to Wordpress,

published at 4:08am on 08/10/06, with No Comments

As a birthday present to our dear friend Dawn, I helped her move her Blogspot blog over to her very own WordPress installation on her very own domain (hosted by the friendly folks over at FictCo, who happen to include me). For the most part, this was a fairly easy operation, especially with the assistance of the two fantastic resources that I found out there that cover just such a migration, as well as the Blogger importing tool that is built in to the latest version of WordPress.

There is one little trick to the WordPress importer, however: apparently Blogger posts don’t have titles, so the title of each imported post was set to the Blogger post ID, which just looked silly. So I hacked up this little script that, given a tab-delimited text file consisting of a post ID and the body of each post, grabs the first line of the post and, with some magic, spits out a line of SQL that updates that post with the new title (and the appropriate post slug).

Some rules that it uses to build the title from the first line of the post:
- Lines that are in all caps are assumed to be titles already, so keep them in their entirety
- Other lines should be limited to 9 words (an arbitrary number that seemed right)
- If there is punctuation in that first line, everything up to and including that punctuation is the title
- There need to be at least 5 characters before the first punctuation mark (to avoid things like a “D.C.” in the first line getting truncated to “D.”)
- Strip out all HTML from the first line
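
As a quick way to try these rules on a single line, here is a rough shell sketch. It is not the script from this post: the function name make_title is made up for illustration, and it only approximates the rules with sed and cut.

```shell
# Hypothetical helper: apply the title rules above to one line of text.
make_title() {
    # Strip HTML tags, then trim leading and trailing whitespace.
    line=$(printf '%s\n' "$1" |
        sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    case $line in
        *[[:lower:]]*)
            # Not all caps: cut after the first punctuation run that has at
            # least 5 characters before it, then keep at most 9 words.
            line=$(printf '%s\n' "$line" |
                sed -e 's/^\(.\{5\}[^.!?]*[.!?]\{1,\}\).*/\1/' |
                cut -d ' ' -f 1-9)
            ;;
    esac
    printf '%s\n' "$line"
}

make_title '<b>D.C. was great today.</b> More soon'
make_title 'BIG NEWS. READ ON'
```

The first call keeps everything up to the period after "today" because the periods in "D.C." fall inside the first 5 characters, and the second call is left whole because it has no lowercase letters.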

Anyway, this code is really hacky, and I take no responsibility for it, but I thought someone else out there might be able to use it.

First, the code to get the posts out of the database:

% mysql -u DB_USER -pDB_PASSWORD -e 'SELECT ID, post_content FROM wp_posts' DB_NAME > posts.txt

Next, the script itself, saved as “build_post_titles.pl”

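#!/usr/local/bin/perl -w

use strict;

open(my $posts, '<', 'posts.txt') or die("Could not open posts: $!");

while (<$posts>) {
        chomp;
        my ($id, $body) = split /\t/;

        # skip the column-header row and anything without a numeric ID
        next unless defined $body && $id =~ /^\d+$/;

        # strip out all HTML
        $body =~ s/<.*?>//g;

        # mysql writes a newline inside a field as a literal backslash-n,
        # so the first line of the post ends at the first \n escape
        my $firstline = ($body =~ /^(.*?)\\n/ && length $1) ? $1 : $body;

        $firstline =~ s/\\n//g;

        # a line with lowercase letters is not already a title, so trim it
        if ($firstline =~ /[a-z]/) {
                # cut after the first punctuation that follows at least 5 characters
                $firstline =~ s/^(.{5,}?)([.!?]+)(.*)/$1$2/;

                # keep at most 9 words
                my @words = split / /, $firstline;
                my $count = $#words > 8 ? 8 : $#words;
                $firstline = join ' ', @words[0 .. $count];
        }

        # double up single quotes for SQL, then trim whitespace
        $firstline =~ s/'/''/g;
        $firstline =~ s/^\s+//;
        $firstline =~ s/\s+$//;

        # the slug is the title in lowercase, hyphenated, with everything else removed
        my $slug = $firstline;
        $slug =~ tr/A-Z/a-z/;
        $slug =~ s/\s+/-/g;
        $slug =~ s/[^-a-z0-9]//g;

        if ($firstline !~ /^\s*$/) {
                printf("UPDATE wp_posts SET post_title='%s', post_name='%s' WHERE ID=%d;\n", $firstline, $slug, $id);
        }
}

close($posts);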

Next, running the script:

% perl build_post_titles.pl > update_posts.sql
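
Since the script is hacky, it may be worth eyeballing the generated file before loading it. The following is only a sketch: the function name find_bad_sql_lines is hypothetical, and the pattern assumes every line has exactly the shape of the UPDATE statement the script prints.

```shell
# Hypothetical helper: print (with line numbers) every line of the given file
# that is not a single UPDATE statement of the expected shape.
# No output means nothing unexpected was found.
find_bad_sql_lines() {
    grep -n -v -E "^UPDATE wp_posts SET post_title='.*', post_name='[-a-z0-9]*' WHERE ID=[0-9]+;\$" "$1" || true
}
```

Running find_bad_sql_lines update_posts.sql should print nothing if every line looks like one of the expected statements.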

Finally, running the code against the database:

% mysql -u DB_USER -pDB_PASSWORD DB_NAME < update_posts.sql

And that’s that. Hope this is somewhat useful.

And for anyone else looking for musings, on life and such, more of that coming soon.

Filed under: Technology, with No Comments

How I Fixed My Raid-1 Partition Size Error,

published at 12:07am on 07/23/05, with 8 Comments

How did it start?

The first indication that there was something wrong with the server came on July 10, 2005 in the form of error messages that were reported to me by the command that I have running hourly to mail me system anomalies.

	Jul 10 04:16:11 loco kernel: attempt to access beyond end of device
	Jul 10 04:16:11 loco kernel: 09:03: rw=0, want=56050716, limit=56050688

Every hour, at around the same time, these errors started cropping up. I looked through all the crontabs and found one command scheduled for that time: a bounced mail queue processor that I run for one of my projects. Turning off the process stopped the errors from coming up, and I thought that perhaps we just had a couple of corrupted files. The next morning, the errors started cropping up again, one or two at a time.
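
The two numbers in the kernel message give a sense of how small the overshoot was. This is a back-of-the-envelope sketch that assumes want and limit are counted in 1 KiB blocks, an assumption that fits the 53 gig /home partition mentioned later.

```shell
# Values copied from the kernel message; units assumed to be 1 KiB blocks.
want=56050716
limit=56050688

overshoot_kib=$((want - limit))        # how far past the end the read went
limit_gib=$((limit / 1024 / 1024))     # device size, rounded down to whole GiB

echo "overshoot: ${overshoot_kib} KiB past the end of a ~${limit_gib} GiB device"
```

Under that assumption, the failing access reached only 28 KiB past the end of the device.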

Realizing that this could be a sign that the drives were eating themselves, I decided to head to the data center for a bit of one-on-one time with the server.

So what did we do?

The first thing I did was drop the system into single-user mode. We’re running ext3 filesystems on software RAID-1 across two 73 GB SCSI drives. I decided that I would try e2fsck on the partition that was giving me problems, but I kept running into the following error:

	The filesystem size (according to the superblock) is xxx
	The physical size of the device is xxx
	Either the superblock or the partition table is likely to be corrupt!

Ok, so that’s a bit puzzling. I spent a while longer on it and found absolutely nothing in Google that gave any indication of what might have been going on, until I came across the following gem in an article about converting a running system into a RAID-1 system:

Step-11 - resize filesystem
When we created the raid device, the physical partion became slightly smaller because a second superblock is stored at the end of the partition. If you reboot the system now, the reboot will fail with an error indicating the superblock is corrupt.

http://howtos.linux.com/howtos/Software-RAID-HOWTO-7.shtml#ss7.6
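
To put a number on "slightly smaller": the sketch below assumes the older 0.90 md superblock layout, where the usable size is the partition rounded down to a 64 KiB boundary minus a further 64 KiB reserved for the superblock. The function name md_usable_kib and the partition size are hypothetical and only chosen for illustration.

```shell
# Hypothetical helper: usable size in KiB of a RAID member, assuming the
# 0.90 superblock layout (round down to 64 KiB, then reserve 64 KiB).
md_usable_kib() {
    echo $(( ($1 & ~63) - 64 ))
}

# A made-up partition of 56050800 KiB would leave this much for the filesystem:
md_usable_kib 56050800
```

A filesystem created on the raw partition before the RAID was set up would be larger than this figure, which matches the mismatch e2fsck complained about.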

Eureka!

It appears that when we originally set up the RAID, we never resized the partitions. For the past year or so, the system has been running along without any problems because it just never wrote to that part of the disk. A couple of files must have made it out to this portion of the disk where the RAID superblock is stored, and the RAID system wouldn’t let it write and was throwing the errors that I saw. However, resizing the partitions without repairing them first will throw the following error:

	attempt to read block from filesystem resulted in short read while trying to resize

Obviously there was a problem with the drive that needed to be addressed.

Fixing the problem

The solution was actually quite straightforward, once I got all the steps in place. There were two time-consuming parts to this process. First, I had to figure out what was wrong. And second, I needed to wait for the drive to be repaired. In the process of trying to write out beyond the RAID partition, some inconsistencies were introduced to the drive. e2fsck was the way to fix this. The solution is as follows:

	1. Unmount all partitions
	2. Repair the partitions
	3. Resize the partitions

Unmounting the partitions in single-user mode is a matter of running:

	umount -a

I’m not really sure how this works, but it doesn’t matter what services are running or what happens to be in use - it just unmounts everything for you.

Once the partitions were unmounted, it was a simple matter of telling e2fsck to check for bad blocks when run on the offending partition. man e2fsck tells us the following:

       -c     This  option  causes  e2fsck to run the badblocks(8) program to
              find any blocks which are bad on the filesystem, and then marks
              them  as  bad  by  adding them to the bad block inode.  If this
              option is specified twice, then the bad block scan will be done
              using a non-destructive read-write test.

By running e2fsck -cc /dev/md3 we were able to do the repairs non-destructively. However, as expected, on our 53 gig /home partition, this badblocks scan took about 7 hours to run. The good news is that in that time, it did find errors, it did seem to fix them, and running e2fsck following that run seemed to return no other errors.

After the partitions were repaired, I ran resize2fs following the instructions in the above article. I first ran e2fsck again (but not in badblocks mode), just to make sure everything was clean, then I resized the partition and then I ran e2fsck again.

	e2fsck -f /dev/md3
	resize2fs /dev/md3
	e2fsck -f /dev/md3

This worked like a charm, and I did not get the “short read” error from earlier. I was not able to unmount the root partition, however, since the running system needed access to it, and I was not able to mount it read-only as was suggested in the article.
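
The check, resize, check sequence above could be wrapped so that the resize only happens after a clean check. This is a sketch rather than what was run on the server: the function names are made up, and it relies on the exit codes documented in the e2fsck man page (0 for no errors, 1 or 2 for errors corrected, 4 and up for problems).

```shell
# Hypothetical helper: succeed if an e2fsck exit code means the filesystem
# is clean or was repaired (0, 1 or 2).
fsck_ok() {
    [ "$1" -lt 4 ]
}

# Hypothetical wrapper: check, resize, then check again, stopping at the
# first step that fails. The device must be unmounted first.
check_resize_check() {
    dev=$1
    e2fsck -f "$dev" || fsck_ok $? || return 1
    resize2fs "$dev" || return 1
    e2fsck -f "$dev" || fsck_ok $? || return 1
}
```

With the partitions unmounted, it would be called as check_resize_check /dev/md3.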

How to resize the root partition

Resizing the root partition turned out to be less of a pain than I might have expected, though it was by no means obvious when first thinking about the problem. The solution would be as follows:

	1. Copy all files from the root partition to another, empty partition (/tmp worked nicely)
	2. Reboot the server passing in the new, fake root partition to the boot loader
	3. Unmount all partitions (including the real root partition, which is not running)
	4. Repair and resize as above

Fortunately, /tmp had its own partition. I deleted the contents out of /tmp (which should be temporary anyway) and copied all of the files out of the root partition into this new, temporary root. Remember that you can copy /dev files, but should avoid /proc. The idea here is to copy all of the files out of /, excluding anything that is mounted from another partition. [Looking at the man page again, after the fact, -x would probably be exactly what’s needed here. -jcn]

	1. cp -ax / /tmp
	(can't actually remember the cp command, but this should work)
	2. Edit /tmp/etc/fstab to not mount the partition that /tmp resides on

Once that is done, it is simply a matter of rebooting. At the LILO prompt, tell the existing kernel to use the new partition (which is normally /tmp) as the root partition.

	LILO: kernel root=/dev/sd5 single

Once booted, I ran umount -a and proceeded as above.

Done!

This seems to have worked. resize2fs is, in fact, non-destructive and now when I run e2fsck, it just runs - it does not give me the error about a mismatched physical vs. filesystem partition.

Followup

Did this document help you? If so, I’d love it if you would let me know, and tell me if there is anything I left out or that was confusing. Thanks!

Filed under: Technology, with 8 Comments