Thoughts On ...

June 22, 2005

Fixed Quotes/Code blocks

I use MovableType as my blog software. It's been pretty good to me, although comment spam gets a bit annoying sometimes (MTBlacklist keeps that manageable, too).

But things can always be better. Over the life of this site, I have gradually tweaked the layout, moving steadily away from table-based layouts and other approaches that mix form into content. My goal has been to get to the point where the entire layout and formatting of the site is determined by Cascading Style Sheets (CSS).

I took another major step in that direction today. First, I cleaned up some redundant and otherwise confusing CSS selectors. Then I finally tackled my biggest remaining formatting problem: line breaks in preformatted text.

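MT gives you a handy shortcut, "Convert Line Breaks", that converts the newlines in the raw entry text into HTML: each chunk of text separated by blank lines is wrapped in paragraph (P) tags, and the single newlines within a chunk are replaced with BR tags. Chunks that already start with certain HTML tags, such as PRE or DIV, are left alone.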

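Usually it works OK, but it causes problems when I want to show something (say, a code snippet) that is already preformatted inside HTML tags. The tag check only looks at the start of each chunk, so if a PRE block contains a blank line, or isn't separated from the preceding paragraph by one, part of it gets wrapped in P tags and its newlines get BR tags. Inside PRE the browser already renders every newline, so those extra BR tags throw off the formatting.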
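To make the problem concrete, here is a small sketch that runs MT's original html_text_transform (copied from the version quoted in full further down this post) on a hypothetical entry whose PRE block contains a blank line. The entry text is made up for illustration.

```perl
use strict;
use warnings;

# MT's original html_text_transform, as quoted later in this post.
sub html_text_transform {
    my $str = shift;
    $str ||= '';
    # Chunks are separated by blank lines.
    my @paras = split /\r?\n\r?\n/, $str;
    for my $p (@paras) {
        # Only chunks that do not START with a listed tag get converted.
        if ($p !~ m/^<(?:table|ol|ul|pre|select|form|blockquote|div|q)/) {
            $p =~ s!\r?\n!<br />\n!g;
            $p = "<p>$p</p>";
        }
    }
    join "\n\n", @paras;
}

# Hypothetical entry: a short paragraph, then a PRE block with a blank line inside it.
my $entry = join "\n",
    'Some text',
    'more text',
    '',
    '<pre>',
    'my $x = 1;',
    '',
    'print $x;',
    '</pre>';

print html_text_transform($entry), "\n";
```

The blank line inside the PRE block splits it into two chunks. Only the first one starts with <pre, so the second ends up as "<p>print $x;<br />" followed by "</pre></p>": a P start tag and a BR inside the preformatted text.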

I am not alone in having this problem, of course, but although googling turned up several hacks and workarounds, none of them did what I wanted: "don't mess with newlines anywhere inside HTML tags that I explicitly place in my entries."

So, I did it myself. The subroutine in question is html_text_transform, found in MT's Util.pm library module (the code is Perl). I added a new variable, $in_tag, that is set to true whenever a line begins with one of the start tags listed in the regular expression, and reset to false whenever a line begins with one of their end tags. I also removed the insertion of BR tags and got rid of an extra newline in the output: lines are now joined with a single newline instead of two.

Here's the original code (rendered with my new formatting):

sub html_text_transform {
    my $str = shift;
    $str ||= '';
    my @paras = split /\r?\n\r?\n/, $str;
    for my $p (@paras) {
        if ($p !~ m/^<(?:table|ol|ul|pre|select|form|blockquote|div|q)/) {
            $p =~ s!\r?\n!<br />\n!g;
            $p = "<p>$p</p>";
        }
    }
    join "\n\n", @paras;
}

And here is the new version of the function:

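sub html_text_transform {
    my $str = shift;
    $str ||= '';
    # Split on every newline, not just on blank lines, so no BR tags are needed.
    my @paras = split /\r?\n/, $str;
    # True while we are inside one of the elements listed in the regex below.
    my $in_tag = 0;
    for my $p (@paras) {
        # Entering an element: only detected when its start tag begins the line.
        $in_tag = $in_tag || $p =~ m/^<(?:table|ol|ul|pre|select|form|blockquote|div|q)/i;
        # Wrap non-empty lines outside those elements in P tags; leave empty lines alone.
        $p = "<p>$p</p>" unless $in_tag || $p eq '';
        # Leaving the element: its end tag must also begin the line.
        $in_tag = 0 if $p =~ m/^<\/(?:table|ol|ul|pre|select|form|blockquote|div|q)/i;
    }
    join "\n", @paras;
}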

It's been a long time since I used Perl for anything, but I think I got this right: it seems to work on the small sample that is this front page. By splitting on every newline instead of only on blank lines, I was able to get rid of the BR tags entirely.
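The effect of changing the split pattern is easy to see on a tiny, made-up piece of text. This sketch only compares the two split calls; it is not part of the MT code.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n\nline three";

# Original MT split: only blank lines separate chunks, so the single newline
# between "line one" and "line two" stays inside a chunk and needs a BR tag.
my @by_blank_line = split /\r?\n\r?\n/, $text;
# ("line one\nline two", "line three")

# New split: every newline separates pieces, so no newlines remain to convert.
# The blank line survives as an empty string, which the new code leaves alone.
my @by_line = split /\r?\n/, $text;
# ("line one", "line two", "", "line three")

printf "%d chunks vs %d lines\n", scalar @by_blank_line, scalar @by_line;
```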

Update: I had an early draft of this code up here, but it didn't work. I have since gone back to the original and written some tests for it using Test::More. The text below has been updated accordingly.
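The actual test file isn't included in this post. As a minimal sketch of what Test::More tests for the new function could look like, the file below copies the new html_text_transform from above and checks a few hypothetical inputs with is() and done_testing(); the inputs and test names are illustrative assumptions.

```perl
use strict;
use warnings;
use Test::More;

# The new html_text_transform from this post, copied so the file runs on its own.
sub html_text_transform {
    my $str = shift;
    $str ||= '';
    my @paras = split /\r?\n/, $str;
    my $in_tag = 0;
    for my $p (@paras) {
        $in_tag = $in_tag || $p =~ m/^<(?:table|ol|ul|pre|select|form|blockquote|div|q)/i;
        if ($p =~ m/^$/) {
            # do nothing with empty lines
        } else {
            $p = "<p>$p</p>" unless $in_tag;
        }
        $in_tag = 0 if $p =~ m/^<\/(?:table|ol|ul|pre|select|form|blockquote|div|q)/i;
    }
    join "\n", @paras;
}

is(html_text_transform("Hello\nWorld"),
   "<p>Hello</p>\n<p>World</p>",
   'plain lines become paragraphs, with no BR tags');

is(html_text_transform("a\r\nb"),
   "<p>a</p>\n<p>b</p>",
   'CRLF line endings are handled');

is(html_text_transform("Intro\n\n<pre>\nmy \$x = 1;\n\nprint \$x;\n</pre>\nAfter"),
   "<p>Intro</p>\n\n<pre>\nmy \$x = 1;\n\nprint \$x;\n</pre>\n<p>After</p>",
   'lines inside PRE, including blank ones, are left alone');

is(html_text_transform(undef), '', 'undef input gives an empty string');

done_testing();
```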

The only caveats:

1) There must be a newline between the preceding paragraph and the start of the code block. I use PRE, but it could be any tag listed in the regular expression (e.g. BLOCKQUOTE or DIV). (BTW, you still have to use HTML escapes such as &lt; and &gt; when you want to show code that includes < or > characters, as I found when inserting the code above!)

2) The enclosing HTML element's start tag and end tag must each begin their own line (I haven't tried removing that requirement yet).
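The end-tag half of caveat 2 matters more than it might seem: with the version above, a one-line element such as <blockquote>Quoted.</blockquote> sets $in_tag but never clears it, because the end tag does not begin a line, so every following line is left unwrapped. One possible way to relax that, sketched here as a hypothetical variant (html_text_transform_relaxed and $BLOCK are made-up names, not MT code), is to also accept an end tag that closes the line. Like the original, it does not track nesting.

```perl
use strict;
use warnings;

# Hypothetical variant, not the MT or author's code. Same tag list as the post.
my $BLOCK = qr/(?:table|ol|ul|pre|select|form|blockquote|div|q)/i;

sub html_text_transform_relaxed {
    my $str = shift;
    $str = '' unless defined $str;
    my @lines = split /\r?\n/, $str;
    my $in_tag = 0;
    for my $line (@lines) {
        # Enter an element when its start tag begins the line (unchanged).
        $in_tag ||= $line =~ m/^<$BLOCK/;
        # Wrap ordinary, non-empty lines in P tags.
        $line = "<p>$line</p>" unless $in_tag || $line eq '';
        # Leave the element when one of the listed end tags ENDS the line,
        # which also covers an end tag alone on its line.
        $in_tag = 0 if $line =~ m/<\/$BLOCK\s*>\s*$/;
    }
    return join "\n", @lines;
}

print html_text_transform_relaxed("<blockquote>Quoted.</blockquote>\nAfter the quote"), "\n";
```

Requiring the end tag at the end of the line, rather than anywhere, keeps text that follows a closing tag on the same line from being mistaken for the end of the block, at the cost of still treating "</pre> trailing text" as part of the element.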

If anyone sees any issues, feel free to speak up.

Posted by wcaputo at June 22, 2005 01:24 PM