Sunday, January 29, 2012

[off-topic] Perl on Windows: handling files with Unicode characters

Using Strawberry Perl.

Renaming a file, with the proper Win32::OLE initialization:

# init
use strict;
use warnings;
use utf8;
use Win32::OLE qw(in);
Win32::OLE->Option( CP => Win32::OLE::CP_UTF8 );

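# the code
# ($old_name and $new_name are assumed to be declared earlier
#  and to hold the file names as Perl character strings)
my $fso = Win32::OLE->new("Scripting.FileSystemObject");
$fso->MoveFile( ".\\".$old_name , ".\\".$new_name );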

Recursively scan directory tree:

# the code
sub scan1 {
    # $f: folder path; $l: array ref passed down the recursion (not used yet)
    my ($f, $l) = @_;
    my $obj = Win32::OLE->new('Scripting.FileSystemObject');

    my $folder = $obj->GetFolder( $f );
    die "ERROR: $f" unless $folder;
    # list the files of this folder
    foreach my $file (in $folder->Files) {
        print $file->{Name}, ", size ", $file->{Size}, " bytes\n";
    }

    # descend into the subfolders
    my $collection = $folder->{SubFolders};
    foreach my $value (in $collection) {
        my $foldername = $value->{Name};
        scan1( "$f\\$foldername", $l );
    }
}

my @l;
scan1( 'c:\\Movies', \@l );

Scripting.FileSystemObject documentation.

Tuesday, January 17, 2012

A use case for \%#

Some time ago I came across, by chance, a quite obscure feature of VIM's regular expressions: \%#. :h /\%# says "Matches with the cursor position."

At first I couldn't see any use for it.

But then, while editing some XML files by hand, I stumbled upon a problem: how to insert a tag break (close and open) while adding the indentation? Normal VIM macrii(*) can insert text, but one loses the cursor position as soon as movement commands are used. And to make a copy of the indentation, one needs to move the cursor to the beginning of the line.

Edit0. OK. It just dawned on me. One could first insert the XML tag break + new line. Then copy indentation. Uh. Need to sleep more and more often. But I already wrote the post so what the heck I'll just post it.

Then I recalled that regular expressions could match the current cursor position. And I had a hunch that \%# could be used for the purpose. The Enlightenment came only a few days later and took the following shape for the <p> tag:
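
A minimal sketch of such a substitution, reconstructed from the description that follows. It is an assumption about the shape of the command, not necessarily the exact original:

```vim
" Sketch only: reconstructed from the description, not the original command.
" Run with the cursor on the first character of the word that should
" start the new paragraph.
:s/^\(\s*\)\(.\{-}\)\s*\(\%#\k\+\)/\1\2<\/p>\r\1<p>\3/
```

With the cursor on "bar" in a line such as "    <p>foo bar baz</p>", this should leave "    <p>foo</p>" on the original line and "    <p>bar baz</p>" on a new line below it.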


The first () submatch is the spaces (= indentation) of the current line.

The second () submatch is everything up to the word under the cursor (matched non-greedily, as I had some fancy problems with the greedy match here).

The third () submatch is the actual word under the cursor, anchored by \%# to the current cursor position. (The redundant spaces before the word are trimmed between the second and third submatches.)

All of that is replaced as follows. Original line: \1\2, which is the line up to the word under the cursor, then the closing </p> tag and \r for the new line. New line: \1, which is the wanted indentation, then the opening <p> tag and finally the word (obviously followed by the rest of the original line).

P.S. \S (anything but space) in place of \k works pretty well too.

P.P.S. VIM should support some sort of nesting of regular expressions. While editing, lots of pieces could be reused, but the only reuse that regular expressions support is copy-paste. The Clue.

(*) Because for the particular task at hand, macros have irreversibly infested my vimrc. Just like virii. One press of a wrong button, and only the undo can sort out what the hell has just happened.

Sunday, January 08, 2012

[link] LOCALE settings and regexp classes

Spent some time editing a German text (actually an FB2 book) in VIM. Hell of a job, because as it turned out, I wasn't blind: VIM really doesn't support locale in the regular expressions.

A workaround is to use \k (the also suggested \i doesn't match ß). But that's only half a workaround, since \k matches upper- and lower-case letters alike, and case matters in German (e.g. nouns are capitalized).

Another workaround is to use Perl or Python integration for regular expressions, since both provide locale support in the regular expressions. But I haven't gone that far yet.

Edit1. Do NOT use Perl for the purpose. Locale/utf8 support is totally and utterly messed up.

P.S. Here are some of my FB2 editing helper functions.

" some Fiction Book functions

" merge adjacent paragraphs
function! ParaMany_ToSingle() range
        let lines = getline( a:firstline, a:lastline )
        call filter( lines, 'v:val !~ "^$"' )
        let xind = substitute( lines[0], '^\(\s*\).*', '\1', '' )
        for i in range( 0, len(lines)-1 )
                if i != 0
                        let lines[i] = substitute( lines[i], '^\s*<p>[ \t]*', '', '' )
"                       echo i." < ".lines[i] | sleep 2
                endif
                if i != len(lines)-1
                        let lines[i] = substitute( lines[i], '[ \t]*</p>\s*$', '', '' )
"                       echo i." > ".lines[i] | sleep 2
                endif
        endfor
"       echo "xxx:".join( lines, " " ) | sleep 2
        if a:lastline > a:firstline
                exec ':'.(a:firstline+1).",".a:lastline."d"
        endif
        let text = join( lines, " " )
        let text = substitute( text, '[ \t]\{2,}', ' ', 'g' )
        let text = substitute( text, '^\s\+', '', '' )
        call setline( a:firstline, xind.text )
endfunction

" gvim helper keyboard shortcuts (c-up/-down do not work in terminal)
function! Keyboard_ParaCUpDown()
        map <C-Up> :-1,.call ParaMany_ToSingle()<CR>
        map <C-Down> :.,+1call ParaMany_ToSingle()<CR><Up>
endfunction

"  insert section break, using the line's text as the section title
function! Para_ToSectionBreak()
        let t = getline(".")

        " capture the line indentation
        let xind = substitute( t, '^\(\s*\).*', '\1', '' )
        " etch a bit from section tag indentation
        let xindS = substitute( xind, '^\(\s*\)\s$', '\1', '' )

        " clean-up tags
        let t = substitute( t, '\v\</{0,1}[a-z]+[^>]*\>', '', 'g' )
        let t = substitute( t, '[ \t]\+$', '', '' )
        let t = substitute( t, '^[ \t]\+', '', '' )

        " generate id, ensure starts with letter or _
        let id = substitute( t, '[^a-zA-Z0-9_-]', '_', 'g' )
        let id = substitute( id, '^\([^a-zA-Z_]\)', '_\1', 'g' )

        let l = [xindS.'</section>',
\               '',
\               xindS.'<section>',
\               xind.'<title>',
\               xind.'<p id="'.id.'">'.t.'</p>',
\               xind.'</title>' ]
        call setline( ".", l[0] )
        call  append( ".", l[1:] )

        call cursor( line(".")+len(l), 0 )
endfunction

"  convert line's text to subtitle
function! Para_ToSubtitle()
        let t = getline(".")
        let xi = substitute( t, '^\(\s*\).*', '\1', '' )
        let t = substitute( t, '\v\</{0,1}[a-z]+[^>]*\>', '', 'g' )
        let t = substitute( t, '[ \t]\+$', '', '' )
        let t = substitute( t, '^[ \t]\+', '', '' )
        call setline( ".", xi.'<subtitle>'.t.'</subtitle>' )
endfunction
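
A short usage sketch for the helpers above, assuming they are already defined (for instance in vimrc). The line numbers are arbitrary and only for illustration:

```vim
" Assumes the functions above are already defined, e.g. in vimrc.
" enable the Ctrl-Up / Ctrl-Down shortcuts (gvim only)
call Keyboard_ParaCUpDown()
" merge lines 10 through 12 into one paragraph (line numbers are arbitrary)
10,12call ParaMany_ToSingle()
" turn the line under the cursor into a subtitle
call Para_ToSubtitle()
```

Since ParaMany_ToSingle() is declared with the range attribute, it is called once for the whole range rather than once per line, which is what the merge relies on.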