Monday, December 03, 2012

[off-topic] Perl or PCRE: sort strings with numbers

A little trick with regular expressions (if backtracking and backreferences are supported) for comparing two strings which might include numbers.

The trick is to join the strings with a NUL character (never occurring in human-readable strings anyway) and use it as an anchor to find the longest common prefix which in both strings is followed by a number. Then compare the numbers.

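#!/usr/bin/env perl
use strict;
use warnings;

# Comparison function for sort(): sort passes the two items in the
# package variables $a and $b, so they must not be redeclared with my.
sub cmp_str_with_numbers
{
        warn $a."<=>".$b;       # debug trace, goes to STDERR
        my $s = $a."\x00".$b;
        # (.*\D|) is the longest common prefix that does not end in a digit,
        # so that (\d+) captures whole numbers
        if ($s =~ m/^(.*\D|)(\d+).*?\x00\1(\d+)/) {
                if ($2 != $3) {
                        return $2 <=> $3;
                }
        }
        return $a cmp $b;
}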

my @test1 = (
        'Test 2 ccc',
        'Test 1 aaa 1',
        'Test 1 aaa 10',
        'Test 1 aaa 2',
        'Test 10 bbb',
);

my @out0 = sort @test1;
my @out1 = sort cmp_str_with_numbers @test1;
print "original:\n";
print "\t$_\n" for @test1;
print "normal sort:\n";
print "\t$_\n" for @out0;
print "number-aware sort:\n";
print "\t$_\n" for @out1;

Output:

original:
        Test 2 ccc
        Test 1 aaa 1
        Test 1 aaa 10
        Test 1 aaa 2
        Test 10 bbb
normal sort:
        Test 1 aaa 1
        Test 1 aaa 10
        Test 1 aaa 2
        Test 10 bbb
        Test 2 ccc
number-aware sort:
        Test 1 aaa 1
        Test 1 aaa 2
        Test 1 aaa 10
        Test 2 ccc
        Test 10 bbb

Saturday, December 01, 2012

Regex to match the word under cursor

The VIM-specific regex below matches the word under cursor. (Pasting unmodified as it is in my vimrc to also match German letters.)

/[a-zA-Z0-9ßÄÜÖäüö]*\%#[a-zA-Z0-9ßÄÜÖäüö]*

Documentation is under ':h /\%#'

Example usage: enclose the word under cursor in an 'em' tag. It works best when bound to a keyboard shortcut.

:s![a-zA-Z0-9ßÄÜÖäüö]*\%#[a-zA-Z0-9ßÄÜÖäüö]*!<em>\0</em>!

Negative side effect: when 'set hls' is in effect, a seemingly random word ends up highlighted.

Search for a misspelled word

Alternative 1:

/The\S\+les\(Themistokles\)\@<!

Alternative 2:

/\(Themistokles\)\@!\(\<The\S\+les\>\)

Both search for any word which starts with 'The' and ends with 'les', but is not 'Themistokles'.
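A quick way to convince oneself is to try both patterns on sample strings with the =~# operator. This is only a sketch: the variable names and the misspelled sample word 'Themistocles' are made up for the illustration.

```vim
" Both alternatives from above, stored as strings.
" =~# matches case-sensitively regardless of 'ignorecase'.
let alt1 = 'The\S\+les\(Themistokles\)\@<!'
let alt2 = '\(Themistokles\)\@!\(\<The\S\+les\>\)'

" The correct spelling is skipped: both lines print 0.
echo 'Themistokles' =~# alt1
echo 'Themistokles' =~# alt2

" A misspelling is found: both lines print 1.
echo 'Themistocles' =~# alt1
echo 'Themistocles' =~# alt2
```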

[link] Wrap a visual selection in an HTML tag

Wrap a visual selection in an HTML tag.

Pretty useful function. I have only slightly modified it to take the tag as a parameter and to insert the tags on the lines before/after the selection, and hooked it to keyboard shortcuts.

[ The '^M' below should be converted into a real ^M (typed as ^V^M). ]

" Wrap visual selection in an HTML tag.
vmap <C-q> <Esc>:call VisualHTMLTagWrap('cite')<CR>
vmap <C-T> <Esc>:call VisualHTMLTagWrap('title')<CR>
function! VisualHTMLTagWrap(tag)
 normal `>
 if &selection == 'exclusive'
  exe "normal i^M</".a:tag.">"
 else
  exe "normal a^M</".a:tag.">"
 endif
 normal `<
 exe "normal i<".a:tag.">^M"
 normal `>
 normal j
endfunction

Folding something semi-automatically, on demand

Simple functions to fold blocks in a file which have beginning and ending markers.

Since custom folding functions can cause VIM's performance to degrade, the trick is to switch the folding expression off again right after applying it. For that to work, I found that I have to call 'redraw' before disabling the 'foldmethod=expr'.

The snippet below folds all lines enclosed between '<binary' and '</binary>'.

function! FoldWhateverFunc(mstart,mend,ln)
 let t = getline(a:ln)
 if t =~ a:mstart
  return '>1'
 elseif t =~ a:mend
  return '<1'
 endif
 return '='
endfunction

function! FoldWhatever()
 set foldexpr=FoldWhateverFunc('<binary','</binary>',v:lnum)
 set foldmethod=expr
 redraw
 set foldmethod=manual
endfunction

Hint: one can replace the hardcoded 'binary' tag with a call to the 'input()' function. Though I prefer a non-interactive approach, something I can plug into the ':au'.
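A minimal sketch of that idea, assuming FoldWhateverFunc() from above is already defined. The wrapper name FoldTagBlocks and the *.fb2 file pattern are made up for the illustration, and the tag is assumed to contain no quote characters. Whether the redraw step behaves the same from every autocommand event is something to check on your own setup.

```vim
" Hypothetical wrapper: same steps as FoldWhatever(), but the tag name
" is passed in instead of being hardcoded.
function! FoldTagBlocks(tag)
 let &foldexpr = printf("FoldWhateverFunc('<%s','</%s>',v:lnum)", a:tag, a:tag)
 set foldmethod=expr
 redraw
 set foldmethod=manual
endfunction

" Interactive use: ask for the tag name.
"   :call FoldTagBlocks(input('Tag to fold: '))
" Non-interactive use, e.g. from an autocommand:
"   :au BufReadPost *.fb2 call FoldTagBlocks('binary')
```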

Edit1. BTW, ':h fold-expr' contains several useful one-line examples of folding expressions.

Thursday, November 01, 2012

Search case in/sensitive, 'ignorecase' regardless

I always found it a bit clumsy that whenever I wanted to switch between case sensitive and case insensitive search, I had to flip the 'ignorecase' option.

As it turned out, there is a much, much simpler way: an 'ignorecase' override, right in the search pattern itself. Here it is.

Case sensitive search (as if 'noignorecase'):

/\CPORT

Case insensitive search (as if 'ignorecase'):

/\cPORT

The first would find only "PORT" while the second would also find "port" and "Port". Easy peasy.

See more at :h /\c and :h /character-classes.
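The same overrides also work in pattern matching from scripts, which makes for a quick check. A small sketch (the sample strings are arbitrary); the results do not depend on the current 'ignorecase' setting because the pattern itself decides:

```vim
" \C forces case sensitive matching.
" Prints 0: "port" is not "PORT".
echo 'port' =~ '\CPORT'
" Prints 1.
echo 'PORT' =~ '\CPORT'

" \c forces case insensitive matching.
" Prints 1.
echo 'port' =~ '\cPORT'
```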


P.S. Blogspot seems to be too devastated by the storm Sandy bunch of monkeys Google's Web design team. (Now YouTube too.) A major redesign, but again without a single improvement to the substance of the blogging platform.

Monday, March 26, 2012

Mapping: disable the search history modification

A nice trick from :h histdel() to remove the last entry from search history:

:call histdel("search", -1)
:let @/ = histget("search", -1)


That can be used the following way (a mapping to insert HTML's paragraph break (</p>\n<p>) before the word under cursor):

map <silent> <F3> :s!^\(\s*\)\(.\{-}\)\s\+\(\S*\%#\S*\)!\1\2</p>\r\1<p>\3!<CR>:call histdel("search", -1)<CR>:let @/ = histget("search", -1)<CR>


With the histdel() and the reset of @/, the search pattern replaced by :s is reset back to what it was before and thus the n/N keys work as expected. (You will not believe it: I have spent many years in VIM without knowing about the N shortcut. Discovering it by accident was like enlightenment to me.)

OK, the mapping gets excessively long, but works as expected. One can't have it all.
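For what it's worth, newer VIM versions have the ':keeppatterns' command modifier, which runs a command without adding anything to the search history and without replacing the last search pattern. This is a different technique from the histdel() one above, shown only as a sketch: it assumes a VIM that knows ':keeppatterns' (see ':h :keeppatterns'), and it uses the non-greedy \{-} multi.

```vim
" Same paragraph-break mapping, but the :s pattern never reaches the
" search history or @/, so no clean-up afterwards is needed.
nnoremap <silent> <F3> :keeppatterns s!^\(\s*\)\(.\{-}\)\s\+\(\S*\%#\S*\)!\1\2</p>\r\1<p>\3!<CR>
```

The n/N keys then keep working on whatever was searched for last.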

Sunday, January 29, 2012

[off-topic] Perl on Windows: handling files with Unicode characters

Using Strawberry Perl.

The rename with the proper Win32 module initialization:


# init
use strict;
use warnings;
use utf8;
use Win32::OLE qw(in);
Win32::OLE->Option( CP => Win32::OLE::CP_UTF8 );

# the code ($old_name and $new_name are assumed to hold the file names)
my $fso = Win32::OLE->new("Scripting.FileSystemObject");
$fso->MoveFile( ".\\".$old_name , ".\\".$new_name );


Recursively scan directory tree:


# the code
sub scan1
{
        my ($f, $l) = @_;
        my $obj = Win32::OLE->new('Scripting.FileSystemObject');

        my $folder = $obj->GetFolder( $f );
        die "ERROR: $f" unless $folder;
        # list the files of this folder
        foreach my $file (in $folder->Files) {
                print $file->{Name}, ", size ", $file->{Size}, " bytes\n";
        }

        # recurse into the sub-folders
        my $collection = $folder->{SubFolders};
        foreach my $value (in $collection) {
                my $foldername = $value->{Name};
                scan1( "$f\\$foldername", $l );
        }
}
my @l;
scan1( 'c:\\Movies', \@l );


Scripting.FileSystemObject documentation.

Tuesday, January 17, 2012

A use case for \%#

Some time ago, by chance, I came across a quite obscure feature of VIM's regular expressions: \%# - :h /\%# says "Matches with the cursor position."

At first I couldn't see any use for it.

But then, while editing some XMLs by hand, I stumbled upon a problem: how to insert a tag break (close and open) while keeping the indentation? Normal VIM macrii(*) can insert, but one loses the cursor position as soon as movement commands are used. And to make a copy of the indentation, one needs to move the cursor to the beginning of the line.

Edit0. OK. It just dawned on me. One could first insert the XML tag break + new line. Then copy indentation. Uh. Need to sleep more and more often. But I already wrote the post so what the heck I'll just post it.

Then I recalled that the regular expressions could match the current cursor position. And I had a hunch that the \%# could be used for the purpose. The Enlightenment came only a few days later and took this shape for the <p> tag:

:s!^\(\s*\)\(.\{-}\)\s\+\(\k*\%#\k*\)!\1\2</p>\r\1<p>\3!

First () submatch are the spaces (= indentation) of the current line.

Second () submatch is everything up to the word under cursor (matched non-greedily as I had some fancy problems with greedy match here).

Third () submatch is the actual word under cursor, anchored by the \%# to the current cursor position. (The redundant spaces before the word are trimmed between the second and third submatches.)

All of that is replaced as follows. Original line: \1\2 is the line up to the word under cursor, followed by the closing </p> tag and \r for the new line. New line: \1 is the wanted indentation, followed by the opening <p> tag and finally the word (obviously followed by the rest of the original line).
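A worked example may help here. This is only a sketch on a scratch buffer: the sample line is made up, the cursor is placed from the script instead of by hand, and the pattern uses the non-greedy \{-} multi described above.

```vim
" Open a scratch window and put one indented paragraph into it.
new
call setline(1, '  <p>Hello world foo</p>')

" Column 12 is the 'w' of "world".
call cursor(1, 12)

" The substitution from above, applied to the current line.
s!^\(\s*\)\(.\{-}\)\s\+\(\k*\%#\k*\)!\1\2</p>\r\1<p>\3!

" The buffer now holds two lines with the same indentation:
"   <p>Hello</p>
"   <p>world foo</p>
```

Here \1 is the two leading spaces, \2 is '<p>Hello' and \3 is 'world'.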

P.S. \S (anything but space) in place of \k works pretty well too.

P.P.S. VIM should support some sort of nesting of regular expressions. During editing, lots of pieces could be reused, but the only kind of reuse regular expressions support is copy-paste. The Clue.


(*) Because for the particular task at hand, macros have infested irreversibly my vimrc. Just like virii. Any press of a wrong button, and only the undo can sort out what the hell has just happened.

Sunday, January 08, 2012

[link] LOCALE settings and regexp classes

Spent some time editing a German text (actually an FB2 book) in VIM. Hell of a job, because as it turned out, I wasn't blind: VIM really doesn't support locale in the regular expressions.

A workaround is to use \k (the also suggested \i doesn't match ß). But that's only half a workaround, since \k does not distinguish case, and case matters in German (e.g. nouns are capitalized).

Another workaround is to use Perl or Python integration for regular expressions, since both provide locale support in the regular expressions. But I haven't gone that far yet.

Edit1. Do NOT use Perl for the purpose. Locale/utf8 support is totally and utterly messed up.

P.S. Here are some of my FB2 editing helper functions.



" some Fiction Book functions

"
" merge adjacent paragraphs
"
function! ParaMany_ToSingle() range
        let lines = getline(a:firstline, a:lastline)
        call filter( lines, 'v:val !~ "^$"' )
        let xind = substitute( lines[0], '^\(\s*\).*', '\1', '' )
        for i in range( 0, len(lines)-1 )
                if i != 0
                        let lines[i] = substitute( lines[i], '^\s*<p>[ \t]*', '', '' )
"                       echo i." < ".lines[i] | sleep 2
                endif
                if i != len(lines)-1
                        let lines[i] = substitute( lines[i], '[ \t]*</p>\s*$', '', '' )
"                       echo i." > ".lines[i] | sleep 2
                endif
        endfor
"       echo "xxx:".join( lines, " " ) | sleep 2
        if a:lastline > a:firstline
                exec ':'.(a:firstline+1).",".a:lastline."d"
        endif
        let text = join( lines, " " )
        let text = substitute( text, '[ \t]\{2,}', ' ', 'g' )
        let text = substitute( text, '^\s\+', '', '' )
        call setline( a:firstline, xind.text )
endfunction

" gvim helper keyboard shortcuts (c-up/-down do not work in terminal)
function! Keyboard_ParaCUpDown()
        map <C-Up> :-1,.call ParaMany_ToSingle()<CR>
        map <C-Down> :.,+1call ParaMany_ToSingle()<CR><Up>
endfunction

"
"  insert section break, using the line's text as the section title
"
function! Para_ToSectionBreak()
        let t = getline(".")

        " capture the line indentation
        let xind = substitute( t, '^\(\s*\).*', '\1', '' )
        " etch a bit from section tag indentation
        let xindS = substitute( xind, '^\(\s*\)\s$', '\1', '' )

        " clean-up tags
        let t = substitute( t, '\v\</{0,1}[a-z]+[^>]*\>', '', 'g' )
        let t = substitute( t, '[ \t]\+$', '', '' )
        let t = substitute( t, '^[ \t]\+', '', '' )

        " generate id, ensure starts with letter or _
        let id = substitute( t, '[^a-zA-Z0-9_-]', '_', 'g' )
        let id = substitute( id, '^\([^a-zA-Z_]\)', '_\1', 'g' )

        let l = [xindS.'</section>',
\               '',
\               xindS.'<section>',
\               xind.'<title>',
\               xind.'<p id="'.id.'">'.t.'</p>',
\               xind.'</title>' ]
        call setline( ".", l[0] )
        call  append( ".", l[1:] )

        call cursor( line(".")+len(l), 0 )
endfunction

"
"  convert line's text to subtitle
"
function! Para_ToSubtitle()
        let t = getline(".")
        let xi = substitute( t, '^\(\s*\).*', '\1', '' )
        let t = substitute( t, '\v\</{0,1}[a-z]+[^>]*\>', '', 'g' )
        let t = substitute( t, '[ \t]\+$', '', '' )
        let t = substitute( t, '^[ \t]\+', '', '' )
        call setline( ".", xi.'<subtitle>'.t.'</subtitle>' )
endfunction