Monday, December 03, 2012

[off-topic] Perl or PCRE: sort strings with numbers

A little trick with regular expressions (if backtracking is supported) on how to compare two strings which might include number.

The trick is to join the strings with NUL character (never occurring in human readable strings anyway) and use it as an anchor to find the longest common sub-string, in both strings followed by a number. And then compare the numbers.

#!/usr/bin/env perl
use strictuse warnings;

sub cmp_str_with_numbers
{
        #my ($a, $b) = @_;
        warn $a."<=>".$b;
        my $s = $a."\x00".$b;
        if ($s =~ m/^(.*)(\d+).*?\x00\1(\d+)/) {
                if ($2 != $3) {
                        return $2 <=> $3;
                }
        }
        return $a cmp $b;
}

my @test1 = (
        'Test 2 ccc',
        'Test 1 aaa 1',
        'Test 1 aaa 10',
        'Test 1 aaa 2',
        'Test 10 bbb',
);

my @out0 = sort @test1;
my @out1 = sort cmp_str_with_numbers @test1;
print "original:\n";
print "\t$_\n" for @test1;
print "normal sort:\n";
print "\t$_\n" for @out0;
print "number-aware sort:\n";
print "\t$_\n" for @out1;

Output:

original:
        Test 2 ccc
        Test 1 aaa 1
        Test 1 aaa 10
        Test 1 aaa 2
        Test 10 bbb
normal sort:
        Test 1 aaa 1
        Test 1 aaa 10
        Test 1 aaa 2
        Test 10 bbb
        Test 2 ccc
number-aware sort:
        Test 1 aaa 1
        Test 1 aaa 2
        Test 1 aaa 10
        Test 2 ccc
        Test 10 bbb

No comments: