Simplifying Regex OR Patterns
I was asked today if there was a library to take a list of strings and to compute the most efficient regex to match only those strings. I think it's an NP Complete problem by itsel
Solution 1:
Regexp::Assemble::Compressed
/ Regexp::Assemble
know far more tricks than PreSuf
. R::A
comes with the command-line tool assemble
(not installed by default) which makes building regexes even easier.
Solution 2:
The Regex::PreSuf
module is designed to do exactly this.
To quote the Synopsis:
use Regex::PreSuf;
my $re = presuf(qw(foobar fooxar foozap));
# $re should be now 'foo(?:zap|[bx]ar)'
Solution 3:
The Perl regex compiler builds a branching trie data structure out of patterns with parts in common across alternatives:
$ perl -Mre=debug -ce '"whatever" =~ /appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\.domain\.tld/'
Compiling REx "appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\."...
Final program:
1: EXACT <appserver> (5)
5: TRIEC-EXACT[123] (25)
<1.domain.tld>
<2.domain.tld>
<3.domain.tld>
25: END (0)
anchored "appserver" at 0 (checking anchored) minlen 21
-e syntax OK
Freeing REx: "appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\."...
Post a Comment for "Simplifying Regex OR Patterns"