Skip to content Skip to sidebar Skip to footer

Simplifying Regex OR Patterns

I was asked today if there was a library to take a list of strings and to compute the most efficient regex to match only those strings. I think it's an NP Complete problem by itsel

Solution 1:

Regexp::Assemble::Compressed / Regexp::Assemble know far more tricks than PreSuf. R::A comes with the command-line tool assemble (not installed by default) which makes building regexes even easier.


Solution 2:

The Regex::PreSuf module is designed to do exactly this.

To quote the Synopsis:

use Regex::PreSuf;

my $re = presuf(qw(foobar fooxar foozap));

# $re should be now 'foo(?:zap|[bx]ar)'

Solution 3:

The Perl regex compiler builds a branching trie data structure out of patterns with parts in common across alternatives:

 $ perl -Mre=debug -ce '"whatever" =~ /appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\.domain\.tld/'
Compiling REx "appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\."...
Final program:
   1: EXACT <appserver> (5)
   5: TRIEC-EXACT[123] (25)
      <1.domain.tld> 
      <2.domain.tld> 
      <3.domain.tld> 
  25: END (0)
anchored "appserver" at 0 (checking anchored) minlen 21 
-e syntax OK
Freeing REx: "appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\."...

Post a Comment for "Simplifying Regex OR Patterns"