
Hi,

I've implemented basic regexp functionality using a few simple template classes. Any regexp can be created by nesting the template classes inside one another in the required order. Although it's still at the design stage, I'd like to find out whether there's any interest in providing this functionality. Let me give a short introduction to the topic.

Some basic classes include:
- String Match (full string matching)
- Is-in Match (matches any character in the list)
- Or Match
- Quantity
and so on.

These are the "building blocks" of the regexp. Each class does very little on its own, but when they are combined in a specific order, you get a clearly defined string matching algorithm of any kind and any complexity. In addition, it's quite simple to add any missing functionality: you just define your desired algorithm and mix it with the others.

Here are a few examples that show how it's all done.

Regexp: ^[a-z]+ is implemented as:

    Quantity<
        Range< char_a, char_z >,
        1
    > re;
Regexp: ^0(x|X)[a-fA-F0-9]+ (matching a hexadecimal number)

The required strings are hardcoded using the macro STRHOLDER, which generates a simple struct containing the given string. This avoids having to push strings at runtime, which would split the matching logic into two levels.

    STRHOLDER( 0, "0" );
    STRHOLDER( xX, "xX" );

    MultiTie<
        StrMatch< StringHolder_0 >,
        IsIn< StringHolder_xX >,
        Quantity<
            OrMatch<
                DigitChar,
                OrMatch<
                    Range< char_a, char_f >,
                    Range< char_A, char_F >
                >
            >,
            1
        >
    > re;

(Sorry if you don't like the indentation, but it's just there to ease reading.)
MultiTie ties the match parts together into a consecutive sequence.

Some other examples I've tested include:
- an email address regexp
- quoted string matching (including escaped quote characters)

Currently the implementation consists of 21 template classes, which (I hope) are enough to implement almost any regular expression. Each "brick" is invoked through its operator()(), which allows it to be passed as an argument to the standard algorithms (std::replace_if, etc.). I've also implemented the ability to pass parameters to the regexp at runtime (but that's not the major feature).

Some rationale in its defense: string matching is a widely used part of programming. It's also often a problematic area, especially when self-written matching functions are used instead of a general solution. Once such code is written, it can be hard to change, because all you can see is HOW the match is done, not WHAT it does. Even with higher-level methods, the code still isn't self-descriptive. A template-defined regexp offers one solution to this issue: it hides the HOW behind the WHAT and takes advantage of compile-time checking. Just to mention it: performance isn't affected by any kind of interpretation, since everything is already prepared once the code is compiled (but I don't want to touch anybody's baby here!).

Please don't focus on:
- the proposed name "template-defined regexp"; I'm using it as a temporary name
- the naming of the classes
- implementation details (yes, they aren't here, because they aren't needed for now)
- the term "regexp" (I'm not sure this can really be called that...)

Your opinions are welcome!

Vit Stepanek