
Hi,

I've implemented basic regexp functionality using a few simple template classes. Any regexp can be created by nesting the template classes inside one another in the required order. Although it's still at the design stage, I'd like to find out whether there's any interest in providing this functionality.

Let me give you a short introduction to the topic. Some of the basic classes are:
- String Match (full string matching)
- Is-in Match (matches any character in the list)
- Or Match
- Quantity
and so on.

These are the "building blocks" of the regexp. Each class does very little, but when they are combined in a specific order, you get a clearly defined string-matching algorithm of any kind and any complexity. In addition, it's quite simple to add any missing functionality: you just define your desired algorithm and mix it with the others.

Here are a few examples that should show how it's all done.

Regexp: ^[a-z]+ is implemented as:

    Quantity<
        Range< char_a, char_z >,
        1
    > re;
(Sorry if you don't like the indentation; it's just there to make it easier to read.)

Regexp: ^0(x|X)[a-fA-F0-9]+ (matching a hexadecimal number)

The required strings are hard-coded using the macro "STRHOLDER", which generates a simple struct containing the given string (this avoids having to push strings in at runtime, which would split the matching logic into two levels).

    STRHOLDER( 0, "0" );
    STRHOLDER( xX, "xX" );

    MultiTie<
        StrMatch< StringHolder_0 >,
        IsIn< StringHolder_xX >,
        Quantity<
            OrMatch<
                DigitChar,
                OrMatch<
                    Range< char_a, char_f >,
                    Range< char_A, char_F >
                >
            >,
            1
        >
    > re;
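To make the two examples concrete, here is a minimal, self-contained sketch of how such building blocks could fit together. It is not Vit's implementation (which was not posted), and several details are assumptions: every matcher's operator() takes the input cursor by reference and advances it only on success; char_a and friends are std::integral_constant typedefs; STRHOLDER expands to a functor returning the literal; Quantity is greedy with no backtracking; and MultiTie is variadic via a C++17 fold expression. The helpers matches and full_match are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Assumption: character constants are types wrapping a char value.
typedef std::integral_constant<char, 'a'> char_a;
typedef std::integral_constant<char, 'z'> char_z;
typedef std::integral_constant<char, 'f'> char_f;
typedef std::integral_constant<char, 'A'> char_A;
typedef std::integral_constant<char, 'F'> char_F;

// Assumption: STRHOLDER(name, text) defines a functor returning the literal.
#define STRHOLDER(name, text) \
    struct StringHolder_##name { const char* operator()() const { return text; } }

// Convention for every class below (assumed, not from the post): operator()
// takes the cursor by reference and advances it only when the match succeeds.

template <typename LO, typename HI>
struct Range {  // one character in [LO, HI]
    bool operator()(const char*& cur) const {
        if (*cur >= LO::value && *cur <= HI::value) { ++cur; return true; }
        return false;
    }
};

struct DigitChar {  // one character in [0-9]
    bool operator()(const char*& cur) const {
        if (*cur >= '0' && *cur <= '9') { ++cur; return true; }
        return false;
    }
};

template <typename HOLDER>
struct StrMatch {  // the whole held string, as a prefix of the input
    bool operator()(const char*& cur) const {
        const char* text = HOLDER()();
        std::size_t n = std::strlen(text);
        if (std::strncmp(cur, text, n) != 0) return false;
        cur += n;
        return true;
    }
};

template <typename HOLDER>
struct IsIn {  // one character that appears in the held string
    bool operator()(const char*& cur) const {
        if (*cur != '\0' && std::strchr(HOLDER()(), *cur) != nullptr) { ++cur; return true; }
        return false;
    }
};

template <typename M1, typename M2>
struct OrMatch {  // the first alternative that matches
    bool operator()(const char*& cur) const { return M1()(cur) || M2()(cur); }
};

template <typename M, unsigned MIN>
struct Quantity {  // at least MIN repetitions, greedy, no backtracking
    bool operator()(const char*& cur) const {
        const char* start = cur;
        unsigned count = 0;
        for (;;) {
            const char* before = cur;
            if (!M()(cur) || cur == before) break;  // stop on failure or empty match
            ++count;
        }
        if (count >= MIN) return true;
        cur = start;
        return false;
    }
};

template <typename... Ms>
struct MultiTie {  // all parts in sequence (C++17 fold expression)
    bool operator()(const char*& cur) const {
        const char* start = cur;
        if ((Ms()(cur) && ...)) return true;  // each part resumes where the last stopped
        cur = start;                          // undo partial progress on failure
        return false;
    }
};

STRHOLDER(0, "0");
STRHOLDER(xX, "xX");

// ^[a-z]+
typedef Quantity<Range<char_a, char_z>, 1> LowerWord;

// ^0(x|X)[a-fA-F0-9]+
typedef MultiTie<
    StrMatch<StringHolder_0>,
    IsIn<StringHolder_xX>,
    Quantity<OrMatch<DigitChar, OrMatch<Range<char_a, char_f>, Range<char_A, char_F>>>, 1>>
    HexNumber;

// Anchored at the start, like '^': true if a prefix of s matches.
template <typename RE>
bool matches(const char* s) { return RE()(s); }

// Same, but the match must also consume the whole string.
template <typename RE>
bool full_match(const char* s) { return RE()(s) && *s == '\0'; }
```

With these definitions, matches<LowerWord>("abc1") is true (the prefix "abc" matches), while full_match<HexNumber>("0x1G") is false because the trailing 'G' is left unconsumed.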
The "MultiTie" ties the match parts into a consecutive sequence.

Some other examples I have tested include:
- an email address regexp
- quoted string matching (containing escaped quote characters)

Currently the implementation consists of 21 template classes, which are enough to implement (I hope) almost any regular expression.

Each "brick" is invoked through operator()(). That allows passing it as an argument to the standard algorithms (such as std::replace_if). I've also implemented the ability to pass parameters to the regexp at runtime (but that's not the major feature).

Some rationale in its defense:

String matching is a widely used part of coding. It is also often a problematic domain (especially with self-implemented matching functions that lack any general solution). Once such code is written, it can be hard to change, because all you can see is HOW the match is done, not WHAT it does. Even with higher-level methods, it still isn't self-descriptive. A template-defined regexp offers one solution to this issue by hiding the HOW behind the WHAT, and by taking advantage of compile-time checking. I'll only mention that performance is not affected by any kind of interpretation: everything is prepared at compile time (but I don't want to step on anybody's baby here!).

Please don't look at:
- the proposed name "template-defined regexp"; I use it as a temporary name
- the naming of the classes
- implementation details (yes, they aren't here, because they're not necessary for now)
- the term "regexp" (I'm not sure this can be called that...)

Your opinions are welcome!

Vit Stepanek
------------
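Vit notes that each brick is invoked through operator()() and can therefore be passed to standard algorithms such as std::replace_if. The sketch below illustrates that idea under its own assumptions: the matchers advance a cursor passed by reference, so a small hypothetical adapter, WholeMatch, turns one into the unary predicate that std::replace_if and std::count_if expect. DigitChar, Quantity, Number, mask_numbers and count_numbers are illustrative names, not classes from the post.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Minimal stand-ins written for this sketch: each matcher takes the cursor
// by reference and advances it only on success.
struct DigitChar {
    bool operator()(const char*& cur) const {
        if (*cur >= '0' && *cur <= '9') { ++cur; return true; }
        return false;
    }
};

template <typename M, unsigned MIN>
struct Quantity {  // at least MIN repetitions, greedy
    bool operator()(const char*& cur) const {
        const char* start = cur;
        unsigned count = 0;
        for (;;) {
            const char* before = cur;
            if (!M()(cur) || cur == before) break;  // stop on failure or empty match
            ++count;
        }
        if (count >= MIN) return true;
        cur = start;
        return false;
    }
};

// Hypothetical adapter: true only when RE consumes the entire string. It
// works on its own cursor copy, so the container elements are not touched.
template <typename RE>
struct WholeMatch {
    bool operator()(const std::string& s) const {
        const char* cur = s.c_str();
        return RE()(cur) && *cur == '\0';
    }
};

typedef Quantity<DigitChar, 1> Number;  // [0-9]+

// Replace every token that is a whole number with a placeholder.
void mask_numbers(std::vector<std::string>& tokens) {
    std::replace_if(tokens.begin(), tokens.end(), WholeMatch<Number>(), std::string("<num>"));
}

// Count the tokens that are whole numbers.
std::size_t count_numbers(const std::vector<std::string>& tokens) {
    return static_cast<std::size_t>(
        std::count_if(tokens.begin(), tokens.end(), WholeMatch<Number>()));
}
```

The adapter is a design choice of this sketch only; the thread does not show how the original classes present themselves to the standard algorithms.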

On 31/01/2011 00:30, Vit Stepanek wrote:
Hi,
I've implemented basic regexp functionality using a few simple template classes. Any regexp can be created by nesting the template classes inside one another in the required order. Although it's still at the design stage, I'd like to find out whether there's any interest in providing this functionality.
Let me give you a short introduction to the topic.
Some of the basic classes are:
- String Match (full string matching)
- Is-in Match (matches any character in the list)
- Or Match
- Quantity
and so on.
These are the "building blocks" of the regexp. Each class does very little, but when they are combined in a specific order, you get a clearly defined string-matching algorithm of any kind and any complexity.
In addition, it's quite simple to add any missing functionality: you just define your desired algorithm and mix it with the others.
How is that significantly different from what spirit or xpressive do?

On Mon, 2011-01-31 at 12:40 +0100, Mathias Gaunard wrote:
How is that significantly different from what spirit or xpressive do?
Well, how is Spirit or Xpressive different from what all the regexp interpreters do? I think the point is HOW it is done. As far as I can see, both existing libs do it a different way than I do. Both are very general-purpose, providing tons of functionality. What I have instead is a clear syntax and self-descriptive objects, which you can use without knowing the functionality of all the overloaded operators (which differs from implementation to implementation). Everything is in one object: once it's created, it's ready. Everything in one object declaration. Sometimes simplicity counts. But that's just my opinion.

Vit

On 31/01/2011 13:22, Vit Stepanek wrote:
Well, how is spirit or xpressive different from what all the regexp interpreters do?
They build their engine through composition of template objects. (Well, actually it's more powerful than that: they can even access the composition tree directly and modify it.)

On Mon, 2011-01-31 at 13:35 +0100, Mathias Gaunard wrote:
They build their engine through composition of template objects. (Well, actually it's more powerful than that: they can even access the composition tree directly and modify it.)
Yes, that's the HOW. But I'm not trying to compete with these libs. I actually wanted to discuss the idea, not to argue about whether it's better or whatever. So you probably meant "two's enough"?

On 31/01/2011 15:08, Vit Stepanek wrote:
Yes, that's the HOW. But I'm not trying to compete with these libs. I actually wanted to discuss the idea, not to argue about whether it's better or whatever. So you probably meant "two's enough"?
No, I meant "how is your approach any different?" You are also building an engine through composition of template objects.

On 1/30/2011 6:30 PM, Vit Stepanek wrote:
Hi,
I've implemented basic regexp functionality using a few simple template classes. Any regexp can be created by nesting the template classes inside one another in the required order. Although it's still at the design stage, I'd like to find out whether there's any interest in providing this functionality.
snipped...
I like your idea, but you need to provide more extensive documentation of what you are doing.

On Mon, 2011-01-31 at 09:27 -0500, Edward Diener wrote:
I like your idea but you need to provide more extensive documentation of what you are doing.
OK, I didn't want to overload you right from the start...

My basic idea was to avoid runtime interpretation (hence the template classes) and to avoid depending on any particular regexp syntax and rules. Therefore I made a class for each of the most commonly used regexp actions, instead of overloading operators or otherwise simulating the regexp syntax. The result is a function built at compile time, with a simple structure and simple usage... But that's just my view.

To be more detailed, let's look at some of the implementation. I admit some things can be improved, but that can be done at any time.

Every template class takes some template arguments (or none), depending on the type of action. The action is executed through operator(). Classes that contain other sub-matches call the operator() of the underlying classes. (Currently it's implemented to work with C strings, but I intend to make it work with any iterator.)

For example, the string matching class looks like this:

    template< typename T_STRHOLDER >
    struct StrMatch
    {
        template< typename T_CHAR >
        bool operator( ) ( const T_CHAR& str )
        {
            const T_CHAR s = str;
            const T_CHAR p = m_str( );
            // comparing here
            // (...)
            // return true/false, depending on the comparison result
        }

        T_STRHOLDER m_str;
    };

Any function object or function can be passed as the parameter; it is called during execution to obtain the value to compare against.

The "IsIn" matching class is similar: it compares one item against the set of given allowed values. The Or-match is nothing more than this:

    template< typename T_MATCH1, typename T_MATCH2 >
    struct OrMatch
    {
        template< typename T_CHAR >
        bool operator( ) ( const T_CHAR& str )
        {
            return m_match1( str ) || m_match2( str );
        }

        T_MATCH1 m_match1;
        T_MATCH2 m_match2;
    };

It looks quite simple, but combined with a few other classes, the matching stays clear and self-descriptive.
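The "(...)" in StrMatch leaves the comparison open. Below is one way it could be filled in, keeping the posted signatures but assuming T_CHAR is a C-string pointer or a string literal, and that StrMatch only reports whether the held string is a prefix of the input (the post does not say whether it also reports how much input was consumed). The local cursors are non-const copies so they can advance; StringHolder_0x, StringHolder_0X and HexPrefix are hypothetical names.

```cpp
// A possible completion of StrMatch, keeping the posted interface.
template <typename T_STRHOLDER>
struct StrMatch {
    // True if the held string is a prefix of str.
    template <typename T_CHAR>
    bool operator()(const T_CHAR& str) {
        auto s = str;      // cursor into the input (an array argument decays to a pointer)
        auto p = m_str();  // cursor into the expected string
        while (*p != '\0') {
            if (*s != *p) return false;  // also stops at the end of str
            ++s;
            ++p;
        }
        return true;
    }

    T_STRHOLDER m_str;
};

template <typename T_MATCH1, typename T_MATCH2>
struct OrMatch {  // as posted
    template <typename T_CHAR>
    bool operator()(const T_CHAR& str) {
        return m_match1(str) || m_match2(str);
    }

    T_MATCH1 m_match1;
    T_MATCH2 m_match2;
};

// Hypothetical holders, in the shape STRHOLDER is described as generating.
struct StringHolder_0x { const char* operator()() const { return "0x"; } };
struct StringHolder_0X { const char* operator()() const { return "0X"; } };

// Matches a "0x" or "0X" prefix.
typedef OrMatch<StrMatch<StringHolder_0x>, StrMatch<StringHolder_0X>> HexPrefix;
```

Usage follows the post's form, e.g. HexPrefix re; bool res = re("0x1F");. A bool-only result like this is enough for a single test, but a sequencing class such as MultiTie would additionally need to know where the previous part stopped.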
Basically there are two kinds of classes; let's call them:
- control classes (OrMatch, Quantity: those which control how the underlying comparison is done), and
- matching classes (which perform the comparison).

Some more advanced classes, such as LazyMatch, allow building more complex matching structures.

The execution is invoked using operator() on the regexp object:

    bool res = re( str );

Any questions / ideas?

To Mathias:
You are also building an engine through composition of template objects.
Well, yes. Just let me explain what I have; the differences may show (or not, we'll see). And my little tool is far from finished.
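The split between control classes and matching classes, together with the claim that missing functionality can be added by writing one more class, can be illustrated with the quoted-string example mentioned earlier in the thread. This is a sketch under assumptions: the cursor-by-reference convention, the variadic MultiTie, and the classes Char, AnyChar and Not are all hypothetical, and Not in particular is not a class from the post. Not is a new control class: it runs its sub-match on a copy of the cursor and succeeds, consuming nothing, exactly when that sub-match fails.

```cpp
// Matching classes (hypothetical): consume exactly one character on success.
template <char C>
struct Char {
    bool operator()(const char*& cur) const {
        if (*cur == C) { ++cur; return true; }
        return false;
    }
};

struct AnyChar {
    bool operator()(const char*& cur) const {
        if (*cur != '\0') { ++cur; return true; }
        return false;
    }
};

// Control classes: decide how sub-matches run; they never inspect characters.
template <typename M1, typename M2>
struct OrMatch {
    bool operator()(const char*& cur) const { return M1()(cur) || M2()(cur); }
};

template <typename... Ms>
struct MultiTie {  // all parts in sequence (C++17 fold expression)
    bool operator()(const char*& cur) const {
        const char* start = cur;
        if ((Ms()(cur) && ...)) return true;
        cur = start;  // undo partial progress on failure
        return false;
    }
};

template <typename M, unsigned MIN>
struct Quantity {  // at least MIN repetitions, greedy, no backtracking
    bool operator()(const char*& cur) const {
        const char* start = cur;
        unsigned count = 0;
        for (;;) {
            const char* before = cur;
            if (!M()(cur) || cur == before) break;
            ++count;
        }
        if (count >= MIN) return true;
        cur = start;
        return false;
    }
};

// A new control class added without touching the others.
template <typename M>
struct Not {
    bool operator()(const char*& cur) const {
        const char* probe = cur;  // M runs on a copy, so cur never moves
        return !M()(probe);
    }
};

// "(\\.|[^"\\])*" : a quoted string whose quotes may be escaped with '\'
typedef MultiTie<
    Char<'"'>,
    Quantity<OrMatch<
                 MultiTie<Char<'\\'>, AnyChar>,                          // escaped character
                 MultiTie<Not<OrMatch<Char<'"'>, Char<'\\'>>>, AnyChar>  // any other character
             >, 0>,
    Char<'"'>>
    QuotedString;

template <typename RE>
bool full_match(const char* s) { return RE()(s) && *s == '\0'; }
```

Because Not only delegates, it composes with any matching class; the greedy, non-backtracking Quantity is sufficient here only because the two alternatives can never both match at the same position.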
participants (3): Edward Diener, Mathias Gaunard, Vit Stepanek