"U.Mutlu"
why is this regex const string sRe = "((([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+[a-zA-Z0-9])\\.)+" "((([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+[a-zA-Z0-9]))";
not matching this string wholly? "a1a.a2a.a3a.a4aaaa"
It rather matches only this part: "a1a.a2a.a3a.a4"
What's the problem here? (I know I can append a delimiter to solve the problem, but I think the regex should've match it wholly, shouldn't it?)
I think you're getting caught by "first match, not longest match". Put differently, the regex engine isn't backtracking where you think it is. Looking at the last segment of your target string, the match goes like so: ((([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+[a-zA-Z0-9])) ^ Ok, "a" matches this charset. ((([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+[a-zA-Z0-9])) ^ But "4" doesn't match ([a-zA-Z]), so we call the quantifier done. ((([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+[a-zA-Z0-9])) ^ The "4" does match here, and that completes the match. As you said, you can put ^ and $ if you want to force the match to the entire string. I'm also curious if you're capturing those substrings for some purpose? If not, you can simplify it quite a bit (and probably make it faster, since regular parens require the RE engine to do a fair bit of extra work): ([a-zA-Z]|([a-zA-Z][a-zA-Z0-9\\-]))+ Becomes: [a-zA-Z][a-zA-Z0-9\\-]* Also, note that "-" should be treated as a regular character if it's at the beginning or end of a character range, so we can drop those backslashes: [a-zA-Z][a-zA-Z0-9-]* And I'd even consider using the preprocessor to make it a bit more readable: #define LETTER "a-zA-Z" #define DIGIT "0-9" #define DASH "-" #define ELEMENT \ "[" LETTER "]" "[" LETTER DIGIT DASH "]*" "[" LETTER DIGIT "]" const string re( "(?:" ELEMENT "\\." ")+" ELEMENT ); #undef ELEMENT #undef DASH #undef DIGIT #undef LETTER (I suppose that using actual C++ strings, with a compiler that can do string expression evaluation at compile time, would be just as good if not preferable...) This also makes clear the (possibly surprising) issue that you won't match single-character elements; I don't know if that's intentional or not. If you want to say "an element starts with a letter followed by any number of letters, digits, or dashes, but must end with a letter or digit", then you might prefer: #define ELEMENT \ "[" LETTER "]" "[" LETTER DIGIT DASH "]*" "(?