
I was just writing up a simple tutorial example: finding the subject in a set of email headers. Here's what I got:
    std::string line;
    boost::regex pat("^Subject: (Re: )?(.*)");
    boost::smatch matches;

    // Test getline itself so a failed read at EOF is not processed as a line.
    while (std::getline(std::cin, line)) {
        if (boost::regex_match(line, matches, pat))
            std::cout << matches[2];
    }
1. There's no way to search a stream for a match, because a regex requires bidirectional iterators, so I have to do this totally frustrating line-by-line search. I think Spirit has some kind of iterator that turns an input iterator into a forward iterator by holding a cache of the data starting with the earliest copy of the original iterator. Could something like that be added?
Yes, but it would be a more general iterator type rather than anything regex-specific. Incidentally, I also have a use for a "fileview" class which presents a file's contents as a pair of random access iterators. If you want me to provide these, though, you'll need to wait until I've finished the next round of regex internal changes / refactoring.
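Until an adaptor like that exists, one stopgap is to copy the stream into a string, whose iterators are bidirectional, and search the buffer in one pass. This is a minimal sketch, assuming the input fits in memory. It uses std::regex in place of Boost.Regex so that it has no external dependency, and `find_subjects` is a hypothetical helper name, not part of either library. It is not the caching iterator or the "fileview" class discussed above.

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: buffer the whole stream, then search the buffer.
std::vector<std::string> find_subjects(std::istream& in) {
    // Copy everything into a string; its iterators are bidirectional.
    const std::string text((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    // (?:^|\n) anchors at the start of any line without relying on
    // implementation-specific multiline handling of '^'.
    static const std::regex pat("(?:^|\\n)Subject: (?:Re: )?(.*)");

    std::vector<std::string> subjects;
    for (std::sregex_iterator it(text.begin(), text.end(), pat), end;
         it != end; ++it) {
        subjects.push_back((*it)[1].str());  // group 1: the subject text
    }
    return subjects;
}
```

Calling `find_subjects(std::cin)` replaces the line-by-line loop, at the cost of holding the whole input in memory.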
2. Seems to me that if match objects could be converted to bool, we might be able to:
    std::string line;
    boost::regex pat("^Subject: (Re: )?(.*)");

    while (std::getline(std::cin, line)) {
        if (boost::smatch m = boost::regex_match(line, pat))
            std::cout << m[2];
    }
which would be much smoother to the touch. Are match objects expensive to construct?
Currently, expensive-ish. Originally these were reference counted and cheap to copy, but I ran into problems with thread safety (it's not uncommon to obtain a match with one thread, then hand off a copy to another thread for processing). Now that we have a thread-safe shared_ptr, though, I need to revisit this; it just makes my head hurt trying to analyse concurrent code :-| One other thing: the regex_match overload that doesn't take a match_results parameter currently returns bool. The intent is that if the user doesn't need the info generated in the match_results, then some time can be saved by not storing it. Boost.Regex doesn't currently take advantage of that, but I was planning to in the next revision (basically you can cut out memory allocation altogether, and that's an order of magnitude saving).
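For comparison, here is a minimal sketch of the two call shapes, written against std::regex so it needs no external library (its regex_match overloads are assumed here to mirror the Boost.Regex ones under discussion). `is_subject` and `try_match` are hypothetical names. `try_match` gets the test-and-bind idiom by wrapping the result in std::optional (C++17) rather than by changing match_results itself.

```cpp
#include <optional>
#include <regex>
#include <string>

// Overload without match_results: only answers yes or no.
bool is_subject(const std::string& line) {
    static const std::regex pat("^Subject: (Re: )?(.*)");
    return std::regex_match(line, pat);
}

// Hypothetical wrapper: an engaged optional means the match succeeded.
// The returned smatch refers into `line`, so `line` must outlive it.
std::optional<std::smatch> try_match(const std::string& line,
                                     const std::regex& pat) {
    std::smatch m;
    if (std::regex_match(line, m, pat)) return m;
    return std::nullopt;
}

// Reject temporaries, whose sub-matches would dangle immediately.
std::optional<std::smatch> try_match(std::string&&, const std::regex&) = delete;
```

With this in place the loop body can be written as `if (auto m = try_match(line, pat)) std::cout << (*m)[2];`. The match object is still constructed on every call, so this sketch changes the syntax only, not the cost.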
2. Seems to me that if match objects could be converted to bool, we might be able to:
I can only second that. I am currently using my own regex library (some of my reasoning can be found in this c.l.c++.m thread: <http://tinyurl.com/2xnbd>), where I also allow implicit conversion to the iterator type, which allows code like:
    iterator it = regex::find(first, last, ptrn);
I already proposed this for Boost, but was told that it poses a problem: an "empty" match at the end of the string is ambiguous with "no match at all". My argument here is that if one knows that the pattern might generate such a match (and one is interested in knowing about it), one just declares the result to be the match object. The iterator form generally lets you code without all those ifs to see whether something was actually matched; at least it has made much of my code simpler/shorter.
Sounds good to me. John?
So we make match_results implicitly convertible to its iterator type? I'm not necessarily against that, but there are dangers: mainly, as Alan stated, that you can easily miss corner cases (when the regex matches a zero-length string). John.
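The corner case can be sketched as follows, using std::regex as a stand-in for Boost.Regex. `find_first` is a hypothetical iterator-returning search written for this illustration, not an existing library function. It returns the start of the first match, or `last` when nothing matches.

```cpp
#include <regex>
#include <string>

// Hypothetical iterator-returning search: start of the first match,
// or `last` when nothing matches.
std::string::const_iterator find_first(std::string::const_iterator first,
                                       std::string::const_iterator last,
                                       const std::regex& pat) {
    std::smatch m;
    return std::regex_search(first, last, m, pat) ? m[0].first : last;
}
```

Searching "abc" for `$` succeeds with a zero-length match at the end of the string, while searching it for `z` fails, yet `find_first` returns `last` in both cases. Only the match object (or the bool returned by the search) tells the two apart, which is why a caller who cares about such matches would keep the full match result.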