regex iterator question
I've got a need for regex and I would like to use it to extract tokens
matching a regular expression from a file stream. Seems like this would be
a common desire.
So first shot is
boost::regex_token_iterator
I'm thinking this is veeeeeery cool - maybe 1000 lines of free code included for the price of one.
But, it doesn't work. multi_pass is a forward traversal iterator while regex_token_iterator requires a bidirectional traversal iterator. A huge disappointment to come soooo close.
Thinking about it, this problem must come up very often. How is it usually addressed? There must be a simple bridge across this. In a pinch, I'll just have to load the whole file into some sort of collection, but I'd prefer the ultimate unlimited file size solution.
Robert Ramey
On Mon, Mar 30, 2009 at 02:25, Robert Ramey wrote:
I've got a need for regex and I would like to use it to extract tokens matching a regular expression from a file stream. Seems like this would be a common desire.
[snip]
But, it doesn't work. multi_pass is a forward traversal iterator while regex_token_iterator requires a bidirectional traversal iterator. A huge disappointment to come soooo close.
Thinking about it, this problem must come up very often. How is it usually addressed? There must be a simple bridge across this. In a pinch, I'll just have to load the whole file into some sort of collection, but I'd prefer the ultimate unlimited file size solution.
It seems to me that since a bidirectional iterator can move back over the whole range, an adapter would end up loading it all into memory eventually anyway. Have you considered rephrasing your regex as a PEG (which shouldn't be hard) and using Spirit instead? That way multi_pass would be sufficient. I seem to recall a mention of a Spirit.Lex that might be exactly what you need...
I've got a need for regex and I would like to use it to extract tokens matching a regular expression from a file stream. Seems like this would be a common desire.
So first shot is
boost::regex_token_iterator
This doesn't work since regex_token_iterator requires a bidirectional iterator. Seems reasonable enough.
So next I comb through boost and find
#include
and try
boost::regex_token_iterator< boost::spirit::multi_pass
I'm thinking this is veeeeeery cool - maybe 1000 lines of free code included for the price of one.
But, it doesn't work. multi_pass is a forward traversal iterator while regex_token_iterator requires a bidirectional traversal iterator. A huge disappointment to come soooo close.
Thinking about it, this problem must come up very often. How is it usually addressed? There must be a simple bridge across this. In a pinch, I'll just have to load the whole file into some sort of collection, but I'd prefer the ultimate unlimited file size solution.
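To make the iterator requirement above concrete, here is a minimal sketch that checks iterator categories at compile time. It uses only the standard library, and the helper name is_bidirectional is made up for illustration. It shows that a stream iterator is single-pass, while an iterator over an in-memory string is at least bidirectional, which is what regex_token_iterator asks for.

```cpp
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical helper: true when iterator type It is at least bidirectional.
template <typename It>
constexpr bool is_bidirectional() {
    using category = typename std::iterator_traits<It>::iterator_category;
    return std::is_base_of<std::bidirectional_iterator_tag, category>::value;
}

// Stream iterators are single-pass input iterators, so they do not qualify.
static_assert(!is_bidirectional<std::istreambuf_iterator<char>>(),
              "a stream iterator is single-pass");

// Iterators over an in-memory string are random access, which includes
// everything a bidirectional iterator offers.
static_assert(is_bidirectional<std::string::const_iterator>(),
              "a string iterator qualifies");
```

A forward iterator such as multi_pass sits between these two cases: it can be copied and re-read, but it cannot step backwards, so it would fail the same check.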
If you check the regex examples there are some "load_file" routines that dump a file's contents into memory, but I agree it's not an ideal solution. I did experiment with some adapters to solve this issue in the early days of regex but never got a really good solution, and folks weren't demanding it so it got dropped :-(
But... how about a memory mapped file? Boost.Interprocess has support for that: http://www.boost.org/doc/libs/1_38_0/doc/html/interprocess/sharedmemorybetwe... although I admit it's not quite a one liner...
HTH, John.
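For reference, the "load it all into memory" fallback can be sketched as below. This is only an illustration under stated assumptions: it uses std::regex and std::sregex_token_iterator so that it builds without Boost (boost::regex and boost::sregex_token_iterator have the same general shape), and the function name extract_tokens is invented here; it is not one of the load_file routines from the regex examples.

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: read the whole stream into memory, then collect
// every match of `pattern`. Memory use grows with the size of the input,
// which is exactly the drawback discussed in this thread.
std::vector<std::string> extract_tokens(std::istream& in,
                                        const std::regex& pattern) {
    // istreambuf_iterator is single-pass, so copy the data into a string
    // first; the string's iterators are bidirectional.
    const std::string buffer((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());

    std::vector<std::string> tokens;
    // Submatch index 0 selects the whole match for each token.
    std::sregex_token_iterator it(buffer.begin(), buffer.end(), pattern, 0);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Any std::istream can be passed in, so the same function covers a file or a string stream, as long as the data fits in memory.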
But... how about a memory mapped file? Boost.Interprocess has support for that: http://www.boost.org/doc/libs/1_38_0/doc/html/interprocess/sharedmemorybetwe... although I admit it's not quite a one liner...
Actually, the Spirit library has a file_iterator which I think would be a one liner. This implementation uses a memory-mapped file if one is available, and it would seem ideal. But it's only useful for an actual file, and my case is a stream off the net.
I'm very surprised that it seems I'm the only one wanting to do this. Makes me wonder if I'm doing the right thing. I'll flail around a little bit more before I give up and try something else.
Robert Ramey
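For a stream that cannot be mapped as a file, one possible workaround is to search it one line at a time. This is a sketch, not something proposed in the thread, and it only works under a strong assumption: no token ever spans a line break. The function name for_each_token and the callback parameter are invented for illustration, and std::regex stands in for boost::regex so the code builds on its own.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <regex>
#include <string>

// Hypothetical helper: scan a stream line by line and hand each match to
// `sink`. Assumption: no token spans a line break, so every line can be
// searched on its own. Memory use is bounded by the longest line rather
// than by the size of the whole stream.
std::size_t for_each_token(std::istream& in, const std::regex& pattern,
                           const std::function<void(const std::string&)>& sink) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        // The line is held in memory, so its iterators are bidirectional.
        std::sregex_token_iterator it(line.cbegin(), line.cend(), pattern, 0);
        const std::sregex_token_iterator end;
        for (; it != end; ++it) {
            sink(it->str());
            ++count;
        }
    }
    return count;
}
```

If tokens can cross line boundaries, this sketch will miss them, and some other way of carrying unmatched text over from one chunk to the next would be needed.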
participants (3)
- John Maddock
- Robert Ramey
- Scott McMurray