[regex] regex_replace with format string replaced as it is


Lorenzo Bettini

31 May 2005

10:41 a.m.

Hi,

when I use

template <class traits, class Allocator, class charT> basic_string<charT> regex_replace(const basic_string<charT>& s, const basic_regex<charT, traits, Allocator>& e, const basic_string<charT>& fmt, match_flag_type flags = match_default);

is there a way to have the fmt string substituted as it is?

I mean, if I specify "(\\)", for instance, I want exactly those four characters to be substituted for the occurrences of e in s, without any interpretation of ( and \.

Of course I could escape both ( and \, but I was wondering if there's an option to avoid this.

Thanks in advance,
Lorenzo


Keith MacDonald

31 May

11:30 a.m.

New subject: [regex] regex_replace with format string replaced as it is

Instead of regex_replace, you could use regex_search, then std::copy for literal replacement, or match_results::format otherwise.

Keith MacDonald

"Lorenzo Bettini" <bettini@dsi.unifi.it> wrote in message news:d7hepp$403$1@sea.gmane.org...

...

Hi

when I use

template <class traits, class Allocator, class charT> basic_string<charT> regex_replace(const basic_string<charT>& s, const basic_regex<charT, traits, Allocator>& e, const basic_string<charT>& fmt, match_flag_type flags = match_default);

is there a way to have the fmt string substituted as it is?

I mean, if I specify "(\\)", for instance, I want exactly those four characters to be substituted for the occurrences of e in s, without any interpretation of ( and \.

Of course I could escape both ( and \, but I was wondering if there's an option to avoid this.

Thanks in advance,
Lorenzo

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Lorenzo Bettini

2:15 p.m.

New subject: [regex] regex_replace with format string replaced as it is

Keith MacDonald wrote:

...

Instead of regex_replace, you could use regex_search, then std::copy for literal replacement, or match_results::format otherwise.

OK for regex_search, but I don't understand how to use match_results::format.
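A minimal sketch of Keith's suggestion, assuming C++11 `std::regex` (whose `regex_search` and `match_results::format` interface was standardized from Boost.Regex) instead of `boost::regex`, so it builds without Boost; the function name `replace_all` and its `literal` switch are hypothetical. Each `regex_search` call finds the next match; the unmatched text before it (`prefix()`) is copied through, and then the replacement is either copied verbatim with `std::copy` or expanded by `format()`, where `$1`, `$&` and so on refer to that particular match.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: replaces every match of e in s.
//   literal == true  -> fmt is copied verbatim (the std::copy route)
//   literal == false -> fmt is expanded per match via match_results::format
std::string replace_all(const std::string& s, const std::regex& e,
                        const std::string& fmt, bool literal)
{
    std::string out;
    std::string::const_iterator first = s.begin();
    const std::string::const_iterator last = s.end();
    std::smatch m;

    while (std::regex_search(first, last, m, e)) {
        // Copy the unmatched text that precedes this match.
        out.append(m.prefix().first, m.prefix().second);

        if (literal)
            std::copy(fmt.begin(), fmt.end(), std::back_inserter(out));
        else
            out += m.format(fmt);  // "$1", "$&", ... refer to this match

        first = m[0].second;
        // An empty match would be found again at the same position:
        // copy one character through and move past it.
        if (m.length(0) == 0) {
            if (first == last)
                break;
            out += *first++;
        }
    }
    out.append(first, last);  // text after the last match
    return out;
}
```

For example, `replace_all("a1b22c", std::regex("[0-9]+"), R"((\\))", true)` should give `a(\\)b(\\)c`, while `replace_all("a1b22c", std::regex("([0-9]+)"), "<$1>", false)` should give `a<1>b<22>c`. One simplification: later searches treat `first` as the start of the input (no `match_prev_avail` flag), which only matters for patterns that use `^` or `\b`.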

John Maddock

4 p.m.

...

when I use

template <class traits, class Allocator, class charT> basic_string<charT> regex_replace(const basic_string<charT>& s, const basic_regex<charT, traits, Allocator>& e, const basic_string<charT>& fmt, match_flag_type flags = match_default);

is there a way to have the fmt string substituted as it is?

I mean, if I specify "(\\)", for instance, I want exactly those four characters to be substituted for the occurrences of e in s, without any interpretation of ( and \.

Of course I could escape both ( and \, but I was wondering if there's an option to avoid this.

Yes: pass format_literal|match_default as the last parameter to regex_replace, and the string will be treated as a literal, not as a Perl-style format string.

HTH,
John.
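For comparison, a sketch assuming C++11 `std::regex`, which has no `format_literal` flag. Its default format syntax follows ECMAScript, where `$` is the only special character and `$$` produces a single `$`, so doubling every `$` is enough to make a replacement come out verbatim. Both helper names below are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper. In std::regex's default (ECMAScript) format syntax,
// '$' is the only special character and "$$" expands to a single '$',
// so doubling every '$' makes the replacement text come out verbatim.
std::string escape_ecmascript_format(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        out += c;
        if (c == '$')
            out += '$';  // "$$" -> literal '$'
    }
    return out;
}

// Hypothetical helper: replace every match of e in s with `replacement`
// taken literally, roughly what format_literal does in Boost.Regex.
std::string replace_literal(const std::string& s, const std::regex& e,
                            const std::string& replacement)
{
    return std::regex_replace(s, e, escape_ecmascript_format(replacement));
}
```

With Boost 1.33 or later, the `format_literal` flag described above makes this escaping step unnecessary.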

Keith MacDonald

8:19 p.m.

New subject: [regex] regex_replace with format string replaced as it is

Thanks for implementing format_literal in 1.33. I see also that it is now possible to determine the character offset when bad_expression is thrown, which is very useful. Are there any other handy little changes that did not make it into libs/regex/doc/history.html (yet)?

Thanks,
Keith MacDonald

"John Maddock" <john@johnmaddock.co.uk> wrote in message news:00f801c565f9$de9f6fa0$cbf30352@fuji...

...

Yes: pass format_literal|match_default as the last parameter to regex_replace, and the string will be treated as a literal, and not a Perl-style format string.

HTH,

John.


John Maddock

1 Jun

10:13 a.m.

New subject: [regex] regex_replace with format string replaced as it is

...

Thanks for implementing format_literal in 1.33. I see also that it is now possible to determine the character offset when bad_expression is thrown - which is very useful. Are there any other handy little changes that did not make it into libs/regex/doc/history.html (yet)?

I really hope not! Embarrassingly, I notice that I haven't documented format_literal at all! Fixed in CVS now.

Thanks for the prod,
John.

Lorenzo Bettini

2:51 p.m.

New subject: [regex] regex_replace with format string replaced as it is

John Maddock wrote:

...

...
Thanks for implementing format_literal in 1.33. I see also that it is now possible to determine the character offset when bad_expression is thrown - which is very useful. Are there any other handy little changes that did not make it into libs/regex/doc/history.html (yet)?

I really hope not!

Embarrassingly, I notice that I haven't documented format_literal at all! Fixed in cvs now.

That's exactly what I was looking for, but I hadn't found it in the docs :-)

As I understand it, however, it's only available in 1.33.

Thanks again,
Lorenzo

John Maddock

2 Jun

9:58 a.m.

New subject: [regex] regex_replace with format string replaced as it is

...

that's exactly what I was looking for, but I hadn't found it in the docs :-)

as I understand, however, it's only available in 1.33

Shucks, yes, I think you're right about that. Another approach would be to iterate through all the matches and join up the bits that didn't match with your replacement text. But it's probably easier to grab a current CVS snapshot, and then upgrade to 1.33 as soon as it comes out (should be real soon now).

Sorry about the false lead,
John.

Lorenzo Bettini

11:30 a.m.

New subject: [regex] regex_replace with format string replaced as it is

John Maddock wrote:

...

...
that's exactly what I was looking for, but I hadn't found it in the docs :-)

as I understand, however, it's only available in 1.33

Shucks, yes, I think you're right about that. Another approach would be to iterate through all the matches and join up the bits that didn't match with your replacement text. But it's probably easier to grab a current CVS snapshot, and then upgrade to 1.33 as soon as it comes out (should be real soon now).

Indeed, I used this solution:

#include <string>
#include <boost/regex.hpp>

using std::string;

string subst(const boost::regex &e, const string &s, const string &sub)
{
    string ret;
    boost::sregex_iterator i1(s.begin(), s.end(), e);
    boost::sregex_iterator i2;
    string suffix;

    if (i1 == i2)
        return s; // the exp is not in the string so we do not alter it.

    // copy the text before each match, then the replacement, verbatim
    for (boost::sregex_iterator it = i1; it != i2; ++it) {
        string prefix = it->prefix().str();
        if (prefix.size())
            ret += prefix;
        suffix = it->suffix().str();
        ret += sub;
    }

    // the text after the last match
    if (suffix.size())
        ret += suffix;

    return ret;
}

...

Sorry about the false lead,

No problem :-)

By the way, I saw that this was also asked before (also by me): do you plan to implement the "subexpressions that matched" feature, so that one does not have to iterate through all the subexpressions? If I could help implement that somehow, I'd be happy to.

Lorenzo

John Maddock

12:40 p.m.

New subject: [regex] regex_replace with format string replaced as it is

...

by the way, I saw that this was also asked before (also by me): do you plan to implement the "subexpressions that matched" feature, so that one does not have to iterate through all the subexpressions?

Sorry, I'm being dense today: what do you mean by that?

John.

Lorenzo Bettini

11:08 p.m.

New subject: [regex] regex_replace with format string replaced as it is

John Maddock wrote:

...

...
by the way, I saw that this was also asked before (also by me): do you plan to implement the "subexpressions that matched" feature, so that one does not have to iterate through all the subexpressions?

Sorry, I'm being dense today: what do you mean by that?

Instead of checking every what[i].matched, as in the following

for (unsigned int i = 1; i < what.size(); ++i) { if (what[i].matched) {

I'd like to be able to access the subexpressions that matched directly. Something like

for (unsigned int i = 1; i < what.submatched.size(); ++i) {

So if I have, say, 10 subexpressions and only two matched, instead of checking every what[i] (for i = 0..9), I could directly access those two subexpressions. It would be much more efficient.

John Maddock

3 Jun

9:50 a.m.

New subject: [regex] regex_replace with format string replaced as it is

...

...
Sorry, I'm being dense today: what do you mean by that?

instead of checking every what[i].matched as in the following

for (unsigned int i = 1; i < what.size(); ++i) { if (what[i].matched) {

I'd like to be able to have the subexpressions that matched accessible directly. Something like

for (unsigned int i = 1; i < what.submatched.size(); ++i) {

so if I have, say, 10 subexpressions and only two matched, instead of checking every what[i] (for i = 0..9), I could directly access those two subexpressions. It'd be much more efficient.

Hmm, it depends on what you want to do, I guess: how would you tell which indices those two sub-expressions had, or don't you care? It also means a complete rewrite of the match_results interface, and probably the code that finds matches as well: it all assumes that match_results is basically an array of N sub-expressions.

John.

Lorenzo Bettini

6 Jun

1:46 p.m.

New subject: [regex] regex_replace with format string replaced as it is

John Maddock wrote:

...

...
...
Sorry, I'm being dense today: what do you mean by that?

instead of checking every what[i].matched as in the following

for (unsigned int i = 1; i < what.size(); ++i) { if (what[i].matched) {

I'd like to be able to have the subexpressions that matched accessible directly. Something like

for (unsigned int i = 1; i < what.submatched.size(); ++i) {

so if I have, say, 10 subexpressions and only two matched, instead of checking every what[i] (for i = 0..9), I could directly access those two subexpressions. It'd be much more efficient.

Hmm, it depends on what you want to do, I guess: how would you tell which indices those two sub-expressions had, or don't you care? It also means a complete rewrite of the match_results interface, and probably the code that finds matches as well: it all assumes that match_results is basically an array of N sub-expressions.

Well, since what[] is already an array of n subexpressions, I suppose that while finding matches you set all the fields of what[i] if the i-th subexpression matched. While you're doing this, you could also store the index i in another structure of what.

I hope I explained this correctly...

Lorenzo

John Maddock

9 Jun

11:13 a.m.

New subject: [regex] regex_replace with format string replaced as it is

...

well, since what[] is already an array of n subexpressions, I suppose that while finding matches you set all the fields of what[i] if the i-th subexpression matched. While you're doing this, you could also store the index i in another structure of what.

I hope I explained this correctly...

Yes, but it's not that simple...

1) As well as adding matches to the list, matches can become "unmatched" during backtracking.

2) The time taken for a match is completely dominated by the time taken for calls to new and delete: especially for simple expressions, a call to new takes about 10x as long as finding the match. So if you reuse your match_results structures (so the matcher doesn't have to allocate any memory), then matching is extremely fast. Using a dynamic memory structure (like std::set or std::map) to store just those sub-expressions that matched would completely blow this out of the water. And yes, for some folks this is very important....

An alternative would be to extend the existing structure to hold the extra information; the trouble is, almost any way I can think of arranging this will have rather a negative speed impact (don't forget you have to remove as well as add entries). Probably the fastest method is to just post-process the results by doing a linear scan through them, and that's something you can do right now! You'd need hundreds of sub-expressions before this became unduly inefficient: scanning through a vector really is very fast.

There is one option I can think of that might work: add a linked list to the existing submatches, so that each links forward to the next matched subexpression and back to the previous matched subexpression. However, this means that only bidirectional (not random-access) traversal of "just the matched" sub-expressions would be possible. It would also double the size of the sub_match structure, which itself has important implications: because these structures are pushed onto the stack (on Windows at least), it basically halves the maximum complexity of expression you can match, and the extra memory usage has a performance impact too.
For this reason the capturing code (see repeated captures in libs/regex/doc/captures.html) isn't enabled by default: it also doubles the size of sub_match, with all the issues that raises. Sorry if I'm being unduly pessimistic, but in all honesty, the most efficient method really is to just scan the match_results structure after the event; this also has the advantage that you don't pay for what you don't use.

John.
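A minimal sketch of the post-processing scan John recommends, assuming C++11 `std::regex` in place of `boost::regex` (its `match_results` and `sub_match::matched` members have the same shape); `collect_matched` is a hypothetical name. Because it records group numbers rather than the sub-matches themselves, it also answers the "which index?" question raised earlier in the thread.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: after a successful match, write the numbers of the
// capture groups that actually participated into `indices`. This is the
// linear post-processing scan: O(what.size()) per match.
// `indices` is cleared rather than recreated, so a caller can keep one
// vector alive across many matches and usually avoid reallocating it.
void collect_matched(const std::smatch& what, std::vector<std::size_t>& indices)
{
    indices.clear();
    for (std::size_t i = 1; i < what.size(); ++i) {  // 0 is the whole match
        if (what[i].matched)
            indices.push_back(i);
    }
}
```

Keeping one `std::smatch` and one index vector alive across calls follows John's advice about reusing `match_results`; whether a particular standard library actually reuses the storage inside `std::smatch` is implementation-dependent.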

Lorenzo Bettini

3 Jul

12:05 p.m.

New subject: [regex] regex_replace with format string replaced as it is

John Maddock wrote:

...

...
well, since what[] is already an array of n subexpressions, I suppose that while finding matches you set all the fields of what[i] if the i-th subexpression matched. While you're doing this, you could also store the index i in another structure of what.

I hope I explained this correctly...

Yes, but it's not that simple...

1) As well as adding matches to the list, matches can become "unmatched" during backtracking.

2) The time taken for a match is completely dominated by the time taken for calls to new and delete: especially for simple expressions, a call to new takes about 10x as long as finding the match. So if you reuse your match_results structures (so the matcher doesn't have to allocate any memory), then matching is extremely fast. Using a dynamic memory structure (like std::set or std::map) to store just those sub-expressions that matched would completely blow this out of the water. And yes, for some folks this is very important....

An alternative would be to extend the existing structure to hold the extra information; the trouble is, almost any way I can think of arranging this will have rather a negative speed impact (don't forget you have to remove as well as add entries). Probably the fastest method

OK, I see: I wasn't thinking about backtracking and removing entries from a dynamic structure. I was thinking about a list, but I see that if you need to remove an entry, this may waste a lot of time.

...

There is one option I can think of that might work: add a linked list to the existing submatches, so that each links forward to the next matched subexpression and back to the previous matched subexpression. However, this means that only bidirectional (not random-access) traversal of "just the matched" sub-expressions would be possible. It would also double

But again, you may need to modify this structure during backtracking, right?

Lorenzo

--
Lorenzo Bettini  ICQ# lbetto, 16080134
PhD in Computer Science
Dip. Sistemi e Informatica, Univ. di Firenze
Florence - Italy (GNU/Linux User # 158233)
Home Page: http://www.lorenzobettini.it
http://music.dsi.unifi.it XKlaim language
http://www.lorenzobettini.it/purple Cover Band
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.lorenzobettini.it/software/gengen
http://www.lorenzobettini.it/software/doublecpp

John Maddock

9 Jul

11:37 a.m.

New subject: [regex] regex_replace with format string replaced as it is

...

but again you may need to modify this structure during backtracking, right?

Yes, but you wouldn't have to allocate/deallocate any memory, so it's "just" some pointer fiddling. As I said though, it still adds another overhead.

John.

Lorenzo Bettini

14 Jul

2:45 p.m.

New subject: [regex] regex_replace with format string replaced as it is

John Maddock wrote:

...

...
but again you may need to modify this structure during backtracking, right?

Yes, but you wouldn't have to allocate/deallocate any memory, so it's "just" some pointer fiddling. As I said though, it still adds another overhead.

I see. I agree with you that due to all these problems/overheads it is not worthwhile :-)


16 comments

3 participants

participants (3)

John Maddock
Keith MacDonald
Lorenzo Bettini