[xpressive] regex_replace() issue.

Hi,

I'm currently using Boost.Xpressive to search a string for variables of the form $(variable_name), each of which will be replaced by a localized, UTF-8 encoded string. Because of this, I don't think I can simply use regex_replace(), since it takes a single format string that is used to replace all matches, whereas in my case each match requires its own unique replacement.

Is there a way of doing this using Boost.Xpressive, or do I need to use a different component in the Boost library? Is there an Xpressive 'replace' function that takes a functor, perhaps, calls it for each match, and replaces the match with the functor's return value?

An example of a string that I'll be searching is below:

"This string $(variable1) has localized $(var2) text interleaved for no $(reason)"

In the string above, we have 3 variables. Each variable will reference a string in a localization table, which will then be used to replace the variable itself.

Any help is greatly appreciated. Thanks.

Robert Dailey wrote:
An example of a string that I'll be searching is below:
"This string $(variable1) has localized $(var2) text interleaved for no $(reason)"
In the string above, we have 3 variables. Each variable will reference a string in a localization table, which will then be used to replace the variable itself.
It's not hard to build such a thing using regex_iterator. Here's some code to get you going...

    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <map>
    #include <string>
    #include <boost/xpressive/xpressive.hpp>

    using namespace boost;

    // Like regex_replace(), but calls format(match, out) for each match
    // instead of expanding a format string.
    template<typename OutIter, typename OtherBidiIter, typename BidiIter, typename Format>
    inline OutIter regex_grep(
        OutIter out
      , OtherBidiIter begin_
      , OtherBidiIter end_
      , xpressive::basic_regex<BidiIter> const &re
      , Format format
      , xpressive::regex_constants::match_flag_type flags = xpressive::regex_constants::match_default
    )
    {
        BidiIter begin = begin_, end = end_;
        xpressive::regex_iterator<BidiIter> it1(begin, end, re, flags), it2;
        bool yes_copy = !(flags & xpressive::regex_constants::format_no_copy);

        for(; it1 != it2; ++it1)
        {
            // copy the text between the previous match and this one
            if(yes_copy)
                out = std::copy(begin, (*it1)[0].first, out);
            // let the formatter write the replacement for this match
            out = format(*it1, out);
            begin = (*it1)[0].second;
        }

        // copy whatever follows the last match
        if(yes_copy)
            out = std::copy(begin, end, out);

        return out;
    }

    std::map<std::string, std::string> replacements;

    // Writes the replacement for the variable name captured in sub-match 1.
    // Unknown names are replaced with nothing.
    struct format
    {
        template<typename BidiIter, typename OutIter>
        OutIter operator()(xpressive::match_results<BidiIter> const &what, OutIter out) const
        {
            std::map<std::string, std::string>::const_iterator where =
                replacements.find(what[1].str());
            if(where != replacements.end())
                out = std::copy((*where).second.begin(), (*where).second.end(), out);
            return out;
        }
    };

    int main()
    {
        replacements["X"] = "this";
        replacements["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\""), output;
        xpressive::sregex rx = xpressive::sregex::compile("\\$\\(([^\\)]+)\\)");

        regex_grep(std::back_inserter(output), input.begin(), input.end(), rx, format());

        std::cout << output << std::endl;
        return 0;
    }

HTH,
-- Eric Niebler Boost Consulting www.boost-consulting.com

On Fri, Mar 14, 2008 at 12:42 PM, Eric Niebler <eric@boost-consulting.com> wrote:
Robert Dailey wrote:
An example of a string that I'll be searching is below:
"This string $(variable1) has localized $(var2) text interleaved for no $(reason)"
In the string above, we have 3 variables. Each variable will reference a string in a localization table, which will then be used to replace the variable itself.
It's not hard to build such a thing using regex_iterator. Here's some code to get you going...
    #include <map>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
<snip>
HTH,
-- Eric Niebler Boost Consulting www.boost-consulting.com
Thank you very much for helping. I wouldn't go as far as to say it isn't hard, because that looks very hard IMHO. I was hoping for something a little more intuitive and built-in. Instead of going through all of that work I will probably first try to find a different third party regular expression library that can automate this task for me so that I can keep my code a little cleaner and focus on the actual task. I know that other regex libraries in other languages provide such a feature. Perhaps in the future Boost.Xpressive can be extended to provide this behavior. You could simply create a version of regex_replace() that takes a functor as the replacement instead of a string. Again, I appreciate the help and I will use your code example as a reference in the future if I ever decide to come back to this.

Robert Dailey wrote:
Thank you very much for helping. I wouldn't go as far as to say it isn't hard, because that looks very hard IMHO. I was hoping for something a little more intuitive and built-in. Instead of going through all of that work I will probably first try to find a different third party regular expression library that can automate this task for me so that I can keep my code a little cleaner and focus on the actual task.
Hm, OK. Good luck. Maybe Boost.Regex has this feature.
I know that other regex libraries in other languages provide such a feature. Perhaps in the future Boost.Xpressive can be extended to provide this behavior. You could simply create a version of regex_replace() that takes a functor as the replacement instead of a string.
Nothing is simple. It would require careful design work, tests and docs. If you feel strongly about it (and it /would/ be useful), you can open a feature request on svn.boost.org. Or even better, submit a patch. This is open source, after all, and I welcome your participation.
-- Eric Niebler Boost Consulting www.boost-consulting.com

On Fri, Mar 14, 2008 at 5:57 PM, Eric Niebler <eric@boost-consulting.com> wrote:
Robert Dailey wrote:
Thank you very much for helping. I wouldn't go as far as to say it isn't hard, because that looks very hard IMHO. I was hoping for something a little more intuitive and built-in. Instead of going through all of that work I will probably first try to find a different third party regular expression library that can automate this task for me so that I can keep my code a little cleaner and focus on the actual task.
Hm, OK. Good luck. Maybe Boost.Regex has this feature.
I know that other regex libraries in other languages provide such a feature. Perhaps in the future Boost.Xpressive can be extended to provide this behavior. You could simply create a version of regex_replace() that takes a functor as the replacement instead of a string.
Nothing is simple. It would require careful design work, tests and docs. If you feel strongly about it (and it /would/ be useful), you can open a feature request on svn.boost.org. Or even better, submit a patch. This is open source, after all, and I welcome your participation.
-- Eric Niebler Boost Consulting www.boost-consulting.com
Hey Eric, I ended up using the method you proposed anyway mainly because I love boost so much and also because you did all of the work for me, so I couldn't let that go to waste. I barely had to make any changes anyhow. I realize it is open source but I really don't like modifying third party libraries unless I'm absolutely stuck with it. In other words, if I can get the original authors of the library to make the changes for me then I will wait for that instead of doing it myself. In addition, the boost library implementations intimidate me (they use a very complex design, structure, and syntax that takes a couple of minutes to follow). The boost implementations "think outside of the box", which is why it looks so exceedingly different from other implementations I've seen. It definitely requires a whole different state of mind. My whole point in saying all of this is that I'm not confident I can provide as worthy of a patch as someone who works on the boost library on a daily basis, but I will most certainly try. For the most part I would just be taking your source and making changes from there. In any case, I split up your source a little and placed them into more generic hpp files and I've put that into the engine for our game. You were very helpful, and I thank you. I'll submit a patch later on when I have time to mess with it. For future reference, where can I submit patches? Thanks again.

Robert Dailey wrote:
Hey Eric,
I ended up using the method you proposed anyway mainly because I love boost so much and also because you did all of the work for me, so I couldn't let that go to waste. <snip>
Great! You might want to also provide the following overload with a simpler interface. It traffics in strings rather than iterators, and just invokes the more general iterator-based algorithm that I sent earlier:

    template<typename Char, typename Format>
    inline std::basic_string<Char> regex_grep(
        std::basic_string<Char> const &str
      , xpressive::basic_regex<
            typename std::basic_string<Char>::const_iterator
        > const &re
      , Format format
      , xpressive::regex_constants::match_flag_type flags = xpressive::regex_constants::match_default
    )
    {
        std::basic_string<Char> result;
        // forward to the iterator-based overload, appending to result
        regex_grep(
            std::back_inserter(result)
          , str.begin()
          , str.end()
          , re
          , format
          , flags
        );
        return result;
    }

Now you can simply say "out = regex_grep(in, rex, formatter())". As with much template code, it looks more daunting at first than it really is.
In any case, I split up your source a little and placed them into more generic hpp files and I've put that into the engine for our game. You were very helpful, and I thank you. I'll submit a patch later on when I have time to mess with it. For future reference, where can I submit patches? Thanks again.
You can submit patches at the same place you file bugs and make feature requests: svn.boost.org. Use the "New Ticket" button and follow the instructions. If you go this route, I'm much more likely to accept the patch if it comes with tests and documentation.

I'll also say that the name "regex_grep" is lousy. I was thinking that Boost.Regex had this functionality with that name, but I was mistaken.
-- Eric Niebler Boost Consulting www.boost-consulting.com

Eric,
I think both of your code posts would be a great addition, if not to the library itself, then certainly to its examples section.
Thanks,
Andrey

On Fri, 14 Mar 2008 20:15:35 -0600, Eric Niebler <eric@boost-consulting.com> wrote:
Robert Dailey wrote:
Hey Eric,
I ended up using the method you proposed anyway mainly because I love boost so much and also because you did all of the work for me, so I couldn't let that go to waste. <snip>
Great! You might want to also provide the following overload with a simpler interface. It traffics in strings rather than iterators, and just invokes the more general iterator-based algorithm that I sent earlier:
    template<typename Char, typename Format>
    inline std::basic_string<Char> regex_grep(
        std::basic_string<Char> const &str
      , xpressive::basic_regex<
            typename std::basic_string<Char>::const_iterator
        > const &re
      , Format format
      , xpressive::regex_constants::match_flag_type flags = xpressive::regex_constants::match_default
    )
    {
        std::basic_string<Char> result;
        regex_grep(
            std::back_inserter(result)
          , str.begin()
          , str.end()
          , re
          , format
          , flags
        );
        return result;
    }
Now you can simply say "out = regex_grep(in, rex, formatter())". As with much template code, it looks more daunting at first than it really is.
In any case, I split up your source a little and placed them into more generic hpp files and I've put that into the engine for our game. You were very helpful, and I thank you. I'll submit a patch later on when I have time to mess with it. For future reference, where can I submit patches? Thanks again.
You can submit patches at the same place you file bugs and make feature requests: svn.boost.org. Use the "New Ticket" button and follow the instructions. If you go this route, I'm much more likely to accept the patch if it comes with tests and documentation.
I'll also say that the name "regex_grep" is lousy. I was thinking that Boost.Regex had this functionality with that name, but I was mistaken.

Eric Niebler wrote:
Robert Dailey wrote:
Perhaps in the future Boost.Xpressive can be extended to provide this behavior. You could simply create a version of regex_replace() that takes a functor as the replacement instead of a string.
Nothing is simple. It would require careful design work, tests and docs.
I had some time today, so I looked into this. I noticed a few things:

1) Some time ago, I switched regex_match and regex_search to a range-based interface (i.e., they traffic in ranges, basic_string is not special), but never switched regex_replace.

2) There are open library issues against the std::regex_replace interface:
   http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#726
   http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#727

3) You are not the first person to want a regex_replace overload that takes a formatter functor instead of a string:
   http://lists.boost.org/boost-users/2006/05/19680.php

Long story short, I decided regex_replace was due for a major overhaul, and match_results::format along with it. In the process, you got what you wanted ... you can pass a functor instead of a format string.

The functor can have one of 3 signatures:

    string (match_results)
    OutIter (match_results, OutIter)
    OutIter (match_results, OutIter, match_flag_type)

The formatter can be a function object, or even just a plain function. So, for instance, you can do this:

    map<string,string> replacements;

    string my_format(smatch const &what)
    {
        return replacements[what[0].str()];
    }

    int main()
    {
        string input = ...;
        sregex rx = ...;
        string output = regex_replace(input, rx, my_format);
    }

Old code that uses format strings still works as it did before. This is committed to trunk and will not be part of the forthcoming 1.35 release.

None of this is documented yet -- it's experimental, and I reserve the right to pull this at any time. :-) That said, I think it's a nice extension, and I plan to keep it unless it causes problems.
-- Eric Niebler Boost Consulting www.boost-consulting.com

On Sun, Mar 16, 2008 at 2:22 AM, Eric Niebler <eric@boost-consulting.com> wrote:
Eric Niebler wrote:
Robert Dailey wrote:
Perhaps in the future Boost.Xpressive can be extended to provide this behavior. You could simply create a version of regex_replace() that takes a functor as the replacement instead of a string.
Nothing is simple. It would require careful design work, tests and docs.
I had some time today, so I looked into this. I noticed a few things:
1) Some time ago, I switched regex_match and regex_search to a range-based interface (i.e., they traffic in ranges, basic_string is not special), but never switched regex_replace.
2) There are open library issues against the std::regex_replace interface: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#726 http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#727
3) You are not the first person to want a regex_replace overload that takes a formatter functor instead of a string: http://lists.boost.org/boost-users/2006/05/19680.php
Long story short, I decided regex_replace was due for a major overhaul, and match_results::format along with it. In the process, you got what you wanted ... you can pass a functor instead of a format string.
The functor can have one of 3 signatures:
    string (match_results)
    OutIter (match_results, OutIter)
    OutIter (match_results, OutIter, match_flag_type)
The formatter can be a function object, or even just a plain function. So, for instance, you can do this:
    map<string,string> replacements;
    string my_format(smatch const &what)
    {
        return replacements[what[0].str()];
    }
    int main()
    {
        string input = ...;
        sregex rx = ...;
        string output = regex_replace(input, rx, my_format);
    }
Old code that uses format strings still works as it did before. This is committed to trunk and will not be part of the forthcoming 1.35 release.
None of this is documented yet -- it's experimental, and I reserve the right to pull this at any time. :-) That said, I think it's a nice extension, and I plan to keep it unless it causes problems.
Thanks so much! It *WON'T* be part of 1.35? Why not? A couple of closed-source libraries I use also compile boost into their binaries, which means that if they ever upgrade to 1.35 in the future I won't be able to use this great new feature.

Robert Dailey wrote:
On Sun, Mar 16, 2008 at 2:22 AM, Eric Niebler <eric@boost-consulting.com> wrote:
Old code that uses format strings still works as it did before. This is committed to trunk and will not be part of the forthcoming 1.35 release.
Thanks so much! It *WON'T* be part of 1.35? Why not? A couple of closed-source libraries I use also compile boost into their binaries, which means that if they ever upgrade to 1.35 in the future I won't be able to use this great new feature.
That's right. 1.35 is frozen for release. Critical bug fixes only, I'm afraid. You can still get the latest xpressive from trunk. It's a header-only library, so there's no need to worry about binary incompatibilities with 1.35.
-- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler wrote:
The formatter can be a function object, or even just a plain function. So, for instance, you can do this:
    map<string,string> replacements;
    string my_format(smatch const &what)
    {
        return replacements[what[0].str()];
    }
    int main()
    {
        string input = ...;
        sregex rx = ...;
        string output = regex_replace(input, rx, my_format);
    }
I've made a small addition ... the formatter can be a lambda, too, if you #include <boost/xpressive/regex_actions.hpp>. The above can now be written simply as:

    using xpressive::ref;
    string output = regex_replace(input, rx, ref(replacements)[_]);

Here, "_" gets substituted with a sub_match representing the current match. You can also use s1, s2, etc., to access the other sub-matches.
-- Eric Niebler Boost Consulting www.boost-consulting.com

On Sun, Mar 16, 2008 at 5:44 PM, Eric Niebler <eric@boost-consulting.com> wrote:
Eric Niebler wrote:
The formatter can be a function object, or even just a plain function. So, for instance, you can do this:
    map<string,string> replacements;
    string my_format(smatch const &what)
    {
        return replacements[what[0].str()];
    }
    int main()
    {
        string input = ...;
        sregex rx = ...;
        string output = regex_replace(input, rx, my_format);
    }
I've made a small addition ... the formatter can be a lambda, too, if you #include <boost/xpressive/regex_actions.hpp>. The above can now be written simply as:
    using xpressive::ref;
    string output = regex_replace(input, rx, ref(replacements)[_]);
Here, "_" gets substituted with a sub_match representing the current match. You can also use s1, s2, etc., to access the other sub-matches.
How is "[_]" legal C++ syntax?

Robert Dailey wrote:
Eric Niebler wrote:
    using xpressive::ref;
    string output = regex_replace(input, rx, ref(replacements)[_]);
Here, "_" gets substituted with a sub_match representing the current match. You can also use s1, s2, etc., to access the other sub-matches.
How is "[_]" legal C++ syntax?
It's magic. Actually, it's a lambda expression. xpressive::ref() creates a lazy reference, operator[] creates an object representing a lazy index operation, etc. Check out http://tinyurl.com/36qfn6 for other things you can do with lambda expressions in xpressive 2.0 (which *will* be part of 1.35).
-- Eric Niebler Boost Consulting www.boost-consulting.com

Robert Dailey wrote:
On Sun, Mar 16, 2008 at 5:44 PM, Eric Niebler <eric@boost-consulting.com> wrote:
I've made a small addition ... the formatter can be a lambda, too, if you #include <boost/xpressive/regex_actions.hpp>. The above can now be written simply as:
    using xpressive::ref;
    string output = regex_replace(input, rx, ref(replacements)[_]);
Here, "_" gets substituted with a sub_match representing the current match. You can also use s1, s2, etc., to access the other sub-matches.
How is "[_]" legal C++ syntax? _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
"_" is just an identifier, so "[_]" is calling the [] operator with the parameter being whatever "_" refers to. In this case, it allows for the creation of lambda expressions. Sean

On Sun, Mar 16, 2008 at 9:30 PM, Sean Hunt <rideau3@gmail.com> wrote:
Robert Dailey wrote:
On Sun, Mar 16, 2008 at 5:44 PM, Eric Niebler <eric@boost-consulting.com> wrote:
I've made a small addition ... the formatter can be a lambda, too, if you #include <boost/xpressive/regex_actions.hpp>. The above can now be written simply as:
    using xpressive::ref;
    string output = regex_replace(input, rx, ref(replacements)[_]);
Here, "_" gets substituted with a sub_match representing the current match. You can also use s1, s2, etc., to access the other sub-matches.
How is "[_]" legal C++ syntax?
"_" is just an identifier, so "[_]" is calling the [] operator with the parameter being whatever "_" refers to. In this case, it allows for the creation of lambda expressions.
Wow, I have never created a variable without alphabetic characters in it before. I never considered for a minute that '_' would be a legal variable name, just as a variable starting with numbers isn't legal. Now that you've explained it, it seems pretty obvious. The syntax itself just threw me off :)

Thank you very much for helping. I wouldn't go as far as to say it isn't hard, because that looks very hard IMHO. I was hoping for something a little more intuitive and built-in.
You could also use Xpressive static regular expressions with semantic actions to get a really simple solution. At least I think it's simple; hopefully you will too. :-)

The example program below uses a function object to look up the variable names and return replacement strings that are copied to the output string. Of course you can replace my "if" statement with your localization table. All normal text characters are simply copied to the output string.

Also, if you have a lot of variable names to look up, try using Xpressive symbol tables. They're really fast!

HTH,
Dave Jenkins

    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>

    namespace xp = boost::xpressive;

    // function object to look up a variable name and return a replacement string
    struct lookup_impl
    {
        typedef std::string result_type;

        result_type operator() (std::string const& s1) const
        {
            if (s1 == "variable1")
                return "V1";
            else if (s1 == "var2")
                return "V2";
            return "V3";
        }
    };

    // lazy function for lookup
    xp::function<lookup_impl>::type const lookup = {{}};

    int main()
    {
        using namespace boost::xpressive;

        std::string input("This string $(variable1) has localized $(var2) text interleaved for no $(reason)");
        std::string output;

        // match normal characters, i.e., not '$'
        sregex rx_text = (*(~set['$']))
            [ xp::ref(output) += _ ];           // copy the characters to output

        // match variable names, e.g., $(name)
        sregex rx_variable =
            "$("                                // match prefix
            >> (s1 = +(~set[')']))              // match variable name
            // look up the variable name replacement and copy it to output
            [ xp::ref(output) += lookup(_) ]
            >> ")";                             // match suffix

        // match normal characters interleaved with variable names
        sregex rx = rx_text >> *(rx_variable >> rx_text);

        regex_match(input, rx);

        // this outputs "This string V1 has localized V2 text interleaved for no V3"
        std::cout << output << '\n';

        return 0;
    }
participants (5): Andrey Tcherepanov, Dave Jenkins, Eric Niebler, Robert Dailey, Sean Hunt