[regex] embed string/char in regex w/o escaping?

As far as I can see, there is no way to embed a plain string into a regex without escaping the string. Same with characters. Isn't that a bad omission for a library that is becoming part of the new C++ standard? Escaping something just to unescape it on the other end of the function call seems unnecessary, and it opens up the possibility of ills like regex code injection if someone forgets to escape or does it wrong.

I know about Boost.Xpressive, but that won't be the new standard. This is not a lobbying effort for Boost.Xpressive. I have never used it precisely because I wanted to stick to the standard. Now I am a bit worried about pouring something into C++ standard concrete that has a pretty obvious omission in it.

Arno

--
Dr. Arno Schödl
Technical Director, think-cell Software GmbH

On 1:59 PM, Arno Schödl wrote:
As far as I can see, there is no way to embed a plain string into a regex, without escaping the string. Same with characters. Isn't that a bad omission for a library that becomes the new C++ standard? Escaping something just to unescape it on the other end of the function call seems unnecessary, and it opens up the possibility of ills like regex code injection if someone forgets to escape or does it wrong.
I was just pondering regex security risks (<http://lists.boost.org/boost-users/2011/02/66533.php>). Has anyone studied regex code injection and its implications?

How about <http://www.boost.org/doc/libs/1_46_0/libs/regex/doc/html/boost_regex/ref/syntax_option_type/syntax_option_type_literal.html>? It treats the whole string as a literal. Is that what you're seeking?

Regex has to contend with in-band signaling in general, and it's a thorny issue.

To your point about escaping a string wrongly, I fiddled with a regex_replace() that would remove every '\E' (end of quoted sequence), including '\\\E', but leave '\\E' untouched (i.e., an even number of '\' prefixing it), and couldn't get it to work.

To your point of escaping a string wrong, I fiddled with a regex_replace() that would remove all '\E' (end-of-quoted-sequence), including '\\\E', but not touch '\\E' (i.e., even numbers of '\' prefixing), and couldn't get it.
I'm not sure I understand what you're trying to achieve there. Can you explain?

John.

On 1:59 PM, John Maddock wrote:
I'm not sure I understand what you're trying to achieve there, can you explain?
I was going for a different form, but (hopefully) the same result as yours:

    std::string my_escaped_string = "\\Q" + regex_replace(my_string, e, "\\\\$&") + "\\E";

On Wed, Mar 2, 2011 at 5:43 PM, Jim Bell <Jim@jc-bell.com> wrote:
I was going for different form, but (hopefully) same result as yours:
std::string my_escaped_string = "\\Q" + regex_replace(my_string, e, "\\\\$&") + "\\E";
I use something like

    "\\Q" + boost::replace_all_copy(text, "\\E", "\\E\\\\E\\Q") + "\\E"

Yechezkel Mett
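For readers who want to see what that replacement does, here is a minimal sketch of the same idea written against plain std::string, so it can be tried without Boost. The names quote_literal and quote_literal_examples are hypothetical and not part of any library. It only builds the pattern text; the result is meant to be handed to a regex engine that understands \Q...\E, such as Boost.Regex with its Perl syntax.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Boost.Regex): wraps `text` in \Q...\E.
// A "\E" inside `text` would end the quoted run early, so each one is
// replaced by three pieces:
//   \E    close the quoted run
//   \\E   an escaped backslash followed by a plain E
//   \Q    reopen the quoted run
std::string quote_literal(const std::string& text) {
    const std::string terminator = "\\E";           // the two characters \ and E
    const std::string replacement = "\\E\\\\E\\Q";  // the text \E\\E\Q
    std::string out = "\\Q";
    std::string::size_type pos = 0;
    for (;;) {
        const std::string::size_type hit = text.find(terminator, pos);
        if (hit == std::string::npos) {
            out.append(text, pos, std::string::npos);  // copy the tail
            break;
        }
        out.append(text, pos, hit - pos);  // copy everything before the \E
        out += replacement;
        pos = hit + terminator.size();
    }
    out += "\\E";
    return out;
}

// Small usage check; the expected values are the pattern text, shown here
// as C++ string literals.
void quote_literal_examples() {
    assert(quote_literal("a.b") == "\\Qa.b\\E");
    assert(quote_literal("x\\Ey") == "\\Qx\\E\\\\E\\Qy\\E");
}
```

Because only the exact two-character sequence \E is rewritten, the function never has to count how many backslashes precede it.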

As far as I can see, there is no way to embed a plain string into a regex, without escaping the string. Same with characters. Isn't that a bad omission for a library that becomes the new C++ standard? Escaping something just to unescape it on the other end of the function call seems unnecessary, and it opens up the possibility of ills like regex code injection if someone forgets to escape or does it wrong.
Good question. If this is Boost.Regex rather than the std library, then there is an option to treat a whole string as a literal (as Jim Bell mentioned), or you can enclose the part of a string that has to be treated as a literal in \Q...\E, as in Perl. Otherwise you're looking at a call to regex_replace to quote things for you. Off the top of my head, something like:

    regex e("[.\\[\\]{}()\\\\*+?|^$]");
    std::string my_escaped_string = "(?:" + regex_replace(my_string, e, "\\\\$&") + ")";

should do the trick.

HTH, John.
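As an illustration of the escaping approach, the following is a rough sketch that uses only the standard library. Instead of regex_replace it walks the string character by character, which sidesteps any differences in replacement-format syntax between regex libraries. The names escape_regex and escape_regex_examples are made up for this example, and the metacharacter list is simply the one from the character class above; it is an assumption that this list is sufficient for the regex grammar in use.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: puts a backslash in front of each of the characters
//   . [ ] { } ( ) \ * + ? | ^ $
// so that the result matches `text` literally when used as a pattern.
std::string escape_regex(const std::string& text) {
    static const std::string metacharacters = ".[]{}()\\*+?|^$";
    std::string out;
    out.reserve(text.size() * 2);
    for (char c : text) {
        if (metacharacters.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Small usage check with std::regex (default ECMAScript grammar).
void escape_regex_examples() {
    assert(escape_regex("a.b") == "a\\.b");

    const std::regex unescaped("a.b");               // '.' matches any character
    const std::regex escaped(escape_regex("a.b"));   // '.' matches only '.'

    assert(std::regex_match("axb", unescaped));      // the injection-style surprise
    assert(!std::regex_match("axb", escaped));
    assert(std::regex_match("a.b", escaped));
}
```

The unescaped case shows the concern raised at the start of the thread: a string that was meant literally silently changes what the pattern accepts.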
participants (4)
- Arno Schödl
- Jim Bell
- John Maddock
- Yechezkel Mett