[Regex] Is it possible to match any "linebreak"?

Is there a way to have a regex like [[:digit:]]{3}([^\n]+)\n? have \n match any line breaking character? (The processed files might have Unix or DOS line endings.)

I know I could modify the regex to [[:digit:]]{3}([^\n\r]+)[\n\r]*, but I was hoping there might be an easier solution, like a character class [[:linebreak:]] or an escaped character like \R. (I am using Boost 1.43 in the relevant project.)

Thank you and best regards
Christoph

Is there a way to have a regex like [[:digit:]]{3}([^\n]+)\n? have \n match any line breaking character? (The processed files might have Unix or DOS line endings.)
Use \R, see: http://www.boost.org/doc/libs/1_49_0/libs/regex/doc/html/boost_regex/syntax/... HTH, John.

Thank you John. Somehow I must have overlooked \R in the docs.

However, I still have one issue with \R. In the following code everything works fine if I use r2. If I use r1, the matches' captures do contain the newlines.

In short: ([^\\R]+) does not seem to capture all non-linebreak characters. Should it? Or is this just a misunderstanding on my part?

    #include <string>
    #include <boost/test/auto_unit_test.hpp>
    #include <boost/test/test_tools.hpp>
    #include <boost/regex.hpp>

    BOOST_AUTO_TEST_CASE(test_boost_regexp)
    {
        // problem case: the capture includes the line breaks
        boost::regex r1("\\d\\d\\d([^\\R]+)\\R*");
        // works fine
        // boost::regex r2("\\d\\d\\d([^\\r\\n]+)\\R*");
        boost::smatch what;

        std::string input("123hallo welt\n\r");
        BOOST_CHECK(boost::regex_match(input, what, r1));
        // ok with r2, but fails with r1: the capture does contain the newline
        BOOST_CHECK_EQUAL(what[1], "hallo welt");

        input = "123hallo welt\n";
        BOOST_CHECK(boost::regex_match(input, what, r1));
        BOOST_CHECK_EQUAL(what[1], "hallo welt");

        input = "123hallo welt\r";
        BOOST_CHECK(boost::regex_match(input, what, r1));
        BOOST_CHECK_EQUAL(what[1], "hallo welt");
    }

Best regards
Christoph

No, you can't do that: \R isn't a character class, it's much more complex than that. Look at the link above to see what it maps to. Something like [^\x0A-\x0D\x85\x{2028}\x{2029}] would be closer to the inverse. HTH, John.
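
To make the "spell out the inverse" idea concrete, here is a minimal sketch. It is not John's code. It uses std::regex from the C++11 standard library rather than Boost.Regex, purely so that it builds without Boost. std::regex has no \R line-break escape, so the line-ending part is written out by hand and covers only the ASCII terminators; the \x85, \x{2028} and \x{2029} cases from the class above are deliberately left out. The helper name payload_after_digits is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of Boost or of the original post):
// returns the text that follows three leading digits, without any
// trailing line terminators, or an empty string if the input does
// not have that shape.
std::string payload_after_digits(const std::string& line)
{
    // [^\x0A-\x0D]+           one or more characters that are not LF, VT, FF or CR
    // (?:\r\n?|[\x0A-\x0C])*  hand-written, ASCII-only stand-in for \R*
    static const std::regex re(
        R"(\d\d\d([^\x0A-\x0D]+)(?:\r\n?|[\x0A-\x0C])*)");

    std::smatch what;
    if (std::regex_match(line, what, re))
        return what[1].str();
    return std::string();
}
```

Called with "123hallo welt\n", "123hallo welt\r" or "123hallo welt\r\n", the helper is meant to return "hallo welt" in each case. With Boost.Regex the same shape should carry over by swapping std:: for boost:: and, presumably, keeping \R* for the tail while using the explicit negated class for the capture.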

Ok, thank you.
participants (2)
- Christoph Duelli
- John Maddock