regex_replace and Unicode (Cyrillic) problem
Hi,
I have a string in which the 2 Cyrillic words "новый дом" are repeated 3 times.
I am trying to apply regex_replace to each occurrence of these words.
For this I use the format string std::string format("$1 красный $2")
and the regex pattern ("(\\W+)\\s+(\\W+)").
The result is that only the last occurrence is replaced; the 2 preceding ones
are not.
Where is my mistake?
My code:
#include <iostream>
#include <string>
#include "boost/regex.hpp"
using namespace std;
int main(int argc, const char** argv)
{
std::string str( "новый дом, новый дом новый дом" );
regex regx("(\\W+)\\s+(\\W+)");
std::string format( "$1 красный $2");
cout<<"regex_replace :"<
On Sat, 31 Mar 2012 11:05:37 +0200, valery O
Hi Valery,

First, your pattern should be:

(\\w+)\\s+(\\w+)

Note the lowercase \w: \w matches a word character, while \W matches any character that is not one.

Second, you are probably not setting the locale for the regex properly. I do not have a machine with a Russian system locale at hand to check the default behavior, but I succeeded using basic_regex::imbue():

#include <iostream>
#include <string>
#include "boost/regex.hpp"
#include <locale>
using namespace std;
int main(int argc, const char** argv)
{
    std::string str( "новый дом, новый дом новый дом" );
    boost::regex regx;
    // Set the locale first, then assign the pattern.
    regx.imbue(std::locale("russian"));
    regx.assign("(\\w+)\\s+(\\w+)");
    std::cout << "Search string: " << str << ", pattern: " << regx.str() << std::endl;
    std::string format( "$1 красный $2");
    cout << "regex_replace: " << regex_replace( str, regx, format ) << std::endl;
    return 0;
}

gives:

Search string: новый дом, новый дом новый дом, pattern: (\w+)\s+(\w+)
regex_replace: новый красный дом, новый красный дом новый красный дом

Note that you must assign the pattern after the imbue() call: imbue() invalidates the pattern if it is called afterwards.

--
Slava
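
For what it is worth, the "only the last occurrence is replaced" symptom can probably be reproduced without any Russian locale. The sketch below is an illustration, not code from the thread: it uses std::regex rather than Boost so that it is self-contained, assumes the source file is saved as UTF-8, and the helper name replace_in_c_locale is hypothetical. In the classic "C" locale none of the Cyrillic bytes count as word characters, so the greedy (\W+) swallows everything up to the last space and the whole string becomes a single match.

```cpp
#include <locale>
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): runs regex_replace with the
// thread's format string on a regex explicitly bound to the classic "C" locale.
// Assumption: string literals are UTF-8 encoded.
std::string replace_in_c_locale(const std::string& text, const std::string& pattern)
{
    std::regex re;
    re.imbue(std::locale::classic());  // set the locale first ...
    re.assign(pattern);                // ... then compile the pattern
    return std::regex_replace(text, re, std::string("$1 красный $2"));
}
```

Under these assumptions, calling replace_in_c_locale("новый дом, новый дом новый дом", "(\\W+)\\s+(\\W+)") should return "новый дом, новый дом новый красный дом", which matches the behavior Valery describes. With the lowercase pattern "(\\w+)\\s+(\\w+)" and the same "C" locale, nothing matches and the string comes back unchanged, which is why the locale has to be set as well.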
participants (2)
- valery O
- Viatcheslav.Sysoltsev@h-d-gmbh.de