2011/3/17 Julian Gonggrijp <j.gonggrijp@gmail.com>

Olivier Tournaire wrote:

> I am not really familiar with regexps, and I am facing a
> problem. I have some strings containing Unicode escape sequences
> (like "\u****"), and I would like to replace them with HTML
> character references (such that "\u****" becomes "&#x****;"). I think I
> can do that with Boost.Regex, but I really do not know how.
> The major problem is that I do not know in advance what the
> "****" characters are. I do know, however, that there are
> always 4 of them and that they are alphanumeric. So I have to
> detect them and also append a ";" after them.

I have no experience with Boost.Regex, but these are the
notations you need.

Search pattern: "\\u(\w{4})"

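It seems that we also have to escape the "\" in "\u" once more, because the pattern is written inside a C++ string literal (the same applies to the "\" in "\w"). Written as a C++ string literal, the working regex seems to be:

"\\\\u(\\w{4})"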

Best regards,

Olivier


Replacement pattern: "\&#x\1;"

Here, "\1" stands for "the text matched by the first
parenthesized group", so that's your four characters. You'll
have to refer to the Boost.Regex manual to find out how to
apply these patterns.

HTH, Julian
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users