[regexp] Replace a substring with a regexp

Hi all, I am not really familiar with regexp, and I am facing a problem. I have some strings containing Unicode escape sequences (like "\u****"), and I would like to replace them with HTML character references (so that "\u****" becomes "&#x****;"). I think I can do that with Boost.Regex, but I really do not know how. The main problem is that I do not know in advance what the characters for "****" are. I do know, however, that there are always four of them and that they are alphanumeric. So I have to detect them and also append a ";" after them. Do you have any hints on how to do that? Best regards, Olivier

Olivier Tournaire wrote:
I have no experience with Boost.Regex, but these are the notations you need. Search pattern: "\\u(\w{4})" Replacement pattern: "\&#x\1;" Here, "\1" stands for "the match to the first pattern in parentheses", so that's your four characters. You'll have to refer to the Boost.Regex manual to find out how to apply these patterns. HTH, Julian

Olivier Tournaire wrote:
I'm surprised. The first backslash was already there to escape the second. Are there maybe two steps of backslash interpretation at work, one by the C++ compiler and one by Boost.Regex? In any case, I forgot to escape the backslash in "\w"; you'd probably have to give that one the same treatment, and likewise the ones in "\&" and "\1" in the replacement pattern. But if you get the right result as the patterns stand right now, that's of course also fine. :) -Julian

2011/3/18 Julian Gonggrijp
I forgot to say that I finally used Qt (since I already use it in my project), which has a convenient QString::replace method that handles regular expressions.
Yes, you are right, I should have also mentioned them.
But if you get the right result as the patterns stand right now, that's of course also fine. :)
Thank you for pointing me in the right direction! Regards, Olivier
participants (3)
- John Maddock
- Julian Gonggrijp
- Olivier Tournaire