regex_replace for removing words
Hello,
I have some texts and I would like to remove words from them. The words to remove are stored in a std::vector<std::string>, and a std::string holds the characters that separate the words in the text. My first idea is to use boost::split to split the text into words, remove the unwanted words from the resulting vector, and rebuild the text by concatenating the remaining elements. Another idea is to build a regex from the words to remove and the separators, and use regex_replace to remove them.
Which idea is better, or is there another way to remove words from a text?
Thanks,
Phil
I would create a regular expression of all the words you want to remove:
\<(?:word1|word2|word3|word4)\>
Then use regex_replace with "" as the replacement string.
HTH, John.
On 22.05.2011 at 18:58, John Maddock wrote:
I would create a regular expression of all the words you want to remove:
\<(?:word1|word2|word3|word4)\>
Then use regex_replace with "" as the replacement string.
Thanks. I now have a problem building the expression. The word list is generated automatically, so I must escape each word correctly. Do you have an idea how I can do this? Some words can be web content such as < or >, and others contain characters like '.
Can I do a case-insensitive replace, or must I convert my text to one case first?
Thanks,
Phil
participants (2)
- John Maddock
- Kraus Philipp