Hello all,
I'm reading an HTML file and parsing data from it. I've encountered a spurious string (researcher’s). Notepad++ shows that string as researcher's. I'm reading the file using the std::fstream class. Is there any way I can read the string and get the spurious characters replaced using the Boost string or regex algorithms?
Thanks,
Surya
On Wed, Sep 30, 2009 at 10:54 PM, Surya Kiran Gullapalli wrote:
> Hello all,
> I'm reading an html file and parsing data from it. I've encountered a spurious string (researcher’s). Notepad++ shows that string as researcher's.
> I'm reading the file using std::fstream class. Is there any way i can read the string and get the spurious characters replaced using boost string/regex algorithms?
It is not spurious; the file is probably UTF-8 or some similar multi-byte encoding, and the reader is showing each byte as a separate character. Can you attach it so that we can confirm that? But yes, you can do such a replacement with Boost.Regex, although I would recommend doing it with Boost.Xpressive instead (the docs even have an example of such a replace, and if you use the static version it will run faster).
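One way to confirm the encoding is to look at the raw bytes rather than at what an editor displays. The following is only a sketch, assuming the file is read in binary mode with std::ifstream; the helper names read_file_bytes and hex_of_non_ascii are made up for illustration and are not part of Boost or the standard library.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read a whole file as raw bytes. Binary mode avoids newline translation,
// so multi-byte sequences arrive exactly as they are stored on disk.
// Returns an empty string if the file cannot be opened.
std::string read_file_bytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Return every non-ASCII byte of `text` as two lowercase hex digits,
// separated by single spaces. ASCII bytes (below 0x80) are skipped.
std::string hex_of_non_ascii(const std::string& text) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(text[i]);
        if (byte < 0x80) continue;
        if (!out.empty()) out += ' ';
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}
```

If the file is UTF-8 and the odd character is the curly apostrophe U+2019, calling hex_of_non_ascii on the word in question would return "e2 80 99", which is the UTF-8 encoding of that code point.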
Hello,
The HTML page is located at "http://photography.nationalgeographic.com/photography/photo-of-the-day/north..."
By the way, when googling around for Xpressive, the search results pointed to http://lists.boost.org/boost-users/2008/08/39761.php, which says Xpressive is not directly usable with UTF-8. I did not find any examples of Xpressive with UTF-8 strings.
Does Boost.Regex with ICU have an answer to my question? (I'm going over it now.)
Surya
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
On Thu, Oct 1, 2009 at 2:08 AM, Surya Kiran Gullapalli wrote:
> Hello, The html page is located at "http://photography.nationalgeographic.com/photography/photo-of-the-day/north..."
> btw, when googling around for xpressive the search results pointed to http://lists.boost.org/boost-users/2008/08/39761.php, which says xpressive is not directly usable with utf-8. I did not find any examples of xpressive with utf-8 strings.
> does boost::regex with icu have answer to my question? (i'm going over it now)
Xpressive does not have direct support for UTF-8, but it does work fine with plain character (byte) strings, which is all your search and replace would require.
Looking at the webpage now... Ah, yep, that is not a ' symbol or a ` symbol; it is one of those 'special' forward tick symbols that Microsoft Word and similar programs insert (the right single quotation mark, U+2019), which, yes, encodes as the bytes e2 80 99 (hex, copied from the hex program I opened the page with). They are annoying as all freaking heck, but yes, Regex or Xpressive would work fine (and Xpressive would work faster in static mode).
That is not even the correct placement of a forward tick; someone screwed up there anyway.
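Since the offending character is a fixed three-byte sequence, a byte-wise search and replace is enough, and no UTF-8 awareness is needed. Below is a minimal sketch that uses only std::string, so it does not depend on which regex library is chosen; replace_all_bytes is a hypothetical helper name. With Boost available, boost::algorithm::replace_all from <boost/algorithm/string/replace.hpp> should do the same job in place.

```cpp
#include <string>

// Replace every occurrence of the byte sequence `from` with `to`.
// The string is treated as raw bytes, so a multi-byte UTF-8 sequence
// such as "\xE2\x80\x99" is matched like any other substring.
std::string replace_all_bytes(std::string text,
                              const std::string& from,
                              const std::string& to) {
    if (from.empty()) return text;  // nothing to search for
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}
```

For example, replace_all_bytes(html, "\xE2\x80\x99", "'") would turn the bytes behind researcher’s into researcher's, assuming html holds the file contents read in binary mode. Keep the escape sequence in its own string literal when the next character is a hex digit, because C++ hex escapes consume as many hex digits as they can.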
Thanks,
I'll give it a try.
Surya
participants (2)
- OvermindDL1
- Surya Kiran Gullapalli