Many thanks, Anthony. Your version of the code works much better than mine.
Thanks
Subhash
On Fri, Jan 6, 2012 at 2:25 AM, Anthony Foiani wrote:

S Nagre writes:

    std::string escapeChar = "\\" ;
    std::string bChar = "b";
    std::string dotChar = ".";
    std::string findWordInStr = escapeChar + bChar + dotChar + escapeChar + bChar;

This ends up with the expression "\\b.\\b", which will only ever match a single character with word break on either side (so, in your example, it should match all and only the spaces):

    "Hello World and Google"
          ^     ^   ^

Closer would be "\\b.+?\\b", but that would still match on your spaces:

    "Hello World and Google"
     ^    ^^    ^^  ^^

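The two claims above can be checked with a small sketch. It assumes `std::regex` from the C++11 standard library (default ECMAScript grammar) in place of Boost.Regex, and `find_all` is a hypothetical helper name, not something from the original code. It records each match together with its offset from the start of the whole string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): collect
// (offset from the start of text, matched text) for every match.
// Uses std::regex as a stand-in for Boost.Regex.
std::vector<std::pair<std::size_t, std::string>>
find_all(const std::string& text, const std::string& pattern)
{
    std::vector<std::pair<std::size_t, std::string>> hits;
    const std::regex re(pattern);
    for (std::sregex_iterator it(text.begin(), text.end(), re), last;
         it != last; ++it)
    {
        // Compute the offset directly from the iterators so it is
        // always relative to the beginning of the whole string.
        const std::size_t offset =
            static_cast<std::size_t>((*it)[0].first - text.begin());
        hits.emplace_back(offset, it->str());
    }
    return hits;
}
```

Called on "Hello World and Google", the pattern "\\b.\\b" should yield the three spaces at offsets 5, 11 and 15, while "\\b.+?\\b" should yield seven matches: the four words plus the three spaces between them.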
If you really want words, you are best off deciding what constitutes a word, and then writing the regex for exactly that purpose. There is the built-in "\\w" character class, but only you can decide whether things like apostrophes and hyphens break words. (And that's just in English; I have no idea what constitutes a word break in most other languages!) For English, I'd consider something like "[\\w'-]+" (which should be: all word chars, plus apostrophes, plus hyphens).
And as a matter of personal taste, I'd likely write it exactly that way, as a single literal. (I do sometimes decompose my regexes, but only if they have repeated subsections that are better described by a variable name.)
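To see what the choice of character class changes, here is a minimal sketch that again assumes `std::regex` as a stand-in for Boost.Regex. `extract_words` is a hypothetical helper introduced only for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): return every
// substring of text matched by pattern, in order of appearance.
std::vector<std::string>
extract_words(const std::string& text, const std::string& pattern)
{
    std::vector<std::string> words;
    const std::regex re(pattern);
    for (std::sregex_iterator it(text.begin(), text.end(), re), last;
         it != last; ++it)
    {
        words.push_back(it->str());
    }
    return words;
}
```

With the input "Don't over-think it", the pattern "[\\w'-]+" keeps "Don't" and "over-think" whole and returns three words, whereas plain "\\w+" splits at the apostrophe and the hyphen and returns five pieces: "Don", "t", "over", "think", "it".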
You also had a small logic error, when you wrote this:
OffSetMap[foundPos] = foundLen;
"foundPos" is relative to the start of the last search, not to the start of the whole string.
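The difference between the two offsets can be made concrete with a small sketch. It assumes `std::regex` instead of Boost.Regex, and `relative_and_absolute` is a hypothetical helper name. It runs one search beginning at a given index and reports both the position the match object gives and the position within the whole string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not from the original post): search text
// starting at index from and return
//   first  = match position relative to the start of this search,
//   second = match position relative to the start of the whole string.
// Returns (-1, -1) if nothing matches.
std::pair<std::ptrdiff_t, std::ptrdiff_t>
relative_and_absolute(const std::string& text, std::size_t from)
{
    const std::regex re("[\\w'-]+");
    std::smatch what;
    const std::string::const_iterator start =
        text.begin() + static_cast<std::ptrdiff_t>(from);
    if (!std::regex_search(start, text.end(), what, re))
        return {-1, -1};
    // position() counts from the iterator the search began at.
    const std::ptrdiff_t relative = what.position();
    // Add the distance already consumed to get the true offset.
    const std::ptrdiff_t absolute = (start - text.begin()) + relative;
    return {relative, absolute};
}
```

For "Hello World" searched from index 5, the match "World" is reported at relative position 1, but it sits at offset 6 in the string. Storing the relative value as the map key is the error described above.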
Here's my version:
| #include <iostream>
| #include <map>
| #include <string>
|
| #include <boost/foreach.hpp>
| #include <boost/regex.hpp>
|
| typedef int int32;
|
| typedef std::map< int32, int32 > offset_map_t;
|
| void create_offset_map( const std::string & str,
|                         offset_map_t & offset_map )
| {
|     std::cout << "searching '" << str << "'" << std::endl;
|
|     boost::regex re( "[\\w'-]+" );
|
|     boost::smatch what;
|
|     std::string::const_iterator start = str.begin();
|     std::string::const_iterator end   = str.end();
|
|     while ( boost::regex_search( start, end, what, re ) )
|     {
|         // position() is relative to 'start', not to str.begin()
|         int32 pos = what.position();
|         int32 len = what.length();
|
|         std::cout << " found '" << what.str( 0 ) << "'"
|                   << " at pos=" << pos << ", len=" << len << std::endl;
|
|         // advance to the match, record its absolute offset, then skip it
|         start += pos;
|         offset_map[ start - str.begin() ] = len;
|         start += len;
|     }
|
|     BOOST_FOREACH( const offset_map_t::value_type & p, offset_map )
|         std::cout << " ( " << p.first << ", "
|                   << p.second << " )" << std::endl;
| }
|
| int main( int argc, char * argv [] )
| {
|     for ( int i = 1; i < argc; ++i )
|     {
|         offset_map_t my_map;
|         create_offset_map( argv[i], my_map );
|     }
|     return 0;
| }

Hope this helps.
Best Regards,
Anthony Foiani
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users