Re: extract url with boost::regex

26 Nov 2007

No, I hadn't seen this example. I was reading the documentation and examples for regex_iterator, not regex_token_iterator. I searched other websites but didn't find this example. I will study it.

Thanks, John, for always answering, and always answering so fast. I was completely discouraged. Thanks again!

----- Original Message ----
From: John Maddock <john@johnmaddock.co.uk>
To: boost-users@lists.boost.org
Sent: Monday, 26 November 2007, 17:50:31
Subject: Re: [Boost-users] extract url with boost::regex

hallouina-ml@yahoo.fr wrote:
...
Hello,
I am trying to extract a URL from a web page. It's almost done, but
completely unoptimised.
Before this I tried a regex iterator, but I didn't understand the
documentation.
:-(

Did you see this example: http://www.boost.org/libs/regex/example/snippets/regex_token_iterator_eg_2.c...

It does exactly what you want: it extracts all the URLs from an HTML file.
...
boost::regex rexp(".*(http:\\/\\/.+)\"*.*");
and I get this result:
http://www.nolife-tv.com/"
http://www.nolife-tv.com">
http://www.nolife-tv.com/images/stories/noiz/1.jpg"
http://www.nolife-tv.com/component/option,com_poll/task,results/id,16/Itemid...';"
...
http://www.joomla.org"
http://www.google-analytics.com/urchin.js"
http://www.omniture.com
and so on...
I want to cut this down and get only the URL, without the " or '.
Why does this regex capture the " as well? I put the closing parenthesis
before the ", so why? I already tried \\" instead of \".
Because the .* at the end of the expression will match whatever text
follows the ". The grouping construct (...) spits out a *sub-expression*,
which you can access via the match_results::operator[] or
match_results::str(i) methods.

HTH, John.

[...]
