Could searching and indexing Boost docs work better?

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Steven Watanabe
Sent: Thursday, September 16, 2010 3:47 AM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] selective enabling/disabling of BOOST_AUTO_TEST_CASE
AMDG
Suresh Kumar wrote:
I have several tests in my unit test file, each of them uniquely named using the macro BOOST_AUTO_TEST_CASE. I don't always need to run all of them, and as of now what I do is comment out the tests which I don't want to run.
Instead of commenting out code, is there any other way I can selectively enable or disable tests I have created with BOOST_AUTO_TEST_CASE?
You can control which tests are run on the command line. See http://www.boost.org/libs/test/doc/html/utf/user-guide/runtime-config/run-by-name.html
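For example, if the test module's executable is my_test, something like this selects tests at runtime (the test names here are made up):

    ./my_test --run_test=test_parser                # run a single test case
    ./my_test --run_test=test_parser,test_lexer     # run several, by name
    ./my_test --run_test=io_*                       # wildcards select groups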
Well, thanks Steven - now that's something I have wanted to know but didn't dare ask ;-)

But I have now squandered a few minutes trying again to find this information using Google (putting myself in the shoes of the OP).

A search for "Running specific test units selected by their name" didn't get me to this page (on the first page at least). Adding site:boost.org helped reduce clutter from other unit test systems. Even "Running specific test units selected by their name" site:boost.org (note: the quotes mean search for the exact text) didn't get to the current (or nearly current) release by name, and I didn't spot this page quickly.

Using boost.org and entering "Running specific test units selected by their name" into the search box also didn't produce the latest docs page above, only out-of-date stuff. Limiting the search to www.boost.org didn't get the 1.44 docs (only a misleading ref to 1.38).

Could this be why there are so many questions from puzzled users on this list?

Is there anything we can do to make Google work better for us?

Paul

---
Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK
+44 1539 561830 07714330204
pbristow@hetp.u-net.com

On Thu, Sep 16, 2010 at 3:54 AM, Paul A. Bristow wrote:
But I have now squandered a few minutes trying again to find this information using Google (putting myself in the shoes of the OP).
A search for "Running specific test units selected by their name" didn't get me to this page (on the first page at least).
Adding site:boost.org helped reduce clutter from other unit test systems.
Even "Running specific test units selected by their name" site:boost.org (note: the quotes mean search for the exact text)
didn't get to the current (or nearly current) release by name, and I didn't spot this page quickly.
Using boost.org and entering "Running specific test units selected by their name" into the search box also didn't produce the latest docs page above, only out-of-date stuff. Limiting the search to www.boost.org didn't get the 1.44 docs (only a misleading ref to 1.38).
This is because of the way our versioned docs are done. You can always get to the latest version of a page by substituting "release" for the version number in the URL, but because that is just a redirect, it doesn't end up in Google's index. Maybe the "release" URLs should be the actual pages that are redirected to from the latest version number, and we should be telling Google not to index any of the pages that have a version number in the URL.
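Concretely, flipping the redirect might look something like this in the server configuration (hypothetical Apache mod_rewrite - I don't know what the site actually runs):

    RewriteEngine On
    # Make the "release" URLs the real pages, and turn the latest
    # numbered URLs into permanent redirects pointing at them.
    RewriteRule ^/doc/libs/1_44_0/(.*)$ /doc/libs/release/$1 [R=301,L]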
Could this be why there are so many questions from puzzled users on this list?
Is there anything we can do to make Google work better for us?
I think it's a matter of picking a site structure that's less esoteric, and more like what most other projects do.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

On Sat, Sep 18, 2010 at 6:36 AM, Dave Abrahams wrote:
On Thu, Sep 16, 2010 at 3:54 AM, Paul A. Bristow wrote:
Using boost.org and entering "Running specific test units selected by their name" into the search box also didn't produce the latest docs page above, only out-of-date stuff. Limiting the search to www.boost.org didn't get the 1.44 docs (only a misleading ref to 1.38).
This is because of the way our versioned docs are done. You can always get to the latest version of a page by substituting "release" for the version number in the URL, but because that is just a redirect, it doesn't end up in Google's index. Maybe the "release" URLs should be the actual pages that are redirected to from the latest version number, and we should be telling Google not to index any of the pages that have a version number in the URL.
I don't like this. You might end up missing some functionality that was changed or removed since a prior version. Users of 1.40 packaged with their favorite Linux distro might not be able to find the docs for their old Filesystem code.

How about a notice at the top of the page saying that it might not be for the latest and greatest version, with a link to the "release" version?

Or (more complex) do what MSDN does, and have a box on each page like "This page is specific to Boost X. Other versions are also available for the following: Boost Y, Boost Z".

--
Cory Nelson
http://int64.org

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Cory Nelson
Sent: Saturday, September 18, 2010 4:37 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Could searching and indexing Boost docs work better?
On Sat, Sep 18, 2010 at 6:36 AM, Dave Abrahams wrote:
On Thu, Sep 16, 2010 at 3:54 AM, Paul A. Bristow wrote:
Using boost.org and entering "Running specific test units selected by their name" into the search box also didn't produce the latest docs page above, only out-of-date stuff. Limiting the search to www.boost.org didn't get the 1.44 docs (only a misleading ref to 1.38).
This is because of the way our versioned docs are done. You can always get to the latest version of a page by substituting "release" for the version number in the URL, but because that is just a redirect, it doesn't end up in Google's index. Maybe the "release" URLs should be the actual pages that are redirected to from the latest version number, and we should be telling Google not to index any of the pages that have a version number in the URL.
I don't like this. You might end up missing some functionality that was changed or removed since a prior version. Users of 1.40 packaged with their favorite Linux distro might not be able to find the docs for their old Filesystem code.
How about a notice at the top of the page saying that it might not be for the latest and greatest version, with a link to the "release" version?
Or (more complex) do what MSDN does, and have a box on each page like "This page is specific to Boost X. Other versions are also available for the following: Boost Y, Boost Z".
Neither of these solves the real problem - Google is not indexing the release version; surely this is the most important of all?

Ideally, all versions should be indexed, but this too might be confusing - very many index entries for the same item in *all* the old versions.

Perhaps including 'release' in the search terms? Or some 'old' flag - but you can't add that later (after the pages have been indexed), can you? And I'm not sure whether Google will respect that as a 'must have' or a 'must not' term.

I'm not sure how to solve this, but I feel it is really rather important. A Google search on boost.org at least should find the release version.

Paul

---
Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK
+44 1539 561830 07714330204
pbristow@hetp.u-net.com

Hi,

Maybe it's enough just to provide a sitemap.xml for web crawlers? http://en.wikipedia.org/wiki/Sitemap

Regards,
michi7x7
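A minimal sitemap file looks something like this (the URL below is illustrative, not a real entry):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.boost.org/doc/libs/release/libs/test/doc/html/index.html</loc>
        <lastmod>2010-09-19</lastmod>
        <changefreq>monthly</changefreq>
      </url>
      <!-- one <url> entry per page worth indexing -->
    </urlset>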

On Sep 19, 2010, at 9:58 AM, Paul A. Bristow wrote:
Neither of these solves the real problem - Google is not indexing the release version; surely this is the most important of all?
Ideally, all versions should be indexed, but this too might be confusing - very many index entries for the same item in *all* the old versions.
Perhaps including 'release' in the search terms?
Or some 'old' flag - but you can't add that later (after the pages have been indexed), can you?
And I'm not sure whether Google will respect that as a 'must have' or a 'must not' term.
I'm not sure how to solve this, but I feel it is really rather important.
A Google search on boost.org at least should find the release version.
Does anyone on this list have (or have a friend with) SEO expertise? They would know how to address this. Otherwise, Paul, maybe you could try to ask Google themselves?

--
Dave Abrahams
BoostPro Computing
http://boostpro.com

On 19 September 2010 15:20, David Abrahams wrote:
Does anyone on this list have (or have a friend with) SEO expertise? They would know how to address this. Otherwise, Paul, maybe you could try to ask Google themselves?
The problem is that bots are currently blocked from accessing the documentation pages because they are considered too expensive. I've reduced the cost of serving the documentation pages, but apparently that's not good enough because the regression test results are still served from zipfiles. Someone needs to sort that out with whoever runs the regression testing system.

All they need to do is set up their script to unzip the results somewhere on the server after uploading, or possibly just upload the pages without zipping them (maybe not a good idea - there are a lot of pages), and then let me know where they are on the server.

But it might be the case that the cost of regularly unzipping/rsyncing the files would be greater than the cost of serving them from a zipfile. I suspect that only a tiny minority of pages are accessed between uploads (and ignoring bots - there's no need to remove the block on the regression results themselves).

Daniel
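P.S. Something like the following at the end of the upload step would do it - a sketch only, since the actual paths and upload mechanism here are guesses:

    #!/bin/sh
    # Expand the freshly uploaded results into a plain directory
    # the web server can serve directly, then swap it into place.
    unzip -q -o /uploads/results.zip -d /tmp/results-new &&
    rsync -a --delete /tmp/results-new/ /var/www/regression-results/ &&
    rm -rf /tmp/results-new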

Does anyone on this list have (or have a friend with) SEO expertise? They would know how to address this. Otherwise, Paul, maybe you could try to ask Google themselves?
The problem is that bots are currently blocked from accessing the documentation pages because they are considered too expensive. I've reduced the cost of serving the documentation pages, but apparently that's not good enough because the regression test results are still served from zipfiles. Someone needs to sort that out with whoever runs the regression testing system.
Surely the test result pages aren't accessed that much? As long as they're blocked from the bots, they shouldn't take up too much CPU time even when zipped?

IMO it's unacceptable not to have the Boost documentation indexed by the web crawlers - would it be possible to allow indexing of the current release *only* - and see how that goes?

Cheers, John.
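P.S. For what it's worth, a robots.txt along these lines might express that policy (the paths are hypothetical - I haven't checked how the doc trees are actually laid out on the server):

    User-agent: *
    # Keep the numbered doc trees out of the index...
    Disallow: /doc/libs/1_
    # ...but leave the "release" alias crawlable. ("Allow" is an
    # extension honoured by the major crawlers, not part of the
    # original robots.txt standard.)
    Allow: /doc/libs/release/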

On Mon, Sep 20, 2010 at 12:57 AM, John Maddock wrote:
Does anyone on this list have (or have a friend with) SEO expertise? They would know how to address this. Otherwise, Paul, maybe you could try to ask Google themselves?
The problem is that bots are currently blocked from accessing the documentation pages because they are considered too expensive. I've reduced the cost of serving the documentation pages, but apparently that's not good enough because the regression test results are still served from zipfiles. Someone needs to sort that out with whoever runs the regression testing system.
Surely the test result pages aren't accessed that much? As long as they're blocked from the bots they shouldn't take up too much CPU time even when zipped?
IMO it's unacceptable not to have the Boost documentation indexed by the web crawlers - would it be possible to allow indexing of the current release *only* - and see how that goes?
Cheers, John.
A single sitemap would work for all the big search engines. A sitemap lets you specify how often something is updated -- their spiders should index it far less often if you specify a long term.

You might be able to do the same thing with HTTP cache headers. Depending on your web server you might be able to configure entire subdirs to cache until the end of time.

--
Cory Nelson
http://int64.org
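To illustrate, with Apache's mod_expires something like this would mark a whole released doc tree as cacheable more or less forever (a sketch - the directory path is a guess):

    <Directory "/var/www/doc/libs/1_44_0">
        ExpiresActive On
        # Docs for a released version never change, so clients and
        # proxies can safely cache everything for a long time.
        ExpiresDefault "access plus 1 year"
    </Directory>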

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Cory Nelson
Sent: Monday, September 20, 2010 9:53 AM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Could searching and indexing Boost docs work better?
On Mon, Sep 20, 2010 at 12:57 AM, John Maddock wrote:
Does anyone on this list have (or have a friend with) SEO expertise? They would know how to address this. Otherwise, Paul, maybe you could try to ask Google themselves?
The problem is that bots are currently blocked from accessing the documentation pages because they are considered too expensive. I've reduced the cost of serving the documentation pages, but apparently that's not good enough because the regression test results are still served from zipfiles. Someone needs to sort that out with whoever runs the regression testing system.
Surely the test result pages aren't accessed that much? As long as they're blocked from the bots they shouldn't take up too much CPU time even when zipped?
IMO it's unacceptable not to have the Boost documentation indexed by the web crawlers - would it be possible to allow indexing of the current release *only* - and see how that goes?
I agree that this should be the thing we try to achieve immediately.
A single sitemap would work for all the big search engines. A sitemap lets you specify how often something is updated -- their spiders should index it far less often if you specify a long term.
You might be able to do the same thing with HTTP cache headers. Depending on your web server you might be able to configure entire subdirs to cache until the end of time.
Long term, site maps look like the *right* way to do this - but it would best be done by someone who knows what they are doing (definitely NOT me!!).

http://en.wikipedia.org/wiki/Site_map
http://www.sitemaps.org/

Shall I start a new thread on boost-users asking for help on generating a site map?

Paul

---
Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK
+44 1539 561830 07714330204
pbristow@hetp.u-net.com
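P.S. As a starting point, generating one needn't be much more than a short script - something like this (a sketch, untested; the paths and URL base are guesses):

    #!/usr/bin/env python
    # Walk the release documentation tree and print a sitemap.xml
    # listing every HTML page found under it.
    import os

    DOC_ROOT = "htdocs/doc/libs/release"               # hypothetical path
    URL_BASE = "http://www.boost.org/doc/libs/release"

    print '<?xml version="1.0" encoding="UTF-8"?>'
    print '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    for dirpath, dirnames, filenames in os.walk(DOC_ROOT):
        for name in filenames:
            if name.endswith(".html"):
                rel = os.path.relpath(os.path.join(dirpath, name), DOC_ROOT)
                print '  <url><loc>%s/%s</loc></url>' % (URL_BASE, rel.replace(os.sep, "/"))
    print '</urlset>'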

On 20 September 2010 09:53, Cory Nelson wrote:
You might be able to do the same thing with HTTP cache headers. Depending on your web server you might be able to configure entire subdirs to cache until the end of time.
I've already done that, although only for a year. I also implemented If-Modified-Since and If-None-Match.

Daniel
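For anyone unfamiliar with those headers: they let a returning client (or crawler) revalidate a page without re-downloading it. Roughly, with illustrative values:

    GET /doc/libs/release/index.html HTTP/1.1
    Host: www.boost.org
    If-None-Match: "5f3a-4b2"
    If-Modified-Since: Mon, 06 Sep 2010 12:00:00 GMT

    HTTP/1.1 304 Not Modified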

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Daniel James
Sent: Monday, September 20, 2010 8:02 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Could searching and indexing Boost docs work better?
On 20 September 2010 09:53, Cory Nelson wrote:
You might be able to do the same thing with HTTP cache headers. Depending on your web server you might be able to configure entire subdirs to cache until the end of time.
I've already done that, although only for a year. I also implemented If-Modified-Since and If-None-Match.
This is good - but unless I've lost the thread, we still haven't fixed the really important need: don't we need to get the *current release* indexed by Google (and others)?

Paul

---
Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK
+44 1539 561830 07714330204
pbristow@hetp.u-net.com
participants (7)
- Cory Nelson
- Daniel James
- Dave Abrahams
- David Abrahams
- John Maddock
- michi7x7
- Paul A. Bristow