Re: [boost] Review request: auto-index tool

24 Jan 2011

-----Original Message-----
From: boost-bounces@lists.boost.org
[mailto:boost-bounces@lists.boost.org]
On Behalf Of Bryce Lelbach
Sent: Friday, January 21, 2011 12:44 PM
To: boost@lists.boost.org
Subject: Re: [boost] Review request: auto-index tool
...
...
> :) Hey John. As one of the users of this tool, I'd be happy to do an
> (informal) review of this.

I'm also a user of John's Auto-index and I'd like to indulge in a few
thoughts on indexing, and indeed Boost documentation in general.

Boost documentation has improved markedly since Quickbook came into use, and
was further improved when Doxygen was added.

PDF versions are very conveniently self-contained and include all the
invaluable hyperlinking that the html version has.
(PDF versions are Quickbook's killer advantage?).

(Aside - I note how few documents from government and corporate sites
provide decent internal hyperlinking. Quickbook helps us do much better.)

These two tools are especially useful for documenting big and/or complex
libraries, of which there are an increasing number.

But I think we still have some way to go, especially in the ease (or indeed
possibility) of *finding* what one seeks to know.

(As I've observed before, I still suspect we are *not* using Google to full
advantage because the full documentation set (or even the partial PDF
documentation set) is not being indexed (because it isn't visible except as
a zip on Sourceforge?). I have struggled to find things I know are there
somewhere.  However this is a separate issue.)

I've used and/or produced auto-indexes for Boost.Math, Boost.Units and the
GSoC SVG_plot utility,
and have recently converted the Boost.Pool docs from plain html to Quickbook
with Doxygen and finally added auto-indexes.

These are all big libraries and my brain isn't big enough to remember all
the details - 
like Homer Simpson, my brain is full and if anything new comes in, some of
the old stuff has to go! ;-)

So I've also become a frequent *user* of the indexing as well as an author
(with the advantage that I can remember that a piece of info exists, but
can't remember where it is - and quickly get cross when I can't find it!).

Doxygen indexes well enough (although some templated code can confuse it,
and it is a bit picky about the position of the Doxygen comments). I prefer
Doxygen's standalone module layout, but then you don't get the nice
structure and mark-up features for the text part that Quickbook provides.
There are few libraries where simply feeding the uncommented header files
to Doxygen provides effective documentation - despite what some authors
hopefully imagine.

With a Quickbook/Doxygen "reference" section, one can find the syntax of a
specific named function. John Maddock's auto-index makes it very easy to
find *named* things.  

But the *major unsolved need* is finding something for which *you do not
know the (function/class...) name*.

Few functions have an obvious purpose, an obvious name or an obvious
result.  So just showing the structure of the classes etc. isn't much help
in deducing what functions do.

My experience is that using the PDF version and searching with likely terms
is often the only way to find things.  But with the 500+ page documents we
are now producing, the search takes some time, so an index of the *text*
would still be useful.

And while the syntax is useful, when it comes to finding the pre-conditions,
the post-conditions, and what a function actually does, our docs prove much
less effective.

This most valuable information for the user *can* come from the additional
information provided as Doxygen comments. The disadvantage of doing this is
that it clutters the code, often badly (though the use of syntax colouring
allows even my brain to filter the comments out easily.  <aside> I *really,
really, hate* the current syntax color for comments.  We must be able to let
users choose easily.  </aside>).  It also documents the code well for
maintainers - a need that can only increase as more original authors get
other lives.
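
As a minimal sketch of the kind of comment meant here, the pre-conditions,
post-conditions and purpose can sit right next to the declaration. Note that
round_up is a hypothetical helper invented for this illustration (it is not
taken from Boost.Pool or any other library); only the Doxygen commands
(\brief, \param, \pre, \post, \returns) are real.

```cpp
#include <cassert>
#include <cstddef>

/*! \brief Round a requested size up to the next multiple of an alignment.

  \details Hypothetical helper, used only to show the comment layout.

  \param requested Number of bytes asked for by the caller.
  \param alignment Granularity in bytes.
  \pre alignment > 0.
  \post The result is >= requested and is a multiple of alignment.
  \returns The smallest multiple of alignment that is not less than requested.
*/
inline std::size_t round_up(std::size_t requested, std::size_t alignment)
{
  assert(alignment > 0); // Check the documented pre-condition in debug builds.
  const std::size_t remainder = requested % alignment;
  return remainder == 0 ? requested : requested + (alignment - remainder);
}
```

The comment is several times longer than the code, which is exactly the
clutter complained about above, but it answers the "what does it do" question
that the bare signature cannot.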

But I am convinced that adding Doxygen comments is the most effective next step.
I've followed this for the SVG_plot package (a good example as it has
zillions of functions, so many that I definitely can't remember them, even
though I wrote many of them!). 

I've also added Doxygen comments to the classes and functions during
conversion to Quickbook of the Boost.Pool library.  This helps a lot IMO
(and would be even better if someone more expert could expand these
additions and correct my misunderstandings).

Doxygen can even tell you which things you haven't yet provided a comment
for, so we can ensure complete 'comment coverage'.
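
To illustrate what 'comment coverage' means in practice, here is a small
sketch. The length_converter class is hypothetical and exists only for this
example. I am assuming a Doxyfile with WARN_IF_UNDOCUMENTED = YES and
EXTRACT_ALL = NO, in which case Doxygen should warn about the one member that
carries only an ordinary comment.

```cpp
#include <cassert>

/// \brief Hypothetical unit converter, used only to illustrate comment coverage.
class length_converter
{
public:
  /// \brief Construct a converter for a given resolution.
  /// \param dpi Output resolution in dots per inch.
  /// \pre dpi > 0.
  explicit length_converter(double dpi) : dpi_(dpi)
  {
    assert(dpi > 0); // Check the documented pre-condition in debug builds.
  }

  /// \brief Convert a length in points (1/72 inch) to pixels.
  /// \param points Length in points.
  /// \returns The same length in pixels at this converter's resolution.
  double to_pixels(double points) const { return points * dpi_ / 72.0; }

  // An ordinary comment is not a Doxygen comment, so this member is
  // undocumented as far as Doxygen is concerned.
  double to_points(double pixels) const { return pixels * 72.0 / dpi_; }

private:
  double dpi_; ///< Resolution in dots per inch.
};
```

Changing the leading // on to_points to /// (and adding a \brief) would be
enough to bring it into the documented set.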

(I have trusted that adding comments retrospectively should be a low-risk
operation because any bad changes should be picked up by re-running the
existing tests.  I hope I'm not wrong about this!)

Adding Doxygen-style comments is a laborious job, especially tedious as an
after-thought activity, and requires a pretty detailed understanding of the
workings of the code.  So it is best done as the code is first written - new
authors note!

Finally I feel all the documentation still needs a conventional word index
of the text part.  I'm trying to (ab-)use John Maddock's auto-indexing for
this.  It won't be fully automatic - but doing it by hand is inconceivable.
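
To make the idea of a conventional word index concrete, here is a rough
sketch of what such an index amounts to: a hand-picked list of terms, each
mapped to the sections whose text mentions it. This is *not* how auto-index
works internally (it is driven by its own script file); the section type and
the normalise and build_word_index functions are names made up for this
illustration.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// A documentation section: a title plus its plain text (hypothetical type).
struct section
{
  std::string title;
  std::string text;
};

// Lower-case a word and drop every character that is not alphanumeric or '_',
// so "Pre-conditions," and "preconditions" compare equal.
inline std::string normalise(const std::string& word)
{
  std::string out;
  for (unsigned char c : word)
    if (std::isalnum(c) || c == '_') out += static_cast<char>(std::tolower(c));
  return out;
}

// Map each chosen term to the titles of the sections whose text contains it.
// The terms must already be in normalised form.
inline std::map<std::string, std::set<std::string>>
build_word_index(const std::vector<section>& sections,
                 const std::set<std::string>& terms)
{
  for (const std::string& t : terms)
    assert(normalise(t) == t); // Terms are expected to be pre-normalised.

  std::map<std::string, std::set<std::string>> index;
  for (const section& s : sections)
  {
    std::istringstream words(s.text);
    std::string word;
    while (words >> word) // Split the section text on whitespace.
    {
      const std::string key = normalise(word);
      if (terms.count(key) != 0) index[key].insert(s.title);
    }
  }
  return index;
}
```

The list of terms is the part that cannot be automated: somebody who knows
the library has to decide which words a reader is likely to look up.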

So I feel we need to encourage library authors to produce:

1  PDF version as well as HTML (and so use Quickbook?).

2  Quickbook and Doxygen reference section.

3  Doxygen comments in C++ code about everything. (Perhaps GSoC or
Boost.Guild people could volunteer to do this?).

4  Auto-indexing of functions, classes ... as standard. (So we need to get
auto-index into the next Boost release package).

5  Conventional word index - indexing of words in the text.

6  A user-friendly Quickbook template so new authors can get going quickly.

(I don't feel I am alone in finding it pretty hard going to get all the
tools set up: jamfiles, and correct links to tools and headers.  Moving the
code messes up some 'would-be' relative addresses in jamfiles.  Getting the
locations right for images (users' graphs, equations, logos, admonitions,
navigation icons, png versus svg images etc.) has caused me grief.
The requirement to use only relative addressing means that there are loads
of copies of image files bloating the file system.  And to produce a
packable html requires installing images and style sheet(s).)

Paul
---
Paul A. Bristow,
Prizet Farmhouse, Kendal LA8 8AB  UK
+44 1539 561830  07714330204
pbristow@hetp.u-net.com