
My review of the proposed logging library follows.

Summary:

I was disappointed by the amount of code needed to get the most basic logging working. The use of macros to declare loggers seems like a bad design choice. The absence of support for syslog was surprising. I encountered many compiler warnings. The documentation was unsatisfactory. Many of the features that I would have liked to see were missing. Despite all this, John seems to be enthusiastic and responsive: there is a real chance that he will be able to fix many of these problems.

Detail:

In order to evaluate the library I decided to apply it to Decimail, an IMAP mail server that I wrote. It currently does some fairly basic logging of the IMAP conversation, enabled by a flag in its configuration, to syslog.

Most of the programs that I've written do some sort of logging. Typically it's something as simple as:

    #include <iostream>
    cerr << "starting to frobnicate\n";

or

    #include <syslog.h>
    syslog(LOG_INFO, "starting to frobnicate");

or

    #include <assert.h>
    assert(frob != -1);

These techniques are all very simple to use, and I feel that a logging library needs to be equally straightforward for basic operations:

    #include <boost/logging.hpp>
    log << "starting to frobnicate";

Of course it needs to offer more features than that - but the most basic use needs to be simple. (If it's not that simple, then potential users will instead just write to cerr as above; as their needs slowly grow they'll add their own ad-hoc logging functions; it will then be too late to take the leap to a logging library.)

So I started by reading the main_intro.html and getting_started.html docs. I eventually got to "Example 1", which described how I had to add two new files to my project (21 lines) and call an initialisation function from main(). I could then log by changing my code as follows:

    if (Configuration::singleton().get_bool("imapd_log_imap",false)) {
      //syslog(LOG_MAIL|LOG_DEBUG,"{%d} S: * %s",num,msg.c_str());
      L_ << "{" << num << "} S: * " << msg;
    }

Having fixed one error in the example (g_l_filter vs. g_log_filter), and despite copious warnings, I was able to see log output on the console (twice?) and in a file. This was all a lot more effort than I was expecting, but it did seem to work.

I found the use of macros in example 1 unsatisfactory. Specifically, the logger "object" which I declare in the header and to which I send my log messages (L_ in the code above) is actually a macro, not an object. This means that I have to handle it with great care. For example, say I currently have:

    void do_something(ostream& log_to, .......) {
      log_to << "starting to do something";
      ...
    }

and I want to convert this to use this library. I write:

    void do_something(logger& log_to, .......) {
      log_to << "starting to do something";
      ...
    }

This fails - and probably with an incomprehensible error message.

The motivation for the macro is sensible: it evaluates to something like

    if (l.level > current_level) l

so that time-consuming functions in the log line are not evaluated if they will not actually be stored. But I would much prefer to see this optimisation made more explicit, so that the user can tell what's going on and avoid it if they want or need to:

    logger l;
    l << "message";   // wasteful if logging is disabled, but do I care -
    foo(l);           // - because this works as expected.

    LOG_IF_ENABLED(l) << expensive_function();   // if I do care, I can invoke a macro explicitly
    #define ll LOG_IF_ENABLED(l)                 // or if I want conciseness I can _choose_ to do this
    ll << expensive_function();

Or I could do this, without a macro at all, if I were the sort of person who felt that all macros were evil:

    if (l.enabled()) l << expensive_function();

I think that what's practically needed to make this work is to bundle together the current filter and logger objects into a single _object_, rather than bundling them together using a macro.
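To make that concrete, here is a rough sketch of the sort of bundled object I have in mind; the names (logger, enabled, expensive_function) are my own invention for illustration and are not part of the proposed library:

    #include <iostream>
    #include <string>

    std::string expensive_function() { return "lots of detail"; }   // stands in for something costly

    // Hypothetical bundled logger: the filter state and the destination live
    // inside one ordinary object, so it can be passed around by reference.
    class logger {
    public:
      explicit logger(std::ostream& dest, bool enabled = true)
        : dest_(dest), enabled_(enabled) {}

      bool enabled() const { return enabled_; }
      void set_enabled(bool e) { enabled_ = e; }

      // Arguments are still evaluated here - the "wasteful but simple" case.
      template <typename T>
      logger& operator<<(const T& value) {
        if (enabled_) dest_ << value;
        return *this;
      }

    private:
      std::ostream& dest_;
      bool enabled_;
    };

    void do_something(logger& log_to) {
      log_to << "starting to do something\n";        // works as expected
    }

    void frobnicate(logger& l) {
      if (l.enabled()) l << expensive_function();    // explicit opt-out of evaluation
    }

Because the filter lives inside the object, passing it to do_something(logger&) just works, and the lazy-evaluation macro becomes an optional convenience layered on top rather than the only interface.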
Next I tried to use the library's "filter" mechanism in place of the if-condition you can see above. I looked at "Step 1: Filtering the message" at workflow.html, expecting to find a filter that I could initialise with my condition (as a value, or by reference to a variable, or by passing a functor or function pointer). But it seems that all of the provided filter classes are things to provide various sorts of thread-safety. Writing my own is not difficult, though it might be more conventional to use a functor (i.e. operator()()) rather than a class with an is_enabled member [this would allow you to use things like lambda expressions, I think]. So I wrote:

    struct log_imap_filter {
      bool is_enabled() const {
        return Configuration::singleton().get_bool("imapd_log_imap",false);
      }
    };

Using this involved more macros:

    // log.hh:
    typedef log_imap_filter filter_type;
    BOOST_DECLARE_LOG_FILTER(g_log_filter, filter_type)

    // log.cc:
    BOOST_DEFINE_LOG_FILTER(g_log_filter, filter_type)

I don't know why these need to be macros. To me it seems to obfuscate what is going on.

The last thing that I needed to replace my syslog call was to specify the log level, which I noticed in workflow.html under "Step 2A: Gathering the message". (The fact that I couldn't find any description of this fairly central functionality anywhere else, e.g. in any of the pages in the tutorial section, seems typical of the documentation.) Based on that example, I tried:

    L_(dbg) << "happily debugging";

which failed to compile. I haven't worked out what I should have written. I did try it with a namespace prefix.

The final thing that I looked at in detail was scoped logging. When I'm writing more detailed logging code (e.g. for debugging) I generally make it scoped, i.e. reporting interesting function entries and, normally, exits. I like to indent each scope by a few characters in the log output, e.g.

    void foo() {
      cerr << "foo starting\n";
      for (int i=0; i<5; ++i) {
        blah(i);
      }
      cerr << "foo done.\n";
    }

    void blah(int i) {
      cerr << "  blah " << i << "\n";   // note indentation
      ...
    }

I read the scoped_logs.html page, and it seems that the BOOST_SCOPED_LOG[_CTX] macro deals with the starting/ending business. Compared to my explicit start/done output above, it has the feature that the done message will be generated if the function returns using an explicit return statement or if it throws an exception. This may be an advantage or a disadvantage: with my code, the absence of the done message would suggest that an exception might have occurred. Ideally the end-of-scope log message would indicate whether it was invoked during exception handling; could that be done using compiler-magic? But why does it have to be a macro? And there doesn't seem to be any indentation or other way of matching up the start and end messages. If they were matched up, then a log file viewer tool could provide collapse/expand tree view functionality (does Purify still do something like that?).
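On the compiler-magic question: I believe a plain RAII object can get most of the way there, since std::uncaught_exception() can be consulted in the destructor (it has well-known corner cases, but it covers the common one). A rough sketch, with invented names and a simplistic global depth counter rather than anything thread-safe:

    #include <exception>
    #include <iostream>
    #include <string>

    // Hypothetical scope logger: prints indented start/done messages and
    // notes in the "done" line whether we are leaving because of an exception.
    class scope_log {
    public:
      explicit scope_log(const std::string& name) : name_(name) {
        indent();
        std::cerr << name_ << " starting\n";
        ++depth_;
      }
      ~scope_log() {
        --depth_;
        indent();
        if (std::uncaught_exception())
          std::cerr << name_ << " done (exception in flight).\n";
        else
          std::cerr << name_ << " done.\n";
      }
    private:
      void indent() const {
        for (int i = 0; i < depth_; ++i) std::cerr << "  ";
      }
      std::string name_;
      static int depth_;   // would need to be per-thread in real code
    };

    int scope_log::depth_ = 0;

    void blah(int i) {
      scope_log s("blah");
      // ... work, logged indented relative to foo's messages ...
    }

    void foo() {
      scope_log s("foo");
      for (int i = 0; i < 5; ++i) blah(i);
    }

The indentation depth also gives a log viewer enough structure to match up the start and done lines and offer the collapse/expand tree view mentioned above.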
I was also surprised that no use was made of __LINE__, __FUNCTION__ etc. in the scoped logging or other macros.

I also don't see any way to couple the scoped logging features with enabling and disabling logging. Specifically, I might detect that something of interest is happening and want to increase the log level within that scope:

    void foo() {
      scoped_log sl(log);
      for (int i=0; i<100; ++i) {
        scoped_enabled_logging_if s(log, i==42);
        blah(i);   // logging enabled in here only if i is 42
      }
    }

I didn't look in much detail at the "destinations" functionality, but I'd like to note some of the features that syslog offers for files that it writes to:

- Flushing of files after each message is optional. Not flushing helps performance, but it might mean that critical messages are lost before a crash.
- Enormous numbers of identical messages within a short period are compressed to a single line, "last message repeated 193247 times".
- Periodic "MARK" lines can be generated to show that the application is still running.

Other features that I've seen in applications that do their own logging to files include:

- Creating new files for each sub-process or thread, by appending a thread identifier to the filename.
- Truncating individual messages to some limit (e.g. to avoid logging a rogue char* that points to a huge block of stuff with a \0 a megabyte later), or individual fields within the log message.

In my quick review I didn't see any of that functionality.

The rolling_file feature looks useful for long-running applications where it's necessary to move the log files (e.g. to compress them) without interrupting the process. I suggest referring to the logrotate utility for how this sort of thing is often handled on Unix servers; for example, it can be configured to send a signal to a process to tell it to close and re-open its log file with the same filename.

Finally I'd like to look at how the user can control the logging. I don't see any built-in functionality for this. I have a couple of suggestions: environment variables and command-line options.

An environment variable could be used as follows. The code would do something like this:

    g_l()->writer().write(::getenv("MYAPP_LOG_FMT"), ::getenv("MYAPP_LOG_DEST"));

The programmer's choice of format and destination may or may not override the user's choice via the variables. The user could then control the format and destination when invoking the program:

    $ export MYAPP_LOG_FMT='%time%($hh:$mm.$ss.$mili) [%idx%] |\n'
    $ export MYAPP_LOG_DEST='file(debug.log)'
    $ myapp

Firefox has something a bit like that, IIRC.

For command-line integration, I suggest providing an initialisation function that processes argc and argv looking for debug options, and strips them:

    int main(int argc, char* argv[]) {
      log.init(argc, argv);   // modifies argc and argv
      // following code doesn't see debug options
      ...
    }

DirectFB has something like that. Again you could use this to define formats and destinations. You could also give the user control over scoped filtering via a command-line interface. For example:

    $ myapp --log:default=none --log:parser=debug --log:scheduler=info

The names 'parser' and 'scheduler' would match scoped log-enable blocks. I could imagine hierarchical names too. I've used one commercial C++ application that has this sort of logging (you can indicate component, log level and destination), and it proved useful.
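To make the command-line suggestion concrete, here is a rough sketch of an init function that strips --log: options and records per-component levels; the names, the option syntax and the lookup behaviour are all my own invention, not part of the proposed library:

    #include <map>
    #include <string>

    // Hypothetical command-line hook: scans argv for "--log:component=level"
    // options, records them, and strips them so the rest of the program
    // never sees them.
    class log_options {
    public:
      void init(int& argc, char* argv[]) {
        int out = 1;
        for (int in = 1; in < argc; ++in) {
          std::string arg = argv[in];
          if (arg.compare(0, 6, "--log:") == 0) {
            std::string::size_type eq = arg.find('=');
            if (eq != std::string::npos)
              levels_[arg.substr(6, eq - 6)] = arg.substr(eq + 1);
          } else {
            argv[out++] = argv[in];   // keep non-log arguments
          }
        }
        argc = out;
        argv[argc] = 0;               // preserve the argv[argc] == 0 convention
      }

      // e.g. level_for("parser") == "debug" after
      //   myapp --log:default=none --log:parser=debug
      std::string level_for(const std::string& component) const {
        std::map<std::string, std::string>::const_iterator i = levels_.find(component);
        if (i != levels_.end()) return i->second;
        i = levels_.find("default");
        return i != levels_.end() ? i->second : "info";
      }

    private:
      std::map<std::string, std::string> levels_;
    };

main() would call init(argc, argv) before anything else, and each named log-enable block would look its level up with level_for(); hierarchical names could be handled by trimming trailing components off the name until a match is found.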
Conclusion:

* What is your evaluation of the design?

Breaks up the problem into a sensible set of components. Basic usage requires too much effort. Lacks many features that I would find useful. Pervasive use of macros is unappealing.

* What is your evaluation of the implementation?

Seems to work. Many compiler warnings do not inspire confidence in code quality. Have not reviewed the code in any detail.

* What is your evaluation of the documentation?

Poor. But in part, that's because it is documenting the considerable effort required to get basic functionality working.

* What is your evaluation of the potential usefulness of the library?

Everyone needs logging. If they do it using this library, they'll have to write a lot of code themselves to implement their own filters, destinations and so on. In practice they won't bother and will write their own ad-hoc code.

* Did you try to use the library? With what compiler? Did you have any problems?

Yes, with gcc-4.1 on Linux. I encountered various warnings, which John has already resolved.

* How much effort did you put into your evaluation?

About 6 hours.

* Are you knowledgeable about the problem domain?

I've written and used numerous programs with logging features, but have never attempted to write a logging library.

And finally, every review should answer this question:

* Do you think the library should be accepted as a Boost library?

No, not in its current form.

Phil.