[log] Boost.Log Review...

YES, the Boost.Log library SHOULD BE ACCEPTED into the Boost release distribution. Summary:

*) Boost.Log isn't perfect, but after having reviewed most of its alternatives, and having recently converted from using glog, I like the utility and architecture provided by Boost.Log the most. It comes close to solving all logging problems for everyone. Closing the last 5% of the gap between what Boost.Log does and what I ideally wanted was tedious and required me to develop a much higher level of understanding than I would have liked, but it was possible.

*) In solving all logging problems for all people, Boost.Log's documentation is quite abstract.

*) I would like to see the design of templates/classes flipped so that the starting point for interacting with and beginning to use Boost.Log begins with customizing Log Records and ends with defining backends and backend filtering. This is a fundamental design choice, however, and one that I'm willing to accept given the value that Boost.Log provides.

*) I've been saving most of this feedback for the better part of two months now, so excuse the length and rambling. Certain points of the library I struggled with and ultimately, begrudgingly, resigned myself to accepting as "the way it is" vs. "the way it should be."

*) My perspective on Boost.Log is framed by recent use of Google's glog library and, in a previous life, having written performance-oriented binary logging libraries for high-TPS systems (in C, however, not C++).
- What is your evaluation of the potential usefulness of the library?
Boost.Log is quite useful and holds the highest promise of any logging library that I am aware of because of its ability to accommodate the following common use case: supporting asynchronous logging from multiple frontend threads with a backend able to order the output and send different messages to different files. If it weren't for the ability of this library to meet that specific design requirement, I would still be using Google's glog library.
- What is your evaluation of the design?
The design is very modular. Too modular, in fact: the flexibility and required understanding are actually a barrier to entry for making use of the library (i.e. now that I'm using the library, I know more about the innards of Boost.Log than I would like or think is reasonable). After using this library for several months now, I have distilled a quick summary for those coming up to speed on how to use Boost.Log and its components. This is specific to how I'm using Boost.Log, but I think this is the desired use case for most users of Boost.Log (especially since this is the common path for users of ASIO).

- Your application is littered with Log Statements that are tagged with different Severity Levels
- Log Statements consist of a Logger (a collection of Sinks), a Precondition (e.g. a severity) and Log Record instance data (most commonly a message)
- Loggers are collections of Sinks (e.g. a file and the console)
- Loggers conditionally accept Log Records if one of a Logger's Sinks returns true for a Precondition (e.g. only create a Log Record if the log statement's Severity Level is greater than or equal to the Severity Level configured in the Sink)
- Instructions required to create a Log Record won't be executed if a Sink doesn't accept a Log Record
- Sinks can be configured to queue accepted Log Records (e.g. async frontends) before passing them to one or more Backends for printing (e.g. console, text file, SNMP trap)

The Boost.Log system can be configured to mutate individual Log Records through the manipulation of the following three types of information:

- Channels represent a way of tying together a collection of related systems (e.g. security, network, application). Think of channels as a Sink-level observer that stamps the channel on a Log Record.
- Severities represent the importance of a message (a Log Statement precondition that a Sink-level Observer could stamp on a Log Record)
- Attributes represent per-Log Record information (a string key/value in a Log Record instance)

Backends can then format the output based on the Attributes of a particular Log Record.

The above wasn't explicitly distilled in the documentation in a single place, but I think future developers would find it very useful to see it spelled out explicitly. Given the extremely modular design of the library, I understand why the docs are the way they are, and I know that some of this is touched on in the Design Overview, but until I became very competent with the library, the documentation seemed too abstract. Despite the importance of logging, becoming a logging expert was tangential to the problem I was trying to solve, and I would have personally found the above summary very useful.

While glog is a super-thin logging API, Boost.Log feels like a complete and modular logging system. In glog, seeing:

    VLOG(3) << "asdf";

where 3 can be (and was) a #define, the system ended up feeling very arbitrary and unstructured.

A hindsight-20/20 architecture and terminology for Boost.Log could be simplified in terms of:

*) Log Record pre-conditions. Sinks control whether or not a Log Record is to be generated
*) Loggers. A container for zero or more Sinks
*) Sinks. A frontend that queues Log Records
*) Sink Compile-Time Attributes/Observers. Observers that are configured at compile time, augment Log Records, and stamp Log Records as they pass through the Sink (e.g. severity, channels, etc.)
*) Backends. A formatter that orders and writes Log Records from Sinks
*) Log Records. A container of information that is populated by the user's application, Sinks and Backends (possibly Loggers as well)
*) Log Record Compile-Time Attributes. Compile-time members of a Log Record struct
*) Log Record Runtime Attributes. Key/Value members of a Log Record stored in a std::map (currently how Attributes are implemented)

I'd imagine most developers would want async messaging between Sinks and Backends. Those that don't could make use of a Sink Compile-Time Attribute that always sets the flush bit on a Log Record.
- What is your evaluation of the implementation?
Generally speaking, the internals of the implementation are pretty clean, especially given the size and the various ways that Boost.Log can be cobbled together.

Also in hindsight: the developer interface for interacting with the library could be improved by making Log Records the fundamental unit that everything hangs off of. Making Log Records easy-to-extend containers that carry domain-specific information without std::map seems like a better design choice given how frequently developers will interact with Log Records. The use case of Log Records that maintain stats that get pumped through a logging system struck me as particularly useful, but unnecessarily expensive with std::map.

As was noted in other reviews, the cost of evaluating whether or not a Sink is going to accept a Log Record is of some concern. Figuring out a way to have Log Record preconditions evaluated as quickly as possible should be important given the frequency with which logging code is (or should be) deployed in most applications. I remember reading someone else's benchmark saying that it was ~2500 instructions to evaluate whether or not a Sink should accept a given Log Record. For something that should macro-expand to roughly:

    for (logger::iter = logger.begin(); iter->accept(severity); ++iter)
        iter->stream()

I haven't profiled, but when most accept() functions are doing:

    inline bool accept(SevT s) const { return s >= sev_filter; }

2500 ops seems expensive to me.

The notion of a "global logging core" bothers me. Having singletons of loggers and sinks in TSS is very good from a utility perspective. Admittedly I don't know how the global logging core is used beyond being a knob that can be turned to enable/disable logging, nor have I profiled it yet, but any centralized core bugs me in a highly threaded environment.

Questions:

*) Why does Boost.Log roll its own TLS/TSS support (boost/log/detail/thread_specific.hpp vs. boost/thread/tss.hpp)? There may be a very good reason; I just didn't see it when I was looking through the library.

*) Can Log Records be pool-managed somehow?

*) Log Records: I understand why std::map is used to store attributes (ease of use/flexibility), but at the same time, I do wish that frequently stored attributes could be baked in at compile time. This is my biggest gripe with the library, actually. I would like my application to err on the side of making use of compile-time information at the expense of the time required to compile the source. Storing things in a std::map seems lazy and brutish, for lack of a better description. Flipping things on their head so that Log Records define Loggers, Sinks and Backends would be ideal, IMHO, and would solve most of my gripes. This is a design decision, but the contents of Log Records are of more importance than the backends. Boost.Log appears to be designed from Backends to Log Records vs. Log Records to Backends, which is the way data flows through an application (the latter would be my preference). For example:

    // quick pseudo-code using MI to create my_log_record and downstream types
    typedef enum { info = 0, warn = 1, crit = 2 } severity_t;

    struct my_log_record :
        boost::log::record::attribute::map< std::string, std::string >,
        boost::log::record::attribute::severity< severity_t >,
        boost::log::record::attribute::message,
        my_app::log_record::attribute::src_port
    {};

    typedef struct my_log_record log_record_t;
    typedef SinkT< log_record_t > sink_t;
    typedef LoggerT< sink_t > logger_t;
    typedef BackendT< sink_t > backend_t;

Observers could then be placed on Sinks that would stamp Log Records with Severities, or Channels, or in whatever hodge-podge configuration the developer wanted, and such examples would be very cookbook.

*) Log Records are highly reused and seem like ideal candidates for a TLS freelist.
- What is your evaluation of the documentation?
Pretty complete, albeit abstract. It lacks examples for customized records and best-practice guidelines. A few other suggestions for things that I had to hunt down or figure out:

*) Global loggers vs. per-object references to singletons
*) Channel severity loggers example
*) Setting up an initial console logger as line one in main()
*) Changing a backend filter at a later date
*) Using async logging for INFO level logs, but switching to sync for CRIT errors
*) Obtaining the severity of a particular logger

Outstanding doc questions:

*) How do you remove log events given a severity level at compile time? Currently this isn't explicitly defined in the documentation like it is in Google's glog library: http://google-glog.googlecode.com/svn/trunk/doc/glog.html#strip

*) Is there a way to set a bit in a Log Record that overrides a Sink's filter? For instance, there are certain events that I want to make sure go to the console 100% of the time without changing the severity of the log message. Do I just log twice? It'd be keen if Log Records could have a bit that could be set and would override a filter:

    bool log_to_console = true;
    BOOST_LOG_SEV_CONS(lg, warning, log_to_console) << "my message";

Other comments:

*) It should be noted that the internal async log message queuing mechanism uses a vector, and during very bursty logging activity the vector can grow large and never shrinks. This is a known limitation; the vector was chosen to facilitate atomic append operations for log records. A std::deque would be preferred for bursty applications, since space could be reclaimed more easily, but as Andrey pointed out, it would require a full mutex, which would reduce the performance of the library.

*) I find the argument against Lazy Streaming of attributes to be weak: http://boost-log.sourceforge.net/libs/log/doc/html/log/rationale/why_not_laz...
The current BOOST_LOG() expands to a for() loop whose body isn't executed unless a Sink in a Logger accepts the record. From boost/log/sources/record_ostream.hpp:

    #define BOOST_LOG_STREAM_WITH_PARAMS_INTERNAL(logger, rec_var, params_seq)\
        for (BOOST_LOG_AUTO(rec_var, (logger).open_record((BOOST_PP_SEQ_ENUM(params_seq)))); !!rec_var; rec_var.reset())\
            ::boost::log::aux::make_pump_stream((logger), rec_var)

So why can't Log Records have compile-time attributes that can be populated with typed attributes in a stream?

    BOOST_LOG_SEV(lg, warning) << "My log has an attribute src_port"
        << log_record_t::set_src_port(conn.src_port)
        << " this continues to append to the Log Record's message";

And then be able to extract the compile-time attribute in a backend:

    // Dashed-out pseudo-code to make a point (I know the syntax is garbage
    // in src_port's formatter).
    typedef struct backend : boost::log::record::base_record {
        typedef int src_port_type;
        int src_port;
    } record_t;
    typedef boost::log::sinks::text_file_backend< record_t > backend_t;

    b->set_formatter(
        fmt::stream
            << fmt::attr<unsigned int>("LineId", keywords::format = "%08x")
            << " Src Port: "
            << fmt::attr< record_t::src_port_type >(record_t::src_port)
    );

*) Dovetailing on the above two points, a flush bit for a given Log Record would be very nice as well. 99.999% of the time async logging is desired, but in the 0.001% of cases when a Log Record must get out to disk, chances are that one rare Log Record is disproportionately valuable compared to a normal Log Record. Having a flush() that doesn't return until the backend has fsync(2)'ed the data would be really spiffy:

    BOOST_LOG_SEV(lg, warning) << "My message" << lg::flush;

As it is, you have to call:

    sink->feed_records(::boost::posix_time::seconds(0));

which isn't as elegant, and I don't believe it actually signals to the backend that data needs to be flushed to disk (it simply drains the frontend).
Developers that want synchronous logging could define an observer that always sets the sync bit and would almost certainly be optimized away at compile-time.
- Did you try to use the library? With what compiler? Did you have any problems?
FreeBSD 7+ and Darwin, with GCC 4.2.1, 4.3, and 4.4. The tip in Subversion has been working well for me. RC4 was problematic due to bjam configuration problems.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Developed a reasonably in-depth understanding of the internals and its use.
- Are you knowledgeable about the problem domain?
Sufficiently adept in the problem domain.

Parting thoughts: As I said in the first paragraph, this library should be accepted into Boost because it's the best that I've evaluated and see available on the landscape. My compliments to Andrey, and likely Trojo as well. All of my critical feedback is hindsight-20/20 after having successfully integrated Boost.Log into my application. I am happy with Boost.Log, though I do see room for improvement in the future.

Forward thinking: The more times I have to tweak and revisit logging, for whatever reason, the more convinced I am that a better and more elegant design would result if Backends, Sinks, et al. were defined in terms of Log Records. If there is a Boost.Log2, I would love to see the architecture flipped on its head as suggested. I believe such an architecture would:

*) improve the evaluation of a Sink's preconditions for accepting/creating a new Log Record
*) allow Log Records to contain bits that are set by observers and are propagated through a Log Record's life cycle back to the Backends
*) result in easier application-specific logic (e.g. flush or console redirection)
*) improve performance when preconditions are evaluated
*) improve performance because fixed-size Log Records could be memory-managed on thread-specific pools
*) allow observers to be chained together into components that would be easier to set up and manipulate; and
*) could be used to allow data to stream to a backend (namely application stats)

Further, by constraining the scope of what Boost.Log is capable of to a more rigid/fixed architecture that gets optimized at compile time, the library would be easier to understand and use (e.g. change the design so that not all Sinks or Backends accept all Log Records). Hopefully someone finds this retrospective useful in the future.
-sc

PS: I started to put my money where my mouth is and cobbled together something that crudely demonstrates Log Records -> Backends, but came up short getting streaming attributes to work, for lack of time to focus on getting an overloaded operator<<() working. The sample can be found here if anyone is interested; discussion of it should be taken offline since it's tangential to the current Boost.Log review process. http://codepad.org/8ACp0obH

--
Sean Chittenden
sean@chittenden.org

On 03/12/2010 03:32 AM, Sean Chittenden wrote:
After using this library for several months now, I have distilled a quick summary for those coming up to speed on how to use Boost.Log and its components. This is specific to how I'm using Boost.Log, but I think this is the desired use-case for most users of Boost.Log (esp since this is the common path for users of ASIO).
- Your application is littered with Log Statements that are tagged with different Severity Levels
- Log Statements consist of a Logger (a collection of Sinks), a Precondition (e.g. a severity) and Log Record instance data (most commonly a message)
- Loggers are collections of Sinks (e.g. a file and the console)
- Loggers conditionally accept Log Records if one of a Logger's Sinks returns true for a Precondition (e.g. only create a Log Record if the log statement's Severity Level is greater than or equal to the Severity Level configured in the Sink)
- Instructions required to create a Log Record won't be executed if a Sink doesn't accept a Log Record
- Sinks can be configured to queue accepted Log Records (e.g. async frontends) before passing them to one or more Backends for printing (e.g. console, text file, SNMP trap)
The Boost.Log system can be configured to mutate individual Log Records through the manipulation of the following three types of information:
- Channels represent a way of tying together a collection of related systems (e.g. security, network, application). Think of channels as a Sink-level observer that stamps the channel on a Log Record.
- Severities represent the importance of a message (a Log Statement precondition that a Sink-level Observer could stamp on a Log Record)
- Attributes represent per-Log Record information (a string key/value in a Log Record instance)
Backends can then format the output based on the Attributes of a particular Log Record.
The above wasn't explicitly distilled in the documentation in a single place, but I think future developers would find it very useful to see that spelled out explicitly. Given the extremely modular design of the library, I understand why the docs are the way they are and know that some of this is touched on in the Design Overview, but until I became very competent with the library, the documentation seemed too abstract. Despite the importance of logging, becoming a logging expert was tangential to the problem I was trying to solve and I would have personally found the above summary very useful.
Well, that summary may describe your use case very well, but it is by no means complete with regard to Boost.Log as a whole. I would even say it's misleading in some parts - in particular: "Loggers are collections of Sinks" and "Think of channels as a Sink-level observer".
A hindsight 20/20 architecture and terminology for Boost.Log could be simplified in terms of:
What is "20/20"? I'll continue my answers assuming that the following was your understanding of the current Boost.Log architecture.
*) Log Record pre-conditions. Sinks control whether or not a Log Record is to be generated
Looks like you're confusing sinks with filters. Also, don't forget about the global filter.
*) Loggers. A container for zero or more Sinks
Loggers and sinks are not related, and that is the intention of the library. Different log records made through the same logger may end up in any sink(s). This gives the ability to encapsulate logging configuration in a single place of the application, without affecting every place where logging is used.
*) Sinks. A frontend that queues Log Records
Only async frontends use record queuing.
*) Sink Compile-Time Attributes/Observers. Observers that are configured at compile time, augment Log Records, and stamp Log Records as they pass through the Sink (e.g. severity, channels, etc)
There is no model of this in Boost.Log. I take it that you'd like it to be there. More on this below.
*) Backends. A formatter that orders and writes Log Records from Sinks
Roughly speaking, yes. But "writes" is not the accurate term, as no actual writing may take place.
*) Log Records. A container of information that is populated by the user's application, Sinks and Backends (possibly Loggers as well)
Nod.
*) Log Record Compile-Time Attributes. Compile-time members of a Log Record struct
Same as "Sink Compile-Time Attributes/Observers".
*) Log Record Runtime Attributes. Key/Value members of a Log Record stored in a std::map (currently how Attributes are implemented)
The current solution is largely equivalent to a std::map, but it is not actually one.
I'd imagine most developer would want async messaging between Sinks and Backends.
That assumption is not valid. One of the most frequent applications of logging is debug output, which doesn't work well with async logging. In fact, I expect synchronous logging to be more common than in-process asynchronous logging.
Those that don't, could make use of a Sink Compile-Time Attribute that always sets the flush bit on a Log Record.
Flushing the other thread through the flag will be a time hog. Doing this on a per-record basis will bring performance down to an unusable level.
The notion of a "global logging core" bothers me. Having singletons of loggers and sinks in TSS is very good from a utility perspective. Admittedly I don't know how the global logging core is used beyond being a knob that can be turned to enable/disable logging, nor have I profiled it yet, but any centralized core bugs me in a highly threaded environment.
The core doesn't limit scalability as it doesn't block threads unless the user wants to configure it (e.g. add/remove sinks or global attributes). The tests I mentioned in another thread confirm that.
Why does Boost.Log roll its own TLS/TSS support? boost/log/detail/thread_specific.hpp vs. boost/thread/tss.hpp There may be a very good reason, I just didn't see it when I was looking through the library.
Because it's more efficient than what Boost.Thread provides. When thread_specific_ptr is fast enough, I'll be glad to switch back to it.
Can Log Records be pool managed somehow?
No. If you're concerned with dynamic memory allocation, it's better to use libraries dedicated to that problem (Google perftools, for instance).
I understand why std::map is used to store attributes (ease of use/flexibility), but at the same time, I do wish that frequently stored attributes could be baked in at compile time.
This is my biggest gripe with the library, actually. I would like my application to err on the side of making use of compile-time information at the expense of the time required to compile the source. Storing things in a std::map seems lazy and brutish, for lack of a better description. Flipping things on their head so that Log Records define Loggers, Sinks and Backends would be ideal, IMHO, and would solve most of my gripes. This is a design decision, but the contents of Log Records are of more importance than the backends. Boost.Log appears to be designed from Backends to Log Records vs. Log Records to Backends, which is the way data flows through an application (the latter would be my preference). For example:
[snip]
Observers could then be placed on Sinks that would stamp Log Records with Severities, or Channels, or in whatever hodge-podge configuration the developer wanted and such examples would be very cookbook.
Now that you have presented a code snippet, your suggestion makes more sense to me. Let me express my thoughts on it in two parts:

1. Custom log records. Yes, the model of "mandatory" attributes looks promising. It could at least partially solve the problem of possible library misconfiguration that was highlighted in another review. I imagine that some kind of log record prototype (or factory) would be defined by the user in order to construct log records with a specific set of attribute values. Filters and formatters could then request these mandatory attribute values without having to worry about their absence. Lookup performance may also be higher, though I doubt it will be compile-time. The idea is quite vague now, but after some thought it might become a good addition to the library.

2. Designing sinks (and the core, I presume) after the log records. No, that looks unacceptable to me, at least for what Boost.Log tries to achieve. This would introduce too much coupling between the logging side of the library (which tends to be exposed to many parts of the application) and the consuming side (which tends to be configured in a single place of the application). To my mind, these sides of the library should be maximally decoupled. Also, it's not clear how to support multiple log record types in such a design.
*) Log Records are highly reused and seem like ideal candidates for a TLS freelist.
I consider memory management out of scope of Boost.Log.
*) Global loggers vs. per-object references to singletons
Not sure I understand.
*) Channel severity loggers example
That example wouldn't be much different from the ones in sections about severity_logger or channel_logger. But I'll add a code sample.
*) Setting up an initial console logger as line one in main()
Hmm. Ok, I'll add something about that to the tutorial.
*) Changing a backend filter at a later date
Just call set_filter on the backend, that's all.
*) Using async logging for INFO level logs, but switching to sync for CRIT errors.
You can register two sinks - one synchronous and one asynchronous, and set their filters accordingly.
*) Obtaining the severity of a particular logger
You mean, the default severity? There's no public API for this, but there's a protected function default_severity in the logger feature. Why would you need it?
*) How do you remove log events given a severity level at compile time. Currently this isn't explicit defined in the documentation like it is in Google's glog library:
http://google-glog.googlecode.com/svn/trunk/doc/glog.html#strip
The library has no direct means for that. Users can conditionally define their logging macros, based on BOOST_LOG_SEV or pure C++ interface and then enable or disable the particular logging statements from the source.
*) Is there a way to set a bit in a Log Record that overrides a Sink's filter? For instance, there are certain events that I want to make sure go to the console 100% of the time without changing the severity of the log message. Do I just log twice? It'd be keen if Log Records could have a bit that could be set and would override a filter.
    bool log_to_console = true;
    BOOST_LOG_SEV_CONS(lg, warning, log_to_console) << "my message";
You can set an attribute (say, "Emergency") for that record, and setup filters so that whenever this attribute is present, the record is passed on to the console backend.
*) It should be noted that the internal async log message queuing mechanism uses a vector and during very bursty logging activity that the vector can grow large in size and doesn't ever reduce in size. This is a known limitation and the use of a vector was chosen to facilitate atomic append operations for log records. Using a std::deque would be preferred for bursty applications that way space could be reclaimed easier, but as Andrey pointed out, it would require a full mutex which would reduce the performance of the library.
There is no vector in the async frontends. The problem with the current queuing mechanism is not the storage strategy but the lack of queue boundaries. But I agree, I should have drawn attention to this aspect in the docs. Will do.
*) I find the argument against Lazy Streaming of attributes to be weak.
Current BOOST_LOG() expands to a for() loop where the contents of the for() loop aren't executed unless a Sink in a Logger accepts the record.
So why can't Log Records have compile-time attributes that can be populated with typed attributes in a stream?
And then be able to extract the compile-time attribute in a backend:
Just as it is said in the rationale - the streaming expression is _not_ executed until _after_ the filtering is done. And filtering already involves attribute values, so there's no way to use these attribute values you specified in the streaming expression. Besides, after the view of attribute values is constructed, it is immutable, so nothing can be added after the filtering is done.
*) Dovetailing on the above two points, a flush bit for a given Log Record would be very nice as well. 99.999% of the time async logging is desired, but in the 0.001% of the time when a Log Record must get out to disk, chances are that one Log Record every blue moon is disproportionally valuable compared to the normal Log Record. Having a flush() that doesn't return until the backend has fsync(2)'ed data would be really spiffy.
    BOOST_LOG_SEV(lg, warning) << "My message" << lg::flush;
As I already noted, your estimate is rather biased. I personally tend to use synchronous logging. But I do recognize that some kind of flushing ability would be nice in the case of async logging.