[log] Boost.Log Review...

YES, the Boost.Log library SHOULD BE ACCEPTED into the Boost release distribution. Summary:

*) Boost.Log isn't perfect, but after having reviewed most of its alternatives, and having recently converted from using glog, I like the utility and architecture provided by Boost.Log the most. It comes close to solving all logging problems for everyone. Closing the last 5% of the gap between what Boost.Log does and what I ideally wanted was tedious and required me to develop a much higher level of understanding than I would have liked, but it was possible.

*) In solving all logging problems for all people, Boost.Log's documentation is quite abstract.

*) I would like to see the design of templates/classes flipped so that the starting point for interacting with and beginning to use Boost.Log begins with customizing Log Records and ends with defining backends and backend filtering. This is a fundamental design choice, however, and one that I'm willing to accept given the value that Boost.Log provides.

*) I've been saving most of this feedback for the better part of two months now, so excuse the length and rambling. Certain points of the library I struggled with and ultimately, begrudgingly, resigned myself to accepting as "the way it is" vs. "the way it should be."

*) My perspective on Boost.Log is framed by recent use of Google's glog library and, in a previous life, having written performance-oriented binary logging libraries for high-TPS systems (in C, however, not C++).
- What is your evaluation of the potential usefulness of the library?
Boost.Log is quite useful and holds the highest promise of any logging library that I am aware of because of its ability to accommodate the following common use case: supporting asynchronous logging from multiple frontend threads with a backend able to order the output and send different messages to different files. If it weren't for the ability of this library to meet that specific design requirement, I would still be using Google's glog library.
- What is your evaluation of the design?
The design is very modular. Too modular, in fact: the flexibility and required understanding are actually a barrier to entry for making use of the library (i.e. now that I'm using the library, I know more about the innards of Boost.Log than I would like or think is reasonable). After using this library for several months now, I have distilled a quick summary for those coming up to speed on how to use Boost.Log and its components. This is specific to how I'm using Boost.Log, but I think this is the desired use case for most users of Boost.Log (especially since this is the common path for users of ASIO).

- Your application is littered with Log Statements that are tagged with different Severity Levels
- Log Statements consist of a Logger (a collection of Sinks), a Precondition (e.g. a severity) and Log Record instance data (most commonly a message)
- Loggers are collections of Sinks (e.g. a file and the console)
- Loggers conditionally accept Log Records if one of a Logger's Sinks returns true for a Precondition (e.g. only create a Log Record if the log statement's Severity Level is greater than or equal to the Severity Level configured in the Sink)
- Instructions required to create a Log Record won't be executed if a Sink doesn't accept a Log Record
- Sinks can be configured to queue accepted Log Records (e.g. async frontends) before passing them to one or more Backends for printing (e.g. console, text file, SNMP trap)

The Boost.Log system can be configured to mutate individual Log Records through the manipulation of the following three types of information:

- Channels represent a way of tying together a collection of related systems (e.g. security, network, application). Think of channels as a Sink-level observer that stamps the channel on a Log Record.
- Severities represent the importance of a message (a Log Statement precondition that a Sink-level Observer could stamp on a Log Record)
- Attributes represent per-Log Record information (a string key/value in a Log Record instance)

Backends can then format the output based on the Attributes of a particular Log Record.

The above wasn't explicitly distilled in the documentation in a single place, but I think future developers would find it very useful to see it spelled out explicitly. Given the extremely modular design of the library, I understand why the docs are the way they are, and I know that some of this is touched on in the Design Overview, but until I became very competent with the library, the documentation seemed too abstract. Despite the importance of logging, becoming a logging expert was tangential to the problem I was trying to solve, and I would have personally found the above summary very useful.

While glog is a super-thin logging API, Boost.Log feels like a complete and modular logging system. In glog, seeing:

    VLOG(3) << "asdf";

where 3 can be (and was) a #define, the system ended up feeling very arbitrary and unstructured.

A hindsight-20/20 architecture and terminology for Boost.Log could be simplified in terms of:

*) Log Record pre-conditions. Sinks control whether or not a Log Record is to be generated
*) Loggers. A container for zero or more Sinks
*) Sinks. A frontend that queues Log Records
*) Sink Compile-Time Attributes/Observers. Observers that are configured at compile time, augment Log Records, and stamp Log Records as they pass through the Sink (e.g. severity, channels, etc.)
*) Backends. A formatter that orders and writes Log Records from Sinks
*) Log Records. A container of information that is populated by the user's application, Sinks and Backends (possibly Loggers as well)
*) Log Record Compile-Time Attributes. Compile-time members of a Log Record struct
*) Log Record Runtime Attributes. Key/Value members of a Log Record stored in a std::map (currently how Attributes are implemented)

I'd imagine most developers would want async messaging between Sinks and Backends. Those that don't could make use of a Sink Compile-Time Attribute that always sets the flush bit on a Log Record.
- What is your evaluation of the implementation?
Generally speaking, the internals of the implementation are pretty clean, especially given the size and the various ways that Boost.Log can be cobbled together.

Also in hindsight: the developer interface for interacting with the library could be improved by making Log Records the fundamental unit that everything hangs off of. Making Log Records easy-to-extend containers that carry domain-specific information without std::map seems like a better design choice given how frequently developers will interact with Log Records. The use case of Log Records that maintain stats that get pumped through a logging system struck me as particularly useful, but unnecessarily expensive with std::map.

As was noted in other reviews, the cost of evaluating whether or not a Sink is going to accept a Log Record is of some concern. Figuring out a way to have Log Record preconditions evaluated as quickly as possible should be important given the frequency with which logging code is (or should be) deployed in most applications. I remember reading someone else's benchmark saying that it was ~2500 instructions to evaluate whether or not a Sink should accept a given Log Record. For something that should macro-expand to roughly:

    for (logger::iter = logger.begin(); iter->accept(severity); ++iter)
        iter->stream()

I haven't profiled, but when most accept() functions are doing:

    inline bool accept(SevT s) const { return s >= sev_filter; }

2500 ops seems expensive to me.

The notion of a "global logging core" bothers me. Having singletons of loggers and sinks in TSS is very good from a utility perspective. Admittedly I don't know how the global logging core is used beyond being a knob that can be turned to enable/disable logging, nor have I profiled it yet, but any centralized core bugs me in a highly threaded environment.

Questions:

*) Why does Boost.Log roll its own TLS/TSS support (boost/log/detail/thread_specific.hpp vs. boost/thread/tss.hpp)? There may be a very good reason; I just didn't see it when I was looking through the library.

*) Can Log Records be pool-managed somehow?

*) Log Records: I understand why std::map is used to store attributes (ease of use/flexibility), but at the same time, I do wish that frequently stored attributes could be baked in at compile time. This is my biggest gripe with the library, actually. I would like my application to err on the side of making use of compile-time information at the expense of the time required to compile the source. Storing things in a std::map seems lazy and brutish, for lack of a better description. Flipping things on their head so that Log Records define Loggers, Sinks and Backends would be ideal, IMHO, and would solve most of my gripes. This is a design decision, but the contents of Log Records are of more importance than the backends. Boost.Log appears to be designed from Backends to Log Records vs. Log Records to Backends, which is the way data flows through an application (the latter would be my preference). For example:

    // quick pseudo-code using MI to create my_log_record and downstream types
    typedef enum { info = 0, warn = 1, crit = 2 } severity_t;

    struct my_log_record :
        boost::log::record::attribute::map< std::string, std::string >,
        boost::log::record::attribute::severity< severity_t >,
        boost::log::record::attribute::message,
        my_app::log_record::attribute::src_port
    {};

    typedef struct my_log_record log_record_t;
    typedef SinkT< log_record_t > sink_t;
    typedef LoggerT< sink_t > logger_t;
    typedef BackendT< sink_t > backend_t;

Observers could then be placed on Sinks that would stamp Log Records with Severities, or Channels, or in whatever hodge-podge configuration the developer wanted, and such examples would be very cookbook.

*) Log Records are highly reused and seem like ideal candidates for a TLS freelist.
- What is your evaluation of the documentation?
Pretty complete, albeit abstract. It lacks examples for customized records and best-practice guidelines. A few other suggestions for things that I had to hunt down or figure out:

*) Global loggers vs. per-object references to singletons
*) Channel severity loggers example
*) Setting up an initial console logger as line one in main()
*) Changing a backend filter at a later date
*) Using async logging for INFO level logs, but switching to sync for CRIT errors
*) Obtaining the severity of a particular logger

Outstanding doc questions:

*) How do you remove log events given a severity level at compile time? Currently this isn't explicitly defined in the documentation like it is in Google's glog library: http://google-glog.googlecode.com/svn/trunk/doc/glog.html#strip

*) Is there a way to set a bit in a Log Record that overrides a Sink's filter? For instance, there are certain events that I want to make sure go to the console 100% of the time without changing the severity of the log message. Do I just log twice? It'd be keen if Log Records could have a bit that could be set and would override a filter:

    bool log_to_console = true;
    BOOST_LOG_SEV_CONS(lg, warning, log_to_console) << "my message";

Other comments:

*) It should be noted that the internal async log message queuing mechanism uses a vector, and during very bursty logging activity the vector can grow large and never shrinks. This is a known limitation; the vector was chosen to facilitate atomic append operations for log records. A std::deque would be preferred for bursty applications, since space could be reclaimed more easily, but as Andrey pointed out, it would require a full mutex, which would reduce the performance of the library.

*) I find the argument against Lazy Streaming of attributes to be weak: http://boost-log.sourceforge.net/libs/log/doc/html/log/rationale/why_not_laz...
The current BOOST_LOG() expands to a for() loop whose body isn't executed unless a Sink in a Logger accepts the record. From boost/log/sources/record_ostream.hpp:

    #define BOOST_LOG_STREAM_WITH_PARAMS_INTERNAL(logger, rec_var, params_seq)\
        for (BOOST_LOG_AUTO(rec_var, (logger).open_record((BOOST_PP_SEQ_ENUM(params_seq)))); !!rec_var; rec_var.reset())\
            ::boost::log::aux::make_pump_stream((logger), rec_var)

So why can't Log Records have compile-time attributes that can be populated with typed attributes in a stream?

    BOOST_LOG_SEV(lg, warning) << "My log has an attribute src_port"
        << log_record_t::set_src_port(conn.src_port)
        << " this continues to append to the Log Record's message";

And then be able to extract the compile-time attribute in a backend:

    // Dashed-out pseudo-code to make a point (I know the syntax is garbage
    // in src_port's formatter).
    typedef struct backend : boost::log::record::base_record {
        typedef int src_port_type;
        int src_port;
    } record_t;
    typedef boost::log::sinks::text_file_backend< record_t > backend_t;

    b->set_formatter(
        fmt::stream
            << fmt::attr<unsigned int>("LineId", keywords::format = "%08x")
            << " Src Port: "
            << fmt::attr< record_t::src_port_type >(record_t::src_port)
    );

*) Dovetailing on the above two points, a flush bit for a given Log Record would be very nice as well. 99.999% of the time async logging is desired, but in the 0.001% of cases when a Log Record must get out to disk, chances are that one rare Log Record is disproportionately valuable compared to a normal Log Record. Having a flush() that doesn't return until the backend has fsync(2)'ed the data would be really spiffy:

    BOOST_LOG_SEV(lg, warning) << "My message" << lg::flush;

As it is, you have to call:

    sink->feed_records(::boost::posix_time::seconds(0));

which isn't as elegant, and I don't believe it actually signals to the backend that data needs to be flushed to disk (it simply drains the frontend).
Developers that want synchronous logging could define an observer that always sets the sync bit and would almost certainly be optimized away at compile-time.
- Did you try to use the library? With what compiler? Did you have any problems?
FreeBSD 7+ and Darwin, with GCC 4.2.1, 4.3, and 4.4. The tip in Subversion has been working well for me. RC4 was problematic due to bjam configuration problems.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Developed a reasonably in-depth understanding of the internals and its use.
- Are you knowledgeable about the problem domain?
Sufficiently adept in the problem domain.

Parting thoughts: As I said in the first paragraph, this library should be accepted into Boost because it's the best that I've evaluated and see available on the landscape. My compliments to Andrey, and likely Trojo as well. All of my critical feedback is hindsight-20/20 after having successfully integrated Boost.Log into my application. I am happy with Boost.Log, though I do see room for improvement in the future.

Forward thinking: The more times I have to tweak and revisit logging, for whatever reason, the more convinced I am that a better and more elegant design would result if Backends, Sinks, et al. were defined in terms of Log Records. If there is a Boost.Log2, I would love to see the architecture flipped on its head as suggested. I believe such an architecture would:

*) improve the evaluation of a Sink's preconditions for accepting/creating a new Log Record
*) allow Log Records to contain bits that are set by observers and are propagated through a Log Record's life cycle back to the Backends
*) result in easier application-specific logic (e.g. flush or console redirection)
*) improve performance when preconditions are evaluated
*) improve performance because fixed-size Log Records could be memory-managed on thread-specific pools
*) allow observers to be chained together into components that would be easier to set up and manipulate; and
*) could be used to allow data to stream to a backend (namely application stats)

Further, by constraining the scope of what Boost.Log is capable of to a more rigid/fixed architecture that gets optimized at compile time, the library would be easier to understand and use (e.g. change the design so that not all Sinks or Backends accept all Log Records). Hopefully someone finds this retrospective useful in the future.
-sc

PS: I started to put my money where my mouth is and cobbled together something that crudely demonstrates Log Records -> Backends, but came up short getting streaming attributes to work, for lack of time to focus on getting an overloaded operator<<() working. The sample can be found here if anyone is interested; discussion of it should be taken offline since it's tangential to the current Boost.Log review process. http://codepad.org/8ACp0obH

--
Sean Chittenden
sean@chittenden.org

On 03/12/2010 03:32 AM, Sean Chittenden wrote:
After using this library for several months now, I have distilled a quick summary for those coming up to speed on how to use Boost.Log and its components. This is specific to how I'm using Boost.Log, but I think this is the desired use-case for most users of Boost.Log (esp since this is the common path for users of ASIO).
- Your application is littered with Log Statements that are tagged with different Severity Levels
- Log Statements consist of a Logger (a collection of Sinks), a Precondition (e.g. a severity) and Log Record instance data (most commonly a message)
- Loggers are collections of Sinks (e.g. a file and the console)
- Loggers conditionally accept Log Records if one of a Logger's Sinks returns true for a Precondition (e.g. only create a Log Record if the log statement's Severity Level is greater than or equal to the Severity Level configured in the Sink)
- Instructions required to create a Log Record won't be executed if a Sink doesn't accept a Log Record
- Sinks can be configured to queue accepted Log Records (e.g. async frontends) before passing them to one or more Backends for printing (e.g. console, text file, SNMP trap)
The Boost.Log system can be configured to mutate individual Log Records through the manipulation of the following three types of information:
- Channels represent a way of tying together a collection of related systems (e.g. security, network, application). Think of channels as a Sink-level observer that stamps the channel on a Log Record.
- Severities represent the importance of a message (a Log Statement precondition that a Sink-level Observer could stamp on a Log Record)
- Attributes represent per-Log Record information (a string key/value in a Log Record instance)
Backends can then format the output based on the Attributes of a particular Log Record.
The above wasn't explicitly distilled in the documentation in a single place, but I think future developers would find it very useful to see that spelled out explicitly. Given the extremely modular design of the library, I understand why the docs are the way they are and know that some of this is touched on in the Design Overview, but until I became very competent with the library, the documentation seemed too abstract. Despite the importance of logging, becoming a logging expert was tangential to the problem I was trying to solve and I would have personally found the above summary very useful.
Well, that summary may describe your use case very well, but it is by no means complete with regard to Boost.Log as a whole. I would even say it's misleading in some parts - in particular: "Loggers are collections of Sinks" and "Think of channels as a Sink-level observer".
A hindsight 20/20 architecture and terminology for Boost.Log could be simplified in terms of:
What is "20/20"? I'll continue my answers assuming that the following was your understanding of the current Boost.Log architecture.
*) Log Record pre-conditions. Sinks control whether or not a Log Record is to be generated
Looks like you're confusing sinks with filters. Also, don't forget about the global filter.
*) Loggers. A container for zero or more Sinks
Loggers and sinks are not related, and that is the intention of the library. Different log records made through the same logger may end up in any sink(s). This gives the ability to encapsulate logging configuration in a single place of the application, without affecting every place where logging is used.
*) Sinks. A frontend that queues Log Records
Only async frontends use record queuing.
*) Sink Compile-Time Attributes/Observers. Observers that are configured at compile time, augment Log Records, and stamp Log Records as they pass through the Sink (e.g. severity, channels, etc)
There is no model of this in Boost.Log. I take it that you'd like it to be there. More on this below.
*) Backends. A formatter that orders and writes Log Records from Sinks
Roughly speaking, yes. But "writes" is not the accurate term, as no actual writing may take place.
*) Log Records. A container of information that is populated by the user's application, Sinks and Backends (possibly Loggers as well)
Nod.
*) Log Record Compile-Time Attributes. Compile-time members of a Log Record struct
Same as "Sink Compile-Time Attributes/Observers".
*) Log Record Runtime Attributes. Key/Value members of a Log Record stored in a std::map (currently how Attributes are implemented)
The current solution is largely equivalent to a std::map, but it is not actually one.
I'd imagine most developer would want async messaging between Sinks and Backends.
That assumption is not valid. One of the most frequent applications of logging is debug output, which doesn't work well with async logging. In fact, I expect synchronous logging to be more common than in-process asynchronous logging.
Those that don't, could make use of a Sink Compile-Time Attribute that always sets the flush bit on a Log Record.
Flushing the other thread through the flag will be a time hog. Doing this on a per-record basis will bring performance down to an unusable level.
The notion of a "global logging core" bothers me. Having singletons of loggers and sinks in TSS is very good from a utility perspective. Admittedly I don't know how the global logging core is used beyond being a knob that can be turned to enable/disable logging, nor have I profiled it yet, but any centralized core bugs me in a highly threaded environment.
The core doesn't limit scalability as it doesn't block threads unless the user wants to configure it (e.g. add/remove sinks or global attributes). The tests I mentioned in another thread confirm that.
Why does Boost.Log roll its own TLS/TSS support? boost/log/detail/thread_specific.hpp vs. boost/thread/tss.hpp There may be a very good reason, I just didn't see it when I was looking through the library.
Because it's more efficient than what Boost.Thread provides. When thread_specific_ptr is fast enough, I'll be glad to switch back to it.
Can Log Records be pool managed somehow?
No. If you're concerned with dynamic memory allocation, it's better to use libraries dedicated to that problem (Google perftools, for instance).
I understand why std::map is used to store attributes (ease of use/flexibility), but at the same time, I do wish that frequently stored attributes could be baked in at compile time.
This is my biggest gripe with the library, actually. I would like my application to err on the side of making use of compile-time information at the expense of the time required to compile the source. Storing things in a std::map seems lazy and brutish, for lack of a better description. Flipping things on their head so that Log Records define Loggers, Sinks and Backends would be ideal, IMHO, and would solve most of my gripes. This is a design decision, but the contents of Log Records are of more importance than the backends. Boost.Log appears to be designed from Backends to Log Records vs. Log Records to Backends, which is the way data flows through an application (the latter would be my preference). For example:
[snip]
Observers could then be placed on Sinks that would stamp Log Records with Severities, or Channels, or in whatever hodge-podge configuration the developer wanted and such examples would be very cookbook.
Now that you have presented a code snippet, your suggestion makes more sense to me. Let me express my thoughts on it in two parts:

1. Custom log records. Yes, the model of "mandatory" attributes looks promising. It could at least partially solve the problem of possible library misconfiguration that was highlighted in another review. I imagine that some kind of log record prototype (or factory) would be defined by the user in order to construct log records with a specific set of attribute values. Filters and formatters could then request these mandatory attribute values without having to worry about their absence. Lookup performance may also be higher, though I doubt it will be compile-time. The idea is quite vague now, but after some thought it might become a good addition to the library.

2. Designing sinks (and the core, I presume) after the log records. No, that looks unacceptable to me, at least for what Boost.Log tries to achieve. This would introduce too much coupling between the logging side of the library (which tends to be exposed to many parts of the application) and the consuming side (which tends to be configured in a single place of the application). To my mind, these sides of the library should be maximally decoupled. Also, it's not clear how to support multiple log record types in such a design.
*) Log Records are highly reused and seem like ideal candidates for a TLS freelist.
I consider memory management out of scope of Boost.Log.
*) Global loggers vs. per-object references to singletons
Not sure I understand.
*) Channel severity loggers example
That example wouldn't be much different from the ones in sections about severity_logger or channel_logger. But I'll add a code sample.
*) Setting up an initial console logger as line one in main()
Hmm. Ok, I'll add something about that to the tutorial.
*) Changing a backend filter at a later date
Just call set_filter on the backend, that's all.
*) Using async logging for INFO level logs, but switching to sync for CRIT errors.
You can register two sinks - one synchronous and one asynchronous, and set their filters accordingly.
*) Obtaining the severity of a particular logger
You mean, the default severity? There's no public API for this, but there's a protected function default_severity in the logger feature. Why would you need it?
*) How do you remove log events given a severity level at compile time. Currently this isn't explicit defined in the documentation like it is in Google's glog library:
http://google-glog.googlecode.com/svn/trunk/doc/glog.html#strip
The library has no direct means for that. Users can conditionally define their logging macros, based on BOOST_LOG_SEV or pure C++ interface and then enable or disable the particular logging statements from the source.
*) Is there a way to set a bit in a Log Record that overrides a Sink's filter? For instance, there are certain events that I want to make sure go to the console 100% of the time without changing the severity of the log message. Do I just log twice? It'd be keen if Log Records could have a bit that could be set and would override a filter.
    bool log_to_console = true;
    BOOST_LOG_SEV_CONS(lg, warning, log_to_console) << "my message";
You can set an attribute (say, "Emergency") for that record, and setup filters so that whenever this attribute is present, the record is passed on to the console backend.
*) It should be noted that the internal async log message queuing mechanism uses a vector and during very bursty logging activity that the vector can grow large in size and doesn't ever reduce in size. This is a known limitation and the use of a vector was chosen to facilitate atomic append operations for log records. Using a std::deque would be preferred for bursty applications that way space could be reclaimed easier, but as Andrey pointed out, it would require a full mutex which would reduce the performance of the library.
There is no vector in the async frontends. The problem with the current queuing mechanism is not the storage strategy but the lack of queue boundaries. But I agree, I should have drawn attention to this aspect in the docs. Will do.
*) I find the argument against Lazy Streaming of attributes to be weak.
Current BOOST_LOG() expands to a for() loop where the contents of the for() loop aren't executed unless a Sink in a Logger accepts the record.
So why can't Log Records have compile-time attributes that can be populated with typed attributes in a stream?
And then be able to extract the compile-time attribute in a backend:
Just as it is said in the rationale - the streaming expression is _not_ executed until _after_ the filtering is done. And filtering already involves attribute values, so there's no way to use these attribute values you specified in the streaming expression. Besides, after the view of attribute values is constructed, it is immutable, so nothing can be added after the filtering is done.
*) Dovetailing on the above two points, a flush bit for a given Log Record would be very nice as well. 99.999% of the time async logging is desired, but in the 0.001% of the time when a Log Record must get out to disk, chances are that one Log Record every blue moon is disproportionally valuable compared to the normal Log Record. Having a flush() that doesn't return until the backend has fsync(2)'ed data would be really spiffy.
    BOOST_LOG_SEV(lg, warning) << "My message" << lg::flush;
As I already noted, your estimate is rather biased. I personally tend to use synchronous logging. But I do recognize that some kind of flushing ability would be nice in the case of async logging.