[log] Boost.Log formal review ongoing

Boosters,

just a friendly reminder that the formal review of the Boost.Log library, written by Andrey Semashev, is ongoing through March 17, 2010. There's still enough time to submit reviews and have them discussed, so please take a look at the library and share your thoughts.

The documentation for the current version is available at: http://boost-log.sourceforge.net
The downloads are at: http://sourceforge.net/projects/boost-log

Here are the important points:

1. All comments are appreciated. Even if you don't have the time for in-depth study of everything, comments on parts of the library are welcome. In particular, if you are already using logging in your applications and have specialized requirements, you might want to directly check how the proposed library handles them.

2. The reviews and all comments should be submitted to the developers list, boost@lists.boost.org and the email should have "[log]" prefix in the subject to make sure it's not missed. The deadline for reviews is
23:59, March 17, PST = 2:59, March 18, EST = 7:59 March 18, GMT = 10:59, March 18 MSK

3. Please explicitly state in your review whether the library should be accepted.

4. The general review checklist is provided below:
- What is your evaluation of the design?
- What is your evaluation of the implementation?
- What is your evaluation of the documentation?
- What is your evaluation of the potential usefulness of the library?
- Did you try to use the library? With what compiler? Did you have any problems?
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
- Are you knowledgeable about the problem domain?

Thanks,
Volodya

Just a quick (ok quite long) note / questions at this stage. Currently we have applications using a logging lib that uses a channel model exclusively. By that I mean there are no message attributes - all routing/filtering of messages between sources and sinks is performed based not on message content but on source identity. Sources are named. We create sources with names like (this isn't the only form supported but is a common one) "module.component.instance.severity". The routing/filtering is done using a simple pattern match on the source name. Configuration data is simply a list of patterns associated with a sink and used to determine if a given source is connected to that sink. So a typical config (filter) for a sink might be:

    *.error
    comms.clientsession.*
    comms.clientsession.*.trace-
    comms.clientsession.*.debug-
    comms.clientsession.clientid1.*

which would result in all errors being logged, along with all severities other than trace and debug for all comms clientsessions, and all messages from clientid1 (ie including trace and debug messages).

In this scheme creating a log source is relatively expensive as it attempts to match the source to the configured routing "rules", but once it has done so the result (connected or not to a given sink) is cached. Changing the config dynamically (something we use a lot) is very expensive as it requires that all sources be re-evaluated as above. However this is mitigated by the fact that we don't care exactly when the change is applied, only that it is applied eventually, and the logging system allows the rule eval/update to be done asynchronously to logging, with only a small (pointer swap) operation to apply the new rule to a source being performed while holding a lock. Adding/removing sinks is logically just a variation on the dynamic config change case.

The code in some cases contains enough logging/expensive enough data gathering that configuring "*" will bring the system performance down to an unusable level (and generate an unusable amount of data). Because the channel evaluation is done only on creation of a source and on a change in log configuration (and the latter is async), the cost of logging something that will not be sent anywhere is very low (comparing a pointer to null).

The use pattern is that a given class has a set of sources (one per severity). The sources are normally created and named when an object that logs is constructed, though in some cases it defers or changes the "naming" of the instance later (eg when a session is fully established with a client, such that the client name can be associated with the server object). Sources that are named the same are effectively the same for logging purposes (but need have no knowledge of each other's existence) - this is similar in effect to applying the same attributes to messages (but via different loggers) in boost.log.

The logging system does not support sink specific formatting (beyond simple prepending of things like time stamps) - the log "message" is a preformatted string. We had ideas we might change that but haven't as yet found the need.
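For concreteness, a minimal sketch of the kind of name matching described above (this is not our actual code - the "*" semantics and the last-match-wins handling of trailing "-" exclusions are just assumptions to keep the example self-contained):

    #include <cstddef>
    #include <string>
    #include <vector>

    // '*' matches any (possibly empty) run of characters; everything else is literal.
    bool glob_match(const char* pat, const char* str)
    {
        for (; *pat; ++pat, ++str)
        {
            if (*pat == '*')
            {
                while (*pat == '*') ++pat;        // collapse consecutive wildcards
                if (!*pat) return true;           // a trailing '*' matches the rest
                for (; *str; ++str)
                    if (glob_match(pat, str)) return true;
                return false;
            }
            if (*str != *pat) return false;
        }
        return *str == '\0';
    }

    // Decide whether a source name is connected to a sink configured with the
    // given pattern list.  A trailing '-' marks a pattern as an exclusion and
    // the last matching pattern wins (this ordering rule is an assumption).
    bool source_connected(const std::string& source, const std::vector<std::string>& patterns)
    {
        bool connected = false;
        for (std::size_t i = 0; i < patterns.size(); ++i)
        {
            std::string pat = patterns[i];
            const bool exclude = !pat.empty() && pat[pat.size() - 1] == '-';
            if (exclude)
                pat.erase(pat.size() - 1);
            if (glob_match(pat.c_str(), source.c_str()))
                connected = !exclude;
        }
        return connected;
    }

With the config above, source_connected("comms.clientsession.clientid1.trace", patterns) comes out true, while the same severity for any other client comes out false.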
Anyway - I've looked at the docs for boost.log and read some review comments to see if we could replace this with the proposed boost.log. I've reached the following (possibly wrong) conclusions:

1) there isn't a direct mapping from this style of logging
2) the obvious approach of making each element of the channel identifier an attribute would seem ok - but I'm concerned that the cost of evaluating the filters would be significant
3) formatting features are nice
4) it isn't immediately clear how to support dynamic configuration of "filtering" and sinks
5) attributes seem a little overloaded - the roles of the attribute as something to be filtered on and the attribute as something to be included in (formatted into) the logged message are potentially different. It would be nice to have expensive-to-evaluate attributes that played no role in filtering and were only evaluated once the message had "passed" filtering.
6) There are use cases that boost.log can support (ones where the filtering is based on more "dynamic" attributes) that are not practical to support in the pure channel based model afaiks. I'd class most of these as "alarms" where there is some sort of thresholding or similar criteria applied to an attribute value (or combinations) to determine whether to log the message.

At this point I suspect (I would need to run some benchmarks) that boost.log won't suit the use cases I currently have, but is (subject to some concerns, which might be unfounded, re support for dynamic configuration) flexible enough to support far more use cases than what I'm using at present - at least when the performance in the case where the message is to be discarded is less critical.

Unless I have more time suddenly appear from nowhere I probably won't be able to do a proper review, but I would be really interested in any feedback on whether this usage is supported (and how) by boost.log.

Darryl

On 03/12/2010 05:05 PM, Darryl Green wrote:
Anyway - I've looked at the docs for boost.log and read some review comments to see if we could replace this with the proposed boost.log. I've reached the following (possibly wrong) conclusions:
1) there isn't a direct mapping from this style of logging
Except for the filtering based on the message text, I'd say your case is covered by Boost.Log rather well. Some things, like having a source per severity level, are not required with Boost.Log.
2) the obvious approach of making each element of the channel identifier an attribute would seem ok - but I'm concerned that the cost of evaluating the filters would be significant
The cost will be more significant than comparing a pointer against NULL, that's for sure.
4) it isn't immediately clear how to support dynamic configuration of "filtering" and sinks
Well, the library allows you to add/remove sinks, filters and formatters at run time. You can simply call the interface methods just as you would at the library configuration stage.
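For example, in present-day Boost.Log syntax (which differs in details from the version under review; the trivial logging front-end is used only to keep the sketch short), a sink can be added, reconfigured and removed while the program runs:

    #include <iostream>
    #include <boost/log/core.hpp>
    #include <boost/log/expressions.hpp>
    #include <boost/log/trivial.hpp>
    #include <boost/log/utility/setup/console.hpp>

    namespace logging = boost::log;

    int main()
    {
        // Register a console sink while the program is running.
        auto sink = logging::add_console_log(std::clog);
        BOOST_LOG_TRIVIAL(info) << "reaches the console sink";

        // Tighten the sink's filter on the fly.
        sink->set_filter(logging::trivial::severity >= logging::trivial::warning);
        BOOST_LOG_TRIVIAL(info) << "rejected by the new sink filter";

        // Remove the sink again later.
        logging::core::get()->remove_sink(sink);
    }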
5) attributes seem a little overloaded - the roles of the attribute as something to be filtered on and the attribute as something to be included in (formatted into) the logged message are potentially different. It would be nice to have expensive-to-evaluate attributes that played no role in filtering and were only evaluated once the message had "passed" filtering.
It looks to me like it's already implemented, but not exposed to the user (since it's only an optimization). If an attribute is not used during filtering, and the filter rejects the record, the attribute is not evaluated at all.
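As a sketch of that (present-day syntax, invented names; the reviewed version differs in details), an expensive attribute can be supplied through a function object, and per the optimization just described it is only invoked when a record passes filtering or when a filter actually reads it:

    #include <string>
    #include <boost/log/attributes/function.hpp>
    #include <boost/log/core.hpp>
    #include <boost/log/expressions.hpp>
    #include <boost/log/trivial.hpp>

    namespace logging = boost::log;
    namespace attrs = boost::log::attributes;

    // Hypothetical costly data gathering.
    std::string expensive_backtrace()
    {
        return "...walk the stack, resolve symbols...";
    }

    int main()
    {
        logging::core::get()->add_global_attribute(
            "Backtrace", attrs::make_function(&expensive_backtrace));

        logging::core::get()->set_filter(
            logging::trivial::severity >= logging::trivial::warning);

        // Rejected by the filter: no filter mentions "Backtrace", so (per the
        // optimization above) expensive_backtrace() is not called for this record.
        BOOST_LOG_TRIVIAL(debug) << "cheap when filtered out";

        // Accepted: the attribute is evaluated and attached to the record.
        BOOST_LOG_TRIVIAL(error) << "expensive data gathered only here";
    }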
6) There are use cases that boost.log can support (ones where the filtering is based on more "dynamic" attributes) that are not practical to support in the pure channel based model afaiks. I'd class most of these as "alarms" where there is some sort of thresholding or similar criteria applied to an attribute value (or combinations) to determine whether to log the message.
Hmm, not sure I understand what you mean here. Severity levels, perhaps? But you do use them, although they are a part of the channel name in your case.

On Sat, 2010-03-13 at 22:14 +0300, Andrey Semashev wrote:
On 03/12/2010 05:05 PM, Darryl Green wrote:
Anyway - I've looked at the docs for boost.log and read some review comments to see if we could replace this with the proposed boost.log. I've reached the following (possibly wrong) conclusions:
1) there isn't a direct mapping from this style of logging
Except for the filtering based on the message text, I'd say your case is covered by Boost.Log rather well. Some things, like having a source per severity level, are not required with Boost.Log.
Agreed. I should not have separated this from item 2 below.
2) the obvious approach of making each element of the channel identifier an attribute would seem ok - but I'm concerned that the cost of evaluating the filters would be significant
The cost will be more significant than comparing a pointer against NULL, that's for sure.
That is my major concern. In a server app with detailed logging/tracing available and many sessions generating messages, the ratio of "disabled" "channels" to enabled ones can be very large, making the performance of the former very significant. However, I can't see a way to directly incorporate this sort of performance with the other desirable features of your library. Attempting to represent this scheme as some sort of specialised attribute (value) and filter still incurs a level of overhead in the attribute lookup and filter expression processing that is undesirable.

Extending your lib by making our existing channel based log type into a logger feature and implementing open_record such that it applies the quick channel based filter first and returns a null record if the channel/logger is disabled would seem to be the best approach. It would also be desirable to use the set of sinks for which the channel filter passed to short circuit the core filtering by adding the set of sinks "attached" to the channel as an attribute of the message.

Does that sound workable?
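For concreteness, the shape of our current source type is roughly the sketch below - plain C++, names invented, with the locking and the lifetime handling of retired routing objects elided - which is why the disabled case costs little more than a pointer test:

    #include <cstddef>
    #include <memory>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Sink
    {
        virtual ~Sink() {}
        virtual void consume(const std::string& msg) = 0;
    };

    // Produced asynchronously by the configuration/rule-evaluation logic.
    struct Routing
    {
        std::vector<std::shared_ptr<Sink> > sinks;
    };

    class ChannelSource
    {
    public:
        // The config thread swaps in a new routing result, or an empty pointer
        // when the channel is disabled; the logging path only reads the pointer.
        void update(std::shared_ptr<const Routing> r)
        {
            std::atomic_store(&routing_, std::move(r));
        }

        bool enabled() const
        {
            return std::atomic_load(&routing_) != nullptr;
        }

        void write(const std::string& msg) const
        {
            if (std::shared_ptr<const Routing> r = std::atomic_load(&routing_))
                for (std::size_t i = 0; i < r->sinks.size(); ++i)
                    r->sinks[i]->consume(msg);
        }

    private:
        std::shared_ptr<const Routing> routing_;
    };

    // The disabled case never even formats the message.
    #define CHANNEL_LOG(src, expr)                              \
        do {                                                    \
            if ((src).enabled()) {                              \
                std::ostringstream oss_;                        \
                oss_ << expr;                                   \
                (src).write(oss_.str());                        \
            }                                                   \
        } while (false)

With this shape, CHANNEL_LOG(src, "client " << id << " connected") costs only the enabled() test when the channel is routed nowhere.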
4) it isn't immediately clear how to support dynamic configuration of "filtering" and sinks
Well, the library allows you to add/remove sinks, filters and formatters at run time. You can simply call the interface methods just as you would at the library configuration stage.
Ok.
5) attributes seem a little overloaded - the roles of the attribute as something to be filtered on and the attribute as something to be included in (formatted into) the logged message are potentially different. It would be nice to have expensive-to-evaluate attributes that played no role in filtering and were only evaluated once the message had "passed" filtering.
It looks to me like it's already implemented, but not exposed to the user (since it's only an optimization). If an attribute is not used during filtering, and the filter rejects the record, the attribute is not evaluated at all.
Good.
6) There are use cases that boost.log can support (ones where the filtering is based on more "dynamic" attributes) that are not practical to support in the pure channel based model afaiks. I'd class most of these as "alarms" where there is some sort of thresholding or similar criteria applied to an attribute value (or combinations) to determine whether to log the message.
Hmm, not sure I understand what you mean here. Severity levels, perhaps? But you do use them, although they are a part of the channel name in your case.
I mean that your library makes it possible to write a filter that logs only if eg. packet_length is over 4K (assuming packet_length is an attribute), which is not possible using the early/cached evaluation implicit in the pure channel based approach. Using channel based filtering it is necessary to classify "packet_length > 4096" into some channel (maybe "comms.messages.warning" or similar) externally to the logging system, with less flexibility/configurability than boost.log offers. This is one reason I'd like to be able to combine both approaches.

regards,
Darryl
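PS - for illustration, such a filter in present-day Boost.Log syntax (which differs in detail from the version under review; "PacketLength" is a hypothetical attribute name) might read:

    #include <boost/log/core.hpp>
    #include <boost/log/expressions.hpp>

    namespace logging = boost::log;
    namespace expr = boost::log::expressions;

    void init_packet_filter()
    {
        // Pass only records carrying a PacketLength attribute greater than 4K.
        logging::core::get()->set_filter(
            expr::attr< unsigned int >("PacketLength") > 4096u);
    }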

On 03/14/2010 07:20 AM, Darryl Green wrote:
Extending your lib by making our existing channel based log type into a logger feature and implementing open_record such that it applies the quick channel based filter first and returns a null record if the channel/logger is disabled would seem to be the best approach. It would also be desirable to use the set of sinks for which the channel filter passed to short circuit the core filtering by adding the set of sinks "attached" to the channel as an attribute of the message.
Does that sound workable?
The trick with the quick filter in the logger feature is workable, and, in fact, should not be hard to implement, given that you already have the code that does filtering. But short circuiting the core is not doable without considerable changes in the core, log record and, probably, the logger. Also, this would make the loggers tied to the sinks, which may be just fine for your case, but not for Boost.Log in general.

On the other hand, if your logging setup is completely static, in the sense that if a log record from a log source passes filtering, it is safe to say it will always pass, and if it gets rejected, then all log records from this source will be rejected, then it might be enough for you to just cache the result of filtering in the log source. That way you'll be able to quickly drop records that would be dropped by actual filters anyway.

In order to support dynamic configuration updates, those cached filtering results would have to be reset. It should be easy to do it by a signal (using Boost.Signals, for example).
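A bare-bones sketch of that caching idea (invented names, thread synchronization omitted, Boost.Signals2 standing in for whichever signal mechanism is used):

    #include <boost/bind.hpp>
    #include <boost/optional.hpp>
    #include <boost/signals2.hpp>

    // Fired by whatever code adds/removes sinks or changes filters at run time.
    boost::signals2::signal< void () > config_changed;

    class CachedSource
    {
    public:
        CachedSource()
        {
            conn_ = config_changed.connect(boost::bind(&CachedSource::invalidate, this));
        }
        ~CachedSource() { conn_.disconnect(); }

        // run_real_filter stands for the full (expensive) filtering pass.
        template< typename Filter >
        bool enabled(Filter run_real_filter)
        {
            if (!cached_)
                cached_ = run_real_filter();   // evaluated once per configuration
            return *cached_;
        }

    private:
        void invalidate() { cached_ = boost::none; }

        boost::optional< bool > cached_;
        boost::signals2::connection conn_;
    };

After a sink or filter change the updater simply calls config_changed(); each source then re-runs the real filter on its next record.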

On Sun, 2010-03-14 at 14:21 +0300, Andrey Semashev wrote:
On 03/14/2010 07:20 AM, Darryl Green wrote:
Extending your lib by making our existing channel based log type into a logger feature...
The trick with the quick filter in the logger feature is workable,
But short circuiting the core is not doable without considerable changes in the core, log record and, probably, the logger. Also, this would make the loggers tied to the sinks, which may be just fine for your case, but not for Boost.Log in general.
I'm not quite following what coupling you are concerned about. Our existing logger has a shared ptr to a (maintained by the filtering logic) container of shared ptrs to sinks which seems harmless enough? Even if the attribute value isn't literally a container of pointers to sinks, it would surely be possible to have an attribute value identifying (with help from a sink front end filter that checked the attribute) which sinks the logger fast filter had selected?
On the other hand, if your logging setup is completely static, in the sense that if a log record from a log source passes filtering, it is safe to say it will always pass, and if it gets rejected, then all log records from this source will be rejected, then it might be enough for you to just cache the result of filtering in the log source. That way you'll be able to quickly drop records that would be dropped by actual filters anyway.
True. Except that it isn't completely static...
In order to support dynamic configuration updates, those cached filtering results would have to be reset. It should be easy to do it by a signal (using Boost.Signals, for example).
Conceptually yes, that would work.

On 03/14/2010 02:50 PM, Darryl Green wrote:
On Sun, 2010-03-14 at 14:21 +0300, Andrey Semashev wrote:
But short circuiting the core is not doable without considerable changes in the core, log record and, probably, the logger. Also, this would make the loggers tied to the sinks, which may be just fine for your case, but not for Boost.Log in general.
I'm not quite following what coupling you are concerned about. Our existing logger has a shared ptr to a (maintained by the filtering logic) container of shared ptrs to sinks which seems harmless enough?
My understanding was that your loggers had a set of pointers to sinks, which does not change unless the configuration is updated (which was the reason for the higher performance). This is the coupling I was talking about - a logger is bound to a fixed set of sinks, and there is no way it can emit a record that would pass to a sink outside of this set. If my understanding is not correct, and the set of sinks is dynamically composed during filtering, then it doesn't differ from what Boost.Log currently does.
Even if the attribute value isn't literally a container of pointers to sinks, it would surely be possible to have an attribute value identifying (with help from a sink front end filter that checked the attribute) which sinks the logger fast filter had selected?
I think you've missed an important thought before this, which makes it hard for me to understand what you meant here. Again, if your setup is static, in the sense I described earlier, you can cache the set of sinks which consumed the particular record (since you can always be sure that the set will always be the same). However, that won't speed things up, since even if you pass that cached information to the filter with the next record, it will have to find its sink in it, hence you will still have a lookup. And since your setup is not static in the first place, all these caching dances are moot anyway.

On Sun, 2010-03-14 at 15:12 +0300, Andrey Semashev wrote:
On 03/14/2010 02:50 PM, Darryl Green wrote:
On Sun, 2010-03-14 at 14:21 +0300, Andrey Semashev wrote:
My understanding was that your loggers had a set of pointers to sinks, which does not change unless the configuration is updated (which was the reason for the higher performance). This is the coupling I was talking about - a logger is bound to a fixed set of sinks, and there is no way it can emit a record that would pass to a sink outside of this set.
Right. Although the use of "fixed" (as you acknowledge when you say it can be updated) is overstating it a bit. I'm not really seeing why this is a problem for boost log - I've pretty much convinced myself I could graft the scheme I need in front of boost log without breaking anything.
Even if the attribute value isn't literally a container of pointers to sinks, it would surely be possible to have an attribute value identifying (with help from a sink front end filter that checked the attribute) which sinks the logger fast filter had selected?
I think you've missed an important thought before this, which makes it hard for me to understand what you meant here.
If rather than having the logger use a set of pointers to route to sinks directly, it delivered its set of "pointers" as an attribute that sink filters could check for a match, then there would be no coupling issue. The "pointers" could be, for example, a sequence of assumed sink names - but the filtering could choose to relate names to sinks in any way.
Again, if your setup is static, in the sense I described earlier, you can cache the set of sinks which consumed the particular record (since you can always be sure that the set will always be the same). However, that won't speed things up, since even if you pass that cached information to the filter with the next record, it will have to find its sink in it, hence you will still have a lookup.
Yes - that is similar, filtering/performance-wise, to what I described above.
And since your setup is not static in the first place, all these caching dances are moot anyway.
It is as static as the filters are, which is very static compared to the log record generation rate, so caching is an excellent strategy. Unfortunately I've also pretty much convinced myself that, while the idea of attributes is powerful, I don't much care for the way they are provided by boost log, so I won't be pursuing the sort of integration that we have been discussing. See my review for details.

I didn't think I'd have enough time to give this lib the attention it deserves, but as it turns out I've become sufficiently frustrated trying to use it that it has made reviewing easier...

The library provides a good set of sinks out of the box, and it is relatively easy to add more. The interface used to actually log through is ok, but it would be nice to make it easier to tailor to specific requirements.

Given that filters have to expect that they will receive records from any/all components of an application that use boost log, they need to:
1) be very efficient
2) not fail (especially not throw) when expected attributes are not present or (where this is possible) attributes are not of the expected type/format. The latter aspect I comment on more below.
I don't believe that the library achieves either of these.

In addition, the decision to avoid the coupling that would be required if exact attribute types were shared between the producer of the record and the filters seems reasonable - but it uglifies and complicates attribute access. In practice I don't believe it is required (there is no logging example or library provided attribute where knowledge of the type is not, in practice, required/shared). It is, however, clearly necessary that the actual set of attributes not be required and that attributes/types not known to the filters should be able to be ignored by the filters. I don't think these constraints should be too hard to meet while addressing 1 and 2 above and improving attribute access.

If attribute access was simpler I wouldn't really care about the filter "lambda" or the formatting facilities, as there would be no reason to use any logging specific facilities for these purposes. This would be an improvement as it avoids introducing yet another domain specific idiom into user code. Given the library scope there needs to be far more documentation, but I would rather see a smaller library that uses other libraries and established idioms (by which I do not mean printf :) instead of (re)inventing them. This comment is primarily directed at the filtering and formatting facilities, and the main barrier to removal of the existing facilities is the attribute access "quirks".

I would really like to see a mechanism where an enabling filter can be applied to loggers based on one or more constant valued attributes, such that if the filter does not match the cost of invoking the non-matching loggers is very low.

I've read the docs and some of the implementation. I've compiled the lib and run the examples using gcc 4.3. As others have noted, compilation speed is not a great feature of this library.

It is a utility library and several users have said that they find it to be useful. However it isn't usable for me for some important use cases (the efficiency of "disabled" logs issue). I'm also concerned that this exact issue, as well as some of the filtering and formatting issues, would become even more significant and apply to many more uses of the library if logging via this library were to become pervasive/the norm in widely used libraries (ie if, in an application using a number of libraries, the logging for all of them was handled via boost log).

For these reasons I vote against the inclusion of the library in boost. I thought about making this a conditional yes based on addressing some of the above - but as this would mean removing some and radically changing other parts of the interface of the library, this doesn't seem like a workable approach.

Darryl

On 03/17/2010 05:30 PM, Darryl Green wrote:
Given that filters have to expect that they will receive records from any/all components of an application that use boost log, they need to:
2) not fail (especially not throw) when expected attributes are not present or (where this is possible) attributes are not of the expected type/format. The latter aspect I comment on more below.
The filters/formatters can be set up so that they do not throw exceptions.
Given the library scope there needs to be far more documentation,
Even more docs? What particular topics are not covered now that you'd like to see?

Vladimir Prus wrote:
Boosters,
2. The reviews and all comments should be submitted to the developers list, boost@lists.boost.org and the email should have "[log]" prefix in the subject to make sure it's not missed. The deadline for reviews is
23:59, March 17, PST = 2:59, March 18, EST = 7:59 March 18, GMT = 10:59, March 18 MSK
Unfortunately I left on a business trip the day the review period started and returned this morning. I'd like to submit a late review if that is acceptable. We are currently using this library in a real project and so if my issues have not been addressed in other comments I'd like to make sure I raise them.

I had previously reviewed at least one other log library effort and have used both of John Torjo's libraries in various projects. If a late review is acceptable please give me a reasonable cut-off date for providing it.

Thanks,
Jamie

On Friday 19 March 2010 04:34:04 Jamie Allsop wrote:
Vladimir Prus wrote:
Boosters,
2. The reviews and all comments should be submitted to the developers list, boost@lists.boost.org and the email should have "[log]" prefix in the subject to make sure it's not missed. The deadline for reviews is
23:59, March 17, PST = 2:59, March 18, EST = 7:59 March 18, GMT = 10:59, March 18 MSK
Unfortunately I left on a business trip the day the review period started and returned this morning. I'd like to submit a late review if that is acceptable. We are currently using this library in a real project and so if my issues have not been addressed in other comments I'd like to make sure I raise them.
I had previously reviewed at least one other log library effort and have used both of John Torjo's libraries in various projects. If a late review is acceptable please give me a reasonable cut-off date for providing it.
Jamie, your review will be welcome, but if possible, please focus on the new issues you have discovered -- e.g. the state of the docs and implementation is pretty well established by now, so the important thing is to know how the library fits your real use cases.

I hope to announce the review results around:
23:59, March 24, PST = 2:59, March 25, EST = 7:59 March 25, GMT = 10:59, March 25 MSK

All follow-up comments or late reviews should be submitted at least 24 hours before this date to be included. Is this fine with you?

- Volodya
participants (4)
- Andrey Semashev
- Darryl Green
- Jamie Allsop
- Vladimir Prus