Some simple logging performance tests

Using GetTickCount and GetThreadTimes on Windows XP Pro, recorded time to execute some simple pieces of logging code in a loop. Divided by loop iterations to get time per iteration. This is a 3Ghz Pentium. Not sure how good these tests are, but here's the results: results: elapsed ms thread time ms ns / call (GetTickCount) (GetThreadTimes) (in thread) cout 1329 421 42100 enabled log 1469 453 45300 cout "disabled" 15 15 15 disabled log 531 531 531 Here's a little detail on tests cout, 10,000 times, l_dw increments each loop: cout << l_dw << endl; enabled log, 10,000 times: //outside loop: enable_logs("lgTest"); //in loop: BOOST_SCOPEDLOG(l_lgTest) << l_dw << endl; cout "disabled", 1,000,000 times: //outside loop: bool l_bEnabled = false; //in loop: if(l_bEnabled) { cout << l_dw << endl; } disabled log, 1,000,000 times: //outside loop: disable_logs("lgTest"); //this not in loop //in loop: BOOST_SCOPEDLOG(l_lgTest) << l_dw << endl;

Using GetTickCount and GetThreadTimes on Windows XP Pro, recorded time to execute some simple pieces of logging code in a loop. Divided by loop iterations to get time per iteration. This is a 3Ghz Pentium. Not sure how good these tests are, but here's the results:
Thanks Andrew, some useful results.
results: elapsed ms thread time ms ns / call (GetTickCount) (GetThreadTimes) (in thread) cout 1329 421 42100 enabled log 1469 453 45300 cout "disabled" 15 15 15 disabled log 531 531 531
I have been following all of the logging discussions and have taken at face value that disabled logs do not contribute a significant amount of time to the results, so I'm surprised about the disabled logs scenario being far worse than a disabled cout.
Here's a little detail on tests
Could you post the source rather than the pseudo code, just to help speed up the investigations of others (I know the code is simple to write!) In general, my concern with any of the logging libraries I've looked at, is their ability to provide a lightweight capability (speed and footprint) where needed e.g. in real-time embedded applications. John seems to care about this, but I confess at present having spent only a couple of hours with the existing code, it's still not clear what bits are modular 'extras' and what bits I have to keep. (Though recent posts between John and Gennadiy have helped this understanding somewhat.) I wanted to thank John, encourage him to continue with this essential library, but register my concern that the API probably needs to change (I agree with some of Gennadiy's comments) and the docs/testing need a fair bit more work (some of which really needed work before the review). I don't want to send the wrong message by putting in a negative review but equally I'm nervous about accepting things as is. I'm going to fence sit for now and not put in a review :) I guess this is a wider decision than the logging library. Do I accept a library that I suspect ought to go through some API changes, or do I call for a subsequent mini-review but accepting things in principle. The implication is that accepting it now may imply future support for an API we may wish to deprecate. Paul

Andrew Schweitzer <a.schweitzer.grps <at> gmail.com> writes:
Using GetTickCount and GetThreadTimes on Windows XP Pro, recorded time to execute some simple pieces of logging code in a loop. Divided by loop iterations to get time per iteration. This is a 3Ghz Pentium. Not sure how good these tests are, but here's the results:
results: elapsed ms thread time ms ns / call
(GetTickCount)
(GetThreadTimes) (in thread) cout 1329 421 42100 enabled log 1469 453 45300 cout "disabled" 15 15 15 disabled log 531 531 531
Compiler/version? Optimization settings? How long does a single or a very small number of iterations of the disabled cout and disable log take?
Here's a little detail on tests cout, 10,000 times, l_dw increments each loop: cout << l_dw << endl;
enabled log, 10,000 times: //outside loop: enable_logs("lgTest"); //in loop: BOOST_SCOPEDLOG(l_lgTest) << l_dw << endl;
cout "disabled", 1,000,000 times: //outside loop: bool l_bEnabled = false; //in loop: if(l_bEnabled) { cout << l_dw << endl; }
The disabled cout test, if it really is as simple a bit of code as that should take something close to 0 time to execute - which it does. If you put the if (l_bEnabled) test in a separate compilation unit to the loop body is performance the same?
disabled log, 1,000,000 times: //outside loop: disable_logs("lgTest"); //this not in loop //in loop: BOOST_SCOPEDLOG(l_lgTest) << l_dw << endl;
This involves at least the creation and destruction of a small temporary object, a function call to a separate compilation unit and an inderection through a shared ptr (not expensive) 500 odd ns seems broadly ok for that I guess - so something like 2 million disabled log messages/second.... Don't accidentally enable all logs/that log.... If you have a system with 100 logs, and only 1 enabled, the other 99 logs (being called at the same rate) are using the same amount of CPU as your 1 enabled log - not too bad - not as good as it could be either I'd guess. Some performance test should be added to the code before it is released - what enabled/disabled performance ratio do you think is needed for a "pass"? What sort of "nop function" do you think the disabled log needs to be equivalent to to pass?

[ Sent previously and canceled because I forgot to compress the text files - Pardon me if you get two copies ] On 11/15/05, Darryl Green <darryl.green@unitab.com.au> wrote:
Compiler/version? Optimization settings? How long does a single or a very small number of iterations of the disabled cout and disable log take?
Here is a small program I just wrote that might be useful to profile and benchmark the logging library. The program accepts the following command line arguments: -b | --bypass Bypass all logging; used as a control -d | --disable Disable the log -f | --file <filename> Send output to the named file -i | --iostreams Use a std::ofstream for output instead of Boost.Log. Can be combined with -d and -f to help benchmark raw iostreams vs. Boost.Log. The program uses SIGALRM to interrupt the benchmarking loop, so I'm not sure if it will work properly on Windows. I had tried a more portable implementation that checked for the boost::timer.elapsed time to reach 1.0, but the overhead of checking the clock overwhelmed the rest of the test so I scrapped that. In the code, I'm also not making use of the Boost.Log supplied "write_to_file" appender, which IMHO is fundamentally broken. It opens the file for append, writes a message, and then closes the file for every message it logs. I've provided an implementation called "my_write_to_file" which leaves the file open for the appenders lifetime. It does no explicit flushing on the stream. Here's the output when compiled with g++ 3.3.4 -O2 on a GNU/Linux with dual P4 Xeon 3.2 GHz and run with various flags: % ./boost-log-test Using enabled Boost.Log, output to (nowhere) 542303 logging operations in 0.99 sec =3D 547781 ops/sec or 1.82555e-06 sec/op % ./boost-log-test -d Using disabled Boost.Log 49492447 logging operations in 1 sec =3D 4.94924e+07 ops/sec or 2.02051e-08 sec/op % ./boost-log-test -f /dev/null Using enabled Boost.Log, output to /dev/null 484558 logging operations in 1 sec =3D 484558 ops/sec or 2.06374e-06 sec/op % ./boost-log-test -f /tmp/test.log Using enabled Boost.Log, output to /tmp/test.log 465831 logging operations in 1 sec =3D 465831 ops/sec or 2.1467e-06 sec/op % ./boost-log-test -d -i Using disabled iostreams 789447209 logging operations in 1.02 sec =3D 7.73968e+08 ops/sec or 1.29204e-09 sec/op % ./boost-log-test -i -f /dev/null Using enabled iostreams, output to /dev/null 5969546 logging operations in 1 sec =3D 5.96955e+06 ops/sec or 1.67517e-07 sec/op % ./boost-log-test -i -f /tmp/test.log Using enabled iostreams, output to /tmp/test.log 4223268 logging operations in 0.99 sec =3D 4.26593e+06 ops/sec or 2.34416e-07sec/op I've also profiled this code, and found some not-too-suprising results (also attached): - When the logger is enabled, the bulk of this program's time is spent in boost:logging::enabled_logger::~enabled_logger, which is calling write_msg with the result of std::ostringstream::str. - When the logger is disabled, much of the time is spent in ~enabled_logger as well as the is_enabled check Overall, the performance of the Boost.Log library when output is enabled is almost an order of magnitude slower than just writing the same data to a standard iostream. I believe the overhead of the enabled_logger design (object creation, heap allocation of an ostringstream every time a message is logged) might bear some re-examination. Is the enabled_logger object really necessary? Can the ostringstream be created on the caller's stack instead of the heap (e.g. by re-writing the BOOST_LOG macro) or perhaps re-used? I believe that the syntax of the logging macros (e.g. BOOST_LOG (x) << ...), while clever, contributes to the relatively poor performance of the library when compared to bare iostreams. I would suggest that by enclosing the entire logging statement in the macro invocation (e.g. BOOST_LOG(x, ...) ), a more optimal, feature-rich implementation is be possible. It becomes exceedingly simple to implement compile-time removal of logging calls with the "enclosed" macro-based interface, obviating the need for the is_compile_time complexities in the current code. Additionally, messages might be logged with __FILE__ and __LINE__ information automatically (if so desired). I'm sure there are other benefits (and weaknesses) that I'm overlooking as well. Hope this helps, -- Caleb Epstein caleb dot epstein at gmail dot com

I've also profiled this code, and found some not-too-suprising results (also attached):
- When the logger is enabled, the bulk of this program's time is spent in boost:logging::enabled_logger::~enabled_logger, which is calling write_msg with the result of std::ostringstream::str.
- When the logger is disabled, much of the time is spent in ~enabled_logger as well as the is_enabled check
Overall, the performance of the Boost.Log library when output is enabled is almost an order of magnitude slower than just writing the same data to a standard iostream. I believe the overhead of the enabled_logger design (object creation, heap allocation of an ostringstream every time a message is logged) might bear some re-examination. Is the enabled_logger object
Yes, heap allocation does consume a lot of time.
really necessary? Can the ostringstream be created on the caller's stack instead of the heap (e.g. by re-writing the BOOST_LOG macro) or perhaps re-used?
I believe that the syntax of the logging macros (e.g. BOOST_LOG (x) << ...), while clever, contributes to the relatively poor performance of the library when compared to bare iostreams. I would suggest that by enclosing the entire logging statement in the macro invocation (e.g. BOOST_LOG(x, ...) ), a more optimal, feature-rich implementation is be possible. It becomes exceedingly simple to implement compile-time removal of logging calls with the "enclosed" macro-based interface, obviating the need for the is_compile_time complexities in the current code. Additionally, messages might be logged with __FILE__ and __LINE__ information automatically (if so
Yes, I'll allow for __FILE__, __LINE__.
desired). I'm sure there are other benefits (and weaknesses) that I'm overlooking as well.
Thanks Caleb. I will definitely use this as a start, when implementing the new version of the logging lib. A lot of optimizations can be done, especially on the macro area. Best, John -- John Torjo, Contributing editor, C/C++ Users Journal -- "Win32 GUI Generics" -- generics & GUI do mix, after all -- http://www.torjo.com/win32gui/surfaces.html - Sky's the limit! -- http://www.torjo.com/cb/ - Click, Build, Run!

Whoops! Those were debug build numbers. Here's release build numbers. These and the debug tests are is vc7.1, using whatever optimization settings the default project contains. Again, I think these tests are just a quick check, not a thorough investigation. I'll try to post the actual test code tomorrow. results: elapsed ms thread time ms ns / call cout 10981 343 34300 enabled log 1281 203 20300 cout "disabled" 0 0 0 cout "dis" func 16 15 15 disabled log 16 15 15 In release build, I believe my contrived cout "disabled" test was optimized away. When replaced with a function call it seems to be same speed as disabled log: //in another file: bool UseCout(){return false;} //outside loop: extern bool UseCout(); //in loop: if(UseCout()) { cout << l_dw << endl; } Andrew Schweitzer wrote: [snip]
recorded time to execute some simple pieces of logging code in a loop. Divided by loop iterations to get time per iteration. This is a 3Ghz Pentium. Not sure how good these tests are, but here's the results:
results: elapsed ms thread time ms ns / call cout 1329 421 42100 enabled log 1469 453 45300 cout "disabled"15 15 15 disabled log 531 531 531 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Code is attached. These were built by with vc7.1 by creating a console project, adding these files, adding links to the boost libs, including boost_root. It should build for you. Please post if you try and have problems. After printing 20,000 numbers to the console it should have output like: release |Tk:|1282| Th:|155| NS/call:|15625| cout| |Tk:|1281| Th:|343| NS/call:|34375| enabled log| |Tk:|0| Th:|0| NS/call:|0| cout disabled| |Tk:|16| Th:|15| NS/call:|15| disabled log| Basically works like this: string l_sDC; disable_logs("lgTest"); START_MEASUREQ(1000000, "disabled log", &l_sDL) BOOST_SCOPEDLOG(l_lgTest) << l_ui << endl; STOP_MEASURE() BOOST_SCOPEDLOG(l_lg) << l_sDL << endl; There's a supporting cMeasureTime class that records time. Macros basically start and stop a for loop, then record the results to a string (or log if you pass in a logger).
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
bool UseCout(){return false;} #include "windows.h" #include "boost/log/log.hpp" #include <boost/log/functions.hpp> #include <boost/format.hpp> #include <iostream> using namespace std; using namespace boost::logging; using namespace boost; class cMeasureTime { public: cMeasureTime(){memset(this, 0, sizeof(*this));} void Start() { m_uiStartTick = GetTickCount(); FILETIME l_CreationTime; FILETIME l_ExitTime; if(!GetThreadTimes(GetCurrentThread(), &l_CreationTime, &l_ExitTime, (FILETIME*)&m_ui64KStart, (FILETIME*)&m_ui64UStart)) { cout << "GetThreadTimes failed: " << GetLastError() << endl; } } void Stop() { ++m_uiCalls; DWORD l_dwEndTick = GetTickCount(); m_uiTickDiff = l_dwEndTick > m_uiStartTick ? l_dwEndTick - m_uiStartTick : m_uiStartTick - l_dwEndTick; m_uiTickDiffTotal += m_uiTickDiff; FILETIME l_CreationTime; FILETIME l_ExitTime; unsigned __int64 l_ui64KEnd, l_ui64UEnd; if(!GetThreadTimes(GetCurrentThread(), &l_CreationTime, &l_ExitTime, (FILETIME*)&l_ui64KEnd, (FILETIME*)&l_ui64UEnd)) { cout << "GetThreadTimes failed: " << GetLastError() << endl; } m_ui64KDiff = l_ui64KEnd - m_ui64KStart; m_ui64UDiff = l_ui64UEnd - m_ui64UStart; m_ui64KDiffTotal += m_ui64KDiff; m_ui64UDiffTotal += m_ui64UDiff; } unsigned int GetCalls() const {return m_uiCalls;} unsigned __int64 GetTickDiffMS() const {return m_uiTickDiff;} unsigned __int64 GetKDiffHNS() const {return m_ui64KDiff;} unsigned __int64 GetUDiffHNS() const {return m_ui64UDiff;} unsigned __int64 GetThDiffHNS() const {return GetKDiffHNS() + GetUDiffHNS();} unsigned __int64 GetKDiffMS() const {return GetKDiffHNS() / 10000;} unsigned __int64 GetUDiffMS() const {return GetUDiffHNS() / 10000;} unsigned __int64 GetThDiffMS() const {return GetKDiffMS() + GetUDiffMS();} unsigned int GetTickDiffTotalMS()const {return m_uiTickDiffTotal;} unsigned __int64 GetKDiffTotalHNS() const {return m_ui64KDiffTotal;} unsigned __int64 GetUDiffTotalHNS() const {return m_ui64UDiffTotal;} unsigned __int64 GetThDiffTotalHNS() const {return GetKDiffTotalHNS() + GetUDiffTotalHNS();} unsigned __int64 GetKDiffTotalMS() const {return GetUDiffTotalHNS() / 1000;} unsigned __int64 GetUDiffTotalMS() const {return GetKDiffTotalHNS() / 1000;} unsigned __int64 GetThDiffTotalMS() const {return GetKDiffTotalMS() + GetUDiffTotalMS();} unsigned __int64 GetThNSperCall(unsigned int uiItr) const { unsigned __int64 l_ui64ThNS = GetThDiffHNS() * 100; unsigned __int64 l_uiNSperCall = 0; if(uiItr) { l_uiNSperCall = l_ui64ThNS / uiItr; } return l_uiNSperCall; } unsigned __int64 GetThNSperCallTotal() const { unsigned __int64 l_ui64ThNS = GetThDiffTotalHNS() * 100; unsigned __int64 l_uiNSperCall = 0; if(m_uiCalls) { l_uiNSperCall = l_ui64ThNS / m_uiCalls; } return l_uiNSperCall; } string ShortString(unsigned int uiItr, string strTag = "") const { format l_fmt("|Tk:|%u| Th:|%u| NS/call:|%u| %s|"); l_fmt % GetTickDiffMS() % GetThDiffMS() % GetThNSperCall(uiItr) % strTag; return l_fmt.str(); } string ShortStringTotal(string strTag = "") const { format l_fmt("|Tk:|%u| Th:|%u| NS/call:|%u| %s|"); l_fmt % GetTickDiffMS() % GetThDiffMS() % GetThNSperCallTotal() % strTag; return l_fmt.str(); } private: unsigned int m_uiCalls; unsigned int m_uiTickDiff; unsigned __int64 m_ui64KDiff; unsigned __int64 m_ui64UDiff; unsigned int m_uiTickDiffTotal; unsigned __int64 m_ui64KDiffTotal; unsigned __int64 m_ui64UDiffTotal; unsigned int m_uiStartTick; unsigned __int64 m_ui64KStart; unsigned __int64 m_ui64UStart; }; #define START_MEASURE(uiItr_, szTag_, pLog_, pMeasureTime_, pString_) \ { \ unsigned int l_uiItr = uiItr_; \ const char* l_szTag = szTag_; \ logger* l_pLog = pLog_; \ cMeasureTime* l_pMT = pMeasureTime_; \ cMeasureTime l_MT; \ if(!l_pMT){l_pMT = &l_MT;} \ string* l_pString = pString_; \ l_pMT->Start(); \ for(unsigned int l_ui = 0; l_ui < l_uiItr; ++l_ui) \ { \ #define STOP_MEASURE() \ } \ l_pMT->Stop(); \ if(l_pString) \ { \ *l_pString = l_pMT->ShortString(l_uiItr, l_szTag); \ } \ if(l_pLog) \ { \ BOOST_SCOPEDLOG(*l_pLog) << l_pMT->ShortString(l_uiItr, l_szTag); \ } \ } \ #define START_MEASUREQ(uiItr_, szTag_, pString_) \ START_MEASURE(uiItr_, szTag_, NULL, NULL, pString_) extern bool UseCout(); int main(int argc, char* argv[]) { logger l_lg("lg"); logger l_lgTest("lgTest"); manipulate_logs("*").add_appender(write_to_cout); enable_logs("lg"); set_log_caching(false); string l_sCout, l_sDL, l_sEL, l_sDC; START_MEASUREQ(10000, "cout", &l_sCout) cout << l_ui << endl; STOP_MEASURE() START_MEASUREQ(1000000, "cout disabled", &l_sDC) if(UseCout()) { cout << l_ui << endl; } STOP_MEASURE() disable_logs("lgTest"); START_MEASUREQ(1000000, "disabled log", &l_sDL) BOOST_SCOPEDLOG(l_lgTest) << l_ui << endl; STOP_MEASURE() enable_logs("lgTest"); START_MEASUREQ(10000, "enabled log", &l_sEL) BOOST_SCOPEDLOG(l_lgTest) << l_ui << endl; STOP_MEASURE() #ifdef _DEBUG BOOST_SCOPEDLOG(l_lg) << "debug" << endl; #else BOOST_SCOPEDLOG(l_lg) << "release" << endl; #endif BOOST_SCOPEDLOG(l_lg) << l_sCout << endl; BOOST_SCOPEDLOG(l_lg) << l_sEL << endl; BOOST_SCOPEDLOG(l_lg) << l_sDC << endl; BOOST_SCOPEDLOG(l_lg) << l_sDL << endl; return 0; }

After printing 20,000 numbers to the console it should have output like: release |Tk:|1282| Th:|155| NS/call:|15625| cout| |Tk:|1281| Th:|343| NS/call:|34375| enabled log| |Tk:|0| Th:|0| NS/call:|0| cout disabled| |Tk:|16| Th:|15| NS/call:|15| disabled log|
Basically works like this:
string l_sDC; disable_logs("lgTest");
START_MEASUREQ(1000000, "disabled log", &l_sDL)
BOOST_SCOPEDLOG(l_lgTest) << l_ui << endl;
STOP_MEASURE()
BOOST_SCOPEDLOG(l_lg) << l_sDL << endl;
There's a supporting cMeasureTime class that records time. Macros basically start and stop a for loop, then record the results to a string (or log if you pass in a logger).
Thanks, Andrew -- I'll use your test as well -- especially the cMeasureTime class ;) Best, John -- John Torjo, Contributing editor, C/C++ Users Journal -- "Win32 GUI Generics" -- generics & GUI do mix, after all -- http://www.torjo.com/win32gui/surfaces.html - Sky's the limit! -- http://www.torjo.com/cb/ - Click, Build, Run!

I've been profiling an application here where I work, that uses the logging library from John Torjo. With the logging disabled and using the LOG(logging_name) << (whatever to log) << ... The profiler shows that operator<< is using considerable CPU time (something like 10% of the application CPU time), so it seems that the current approach isnt sufficient. I'll probably need to create a macro that takes the logger and what to be logged and use somekind of if, just like the disabled cout example. best regards, -- Felipe Magno de Almeida Developer from synergy and Computer Science student from State University of Campinas(UNICAMP). Unicamp: http://www.ic.unicamp.br Synergy: http://www.synergy.com.br "There is no dark side of the moon really. Matter of fact it's all dark."

Please ignore this email, a typo in the disabling of the logs wasnt disabling them. As I've seen in the code, it is guarded by an if and it couldnt be called any operator<< if the log is disabled. BTW, I really liked the compile_time enabling of logs, very clever. best regards, On 11/15/05, Felipe Magno de Almeida <felipe.m.almeida@gmail.com> wrote:
I've been profiling an application here where I work, that uses the logging library from John Torjo. With the logging disabled and using the
LOG(logging_name) << (whatever to log) << ...
The profiler shows that operator<< is using considerable CPU time (something like 10% of the application CPU time), so it seems that the current approach isnt sufficient. I'll probably need to create a macro that takes the logger and what to be logged and use somekind of if, just like the disabled cout example.
best regards, -- Felipe Magno de Almeida Developer from synergy and Computer Science student from State University of Campinas(UNICAMP). Unicamp: http://www.ic.unicamp.br Synergy: http://www.synergy.com.br "There is no dark side of the moon really. Matter of fact it's all dark."
-- Felipe Magno de Almeida Developer from synergy and Computer Science student from State University of Campinas(UNICAMP). Unicamp: http://www.ic.unicamp.br Synergy: http://www.synergy.com.br "There is no dark side of the moon really. Matter of fact it's all dark."
participants (6)
-
Andrew Schweitzer
-
Caleb Epstein
-
Darryl Green
-
Felipe Magno de Almeida
-
John Torjo
-
Paul Baxter