
Hello Philippe, Do you also plan to tackle the problem with QueryPerformanceCounter() on multi-core systems? QPC reports problematic/mismatching values on certain multi-core CPUs (e.g. Athlon X2). Cheers, Stephan ----- Original Message ----- From: "Philippe Vaucher" <philippe.vaucher@gmail.com> Newsgroups: gmane.comp.lib.boost.devel Sent: Sunday, October 29, 2006 6:16 PM Subject: Boost Timer Update

Hello, If I get it right, there's not much I can do about it? I don't think multi-core CPUs define some macro I could test in order to warn the user not to use the QueryPerformanceCounter timer... Anyway, 99% of users are expected to use microsec_timer, which uses posix_time::microsec_clock... QPC and timeGetTime() will be there for people who really need them. What's your opinion? Philippe
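[For context, a minimal sketch of the kind of measurement microsec_timer would wrap, using Boost.Date_Time's microsec_clock directly; the timed section is illustrative:]

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

int main()
{
    using boost::posix_time::ptime;
    using boost::posix_time::microsec_clock;

    ptime start = microsec_clock::universal_time();
    // ... work being timed ...
    ptime stop = microsec_clock::universal_time();

    // time_duration knows its microsecond count directly
    std::cout << (stop - start).total_microseconds() << " microseconds\n";
}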

I forgot to add that my plan was simply to mention the issue in the documentation, and to emphasize that microsec_timer is the "main" timer and that the other ones are there for completeness' sake. Philippe

On 10/31/06, Philippe Vaucher <philippe.vaucher@gmail.com> wrote:
You could use SetThreadAffinityMask to force QPC to only run on one core, although I'm not sure what ramifications this might have for the timer's design and use. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/... --Michael Fawcett

This is interesting, but unfortunately it'd mean that the whole thread runs on one core, which most programmers very likely won't be happy with... and running the QPC timer in a thread of its own just looks like overkill to me. Maybe a middle-ground solution would be to provide some macro allowing the user to make the lib automatically use SetThreadAffinityMask... but I think that simply mentioning the issue in the documentation is better. Is there really that much of a need for a QPC-based timer? In my current state of mind I really provide it as an alternative to microsec_timer for those who specifically need it, but microsec_timer is portable and offers the same resolution as QPC... Philippe

On 10/31/06, Philippe Vaucher <philippe.vaucher@gmail.com> wrote:
Definitely something the user should be made aware of. I agree it's overkill for most, but it might be something users want (see below).
Is that really the case? Microsoft's own documentation states: "The default precision of the timeGetTime function can be five milliseconds or more, depending on the machine. You can use the timeBeginPeriod and timeEndPeriod functions to increase the precision of timeGetTime. If you do so, the minimum difference between successive values returned by timeGetTime can be as large as the minimum period value set using timeBeginPeriod and timeEndPeriod. Use the QueryPerformanceCounter and QueryPerformanceFrequency functions to measure short time intervals at a high resolution." I have not done any tests to verify that QPC is indeed more accurate over short intervals, but if that is the case, I think it should be provided. Note that games often base their physics calculations on elapsed time per frame, and they need to behave the same no matter the framerate. These intervals are often as small as 0.003 seconds, sometimes smaller. Perhaps of interest: NVIDIA has a Timer Function Performance test app that shows the performance of various timing methods. I have no clue whether the benchmark is well written, but the speed of the actual timing function may be of interest to some users, as well as its precision. http://developer.nvidia.com/object/timer_function_performance.html --Michael Fawcett
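[To make the quoted MSDN advice concrete, a rough sketch of raising timeGetTime's precision with timeBeginPeriod/timeEndPeriod; error handling is omitted and the function name is made up. Link against winmm.lib:]

#include <windows.h>
#include <mmsystem.h>   // timeGetTime, timeBeginPeriod; link with winmm.lib

// Request 1 ms timer resolution for the duration of the measurement,
// then restore the default, as the MSDN excerpt above suggests.
DWORD timed_section_ms()
{
    timeBeginPeriod(1);              // ask the system for 1 ms resolution
    DWORD start = timeGetTime();
    // ... work being timed ...
    DWORD elapsed = timeGetTime() - start;
    timeEndPeriod(1);                // must match the timeBeginPeriod call
    return elapsed;
}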

microsec_clock doesn't use timeGetTime()... it uses GetSystemTime() if I remember correctly. QueryPerformanceCounter does indeed have better resolution than timeGetTime(), and it also has less overhead... but unfortunately I don't know how it compares to GetSystemTime(). I will have to run some tests to determine that. At the moment my code offers:
- microsec_timer, which uses boost::posix_time::microsec_clock, itself based on GetSystemTime on windows and gettimeofday() on linux. I think that's the timer most users should use.
- second_timer, which uses boost::posix_time::second_clock; I forgot what that one uses underneath.
- qpc_timer, only available under windows, which uses QueryPerformanceCounter.
- tgt_timer, only available under windows, which uses timeGetTime.
And then I plan to add clock_timer, which would use std::clock... as for GetTickCount(), I don't think it'd be worth adding, as it's the worst win32 timer that exists. I'll give the nvidia timer test a shot in the next few days. Philippe
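[For reference, the usual QueryPerformanceCounter idiom that a qpc_timer would presumably wrap; a sketch, not the proposed library code:]

#include <windows.h>

// Measure an interval with QueryPerformanceCounter and convert the
// tick delta to seconds using the counter frequency.
double qpc_elapsed_seconds_demo()
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);   // ticks per second
    QueryPerformanceCounter(&start);
    // ... work being timed ...
    QueryPerformanceCounter(&stop);
    return double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
}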

Philippe Vaucher wrote:
microsec_clock doesn't use timeGetTime()... it uses GetSystemTime() if I remember correctly.
I haven't tried it, but according to various links, GetSystemTime() only gives ~10ms or ~15ms precision (e.g. http://discuss.fogcreek.com/joelonsoftware3/default.asp?cmd=show&ixPost=85520). After Googling, I found this old but useful article on timers: http://www.ddj.com/dept/windows/184416651 In particular, this table summarizes the resolution: http://www.ddj.com/showArticle.jhtml?documentID=win0305a&pgno=17 Maybe GetSystemTime() should be renamed "centisec_timer" :) My personal vote would be for microsec_timer to be implemented based on QueryPerformanceCounter(), and we just make sure we point to the MSDN documentation. Regards, -Edward

Thank you for the links! Some results look bizarre though; timeGetTime() being at the same level as GetTickCount() seems quite weird to me. The graphs also seem to indicate QPC has one of the largest call overheads; would that be a concern? Anyway, this makes me wonder what the rationale is behind using GetSystemTimeAsFileTime() to implement date_time::microsec_clock? Does anyone know (or could an author enlighten us)? This also raises another question: if we choose to implement microsec_timer with QPC, what will we name the timer implemented with posix_time::microsec_clock (which is just a typedef for date_time::microsec_clock)? Philippe
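[For background, a sketch of reading GetSystemTimeAsFileTime(): it hands back a FILETIME in 100-nanosecond units since 1601-01-01 (UTC), though the actual update granularity is the coarser ~10-15 ms discussed above. The helper name is made up:]

#include <windows.h>

// Read the system clock as a single 64-bit count of 100-ns intervals.
unsigned long long system_time_100ns_ticks()
{
    FILETIME ft;
    GetSystemTimeAsFileTime(&ft);

    ULARGE_INTEGER ticks;           // widen the two 32-bit halves
    ticks.LowPart  = ft.dwLowDateTime;
    ticks.HighPart = ft.dwHighDateTime;
    return ticks.QuadPart;          // 100-ns units since January 1, 1601 (UTC)
}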

Philippe Vaucher wrote:
Sorry to be so quiet...it's not that I'm not interested, just too busy to really contribute. Anyway, the reason for GetSystemTimeAsFileTime is that it works. And it doesn't suffer from the rollover issue and the stability issues that the QPC stuff does in the face of multi-threading. So it just works and it gives enough precision for 90% of the applications.
I'll be happy to replace the current microsec_clock implementation as long as you can deal with rollover and multi-threading. Oh, and you can't add any data members to ptime which might expand its size. And since microsec_clock is all static, there could be some complications if you need to store data to handle rollover. A QPC-based implementation has been discussed for years, but I've never seen anyone actually implement something that works as reliably as GetSystemTimeAsFileTime... Jeff

Jeff Garland wrote:
I'll be happy to replace the current microsec_clock implementation as long as you can deal with rollover and multi-threading.
I guess it depends on what you're using the timer for. I really don't mind either way as long as the behaviour is well documented. When I'm using a "microsecond timer", it is because I want to measure short intervals, not long ones. But perhaps there should be clearer ways of specifying the precision/accuracy/resolution trade-offs desired? Philippe Vaucher wrote:
On second thought, apparently Java calls their QPC implementation "nano" instead. Perhaps we could use that terminology as well? On the QPC implementation, someone here (http://channel9.msdn.com/ShowPost.aspx?PostID=156175) suggested that it might be good enough to call SetThreadAffinityMask() to pin to cpu 0 and then set it back to the old value when done. I have no idea what kind of overhead it imposes, though. For more robust QPC implementations, the following two links have some more ideas. As Jeff suggests, this is not new, but this is the first time I've really looked seriously at the issue, as QPC always worked well enough for my profiling purposes. http://msdn.microsoft.com/msdnmag/issues/04/03/HighResolutionTimer/ http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q274323& Regards, -Edward
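[For illustration, a minimal sketch of the pin-and-restore idea from that Channel9 post; the helper name is made up and the overhead of the two affinity calls is untested:]

#include <windows.h>

// Temporarily pin the calling thread to CPU 0 so QueryPerformanceCounter
// always reads the same core's counter, then restore the old affinity.
LONGLONG read_qpc_pinned_to_cpu0()
{
    HANDLE thread = GetCurrentThread();
    DWORD_PTR old_mask = SetThreadAffinityMask(thread, 1); // bit 0 = CPU 0

    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);

    SetThreadAffinityMask(thread, old_mask);               // restore
    return counter.QuadPart;
}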

At the moment I'm thinking about an API like this:
- portable: microsec_timer: documented as the most robust timer, not the best-resolution one, good enough for 90% of timings.
- portable: second_timer: this one is kind of obvious, no need to describe it much.
- portable: clock_timer: timer based on std::clock, with documentation about the clock() issues and resolutions.
- windows: qpc_timer: documented as best resolution, not thread safe, multi-core problem. Ability to set a macro that automatically calls SetThreadAffinityMask().
- windows: tgt_timer: documented as a good alternative to qpc (I think tgt > GetSystemTimeAsFileTime).
To discuss, but I may add these timers:
- windows: gst_timer: uses GetSystemTimeAsFileTime (actually microsec_clock already uses gst, but that may change, so why not offer an explicit one).
- windows: gtc_timer: uses GetTickCount().
- linux: gtod_timer: uses gettimeofday() (actually microsec_clock already uses gtod, but that may change, so why not offer an explicit one).
If you know of other linux timers that would be worth using, please mention them, thank you. I think that if each timer type is well documented regarding resolution, thread safety, multi-core issues and all the other gotchas these timers are giving us headaches about, then the user can choose the timer he wants and take the risk himself. Of course, I think we should suggest users use microsec_timer for portable code. I kind of like this approach because the user is free to make his choices and is aware of the issues... At the moment I'm not very happy about using QPC for microsec_timer because it looks like the pros (satisfies some people, not the majority?) don't outweigh the cons (all the issues with QPC).
On second thought, apparently Java calls their QPC implementation "nano" instead. Perhaps we could use that terminology as well?
I already thought a bit about this, and at the start my timers were named nanosec_timer or something, but then this idea collided with the date_time::microsec terminology and I decided to be consistent with other parts of boost. Philippe

Hello guys, First of all, happy new year! :) I thought it was time for a little update. I structured the whole thing a bit, and now there are 8 different timers:
Portable:
- typedef timer<microsec_device> microsec_timer; // boost::bla::microsec_clock
- typedef timer<second_device> second_timer; // boost::bla::second_clock
- typedef timer<clock_device> clock_timer; // std::clock()
Windows:
- typedef timer<qpc_device> qpc_timer; // QueryPerformanceCounter()
- typedef timer<tgt_device> tgt_timer; // timeGetTime()
- typedef timer<gstaft_device> gstaft_timer; // GetSystemTimeAsFileTime()
- typedef timer<gtc_device> gtc_timer; // GetTickCount()
POSIX:
- typedef timer<gtod_device> gtod_timer; // gettimeofday()
I created a tree like this at the moment:
boost/timer.hpp
boost/timer/devices.hpp
boost/timer/implementation.hpp
boost/timer/typedefs.hpp
boost/timer/devices/clock.hpp
boost/timer/devices/date_time.hpp
boost/timer/devices/GetSystemTimeAsFileTime.hpp
boost/timer/devices/GetTickCount.hpp
boost/timer/devices/gettimeofday.hpp
boost/timer/devices/QueryPerformanceCounter.hpp
boost/timer/devices/timeGetTime.hpp
And this brought me to some questions: What do you guys think about the structure? Should we pollute the boost namespace? Should I create a "timer" namespace inside the boost one? Should I create a "devices" namespace inside the timer one? I also have another question: can someone point me to another POSIX/linux timing api besides gettimeofday? Also, at the start I wanted to provide some way to get the overhead/resolution of each device from within the code, but then I removed it because I realized it's more trouble than it's worth, not to mention it probably won't be used. I decided to describe the overhead, resolution, pros/cons and issues in each device's header and in the upcoming documentation instead. Thank you, Philippe p.s: win32 apis like GetProcessTimes() or GetThreadTimes() aren't supported because there isn't much benefit in having them and they don't exist on win9x. Tell me if you think I should add them anyway.
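[To make the device-based design concrete, a guessed-at sketch of what the timer<Device> pattern could look like; the interface is inferred from the typedefs above, not taken from the actual code:]

#include <windows.h>

// Hypothetical device: anything exposing a time_type and a static now().
struct qpc_device
{
    typedef LONGLONG time_type;
    static time_type now()
    {
        LARGE_INTEGER counter;
        QueryPerformanceCounter(&counter);
        return counter.QuadPart;
    }
};

// The timer itself is device-agnostic; each typedef just picks a device.
template <typename Device>
class timer
{
public:
    timer() : start_(Device::now()) {}
    void restart() { start_ = Device::now(); }
    typename Device::time_type elapsed() const { return Device::now() - start_; }
private:
    typename Device::time_type start_;
};

typedef timer<qpc_device> qpc_timer;   // mirrors the typedefs listed above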

Oh, I forgot to ask about the names of the devices. "gstaft_device" for GetSystemTimeAsFileTime(), "qpc_device" for QueryPerformanceCounter(), "gtod_device" for gettimeofday()... do you guys think that's ok? I'm not very happy with this naming, but I can't find anything that sounds better. I'm open to ideas :) Philippe

I somehow guessed that answer, and I think I agree with it. Personally I'm for leaving it as it is, and more or less leaving my code as it is; that means people will use microsec_timer, which uses microsec_clock, for robust, good-quality code. If they really want more precision, they can use qpc_timer, which will be documented so people know about its issues. What do you think of this guideline?
Well, with all those conditions and what I know about QPC, it looks like an impossible challenge, and even if I succeeded it's likely that the overhead would be huge. I think I'll just go with the "convenience" solution where I offer to do the SetThreadAffinityMask calls for the user if he wants, etc. Philippe

Michael Fawcett wrote:
For what it's worth, I tried running the test on a dual processor Xeon, a dual core Athlon 64, and a single core Celeron D. In all cases QueryPerformanceCounter was the slowest, by at least a factor of 5 compared to the next slowest. GetTickCount was the fastest, and timeGetTime and the Pentium counter traded places in the middle depending on the computer. -- Daniel Wesslén

Yes, but this test seems to measure the api overhead and not the timer's precision... I don't know how much a big api overhead causes trouble when timing small intervals, but I expect that the better resolution of QPC outweighs its api overhead. Tell me if I misunderstood something. Philippe
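[For what it's worth, a crude sketch of the kind of per-call overhead measurement these tests perform; the iteration count and output format are arbitrary choices, not taken from the NVIDIA app:]

#include <windows.h>
#include <cstdio>

// Time a large number of QueryPerformanceCounter calls against the
// counter's own frequency to estimate the per-call overhead.
int main()
{
    LARGE_INTEGER freq, begin, end, dummy;
    QueryPerformanceFrequency(&freq);

    const int iterations = 1000000;
    QueryPerformanceCounter(&begin);
    for (int i = 0; i < iterations; ++i)
        QueryPerformanceCounter(&dummy);
    QueryPerformanceCounter(&end);

    double seconds = double(end.QuadPart - begin.QuadPart) / double(freq.QuadPart);
    std::printf("~%.0f ns per call\n", seconds * 1e9 / iterations);
}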

Me again! :) I made a small wiki for boost::timer, which is better than nothing as far as its current documentation goes :) You can see it at http://www.unitedsoft.ch/boost/wiki/ Philippe

Philippe Vaucher wrote:
Indeed it does.
As would I. It was surprising to me that the method that seems meant for measuring small intervals has such a comparatively large overhead.
Tell me if I misunderstood something.
No, no. I didn't mean much by it; I just thought I'd provide the information for completeness. Hence "for what it's worth." -- Daniel Wesslén

I'm not sure if you've read these already, but: http://www.gamedev.net/reference/programming/features/timing/ and http://www.ddj.com/dept/windows/184416651 Thanks, Michael Marcin

On 11/1/06, Michael Marcin <mike@mikemarcin.com> wrote:
That's a great link. Its tables illustrate many of the topics discussed in this thread, notably the accuracy of the timing functions and the call overhead itself. Clearly the user must make the choice based on his situation. I think Philippe was already working with that in mind. Thanks, --Michael Fawcett
participants (7)
-
Daniel Wesslén
-
Edward Lam
-
Jeff Garland
-
Michael Fawcett
-
Michael Marcin
-
Philippe Vaucher
-
Stephan Kaiser