[uuid] uuid::create() is extremely slow

Andy, You did a nice job speeding up the uuid::create(engine) function to 0.4 seconds for 1 million UUIDs. But now the uuid::create() function is extremely slow - 131 seconds for 1 million UUIDs (source code below). Can you change it to use a static mt19937 engine? Thanks, Dave Jenkins #include <boost/uuid.hpp> #include <boost/progress.hpp> #include <string> int main(int, char*[]) { using namespace boost; int const N = 1000000; // mt19937 engine; timer t1; for (int i=0; i<N; i++) uuid::create(); // uuid::create(engine); std::cout << "elapsed time: " << t1.elapsed() << '\n'; return 0; }

On 01/06/07, Dave Jenkins <david@jenkins.net> wrote:
Andy, You did a nice job speeding up the uuid::create(engine) function to 0.4 seconds for 1 million UUIDs. But now the uuid::create() function is extremely slow - 131 seconds for 1 million UUIDs (source code below). Can you change it to use a static mt19937 engine?
On a related note: doesn't reseeding the random number generator every time uuid::create() is called reduce the quality of the random numbers from the mt19937 engine? Doesn't the randomness then come from the seeding mechanism which is used? So, on the whole it would indeed be better to use a static engine if possible. Regards, Jos -- Jos Hickson Software Engineer RawFlow Inc Old Pump House | 19 Hooper Street | London E1 8BU International: +44 (0)207 480 4220 Fax: +44 (0)207 481 4343 URL: www.rawflow.com

"Jos Hickson" <jos.hickson@gmail.com> wrote in news:fc03d05a0706010913l4865ef29re605685969592243@mail.gmail.com:
On 01/06/07, Dave Jenkins <david@jenkins.net> wrote:
Andy, You did a nice job speeding up the uuid::create(engine) function to 0.4 seconds for 1 million UUIDs. But now the uuid::create() function is extremely slow - 131 seconds for 1 million UUIDs (source code below). Can you change it to use a static mt19937 engine?
There was opposition to using static objects because of threading concerns. I do want a function that doesn't require the user to create a random number generator themselves. If I use a static engine, I need to protect it with a mutex. This is not optimal because there is the overhead of locking and unlocking the mutex. Some people would rather have a static engine per thread using thread local storage. This still requires one to link with Boost.Thread. I created the uuid::create(Engine) function (great advice from this list) so that users could do whatever they want (static engine per thread, use thread local storage,...). The uuid::create() function should handle the common case and not be best for all situations. So, I guess I need help to determine what the common case is.
On a related note: doesn't reseeding the random number generator every time uuid::create() is called reduce the quality of the random numbers from the mt19937 engine?
Probably.
Doesn't the randomness then come from the seeding mechanism which is used? So, on the whole it would indeed be better to use a static engine if possible.
Regards,
Jos
Andy.

On Fri, 2007-06-01 at 17:13 +0000, Andy wrote:
"Jos Hickson" <jos.hickson@gmail.com> wrote in news:fc03d05a0706010913l4865ef29re605685969592243@mail.gmail.com:
On 01/06/07, Dave Jenkins <david@jenkins.net> wrote:
Andy, You did a nice job speeding up the uuid::create(engine) function to 0.4 seconds for 1 million UUIDs. But now the uuid::create() function is extremely slow - 131 seconds for 1 million UUIDs (source code below). Can you change it to use a static mt19937 engine?
There was opposition to using static objects because of threading concerns. I do want a function that doesn't require the user to create a random number generator themselves.
If I use a static engine, I need to protect it with a mutex. This is not optimal because there is the overhead of locking and unlocking the mutex. Some people would rather have a static engine per thread using thread local storage. This still requires one to link with Boost.Thread. I created the uuid::create(Engine) function (great advice from this list) so that users could do whatever they want (static engine per thread, use thread local storage,...).
The uuid::create() function should handle the common case and not be best for all situations. So, I guess I need help to determine what the common case is.
I take it that means that the documentation for uuid::create() should have a warning saying "only useful for situation X" or something. IMHO the responsibility of a UUID library is to provide the best possible UUIDs and if this means having a static without thread protection and the warning being about the lack of thread safety then I think that is preferable. I didn't see any of the original reviews but I gather from what you said you'd improved that there was some issue about the quality of the seeding. If so, then as uuid::create() is the only thing that uses the seeding mechanism it would seem strange to put effort into improving it if the random number generator is then compromised by the repeated reseeding. Obviously, though, if the common use case is to only ever create a single UUID for the lifetime of an application then it should probably stay as it is. FTW I think this is what our most common use case will be. If things are left as they are, however, then it might be useful to give a user access to the seeding mechanism you use so that they don't need to implement their own if they are going to use the uuid::create(Engine) method. Otherwise I'd just like to say thanks for the very useful library and as we are about to start using UUIDs the timing could not have better!
On a related note: doesn't reseeding the random number generator every time uuid::create() is called reduce the quality of the random numbers from the mt19937 engine?
Probably.
Doesn't the randomness then come from the seeding mechanism which is used? So, on the whole it would indeed be better to use a static engine if possible.
Regards,
Jos
Andy.
Jos

Jos Hickson <jos.hickson@gmail.com> wrote in news:1180722852.7102.27.camel@musashimaru.sidthecat.net:
On Fri, 2007-06-01 at 17:13 +0000, Andy wrote:
"Jos Hickson" <jos.hickson@gmail.com> wrote in news:fc03d05a0706010913l4865ef29re605685969592243@mail.gmail.com:
On 01/06/07, Dave Jenkins <david@jenkins.net> wrote:
Andy, You did a nice job speeding up the uuid::create(engine) function to 0.4 seconds for 1 million UUIDs. But now the uuid::create() function is extremely slow - 131 seconds for 1 million UUIDs (source code below). Can you change it to use a static mt19937 engine?
There was opposition to using static objects because of threading concerns. I do want a function that doesn't require the user to create a random number generator themselves.
If I use a static engine, I need to protect it with a mutex. This is not optimal because there is the overhead of locking and unlocking the mutex. Some people would rather have a static engine per thread using thread local storage. This still requires one to link with Boost.Thread. I created the uuid::create(Engine) function (great advice from this list) so that users could do whatever they want (static engine per thread, use thread local storage,...).
The uuid::create() function should handle the common case and not be best for all situations. So, I guess I need help to determine what the common case is.
I take it that means that the documentation for uuid::create() should have a warning saying "only useful for situation X" or something. IMHO the responsibility of a UUID library is to provide the best possible UUIDs and if this means having a static without thread protection and the warning being about the lack of thread safety then I think that is preferable.
Whatever the solution, I will put warnings in the docs where needed. I do agree that creating better uuids are more important than creating them fast.
I didn't see any of the original reviews but I gather from what you said you'd improved that there was some issue about the quality of the seeding. If so, then as uuid::create() is the only thing that uses the seeding mechanism it would seem strange to put effort into improving it if the random number generator is then compromised by the repeated reseeding.
agreed
Obviously, though, if the common use case is to only ever create a single UUID for the lifetime of an application then it should probably stay as it is. FTW I think this is what our most common use case will be.
If things are left as they are, however, then it might be useful to give a user access to the seeding mechanism you use so that they don't need to implement their own if they are going to use the uuid::create(Engine) method.
This may be the best solution. I need to take some time to think about all this. I do want a solution that works for everyone.
Otherwise I'd just like to say thanks for the very useful library and as we are about to start using UUIDs the timing could not have better!
You're welcome. < snip >
Jos
Andy.

Andy wrote:
If I use a static engine, I need to protect it with a mutex. This is not optimal because there is the overhead of locking and unlocking the mutex. Some people would rather have a static engine per thread using thread local storage. This still requires one to link with Boost.Thread. I created the uuid::create(Engine) function (great advice from this list) so that users could do whatever they want (static engine per thread, use thread local storage,...).
Perhaps you could write a CAS/lock-free random number generator. I have no idea how difficult that would be, I just recently stumbled over the CAS/lock-free threading technique myself. Thomas

On 6/2/07, Thomas Sondergaard <ts_news1@sondergaard.cc> wrote:
Perhaps you could write a CAS/lock-free random number generator. I have no idea how difficult that would be, I just recently stumbled over the CAS/lock-free threading technique myself.
I understand CAS/lock-free (well, 'enough') - someone just needs to explain a good rand algorithm to me. Tony

Dave Jenkins wrote:
Andy, You did a nice job speeding up the uuid::create(engine) function to 0.4 seconds for 1 million UUIDs. But now the uuid::create() function is extremely slow - 131 seconds for 1 million UUIDs (source code below).
The requirement for uuid::create to be fast conflicts with the requirement that it needs to create a "universally unique identifier", that is, a high-quality random number. Typically, high-quality randomness is more important than performance, as one rarely needs to create one million UUIDs. One could argue that the high-performance case is adequately addressed by the engine-taking overload.
Can you change it to use a static mt19937 engine?
This introduces thread safety issues. All that said, the current uuid::create throws away most of the randomness produced by the random digest helper function and uses only 32 bits of it as a seed, so it's not really high quality. :-)

"Peter Dimov" <pdimov@mmltd.net> wrote in news:00d501c7a47a$32806870$6407a80a@pdimov2:
Dave Jenkins wrote:
Andy, You did a nice job speeding up the uuid::create(engine) function to 0.4 seconds for 1 million UUIDs. But now the uuid::create() function is extremely slow - 131 seconds for 1 million UUIDs (source code below).
The requirement for uuid::create to be fast conflicts with the requirement that it needs to create a "universally unique identifier", that is, a high-quality random number. Typically, high-quality randomness is more important than performance, as one rarely needs to create one million UUIDs. One could argue that the high-performance case is adequately addressed by the engine-taking overload.
Can you change it to use a static mt19937 engine?
This introduces thread safety issues.
All that said, the current uuid::create throws away most of the randomness produced by the random digest helper function and uses only 32 bits of it as a seed, so it's not really high quality. :-)
I will look into using all of the randomness produced by the random digest helper function. I did look at the function mersenne_twister::seed(It& first, It last), but couldn't get it to work. I take another shot at it. Thanks, Andy.

Andy wrote:
I will look into using all of the randomness produced by the random digest helper function. I did look at the function mersenne_twister::seed(It& first, It last), but couldn't get it to work. I take another shot at it.
You'll need to do that if you go the static engine route. An easier way to fix the current create() is just to return the digest as-is. A short sequence produced by a freshly seeded RNG is no more random than the seed; nothing is gained (in this case) by passing the digest through the engine.

Would it make any sense for uuid::create() to just use the OS's function, for OSes that can create uuids? (ie most OSes). Tony On 6/1/07, Peter Dimov <pdimov@mmltd.net> wrote:
Andy wrote:
I will look into using all of the randomness produced by the random digest helper function. I did look at the function mersenne_twister::seed(It& first, It last), but couldn't get it to work. I take another shot at it.
You'll need to do that if you go the static engine route.
An easier way to fix the current create() is just to return the digest as-is. A short sequence produced by a freshly seeded RNG is no more random than the seed; nothing is gained (in this case) by passing the digest through the engine.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

"Gottlob Frege" <gottlobfrege@gmail.com> wrote in news:97ffb310706011815p5704ca0ds9a6ceb09deeb21b6@mail.gmail.com:
Would it make any sense for uuid::create() to just use the OS's function, for OSes that can create uuids? (ie most OSes).
The OS API on windows creates time-based uuids (I don't know about other OSs, but expect the same). The uuid::create() function creates random-number-based uuids. I do not want to mix these. In the future, I would like the uuid library to be able to create time-based uuids. I believe that calling OS funtions is a good solution for that. < snip > Andy.

Andy wrote:
"Gottlob Frege" <gottlobfrege@gmail.com> wrote in news:97ffb310706011815p5704ca0ds9a6ceb09deeb21b6@mail.gmail.com:
Would it make any sense for uuid::create() to just use the OS's function, for OSes that can create uuids? (ie most OSes).
The OS API on windows creates time-based uuids (I don't know about other OSs, but expect the same). The uuid::create() function creates random-number-based uuids. I do not want to mix these.
On Windows XP and Vista, one can use RtlGenRandom. It produces a cryptographic-quality random number. http://msdn2.microsoft.com/en-us/library/aa387694.aspx http://blogs.msdn.com/michael_howard/archive/2005/01/14/353379.aspx
participants (6)
-
Andy
-
Dave Jenkins
-
Gottlob Frege
-
Jos Hickson
-
Peter Dimov
-
Thomas Sondergaard