
I have uploaded a new version of the guid library to the boost vault (guid_v4.zip). (I seem to be unable to change the file description for guid_v3.zip to not say the last version.)

Changes:
- I have written documentation.
- The create function now just creates a random-number-based guid instead of pretending to create a time-based one.
- It uses Boost Build System V2.

I plan to add serialization and name-based creation in the near future.

I welcome any (and hope to receive many) comments and criticisms.

Andy Tompkins

On 9 Nov 2006, at 21:43, Andy wrote:
I have uploaded a new version of the guid library to the boost vault (guid_v4.zip). (I seem to be unable to change the file description for guid_v3.zip to not say the last version.)
Changes: I have written documentation. The create function now just creates a random-number-based guid instead of pretending to create a time-based one. It uses Boost Build System V2.
I plan to add serialization and name-based creation in the near future.
I welcome any (and hope to receive many) comments and criticisms.
Andy Tompkins
Hi Andy,

It seems from looking at the code that create_v4 will create exactly the same guid if two computers (or nodes in a parallel computer) create a guid during the same second? This would be very common in any parallel program. This is why the paper you cite suggests using the MAC address of the computer for the lowest 6 octets.

Matthias

Matthias Troyer <troyer@phys.ethz.ch> wrote in news:6A1ACC3A-7E4B-41FF-8EAD-17BD127AE8D4@phys.ethz.ch: < snip >
Hi Andy,
It seems from looking at the code that create_v4 will create identically the same guid if two computers (or nodes in a parallel computer) create a guid during the same second? This would be very common in any parallel program. This is why the paper you cite suggests using the MAC address of the computer for the lowest 6 octets.
Matthias
Hi,

First, thank you for your comments.

Version 4 guids are just random values. There really is no 'node' part of the guid with this version. What you are talking about is a version 1 guid. Those do have a node, and the document does suggest using the MAC address. I took out the code that creates version 1 guids because I don't know of a portable way to get the MAC address. You are correct that the version 1 code was not following the document when creating a time-based guid.

The private create_v4() function is called by create(), which (when compiled with Boost.Thread) is protected by a mutex so only one thread can create a guid at a time. It is still possible for 2 calls to create_v4() to create the same guid if the 2 calls were in different processes and the internal random number generator was seeded with the same value. I am always looking for better ways to seed the random number generator.

I plan to add a name-based create function (version 5) in the future. This can be used to ensure that two computers (or nodes in a parallel computer) will never create the same guid by ensuring that each computer (or node) uses a different 'name space identifier'.

Again, thank you for the comments.

Andy Tompkins
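
[Editor's sketch] To make the seeding concern concrete, here is a minimal C++ sketch of a version 4 guid built purely from a pseudo-random engine. It is not the library's create_v4(); the names `Guid`, `make_v4` and `guid_to_string` are invented for illustration, and the only thing assumed from the cited document is the placement of the version and variant bits.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

using Guid = std::array<std::uint8_t, 16>;  // hypothetical 16-octet guid type

// Fill all 16 octets from the supplied engine, then stamp the
// version (4) and variant (binary 10xx) bits over the random data.
template <class Engine>
Guid make_v4(Engine& engine) {
    std::uniform_int_distribution<unsigned> octet(0, 255);
    Guid g;
    for (auto& b : g) b = static_cast<std::uint8_t>(octet(engine));
    g[6] = static_cast<std::uint8_t>((g[6] & 0x0F) | 0x40);  // version 4
    g[8] = static_cast<std::uint8_t>((g[8] & 0x3F) | 0x80);  // variant 10xx
    return g;
}

// Format as the usual 8-4-4-4-12 lowercase hex string.
inline std::string guid_to_string(const Guid& g) {
    char buf[37];
    std::snprintf(buf, sizeof buf,
                  "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
                  "%02x%02x%02x%02x%02x%02x",
                  g[0], g[1], g[2], g[3], g[4], g[5], g[6], g[7],
                  g[8], g[9], g[10], g[11], g[12], g[13], g[14], g[15]);
    return buf;
}
```

Two engines constructed with the same seed, for example `std::mt19937 a(42), b(42);`, make `make_v4(a)` and `make_v4(b)` return identical guids. That is the cross-process collision described above: the mutex serialises threads inside one process, but it cannot help two processes that start from the same seed.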

On Nov 13, 2006, at 5:15 PM, Andy wrote:
It is still possible for 2 calls to create_v4() to create the same guid if the 2 calls were in different processes and the internal random number generator was seeded with the same value. I am always looking for better ways to seed the random number generator.
If you use the clock as the seed and I call the function from 128'000 nodes of a BlueGene/L at the same time, then I'm sure to get identical guids.
I plan to add a name-based create function (version 5) in the future. This can be used to ensure that two computers (or nodes in a parallel computer) will never create the same guid by ensuring that each computer (or node) uses a different 'name space identifier'
I look forward to this.

Matthias
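
[Editor's sketch] One common way to avoid the identical-seed problem is to mix something node-specific into the seed alongside the clock. The sketch below uses std::seed_seq for the mixing. It is an assumption about how one might do this, not what the guid library does; `make_engine`, `make_engine_for_node` and the `node_id` parameter (an MPI rank or process ID, say) are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Build an engine whose seed combines a clock reading, a caller-supplied
// node identifier, and one word of extra entropy. All three inputs are
// explicit so the mixing itself is deterministic and easy to test.
inline std::mt19937 make_engine(std::uint32_t clock_ticks,
                                std::uint32_t node_id,
                                std::uint32_t entropy) {
    std::seed_seq seq{clock_ticks, node_id, entropy};
    return std::mt19937(seq);
}

// Convenience wrapper: reads the system clock and std::random_device.
// How much real entropy std::random_device provides is platform-dependent.
inline std::mt19937 make_engine_for_node(std::uint32_t node_id) {
    const auto now =
        std::chrono::system_clock::now().time_since_epoch().count();
    std::random_device rd;
    return make_engine(static_cast<std::uint32_t>(now), node_id, rd());
}
```

With this arrangement, nodes that read the same clock value still start from different engine states as long as their `node_id` values differ. It does not make collisions impossible; it only removes the guaranteed collision that a clock-only seed produces.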

Hi,
It seems from looking at the code that create_v4 will create identically the same guid if two computers (or nodes in a parallel computer) create a guid during the same second? This would be very common in any parallel program. This is why the paper you cite suggests using the MAC address of the computer for the lowest 6 octets.
In case it helps the discussion, over the last year or so we've had several reports of conflicting GUIDs generated independently in our applications.

As far as I know, all cases involved multiprocessor systems generating GUIDs with MAC address. We never managed to trap this live, but our analysis indicates that exactly the same piece of code executed in exactly the same millisecond on several processors, and at least two managed to generate the same 12 bits of randomness there are in GUIDs with MAC address.[*]

Our GUID library (from ext2fs) uses /dev/urandom to generate the random values, so it's pretty good; it's just that 12 bits of randomness is not a lot and a millisecond is a long time these days.

In light of this, I would recommend using fully random GUIDs by default. Otherwise there is a very real probability of generating the same GUID several times.

Lassi

[*] Our best guess was that all the processes got synchronised through a system-level lock, such as accessing a file on NFS, which then acted as a barrier and released all processes to the GUID-generating code at exactly the same time.
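
[Editor's sketch] The "not a lot" above can be quantified with the usual birthday-problem product. The function below is an illustrative sketch, not part of any GUID library: it assumes every generator draws its random bits independently and uniformly, and that the timestamp and node parts are identical, so only the random bits tell the GUIDs apart. The name `collision_probability` is invented here.

```cpp
#include <cstdint>

// Probability that at least two of `generators` independent, uniformly
// random draws from a space of 2^bits values coincide.
// Assumes bits < 64.
inline double collision_probability(std::uint64_t generators, unsigned bits) {
    const double space = static_cast<double>(std::uint64_t{1} << bits);
    double all_distinct = 1.0;
    for (std::uint64_t i = 1; i < generators; ++i) {
        // Fraction of the space still unused after i distinct draws.
        const double free_fraction = 1.0 - static_cast<double>(i) / space;
        if (free_fraction <= 0.0) return 1.0;  // more draws than values
        all_distinct *= free_fraction;
    }
    return 1.0 - all_distinct;
}
```

Taking the 12 bits mentioned above at face value, `collision_probability(2, 12)` is 1/4096, and the result passes one half at somewhere around 75 simultaneous generators. A fully random guid has far more random bits, which is the argument for making it the default.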

On 11/14/06, Lassi A. Tuura <lassi.tuura@cern.ch> wrote:
In case it helps the discussion, over the last year or so we've had several reports of conflicting GUIDs generated independently in our applications.
As far as I know all cases involved multiprocessor systems generating GUIDs with MAC address. We never managed to trap this live but our analysis indicates exactly the same piece of code executed exactly the same millisecond on several processors and at least two managed to generate the same 12 bits of randomness there are in GUIDs with MAC address.[*]
Wouldn't adding the process (or thread) ID to the mix help in this case?

gpd

Hi,
Wouldn't adding the process (or thread) ID to the mix help in this case?
Can you expand on what you mean by "mix"?

What I was trying to say is that GUIDs with MAC address and system time have a 14-bit "clock sequence" field. The system clock resolution can be as much as ten milliseconds, during which the generator needs to be able to produce 16384 unique values. If the apps can require many GUIDs very rapidly, or have weird herding effects as ours did, that just translates to a near-impossible problem to solve, no matter what the sequence generator uses as its source. (A system-wide sequence generator could guarantee uniqueness by waiting for the next clock tick on overflow. If you are writing your own code, you probably don't have one of those.)

In our case the end result was something like a clash a month, and at worst several conflicts per week.

Lassi
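
[Editor's sketch] The "wait for the next clock tick on overflow" idea in the parenthesis can be sketched as follows. This follows the description in this message rather than any particular library. `TickSequencer` and `read_tick` are hypothetical names, the clock is assumed never to go backwards, and a single object only protects the callers that share it: it would have to be system-wide (and locked) to give the guarantee described.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

class TickSequencer {
public:
    static constexpr std::uint32_t kLimit = 1u << 14;  // 16384 values per tick

    // `read_tick` returns the current clock reading at whatever
    // resolution the system offers.
    explicit TickSequencer(std::function<std::uint64_t()> read_tick)
        : read_tick_(std::move(read_tick)),
          last_tick_(read_tick_()),
          next_seq_(0) {}

    // Returns a (tick, sequence) pair this object has not handed out before.
    std::pair<std::uint64_t, std::uint16_t> next() {
        std::uint64_t now = read_tick_();
        if (now != last_tick_) {
            // Clock advanced: the sequence can restart from zero.
            last_tick_ = now;
            next_seq_ = 0;
        } else if (next_seq_ == kLimit) {
            // All 14-bit values used within this tick: busy-wait for the next.
            do {
                now = read_tick_();
            } while (now == last_tick_);
            last_tick_ = now;
            next_seq_ = 0;
        }
        return {last_tick_, static_cast<std::uint16_t>(next_seq_++)};
    }

private:
    std::function<std::uint64_t()> read_tick_;
    std::uint64_t last_tick_;
    std::uint32_t next_seq_;
};
```

The cost is visible in the loop: once a tick's 16384 values are gone, every caller stalls until the clock moves, which with a ten-millisecond clock caps throughput at about 1.6 million values per second.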
participants (4)
- Andy
- Giovanni Piero Deretta
- Lassi A. Tuura
- Matthias Troyer