Request interest on a Locally Unique Identifier library

12 Mar 2008

      Hi everyone,

I'm starting a library to manage Locally Unique Identifiers (LUID).
I'd like to ask if there is interest in a such library?

I haven't found on the net any reference to such a library. Please could you
point me to some libraries addressing this kind of problem or whatever could 
be related to this domain, algorithms, ...

If there is an interest, I would like to know which are the minimal features 
such
a library should have for a first review, and which one could be left for a 
second one.

Best regards

_____________________
Vicente Juan Botet Escriba

Next follows my first thoughts:

Boost.LUID simplifies the generation of locally unique identifiers in a 
configurable context.

Locally Unique Identifier or LUID is a special type of identifier used in 
software applications in order to provide a reference number which is unique 
in a given context (hence, "Locally" in opposition to "Globally" or 
"Universally"). Each generated LUID is guaranteed to be unique in its 
context.

LUID could have many applications. Some examples follow: creating unique 
identifiers in a database tables, network messages may be identified with a 
LUID to ensure that this messages are associated to a given session, 
transactions may be identified by LUIDs.

Note. Please let me know other applications of LUIDs you can have.

Reduced identifier size
An attractive feature of LUIDs when compared to alternatives is the minimal 
size used by this identifiers, that depends on the number of instances a 
given context can have (usually 4 bytes). This allows to use them in 
protocols fixing the size of the references exchanged to less that 16 
bytes(as it is the case of UUID and GUID). Usually LUIDs are represented by 
unsigned integers and have an upper bound value.

Usable as random access index in constant time
Another aspect, and not less interesting is that the access to the 
information they identify can be done in constant time.

Recovering unused identifiers
A first implementation could consists in a monotonic counter. The drawback 
of having a small size is that the number of unique identifier is more 
limited. This limit will finish by been reached if the application runs a 
long time. To paliate to this, we need a mechanism to recover the unused 
identifiers once the identified information is removed. Depending on the 
application the recovered identifiers should be discarded, reused 
immediately or frozen during a given duration or a number of times.

Usually they are reused in a first reusable first reused order (fifo), but 
depending on the implementation this could not be always the more efficient 
policy. So the user can either constraint this fifo order or let undefined 
this aspect.

Managing the lack of identifiers
In any case we need to manage with the overflow case, i.e. when there are no 
more free identifiers. The user could prefer an exception, return an invalid 
value, use errno, or call to a user defined function which could try to 
recover from this situation, changing the upper bound for example.

Ensuring coherency on reusable identifiers
All this seams to work up to the user do not release the same identifiers 
twice. This is an expensive task, but usually the applications has its own 
way to do it. If this is not the case the application could ask to the luid 
generator to ensure that.

Synchronization on multi threaded or multi process applications
The generation of new LUIDs and the recovering of unused LUIDs must be 
synchronized. So LUID are better adapted to systems where these operations 
are done in a reduced scope. Depending on this scope: single thread, 
multi-threaded process or a node,i.e. multiple process, the synchronization 
mechanisms vary. In a single thread scope no synchronization is needed. 
Synchronization on a multi-threaded process can be ensured using a thread 
mutex and so one.

In addition the synchronization can be ensured internally by the generator, 
or externally by the user, i.e. the user has already a synchronization 
mechanism.

LUIDs are not adapted to distributed system (multiple nodes generating and 
recovering luids) because it needs a coordination between the different 
nodes which will take too much time. UUIDs and GUIDs are more adapted to 
this situation. Anyway if the size constraint can not be avoided, one of the 
nodes could play the master role and the others behaves as slaves. Master 
and slaves need to work together. This library do not pretend to take in 
account this use case but, could be the base of such approach.

Persistency
Another aspect is the lifetime of the information to be identified by the 
luids. It can be the lifetime of the process, the host machine lifetime or 
persists to a shut down of the host machine using the filesystem.

Optimization
The last aspect will be the optimization. Different application have 
different needs, some have space constraints, and mot of the others have 
speed constraints.

One possible design could request to declare a variable for each luid 
generator
and use functions like make and release.

typedef boost::luid::luidg<uint32, ...> X_luid_generator;
X_luid_generator x_luidg;

////
uint32 uid = luidg.make();

...
luidg.release(uid);

As the uid have a meaning only associated to the information it identifies, 
usually
the use of this functions should be done implicitly when the information is 
stored
on a container is removed.

typedef boost::luid::luidg<uint32> X_luid_mgr;
typedef my_multi_index<X, ..., luid_generator<X_luid_mgr> > X_container;

// construct the container, constructing a X_luid_mgr
X_container x_cont(...);
X x;
...

// inserts the element in the container and
// returns the new luid associated to the inserted element (luidg.make())
unisgned int uid = x_cont.insert(x);
// this uid could be shared with other applications

...

// later on the application use the uid to obtain the context
X* x=x_cont.get(uid);

// do something
...

// removes the element identified by uid and release the uid. 
(luidg.release(uid))
x_cont.remove(uid);

This library could use the following Boost libraries
- Boost.DynamicBitSet to store the recovered luids
- Boost.Integer to obtain the default bounds
- Boost.Interprocess to lock on inter_node scope
- Boost.Intrusive to store the recovered luids in fifo order
- Boost.Interprocess to store on shared memory when the lifetime is longer 
than the process one.
- Boost.Parameters to pass the constructor parameters
- Boost.Parameters to configure the template parameters
- Boost.Threads to lock on multi_threaded scope

I wonder if this library could be used to extend the Boost.MultiIndex 
library with luid unique keys.

The number of user functions provided by LUID generator are quite simple:
* make: make a new luid (constant time complexity)
* release: release an luid (could check that the luid is not already free) 
(constant time complexity without ensuring coherency)
* error: the value returned when no more luids are available (constant time 
complexity)
* set_upper_bound: sets the new upper bound (could have linear time 
complexity when coherency is required)
* get_upper_bound: get the current upper bound (constant time complexity)
* count: get the number of used luids (chould be constant time complexity if 
a counter is used)
* clear: release all the used ressources and reset them as they were after 
construction (could have linear time complexity)

vicente.botet

tags

participants (1)