Re: [Boost-users] The problem isn't C's integer philosophy; it's your design. (was: size_type doubts / integer library..)

19 Aug 2008

      On Mon, Aug 18, 2008 at 07:05:10PM -0400, Daryle Walker wrote:
...
implementing.  This is one of those times.
Thanks for your deep analysis and your time.  Unfortunately, it's based on the
assumption that the task ID is opaque, which it is not..  Re-reading my
original post, this could be blamed on me..
...
...
Integer types in C and C++ are a mess.  For example, I have made a  
library
where a task is identified by a unique unsigned integer.  The extra  
information
about the tasks is stored in a std::vector.  When a new task is
[snip, rearrange]
...
You tasks IDs are conceptually opaque, why is any external code  
wanting to mess with them?  The external code shouldn't be doing
There are TWO task IDs: one is used by the library (which I also made), and
that one _is_ opaque (at least to library clients), it _is_ typedef, and it is
not even an integer (it's a pointer).  Now, to every created task, I need to
assign a positive[*] index (in an application which just uses the library)
which is _not_ opaque: the indices are vertex IDs which are in turn used to
construct a CSR representation (edge-lists) of a certain graph which is in turn
operated on by an external library.   (You may remember my earlier post where I
proposed that the CSR class from Boost.Graph provides a documented, public way
of accessing the raw CSR vectors.)  I will call these "other" IDs "task
indices" to avoid further confusion.

I could bundle the task index in the task structure and provide a function that
takes the task ID and returns the index.  This is not satisfactory for several
reasons:

  - the task library should be general-purpose and lightweight (task indices
    might not be needed in certain applications)
  - task indices may be assigned to different policies, which I do not want
    to hard-code into the library
  - I do not want to complicate interfaces with optional information (e.g.
    extra index parameter to task creation -- see 1st point)

Since the library is written in C (well defined ABI across compilers), I can't
provide overloaded interfaces.  Thus, I have chosen that the task library shall
be minimalistic, and that extra information about tasks, where needed, shall be
layered in external containers.  In this particular application, the choice of
the external container (vector) is dictated by a 3rd party library.
...
(The ID is a typedef and not a naked "unsigned," right?)  Anyway,  
does it really matter; this ID generation code is only used during  
task construction, right?
The "true" task ID (provided by the "task library") is a typedef and opaque,
yes.  The task index (used to construct a graph) is a plain int, not even a
typedef.  Why? [*] Because the external library uses plain ints[**] in its
interfaces.  The library can be configured to use longs, but then, fortunately,
a compile-time error happens when passing &v[0] of vector<int> to long*. 

[**] Now, the library itself does not use plain int, but a typedef.  But since
*all* data is passed in terms of the same typedef (vertex/edge IDs, vertex/edge
weights), this implies that I cannot configure the library to use unsigned
integers and expect it to work correctly (because it does more complicated
operations with weights than just indexing arrays).  Which leads me to
conclusion that the library interface is not ideal; it should define at least
index_type and weight_type instead of the single number_type.  But fixing the
library is out of my scope - I have to work with what I have.
...
Why are you using a number to refer to a container element in the  
first place?  Then I realized that you can't use iterators because  
they're not stable with vector's element adds or removes.  Then I
Correct.
...
wondered, why are you using a vector in the second place?  Wouldn't a
To interface with the external library, which has a C interface and requires a
bunch of int* (actually, number_type, see above) arguments, to which I pass
&v[0].  Vector is the only standard container which guarantees contiguous
storage.  And for the kind of data processing the library is doing, it is
actually good choice.
...
iterators, leaving them available to implement your ID type.  And you  
don't seem to need random-access to various task elements.  (A deque
I do indeed not need random-access to "real tasks".  In the task library I have
actually implemented something similar the design you have proposed here.
...
(I was going to suggest using Boost's pointer-containers, but then I  
realized that you really don't need containment at all.)  Now you'll  
pass this class around instead of an integer type.  The size may be  
higher though, two pointers (and an 'int' in debug-mode).
Indeed -- I do not need containment and the task library just returns opaque
task IDs -- it is the task creator's responsibility to keep track of them.

Now, it is almost certainly true that in this project I do not need the full
integer<T> class that I proposed, and even if it were available, it'd be too
much to convert the project at this point.  I like to rant, but I'm also
pragmatic :-) boost.integer and numeric_cast will do (thanks to you and Dave
for pointing it out)

I'll leave the implementation of the proposed simpler, non-ranged integer<T>
template as a challenge for myself and as a material for learning about other
boost libraries :-)
...
No, you should rethink your design on why you need integers in the  
first place.
Compact storage of graphs that enables fast algorithms (e.g. cache-friendly
layout, avoids frequent memory rellocations (unlike linked lists), etc.).  And
before you ask why I don't use Boost.graph: because I'm working with
hypergraphs.  A hypergraph can be represented as an ordinary bipartite graph,
but such representation is too awkward.  (Actually... time to think a bit about
this premise :-))

Re: [Boost-users] The problem isn't C's integer philosophy; it's your design. (was: size_type doubts / integer library..)

Zeljko Vrba