Re: [Boost-users] The problem isn't C's integer philosophy; it's your design. (was: size_type doubts / integer library..)
On Mon, Aug 18, 2008 at 07:05:10PM -0400, Daryle Walker wrote:
implementing. This is one of those times.
Thanks for your deep analysis and your time. Unfortunately, it's based on the assumption that the task ID is opaque, which it is not.. Re-reading my original post, this could be blamed on me..
Integer types in C and C++ are a mess. For example, I have made a library where a task is identified by a unique unsigned integer. The extra information about the tasks is stored in a std::vector. When a new task is
[snip, rearrange]
You tasks IDs are conceptually opaque, why is any external code wanting to mess with them? The external code shouldn't be doing
There are TWO task IDs: one is used by the library (which I also made), and that one _is_ opaque (at least to library clients), it _is_ typedef, and it is not even an integer (it's a pointer). Now, to every created task, I need to assign a positive[*] index (in an application which just uses the library) which is _not_ opaque: the indices are vertex IDs which are in turn used to construct a CSR representation (edge-lists) of a certain graph which is in turn operated on by an external library. (You may remember my earlier post where I proposed that the CSR class from Boost.Graph provides a documented, public way of accessing the raw CSR vectors.) I will call these "other" IDs "task indices" to avoid further confusion. I could bundle the task index in the task structure and provide a function that takes the task ID and returns the index. This is not satisfactory for several reasons: - the task library should be general-purpose and lightweight (task indices might not be needed in certain applications) - task indices may be assigned to different policies, which I do not want to hard-code into the library - I do not want to complicate interfaces with optional information (e.g. extra index parameter to task creation -- see 1st point) Since the library is written in C (well defined ABI across compilers), I can't provide overloaded interfaces. Thus, I have chosen that the task library shall be minimalistic, and that extra information about tasks, where needed, shall be layered in external containers. In this particular application, the choice of the external container (vector) is dictated by a 3rd party library.
(The ID is a typedef and not a naked "unsigned," right?) Anyway, does it really matter; this ID generation code is only used during task construction, right?
The "true" task ID (provided by the "task library") is a typedef and opaque, yes. The task index (used to construct a graph) is a plain int, not even a typedef. Why? [*] Because the external library uses plain ints[**] in its interfaces. The library can be configured to use longs, but then, fortunately, a compile-time error happens when passing &v[0] of vector<int> to long*. [**] Now, the library itself does not use plain int, but a typedef. But since *all* data is passed in terms of the same typedef (vertex/edge IDs, vertex/edge weights), this implies that I cannot configure the library to use unsigned integers and expect it to work correctly (because it does more complicated operations with weights than just indexing arrays). Which leads me to conclusion that the library interface is not ideal; it should define at least index_type and weight_type instead of the single number_type. But fixing the library is out of my scope - I have to work with what I have.
Why are you using a number to refer to a container element in the first place? Then I realized that you can't use iterators because they're not stable with vector's element adds or removes. Then I
Correct.
wondered, why are you using a vector in the second place? Wouldn't a
To interface with the external library, which has a C interface and requires a bunch of int* (actually, number_type, see above) arguments, to which I pass &v[0]. Vector is the only standard container which guarantees contiguous storage. And for the kind of data processing the library is doing, it is actually good choice.
iterators, leaving them available to implement your ID type. And you don't seem to need random-access to various task elements. (A deque
I do indeed not need random-access to "real tasks". In the task library I have actually implemented something similar the design you have proposed here.
(I was going to suggest using Boost's pointer-containers, but then I realized that you really don't need containment at all.) Now you'll pass this class around instead of an integer type. The size may be higher though, two pointers (and an 'int' in debug-mode).
Indeed -- I do not need containment and the task library just returns opaque task IDs -- it is the task creator's responsibility to keep track of them. Now, it is almost certainly true that in this project I do not need the full integer<T> class that I proposed, and even if it were available, it'd be too much to convert the project at this point. I like to rant, but I'm also pragmatic :-) boost.integer and numeric_cast will do (thanks to you and Dave for pointing it out) I'll leave the implementation of the proposed simpler, non-ranged integer<T> template as a challenge for myself and as a material for learning about other boost libraries :-)
No, you should rethink your design on why you need integers in the first place.
Compact storage of graphs that enables fast algorithms (e.g. cache-friendly layout, avoids frequent memory rellocations (unlike linked lists), etc.). And before you ask why I don't use Boost.graph: because I'm working with hypergraphs. A hypergraph can be represented as an ordinary bipartite graph, but such representation is too awkward. (Actually... time to think a bit about this premise :-))
On Aug 19, 2008, at 2:59 AM, Zeljko Vrba wrote:
On Mon, Aug 18, 2008 at 07:05:10PM -0400, Daryle Walker wrote:
[SNIP]
You tasks IDs are conceptually opaque, why is any external code wanting to mess with them? The external code shouldn't be doing
There are TWO task IDs: one is used by the library (which I also made), and that one _is_ opaque (at least to library clients), it _is_ typedef, and it is not even an integer (it's a pointer). Now, to every created task, I need to assign a positive[*] index (in an application which just uses the library) which is _not_ opaque: the indices are vertex IDs which are in turn used to construct a CSR representation (edge-lists) of a certain graph which is in turn operated on by an external library. (You may remember my earlier post where I proposed that the CSR class from Boost.Graph provides a documented, public way of accessing the raw CSR vectors.) I will call these "other" IDs "task indices" to avoid further confusion.
I could bundle the task index in the task structure and provide a function that takes the task ID and returns the index. This is not satisfactory for several reasons:
- the task library should be general-purpose and lightweight (task indices might not be needed in certain applications) - task indices may be assigned to different policies, which I do not want to hard-code into the library - I do not want to complicate interfaces with optional information (e.g. extra index parameter to task creation -- see 1st point)
Since the library is written in C (well defined ABI across compilers), I can't provide overloaded interfaces. Thus, I have chosen that the task library shall be minimalistic, and that extra information about tasks, where needed, shall be layered in external containers. In this particular application, the choice of the external container (vector) is dictated by a 3rd party library. [TRUNCATE the rest; ask again if it was important]
I think you should consider your design in C++, without thinking about C-compatibility, then go back and think how to make C- compatible analogues of your public concepts. Doing both full design and C-compatibility at the same time is messing you up. Another problem is defining the SEPARATION between tasks, graphs, and vectors. You seem to have mushed the three concepts together. Your points imply that your task ID numbers aren't really per task, but per task & application combination. This means that you should: 1. define some sort of application ID system 2. define mapping (global) object from a task ID (pointer) AND application ID pair to a number ID. This new ID is what get passed to your graph vectors. I think you can use Boost.Multi-Index to create something like Boost.BiMap, to make a tri-map so entering a task ID (pointer), application ID, or new number ID can get you either of the other two data. 3. your new number IDs do _not_ come from the graph vector sizes. Think about what happens if you grab a new ID after removing a non- end element. (The last two elements would have the same ID.) Instead, you have to keep a global with a running next ID count and/ or a std::set of current IDs. 4. the globals I mentioned shouldn't be full-blown public globals, but probably something class-static within factory classes. BTW, how are the graphing structures/functions taking the number IDs from the vectors and getting useful data from them? Callback functions? Is the graphing sub-system encapsulated? Maybe these number ID vectors should be encapsulated as a graph class, with various graphing functions mapping to member functions of the new graph class. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
On Tue, Aug 19, 2008 at 3:59 PM, Daryle Walker
On Aug 19, 2008, at 2:59 AM, Zeljko Vrba wrote:
On Mon, Aug 18, 2008 at 07:05:10PM -0400, Daryle Walker wrote: [snip]
I think you guys are making this discussion way too complicated. This
is not about designing graph interface, it's about using an integer
identifier of type that's different from std::vector::size_type, and
avoiding the warnings it leads to.
I'd use something like:
typedef int id_type;
template <class T>
class registry: private std::vector<T>
{
public:
id_type add( T const & x )
{
size_type i=size();
assert( i<=size_type(std::numeric_limits
participants (3)
-
Daryle Walker
-
Emil Dotchevski
-
Zeljko Vrba