
Dear All,

Following are some comments about Ion's proposed Boost.Containers. I have not really reviewed the code, but frankly that doesn't seem necessary: Ion knows what he's doing, much of this code has already been in Boost for a while as part of Interprocess, and much of it is designed to implement standard interfaces. I'll confine my comments to a few words about the documentation, and some observations about the "flat" containers.

Regarding the documentation, I believe it would benefit from the following:

- The "Acknowledgements" page is currently empty. There are various (c) notices in the source referring to sources from which parts of the code may have come. This page should clarify these sources, including any cases where the original code has now all been replaced, and assert that the licenses are all in order. It should also point out the history of the code in Boost, i.e. that parts were previously in Boost.Interprocess, so that anyone with previous exposure to the Interprocess containers is not confused.

- A few lines of rationale for stateful allocators would be useful, i.e. pointing out that although std::allocator has a typedef for pointer, the containers can assume it is T*, and the implications of that. (I dealt with all this back in about 2004, and it took me a long time to get up to speed with the issues.)

- Similarly, some rationale for string (pointing out how it differs from std::string in the common implementations) would be useful. (At the moment I can't even find where it says that it is non-COW; I know that I've seen that written somewhere. A page specifically about string, shown in the top-level table of contents, would be appropriate.)

- The Doxygen reference docs seem to jump to the header docs from the table of contents; I think it would be more useful to jump primarily to the classes. Is this the same issue that I saw in the last review?

Regarding the "flat" containers, I've previously posted about this in a couple of threads on this list. I have used this sort of structure in two applications recently (rough sketches of both follow below):

- In the first case, I store read-only (key,value) data in a binary file. I have a utility that prepares this file offline by inserting into a vector<pair<>>; I then sort it and save the contents of the vector to the file. In the program that uses the data, I memory-map the file and have an adaptor class that takes a pair of pointers to the begin and end of the mapped data and permits read-only access, like Ion's flat_multimap<>.

- In the second case, I have an insert-only key-value data structure. New data is inserted in batches: I have a vector<pair<>> and insert new data at the end; at the end of each batch, I sort the new values and merge them into the body of the vector using std::inplace_merge.
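To make those two patterns concrete, here are minimal sketches. All of the names (flat_map_view, commit_batch, etc.) are invented for illustration; they are not part of the proposed library, and this is only the general shape of what I do, not production code.

    #include <algorithm>
    #include <utility>

    // Sketch of the read-only adaptor over a sorted, contiguous range of
    // (key,value) pairs -- e.g. the contents of a memory-mapped file.
    template <typename Key, typename Value>
    class flat_map_view {
    public:
        typedef std::pair<Key,Value> value_type;

        flat_map_view(const value_type* begin, const value_type* end)
            : begin_(begin), end_(end) {}

        // Lookup is just a binary search over the mapped data.
        std::pair<const value_type*, const value_type*>
        equal_range(const Key& k) const {
            return std::equal_range(begin_, end_, k, key_compare());
        }

    private:
        struct key_compare {
            bool operator()(const value_type& a, const Key& b) const { return a.first < b; }
            bool operator()(const Key& a, const value_type& b) const { return a < b.first; }
        };
        const value_type* begin_;
        const value_type* end_;
    };

The second case is essentially just this (comparing whole pairs for simplicity, which is adequate for multimap-like data):

    #include <vector>
    #include <algorithm>
    #include <utility>
    #include <cstddef>

    // Sketch of the batched insert: new values have been appended at the
    // end, starting at index n_old; sort the new tail and fold it into
    // the already-sorted body.
    void commit_batch(std::vector<std::pair<int,int> >& v, std::size_t n_old) {
        std::sort(v.begin() + n_old, v.end());
        std::inplace_merge(v.begin(), v.begin() + n_old, v.end());
    }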
Neither of these cases can be implemented efficiently using the current containers. Doing so would need:

- Access to the underlying container, at least for reading.
- An "adaptor" version that could wrap a pair of pointers to raw memory, or more generally a pair of iterators, at least for reading.
- Some sort of efficient batched insert mode.

Regarding the batched insert, we have previously discussed the complexity of inserting a range. Fundamentally, however, I don't think it is acceptable to have efficient insertion ONLY in the case of the insert(begin,end) method: it is too common to want to insert several values in a loop without creating a temporary container. I'm not aware of any established (i.e. std:: or boost::) pattern for doing this; one possibility would be:

    class flat_container {
        void insert(val);  // appends at the end of the vector
        void commit();     // merges new values into the body of the vector
    };

- Between calling insert() and calling commit(), what do lookups do? Undefined? Only look at old values? Sort on demand?
- Is there also an "insert one" method? How are the variants named?
- What about erase()?
- How about a "batch" or "transaction" object? E.g.:

    flat_container c;
    {
        flat_container::insert_batch b(c);
        b.insert(data1);
        b.insert(data2);
    }  // b's dtor merges the new values

A further interesting case is when the new values are already sorted, because insertion can then be done more efficiently:

    void insert_sorted(iter first, iter last)
    {
        resize the vector to make space for the new items
        for each new item, in reverse order {
            find the position where it should be inserted, using binary chop
            move the old items after that point forward
            move the new item in behind them
        }
    }

I could imagine a general insert_batch and a specialised insert_sorted_batch, for example. (There is a rough sketch of the batch object in the P.S. below.)

Anyway, this is all just food for thought really. It would be interesting to hear from other people who have used data structures like this. In my own code I have taken a rather ad-hoc approach, but that is the nature of custom vs. library development: to be useful, a library needs to try to second-guess the requirements of all potential users, yet do so without becoming too complex for anyone to learn.

Hopefully there will be some more discussion about some of this, but as far as the review is concerned we should allow Ion to decide what he eventually commits.

Regards,

Phil.
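P.S. For concreteness, here is a minimal sketch of what that batch object might look like, assuming a plain std::vector<std::pair<>> as the underlying storage. The names and interface are invented for illustration only; they are not proposed as the library's actual API, and the obvious questions above (lookups during a batch, doing potentially-throwing work in a destructor) are simply ignored here.

    #include <vector>
    #include <algorithm>
    #include <utility>

    // Hypothetical RAII "insert batch": values are appended while the
    // batch is open, and merged into the sorted body when it is destroyed.
    template <typename Key, typename Value>
    class insert_batch {
    public:
        typedef std::vector<std::pair<Key,Value> > container_type;

        explicit insert_batch(container_type& c)
            : c_(c), old_size_(c.size()) {}

        void insert(const std::pair<Key,Value>& v) {
            c_.push_back(v);  // just append; defer the merge to the dtor
        }

        ~insert_batch() {
            // Sort the newly appended tail, then fold it into the body.
            std::sort(c_.begin() + old_size_, c_.end());
            std::inplace_merge(c_.begin(),
                               c_.begin() + old_size_,
                               c_.end());
        }

    private:
        container_type& c_;
        typename container_type::size_type old_size_;
    };

A specialised insert_sorted_batch would differ only in the destructor, which could use the backward-merge described above instead of sort + inplace_merge.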