Offering sorted_vector for Boost, any interest?

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi! I've come across lots of code that uses std::set where I think a sorted vector would perform better. Those places a characterized by: 1. fills the set only once, 2. just wants to drop duplicates or 3. calls set::find very often. Facing such code I often wished there was a ready to use sorted_vector implementation. I think a new container that stores elements in a vector instead of a tree offers strong advantages: a. one call to std::sort instead of managing a red-black-tree b. less memory overhead c. better data locality for lookup Possible things that I currently consider: - - offer one-time initialization read-only sorted container? - - offer mutable container with std::set-like interface (complexity changes)? - - offer taking contents from std::vector for init without copying - - offer equivalent to std::map also? - - provide a pointer container, too? Is there general interest in such a container? I'd like to develop an implementation and offer it to Boost. Frank -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: keyserver x-hkp://pool.sks-keyservers.net iEYEARECAAYFAk87/wQACgkQhAOUmAZhnmqVdQCggs40nPsT7wXvoEuO3TsSWlqS Wt4An06uRHECRercybiKC0Jq/ZKobEsu =zAK6 -----END PGP SIGNATURE-----

Frank Birbacher <bloodymir.crap <at> gmx.net> writes:
I've come across lots of code that uses std::set where I think a sorted vector would perform better. <snip> Is there general interest in such a container? I'd like to develop an implementation and offer it to Boost.
Frank
This is already in Boost.Container as boost::container::flat_set<>. Regards

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi! Am 15.02.12 20:18, schrieb Adam Merz:
This is already in Boost.Container as boost::container::flat_set<>.
Wow, cool. Although I get around Boost a lot I don't even see the obvious here. Boost is growing quite fast. I'm happy I can use it right away. Thanks to all of you for pointing me towards the flat_set. Frank -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: keyserver x-hkp://pool.sks-keyservers.net iEYEARECAAYFAk88LhYACgkQhAOUmAZhnmr6CwCZAbS9nOLIX8l7cx8C00wEkE74 bhQAnRfEsGz4UxN5i5G2tj0//heLkVmD =92HP -----END PGP SIGNATURE-----

AMDG On 02/15/2012 10:52 AM, Frank Birbacher wrote:
I've come across lots of code that uses std::set where I think a sorted vector would perform better. Those places a characterized by: 1. fills the set only once, 2. just wants to drop duplicates or 3. calls set::find very often. Facing such code I often wished there was a ready to use sorted_vector implementation.
I think a new container that stores elements in a vector instead of a tree offers strong advantages: a. one call to std::sort instead of managing a red-black-tree b. less memory overhead c. better data locality for lookup
Have you seen boost::container::flat_set? In Christ, Steven Watanabe

Frank Birbacher wrote:
I've come across lots of code that uses std::set where I think a sorted vector would perform better. Those places a characterized by: 1. fills the set only once, 2. just wants to drop duplicates or 3. calls set::find very often. Facing such code I often wished there was a ready to use sorted_vector implementation.
how would this be different than just a) using std::vector b) use std::sort to sort it c) use std::bsearch to find the element you're looking for I see no library here. Robert Ramey

Robert Ramey wrote:
Frank Birbacher wrote:
I've come across lots of code that uses std::set where I think a sorted vector would perform better. Those places a characterized by: 1. fills the set only once, 2. just wants to drop duplicates or 3. calls set::find very often. Facing such code I often wished there was a ready to use sorted_vector implementation.
how would this be different than just
a) using std::vector b) use std::sort to sort it c) use std::bsearch to find the element you're looking for
I see no library here.
You need to check your vision! Seriously, others have mentioned flat_set already, but the point is that using std::sort and the other algorithms is not reusable. You must use them, and use them correctly, every time you want the behavior, and you must be sure to use them at the right time. That's the essence of the need for a library: provide a reusable component for non-trivial functionality. _____ Rob Stewart robert.stewart@sig.com Software Engineer using std::disclaimer; Dev Tools & Components Susquehanna International Group, LLP http://www.sig.com ________________________________ IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

Stewart, Robert wrote:
You need to check your vision! Seriously, others have mentioned flat_set already, but the point is that using std::sort and the other algorithms is not reusable. You must use them, and use them correctly, every time you want the behavior, and you must be sure to use them at the right time. That's the essence of the need for a library: provide a reusable component for non-trivial functionality.
I guess the issue is - what's trivial. It seems to that something like: // some comments here template<class T, class A> class fast_set : public std::vector<T, A> { const T & find(const T & t){ static bool sorted = false; if(! sorted){ std::sort(begin(), end()); sorted = true; } std::vector<T, A>::find(t); }; }; is pretty simple. Certainly simpler than documentation, header, tests etc required to make it a library. Any one who needs this as a package can easily make this as a 15 line header file. Some things are just to simple and obvious to need a library. Robert Ramey

From: ramey@rrsd.com
Stewart, Robert wrote:
You need to check your vision! Seriously, others have mentioned flat_set already, but the point is that using std::sort and the other algorithms is not reusable. You must use them, and use them correctly, every time you want the behavior, and you must be sure to use them at the right time. That's the essence of the need for a library: provide a reusable component for non-trivial functionality.
I guess the issue is - what's trivial. It seems to that something like:
// some comments here template<class T, class A> class fast_set : public std::vector<T, A> { const T & find(const T & t){ static bool sorted = false; if(! sorted){ std::sort(begin(), end()); sorted = true; } std::vector<T, A>::find(t); }; };
is pretty simple. Certainly simpler than documentation, header, tests etc required to make it a library. Any one who needs this as a package can easily make this as a 15 line header file.
Some things are just to simple and obvious to need a library.
Well, if you drop the requirement of having to work properly, I suppose anything can be written in 15 lines ;) Regards, Nate

On 15 February 2012 15:27, Robert Ramey <ramey@rrsd.com> wrote:
I guess the issue is - what's trivial. It seems to that something like:
// some comments here template<class T, class A> class fast_set : public std::vector<T, A> { const T & find(const T & t){ static bool sorted = false; if(! sorted){ std::sort(begin(), end()); sorted = true; } std::vector<T, A>::find(t); }; };
is pretty simple.
I really don't think you want to be making that claim... A few problems off the top of my head: 1. It doesn't return anything. 2. std::vector doesn't have a find method. 3. std::find() would be O(n), not O(log n). 4. You probably want to do the "sort" on insert, not find, as you can usually do much better performance-wise than std::sort when inserting into an already sorted container. 5. If you sort on insert, you don't need mutable members for const correctness. 6. If you sort on insert, you can call find() from multiple threads safely when no one is modifying the container. 7. You want to default A so that it matches the other containers.
Certainly simpler than documentation, header, tests etc required to make it a library.
I eagerly await you posting your revised version where that is true (and doesn't just include flat_set).
Some things are just to simple and obvious to need a library.
People keep telling me that... :-) -- Nevin ":-)" Liber <mailto:nevin@eviloverlord.com> (847) 691-1404

Nevin Liber wrote:
On 15 February 2012 15:27, Robert Ramey <ramey@rrsd.com> wrote:
I guess the issue is - what's trivial. It seems to that something like:
// some comments here template<class T, class A> class fast_set : public std::vector<T, A> { const T & find(const T & t){ static bool sorted = false; if(! sorted){ std::sort(begin(), end()); sorted = true; } std::vector<T, A>::find(t); }; };
is pretty simple.
I really don't think you want to be making that claim...
A few problems off the top of my head:
1. It doesn't return anything. 2. std::vector doesn't have a find method. 3. std::find() would be O(n), not O(log n). 4. You probably want to do the "sort" on insert, not find, as you can usually do much better performance-wise than std::sort when inserting into an already sorted container. 5. If you sort on insert, you don't need mutable members for const correctness. 6. If you sort on insert, you can call find() from multiple threads safely when no one is modifying the container. 7. You want to default A so that it matches the other containers.
Certainly simpler than documentation, header, tests etc required to make it a library.
I eagerly await you posting your revised version where that is true (and doesn't just include flat_set).
well, I wrote the above in about a minute to illustrate that the job is trivial. Of course I'd have to spend a little more time to get it exactly right. So after another minute I came up with this which does compile without error on my MSVC system. admiitadly, I haven't actually tested it though so it could still have an error, but I don't think that's the point here. note that 2) std::set DOES have a find method - which seemed to me to be a better conceptual match. 3) it is O(log n) for loook up - assuming std::bsearch is 4) I presumed sorting on insert would be a non starter since it would resort every time one added an item.. 6) It has the the same thread guarentees as std::vector 7) I would expect it to closely match std::vector - excepting the addition of find But my real point is that adding one more level to hide the details doesn't always get anything. In this case you would have to describe an interface which more work. I submit that these 15 lines of code is easier to understand than some new library documenation page. #include <vector> #include <algorithm> // some comments here template<class T, class A> class fast_set : public std::vector<T, A> { const T & find(const T & t){ static bool sorted = false; if(! sorted){ std::sort(begin(), end()); sorted = true; } return std::bsearch(begin(), end(), t); }; };
Some things are just to simple and obvious to need a library.
People keep telling me that... :-)
And they're right!!! Robert Ramey

From: ramey@rrsd.com
#include <vector> #include <algorithm>
// some comments here template<class T, class A> class fast_set : public std::vector<T, A> { const T & find(const T & t){ static bool sorted = false; if(! sorted){ std::sort(begin(), end()); sorted = true; } return std::bsearch(begin(), end(), t); }; };
I'm still not entirely sure whether you're being serious, but in the event that you are: 1. A 'static' variable declared inside a method is shared by all class instances. What you want is a class member variable. 2. Mutating operations like push_back() do not preserve the invariant "'sorted' is true iff. the elements are sorted". So, if you find() and then push_back() and then find() again, the second find() may return an incorrect result. To get this approach to work properly, you'd have to override every mutating operation and have it clear the 'sorted' flag. Regards, Nate

Nathan Ridge wrote:
From: ramey@rrsd.com
#include <vector> #include <algorithm>
// some comments here template<class T, class A> class fast_set : public std::vector<T, A> { const T & find(const T & t){ static bool sorted = false; if(! sorted){ std::sort(begin(), end()); sorted = true; } return std::bsearch(begin(), end(), t); }; };
I'm still not entirely sure whether you're being serious, but in the event that you are:
why would you think I'm not serious?
1. A 'static' variable declared inside a method is shared by all class instances. What you want is a class member variable.
obviously you're correct on this.
2. Mutating operations like push_back() do not preserve the invariant "'sorted' is true iff. the elements are sorted". So, if you find() and then push_back() and then find() again, the second find() may return an incorrect result.
To get this approach to work properly, you'd have to override every mutating operation and have it clear the 'sorted' flag.
hmmm - I was go on the initial post which specified 1. fills the set only once, 2. just wants to drop duplicates or 3. calls set::find very often. and I think that's what my example does. I think it illustrates also that its easier just to compose standard elements than to write a new library. Of course, maybe I didn't get what the poster really wanted - a common occurence. In any case, I'll concede that I've failed to convince everyone that this thing is too trivial to merit a library. Robert Ramey

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi! Am 16.02.12 00:28, schrieb Robert Ramey:
hmmm - I was go on the initial post which specified
1. fills the set only once, 2. just wants to drop duplicates or 3. calls set::find very often.
and I think that's what my example does.
Yes, I acknowlegde this. And I can think of more ways to compose std lib components in order to achive a one time sorted array. As software grows all the "you can see what it does" functionality hinders understanding of what the code achives in a larger picture (like inline copy&paste code), it hinders changing things because of left out generality (like half baked classes.)
In any case, I'll concede that I've failed to convince everyone that this thing is too trivial to merit a library.
Well, in fact this thing is already in a library of Boost. So already enough people had been convinced that this is worth it. I prefer a black box container that behaves like a std container and just happens to perform better in the use case at hand. Frank -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: keyserver x-hkp://pool.sks-keyservers.net iEYEARECAAYFAk88Q/gACgkQhAOUmAZhnmqczgCfel/GsWg4OyB9H8RCd4pn5bRV 3EIAoJTy/NXabpfVgi2UvQGdqqEmvPtc =SXR4 -----END PGP SIGNATURE-----

From: ramey@rrsd.com
Nathan Ridge wrote:
From: ramey@rrsd.com
#include <vector> #include <algorithm>
// some comments here template<class T, class A> class fast_set : public std::vector<T, A> { const T & find(const T & t){ static bool sorted = false; if(! sorted){ std::sort(begin(), end()); sorted = true; } return std::bsearch(begin(), end(), t); }; };
I'm still not entirely sure whether you're being serious, but in the event that you are:
why would you think I'm not serious?
Well, given the number of problems with that code, I thought perhaps you were trying to make the opposite point by illustration (that people *shouldn't* roll their own even if it seems simple, because of the capacity for making mistakes). My mistake, and my apologies.
2. Mutating operations like push_back() do not preserve the invariant "'sorted' is true iff. the elements are sorted". So, if you find() and then push_back() and then find() again, the second find() may return an incorrect result.
To get this approach to work properly, you'd have to override every mutating operation and have it clear the 'sorted' flag.
hmmm - I was go on the initial post which specified
1. fills the set only once, 2. just wants to drop duplicates or 3. calls set::find very often.
and I think that's what my example does.
In that case, the class should prevent users from calling mutating operations (for example by inheriting privately, and then bringing the nonmutating operations into scope via using declarations). Leaving the mutating operations accessible while using them can break the invariants of the class, causing silent runtime errors, is definitely a no-no.
I think it illustrates also that its easier just to compose standard elements than to write a new library.
I think what it illustrated, intentionally or not, is that it's easy to compose standard elements badly and think you've done it right :) A carefully designed and well thought-out library, on the other hand, can serve many different related use cases. Regards, Nate

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi! Am 15.02.12 23:59, schrieb Robert Ramey:
well, I wrote the above in about a minute to illustrate that the job is trivial.
Too many things seem trivial.
Of course I'd have to spend a little more time to get it exactly right. So after another minute I came up with this which does compile without error on my MSVC system.
I wonder. I guess you never instantiated your template.
const T & find(const T & t){ static bool sorted = false; if(! sorted){ std::sort(begin(), end()); sorted = true; } return std::bsearch(begin(), end(), t); };
Given the above interface I propose a simpler implementation: template<class T, class A> class fast_set : public std::vector<T, A> { const T & find(const T & t){ return t; }; }; This also removes errors with the static bool variable. Point is: how long does it take you to get it right? How many other people does it take to get it right (without reviews you won't get it right, or it takes long)? Frank -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: keyserver x-hkp://pool.sks-keyservers.net iEYEARECAAYFAk88PEIACgkQhAOUmAZhnmoA/QCgiNF5/doGGq8cFpIg5sdCvweK pOEAn1BtLGWpcoXEzQzzsxDIjVRQRjWf =Eati -----END PGP SIGNATURE-----
participants (7)
-
Adam Merz
-
Frank Birbacher
-
Nathan Ridge
-
Nevin Liber
-
Robert Ramey
-
Steven Watanabe
-
Stewart, Robert