Re: [Boost-users] container with large number of elements and a small number of invalide elements

1 Nov 2013

      On Fri, Nov 1, 2013 at 1:18 PM, Jeff Flinn <Jeffrey.Flinn@gmail.com> wrote:
...
On 11/1/2013 1:28 PM, MM wrote:
...
On 1 November 2013 12:37, Jeff Flinn <Jeffrey.Flinn@gmail.com
<mailto:Jeffrey.Flinn@gmail.**com <Jeffrey.Flinn@gmail.com>>> wrote:
    On 10/31/2013 1:14 PM, MM wrote:
        On 31 October 2013 16:10, Klaim - Joël Lamotte
        <mjklaim@gmail.com <mailto:mjklaim@gmail.com>
        <mailto:mjklaim@gmail.com <mailto:mjklaim@gmail.com>>> wrote:
What would "valid"/"invalid" means here?
             If it's "exist" or "don't exist", then maybe using
             boost::optional<MyPod> would be enough.
That's right. How about sizeof(  boost::optional<MyPod> ) vs
sizeof(
        MyPOD ) ?
        Also, what sort of iterator can iterate over just the valid
        values? or
        how to implement one?
        The point here is that there's a small number of elements that are
        invalid, their indices in the container are random,
So why not keep the invalid items from getting into the container in
    the first place?
Jeff
because they are processed at a later stage, their indices in the
container are relevant,
I'd need a better description of just what defines and item to be
valid/invalid. And just what this processed at a later stage means.
If you've some functions:
bool   valid(const MyPod& ) { return ...; }
  bool invalid(const MyPod& pod) { return !valid(pod); }
, and a pre-existing container then you can get an adapted view using
boost::range::adaptors::**filtered like:
typedef std::vector<MyPod> MyPods;
MyPods all(...);
MyPods bad;
boost::copy(all | filtered(invalid), std::back_inserter(bad));
boost::foreach(all | filtered(valid), someFunction);
It all depends on the context of where this is used. Another approach uses
the boost icl (interval container library). One example is:
http://www.boost.org/doc/libs/**1_55_0b1/libs/icl/doc/html/**
boost_icl/projects.html#boost_**icl.projects.large_bitset<http://www.boost.org/doc/libs/1_55_0b1/libs/icl/doc/html/boost_icl/projects.html#boost_icl.projects.large_bitset>
I've used a similar approach to just store the "valid" bases in a dna
sequences and keeping track of gaps with an interval container. The
advantage is avoiding the filtering cost when the application is primarily
interested in the valid items only.
Jeff
______________________________**_________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/**mailman/listinfo.cgi/boost-**users<http://lists.boost.org/mailman/listinfo.cgi/boost-users>
I would recommend checking out Google sparsehash -- specifically,
sparse_hash_set<T>:
http://code.google.com/p/sparsehash/source/browse/trunk/src/sparsehash/spars...

The biggest difference between that and something like an
std::vector<boost::optional<T>> is google::sparse_hash_set will re-order
values on you (the nature of hash tables). It also forces you to have a
"deleted key" which exists in the realm of possible T values. Based on
impression that I get from your original email, this fits with your idea of
an "invalid" T.

That said, it is a ridiculously low-overhead container (on the order of ~2
bits/entry vs 4 bytes/entry with a boost::optional).

-- 
Travis Gockel
Chief λ Combinator