accurate sum accumulator (kahan)

newer
[Memory Managed Pointer] Review...

Gaetano Mendola

25 Jul 2010 25 Jul '10

3:23 p.m.

Hi all, I had to implement in terms of boost accumulators the Kahan summation algorithm. I will paste at the end of this message the accumulator, the feature and the extractor source code. This accumulator reduces the numerical error obtained with standard sequential sum, as example the sum of 1e6f elements equal to 1e-6f should give 1.0f, this is the results obtained with sum and sum_kahan: tag::sum: 1.009038925 tag::sum_kahan: 1.000000000 I'm not sure if that sum_kahan is worth to be inserted in the accumulators provided with boost, I'm not an expert of boost:accumulators (I just followed the guide to make this new accumulator) so not sure if you prefer to have a fast_sum and an accurate_sum instead of sum and sum_kahan. Regards Gaetano Mendola // File kahan_sum.hpp #ifndef KAHAN_SUM #define KAHAN_SUM #include <boost/accumulators/framework/accumulator_base.hpp> #include <boost/accumulators/framework/parameters/sample.hpp> #include <boost/numeric/conversion/cast.hpp> namespace boost { namespace accumulators { namespace impl { template<typename Sample> struct sum_kahan_accumulator : accumulator_base { typedef Sample result_type; template<typename Args> sum_kahan_accumulator(Args const & args) : sum(args[sample | Sample()]), compensation(boost::numeric_cast<Sample>(0.0)) { } template<typename Args> void operator ()(Args const & args) { const Sample myTemp1 = args[sample] - this->compensation; const Sample myTemp2 = this->sum + myTemp1; this->compensation = (myTemp2 - this->sum) - myTemp1; this->sum = myTemp2; } result_type result(dont_care) const { return this->sum; } private: Sample sum; Sample compensation; }; }}} namespace boost { namespace accumulators { namespace tag { struct sum_kahan : depends_on< > { typedef impl::sum_kahan_accumulator< mpl::_1 > impl; }; }}} namespace boost { namespace accumulators { namespace extract { extractor< tag::sum_kahan > const sum_kahan = {}; } using extract::sum_kahan; }} #endif

Show replies by date

Paul A. Bristow

26 Jul 26 Jul

1:59 p.m.

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Gaetano Mendola Sent: Sunday, July 25, 2010 4:24 PM To: boost@lists.boost.org Subject: [boost] accurate sum accumulator (kahan)

Hi all, I had to implement in terms of boost accumulators the Kahan summation algorithm. I will paste at the end of this message the accumulator, the feature and the extractor source code.

This accumulator reduces the numerical error obtained with standard sequential sum, as example the sum of 1e6f elements equal to 1e-6f should give 1.0f, this is the results obtained with sum and sum_kahan:

tag::sum: 1.009038925 tag::sum_kahan: 1.000000000

I'm not sure if that sum_kahan is worth to be inserted in the accumulators provided with boost, I'm not an expert of boost:accumulators (I just followed the guide to make this new accumulator) so not sure if you prefer to have a fast_sum and an accurate_sum instead of sum and sum_kahan.

<snip> I think it would be an excellent idea to include both. I fear that tests on the appropriate NIST datasets would show the existing 'fast' algorithm to perform badly. http://www.itl.nist.gov/div898/strd/ Whereas I would expect your Kahan algorithm version to do much better. It would be in keeping with Boost tradition to be (or able to be) accurate. Some people would also like to know the accurate_sum compute time cost over existing (fast) sum (I presume there is some?). It would need some documentation on when and why one might need to prefer the accurate version. Paul PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility). --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com

David Abrahams

2:09 p.m.

At Mon, 26 Jul 2010 14:59:29 +0100, Paul A. Bristow wrote:

...

PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we want sum and quick_sum. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Paul A. Bristow

3:20 p.m.

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams Sent: Monday, July 26, 2010 3:10 PM To: boost@lists.boost.org Subject: Re: [boost] accurate sum accumulator (kahan)

At Mon, 26 Jul 2010 14:59:29 +0100, Paul A. Bristow wrote:

...
PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we want sum and quick_sum.

This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs? Paul

David Abrahams

3:51 p.m.

At Mon, 26 Jul 2010 16:20:16 +0100, Paul A. Bristow wrote:

...

...
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams Sent: Monday, July 26, 2010 3:10 PM To: boost@lists.boost.org Subject: Re: [boost] accurate sum accumulator (kahan)

At Mon, 26 Jul 2010 14:59:29 +0100, Paul A. Bristow wrote:

...
PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we want sum and quick_sum.

This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Matthias Troyer

3:53 p.m.

On 26 Jul 2010, at 09:51, David Abrahams wrote:

...

At Mon, 26 Jul 2010 16:20:16 +0100, Paul A. Bristow wrote:

...
...
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams Sent: Monday, July 26, 2010 3:10 PM To: boost@lists.boost.org Subject: Re: [boost] accurate sum accumulator (kahan)

At Mon, 26 Jul 2010 14:59:29 +0100, Paul A. Bristow wrote:

...
PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we want sum and quick_sum.

This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently?

I would still call it sum. We have accurate versus quick versions of variance and other accumulators that can be specified like "variance(accurate)" in the feature list (namespaces missing ... it has been a while since I used it last). I would just propose to add this sum as "sum(accurate)" Matthias

Paul A. Bristow

3:58 p.m.

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams Sent: Monday, July 26, 2010 4:51 PM To: boost@lists.boost.org Subject: Re: [boost] accurate sum accumulator (kahan)

At Mon, 26 Jul 2010 16:20:16 +0100, Paul A. Bristow wrote:

...
...
At Mon, 26 Jul 2010 14:59:29 +0100, Paul A. Bristow wrote:

...
PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we want sum and quick_sum.

This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently?

Yes (a bit, sometimes, probably) - but quicker ;-) (Seriously - a change in accuracy might mean that tests just fail and this might cause more trouble than it was worth. This being floating point, even the Kahan method doesn't guarantee a *exact* answer, just a more accurate one). Paul

Topher Cooper

7:14 p.m.

First off, note that in the test case part of the inaccuracy is due to the fact that 1F-6 cannot be represented exactly in floating point. Obviously, however you do it, if you add 1000000 numbers that are not precisely equal to 1/1000000 together then the correct answer will not be 1.0. As for naming -- I'm always uncomfortable using a keyword like "accurate" -- since there are many ways to improve accuracy, with Kahan's algorithm being only one. For example: for some situations presorting the numbers in ascending order can improve accuracy. A two pass algorithm, when possible, will probably give you the same result as Kahan, with, ignoring paging, similar costs. First sum the numbers. Divide the result by the number of values. Sum the differences between each value and that mean value. Subtract the result from the original sum. Kahan requires 4 add/sub ops per value, the two-pass 3. Another technique: recursively sum the lower half and the upper half and add the results. Using an explicit stack and doing this iteratively would probably improve on Kahan in both accuracy and time. This should not be combined with an ascending sort (or if you do, use interleaved splits rather than lower/upper). Ignoring the bookkeeping, this requires a single addition per value. I'm not suggesting that any of these be added, but if they were -- what would they be called? Topher On Jul 26, 2010, at 11:58 AM, Paul A. Bristow wrote:

...

...
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams Sent: Monday, July 26, 2010 4:51 PM To: boost@lists.boost.org Subject: Re: [boost] accurate sum accumulator (kahan)

At Mon, 26 Jul 2010 16:20:16 +0100, Paul A. Bristow wrote:

...
...
At Mon, 26 Jul 2010 14:59:29 +0100, Paul A. Bristow wrote:

...
PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we want sum and quick_sum.

This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently?

Yes (a bit, sometimes, probably) - but quicker ;-)

(Seriously - a change in accuracy might mean that tests just fail and this might cause more trouble than it was worth. This being floating point, even the Kahan method doesn't guarantee a *exact* answer, just a more accurate one).

Paul

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Gaetano Mendola

8:03 p.m.

On 07/26/2010 09:14 PM, Topher Cooper wrote:

...

I'm not suggesting that any of these be added, but if they were -- what would they be called?

This is exactly my point. Regards Gaetano Mendola

Matthias Troyer

8:07 p.m.

On 26 Jul 2010, at 13:14, Topher Cooper wrote:

...

First off, note that in the test case part of the inaccuracy is due to the fact that 1F-6 cannot be represented exactly in floating point. Obviously, however you do it, if you add 1000000 numbers that are not precisely equal to 1/1000000 together then the correct answer will not be 1.0.

As for naming -- I'm always uncomfortable using a keyword like "accurate" -- since there are many ways to improve accuracy, with Kahan's algorithm being only one.

For example: for some situations presorting the numbers in ascending order can improve accuracy.

A two pass algorithm, when possible, will probably give you the same result as Kahan, with, ignoring paging, similar costs. First sum the numbers. Divide the result by the number of values. Sum the differences between each value and that mean value. Subtract the result from the original sum. Kahan requires 4 add/sub ops per value, the two-pass 3.

Another technique: recursively sum the lower half and the upper half and add the results. Using an explicit stack and doing this iteratively would probably improve on Kahan in both accuracy and time. This should not be combined with an ascending sort (or if you do, use interleaved splits rather than lower/upper). Ignoring the bookkeeping, this requires a single addition per value.

I'm not suggesting that any of these be added, but if they were -- what would they be called?

Why don't you just call it sum(kahan) Matthias

Eric Niebler

4:23 p.m.

On 7/26/2010 11:51 AM, David Abrahams wrote:

...

At Mon, 26 Jul 2010 16:20:16 +0100, Paul A. Bristow wrote:

...
...
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams Sent: Monday, July 26, 2010 3:10 PM To: boost@lists.boost.org Subject: Re: [boost] accurate sum accumulator (kahan)

At Mon, 26 Jul 2010 14:59:29 +0100, Paul A. Bristow wrote:

...
PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we want sum and quick_sum.

This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently?

I tend to agree with Dave, so long as the kahan sum accumulator is a drop-in replacement for what we already have. Does it Just Work for integral types, too? -- Eric Niebler BoostPro Computing http://www.boostpro.com

John Maddock

27 Jul 27 Jul

4:17 p.m.

...

...
...
This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently?

I tend to agree with Dave, so long as the kahan sum accumulator is a drop-in replacement for what we already have. Does it Just Work for integral types, too?

It's not required for integral types as these don't suffer from cancelation errors. John.

David Abrahams

6:29 p.m.

At Tue, 27 Jul 2010 17:17:18 +0100, John Maddock wrote:

...

...
...
...
This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently?

I tend to agree with Dave, so long as the kahan sum accumulator is a drop-in replacement for what we already have. Does it Just Work for integral types, too?

It's not required for integral types as these don't suffer from cancelation errors.

And I'm loathe to disagree with John and Matthias on matters of numerics. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Simonson, Lucanus J

26 Jul 26 Jul

4:46 p.m.

David Abrahams wrote:

...

At Mon, 26 Jul 2010 16:20:16 +0100, Paul A. Bristow wrote:

...
...
At Mon, 26 Jul 2010 14:59:29 +0100, Paul A. Bristow wrote:

...
PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we want sum and quick_sum.

This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently?

In the past I've had people tell me that they couldn't accept my code that gives the right answer to replace code that gives the wrong answer because it might cause their regression tests to fail. I don't think I need to tell you my opinion on that, but if you change the sum behavior you might get people who will complain. Anyone who writes a regression test that compares equality of a floating point summation result to a reference value is pretty much asking for what they would be getting. I say we do it. My position is that people use libraries expecting superior results to what they could achieve themselves with little effort (better algorithmic complexity/accuracy of outputs) and that such behavior should be the default. Regards, Luke

Jeffrey Lee Hellrung, Jr.

4:55 p.m.

On 07/26/2010 09:46 AM, Simonson, Lucanus J wrote:

...

David Abrahams wrote:

...
At Mon, 26 Jul 2010 16:20:16 +0100, Paul A. Bristow wrote:

...
...
At Mon, 26 Jul 2010 14:59:29 +0100, Paul A. Bristow wrote:

...
PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we want sum and quick_sum.

This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently?

In the past I've had people tell me that they couldn't accept my code that gives the right answer to replace code that gives the wrong answer because it might cause their regression tests to fail. I don't think I need to tell you my opinion on that, but if you change the sum behavior you might get people who will complain. Anyone who writes a regression test that compares equality of a floating point summation result to a reference value is pretty much asking for what they would be getting. I say we do it. My position is that people use libraries expecting superior results to what they could achieve themselves with little effort (better algorithmic complexity/accuracy of outputs) and that such behavior should be the default.

Without being too familiar with the accumulators framework, wouldn't replacing sum with the kahan summation algorithm penalize non-floating-point types? Or worse, it will cause compiler errors, since kahan requires subtraction? Or is the proposition to replace the (naive) summation algorithm with kahan summation algorithm for only floating point types? - Jeff

David Abrahams

5:09 p.m.

The latter [sent from tiny mobile device] On Jul 26, 2010, at 12:55 PM, "Jeffrey Lee Hellrung, Jr." <jhellrung@ucla.edu

...

wrote:

...

r is the proposition to replace the (naive) summation algorithm with kahan summation algorithm for only floating point types?

Gaetano Mendola

5:59 p.m.

On 07/26/2010 06:55 PM, Jeffrey Lee Hellrung, Jr. wrote:

...

Or is the proposition to replace the (naive) summation algorithm with kahan summation algorithm for only floating point types?

A possible scenario can be: 1) Leave the naive summation algorith as it is now 2) Create (with a typedef?) fast_sum that is the same of sum (naive) 2) Rename the sum_kahan as accurate_sum implemented as sum (naive) 3) Specialize the accurate_sum for floating points types implemented as the real kahan's algorithm however this can lead to have someone using accurate_sum for integral type and find it out is not accurate at all ! IMHO "accurate" or "fast" are missleading. What if someone comes out with an algorithm that gives better results and even faster than actual implementation? Think about how many sorting algorithms there are out there. I think the best is to leave sum as it is now (no complains about someone's regression tests failing), implement the sum_kahan only for floating types and have it listed in documentation specifying that is implemented only for floating types. The string Kahan in the name will also give to users the hint on what it does and what the performance loss is going to be. BTW, what's the procedure to submit something officially to boost? Regards Gaetano Mendola

Giovanni Piero Deretta

6:48 p.m.

On Mon, Jul 26, 2010 at 6:59 PM, Gaetano Mendola <mendola@gmail.com> wrote:

...

On 07/26/2010 06:55 PM, Jeffrey Lee Hellrung, Jr. wrote:

...
Or is the proposition to replace the (naive) summation algorithm with kahan summation algorithm for only floating point types?

A possible scenario can be:

1) Leave the naive summation algorith as it is now 2) Create (with a typedef?) fast_sum that is the same of sum (naive) 2) Rename the sum_kahan as accurate_sum implemented as sum (naive) 3) Specialize the accurate_sum for floating points types implemented as the real kahan's algorithm

however this can lead to have someone using accurate_sum for integral type and find it out is not accurate at all !

wouldn't the native sum be accurate (actually exact) for integral types as long as the sum doesn't overflow? -- gpd

Eric Niebler

6:52 p.m.

On 7/26/2010 1:59 PM, Gaetano Mendola wrote:

...

On 07/26/2010 06:55 PM, Jeffrey Lee Hellrung, Jr. wrote:

...
Or is the proposition to replace the (naive) summation algorithm with kahan summation algorithm for only floating point types?

A possible scenario can be:

1) Leave the naive summation algorith as it is now 2) Create (with a typedef?) fast_sum that is the same of sum (naive) 2) Rename the sum_kahan as accurate_sum implemented as sum (naive) 3) Specialize the accurate_sum for floating points types implemented as the real kahan's algorithm

however this can lead to have someone using accurate_sum for integral type and find it out is not accurate at all !

IMHO "accurate" or "fast" are missleading. What if someone comes out with an algorithm that gives better results and even faster than actual implementation?

Think about how many sorting algorithms there are out there.

I think the best is to leave sum as it is now (no complains about someone's regression tests failing), implement the sum_kahan only for floating types and have it listed in documentation specifying that is implemented only for floating types. The string Kahan in the name will also give to users the hint on what it does and what the performance loss is going to be.

That's perfectly reasonable, but care must be taken that the sum_kahan satisfies any dependency on the "sum" feature. That is, if someone asks for "mean" (which depends on a "sum" feature) and "sum_kahan", they don't get "mean", "sum", and "sum_kahan". The accumulators library makes this pretty straightforward.

...

BTW, what's the procedure to submit something officially to boost?

In this case, you would open a feature request on svn.boost.org ("New Ticket"), assign to me, and attach your patch. If it looks good, I'll merge it. Your patch should have docs and tests. Don't worry about formatting the docs. I can do that. I wouldn't ask you to set up the doc build chain; it's a headache. Thanks! -- Eric Niebler BoostPro Computing http://www.boostpro.com

Adam Wulkiewicz

8:35 p.m.

New subject: new library (space partitioning)

Hi, It's my first message so I'd like to introduce myself. My name's Adam Wulkiewicz, I'm a doctoral student of Technical University of Lodz, Poland. I'd like to take part in evolution of Boost library. I've implemented some space partitioning data structures and k-dimensional generalizations of STL containers and thought that since these kind of containers are useful to many people it would be nice to have them in Boost. For people who don't know what are space partitioning data structures: These data structures divides arbitrary k-dimensional space containing some k-dimensional 'objects' to accelerate searching for an 'object' or subset of 'objects' placed in a part of divided space. Just like fe. a binary tree divides a set of numbers to accelerate searching. These data structures are used fe. in: - computer graphics to accelerate rendering (searching of 3D objects, fe. triangles in raytracing or photons in photon mapping), - games to accelerate testing for intersection of rays with a scene and movable objects. In these examples 'the space' is 'common' 3D euclidean space but you may divide every space you like to accelerate searching: spherical space, space containing features of an object used in image recognition process and so on. I've implemented: - k-dimensional generalizations of STL containers (used by some space partitioning data structures) - k-dimensional space partitioning data structures - regular grid - kd-tree - something I've called regular tree, it's k-dimensional generalization of quadtree/octree, so for k=2 it's quadtree, for k=3 it's octree, for k=4 it's 16tree and so on. - k-dimensional euclidean algebra/geometry data structures - points, vectors etc., - unit tests. There are boost-like names used in code allready. If you'd like to know more just ask. My question is, what do you think about it? And, If you like my idea, how to start? I'd like to show my code to you in order to receive your opinions and suggestions. May I use boost repository or must I place my code elsewere? Regards, Adam Wulkiewicz

OvermindDL1

9 p.m.

New subject: new library (space partitioning)

On Mon, Jul 26, 2010 at 2:35 PM, Adam Wulkiewicz <adam.wulkiewicz@gmail.com> wrote:

...

Hi,

It's my first message so I'd like to introduce myself. My name's Adam Wulkiewicz, I'm a doctoral student of Technical University of Lodz, Poland.

I'd like to take part in evolution of Boost library. I've implemented some space partitioning data structures and k-dimensional generalizations of STL containers and thought that since these kind of containers are useful to many people it would be nice to have them in Boost.

For people who don't know what are space partitioning data structures: These data structures divides arbitrary k-dimensional space containing some k-dimensional 'objects' to accelerate searching for an 'object' or subset of 'objects' placed in a part of divided space. Just like fe. a binary tree divides a set of numbers to accelerate searching.

These data structures are used fe. in: - computer graphics to accelerate rendering (searching of 3D objects, fe. triangles in raytracing or photons in photon mapping), - games to accelerate testing for intersection of rays with a scene and movable objects.

In these examples 'the space' is 'common' 3D euclidean space but you may divide every space you like to accelerate searching: spherical space, space containing features of an object used in image recognition process and so on.

I've implemented: - k-dimensional generalizations of STL containers (used by some space partitioning data structures) - k-dimensional space partitioning data structures - regular grid - kd-tree - something I've called regular tree, it's k-dimensional generalization of quadtree/octree, so for k=2 it's quadtree, for k=3 it's octree, for k=4 it's 16tree and so on. - k-dimensional euclidean algebra/geometry data structures - points, vectors etc., - unit tests.

There are boost-like names used in code allready. If you'd like to know more just ask.

My question is, what do you think about it?

I am quite interested, I would have use for such a library.

Jeffrey Lee Hellrung, Jr.

9:01 p.m.

New subject: new library (space partitioning)

On 7/26/2010 1:35 PM, Adam Wulkiewicz wrote:

...

Hi,

It's my first message so I'd like to introduce myself. My name's Adam Wulkiewicz, I'm a doctoral student of Technical University of Lodz, Poland.

I'd like to take part in evolution of Boost library. I've implemented some space partitioning data structures and k-dimensional generalizations of STL containers and thought that since these kind of containers are useful to many people it would be nice to have them in Boost. [...]

I'm interested. How does this compare to the facilities that CGAL offers? (I have taken only a cursory glance at CGAL's space partitioning data structures...) I believe you can upload to the Boost Vault (http://www.boostpro.com/vault/), or just upload to your public_html space that your university provides (if it does). - Jeff

Adam Wulkiewicz

10:59 p.m.

New subject: new library (space partitioning)

Jeffrey Lee Hellrung, Jr. wrote:

...

On 7/26/2010 1:35 PM, Adam Wulkiewicz wrote:

...
Hi,

It's my first message so I'd like to introduce myself. My name's Adam Wulkiewicz, I'm a doctoral student of Technical University of Lodz, Poland.

I'd like to take part in evolution of Boost library. I've implemented some space partitioning data structures and k-dimensional generalizations of STL containers and thought that since these kind of containers are useful to many people it would be nice to have them in Boost. [...]

I'm interested.

How does this compare to the facilities that CGAL offers? (I have taken only a cursory glance at CGAL's space partitioning data structures...)

I believe you can upload to the Boost Vault (http://www.boostpro.com/vault/), or just upload to your public_html space that your university provides (if it does).

I don't know about all features of CGAL either. I see that there is a kd-tree but I think it's a point version. The rest of these algorithms/ data structures are desined just for 2D and 3D data (Which is wierd because their AABBTree should be implemented as n-dimensional data structure). This kind of library should be fully n-d, with interchangable containers. What is more, there is no regular grid which is good for dynamically changing data. By the way, how the process of incorporation of new library looks like? When the library is good for become a candidate? Where it should be placed etc.?

Simonson, Lucanus J

27 Jul 27 Jul

6:48 p.m.

New subject: new library (space partitioning)

Adam Wulkiewicz wrote:

...

By the way, how the process of incorporation of new library looks like? When the library is good for become a candidate? Where it should be placed etc.?

You have two options. First, you can go through the entire review process described here: http://www.boost.org/development/submissions.html The process is long, expect it to take at least one year. I am keenly interested in seeing quality libraries like this developed under free license. My organization would use them and I get asked twice a year on average if my library has it and if not if it exists under free license elsewhere. I volunteer to be your review manager if the quality of the library meets my expectations so that is one stumbling block that won't trip you up. (Some worthy libraries have difficulty finding a review manager.) However, such a review would be quite controversial because some people will have a strong opinion that you should choose the other option. The other option is to engage with the newly accepted Boost.Geometry library. They have already some plans to provide a simple 2D space partitioning tree that exists already in non-boost ready form. I don't know the status of their development on it because I don't monitor their list and they don't discuss it on the boost dev list. They are quite busy finishing implmenting their library and getting it release ready in terms of documentation and meeting the post conditions of their review. My understanding is that there is at least four or five developers active in the project, though most of the work is done by the primary developer. Engaging with them wouldn't necessarily shorten the time it takes to get your code released as part of boost, but it would allow you to avoid a formal review and instead just have to satisfy them that your code is ready to become a part of their library. While I also have a geometric boost library about to be released as part of 1.44: https://svn.boost.org/svn/boost/sandbox/gtl/doc/index.htm I don't think you can add your containers as new features to my library because: 1. the community pretty much decided that Boost.Geometry is where such new development should go 2. it doesn't fit very well in my library, which is named Boost.Polygon. I provide a coordinate, point and rectangle concepts that could serve your needs, but you need spherical coordinate space concept as well, which Boost.Geometry already has. That said I'm not opposed to people extending my library in general, quite the opposite. I can report the the Google Summer of Code project I'm currently mentoring has succeeded in producing a robust and optimal voronoi diagram of points sweepline implementation as is working toward voronoi of line segments and, by extension, polygons. My advice to you is to talk to the Boost.Geometry people and find out what their schedule is for release of their library. If you are satisfied with adding a schedule dependency on their work then you write your interfaces to depend on their corrdinate, point, rectangle, sphere and coordinate system concepts and go through whatever process they have for accepting extensions. I think your code should go in the core and not an extension because it is generally useful for all geometry and even things that aren't commonly though of as geometry, like database indexing, but that will be for you to work out with them. They have put some thought into making their concepts extensible to n-dimensions, so hopefully you can benefit from that design work. In a way it would be a shame that they went to the effort and then not have that work used for your containers. Regards, Luke

Adam Wulkiewicz

9:22 p.m.

New subject: new library (space partitioning)

Simonson, Lucanus J wrote:

...

Adam Wulkiewicz wrote:

...
By the way, how the process of incorporation of new library looks like? When the library is good for become a candidate? Where it should be placed etc.?

You have two options. First, you can go through the entire review process described here: http://www.boost.org/development/submissions.html The process is long, expect it to take at least one year. I am keenly interested in seeing quality libraries like this developed under free license. My organization would use them and I get asked twice a year on average if my library has it and if not if it exists under free license elsewhere. I volunteer to be your review manager if the quality of the library meets my expectations so that is one stumbling block that won't trip you up. (Some worthy libraries have difficulty finding a review manager.) However, such a review would be quite controversial because some people will have a strong opinion that you should choose the other option.

The other option is to engage with the newly accepted Boost.Geometry library. They have already some plans to provide a simple 2D space partitioning tree that exists already in non-boost ready form. I don't know the status of their development on it because I don't monitor their list and they don't discuss it on the boost dev list. They are quite busy finishing implmenting their library and getting it release ready in terms of documentation and meeting the post conditions of their review. My understanding is that there is at least four or five developers active in the project, though most of the work is done by the primary developer. Engaging with them wouldn't necessarily shorten the time it takes to get your code released as part of boost, but it would allow you to avoid a formal review and instead just have to satisfy them that your code is ready to become a part of their library.

While I also have a geometric boost library about to be released as part of 1.44: https://svn.boost.org/svn/boost/sandbox/gtl/doc/index.htm I don't think you can add your containers as new features to my library because: 1. the community pretty much decided that Boost.Geometry is where such new development should go 2. it doesn't fit very well in my library, which is named Boost.Polygon. I provide a coordinate, point and rectangle concepts that could serve your needs, but you need spherical coordinate space concept as well, which Boost.Geometry already has.

That said I'm not opposed to people extending my library in general, quite the opposite. I can report the the Google Summer of Code project I'm currently mentoring has succeeded in producing a robust and optimal voronoi diagram of points sweepline implementation as is working toward voronoi of line segments and, by extension, polygons.

My advice to you is to talk to the Boost.Geometry people and find out what their schedule is for release of their library. If you are satisfied with adding a schedule dependency on their work then you write your interfaces to depend on their corrdinate, point, rectangle, sphere and coordinate system concepts and go through whatever process they have for accepting extensions. I think your code should go in the core and not an extension because it is generally useful for all geometry and even things that aren't commonly though of as geometry, like database indexing, but that will be for you to work out with them. They have put some thought into making their concepts extensible to n-dimensions, so hopefully you can benefit from that design work. In a way it would be a shame that they went to the effort and then not have that work used for your containers.

Thank you for detailed answer and a proposition to be my review manager. I think that the library I'd like to develop don't fit in Boost.Geometry. The purposes of both are different. Space partitioning (in mathematical meaning), is a separate problem. Functionality like this may be used in library like Boost.Geometry but might be used with a large number of different libraries as well. As I see it, it should be a set of general, n-dimensional solutions. Therefore I think, that it should be a separate library. I'd rather choose option number one (thank you again) but I'll wait a while for the feedback from the comunity. Regards, Adam

Barend Gehrels

28 Jul 28 Jul

9:14 a.m.

New subject: new library (space partitioning)

Hi Adam, First I'm very interested in your library. Is it already uploaded into Vault? Could you publish the URL? If you didn't upload it already, you might consider to use the Boost SVN sandbox instead.

...

I think that the library I'd like to develop don't fit in Boost.Geometry. The purposes of both are different. Space partitioning (in mathematical meaning), is a separate problem. Functionality like this may be used in library like Boost.Geometry but might be used with a large number of different libraries as well. As I see it, it should be a set of general, n-dimensional solutions. Therefore I think, that it should be a separate library.

I probably agree with this. However, I would like to investigate if a coupling of these two libraries is possible, to provide the users of Boost.Geometry with a good spatial index which works for them by default and which can thus also be used for other purposes. So Boost.Geometry might follow your library, or you might use some concepts of Boost.Geometry, but I would like to see the sources to say more about this. Regards, Barend

Adam Wulkiewicz

29 Jul 29 Jul

1:11 a.m.

New subject: new library (space partitioning)

Barend Gehrels wrote:

...

Hi Adam,

First I'm very interested in your library.

Is it already uploaded into Vault? Could you publish the URL? If you didn't upload it already, you might consider to use the Boost SVN sandbox instead.

I've just uploaded preliminary version into the Vault. There is a README.txt file with description and main.cpp running tests. Regards, Adam

Adam Wulkiewicz

1:14 a.m.

New subject: new library (space partitioning)

Sorry, I didn't write what is it called. It's space-0.1.zip.

Simonson, Lucanus J

7:07 p.m.

New subject: new library (space partitioning)

Adam Wulkiewicz wrote:

...

Sorry, I didn't write what is it called. It's space-0.1.zip.

I wish you had just posted the readme at the outset: --------------------README.txt from space-0.1.zip-------------------------- It's a preliminary version of library I've called Space. As you can see it's _not_ fully operational. The main reason is that I wanted to work out some basic concepts and interfaces before I'll go further. Contents 1. Parts of the library 2. The interfaces and uses 3. Additional notes 1. Parts of the library: space::ag - namespace containing compile-time linear algebra, it's not required by the other parts of the library but it's good supplement. - angle, vector, point, matrix, square_matrix - nD - orientation, rotation_matrix - 2/3D - rotation_quaternion - 3D space::kd - nD containers, types and containers may be used in space::spar containers(in regular grid and regular tree) types.hpp contains nD versions of values deque, vector which are recursive_container(s) - nD containers which are internally fe. std::vector of std::vectors of std::vectors... dynamic_array - nD container which is internally a single std::vector * the name should be changed static_array - recursive const size container * it might be a derivation of recursive_container with boost::array as a basic type * it has additional operators to make possible dereferencing All containers has nD resize, insert, erase etc. The iterator is replaced by enumerator and it iterates in nD area of elements. There is a line_enumerator also which iterates by the elements on a nD line between two elements. space::spar - nD space partitioning containers concepts point_type concept defined type value_type static member dimension operator[]() returning nth coordinate point_object concept defined point_type position() method returning object of point_type volumetric_object concept defined point_type aabb() method returning object of space::spar::aabb<point_type> If there will be structures using on bounding spheres or other bounding objects there should be more volumetric objects concepts. aabb - nD AABB it may be tested for intersection with a ray (for now ray is defined by origin point and direction vector, both are an implementation of point_type concept (it should be fe. coordinates concept in this case)) point_node_kd_tree - kd-tree for point objects internally has heap-like structure(all nodes in std::vector) points are the nodes, but only pointers are stored points passed by iterators in the constructor or build method splitting plane - position of the middle point - it is left balanced no addition of new points/rebalancing queries: closest point precise number of closest points all points from a region (nD hypersphere) point_kd_tree - kd-tree for point objects internally has structure of a linked nodes binary tree points are stored in containers in nodes containers are in leaf nodes and in internal nodes (which is bad) points passed by iterators in the constructor or build method splitting plane - half of the longest edge of current node aabb - not balanced this is just one of the algorithms because I want to implement different ones fe. an algorithm of building a tree with clipping of empty space (some articles mentions that this gives good results) no addition of new points/rebalancing queries: closest point precise number of closest points all points from a region (nD hypersphere) kd_tree - kd-tree for volumetric objects internally has structure of a linked nodes binary tree objects are stored in containers in nodes containers are in leaf nodes and in internal nodes (the same nodes are used in kd_tree and point_kd_tree) splitting plane - half of the longest edge of current node aabb - not balanced this is just one of the algorithms because I want to implement different ones fe. V. Havran proposed an algorithm of kd-tree building with cutting of empty space and automatical splitting stop detection which has O(NlogN) complexity, no addition of new points/rebalancing no queries regular_tree - nD generalization of 2Dquadtree/3Doctree internally has structure of a linked nodes, each node has 2^n children objects are stored in containers in nodes containers are in leaf nodes and in internal nodes (again) splitting plane - halves of the edges of current node aabb this is just one of the algorithms no rebuilding/rebalancing no queries regular_grid - nD not fully functional regular grid which should be called regular_grid_base it is a container which has a dimension and i placed in space (has aabb) and contains not infinitesimal cells, it is able to do some operations on subspaces(ranges) of this grid 2. The interfaces and uses It's an example where point_node_kd_tree is used but you may change it to space::spar::point_kd_tree as well. #include <vector> #include <boost/foreach.hpp> #include <space/spar.hpp> struct point { typedef float value_type; static const size_t dimension = 3; point() {} point(float a, float b, float c) { arr[0] = a; arr[1] = b; arr[2] = c; } const float & operator[](size_t i) const { return arr[i]; } float & operator[](size_t i) { return arr[i]; } float arr[3]; }; struct point_object { typedef point point_type; const point_type & position() const { return p; } point_type p; }; int main() { std::vector<point_object> points(100); BOOST_FOREACH(point_object &p, points) { p.p = point(rand() % 100, rand() % 100, rand() % 100); } space::spar::point_node_kd_tree<point_object> kdt(points.begin(), points.end()); point_object *p = kdt.find(point(50.0f, 50.0f, 50.0f), 100.0f); found_points<point_object> fp(3); kdt.find(fp, point(50.0f, 50.0f, 50.0f), 400.0f); std::vector<point_object*> fpts; fpts.reserve(fp.size()); fp.get(std::back_inserter(fpts)); return 0; } 3. Additional notes There is a lot of additional things to do in this matter: - different trees building algorithms, - different trees structures, - parallel building, - parallel queries, - incremental queries, - ray queries, - volume queries, - use of pool allocators, - etc. But first, I need to know, what do you think about this? Is it a good starting point? If not, what are your suggestions? --------------------------------------END of README.txt------------------------------------- I'm a little dissapointed that there is no insertion or rebalancing on any of the trees. This is more of an intention to write a library than a library. Since Boost.Geometry already provides an nD point type and point concept and planned to eventually provide nD kd-trees you might want to reconsider working with them and adopting their interface design, which was arrived at after literally years of (sometimes heated) debate on this list about what the best design for the interfaces should be. The argument that what you are doing is math and not geometry doesn't really hold water in my opinion. By the way, the iterators over nD data structures problem is an age old debate in C++ and there are people on this list who can provide you with valuable insight on what to do and what not to do based on fifteen years of experience with trying and sometimes failing to define a good iterator based interface for nD containers. Your interface seems a little off to me, not what I was expecting. Why does find return a pointer to point_object? If you don't find a point within 100.0f does it return a null pointer? Why not use a boost::optional or pass point_object by reference as first argument and return bool? Either is better by a long shot. Are you dynamically allocating the point_object you return? Why does point_object exist, why not just use point_type directly? Why does found_points exist, why not pass by reference a back inserter that accepts point_type as the first argument of find and return the number of points found? There is some complexity to your interface design that needs to be justified with your rationale for making these design decisions. Don't take this feedback as criticism, I'm still excited that you plan to contribute this library and supportive of your efforts. Regards, Luke

Adam Wulkiewicz

30 Jul 30 Jul

1:15 a.m.

New subject: new library (space partitioning)

Simonson, Lucanus J wrote:

...

This is more of an intention to write a library than a library.

Yes, you're right. Maby I was too hasty.

...

Since Boost.Geometry already provides an nD point type and point concept and planned to eventually provide nD kd-trees you might want to reconsider working with them and adopting their interface design

I wanted to show you what I've got at this stage. You're right I should adopt their nD point interface and box probably. But, if boost::geometry::point you have in mind, coordinates are chosen in compile-time only(or maby I'm wrong?) and this can't be used by space partitioning data structures.

...

By the way, the iterators over nD data structures problem is an age old debate in C++ and there are people on this list who can provide you with valuable insight on what to do and what not to do based on fifteen years of experience with trying and sometimes failing to define a good iterator based interface for nD containers. [...]

I will appreciate any help.

...

Your interface seems a little off to me, not what I was expecting.

Yes, it's not fully designed and rather low-level. It is the implmented interface, not the target.

...

If you don't find a point within 100.0f does it return a null pointer?

Yes. Btw, 100.0f is a square distance (it should be a distance).

...

Why not use a boost::optional or pass point_object by reference as first argument and return bool? Either is better by a long shot.

The thing is, there aren't copies of point_objects in kd-tree. There are pointers. If there were copies, the 2nd option will be the best. Btw, it's another issue I'm not sure about. Should copies or references/pointers be kept in kd-tree?

...

Are you dynamically allocating the point_object you return?

No, this is a pointer to one of the point_object passed in the constructor.

...

Why does point_object exist, why not just use point_type directly?

Because there may be some complex point object which shouldn't been point's interface implementation because it isn't a point. Fe. in Photon Mapping algorithm photons are used. They are point objects but aren't points, they have some position but may have some other nD attributes direction, color, etc. But, I'm open to suggestion.

...

Why does found_points exist, why not pass by reference a back inserter that accepts point_type as the first argument of find and return the number of points found?

Yes, this should be corrected. found_points should be encapsulated in the kd-tree. It is a container designed to take part in search of N point_objects in some max distance. If more than N objects are found the distance may be closer than max distance. This class contains information about the search. Without it in the interface, the number of points found and the max distance should be returned with the point objects.

...

Don't take this feedback as criticism, I'm still excited that you plan to contribute this library and supportive of your efforts.

I expect and appreciate criticism. I'm still learning. So, I think the searching interface might look more or less like this: (I assume that copies are used) [code] // find one closest point, max distance = 10 point_object p; bool found = kdt.find(p, point_type(...), 10); // find max 5 closest points, initial max distance = 10 // output: number of pointsfound and distance to the furthest point std::vector<point_object> fv; fv.reserve(5); std::pair<size_t, value_type> found_data = kdt.find(std::back_inserter(fv), point_type(...), 10, 5); // or value_type max_distance; size_t found_points = kdt.find(std::back_inserter(fv), max_distance, point_type(...), 10, 5); // find all points in area, initial max distance = 10 std::vector<point_object> fv2; fv2.reserve(10); size_t found_num = kdt.find(std::back_inserter(fv2), point_type(...), 10); [code] And building interface like this: [code] std::vector<point_object> v(100); std::vector<point_object> v2(10); // build kd-tree here space::spar::kd_tree<point_object> kdt(v.begin(), v.end()); kdt.add(v2.begin(), v2.end()); kdt.add(point_object(...)); kdt.rebuild(); kdt.clear(); kdt.build(v.begin(), v.end()); [code] Eventually: [code] std::vector<point_object> v(100); std::vector<point_object> v2(10); std::vector<point_object> v3(20); // just add points, tree not built space::spar::kd_tree<point_object> kdt(v.begin(), v.end()); kdt.add(v2.begin(), v2.end()); kdt.add(point_object(...)); // build kd-tree here kdt.rebuild(); kdt.add(v3.begin(), v3.end()); kdt.add(point_object(...)); kdt.rebuild(); [code] Eventually, rebuilding might be 'inside' add method. Rebuild might be implemented in build. There might be methods for removing points from given area of kd-tree. I used rebuild(), not (rebalance()) because there are some splitting rules producing nice, not balanced kd-trees (see: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.8380). Regards, Adam

Simonson, Lucanus J

2:31 a.m.

New subject: new library (space partitioning)

Adam Wulkiewicz wrote:

...

You're right I should adopt their nD point interface and box probably.

But, if boost::geometry::point you have in mind, coordinates are chosen in compile-time only(or maby I'm wrong?) and this can't be used by space partitioning data structures.

If they don't provide an operator[] or get(axis) interface and only get<>() then that is a problem and you probably should influence them to fix that so that before they release their library. It isn't too late.

...

...
By the way, the iterators over nD data structures problem is an age old debate in C++ and there are people on this list who can provide you with valuable insight on what to do and what not to do based on fifteen years of experience with trying and sometimes failing to define a good iterator based interface for nD containers. [...]

I will appreciate any help.

You might want to see if you can dovetail into ongoing work on linear algebra and matrix library development in boost. We have ublas, multiarray and several libraries under construction in this area.

...

...
Your interface seems a little off to me, not what I was expecting.

Yes, it's not fully designed and rather low-level. It is the implmented interface, not the target.

Ok, that makes more sense.

...

...
If you don't find a point within 100.0f does it return a null pointer?

Yes. Btw, 100.0f is a square distance (it should be a distance).

That actually makes sense for an implementation interface since you want to avoid the sqrt.

...

...
Why not use a boost::optional or pass point_object by reference as first argument and return bool? Either is better by a long shot.

The thing is, there aren't copies of point_objects in kd-tree. There are pointers. If there were copies, the 2nd option will be the best.

Btw, it's another issue I'm not sure about. Should copies or references/pointers be kept in kd-tree?

It depends to some degree on how many dimensions are stored in a point. If the point is not absurdly large then copies should always be better because you avoid pointer chasing and get better cache behavior. Just try sorting a vector of points and compare to sorting a vector of pointers to points and you'll quickly see that local copies wins.

...

...
Are you dynamically allocating the point_object you return?

No, this is a pointer to one of the point_object passed in the constructor.

One reason to do this sort of thing is if the point type disallowed copy construction and assignment, but I doubt you had that in mind. We lean pretty heavily on NRVO and inlining to make return by value fast.

...

...
Why does point_object exist, why not just use point_type directly?

Because there may be some complex point object which shouldn't been point's interface implementation because it isn't a point. Fe. in Photon Mapping algorithm photons are used. They are point objects but aren't points, they have some position but may have some other nD attributes direction, color, etc.

But, I'm open to suggestion.

You can use adaptor traits similar to what boost geometry does to avoid the wrapper class.

...

So, I think the searching interface might look more or less like this: (I assume that copies are used)

[code]

// find one closest point, max distance = 10 point_object p; bool found = kdt.find(p, point_type(...), 10);

// find max 5 closest points, initial max distance = 10 // output: number of pointsfound and distance to the furthest point std::vector<point_object> fv; fv.reserve(5); std::pair<size_t, value_type> found_data = kdt.find(std::back_inserter(fv), point_type(...), 10, 5); // or value_type max_distance; size_t found_points = kdt.find(std::back_inserter(fv), max_distance, point_type(...), 10, 5);

I wouldn't bother returning the max distance found. Let the user derive it from the points returned if they want it. It complicates the interface and I'm not sure how often it is useful.

...

// find all points in area, initial max distance = 10 std::vector<point_object> fv2; fv2.reserve(10); size_t found_num = kdt.find(std::back_inserter(fv2), point_type(...), 10);

This one looks nice and clean.

...

Eventually, rebuilding might be 'inside' add method. Rebuild might be implemented in build. There might be methods for removing points from given area of kd-tree.

I used rebuild(), not (rebalance()) because there are some splitting rules producing nice, not balanced kd-trees (see: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.8380).

What happens if the user queries before after they have added but before they have built? Throw an exception? Return nothing? Automatically rebuild? I wouldn't provide build, rebuild or rebalance interfaces. It should be like the red-black tree, we never tell a std::set to rebalance itself. If the user wants to rebuild the tree they can construct a new one with all the points. What about if you provided lower_bound and upper_bound: kdtree::iterator itr = kdt.lower_bound(point, 0.0f); kdtree::iterator itr2 = kdt.upper_bound(point, 0.0f); std::vector<point> pts(itr, itr2); kdt.erase(itr, itr2); You may also sometimes want all the points in the tree and would just use begin() and end(). This style of interface has the advantage that it is maximally intuitive to people who are used to the stl. It may have the drawback of being challenging to implement and perhaps execute slower than back inserter style. There are iterator tricks used and facilitated by boost to make it easier and more efficient to provide such an interface as a wrapper around the back inserter interface, for example. Now, let's imagine that instead of euclidean the user wants L distance metric. You can provide a version of lower_bound and upper_bound that take a distance metric functor as a third argument. Lots to think about. Regards, Luke

Adam Wulkiewicz

12:02 p.m.

New subject: new library (space partitioning)

Simonson, Lucanus J wrote:

...

Adam Wulkiewicz wrote:

...
You're right I should adopt their nD point interface and box probably.

But, if boost::geometry::point you have in mind, coordinates are chosen in compile-time only(or maby I'm wrong?) and this can't be used by space partitioning data structures.

If they don't provide an operator[] or get(axis) interface and only get<>() then that is a problem and you probably should influence them to fix that so that before they release their library. It isn't too late.

To clarify this. It is possible to create some data structures (kd-tree with sequential choosing of splitting plane, regular tree, etc.) but in more complex kd-tree splitting algorithm there must be this kind of interface used. I'd like to develop data structures with 'switchable' internal structure/building algorithm. If the tree structure is better, the searching is faster.

...

...
...
Why not use a boost::optional or pass point_object by reference as first argument and return bool? Either is better by a long shot.

The thing is, there aren't copies of point_objects in kd-tree. There are pointers. If there were copies, the 2nd option will be the best.

Btw, it's another issue I'm not sure about. Should copies or references/pointers be kept in kd-tree?

It depends to some degree on how many dimensions are stored in a point. If the point is not absurdly large then copies should always be better because you avoid pointer chasing and get better cache behavior. Just try sorting a vector of points and compare to sorting a vector of pointers to points and you'll quickly see that local copies wins.

...
...
Are you dynamically allocating the point_object you return?

No, this is a pointer to one of the point_object passed in the constructor.

One reason to do this sort of thing is if the point type disallowed copy construction and assignment, but I doubt you had that in mind. We lean pretty heavily on NRVO and inlining to make return by value fast.

...
...
Why does point_object exist, why not just use point_type directly?

Because there may be some complex point object which shouldn't been point's interface implementation because it isn't a point. Fe. in Photon Mapping algorithm photons are used. They are point objects but aren't points, they have some position but may have some other nD attributes direction, color, etc.

But, I'm open to suggestion.

You can use adaptor traits similar to what boost geometry does to avoid the wrapper class.

With points it's ok, since there will be only one instance of a point in the tree. User may still want to store them in different container and use kdt just for searching purposes but he may just use some wrapper. With big points (small photons will have around 28B) he may use a wrapper too. The problem is when you take volume objects into consideration. Volume object may be 'present' in some number of kd-tree nodes. So you must have pointers/references in there. And the 'global' question is, what to treat kd-tree like? As a container which purpose is to store points/volumes and provide fast searching or maby as an additional searching accelerator. If it should be the the 'main' container to store objects, there should be some container internally in the volume kd-tree and the tree structure should contain pointers/references/indexes. So I thought that this container should be managed by the user and the tree should be just a search accelerator. The main danger I see is that if container is changed, user must remember to create new kd-tree. Another thing is that volume objects may be a lot bigger than points, fe. it may be 3D meshes.

...

...
// find one closest point, max distance = 10 point_object p; bool found = kdt.find(p, point_type(...), 10);

// find max 5 closest points, initial max distance = 10 // output: number of pointsfound and distance to the furthest point std::vector<point_object> fv; fv.reserve(5); std::pair<size_t, value_type> found_data = kdt.find(std::back_inserter(fv), point_type(...), 10, 5); // or value_type max_distance; size_t found_points = kdt.find(std::back_inserter(fv), max_distance, point_type(...), 10, 5);

I wouldn't bother returning the max distance found. Let the user derive it from the points returned if they want it. It complicates the interface and I'm not sure how often it is useful.

Yes it complicates but it would be nice to return it somehow since it's calculated 'inside' allready and may be used in algorithms where the area of found points is used. Maby just implement 2 versions of this method? Btw, there might be fe. a search area functor. This will make possible to specify the size and the shape of the search area.

...

...
I used rebuild(), not (rebalance()) because there are some splitting rules producing nice, not balanced kd-trees (see: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.8380).

What happens if the user queries before after they have added but before they have built? Throw an exception? Return nothing? Automatically rebuild? I wouldn't provide build, rebuild or rebalance interfaces. It should be like the red-black tree, we never tell a std::set to rebalance itself. If the user wants to rebuild the tree they can construct a new one with all the points.

It makes it even simpler.

...

What about if you provided lower_bound and upper_bound:

kdtree::iterator itr = kdt.lower_bound(point, 0.0f); kdtree::iterator itr2 = kdt.upper_bound(point, 0.0f); std::vector<point> pts(itr, itr2); kdt.erase(itr, itr2);

You may also sometimes want all the points in the tree and would just use begin() and end(). This style of interface has the advantage that it is maximally intuitive to people who are used to the stl. It may have the drawback of being challenging to implement and perhaps execute slower than back inserter style. There are iterator tricks used and facilitated by boost to make it easier and more efficient to provide such an interface as a wrapper around the back inserter interface, for example.

Now, let's imagine that instead of euclidean the user wants L distance metric. You can provide a version of lower_bound and upper_bound that take a distance metric functor as a third argument.

Yes, I like it. Regards, Adam

Mathias Gaunard

7 Aug 7 Aug

2:29 p.m.

New subject: new library (space partitioning)

Le 29/07/2010 20:07, Simonson, Lucanus J wrote:

...

Your interface seems a little off to me, not what I was expecting.

I have to agree, and add more to that. What a spatial index does is fairly simple (to the user). Its interface should be the same as that of std::multiset, but instead of using operator<, it should use intersection (which, like std::multiset, should have a default but be changeable to an arbitrary function object). You're going to want to change std::multiset<T>::find slightly to allow arbitrary types, not just T (often considered an arbitrary limitation and defect by people), and to return a range instead of an iterator; and you might want to lazily evaluate that range as well, hence providing the incremental search you put on your to-do list. The default intersection function would obviously be a generic one from Geometry. I'm not sure Geometry has such a generic intersection function; but if it doesn't, that where it needs to be. Volume queries, ray queries etc. don't need anything special. They should work with the generic intersection function just like the rest, no need to special case anything. There is no need to introduce any new point/object concept, nor new nD containers. The spatial index should assume the very least information it needs in order to be truly generic. All the implementation details you might need should be in Geometry, and the default intersection function should be from Geometry as well (if they're not, that's where they should be anyway), so I don't see why you would want to make it a separate library, except out of craziness. Dynamic typing should not be the responsability of the index, so it should work with objects of type T, not pointers, and let the user deal with polymorphism, if he needs it, by instantiating the index with smart pointers or type erasure types. Memory allocation is purely dealt with by the allocator, so seeing "use of pool allocators" in the to-do list worries me. I would also like to see an intrusive version of the index, that doesn't do any memory allocation. The non-intrusive version can be built on top of the intrusive one, but the opposite isn't possible. So my opinion is that the whole design is done utterly wrong at the moment. I personally won't consider using it or promoting unless it is exactly as I described (intrusive support is optional).

Mathias Gaunard

3:13 p.m.

New subject: new library (space partitioning)

Just adding a few things to what I said: 07/08/2010 15:29, Mathias Gaunard wrote:

...

What a spatial index does is fairly simple (to the user). Its interface should be the same as that of std::multiset, but instead of using operator<, it should use intersection (which, like std::multiset, should have a default but be changeable to an arbitrary function object).

While intersection test should be enough for searching, you're going to need other primitives for maintaining the data structure. They probably should be template parameters as well. For r-trees, for example, you're going to need a function that gives the axis-aligned bounding box for a T, and Geometry does provide that with the 'envelope' function. But in a typical scenario, you're going to want to precompute all the bounding boxes for all objects before you index them, so having a customized function that just returns the pre-calculated envelope seems needed.

...

You're going to want to change std::multiset<T>::find slightly to allow arbitrary types, not just T (often considered an arbitrary limitation and defect by people), and to return a range instead of an iterator; and you might want to lazily evaluate that range as well, hence providing the incremental search you put on your to-do list.

You may want to add another find function that sorts the results by distance as well, for nearest neighbour queries.

Adam Wulkiewicz

8 Aug 8 Aug

8:32 p.m.

New subject: new library (space partitioning)

Mathias Gaunard wrote:

...

Le 29/07/2010 20:07, Simonson, Lucanus J wrote:

...
Your interface seems a little off to me, not what I was expecting.

I have to agree, and add more to that.

Yes, I wrote earlier that the interface should be redesigned and that I'd like to write the library, not that I have one fully functional. I've decided to show you the code as it is because I knew that you'll have many objections, different people will have different views and because I don't know about your plans and what is needed. I'm gathering information right now.

...

There is no need to introduce any new point/object concept, nor new nD containers. The spatial index should assume the very least information it needs in order to be truly generic.

All the implementation details you might need should be in Geometry, and the default intersection function should be from Geometry as well (if they're not, that's where they should be anyway), so I don't see why you would want to make it a separate library, except out of craziness.

It depends, how you see these things. You may want spatial index or various spacial indexes in Boost.Geometry. But you may also want multi-purpose space partitioning containers enclosed in separate library. It's a matter of view. Where user should search this functionality for? And for what data should these containers work for?

...

Memory allocation is purely dealt with by the allocator, so seeing "use of pool allocators" in the to-do list worries me.

There are large number of different data structures and there may be fe. a massive tree with large number of nodes created in the building process and it's better to have pools of nodes than to create each one separately.

Mathias Gaunard

9:59 p.m.

New subject: new library (space partitioning)

Le 08/08/2010 21:32, Adam Wulkiewicz wrote:

...

It depends, how you see these things. You may want spatial index or various spacial indexes in Boost.Geometry. But you may also want multi-purpose space partitioning containers enclosed in separate library. It's a matter of view. Where user should search this functionality for? And for what data should these containers work for?

Boost.Geometry works with un-intrusive concepts, and allows to use arbitrary types.

...

There are large number of different data structures and there may be fe. a massive tree with large number of nodes created in the building process and it's better to have pools of nodes than to create each one separately.

I don't understand your point. Your container wants to allocate memory, it uses its allocator parameter. The user wants to change the way memory is allocated, he changes that parameter. You can indeed allocate all the memory in one go at construction, I don't see why that requires a pool allocator.

Adam Wulkiewicz

9 Aug 9 Aug

4:35 p.m.

New subject: new library (space partitioning)

Mathias Gaunard wrote:

...

Le 08/08/2010 21:32, Adam Wulkiewicz wrote:

...
It depends, how you see these things. You may want spatial index or various spacial indexes in Boost.Geometry. But you may also want multi-purpose space partitioning containers enclosed in separate library. It's a matter of view. Where user should search this functionality for? And for what data should these containers work for?

Boost.Geometry works with un-intrusive concepts, and allows to use arbitrary types.

I'm talking about something different. There are many fields where spacial indexes are used. Does user who wants to implement an algorithm extracting features from the picture or searching data in the database should look for needed data structures somwhere deep in the geometry library? Or maby, the geometry library should use functionality provided by the space partitioning library (both using the same concepts)?

...

...
There are large number of different data structures and there may be fe. a massive tree with large number of nodes created in the building process and it's better to have pools of nodes than to create each one separately.

I don't understand your point. Your container wants to allocate memory, it uses its allocator parameter. The user wants to change the way memory is allocated, he changes that parameter.

You can indeed allocate all the memory in one go at construction, I don't see why that requires a pool allocator.

Yes, you're right. Although, there may be more than one allocators used in the tree building algorithm. There may be allocators for tree nodes, temporary objects at each recursive step and for objects in leafs. I thought about implementing this kind of container later and wanted to select proper allocators internally automatically (because it complicates the interface and the user must know about the internal details) but this is just a proposition.

Barend Gehrels

10 Aug 10 Aug

7:35 a.m.

New subject: new library (space partitioning)

Adam Wulkiewicz wrote:

...

Mathias Gaunard wrote:

...
Le 08/08/2010 21:32, Adam Wulkiewicz wrote:

...
It depends, how you see these things. You may want spatial index or various spacial indexes in Boost.Geometry. But you may also want multi-purpose space partitioning containers enclosed in separate library. It's a matter of view. Where user should search this functionality for? And for what data should these containers work for?

Boost.Geometry works with un-intrusive concepts, and allows to use arbitrary types.

I'm talking about something different. There are many fields where spacial indexes are used. Does user who wants to implement an algorithm extracting features from the picture or searching data in the database should look for needed data structures somwhere deep in the geometry library? Or maby, the geometry library should use functionality provided by the space partitioning library (both using the same concepts)? I don't think these are structures deep in the geometry library. It can be the structures of the users, something in a picture or database. Which is "adapted" to the Boost.Geometry Point Concept (this is normally done by one Macro call).

The spatial index could then use the Point Concept of Boost.Geometry, and both libraries will work together happily. Regards, Barend

Simonson, Lucanus J

11 Aug 11 Aug

5:08 p.m.

New subject: new library (space partitioning)

Barend Gehrels wrote:

...

Adam Wulkiewicz wrote:

...
Mathias Gaunard wrote:

...
Le 08/08/2010 21:32, Adam Wulkiewicz wrote:

...
It depends, how you see these things. You may want spatial index or various spacial indexes in Boost.Geometry. But you may also want multi-purpose space partitioning containers enclosed in separate library. It's a matter of view. Where user should search this functionality for? And for what data should these containers work for?

Boost.Geometry works with un-intrusive concepts, and allows to use arbitrary types.

I'm talking about something different. There are many fields where spacial indexes are used. Does user who wants to implement an algorithm extracting features from the picture or searching data in the database should look for needed data structures somwhere deep in the geometry library? Or maby, the geometry library should use functionality provided by the space partitioning library (both using the same concepts)? I don't think these are structures deep in the geometry library. It can be the structures of the users, something in a picture or database. Which is "adapted" to the Boost.Geometry Point Concept (this is normally done by one Macro call).

The spatial index could then use the Point Concept of Boost.Geometry, and both libraries will work together happily.

Why would a person doing image processing or database development think that their need for spacial indexes are somehow not a need for computational geometry? I'm quite sure the database developers feel the spatial indexes they already have are superior to anything we are using in VLSI or GIS, and they are undoubtedly right to the extent that their data structures are better suited to the needs of their application. Their spacial indexes would likely not be suitable for VLSI compared to our own data structures. The performance of a spacial index is heavily dependent on the characteristics of the data it stores. I hope we all understand that up front because it will cut down on silly arguments about performance later on. If Adam said he was writing a database building blocks library that was going to feature spacial indexese tailored to meet the needs of database query then Boost.Geometry would be a bad place to put it. However, Adam is saying he plans to implement a generic spacial indexes library with several different types of data structures and wants it to be used in multiple application domains. Adam may not realize it, but computational geometry is not an application domain. There are many application domains where computational geometry is used. Does a user who wants to implement an algorithm for extracting features from an image or searching for data in a Geospacial database want to look in the geometry library to find something they both need? Maybe not, but it is better to put those things that are needed in several application domains in a common library which is meaninfully named and easy to find instead of having a separate application domain specific library for each type of user. Geometry is the right name for a library that contains spacial indexes because spacial indexes are computational geometry unless they are otherwise encumbered with domain specific jargon in the interfaces or tailored to domain specific needs. So now lets talk about software design. If spacial indexes will depend on the Boost.Geometry library for low level features like point concept and traits that we generally agree will be a good idea for interoperability and Boost.Geometry will depend on spacial indexes to implement high level algorithms like closest neighbor query (which they provide now, but would like to optimize by using a good spacial index) then spacial indexes should go in Boost.Geometry to avoid cyclic dependency between libraries. The alternative is to partition Geometry into algorithms that do depend on spacial indexes, which would go in the spacial indexes library, and algorithms that do not, which would stay in Geometry. Such a partition is abitrary and nonsensical. I don't think it would be inconvenient for Adam to work with the Geometry library authors and I think he could benefit from their experience and direct help with the implementation of his vision for a generic spacial indexes capability in boost. Given recent experience I don't think the community would accept spacial indexes as its own library now that Geometry has been accepted. Regards, Luke

Adam Wulkiewicz

12 Aug 12 Aug

2:05 p.m.

New subject: new library (space partitioning)

Simonson, Lucanus J wrote:

...

I don't think it would be inconvenient for Adam to work with the Geometry library authors and I think he could benefit from their experience and direct help with the implementation of his vision for a generic spacial indexes capability in boost. Given recent experience I don't think the community would accept spacial indexes as its own library now that Geometry has been accepted.

Yes, you're right. Thank you all for your advices. I'd be glad to work on Boost.Geometry. I've allready started to implement kd-tree using Geometry concepts. When it's finished I'll show it to Barend and see what he thinks about it. Regards, Adam

Barend Gehrels

13 Aug 13 Aug

10:18 a.m.

New subject: new library (space partitioning)

Hi Adam,

...

Yes, you're right. Thank you all for your advices. I'd be glad to work on Boost.Geometry. I've allready started to implement kd-tree using Geometry concepts. When it's finished I'll show it to Barend and see what he thinks about it. That's good news, welcome!

Regards, Barend

Simonson, Lucanus J

6:12 p.m.

New subject: new library (space partitioning)

Barend Gehrels wrote:

...

Hi Adam,

...
Yes, you're right. Thank you all for your advices. I'd be glad to work on Boost.Geometry. I've allready started to implement kd-tree using Geometry concepts. When it's finished I'll show it to Barend and see what he thinks about it. That's good news, welcome!

Adam will probably want to subscribe to the Geometry dev list. I would like to do so as well and join the project. What is the link to the Geometry project that we can take the next steps? Thanks, Luke

Barend Gehrels

14 Aug 14 Aug

8:43 a.m.

New subject: new library (space partitioning)

Hi Luke, Thanks for your comments and suggestions. Simonson, Lucanus J wrote:

...

Adam will probably want to subscribe to the Geometry dev list. I would like to do so as well and join the project. What is the link to the Geometry project that we can take the next steps?

Most information can be found here: http://trac.osgeo.org/ggl with also links to the mailing list, documentation and SVN repository. Regards, Barend

Arash Partow

9:53 a.m.

New subject: new library (space partitioning)

On 13/08/2010 12:05 AM, Adam Wulkiewicz wrote:

...

I've allready started to implement kd-tree using Geometry concepts. When it's finished I'll show it to Barend and see what he thinks about it.

Could you please clarify: Does this mean if I just want to use the kd-tree (not necessarily for anything comp-geo related), I'll now have a dependency of boost.geometry?

Adam Wulkiewicz

15 Aug 15 Aug

9:46 a.m.

New subject: new library (space partitioning)

Arash Partow wrote:

...

On 13/08/2010 12:05 AM, Adam Wulkiewicz wrote:

...
I've allready started to implement kd-tree using Geometry concepts. When it's finished I'll show it to Barend and see what he thinks about it.

Could you please clarify: Does this mean if I just want to use the kd-tree (not necessarily for anything comp-geo related), I'll now have a dependency of boost.geometry?

Yes. Kd-tree will be probably somewhere in boost::geometry namespace and to use it you'll have to use point which has boost::geometry::point interface. If you didn't want to use geometry::point, you could just adapt it.

Mathias Gaunard

14 Aug 14 Aug

10:41 a.m.

New subject: new library (space partitioning)

09/08/2010 17:35, Adam Wulkiewicz wrote:

...

Yes, you're right. Although, there may be more than one allocators used in the tree building algorithm. There may be allocators for tree nodes, temporary objects at each recursive step and for objects in leafs.

I don't see why you'd need dynamic memory allocation other than for the nodes, assuming you're storing values and not not trying to store polymorphic objects through pointers.

Adam Wulkiewicz

15 Aug 15 Aug

10:07 a.m.

New subject: new library (space partitioning)

Mathias Gaunard wrote:

...

09/08/2010 17:35, Adam Wulkiewicz wrote:

...
Yes, you're right. Although, there may be more than one allocators used in the tree building algorithm. There may be allocators for tree nodes, temporary objects at each recursive step and for objects in leafs.

I don't see why you'd need dynamic memory allocation other than for the nodes, assuming you're storing values and not not trying to store polymorphic objects through pointers.

It strongly depends on used algorithm if you need it or not. If you use some kind of internal, temporary objects you need it. See e.g. http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or http://kesen.realtimerendering.com/Intel-EG07.pdf

Mathias Gaunard

4:25 p.m.

New subject: new library (space partitioning)

Le 15/08/2010 11:07, Adam Wulkiewicz a écrit :

...

It strongly depends on used algorithm if you need it or not. If you use some kind of internal, temporary objects you need it. See e.g. http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or http://kesen.realtimerendering.com/Intel-EG07.pdf=

I may be missing something here, but I don't see why the temporary objects in question cannot be stored on the execution stack.

Adam Wulkiewicz

6:49 p.m.

New subject: new library (space partitioning)

Mathias Gaunard wrote:

...

Le 15/08/2010 11:07, Adam Wulkiewicz a écrit :

...
It strongly depends on used algorithm if you need it or not. If you use some kind of internal, temporary objects you need it. See e.g. http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or http://kesen.realtimerendering.com/Intel-EG07.pdf=

I may be missing something here, but I don't see why the temporary objects in question cannot be stored on the execution stack.

Implementing dynamic memory allocation on the stack could be quite tricky.

Barend Gehrels

16 Aug 16 Aug

8:04 a.m.

New subject: new library (space partitioning)

hi Adam,

...

...
Le 15/08/2010 11:07, Adam Wulkiewicz a écrit :

...
It strongly depends on used algorithm if you need it or not. If you use some kind of internal, temporary objects you need it. See e.g. http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or http://kesen.realtimerendering.com/Intel-EG07.pdf=

I may be missing something here, but I don't see why the temporary objects in question cannot be stored on the execution stack.

Implementing dynamic memory allocation on the stack could be quite tricky.

Thanks for the links. In Boost.Geometry we do not allocate dynamic memory ourselves; all polygons, rings, lines, multi-polygons etc are stored as an std:: container, which can be selected at compile-time by the library user (as a template-template-parameter). Besides that it is not part of the concept so library users might select another solution. Don't know if this is possible in this case, but if it is possible, I would prefer that approach. Regards, Barend

Adam Wulkiewicz

10:27 a.m.

New subject: new library (space partitioning)

W dniu 2010-08-16 10:04, Barend Gehrels napisal(a):

...

hi Adam,

...
...
Le 15/08/2010 11:07, Adam Wulkiewicz a écrit :

...
It strongly depends on used algorithm if you need it or not. If you use some kind of internal, temporary objects you need it. See e.g. http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or http://kesen.realtimerendering.com/Intel-EG07.pdf=

I may be missing something here, but I don't see why the temporary objects in question cannot be stored on the execution stack.

Implementing dynamic memory allocation on the stack could be quite tricky.

Thanks for the links.

In Boost.Geometry we do not allocate dynamic memory ourselves; all polygons, rings, lines, multi-polygons etc are stored as an std:: container, which can be selected at compile-time by the library user (as a template-template-parameter). Besides that it is not part of the concept so library users might select another solution.

Don't know if this is possible in this case, but if it is possible, I would prefer that approach.

Regards, Barend

I move this discusion to the ggl list. Regards, Adam

Adam Wulkiewicz

30 Jul 30 Jul

10:29 p.m.

New subject: new library (space partitioning)

Barend Gehrels wrote:

...

Hi Adam,

First I'm very interested in your library.

Is it already uploaded into Vault? Could you publish the URL? If you didn't upload it already, you might consider to use the Boost SVN sandbox instead.

...
I think that the library I'd like to develop don't fit in Boost.Geometry. The purposes of both are different. Space partitioning (in mathematical meaning), is a separate problem. Functionality like this may be used in library like Boost.Geometry but might be used with a large number of different libraries as well. As I see it, it should be a set of general, n-dimensional solutions. Therefore I think, that it should be a separate library.

I probably agree with this. However, I would like to investigate if a coupling of these two libraries is possible, to provide the users of Boost.Geometry with a good spatial index which works for them by default and which can thus also be used for other purposes. So Boost.Geometry might follow your library, or you might use some concepts of Boost.Geometry, but I would like to see the sources to say more about this.

First of all, I must explain myself. I walk in here and write that I'd like to desing a library. I don't want you to change something. I'd like to design a library which you might use if you haven't implement some container by yourself. Moreover, I think it should be separate from Boost.Geometry. What do you think about it? Have you any objections? Maby you have some plans and want to do it by yourself? If no, I think, it's the best to use concepts from Boost.Geometry because it's an existing library. Or to use some adaptor. But the most important thing is that if you want to make building of arbitrary kd-tree possible boost::geometry::point should provide a method returning given coordinate not in compile-time. Still, it is possible to create kd-tree using coordinates indexed in compile-time and I don't know if it will be a lot worse than the other possible trees. The problem may occur for volume objects. I might implement simple version of kd-tree for points using your interface and show it to you if you like. Later I'd like to use adaptors and to implement some tree building traits. Different building algorithms might use different interfaces (compile-time or not). Regards, Adam

Simonson, Lucanus J

10:57 p.m.

New subject: new library (space partitioning)

Adam Wulkiewicz wrote:

...

But the most important thing is that if you want to make building of arbitrary kd-tree possible boost::geometry::point should provide a method returning given coordinate not in compile-time.

This did come up in design discussions and I recall someone stating the need citing specifically nD spatial query as a place where they need it. The desire to make the interface minimal means that as a practical matter the designer of the interface needs to choose either runtime or compile time. Boost.Geometry chose compile time, I chose run time. The argument against runtime is that it may cause access to be slow due to unfortunate things like: struct pt { double x, y, z; }; double get_coord(const pt& p, unsigned int axis) { if(axis == 0) return p.x; if(axis == 1) return p.y; return p.z; } double coord = get_coord(pt, 0); //does the if statement compile away? I argue that the accessor should inline and constant propigation will cause the if statement to dissolve away resulting in zero runtime overhead. Sadly, this if tree type thing is what you have to write to use the compile time accessor with a runtime parameter, which the compiler can't compile away, in general if the runtime parameter is not a constant. double get_coord(const pt& p, unsigned int axis) { if(axis == 0) return get<0>(p); if(axis == 1) return get<1>(p); return get<2>(p); } Even though point may be defined as: typedef boost_array<double, 3> pt; and provide an operator[] you end up paying for branch or jump table access to its coordinates in addition to the array lookup for no good reason. Thinking through what the compiler would have to do to optimize away the logic tree or switch statement I find it pretty unlikely. I don't think it is too late for Boost.Geometry to change their interface, but I do recognize that it entails a lot of work. I encapsulate the use of the adaptor in a free function so that I can change the interface and don't need to change every place where I use it, but I don't know if Boost.Geometry uses the adaptor directly everywhere or not. Regards, Luke

Adam Wulkiewicz

7 Aug 7 Aug

1:03 p.m.

New subject: new library (space partitioning)

Hi, I couldn't decide what point concept is the best so I've implemented kd-tree building for 3 kind of points to perform some tests and to work out the basic structure of the library, data types adaptors etc. These are 'internal' point concepts: 1. compile-time point point_category = compiletime_point_tag coordinate_type coordinate_count coordinate_type const & get<K>() void set<K>(v) 2. runtime-time point point_category = runtime_point_tag coordinate_type coordinate_count coordinate_type const & get(k) void set(k, v) 3. dynamic point point_category = dynamic_point_tag coordinate_type coordinate_type const & get(k) void set(k, v) coordinate_count() I don't know which one is the best for space partitioning (ct or rt). There are some 'run-time' data structures (different BSP-tree, including kd-tree) and there are 'compile-time' data structures (quadtree/octree/..., regular grid, hierarchical grid, BVH tree). 'Run-time' and 'compile-time' means specific access to the coordinates. Building of all data structures is performed in run-time because coordinates values are set in runtime. Kd-tree (kdtree.hpp) has 2 template parameters Point and PointAdaptor. The default PointAdaptor is point_traits<Point>::point_adaptor. One may write adaptor modeling any of concepts above. There are some adaptors written allready (types.hpp). And if he doesn't want to pass PointAdaptor each time he creates kd-tree, he may specialize point_traits. So, kd-tree should work for: user-defined types boost::geometry - concept 1 boost::array - concept 2 std::vector, std::deque, uBlas vectors - concept 3 Example: Lets assume that point is defined as follows: template <typename V, size_t D> class runtime_point { public: typedef V coordinate_type; static const size_t coordinate_count = D; inline coordinate_type const& operator[](size_t i); inline coordinate_type & operator[](size_t i); private: coordinate_type c[coordinate_count]; }; The adaptor looks like this: template <typename Point> struct runtime_point_indexop : private Point { typedef runtime_point_tag point_category; typedef typename Point::coordinate_type coordinate_type; static const size_t coordinate_count = Point::coordinate_count; inline runtime_point_indexop() {} inline runtime_point_indexop(Point const& pt) : Point(pt) {} inline coordinate_type const& get(size_t i) const { return Point::operator[](i); } inline void set(size_t i, coordinate_type const& v) { Point::operator[](i) = v; } }; Btw, it's the default adaptor. One might write: typedef runtime_point<float, 2> rtp; std::vector<rtp> rtv; // ... // space::kdtree<rtp> rtkdt(rtv.begin(), rtv.end()); Or explicitly: typedef space::runtime_point_indexop<rtp> rtpadapt; space::kdtree<rtp, rtpadapt> rtkdt(rtv.begin(), rtv.end()); Or specialize point_traits in the first place: namespace space { template <typename T> struct point_traits<rtp> { typedef runtime_point_indexop<rtp> point_adaptor; }; } And then: space::kdtree<rtp> rtkdt(rtv.begin(), rtv.end()); In main.cpp there are: - 2 user-defined point types defined: * compiletime_point (boost::geometry::point interface) * runtime_point (point with operator[] instead of get/set) - point_traits specializations for: * boost::array * std::vector - kd-tree build time tests On my computer building takes (Core 2 Duo 2GHz, Release build, 50000 points, 2 coordinates per point, floats, full optimization): MSVC10 GCC3.4.5 runtime_point 0.018s 0.023s compiletime_point 0.017s 0.017s boost::array 0.019s 0.022s std::vector 9.5s 0.278s runtime_point adapted from 0.022s 0.023s compiletime_point compiletime_point adapted from 0.017s 0.017s runtime_point Btw, the results for std::vector are wierd. I think that it is nice to have space partitioning working for many point types (even for std::vectors, uBlas etc). But there must be 3 different kind of algorithms for some cases. It's not bad to convert from one representation to another. It's obvious that it's the best to have algorithms working on compile-time points and convert run-time points when necessary. What do you think about it? And what do you think about my data types, adaptors etc? Regards, Adam

Barend Gehrels

2 Aug 2 Aug

9:13 a.m.

New subject: new library (space partitioning)

...

First of all, I must explain myself. I walk in here and write that I'd like to desing a library. I don't want you to change something. I'd like to design a library which you might use if you haven't implement some container by yourself. Moreover, I think it should be separate from Boost.Geometry. What do you think about it? Have you any objections? Maby you have some plans and want to do it by yourself? Yes, I think that it is good to be separate, and I don't have objections. What you said, it can be used for non-geometry as well.

And yes, we did have plans, actually we do have a spatial index based on R-Tree which lives as an extension within Boost.Geometry. I don't think it inconvenient to have different (spatial) indexes, I think it is good. However, it would be very convenient to make the interfaces similar (or equal). The interface of the current R-Tree still has to be redesigned so it is not too late, for us.

...

If no, I think, it's the best to use concepts from Boost.Geometry because it's an existing library. Or to use some adaptor.

Yes, I would advocate using our concepts. The point concept is not complicated and points are easy to implement and adapt to our concepts. The point you have in the library I saw last week can be adapted as well.

...

But the most important thing is that if you want to make building of arbitrary kd-tree possible boost::geometry::point should provide a method returning given coordinate not in compile-time.

See also Luke's response on this. We have compile time access to select by coordinate index but, of course, you could add a wrapper in between translating from runtime to compiletime.

...

Still, it is possible to create kd-tree using coordinates indexed in compile-time and I don't know if it will be a lot worse than the other possible trees. The problem may occur for volume objects.

I think it would be good.

...

I might implement simple version of kd-tree for points using your interface and show it to you if you like.

...

Later I'd like to use adaptors and to implement some tree building traits. Different building algorithms might use different interfaces (compile-time or not). We have for several free functions also different internal algorithms, which can be specified as an extra argument. We call that a strategy,

Yes, I would like that :-) there are concepts for different strategies (distance etc.). You might look if such a system would be convenient for you as well. Regards, Barend

Simonson, Lucanus J

28 Jul 28 Jul

4:03 p.m.

New subject: new library (space partitioning)

Adam Wulkiewicz wrote:

...

I'd rather choose option number one (thank you again) but I'll wait a while for the feedback from the comunity.

You should search the archives of the dev list because we have discussed spatial indexing several times in the context of a failed GSOC project from 2006 and the design of my library as well as Boost.Geometry. Both libraries were criticized during review for lacking spatial query data structures. Extention to n-dimensions was discussed at length several times. There is no doubt that there is interest in a library such as you propose. At this stage we would probably like to see your code and also have you describe the interfaces you have in mind along with example code for how they would be used. The majority of interest will be in the interfaces and how the data structures are used and not the internal details of how they are implemented. Regards, Luke

Christian Holmquist

27 Jul 27 Jul

2:20 a.m.

New subject: new library (space partitioning)

...

I've implemented: - k-dimensional generalizations of STL containers (used by some space partitioning data structures) - k-dimensional space partitioning data structures - regular grid - kd-tree - something I've called regular tree, it's k-dimensional generalization of quadtree/octree, so for k=2 it's quadtree, for k=3 it's octree, for k=4 it's 16tree and so on. - k-dimensional euclidean algebra/geometry data structures - points, vectors etc., - unit tests.

This would be immensely useful. Is your library made available online somewhere already? Cheers, Christian

Adam Wulkiewicz

8:59 p.m.

New subject: new library (space partitioning)

Christian Holmquist wrote:

...

This would be immensely useful. Is your library made available online somewhere already?

Not yet, but I think, I'll put it in the Vault for a while. Regards, Adam

Demian Nave

28 Jul 28 Jul

3:08 a.m.

New subject: new library (space partitioning)

On 7/26/2010 4:35 PM, Adam Wulkiewicz wrote:

...

Hi,

It's my first message so I'd like to introduce myself. My name's Adam Wulkiewicz, I'm a doctoral student of Technical University of Lodz, Poland.

I'd like to take part in evolution of Boost library. I've implemented some space partitioning data structures and k-dimensional generalizations of STL containers and thought that since these kind of containers are useful to many people it would be nice to have them in Boost. [...]

Adam, I'd also be interested in this sort of library, especially if the data structures can be used with arbitrary geometry types given the appropriate operators. I'm particularly interested in generic data structures that can be extended to implement space partitioning of NURBS curves and surfaces. Cheers, Demian

Thorsten Ottosen

8:27 a.m.

New subject: new library (space partitioning)

Adam Wulkiewicz skrev:

...

Hi,

...

I've implemented: - k-dimensional generalizations of STL containers (used by some space partitioning data structures) - k-dimensional space partitioning data structures - regular grid - kd-tree - something I've called regular tree, it's k-dimensional generalization of quadtree/octree, so for k=2 it's quadtree, for k=3 it's octree, for k=4 it's 16tree and so on. - k-dimensional euclidean algebra/geometry data structures - points, vectors etc., - unit tests.

I'm very interested as such data-structures also play a role in AI and clustering/data mining. You may also consider BSP trees now that you are at it. -Thorsten

Christoph Heindl

9:20 a.m.

New subject: new library (space partitioning)

Adam,

...

I've implemented: - k-dimensional generalizations of STL containers (used by some space partitioning data structures) - k-dimensional space partitioning data structures - regular grid - kd-tree - something I've called regular tree, it's k-dimensional generalization of quadtree/octree, so for k=2 it's quadtree, for k=3 it's octree, for k=4 it's 16tree and so on. - k-dimensional euclidean algebra/geometry data structures - points, vectors etc., - unit tests.

There are boost-like names used in code allready. If you'd like to know more just ask.

I'm interested too. I might be able to contribute as I'm using space partitioning structures on an every day basis. Here are a couple of questions - what kind of split policies are implemented for the k-d tree? - how do you cope with unbalanced trees caused by dynamic insertion/deletion of points? - what queries are supported (i.e nearest neighbors, arbitrary volumes, parametric volumes) ? - do you offer incremental queries that can be stopped before evaluating the entire result set ? - did you consider to implement parallel queries ? - where's the code? Best regards, Christoph

Gaetano Mendola

26 Jul 26 Jul

11:08 p.m.

On 07/26/2010 08:52 PM, Eric Niebler wrote:

...

In this case, you would open a feature request on svn.boost.org ("New Ticket"), assign to me, and attach your patch. If it looks good, I'll merge it. Your patch should have docs and tests. Don't worry about formatting the docs. I can do that. I wouldn't ask you to set up the doc build chain; it's a headache.

Done. Regards Gaetano Mendola

Eric Niebler

27 Jul 27 Jul

1:04 p.m.

On 7/26/2010 7:08 PM, Gaetano Mendola wrote:

...

On 07/26/2010 08:52 PM, Eric Niebler wrote:

...
In this case, you would open a feature request on svn.boost.org ("New Ticket"), assign to me, and attach your patch. If it looks good, I'll merge it. Your patch should have docs and tests. Don't worry about formatting the docs. I can do that. I wouldn't ask you to set up the doc build chain; it's a headache.

Done.

(For those playing along at home, the ticket is here: https://svn.boost.org/trac/boost/ticket/4471) Thanks. This needs documentation, though. Like I said, I don't care about formatting, just tell what what text you'd like to see inserted where in the docs and I'll do it. -- Eric Niebler BoostPro Computing http://www.boostpro.com

Gaetano Mendola

5:14 p.m.

On 07/27/2010 03:04 PM, Eric Niebler wrote:

...

just tell what what text you'd like to see inserted where in the docs and I'll do it.

At this page: http://www.boost.org/doc/libs/1_43_0/doc/html/accumulators/user_s_guide.html... I would add a link sum_kahan pointing to this: =================================================== sum_kahan Implements the summation kahan's algorithm, reducing the accumulation error in case of floating point types. Result Type sample-type Depends On none Variants none Initialization Parameters none Accumulator Parameters none Extractor Parameters none Accumulator Complexity O(1), however compared to naive sum this accumulator performs 4 sum operations for each element accumulated Extractor Complexity O(1) Header #include <boost/accumulators/statistics/sum_kahan.hpp> Example accumulator_set<float, stats<tag::sum_kahan> > acc; BOOST_CHECK_EQUAL(0.0f, sum_kahan(acc)); for (size_t i = 0; i < 1e6; ++i) { acc(1e-6f); } BOOST_CHECK_EQUAL(1.0f, sum_kahan(acc)); ========================================================= I'm not sure if this was what you meant by documentation. Regards Gaetano Mendola

Gaetano Mendola

10 Apr 10 Apr

3:48 p.m.

On 27/07/2010 15:04, Eric Niebler wrote:

...

On 7/26/2010 7:08 PM, Gaetano Mendola wrote:

...
On 07/26/2010 08:52 PM, Eric Niebler wrote:

...
In this case, you would open a feature request on svn.boost.org ("New Ticket"), assign to me, and attach your patch. If it looks good, I'll merge it. Your patch should have docs and tests. Don't worry about formatting the docs. I can do that. I wouldn't ask you to set up the doc build chain; it's a headache.

Done.

(For those playing along at home, the ticket is here: https://svn.boost.org/trac/boost/ticket/4471)

Thanks. This needs documentation, though. Like I said, I don't care about formatting, just tell what what text you'd like to see inserted where in the docs and I'll do it.

Hi all, I see the ticket for the kahan accumulator: https://svn.boost.org/trac/boost/ticket/4471 is still "open", do you need something else to be done? Regards Gaetano Mendola

Eric Niebler

12 Apr 12 Apr

5:36 a.m.

New subject: [accumulators] maintainer needed (was: accurate sum accumulator (kahan))

I confess I haven't given this library the attention it needs. It's due to the fact that I have many competing demands for my time, and also because I'm not actually a statistician -- Matthias Troyer did that part, but I know he's busy, too -- so I don't really feel qualified to make certain changes. This is an uncomfortable position to be in, so I figured I'd ask for help. So, is there anybody with a mathematical bent and an interest in Boost.Accumulators who would like to help out with the backlog of bugs and patches to this library? I would stay on as assistant maintainer to help with the tricky meta-programming. Gaetano, see below. On 4/10/2011 10:48 PM, Gaetano Mendola wrote:

...

On 27/07/2010 15:04, Eric Niebler wrote:

...
On 7/26/2010 7:08 PM, Gaetano Mendola wrote:

...
On 07/26/2010 08:52 PM, Eric Niebler wrote:

...
In this case, you would open a feature request on svn.boost.org ("New Ticket"), assign to me, and attach your patch. If it looks good, I'll merge it. Your patch should have docs and tests. Don't worry about formatting the docs. I can do that. I wouldn't ask you to set up the doc build chain; it's a headache.

Done.

(For those playing along at home, the ticket is here: https://svn.boost.org/trac/boost/ticket/4471)

Thanks. This needs documentation, though. Like I said, I don't care about formatting, just tell what what text you'd like to see inserted where in the docs and I'll do it.

Hi all, I see the ticket for the kahan accumulator:

https://svn.boost.org/trac/boost/ticket/4471

is still "open", do you need something else to be done?

I don't think you ever responded to my request to add the needed docs to the patch, did you? -- Eric Niebler BoostPro Computing http://www.boostpro.com

Gaetano Mendola

11:09 a.m.

New subject: [accumulators] maintainer needed (was: accurate sum accumulator (kahan))

My reply at the end. On Tue, Apr 12, 2011 at 7:36 AM, Eric Niebler <eric@boostpro.com> wrote:

...

I confess I haven't given this library the attention it needs. It's due to the fact that I have many competing demands for my time, and also because I'm not actually a statistician -- Matthias Troyer did that part, but I know he's busy, too -- so I don't really feel qualified to make certain changes. This is an uncomfortable position to be in, so I figured I'd ask for help. So, is there anybody with a mathematical bent and an interest in Boost.Accumulators who would like to help out with the backlog of bugs and patches to this library? I would stay on as assistant maintainer to help with the tricky meta-programming.

Gaetano, see below.

On 4/10/2011 10:48 PM, Gaetano Mendola wrote:

...
On 27/07/2010 15:04, Eric Niebler wrote:

...
On 7/26/2010 7:08 PM, Gaetano Mendola wrote:

...
On 07/26/2010 08:52 PM, Eric Niebler wrote:

...
In this case, you would open a feature request on svn.boost.org ("New Ticket"), assign to me, and attach your patch. If it looks good, I'll merge it. Your patch should have docs and tests. Don't worry about formatting the docs. I can do that. I wouldn't ask you to set up the doc build chain; it's a headache.

Done.

(For those playing along at home, the ticket is here: https://svn.boost.org/trac/boost/ticket/4471)

Thanks. This needs documentation, though. Like I said, I don't care about formatting, just tell what what text you'd like to see inserted where in the docs and I'll do it.

Hi all, I see the ticket for the kahan accumulator:

https://svn.boost.org/trac/boost/ticket/4471

is still "open", do you need something else to be done?

I don't think you ever responded to my request to add the needed docs to the patch, did you?

Yes I did in that thread. Gaetano -- cpp-today.blogspot.com

Francesco Basile

27 Apr 27 Apr

10:17 a.m.

New subject: [accumulators] maintainer needed

Erich, I both have an interest Boost.Accumulator and I have a mathematical-physics background. I'm not an expert in meta-programming, but, thanks to your help offer, it should not be a problem. Francesco On 04/12/2011 07:36 AM, Eric Niebler wrote:

...

I confess I haven't given this library the attention it needs. It's due to the fact that I have many competing demands for my time, and also because I'm not actually a statistician -- Matthias Troyer did that part, but I know he's busy, too -- so I don't really feel qualified to make certain changes. This is an uncomfortable position to be in, so I figured I'd ask for help. So, is there anybody with a mathematical bent and an interest in Boost.Accumulators who would like to help out with the backlog of bugs and patches to this library? I would stay on as assistant maintainer to help with the tricky meta-programming.

Gaetano, see below.

On 4/10/2011 10:48 PM, Gaetano Mendola wrote:

...
On 27/07/2010 15:04, Eric Niebler wrote:

...
On 7/26/2010 7:08 PM, Gaetano Mendola wrote:

...
On 07/26/2010 08:52 PM, Eric Niebler wrote:

...
In this case, you would open a feature request on svn.boost.org ("New Ticket"), assign to me, and attach your patch. If it looks good, I'll merge it. Your patch should have docs and tests. Don't worry about formatting the docs. I can do that. I wouldn't ask you to set up the doc build chain; it's a headache.

Done.

(For those playing along at home, the ticket is here: https://svn.boost.org/trac/boost/ticket/4471)

Thanks. This needs documentation, though. Like I said, I don't care about formatting, just tell what what text you'd like to see inserted where in the docs and I'll do it.

Hi all, I see the ticket for the kahan accumulator:

https://svn.boost.org/trac/boost/ticket/4471

is still "open", do you need something else to be done?

I don't think you ever responded to my request to add the needed docs to the patch, did you?

Tomas Puverle

26 Jul 26 Jul

7:09 p.m.

...

PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

Please don't call them "accurate_sum" and "quick_sum" (as proposed by someone else). There are several different types of "accurate sum" algorithms with different runtime trade-offs. 'sum' in itself is as accurate as it gets if all the numbers are normalized in some way. As this is frequently required for the overall computation, it would be wasteful to compute the secondary accumulator. There are other scenarios where a plain 'sum' is good enough. Thanks, Tom

5179

Age (days ago)

5455

Last active (days ago)

List overview

Download

68 comments

21 participants

participants (21)

Adam Wulkiewicz
Arash Partow
Barend Gehrels
Christian Holmquist
Christoph Heindl
David Abrahams
Demian Nave
Eric Niebler
Francesco Basile
Gaetano Mendola
Giovanni Piero Deretta
Jeffrey Lee Hellrung, Jr.
John Maddock
Mathias Gaunard
Matthias Troyer
OvermindDL1
Paul A. Bristow
Simonson, Lucanus J
Thorsten Ottosen
Tomas Puverle
Topher Cooper