[sorting] Implementation of histogram sort

Does the proposed Boost sorting library contain an implementation of histogram sort or some other in-place variant of bucket sort (American flag sort, etc)? I would like to use an integer sort for a problem I have but need to save as much memory as possible. Having one element of temporary storage per bucket is fine, though. -- Jeremiah Willcock

Hi, Are radix sort and spread sort inadequate? -Edouard -----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Jeremiah Willcock Sent: lundi 11 mai 2009 19:17 To: boost@lists.boost.org Subject: [boost] [sorting] Implementation of histogram sort Does the proposed Boost sorting library contain an implementation of histogram sort or some other in-place variant of bucket sort (American flag sort, etc)? I would like to use an integer sort for a problem I have but need to save as much memory as possible. Having one element of temporary storage per bucket is fine, though. -- Jeremiah Willcock _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost __________ Information from ESET NOD32 Antivirus, version of virus signature database 4068 (20090512) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com __________ Information from ESET NOD32 Antivirus, version of virus signature database 4068 (20090512) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com

The three included algorithms area a hybrid of in-place histogram bucket sorting and std::sort. integer_sort and float_sort use the spreadsort algorithm. string_sort has strong similarities to American flag sort; the primary differences are: 1) it uses std::sort as its fallback comparison sort (instead of insertion sort) and switches to comparison-based sorting at 256 elements, as opposed to American flag sort's 16 2) It does not bother trying to determine the maximum and minimum buckets (always uses 256 + empty), to avoid the overhead of a linear search through all elements 3) An optimization for long identical substrings, and also a comparison method that only looks at the portion of the string that is unsorted Worst-case runtime and memory usage is of the same order as American flag sort. Maximum memory overhead for integer_sort/float_sort is on the order of (2 to the power MAX_SPLITS) * (bit_length/MAX_SPLITS), which for MAX_SPLITS of 11 and 32-bit integers, works out to some kilobytes (roughly 64kB, on a 64-bit system is my estimate, not measured). They share the same basic algorithm, though float_sort has some additional code that handles casting floats to integers for more efficient sorting. For integer_sort and float_sort you may want to try different values of MAX_SPLITS to see which one works best for your processor; the default value of 11 works reasonably well on various systems, but the value of MAX_SPLITS can have a large performance impact. On Mon, May 11, 2009 at 10:16 AM, Jeremiah Willcock <jewillco@osl.iu.edu>wrote:
Does the proposed Boost sorting library contain an implementation of histogram sort or some other in-place variant of bucket sort (American flag sort, etc)? I would like to use an integer sort for a problem I have but need to save as much memory as possible. Having one element of temporary storage per bucket is fine, though.
-- Jeremiah Willcock _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (3)
-
Edouard A.
-
Jeremiah Willcock
-
Steven Ross