Review Wizard Status Report for June 2009

==============================================
Review Wizard Status Report for June 2009
==============================================

News
====

* Futures: Williams variant Accepted; Gaskill variant Rejected
* Boost 1.38 Released
* New Libraries:
* Revised Libraries:
* Boost.Range Extension Accepted
* Polynomial Library Rejected
* Boost 1.39 Released
* Constrained Value Review - Review Result Pending

Older Issues
============

The Time Series Library, accepted in August 2007, has not yet been submitted to SVN. Eric Niebler and John Phillips are working on making the changes suggested during the review.

The Floating Point Utilities Library has not yet been submitted to SVN. It is slated to be integrated with the Boost.Math library.

The Switch Library, accepted provisionally in January 2008, has not yet been submitted for mini-review and full acceptance.

The Phoenix Library, accepted provisionally in September 2008, has not yet been submitted for mini-review and full acceptance. A rewrite of Phoenix, basing it on the Proto metaprogramming library, has just begun.

General Announcements
=====================

As always, we need experienced review managers. The review queue has been growing substantially, but we have had few volunteers, so please manage reviews if you can; if you cannot, please watch the review schedule and participate. Please take a look at the list of libraries in need of managers and check out their descriptions. In general, review managers are active Boost participants or library contributors. If you can serve as review manager for any of them, email Ron Garcia or John Phillips, "garcia at osl dot iu dot edu" and "phillips at mps dot ohio-state dot edu" respectively.

We are also suffering from a lack of reviewers. While we all understand time pressures and the need to complete paying work, the strength of Boost rests on the detailed and informed reviews submitted by you.
A recent effort is trying to secure at least five people who promise to submit reviews as a precondition to starting the review period. Consider volunteering for this, and even taking the time to write your review as early as possible. No rule says you can only work on a review during the review period.

A link to this report will be posted to www.boost.org. If you would like us to make any modifications or additions to this report before we do that, please email Ron or John.

If you're a library author and plan on submitting a library for review in the next 3-6 months, send Ron or John a short description of your library and we'll add it to the Libraries Under Construction section below. We know that there are many libraries that are near completion, but we have a hard time keeping track of all of them. Please keep us informed about your progress.

Review Queue
============

* Lexer
* Shifted Pointer
* Logging
* Log
* Join
* Pimpl
* Thread Pool
* Endian
* Meta State Machine
* Conversion
* Sorting
* GIL.IO
* AutoBuffer
* String Convert

--------------------

Lexer
-----

:Author: Ben Hanson
:Review Manager: Eric Niebler
:Download: `Boost Vault <http://boost-consulting.com/vault/index.php?action=downloadfile&filename=boost.lexer.zip&directory=Strings%20-%20Text%20Processing>`__
:Description: A programmable lexical analyser generator inspired by 'flex'. Like flex, it is programmed by the use of regular expressions and outputs a state machine as a number of DFAs utilising equivalence classes for compression.

Shifted Pointer
---------------

:Author: Phil Bouchard
:Review Manager: Needed
:Download: `Boost Vault <http://www.boost-consulting.com/vault/index.php?&direction=0&order=&directory=Memory>`__
:Description: Smart pointers are in general optimized for a specific resource (memory usage, CPU cycles, user friendliness, ...) depending on what the user needs to make the most of. The purpose of this smart pointer is mainly to allocate the reference counter (or owner) and the object itself at the same time, so that dynamic memory management is simplified, and thus faster and cheaper in its use of the memory map.

Logging
-------

:Author: John Torjo
:Review Manager: Gennadiy Rozental
:Download: http://torjo.com/log2/
:Description: Used properly, logging is a very powerful tool. Besides aiding debugging/testing, it can also show you how your application is used. The Boost Logging Library allows just that, supporting many scenarios, ranging from very simple (dumping everything to one destination) to very complex (multiple logs, some enabled and some not, levels, etc.). It features a simple and flexible interface, efficient filtering of messages, thread-safety, formatters and destinations, easy manipulation of logs, help in finding the best logger/filter classes for your application's needs, user-definable macros, and much more!

Log
---

:Author: Andrey Semashev
:Review Manager: Needed
:Download: `Boost Vault <http://tinyurl.com/cm9lum>`__
:Description: The library aims to help add logging features to applications. It provides out-of-the-box support for many widely used capabilities, such as formatting and filtering based on attributes, sending logs to a syslog server or to the Windows Event Log, or simply storing logs in files. It also provides basic support for initializing the library from a settings file. The library can also be used for a wider range of tasks, such as gathering and processing statistical information or notifying the user about application events.

Join
----

:Author: Yigong Liu
:Review Manager: Needed
:Download: http://channel.sourceforge.net/
:Description: Join is an asynchronous, message-based C++ concurrency library based on join calculus.
It is applicable both to multi-threaded applications and to the orchestration of asynchronous, event-based applications. It follows Comega's design and implementation and builds on Boost facilities. It provides a high-level concurrency API with asynchronous methods, synchronous methods, and chords, which are "join-patterns" defining the synchronization, asynchrony, and concurrency.

Pimpl
-----

:Author: Vladimir Batov
:Review Manager: Needed
:Download: | `Boost Vault <http://www.boost-consulting.com/vault/index.php?action=downloadfile&filename=Pimpl.zip&directory=&>`__
           | http://www.ddj.com/cpp/205918714 (documentation)
:Description: The Pimpl idiom is a simple yet robust technique to minimize coupling via the separation of interface and implementation, followed by implementation hiding. This library provides a convenient yet flexible and generic deployment technique for the Pimpl idiom. It is seemingly complete and broadly applicable, yet minimal, simple and pleasant to use.

Thread Pool
-----------

:Author: Oliver Kowalke
:Review Manager: Needed
:Download: `Boost Vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=boo...>`__
:Description: The library provides:

   - thread creation policies, determining the management of worker threads:

     - a fixed set of threads in the pool
     - creation of worker threads on demand (depending on context)
     - letting worker threads time out after a certain idle time

   - channel policies, managing access to queued tasks:

     - a bounded channel with high and low watermarks for queuing tasks
     - an unbounded channel with an unlimited number of queued tasks
     - rendezvous: synchronous hand-over between producer and consumer threads

   - queueing policies, determining how tasks are removed from the channel:

     - FIFO
     - LIFO
     - priority queue (attribute assigned to the task)
     - smart insertions and extractions (for instance, replace the oldest task with a certain attribute by the newest one)

   Tasks can be chained, and lazy submission of tasks is also supported (thanks to Braddock's futures library). The submit function returns a task object; the task itself can be interrupted if it is cooperative (meaning it has some interruption points in its code, via ``this_thread::interruption_point()``).

Endian
------

:Author: Beman Dawes
:Review Manager: Needed
:Download: http://mysite.verizon.net/beman/endian-0.10.zip
:Description: Header boost/integer/endian.hpp provides integer-like byte-holder binary types with explicit control over byte order, value type, size, and alignment. Typedefs provide easy-to-use names for common configurations. These types provide portable byte-holders for integer data, independent of particular computer architectures. Use cases almost always involve I/O, either via files or network connections. Although data portability is the primary motivation, these integer byte-holders may also be used to reduce memory use, file size, or network activity, since they provide binary integer sizes not otherwise available.

Meta State Machine
------------------

:Author: Christophe Henry
:Review Manager: Needed
:Download: `Boost Vault <http://www.boostpro.com/vault/index.php?direction=0&order=&directory...>`__
:Description: Msm is a framework which enables you to build a Finite State Machine in a straightforward, descriptive and easy-to-use manner. It requires minimal effort to generate a working program from a UML state machine diagram. This work was inspired by the state machine described in the book "C++ Template Metaprogramming" by David Abrahams and Aleksey Gurtovoy, and adds most of what UML designers expect from a UML State Machine framework:

   * Entry and Exit Methods
   * Guard Conditions
   * Sub state machines (also called composite states in UML)
   * History
   * Terminate Pseudo-State
   * Deferred Events
   * Orthogonal zones
   * Explicit entry into sub state machine states
   * Fork
   * Entry / Exit pseudo states
   * Conflicting transitions

Conversion
----------

:Author: Vicente Botet
:Review Manager: Needed
:Download: `Boost Vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=conversion.zip&directory=Utilities&>`__
:Description: Generic explicit conversion between unrelated types. Boost.Conversion provides:

   * a generic ``convert_to`` function which can be specialized by the user to perform explicit conversion between unrelated types.
   * a generic ``assign_to`` function which can be specialized by the user to perform explicit assignment between unrelated types.
   * conversion between ``std::complex`` of explicitly convertible types.
   * conversion between ``std::pair`` of explicitly convertible types.
   * conversion between ``boost::optional`` of explicitly convertible types.
   * conversion between ``boost::rational`` of explicitly convertible types.
   * conversion between ``boost::interval`` of explicitly convertible types.
   * conversion between ``boost::chrono::time_point`` and ``boost::ptime``.
   * conversion between ``boost::chrono::duration`` and ``boost::time_duration``.

Sorting
-------

:Author: Steven Ross
:Review Manager: Needed
:Download: `Boost Vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=alg...>`__
:Description: A grouping of 3 templated hybrid radix/comparison-based sorting algorithms that provide superior worst-case and average-case performance to std::sort: integer_sort, which sorts fixed-size data types that support a right-shift (default of >>) and a comparison (default of <) operator; float_sort, which sorts standard floating-point numbers by safely casting them to integers; and string_sort, which sorts variable-length data types and is optimized for 8-bit character strings. All 3 algorithms have O(n(k/s + s)) runtime, where k is the number of bits in the data type and s is a constant, and limited memory overhead (in the kB range for realistic inputs). In testing, integer_sort varies from 35% faster to 8X as fast as std::sort, depending on processor, compiler optimizations, and data distribution. float_sort is roughly 7X as fast as std::sort on x86 processors. string_sort is roughly 2X as fast as std::sort.

GIL.IO
------

:Author: Christian Henning
:Review Manager: Needed
:Download: `GIL Google Code Vault <http://gil-contributions.googlecode.com/files/rc2.zip>`__
:Description: I/O extension for ``boost::gil`` which allows reading and writing of various image formats (tiff, jpeg, png, etc.). This review will also include the Toolbox extension, which adds some common functionality to gil, such as new color spaces, algorithms, etc.

AutoBuffer
----------

:Author: Thorsten Ottosen
:Review Manager: Needed
:Download: `Here <http://www.cs.aau.dk/~nesotto/boost/auto_buffer.zip>`__
:Description: Boost.AutoBuffer provides a container for efficient dynamic, local buffers. Furthermore, the container may be used as an alternative to std::vector, offering greater flexibility and sometimes better performance.

String Convert
--------------

:Author: Vladimir Batov
:Review Manager: Needed
:Download: `Boost Vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=boo...>`__
:Description: The library takes the approach of boost::lexical_cast in the area of string-to-type and type-to-string conversions, builds on past boost::lexical_cast experience, and advances that conversion functionality further to additionally provide:

   * throwing and non-throwing conversion-failure behavior;
   * support for a default value to be returned when conversion fails;
   * two types of conversion-failure check -- basic and better/safe;
   * formatting support based on the standard I/O Streams and the standard (or user-defined) I/O Stream-based manipulators (like std::hex, std::scientific, etc.);
   * locale support;
   * support for boost::range-compliant char- and wchar_t-based string containers;
   * no DefaultConstructibility requirement for the Target type;
   * a consistent framework to uniformly incorporate any type-to-type conversions.

   It is an essential tool for applications making extensive use of configuration files or having to process or prepare considerable amounts of data in, say, XML, etc.

Libraries under development
===========================

Please let us know of any libraries you are currently developing that you intend to submit for review.

Mirror
------

:Author: Matus Chochlik
:Download: | http://svn.boost.org/svn/boost/sandbox/mirror/doc/index.html
           | `Boost Vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=mir...>`__
:Description: The aim of the Mirror library is to provide useful meta-data at both compile-time and run-time about common C++ constructs like namespaces, types, typedef-ined types, classes and their base classes and member attributes, instances, etc., and to provide generic interfaces for their introspection. Mirror is designed with the principle of stratification in mind and tries to be as unintrusive as possible. New or existing classes do not need to be designed to directly support Mirror, and no Mirror-related code is necessary in the class' definition, as long as some general guidelines are followed. The most important features of the Mirror library that are currently implemented include:

   * Namespace-name inspection.
   * Inspection of the whole scope in which a namespace is defined.
   * Type-name querying, with support for typedef-ined typenames and typenames of derived types like pointers, references, cv-qualified types, arrays, functions and template names. Names with or without nested-name-specifiers can be queried.
   * Inspection of the scope in which a type has been defined.
   * Uniform and generic inspection of a class' base classes. One can inspect traits of the base classes, for example their types, whether they are inherited virtually or not, and the access specifier (private, protected, public).
   * Uniform and generic inspection of a class' member attributes. At compile-time the count of a class' attributes and their types, storage class specifiers (static, mutable) and some other traits can be queried. At run-time one can uniformly query the names and/or values (when given an instance of the reflected class) of the member attributes and sequentially execute a custom functor on every attribute of a class.
   * Traversals of a class' (or generally a type's) structure with user-defined visitors, which optionally work on a provided instance of the type or just on its structure without any run-time data. These visitors are guided by Mirror through the structure of the class and optionally provided with contextual information about the current position in the traversal.

I'm hoping to have it review ready in the next few months.

Interval Template Library
-------------------------

:Author: Joachim Faulhaber
:Description: The Interval Template Library (Itl) provides intervals and two kinds of interval containers: interval_sets and interval_maps. Interval_sets and maps can be used just as sets or maps of elements. Yet they are much more space- and time-efficient when the elements occur in contiguous chunks: intervals. This is obviously the case in many problem domains, particularly in fields that deal with problems related to date and time. Interval containers allow for intersection with interval_sets to work with segmentation. For instance you might want to intersect an interval container with a grid of months and then iterate over those months. Finally interval_maps provide aggregation on associated values, if added intervals overlap with intervals that are stored in the interval_map. This feature is called aggregate on overlap. It is shown by example::

      typedef set<string> guests;
      interval_map<time, guests> party;
      guests mary;  mary.insert("Mary");
      guests harry; harry.insert("Harry");
      party += make_pair(interval<time>::rightopen(20:00, 22:00), mary);
      party += make_pair(interval<time>::rightopen(21:00, 23:00), harry);
      // party now contains
      // [20:00, 21:00) -> {"Mary"}
      // [21:00, 22:00) -> {"Harry","Mary"}  // guest sets aggregated on overlap
      // [22:00, 23:00) -> {"Harry"}

   As can be seen from the example, an interval_map has both a decompositional behavior (on the time dimension) as well as an accumulative one (on the associated values).

StlConstantTimeSize
-------------------

:Author: Vicente J. Botet Escriba
:Download: `Boost Vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=constant_time_size.zip&directory=Containers&>`__
:Description: Boost.StlConstantTimeSize defines a wrapper to the STL container list, giving the user the choice of complexity for the size function: linear time, constant time or quasi-constant. In future versions the library could include a similar wrapper to slist.

InterThreads
------------

:Author: Vicente J. Botet Escriba
:Download: `Boost Vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=interthreads.zip&directory=Concurrent%20Programming&>`__
:Description: Boost.InterThreads extends Boost.Threads, adding some features:

   * thread decorator: thread_decorator allows one to define setup/cleanup functions which will be called only once per thread: setup before the thread function, and cleanup at thread exit.
   * thread specific shared pointer: an extension of thread_specific_ptr providing access to this thread-specific context from other threads. As it is shared, the stored pointer is a shared_ptr instead of a raw one.
   * thread keep-alive mechanism: this mechanism allows one to detect threads that do not prove that they are alive by calling keep_alive_point regularly. When a thread is declared dead, a user-provided function is called, which by default will abort the program.
   * thread tuple: defines a thread group where the number of threads is known statically and the threads are created at construction time.
   * set_once: a synchronizer that allows a variable to be set only once, notifying whoever is waiting for it of the variable's value.
   * thread_tuple_once: an extension of boost::thread_tuple which allows joining the first thread to finish, using the set_once synchronizer.
   * thread_group_once: an extension of boost::thread_group which allows joining the first thread to finish, using the set_once synchronizer.

   thread_decorator and thread_specific_shared_ptr are based on the original implementation of threadalert written by Roland Schwarz.

Ronald Garcia wrote:
Sorting
-------

:Author: Steven Ross
:Review Manager: Needed
:Download: `Boost Vault <http://www.boostpro.com/vault/index.php?action=downloadfile&filename=alg...>`__
:Description: A grouping of 3 templated hybrid radix/comparison-based sorting algorithms that provide superior worst-case and average-case performance to std::sort: integer_sort, which sorts fixed-size data types that support a rightshift (default of >>) and a comparison (default of <) operator. float_sort, which sorts standard floating-point numbers by safely casting them to integers. string_sort, which sorts variable-length data types, and is optimized for 8-bit character strings.
All 3 algorithms have O(n(k/s + s)) runtime where k is the number of bits in the data type and s is a constant, and limited memory overhead (in the kB for realistic inputs). In testing, integer_sort varies from 35% faster to 8X as fast as std::sort, depending on processor, compiler optimizations, and data distribution. float_sort is roughly 7X as fast as std::sort on x86 processors. string_sort is roughly 2X as fast as std::sort.
The archive pointed at above does not seem to have any documentation at all. I thought that having docs was a requirement for having a review at all? It is also unclear how the performance measurements were made and the graphs obtained -- what test program should be run, with what compiler and compiler options, and on what system. At the very minimum, this should be clearly documented so that everybody can obtain the same graphs, if necessary.

However, I have a bigger concern. How do we know these algorithms are indeed correct, and how do they stand in relation to other sorting algorithms? Steven has provided a link to his article:

http://portal.acm.org/citation.cfm?id=691537

However, it does not seem that full text is available for this article, so it's not possible to know what other algorithms spreadsort was compared to. I could not find references to this article from newer ones either, so no comparison results from there. Is Boost supposed to be the place for reviewing new algorithms, as opposed to reviewing new implementations of otherwise established algorithms?

- Volodya

The documentation is in the algorithm_sorting.zip archive: libs/algorithm/sorting/index.html. I'm not sure how you found the graphs but no documentation. Tests were run with the sample applications, built with bjam, with random numbers generated by the randomgen sample. I just uploaded spreadsort_paper.pdf to the boost vault, for those who don't have access to the paper through their library.

I test with the libs/algorithm/sorting/tune.pl script, with the -test_only option. To make this work you have to copy boost/algorithm/sorting and libs/algorithm/sorting from the zip archive to their respective locations in the boost release tree, and then tune.pl will work. You're welcome to look at how the tuning/testing script works and make suggested improvements; it takes a weighted average of runtime, giving more weight to more random distributions. Performance comparisons are done relative to std::sort. If you can recommend a better algorithm to compare against, I'd be happy to oblige. In tune.pl I compare the resulting sorted files against std::sort's output, on a set of large randomly-generated data sets, to make sure the results are identical.

Normal results for integer_sort are around a 70%-2X speed improvement. The high end mentioned is when the data allows for bucketsorting, on some systems. I have a mostly-sorted data test (1% of the data is swapped out of place) in my samples, as that appears to be one of the worst relative cases vs. std::sort: a slight slowdown on some systems, and a speedup on others.

If you're interested in a similar, also-published algorithm (published a few months later by a different author), there is adaptive left radix. It doesn't have the worst-case handling and uses insertion_sort, but follows the same basic concept. http://www.nik.no/2002/Maus.pdf

string_sort is highly similar to the American National Flag algorithm, which was established as the best in-place high-speed string sorting algorithm a while ago.
The main differences with string_sort are:

1) Using std::sort instead of insertion_sort for small data sets, allowing it to use 256 as the fallback size instead of 16, cutting back on worst-case memory and runtime overhead.

2) Because of the increased fallback size, it was efficient to eliminate finding the maximum and minimum values, saving an O(n) run through the data for every iteration.

3) An optimization for long identical substrings: passing over the data to check if every element is identical, and if so, checking the next character, etc., so as to avoid wasting time with unnecessary swap-loop tests and recursion. This is a common issue with real strings. It was added at the request of Phil Endecott, and performs well on real strings.

4) In the fallback sort, only comparing characters that haven't been sorted yet, making the comparison-based fallback more efficient.

float_sort uses a reasonably well-known trick that solves a major performance problem with x86 processors, but hasn't been widely adopted. I think it would be good to make this more readily available.

The concept here is to provide a group of templated hybrid algorithms to solve common sorting problems faster than purely comparison-based algorithms can. With regards to the question of whether Boost is interested in innovative algorithms, that's for the Boost community to decide; I don't know.

On Mon, Jun 1, 2009 at 9:54 PM, Vladimir Prus <vladimir@codesourcery.com> wrote:

On Tue, Jun 2, 2009 at 4:53 PM, Steven Ross <spreadsort@gmail.com> wrote:
With regards to the question of whether Boost is interested in innovative algorithms, that's for the Boost community to decide; I don't know.
I would consider it very strange for Boost not to be interested in innovative algorithms. It's a boost anyway. /$

Henrik Sundberg wrote:
On Tue, Jun 2, 2009 at 4:53 PM, Steven Ross <spreadsort@gmail.com> wrote:
With regards to the question of whether Boost is interested in innovative algorithms, that's for the Boost community to decide; I don't know.
I would consider it very strange for Boost not to be interested in innovative algorithms. It's a boost anyway.
I don't think "whether Boost is interested in innovative algorithms" is the right question. The question is what requirements there should be on algorithms implemented by Boost libraries, assuming Boost is not in the business of reviewing algorithms. I hope you agree that including an implementation of an algorithm from a paper published a day ago in an arbitrary journal would be strange. And on the other hand, an algorithm described in Knuth is probably "safe". Where's the line between those?

- Volodya

2009/6/2 Steven Ross <spreadsort@gmail.com>:
With regards to the question of whether Boost is interested in innovative algorithms, that's for the Boost community to decide; I don't know.
I think of Boost libraries under two basic (orthogonal) categories:

1) Innovative interfaces, where the implementation is an interesting but ultimately not terribly important detail (except where the implementation can itself be an interesting interface, as happened with Spirit).

2) Solid, portable, C++-style implementations of common needs.

So I'd agree that Boost isn't the place for novel algorithms in general, but I'd claim radix sorting is a common enough goal that a nicely-tuned STL-style implementation would be appropriate for boost.

On Tue, Jun 2, 2009 at 11:42 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
... but I'd claim radix sorting is a common enough goal that a nicely-tuned STL-style implementation would be appropriate for boost.
Radix sorting is indeed appropriate, IMO. However, there are many "proven" radix sort algorithms, and (probably) many more "unproven" algorithms. Seems to me that boost is a set of "generally useful" libraries, and not a "proving ground" for new algorithms. The point in question is then: At what point is an algorithm adequately "proven" for inclusion in a boost library? Jon

Are you guys saying that all algorithms in boost are currently proven? If I were to submit an I/O library (it's an example, I'm not working on I/O) somehow using a "new" algorithm to manage asynchronous requests, would you ask me to use a more classical approach during the review? What is the difference between a new algorithm and a new implementation of an old algorithm?

From the user's point of view, when it "doesn't work", does the reason matter?

Does boost embrace innovation? Innovation comes in many forms. It doesn't matter if the algorithm is new or not, as long as the author is able to give a certain degree of assurance regarding reliability and correctness, isn't it? An area where I would be extremely cautious with new algorithms is cryptography, for obvious security reasons (notwithstanding that it's very easy to draw the line between the algorithm and the implementation details). I'm not sure I can think of other cases where new algorithms should be dismissed a priori.

Kind regards.

-- EA

Edouard A. wrote:
Are you guys saying that all algorithms in boost are currently proven?
I believe that Boost libraries that have non-trivial algorithms use established ones. For example, consider Boost.Graph -- where correctness of operation cannot be established by testing alone, and where complexity estimates are paramount.
If I were to submit an I/O library (it's an example, I'm not working on I/O) using somehow a "new" algorithm to manage asynchronous requests, would you ask me to use a more classical approach during the review?
To make the discussion more specific, can you give pointers to classic algorithms for managing asynchronous requests and outline the differences you propose? If your example is made up, then I can come up with a made-up answer. Say, you want to support priority for async requests, and you decide to use specific flavour of heaps for that. It would be fine. However, if you also invent your own kind of heap, never seen in literature, that would be somewhat suspect.
What is the difference between a new algorithm and a new implementation of an old algorithm?
Is this a genuine question? - Volodya

On Wed, Jun 3, 2009 at 1:06 AM, Edouard A. <edouard@fausse.info> wrote:
Are you guys saying that all algorithms in boost are currently proven?
I'm not aware of any that aren't... But I don't know all of the libraries inside and out.
If I were to submit an I/O library (it's an example, I'm not working on I/O) using somehow a "new" algorithm to manage asynchronous requests, would you ask me to use a more classical approach during the review?
If you don't have a solid mathematical proof for the correctness of your algorithm, then I may instead ask for case studies, etc.
What is the difference between a new algorithm and a new implementation of an old algorithm?
Are you trolling here?
Does boost embrace innovation?
I think I have my answer.
Innovation comes in many forms. It doesn't matter if the algorithm is new or not as long as the author is able to give a certain degree of assurance regarding reliability and correctness, isn't it?
I'm questioning the degree of assurance required for a new algorithm to be unleashed on the unsuspecting masses. Jon

I'm questioning the degree of assurance required for a new algorithm to be unleashed on the unsuspecting masses.
Exactly the point I was trying to make. To be more precise the novelty of an algorithm shouldn't be held against it. What matters is that a reasonable degree of assurance regarding correctness and performance can be given. This concerns the library as a whole, not just whatever algorithm(s) it may use. I understand the concern of not wasting time evaluating libraries using algorithm(s) that are incorrect in the first place. -- EA

On Wed, Jun 3, 2009 at 8:29 AM, Edouard A. <edouard@fausse.info> wrote:
I'm questioning the degree of assurance required for a new algorithm to be unleashed on the unsuspecting masses.
Exactly the point I was trying to make.
So we agree violently then. ;-)
To be more precise the novelty of an algorithm shouldn't be held against it.
If you define "reasonable assurance" to exclude any algorithm that has not been published in a reputable journal, with at least 2 citations, is it still novel? ;-) Just kidding, WRT the novelty bit.
What matters is that a reasonable degree of assurance regarding correctness and performance can be given. This concerns the library as a whole, not just whatever algorithm(s) it may use.
So how do we define "reasonable degree of assurance"? Jon

What matters is that a reasonable degree of assurance regarding correctness and performance can be given. This concerns the library as a whole, not just whatever algorithm(s) it may use.
So how do we define "reasonable degree of assurance"?
Defining it would be like building Mathematics all over again... ;) We can however agree on some guidelines, I guess. Actually, I think when a reviewer has doubts about the underlying algorithm, he/she has no choice but to carefully review the algorithm itself. Formality is a tool, not a goal. -- EA

Edouard A. wrote:
What matters is that a reasonable degree of assurance regarding correctness and performance can be given. This concerns the library as a whole, not just whatever algorithm(s) it may use.
So how do we define "reasonable degree of assurance"?
Defining it would be like building Mathematics all over again... ;)
We can however agree on some guidelines I guess.
Actually, I think when a reviewer has doubts about the underlying algorithm, he/she has no choice but to carefully review the algorithm itself.
Formality is a tool, not a goal.
I was skeptical of the algorithm itself originally, but Steven's persistence in explaining it eventually provided me with enough information to assure me that the algorithm is sound and does what he says it does. That leaves what constitutes a reasonable degree of assurance that the code implements the algorithm.

In my work I put a lot of effort into validating things that are often hard to validate. Sorting is embarrassingly easy to validate because the correct output is well defined and correct algorithms for producing it are easy to come by. Unit tests that demonstrate correctness of the code on pseudo-exhaustive test cases should be easy for Steven to provide. I think there is very little chance of him getting a sorting algorithm that doesn't sort into boost no matter how badly the review is bungled, and I doubt the review will be bungled at all.

My only concern is that the cast from floating point to integer be done in a way that won't break when the code is ported. I'm pretty sure Steven has that covered because I followed the discussion about it a while back, but I'll double-check when the review comes up that such conversions are only enabled when the unit tests include a test that they work for a given platform, and that the default behavior is to fall back on std::sort when there is no certainty that the conversion will work. Regards, Luke

On Wed, Jun 3, 2009 at 9:10 AM, Simonson, Lucanus J < lucanus.j.simonson@intel.com> wrote:
That leaves what constitutes a reasonable degree of assurance that the code implements the algorithm. In my work I put a lot of effort into validating things that are often hard to validate. Sorting is embarrassingly easy to validate because the correct output is well defined and correct algorithms for producing it are easy to come by. Unit tests that demonstrate correctness of the code on pseudo-exhaustive test cases should be easy for Steven to provide. I think there is very little chance of him getting a sorting algorithm that doesn't sort into boost no matter how badly the review is bungled, and I doubt the review will be bungled at all.
I currently use an approach of generating large files of pseudo-random numbers with between 1 and 32 random bits, and running many different sample tests on these. I could add some unit tests for crazy cases too; I guess I might as well create the "ALR breaker" worst-case testcase, among others. My current "unit" tests other than for random data are for sorting sorted data (a task it performs quickly), and for sorting mostly-sorted data (a task it performs about as well as or slightly slower than std::sort). I'll see if I can think of others, but if people have suggestions, I'd be happy to include them. I eliminate NaNs and -0.0s from my tests because std::sort orders them arbitrarily (I change them to 0s in the tests); is that a problem? This was discussed before, and the only response was someone saying that I shouldn't have to worry about them. float_sort will order them in its radix-based portion based upon their cast integer value (which for -0.0 is just less than 0.0), but then when std::sort is called it will order them arbitrarily in its sub-sort.
My only concern is that the cast from floating point to integer be done in a way that won't break when the code is ported. I'm pretty sure Steven has that covered because I followed the discussion about it a while back, but I'll double-check when the review comes up that such conversions are only enabled when the unit tests include a test that they work for a given platform, and that the default behavior is to fall back on std::sort when there is no certainty that the conversion will work.
Currently it will fire a compile-time assertion if the cast won't work. I can change that to a warning and an enable_if, so it will do what you suggest. It sounds like I have significantly more work to do before this is ready for review; it'll probably take me at least a month to get all the requested changes in (including doc updates, especially a thorough explanation and proof of worst-case performance).

Steven Ross wrote:
It sounds like I have significantly more work to do before this is ready for review; it'll probably take me at least a month to get all the requested changes in (including doc updates, especially a thorough explanation and proof of worst-case performance).
This work will be to your benefit though. Without the assurance of worst-case performance in the documentation I would expect people to feel disinclined to use the library. It is exactly that information that should convince people that they want to use the library. Along with the theoretical assurance you should also provide benchmark comparisons in the documentation, because people who can't follow the math of the proof are exactly the people who will be convinced by benchmark results. People who aren't convinced by benchmarks we can assume to be shrewd enough to follow the proof. :) Luke

Agreed; so let's shelve the review until that's done. On Wed, Jun 3, 2009 at 9:55 AM, Simonson, Lucanus J < lucanus.j.simonson@intel.com> wrote:
Steven Ross wrote:
It sounds like I have significantly more work to do before this is ready for review; it'll probably take me at least a month to get all the requested changes in (including doc updates, especially a thorough explanation and proof of worst-case performance).
This work will be to your benefit though. Without the assurance of worst-case performance in the documentation I would expect people to feel disinclined to use the library. It is exactly that information that should convince people that they want to use the library. Along with the theoretical assurance you should also provide benchmark comparisons in the documentation, because people who can't follow the math of the proof are exactly the people who will be convinced by benchmark results. People who aren't convinced by benchmarks we can assume to be shrewd enough to follow the proof. :)
Luke

On Wed, Jun 3, 2009 at 9:55 AM, Simonson, Lucanus J < lucanus.j.simonson@intel.com> wrote:
Steven Ross wrote:
It sounds like I have significantly more work to do before this is ready for review; it'll probably take me at least a month to get all the requested changes in (including doc updates, especially a thorough explanation and proof of worst-case performance).
This work will be to your benefit though. Without the assurance of worst-case performance in the documentation I would expect people to feel disinclined to use the library. It is exactly that information that should convince people that they want to use the library. Along with the theoretical assurance you should also provide benchmark comparisons in the documentation, because people who can't follow the math of the proof are exactly the people who will be convinced by benchmark results. People who aren't convinced by benchmarks we can assume to be shrewd enough to follow the proof. :)
I have completed reformatting and updating the documentation and adding a performance calculation/proof. My proposed Boost.Algorithm.Sorting library is ready for review. The library is available at: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=algorithm_sorting.zip&directory=&

on 17.08.2009 at 22:45 Steven Ross wrote :
I have completed reformatting and updating the documentation and adding a performance calculation/proof. My proposed Boost.Algorithm.Sorting library is ready for review. The library is available at: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=algorithm_sorting.zip&directory=&
cannot find the file
-- Pavel

on 17.08.2009 at 22:45 Steven Ross wrote :
sorry, i found it
-- Pavel

on 17.08.2009 at 22:45 Steven Ross wrote :
> I have completed reformatting and updating the documentation and adding a
> performance calculation/proof. My proposed Boost.Algorithm.Sorting library
> is ready for review.
> The library is available at:
> http://www.boostpro.com/vault/index.php?action=downloadfile&filename=algorithm_sorting.zip&directory=&
if you ask me i say
1. doc navigation is odd and inconvenient
2. integer constants MAX_SPLITS, MAX_FINISHING_SPLITS, MIN_SORT_SIZE, LOG_MEAN_BIN_SIZE, LOG_MIN_SPLIT_COUNT, LOG_FINISHING_COUNT, FLOAT_LOG_MEAN_BIN_SIZE, FLOAT_LOG_MIN_SPLIT_COUNT, FLOAT_LOG_FINISHING_COUNT must be members of a (nameless) enum
3. i think it's a valuable piece of software for a specific domain
-- Pavel

On Mon, Aug 17, 2009 at 12:48 PM, DE <satan66613@yandex.ru> wrote:
on 17.08.2009 at 22:45 Steven Ross wrote :
I have completed reformatting and updating the documentation and adding a performance calculation/proof. My proposed Boost.Algorithm.Sorting library is ready for review. The library is available at:
http://www.boostpro.com/vault/index.php?action=downloadfile&filename=algorithm_sorting.zip&directory=&
if you ask me i say
1. doc navigation is odd and inconvenient
I followed the conventions in: http://www.boost.org/doc/libs/1_34_1/more/writingdoc/index.html. Do you have any specific suggestions for improvement?
2. integer constants MAX_SPLITS, MAX_FINISHING_SPLITS, MIN_SORT_SIZE, LOG_MEAN_BIN_SIZE, LOG_MIN_SPLIT_COUNT, LOG_FINISHING_COUNT, FLOAT_LOG_MEAN_BIN_SIZE, FLOAT_LOG_MIN_SPLIT_COUNT, FLOAT_LOG_FINISHING_COUNT must be members of a (nameless) enum
Why? I don't see that in: http://www.boost.org/development/int_const_guidelines.html They are details that are not intended to be modified by the average user, and are in the detail scope for that reason. People with a good understanding of both the algorithm and the processor(s) they intend to run their software on may wish to modify them, which is the only reason they are documented.
3. i think it's a valuable piece of software for specific domain
Thanks.

on 18.08.2009 at 0:37 Steven Ross wrote :
I followed the conventions in: http://www.boost.org/doc/libs/1_34_1/more/writingdoc/index.html. Do you have any specific suggestions for improvement?
once i go to one of the subpages i can not get back to the main page other than by pressing [back] in my browser; there should be a navigation pane or something like main::subpage::subsubpage, like on forum pages
Why? I don't see that in: http://www.boost.org/development/int_const_guidelines.html They are details that are not intended to be modified by the average user, and are in the detail scope for that reason. People with a good understanding of both the algorithm and the processor(s) they intend to run their software on may wish to modify them, which is the only reason they are documented.
i bet you won't need the addresses of those variables, so it's a better decision to make them members of an unnamed enum; that way you get truly compile-time constants; it's perfectly portable and does not assume the size of int in any way; just think of it
also i suggest those guides are somewhat outdated
3. i think it's a valuable piece of software for a specific domain
Thanks.
you are welcome
-- Pavel

On Tue, Aug 18, 2009 at 8:15 AM, DE <satan66613@yandex.ru> wrote:
on 18.08.2009 at 0:37 Steven Ross wrote :
I followed the conventions in: http://www.boost.org/doc/libs/1_34_1/more/writingdoc/index.html. Do you have any specific suggestions for improvement?
once i go to one of the subpages i can not get back to the main page other than by pressing [back] in my browser; there should be a navigation pane or something like main::subpage::subsubpage, like on forum pages
I've added a "spirit-nav" bar at the top and bottom, just like in some other Boost documentation (and reuploaded). I think that's what you were looking for. It has home and up links that go back to the index page, and next and previous links that enable sequential scrolling through the docs.
Why? I don't see that in: http://www.boost.org/development/int_const_guidelines.html They are details that are not intended to be modified by the average user, and are in the detail scope for that reason. People with a good understanding of both the algorithm and the processor(s) they intend to run their software on may wish to modify them, which is the only reason they are documented.
i bet you won't need the addresses of those variables, so it's a better decision to make them members of an unnamed enum; that way you get truly compile-time constants; it's perfectly portable and does not assume the size of int in any way; just think of it; also i suggest those guides are somewhat outdated
I don't see how it hurts, so I implemented your suggestion; the constants are now defined as enum { val = # }, instead of having a type. Thanks for your feedback. The library is ready for formal review.

on 19.08.2009 at 16:40 Steven Ross wrote :
I've added a "spirit-nav" bar at the top and bottom, just like in some other Boost documentation (and reuploaded). I think that's what you were looking for. It has home and up links that go back to the index page, and next and previous links that enable sequential scrolling through the docs.
i'd have a look
I don't see how it hurts, so I implemented your suggestion; the constants are now defined as enum { val = # }, instead of having a type. Thanks for your feedback. i'm glad my suggestions helped you!
-- Pavel

on 19.08.2009 at 16:40 Steven Ross wrote :
I've added a "spirit-nav" bar at the top and bottom, just like in some other Boost documentation (and reuploaded). I think that's what you were looking for. It has home and up links that go back to the index page, and next and previous links that enable sequential scrolling through the docs.
much better
I don't see how it hurts, so I implemented your suggestion; the constants are now defined as enum { val = # }, instead of having a type. Thanks for your feedback.
well actually i meant that you can define all constants in one unnamed enum like
enum
{ MAX_SPLITS = 11,
  MAX_FINISHING_SPLITS = MAX_SPLITS + 1,
  LOG_MEAN_BIN_SIZE = 2,
  LOG_MIN_SPLIT_COUNT = 9,
  LOG_FINISHING_COUNT = 31,
  FLOAT_LOG_MEAN_BIN_SIZE = 2,
  FLOAT_LOG_MIN_SPLIT_COUNT = 8,
  FLOAT_LOG_FINISHING_COUNT = 4,
  MIN_SORT_SIZE = 3000 };
additionally, since you introduce them through a header, it's guaranteed that they will not be present in object code as separate entities (in contrast to 'static const int' or whatever)
-- Pavel

On Wed, Aug 19, 2009 at 11:26 AM, DE <satan66613@yandex.ru> wrote:
I don't see how it hurts, so I implemented your suggestion; the constants are now defined as enum { val = # }, instead of having a type. Thanks for your feedback. well actually i meant that you can define all constants in one unnamed enum like
enum
{ MAX_SPLITS = 11,
  MAX_FINISHING_SPLITS = MAX_SPLITS + 1,
  LOG_MEAN_BIN_SIZE = 2,
  LOG_MIN_SPLIT_COUNT = 9,
  LOG_FINISHING_COUNT = 31,
  FLOAT_LOG_MEAN_BIN_SIZE = 2,
  FLOAT_LOG_MIN_SPLIT_COUNT = 8,
  FLOAT_LOG_FINISHING_COUNT = 4,
  MIN_SORT_SIZE = 3000 };
Is there any reason why one approach (unnamed enum for each value vs. single unnamed enum for all) is better than the other?

on 20.08.2009 at 3:42 Steven Ross wrote :
Is there any reason why one approach (unnamed enum for each value vs. single unnamed enum for all) is better than the other?
actually no, but i think keeping the values in one enum is more convenient
-- Pavel

On Thu, Aug 20, 2009 at 8:20 AM, DE <satan66613@yandex.ru> wrote:
on 20.08.2009 at 3:42 Steven Ross wrote :
Is there any reason why one approach (unnamed enum for each value vs. single unnamed enum for all) is better than the other?
actually no, but i think keeping the values in one enum is more convenient
Thanks for the confirmation (and the suggestions). I'll just leave them as individual enums then.

on 20.08.2009 at 23:20 Steven Ross wrote :
Thanks for the confirmation (and the suggestions). I'll just leave them as individual enums then.
you are welcome
maybe you'll even start a new guideline...
-- Pavel

Jonathan Franklin wrote:
On Wed, Jun 3, 2009 at 8:29 AM, Edouard A. <edouard@fausse.info> wrote:
I'm questioning the degree of assurance required for a new algorithm to be unleashed on the unsuspecting masses.
Exactly the point I was trying to make.
So we agree violently then. ;-)
To be more precise the novelty of an algorithm shouldn't be held against it.
If you define "reasonable assurance" to exclude any algorithm that has not been published in a reputable journal, with at least 2 citations, is it still novel? ;-)
Just kidding, WRT the novelty bit.
These questions all seem valid, but wouldn't it be enough to raise them during the review? As the discussion suggests, "novelty" is not clear-cut enough to be a good exclusion criterion. So I think "the novelty of an algorithm shouldn't be held against it" in the sense that it should not strictly prevent a review. However, it may indeed be a strong reason to reject the library during review. Regards, Thomas

Thomas Klimpel wrote:
Jonathan Franklin wrote:
On Wed, Jun 3, 2009 at 8:29 AM, Edouard A. <edouard@fausse.info> wrote:
I'm questioning the degree of assurance required for a new algorithm to be unleashed on the unsuspecting masses.
Exactly the point I was trying to make.
So we agree violently then. ;-)
To be more precise the novelty of an algorithm shouldn't be held against it.
If you define "reasonable assurance" to exclude any algorithm that has not been published in a reputable journal, with at least 2 citations, is it still novel? ;-)
Just kidding, WRT the novelty bit.
These questions all seem valid, but wouldn't it be enough to raise them during the review?
I think it's better to raise issues earlier rather than later, because everybody involved then has a chance to think and/or fix something. - Volodya

On Wed, 3 June 2009 17:18, Vladimir Prus wrote:
Thomas Klimpel wrote:
If you define "reasonable assurance" to exclude any algorithm that has not been published in a reputable journal, with at least 2 citations, is it still novel? ;-)
Just kidding, WRT the novelty bit.
These questions all seem valid, but wouldn't it be enough to raise them during the review?
I think it's better to raise issues earlier rather than later, because everybody involved then has a chance to think and/or fix something.
Good point. I know that there are some geometry libraries currently under development that plan to be proposed for review sooner or later. At least one of them tries to address the robustness problems that occur in this context with a "novel" modification of an established technique. Should we tell the author that he has to publish these modifications first in a reputable journal before proposing his library for review? I don't think so. However, he will certainly have to convince the reviewers during the review that his implementation is reliable, which includes that the implemented algorithms are reliable. I also know that this case is different from the one currently discussed, in that this "novel" modification is an implementation detail of the geometry library, while the "novel" algorithmic improvements are an important motivation for proposing the "sorting" library for review. Regards, Thomas

On Wed, Jun 3, 2009 at 8:50 AM, Thomas Klimpel <Thomas.Klimpel@synopsys.com>wrote:
I know that there are some geometry libraries currently under development that plan to be proposed for review sooner or later. At least one of them tries to address the robustness problems that occur in this context with a "novel" modification of an established technique. Should we tell the author that he has to publish these modifications first in a reputable journal before proposing his library for review? I don't think so. However, he will certainly have to convince the reviewers during the review that his implementation is reliable, which includes that the implemented algorithms are reliable.
I also know that this case is different from the one currently discussed, in that this "novel" modification is an implementation detail of the geometry library, while the "novel" algorithmic improvements are an important motivation for proposing the "sorting" library for review.
string_sort would work okay without any novel modification from American Flag Sort, but wouldn't be much faster than std::sort, due to the overhead of its max/min checking. I agree though, integer_sort and float_sort wouldn't be safe to use with any other published algorithm I'm aware of, due to poor worst-case performance without the change to use std::sort and the increase of the fallback sort size (is that really so novel? Or just modernizing the algorithm?). The worst-case performance check is less essential once that step is taken, and does nothing on 32-bit integers with current default settings.

On Tue, Jun 2, 2009 at 11:02 PM, Jonathan Franklin < franklin.jonathan@gmail.com> wrote:
Radix sorting is indeed appropriate, IMO. However, there are many "proven" radix sort algorithms, and (probably) many more "unproven" algorithms. Seems to me that boost is a set of "generally useful" libraries, and not a "proving ground" for new algorithms.
The point in question is then: At what point is an algorithm adequately "proven" for inclusion in a boost library?
There are multiple papers in the literature about Adaptive Left Radix (ALR), which is fast on random data (and like Spreadsort, it wasn't published a day ago; it's been over 6 years). ALR has inferior worst-case performance because it uses insertion_sort as its fallback sort; worst-case performance is the top reason to use Spreadsort. Is insertion_sort to std::sort such a big complicated change that it can't be accepted? The other main change is a rough estimate of worst-case performance at runtime, and a fallback when std::sort has better performance. With default settings this doesn't do much on 32-bit integers, but on systems where std::sort is better optimized, with larger data types (64-bit), or with a smaller cache it can improve performance. Those are the biggest changes in this library vs. these other modern radix-sorting algorithms. I've described the changes in string_sort vs. American Flag Sort already, which are similar, coming down to two core changes: replacing insertion_sort with std::sort, and the identical-substring optimization. I'm not strongly attached to the substring optimization, though it seems to work well and handles a common near-worst-case performance issue better. The change to not compare the already-sorted portions of the strings is a simple and effective performance tweak. Going back to a higher level, it's easy to test a sorting algorithm for correctness, and radix sorting isn't that hard to understand; the general approach is not novel, just some optimizations and worst-case handling. It would get expensive to publish a paper for every little tweak I've stuck in a sorting algorithm, so depending on publications for every improvement is a great way to slow down progress, especially if you demand that people cite said publications. Also, Knuth has a good book on sorting, and an excellent analysis of pure comparison-based algorithms.

People seem to get the impression from his book that comparison-based sorting is the only general way to sort, but he does mention hybrid algorithms, including "Markku Tamminen: Two Levels are as Good as Any. J. Algorithms 6: 138-144 (1985)" and the O(n) performance on continuous integrable functions described on page 177 of The Art of Computer Programming Volume 3. The current Spreadsort uses smaller chunks to be cache-friendly, because Tamminen's approach is too slow when operating on n much larger than the L1 cache size, and also because it cuts memory usage (as described in the paper on ALR), but that just multiplies the number of iterations for continuous integrable functions by log(n)/log(estimated L1 cache size). Spreadsort's other main change is to use std::sort as a fallback, because not all distributions are continuous and integrable, and the related fallback check to minimize worst-case performance takes into account that std::sort can be faster on small lists if the distribution range is large. (Tamminen's and the ALR approach both have poor absolute worst-case performance.)

In closing, this has:
insertion_sort -> std::sort
a worst-case fallback check (integer_sort/float_sort), falling back to std::sort
an optimization for long substrings (string_sort)
float to integer cast (float_sort)
assorted performance tweaks

Are those changes really so dangerous?

On Wed, Jun 3, 2009 at 8:37 AM, Steven Ross <spreadsort@gmail.com> wrote:
Is insertion_sort to std::sort such a big complicated change that it can't be accepted?
Do you have a proof for its correctness? If not, perhaps it would be easy to modify the proof(s) for the algorithm you modified.
I've described the changes in string_sort vs. American Flag Sort already,
Is there a publication you can reference? My apologies if you already posted it.
It would get expensive to publish a paper for every little tweak I've stuck in a sorting algorithm, so depending on publications for every improvement is a great way to slow down progress, especially if you demand that people cite said publications.
It's also very expensive to standardize on broken algorithms and interfaces. Publishing each little tweak in a separate paper is certainly a Bad Idea. However, if you think the "package" is good enough for a boost library, then it is certainly good enough for a publication.
People seem to get the impression from his book that comparison-based sorting is the only general way to sort, ...
"People" are wrong all the time. ;-)
a worst-case fallback check (integer_sort/float_sort), falling back to std::sort an optimization for long substrings (string_sort) float to integer cast (float_sort) assorted performance tweaks
Are those changes really so dangerous?
This is Math, not politics... Let's see a proof. ;-) Jon

On Wed, Jun 3, 2009 at 7:56 AM, Jonathan Franklin < franklin.jonathan@gmail.com> wrote:
On Wed, Jun 3, 2009 at 8:37 AM, Steven Ross <spreadsort@gmail.com> wrote:
Is insertion_sort to std::sort such a big complicated change that it can't be accepted?
Do you have a proof for its correctness?
Do I have a proof for std::sort's correctness? Are you joking?
If not, perhaps it would be easy to modify the proof(s) for the algorithm you modified.
I've described the changes in string_sort vs. American Flag Sort already,
Is there a publication you can reference? My apologies if you already posted it.
McIlroy, P., "Engineering Radix Sort", Computing Systems, Vol. 6:1, pp. 5-27, 1993. It's a good algorithm on its own, but it was written before introsort existed.
It would get expensive to publish a paper for every little tweak I've stuck in
a sorting algorithm, so depending on publications for every improvement is a great way to slow down progress, especially if you demand that people cite said publications.
It's also very expensive to standardize on broken algorithms and interfaces.
Agreed, but the algorithm is a hybrid radix sort optimized for worst-case performance; the interface is no different, and the basic technique is the same as other hybrid sorting approaches. I changed the fallback sort and when it is called.
Publishing each little tweak in a separate paper is certainly a Bad Idea. However, if you think the "package" is good enough for a boost library, then it is certainly good enough for a publication.
The thought has crossed my mind. Would it be better to submit for publication first, and to Boost second? Care to recommend a journal?
* a worst-case fallback check (integer_sort/float_sort), falling back to std::sort
* an optimization for long substrings (string_sort)
* a float to integer cast (float_sort)
* assorted performance tweaks
Are those changes really so dangerous?
This is Math, not politics... Let's see a proof. ;-)
I can write a proof of spreadsort's worst-case performance for a paper. It's a time-consuming task. Anything else you need proven? I don't see the need to prove that radix sorting or hybrid sorting works; that is well-established fact, and Knuth discusses such algorithms. I personally consider software development a cross of math and engineering. Cache optimization, memory reuse, and loop simplification have more to do with engineering for real processors than math.
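[Editor's illustration] The hybrid approach under discussion can be sketched in a few lines. This is a minimal MSD radix sort that falls back to std::sort (typically introsort) for small buckets; it is not Spreadsort itself. The byte-at-a-time radix and the bucket-size threshold are assumptions made for the sketch, and the real library chooses its radix adaptively and adds a worst-case check.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// MSD (most-significant-digit-first) radix sort on uint32_t keys,
// one byte per pass, falling back to std::sort when a bucket gets
// small or the key bytes are exhausted.
void hybrid_radix_sort(std::vector<std::uint32_t>& v,
                       std::size_t lo, std::size_t hi, int shift)
{
    const std::size_t fallback_size = 64;  // assumed tuning constant
    if (hi - lo <= fallback_size || shift < 0) {
        std::sort(v.begin() + lo, v.begin() + hi);  // comparison fallback
        return;
    }
    // Count how many elements land in each of the 256 buckets.
    std::size_t count[257] = {0};
    for (std::size_t i = lo; i < hi; ++i)
        ++count[((v[i] >> shift) & 0xFFu) + 1];
    for (int b = 0; b < 256; ++b)
        count[b + 1] += count[b];  // prefix sums: bucket start offsets
    // Stable scatter into a temporary buffer, then copy back.
    std::vector<std::uint32_t> buf(hi - lo);
    std::size_t next[256];
    std::copy(count, count + 256, next);
    for (std::size_t i = lo; i < hi; ++i)
        buf[next[(v[i] >> shift) & 0xFFu]++] = v[i];
    std::copy(buf.begin(), buf.end(), v.begin() + lo);
    // Recurse into each bucket on the next (less significant) byte.
    for (int b = 0; b < 256; ++b)
        if (count[b + 1] - count[b] > 1)
            hybrid_radix_sort(v, lo + count[b], lo + count[b + 1], shift - 8);
}
```

Calling `hybrid_radix_sort(v, 0, v.size(), 24)` sorts the whole vector. Note that correctness only requires the partition to place bucket i wholly before bucket j for i < j, plus a correct comparison sort in the fallback slot, which is the point being made about the insertion_sort to std::sort swap.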

On Wed, Jun 3, 2009 at 9:54 AM, Steven Ross <spreadsort@gmail.com> wrote:
On Wed, Jun 3, 2009 at 7:56 AM, Jonathan Franklin < franklin.jonathan@gmail.com> wrote:
On Wed, Jun 3, 2009 at 8:37 AM, Steven Ross <spreadsort@gmail.com> wrote:
Is insertion_sort to std::sort such a big complicated change that it can't be accepted?
Do you have a proof for its correctness?
Do I have a proof for std::sort's correctness? Are you joking?
You misunderstood, so let me rephrase: Do you have a formal proof that your modification to the algorithm is correct? And no, I'm not joking. It's a fair question. It's not a big deal if you don't have one, but I would encourage you to write one. Jon
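[Editor's note] For what it's worth, the proof obligation for this particular swap is small. A hedged sketch of the argument one would write (not a formal proof):

```latex
% Correctness obligation for swapping the fallback sort.
% Let the radix pass partition the range into buckets B_0, \dots, B_{k-1}
% with the invariant
%   \forall\, i < j,\ \forall\, x \in B_i,\ y \in B_j :\quad x \le y .
% If each bucket is then sorted by any correct comparison sort S,
% the concatenation is globally sorted:
%   \mathrm{sorted}\bigl(S(B_0) \,\Vert\, S(B_1) \,\Vert\, \cdots \,\Vert\, S(B_{k-1})\bigr).
% The argument uses only the fact that S sorts its input, so replacing
% insertion_sort by std::sort (both correct comparison sorts over the
% same ordering) leaves the proof unchanged.
```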

Steven Ross wrote:
On Tue, Jun 2, 2009 at 11:02 PM, Jonathan Franklin < franklin.jonathan@gmail.com> wrote:
Radix sorting is indeed appropriate, IMO. However, there are many "proven" radix sort algorithms, and (probably) many more "unproven" algorithms. Seems to me that boost is a set of "generally useful" libraries, and not a "proving ground" for new algorithms.
The point in question is then: At what point is an algorithm adequately "proven" for inclusion in a boost library?
There are multiple papers in the literature about Adaptive Left Radix, which is fast on random data (and like Spreadsort, it wasn't published a day ago; it's been over 6 years).
Can you give references to these papers and include them in your docs? I think one of the references might be the link you previously posted: http://www.nik.no/2002/Maus.pdf ?
ALR has inferior worst-case performance because it uses insertion_sort as its fallback sort; worst-case performance is the top reason to use Spreadsort. Is insertion_sort to std::sort such a big complicated change that it can't be accepted?
If it's indeed the only change, then it's probably OK, as long as you explicitly say what the C++ standard guarantees. While insertion sort is O(n^2), std::sort is only required to be O(n*log(n)) in the *average case*, so it is still permitted to be O(n^2) in the worst case. Consequently, you are likely to get better average performance, but better worst-case performance is not guaranteed, unless you make some claims about the algorithm std::sort uses.
Going back to a higher level, it's easy to test a sorting algorithm for correctness, and radix sorting isn't that hard to understand; the general approach is not novel, just some optimizations and worst-case handling. It would get expensive to publish a paper for every little tweak I've stuck in a sorting algorithm, so depending on publications for every improvement is a great way to slow down progress, especially if you demand that people cite said publications.
In closing, this has:

* insertion_sort -> std::sort
* a worst-case fallback check (integer_sort/float_sort), falling back to std::sort
* an optimization for long substrings (string_sort)
* a float to integer cast (float_sort)
* assorted performance tweaks
Are those changes really so dangerous?
I don't understand most of them, and can't find a description of them. If you give a paper describing the base algorithm, and then describe your refinements in detail, this will enable reviewers and users to figure out whether those refinements are safe enough and how they might affect performance. Right now, the docs neither provide references nor explain your refinements. - Volodya

Steven Ross wrote:
float_sort uses a reasonably well-known trick that solves a major performance problem with x86 processors, but hasn't been widely accepted. I think it would be good to make this more readily available.
What trick? - Volodya

You can sort floats by casting them to integers and accounting for the flipped representation of negatives. Comparison-based sorting of floats is over an order of magnitude slower on x86 systems I've tried, relative to an old Altivec; on the Altivec, the runtime of comparison-based sorting of floats is comparable to that of sorting integers. On Wed, Jun 3, 2009 at 9:06 AM, Vladimir Prus <vladimir@codesourcery.com> wrote:
Steven Ross wrote:
float_sort uses a reasonably well-known trick that solves a major performance problem with x86 processors, but hasn't been widely accepted. I think it would be good to make this more readily available.
What trick?
- Volodya

Steven Ross wrote:
You can sort floats by casting them to integers and accounting for the flipped representation of negatives. Comparison-based sorting of floats is over an order of magnitude slower on x86 systems I've tried, relative to an old Altivec; on the Altivec, the runtime of comparison-based sorting of floats is comparable to that of sorting integers.
Then, should you actually have this:

BOOST_STATIC_ASSERT(sizeof(Cast_type) == sizeof(Data_type));
BOOST_STATIC_ASSERT(std::numeric_limits<Data_type>::is_iec559);
BOOST_STATIC_ASSERT(std::numeric_limits<Cast_type>::is_integer);

I would have expected sorting to work on whatever I throw at it -- a user might not even know what iec559 is. In other words, can you enable the cast to int conditionally? And while we are at it, I am not sure that 'float_sort' is a good name -- it is very generic and closes the door on a different algorithm that might sort floats. - Volodya
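[Editor's illustration] The cast trick being discussed can be sketched in a few lines. A minimal illustration, assuming IEEE 754 binary32 floats and ignoring NaNs; the function name is mine, not the library's:

```cpp
#include <cstdint>
#include <cstring>

// Map a float to a uint32_t whose unsigned order matches the float
// order. Positive floats already compare correctly as raw bits, so
// setting the sign bit lifts them above all negatives; negative floats
// compare in *reverse* bit order, so flipping every bit both clears
// the sign bit and reverses their order. Caveats: -0.0f keys strictly
// below +0.0f, and NaNs have no meaningful position.
std::uint32_t float_key(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // well-defined, unlike *(int*)&f
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}
```

A radix sort can then run on these keys directly: for ordinary floats, the integer comparison `float_key(a) < float_key(b)` gives the same order as `a < b`.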

What would you recommend? hybrid_float_sort? Also, regarding the O(n*log(n)) performance issue: introsort guarantees O(n*log(n)) performance. True, somebody could use something else for their std::sort implementation, but my only alternative is to put introsort into my algorithm directly, which makes the algorithm much less modular and adds additional source to maintain where the need is questionable, along with not taking advantage of possible machine-specific optimizations in the provided standard library implementation. On Wed, Jun 3, 2009 at 9:37 AM, Vladimir Prus <vladimir@codesourcery.com> wrote:
Steven Ross wrote:
You can sort floats by casting them to integers and accounting for the flipped representation of negatives. Comparison-based sorting of floats is over an order of magnitude slower on x86 systems I've tried relative to an old Altivec. On the Altivec runtime of comparison-based sorting of floats is comparable to that for sorting integers.
Then, should you actually have this:
BOOST_STATIC_ASSERT(sizeof(Cast_type) == sizeof(Data_type));
BOOST_STATIC_ASSERT(std::numeric_limits<Data_type>::is_iec559);
BOOST_STATIC_ASSERT(std::numeric_limits<Cast_type>::is_integer);
I would have expected sorting to work on whatever I throw at it -- a user might not even know what iec559 is. In other words, can you enable the cast to int conditionally?
And while we are at it, I am not sure that 'float_sort' is a good name -- it is very generic and closes the door on a different algorithm that might sort floats.
- Volodya
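[Editor's illustration] Volodya's "enable the cast conditionally" suggestion might look like this. A sketch using std::enable_if (the modern analogue of the boost::enable_if of the time); the dispatch structure and the comparator-based stand-in for the radix pass are my assumptions, not the library's actual implementation:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>
#include <vector>

// Enabled only when the type is IEC 559 and has a 32-bit integer twin:
// sort via the bit-cast keys (std::sort stands in for the radix pass
// to keep the sketch short).
template <class Float>
typename std::enable_if<std::numeric_limits<Float>::is_iec559 &&
                        sizeof(Float) == sizeof(std::uint32_t)>::type
float_sort(std::vector<Float>& v)
{
    std::sort(v.begin(), v.end(), [](Float a, Float b) {
        std::uint32_t x, y;
        std::memcpy(&x, &a, sizeof x);
        std::memcpy(&y, &b, sizeof y);
        x = (x & 0x80000000u) ? ~x : (x | 0x80000000u);
        y = (y & 0x80000000u) ? ~y : (y | 0x80000000u);
        return x < y;  // integer order == float order (NaNs aside)
    });
}

// Everything else silently gets the generic comparison sort, so a
// user never needs to know what iec559 means.
template <class Float>
typename std::enable_if<!(std::numeric_limits<Float>::is_iec559 &&
                          sizeof(Float) == sizeof(std::uint32_t))>::type
float_sort(std::vector<Float>& v)
{
    std::sort(v.begin(), v.end());
}
```

With this shape, `float_sort` on a vector<float> takes the cast path, while a vector<double> (IEC 559 but 8 bytes) falls through to the generic overload instead of failing a static assert.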

Ronald Garcia wrote:
AutoBuffer
----------

:Author: Thorsten Ottosen
:Review Manager: Needed
:Download: `Here <http://www.cs.aau.dk/~nesotto/boost/auto_buffer.zip>`__
:Description: Boost.AutoBuffer provides a container for efficient dynamic, local buffers. Furthermore, the container may be used as an alternative to std::vector, offering greater flexibility and sometimes better performance.
I don't quite understand. I already found a review manager for this in Robert Stewart. -Thorsten

On Jun 2, 2009, at 6:50 AM, Thorsten Ottosen wrote:
Ronald Garcia wrote:
AutoBuffer
----------

:Author: Thorsten Ottosen
:Review Manager: Needed
I don't quite understand. I already found a review manager for this in Robert Stewart.
I understand this to be a typo ;). Thanks for pointing it out. ron

Hello Ronald, Please add the Channel library I am working on to "Libraries under development". Its information is attached below. Thanks, Yigong

Website: http://channel.sourceforge.net
Download: http://www.boostpro.com/vault/index.php?PHPSESSID=0a3b81fe0bb1fefb29c9b6194dd2806f&direction=0&order=&directory=Distributed%20Computing

Description: Channel is a C++ template library providing name spaces for distributed message passing and event dispatching. Message senders and receivers bind to names in a name space; binding and matching rules decide which senders bind to which receivers (the binding-set); message passing then happens among bound senders and receivers.

The type of name space is a template parameter of Channel. Various name spaces (linear/hierarchical/associative) can be used for different applications. For example, integer ids can be used to send messages in a linear name space, string path names (such as "/sports/basketball") in a hierarchical name space, and regex patterns or Linda tuple-space style tuples in an associative name space.

The dispatcher is another configurable template parameter of Channel; it dispatches messages/events from senders to bound receivers. The design of dispatchers can vary along several dimensions: how messages move (push or pull) and how callbacks are executed (synchronously or asynchronously). Sample dispatchers include a synchronous broadcast dispatcher and asynchronous dispatchers with choice_arbiter and join_arbiters.

Name spaces and dispatchers are orthogonal; they can be mixed and matched freely. Name spaces and name-binding create binding-sets for senders and receivers, and dispatchers are algorithms defined over the binding-set. Distributed channels can be connected to allow transparent distributed message passing. Filters and translators are used to control name space changes.

On Mon, Jun 1, 2009 at 8:43 PM, Ronald Garcia <garcia@osl.iu.edu> wrote:
Libraries under development ===========================
Please let us know of any libraries you are currently developing that you intend to submit for review.

Hi Vicente, My apologies for that. I have the information and will add those to the list. Thanks, ron On Jun 2, 2009, at 11:40 AM, vicente.botet wrote:
Hi,
You forgot my Bitfields and Synchro libraries under development.
Best, Vicente

Hi, ----- Original Message ----- From: "Ronald Garcia" <garcia@osl.iu.edu> To: <boost@lists.boost.org>; <boost-users@lists.boost.org>; <boost-announce@lists.boost.org> Sent: Tuesday, June 02, 2009 5:43 AM Subject: [boost] Review Wizard Status Report for June 2009
============================================== Review Wizard Status Report for June 2009 ==============================================
<snip>
Older Issues ============
The Time Series Library, accepted in August 2007, has not yet been submitted to SVN. Eric Niebler and John Phillips are working on making the changes suggested during the review.
The Floating Point Utilities Library, has not yet been submitted to SVN. It is slated to be integrated with the Boost.Math library.
The Switch Library, accepted provisionally in January 2008, has not yet been submitted for mini-review and full acceptance.
The Phoenix Library, accepted provisionally in September 2008, has not yet been submitted for mini-review and full acceptance. A rewrite of Phoenix, basing it on the Proto metaprogramming library, has just begun.
What is the status of the accepted libraries?

* Forward (fast-track), Tobias Schwinger, December 7, 2007
* Factory (fast-track), Tobias Schwinger, December 21, 2007

Best, Vicente

Hi Vicente, On Jun 10, 2009, at 4:14 PM, vicente.botet wrote:
what is the status of the accepted libraries
* Forward (fast-track), Tobias Schwinger, December 7, 2007
* Factory (fast-track), Tobias Schwinger, December 21, 2007
According to the review schedule, they have been checked into the boost trunk. It looks like they are in boost/functional. Ron
Best, Vicente
participants (13)

- DE
- Edouard A.
- Henrik Sundberg
- Jonathan Franklin
- Ronald Garcia
- Scott McMurray
- Simonson, Lucanus J
- Steven Ross
- Thomas Klimpel
- Thorsten Ottosen
- vicente.botet
- Vladimir Prus
- Yigong Liu