
I have uploaded the preliminary code for external sorting to the vault. This code has not been optimized in any way and currently supports radix_sort only.

On 3/31/07, Sam Schetterer <samthecppman@gmail.com> wrote:
I have uploaded the preliminary code for external sorting to the vault. This code has not been optimized in any way and currently supports radix_sort only.
Is radix sort a good choice for this kind of sorting? I would have thought that a merge sort would work more nicely. ~ Scott McMurray

On 3/31/07, me22 <me22.ca@gmail.com> wrote:
On 3/31/07, Sam Schetterer <samthecppman@gmail.com> wrote:
I have uploaded the preliminary code for external sorting to the vault. This code has not been optimized in any way and currently supports radix_sort only.
Is radix sort a good choice for this kind of sorting? I would have thought that a merge sort would work more nicely.
~ Scott McMurray
I do use merge sort. I also use radix_sort. What I do is first load 1/16 of the data into memory, sort it using radix_sort (I will implement other sorts for this), and write it to disk, recording its location in the file. Then I merge all of the sorted data on disk into one large sorted file.
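A rough, unoptimized sketch of that first (run-generation) phase might look something like the code below. It is only illustrative: std::sort stands in for radix_sort, the records are plain ints, the chunk size is a parameter rather than a hard-coded 1/16, and write_run / make_runs are made-up names, not anything from the vault code.

#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read up to n ints from `in`, sort them, and
// write them out as one run file; returns the run's file name
// (empty string if nothing was read). Assumes n > 0.
std::string write_run(std::ifstream& in, std::size_t n, int run_index)
{
    std::vector<int> buf(n);
    in.read(reinterpret_cast<char*>(&buf[0]),
            static_cast<std::streamsize>(n * sizeof(int)));
    buf.resize(static_cast<std::size_t>(in.gcount()) / sizeof(int));
    if (buf.empty())
        return std::string();

    std::sort(buf.begin(), buf.end());   // radix_sort would slot in here

    char name[32];
    std::sprintf(name, "run%d.tmp", run_index);
    std::ofstream out(name, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&buf[0]),
              static_cast<std::streamsize>(buf.size() * sizeof(int)));
    return name;
}

// Hypothetical driver: split the input into chunks, sort each one,
// and return the names of the sorted run files.
std::vector<std::string> make_runs(const char* input, std::size_t chunk_elems)
{
    std::ifstream in(input, std::ios::binary);
    std::vector<std::string> runs;
    for (int i = 0; in; ++i) {
        std::string r = write_run(in, chunk_elems, i);
        if (!r.empty())
            runs.push_back(r);
    }
    return runs;
}

Each chunk is sorted entirely in memory and then streamed out as its own run file, so the RAM needed is bounded by the chunk size rather than by the input size.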

Basically I would gather from this that a 10 GB file, for example, would take 30 GB of disk space to sort? That would appear to be what you're suggesting here. And why 1/16 specifically? Wouldn't it be better to use the available memory (free real, not paged) if possible instead of specifically doing 1/16th? 10 GB is more than most (non-server) machines have now, that is true, but 2-4 GB is becoming more and more common, and of course servers may have way more than 10 GB. Seems to me that if you're going to do this much writing (2x the original size), wouldn't it be simpler to just use a memory-mapped file?

Sam Schetterer wrote:
On 3/31/07, me22 <me22.ca@gmail.com> wrote:
On 3/31/07, Sam Schetterer <samthecppman@gmail.com> wrote:
I have uploaded the preliminary code for external sorting to the vault. This code has not been optimized in any way and currently supports radix_sort only.
Is radix sort a good choice for this kind of sorting? I would have thought that a merge sort would work more nicely.
~ Scott McMurray
I do use merge sort. I also use radix_sort. What I do is first load 1/16 of the data into memory, sort it using radix_sort (I will implement other sorts for this), and write it to disk, recording its location in the file. Then I merge all of the sorted data on disk into one large sorted file.

On 3/31/07, Richard Day <richardvday@verizon.net> wrote:
Basically I would gather from this that a 10 GB file, for example, would take 30 GB of disk space to sort? That would appear to be what you're suggesting here. And why 1/16 specifically? Wouldn't it be better to use the available memory (free real, not paged) if possible instead of specifically doing 1/16th? 10 GB is more than most (non-server) machines have now, that is true, but 2-4 GB is becoming more and more common, and of course servers may have way more than 10 GB.
Seems to me that if you're going to do this much writing (2x the original size), wouldn't it be simpler to just use a memory-mapped file?

No, it is not 2x the original size on disk. This illustration should help:

o o o o o o o o o o
o o o o o o o o o o
o o o o o o o o o o
o o o o o o o o o o
o o o o o o o o o o

becomes (I leave out much data for simplicity)

o o o o o
o o o o o
o o o o o
o o o o o
o o o o o

which becomes

ooooo ooooo ooooo ooooo ooooo

all in one big file. That is what I am trying to say.
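For the merge phase, a rough sketch could look like the following. Again this is only illustrative: fixed-size int records as above, a std::priority_queue used as a min-heap over the current head of each run, and merge_runs is a made-up name, not anything from the vault code.

#include <cstddef>
#include <fstream>
#include <queue>
#include <string>
#include <vector>

// One buffered value from a run, plus which run it came from.
struct head {
    int value;
    std::size_t run;
};

// Orders the heap so the smallest value is on top (min-heap).
struct head_greater {
    bool operator()(const head& a, const head& b) const
    { return a.value > b.value; }
};

// Hypothetical merge step: stream the sorted runs into one output file,
// always writing the smallest head value next.
void merge_runs(const std::vector<std::string>& runs, const char* output)
{
    std::vector<std::ifstream*> in(runs.size());
    std::priority_queue<head, std::vector<head>, head_greater> heap;

    // Prime the heap with the first value of each run.
    for (std::size_t i = 0; i < runs.size(); ++i) {
        in[i] = new std::ifstream(runs[i].c_str(), std::ios::binary);
        int v;
        if (in[i]->read(reinterpret_cast<char*>(&v), sizeof v)) {
            head h = { v, i };
            heap.push(h);
        }
    }

    std::ofstream out(output, std::ios::binary);
    while (!heap.empty()) {
        head h = heap.top();
        heap.pop();
        out.write(reinterpret_cast<const char*>(&h.value), sizeof h.value);

        // Refill from whichever run just supplied the smallest value.
        int v;
        if (in[h.run]->read(reinterpret_cast<char*>(&v), sizeof v)) {
            head next = { v, h.run };
            heap.push(next);
        }
    }

    for (std::size_t i = 0; i < runs.size(); ++i)
        delete in[i];
}

Only one buffered value per run has to sit in memory at any time, which is why the merge can produce one big sorted file from far more data than fits in RAM.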
participants (3)
- me22
- Richard Day
- Sam Schetterer