Filesystem: Choose random file from directory
Hi,

I have a directory with about 500,000 small files in it. I need to randomly select one of those files. What would be the *most efficient* way to achieve this?

Thank you,
Andrej
Can you rename your files as numbers? That way you just generate a random number and use it to build the name of the file to open. Otherwise, I don't know whether iterating through the files with boost::filesystem is "fast".

Joel Lamotte
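The renaming idea can be sketched roughly as follows, using C++17 `std::filesystem`, whose interface is modelled on `boost::filesystem`. The `<n>.dat` naming scheme, the function name `random_numbered_file`, and the assumption that the numbering runs from 0 to count - 1 without gaps are all hypothetical and not taken from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <random>
#include <string>

namespace fs = std::filesystem;

// Hypothetical naming scheme: files are called "0.dat", "1.dat", ...,
// "<count-1>.dat" with no gaps. The path is built from a random index,
// so the directory itself is never listed.
template <class Rng>
fs::path random_numbered_file(const fs::path& dir, std::size_t count, Rng& rng) {
    assert(count > 0 && "there must be at least one numbered file");
    std::uniform_int_distribution<std::size_t> pick(0, count - 1);
    return dir / (std::to_string(pick(rng)) + ".dat");
}
```

With a generator such as `std::mt19937 rng(std::random_device{}());`, a call like `random_numbered_file(dir, 500000, rng)` costs the same regardless of how many files the directory holds, at the price of keeping the numbering dense when files are added or removed.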
Generate a random number 'n' between 1 and 500K and then take the nth file from the directory. Do you need truly random (in a math or crypto sense) or just "unpredictable and varied"?

Charles

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Andrej van der Zee
Sent: Friday, June 22, 2012 6:28 PM
To: boost-users@lists.boost.org
Subject: [Boost-users] Filesystem: Choose random file from directory

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
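A minimal sketch of "take the nth file", assuming C++17 `std::filesystem` in place of `boost::filesystem` and assuming the number of entries is already known. The names `nth_entry` and `random_entry_by_position` are made up for illustration. Note that reaching position n means stepping over every earlier entry, so the cost is linear in n.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <optional>
#include <random>

namespace fs = std::filesystem;

// Returns the n-th entry (0-based) in whatever order the OS reports,
// or std::nullopt if the directory has fewer than n + 1 entries.
std::optional<fs::path> nth_entry(const fs::path& dir, std::size_t n) {
    std::size_t i = 0;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        if (i++ == n) return entry.path();
    }
    return std::nullopt;
}

// Assumes `count` is the current number of entries in `dir`.
template <class Rng>
std::optional<fs::path> random_entry_by_position(const fs::path& dir,
                                                 std::size_t count, Rng& rng) {
    assert(count > 0);
    std::uniform_int_distribution<std::size_t> pick(0, count - 1);
    return nth_entry(dir, pick(rng));
}
```

If `count` is stale and larger than the real number of entries, the result can be empty, so callers would need to handle `std::nullopt`.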
On Sat, Jun 23, 2012 at 12:27:39PM -0700, Charles Mills wrote:
> Generate a random number 'n' between 1 and 500K and then take the nth file from the directory.
I would say that the core question is probably: how do I get a cheap indexed lookup in a directory?

You've got to consider that directories, much like SQL tables, have no real innate sort order in most filesystems. All you have is the order given by the OS when doing an opendir/FindFirstFile, unless you use some ordered access method. You can't really ask through some interface for the Nth file; you can only iterate through the contents to get an implicit sequence of files.

You could do a linear pass to accumulate the paths into a container and select from that. That would be the cheap solution, but its running time may be rather high, particularly for such a degenerate case with uncommonly many files.

--
Lars Viklund | zao@acc.umu.se
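The linear pass described above might look like the following sketch. It uses C++17 `std::filesystem` as a stand-in for `boost::filesystem`; the function names `list_entries` and `pick_random` are illustrative, and the code assumes the directory contains only the files of interest, since it does not filter out subdirectories.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <random>
#include <vector>

namespace fs = std::filesystem;

// One linear pass: copy the path of every directory entry into a vector.
// Assumes the directory holds only files; nothing is filtered out.
std::vector<fs::path> list_entries(const fs::path& dir) {
    std::vector<fs::path> paths;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        paths.push_back(entry.path());
    }
    return paths;
}

// Uniform choice from an already collected, non-empty list.
template <class Rng>
const fs::path& pick_random(const std::vector<fs::path>& paths, Rng& rng) {
    assert(!paths.empty());
    std::uniform_int_distribution<std::size_t> pick(0, paths.size() - 1);
    return paths[pick(rng)];
}
```

The scan is paid once per call to `list_entries`; each later `pick_random` on the same vector is constant time, which only helps if the directory contents do not change between picks.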
How often do you have to do this? Does it truly have to be random? Does the list of files change? Could you build a list once and index into it? Can you build a list of partial names, index into that randomly, retrieve all of the names that match it, and then pick a random member of that set? Could you generate a random three-character mask, use that as a wildcard, and pick one of the matching files at random, repeating if none match? Are they all in one folder?

Charles

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Lars Viklund
Sent: Saturday, June 23, 2012 2:35 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Filesystem: Choose random file from directory
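One way to read the "list of partial names" suggestion is sketched below: group already known file names by a short prefix, pick a prefix at random, then pick a random member of that group. The function names `bucket_by_prefix` and `pick_via_prefix` and the use of a leading prefix as the "partial name" are assumptions made for illustration. The choice is uniform over files only when every group has the same size; otherwise files in small groups are picked more often.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <random>
#include <string>
#include <vector>

using Buckets = std::map<std::string, std::vector<std::string>>;

// Group file names by their first `prefix_len` characters (the "partial name").
Buckets bucket_by_prefix(const std::vector<std::string>& names,
                         std::size_t prefix_len) {
    Buckets buckets;
    for (const std::string& name : names) {
        buckets[name.substr(0, prefix_len)].push_back(name);
    }
    return buckets;
}

// Pick a random bucket, then a random member of that bucket.
// Not uniform over files unless all buckets have equal size.
template <class Rng>
const std::string& pick_via_prefix(const Buckets& buckets, Rng& rng) {
    assert(!buckets.empty());
    std::uniform_int_distribution<std::size_t> pick_bucket(0, buckets.size() - 1);
    auto it = std::next(buckets.begin(),
                        static_cast<std::ptrdiff_t>(pick_bucket(rng)));
    const std::vector<std::string>& members = it->second;  // never empty here
    std::uniform_int_distribution<std::size_t> pick_member(0, members.size() - 1);
    return members[pick_member(rng)];
}
```

Whether the skew matters depends on the answer to the question of how random the selection really has to be.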
participants (4)
- Andrej van der Zee
- Charles Mills
- Klaim - Joël Lamotte
- Lars Viklund