On Tue, May 5, 2020 at 9:59 AM Niall Douglas via Boost <boost@lists.boost.org> wrote:
I cannot say for sure, but it was abandoned at around the same time as LLFIO demonstrated to Beman a way of enumerating directory contents, with complete stat_t per entry, @ > 4 million entries/sec/core on all the major platforms. That makes any notion of caching pointless, just enumerate the entire directory, always.
I've also been arguing strenuously before WG21 to deprecate directory_iterator as fundamentally incorrect ASAP, and I don't think I've been unsuccessful. Recent papers to reach WG21 proposing sorely needed improvements to directory_iterator have all been shot down. The feeling I got in the room was the whole thing needs replacing. My current hope for proposing std::directory_handle for standardisation is early 2021.
Niall
Interesting opinion. Usually these sorts of things are a series of trade-offs: memory vs time, latency vs throughput, convenience vs pick-your-favourite-metric, so saying one size fits all is a bit dubious. Nonetheless, I looked at your library and thought it might give me exactly what I want, because the API allows me to spend memory to save time. But it turned out to be really slow when recursing. This makes sense because it's generating many small queries which, because it's calling a low-level API, the OS is unable to help with.

Here's a call stack from a WPA trace with the hot path.

Microsoft Windows Profiler

Line #   Count  Weight (in view) (ms)  Process Thread ID Stack
    12  163634          19,004.423700  | |- test.exe!llfio_v2_62985a1f::directory_handle::read
    13  161368          18,737.389200  | | |- ntdll.dll!ZwQueryDirectoryFile
    14  161349          18,735.136900  | | | |- ntoskrnl.exe!KiSystemServiceCopyEnd
    15  161346          18,734.804000  | | | | |- ntoskrnl.exe!NtQueryDirectoryFile
    16  161346          18,734.804000  | | | | | ntoskrnl.exe!NtQueryDirectoryFileEx
    17  160107          18,590.971100  | | | | | |- ntoskrnl.exe!BuildQueryDirectoryIrp
    18  160073          18,587.073100  | | | | | | |- ntoskrnl.exe!ProbeForWrite
    19  122698          14,246.119500  | | | | | | | |- ntoskrnl.exe!KiPageFault
    20   79304           9,208.701200  | | | | | | | | |- ntoskrnl.exe!MmAccessFault
    21   55441           6,439.250700  | | | | | | | | | |- ntoskrnl.exe!MiDispatchFault
    22   51240           5,950.194000  | | | | | | | | | | |- ntoskrnl.exe!MiResolveDemandZeroFault
    23   48839           5,670.213100  | | | | | | | | | | | |- ntoskrnl.exe!MiResolvePrivateZeroFault
    24   25331           2,938.969600  | | | | | | | | | | | | |- ntoskrnl.exe!MiCompletePrivateZeroFault
    25   13842           1,606.017700  | | | | | | | | | | | | | |- ntoskrnl.exe!MiCompletePrivateZeroFault<itself>

I presume I am using the API correctly, but if not I'm happy to try something else.

For reference, here are some rough timings from my test:

boost::recursive_directory_iterator: ~30 seconds
FindNextFile: ~13 seconds
llfio: ~980 seconds

This was reading file size and modified date during iteration. If those had been cached in recursive_directory_iterator, it probably would have come close in time to FindNextFile, which would be ideal for me.

-- chris