
Frank Mori Hess: ...
for each dependency in deps_: if it is ready(), erase it;
for each pending task: if all of its dependencies are ready(), execute the task ...
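The two loops above could be sketched roughly as follows. The `future` and `method_request` types here are hypothetical stand-ins, not the actual library types:

```cpp
#include <cstdio>
#include <functional>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical stand-in for the real future type.
struct future {
    bool ready_ = false;
    bool ready() const { return ready_; }
};

// Hypothetical stand-in for a queued method request.
struct method_request {
    std::vector<future*> deps_;   // dependencies not yet known to be ready
    std::function<void()> run;    // the work to execute
};

// One pass of the scheduler, per the pseudocode above.
void scheduler_pass(std::list<method_request>& pending) {
    // First loop: drop dependencies that have become ready -- O(N*P).
    for (auto& task : pending)
        for (auto it = task.deps_.begin(); it != task.deps_.end(); )
            it = (*it)->ready() ? task.deps_.erase(it) : std::next(it);

    // Second loop: execute tasks whose dependency lists are now
    // empty, and remove them from the queue -- O(M+N).
    for (auto it = pending.begin(); it != pending.end(); )
        if (it->deps_.empty()) {
            it->run();
            it = pending.erase(it);
        } else {
            ++it;
        }
}
```

Note that this sketch only checks readiness once, in the first loop; the point made below is that checking "all dependencies are ready" in the second loop would cost another factor of P.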
Okay, let's try ignoring the wait_for_any call. Say you have on average N method requests just sitting in your scheduler that are not ready, on average M method requests that are ready to execute, and on average P dependencies per method request (I'd imagine P would probably be small). That means each pass through your pseudocode will take O(N*P) to iterate through all the unready dependencies of the unready method requests.
The number of unready dependencies can be lower than N*P, but let's ignore that for now.
Then it will take O(M+N) to go through all the method requests and execute the M ready ones, which works out to O(N*P/M + 1) per executed method request.
I think that going through the method requests takes (N+M)*P, because of the "all dependencies are ready" part. (I think that this is also true for your scheduler.) So we have (2*N+M)*P total, or (2*N/M+1)*P per executed request.
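To make the per-request estimate concrete, here is the arithmetic as a small helper; the function name and the sample values of N, M, and P are purely illustrative:

```cpp
// Per-executed-request cost from the estimate above:
// (2N+M)*P total work per pass, divided by M executed requests,
// simplifies exactly to (2N/M + 1)*P.
double cost_per_executed(double N, double M, double P) {
    return (2 * N / M + 1) * P;
}
```

For example, with N = 100 unready requests, M = 5 ready requests, and P = 3 dependencies each, a pass costs (2*100+5)*3 = 615 operations, or 123 per executed request, matching `cost_per_executed(100, 5, 3)`.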
Having M in the denominator is nice, as you would expect it to scale with N, and it means the scheduler probably won't break down under heavy load.
On the downside, in a common case I'd expect M to be small (between 0 and 1), with performance concerns about the scheduler starting to come into play as M approaches 1.
If the scheduler's performance is insufficient, it will not be able to maintain a low M. If it does manage to maintain a low M, then its performance is sufficient. Stated differently, during the additional N*P time taken by the first loop, some futures will become ready that otherwise wouldn't have, so the second loop will execute more tasks, automatically diluting the overhead.