Igor --
Thanks again for the reply.
Igor R writes:

> Can't you just "round-robin" them?
I think that's a problem if some jobs are slower than others. As an example (with the time axis going down the list):

    Request    Thread 1        Thread 2        Thread 3
     (Type)    Queue / Work    Queue / Work    Queue / Work
    --------   ------------    ------------    ------------
    A (Slow)         / A
    B (Fast)         / A             / B
    C (Fast)         / A             / B             / C
    D (Fast)   D     / A                             / C
    E (Fast)   D     / A             / E
    F (Fast)   D     / A             / E             / F
    G (Fast)   D,G   / A                             / F

At this point, I have two idle threads/services, and one working thread/service with two more tasks queued up. So if I just round-robin across all existing threads/services, fast jobs can pile up "behind" slow jobs.

By comparison, if I have the infrastructure to assign work only to idle threads/services, it looks like this:

    Request    Thread 1        Thread 2        Thread 3
     (Type)    Queue / Work    Queue / Work    Queue / Work
    --------   ------------    ------------    ------------
    A (Slow)         / A
    B (Fast)         / A             / B
    C (Fast)         / A             / B             / C
    D (Fast)         / A             / D             / C
    E (Fast)         / A             / D             / E
    F (Fast)         / A             / F             / E
    G (Fast)         / A             / F             / G

Here only the slow task A and the most recent fast tasks are still running, each on its own thread; the earlier fast tasks have already completed, and no fast job ever waits behind the slow one.
> Why would io_service-per-core give worse scalability than thread-per-core?
It's not about io_service-per-core vs. thread-per-core; it's about whether it's io_service-per-thread or one io_service shared across multiple threads. One io_service with multiple threads allows an idle thread to pick up work as soon as it becomes available; an io_service with only one thread of execution doesn't seem to offer that.

Unless I'm missing the point, which is (as always) very possible! :)

Anyway, thanks again, and please don't hesitate to hit me with the clue-by-four if I'm being dense.

Best regards,
Anthony Foiani