Sam Darwin
The CI jobs we're hosting burst up to 500 simultaneous jobs. Is it enough? The solution is autoscaling (on AWS).
In my experience, the cost (in engineering complexity and/or in money paid to AWS) is not really worth it for CI. In build2 we have a fixed set of dedicated CI machines (hosted in our office, which makes it even cheaper than Hetzner in the long term). Their background job is to continuously rebuild the packages in our repository (in particular, this detects regressions caused by new versions of dependencies). But if there is a CI job, it takes priority over the background rebuilds. Adding a new CI machine is basically a matter of procuring the hardware, since the host OS image boots over the network and requires minimal per-machine setup.

Another thing to keep in mind is that for certain targets, most notably macOS, you are restricted in the hardware you can run them on (legally, in the case of macOS). For example, while AWS provides auto-scaling for Apple hardware, it comes with a 24-hour minimum allocation period (to comply with the Apple license). Being able to "drop down" all the way to running your own hardware on premises is the only guaranteed way of handling such cases.
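
To make the "CI jobs take priority over background rebuilds" rule concrete, here is a minimal sketch of how a worker could pick its next task. This is only an illustration, not how build2's CI is actually implemented: the `Scheduler` class, its method names, and the package names are all hypothetical. It also assumes a CI job waits for the task currently running rather than preempting it, which the description above does not specify.

```python
import itertools
from collections import deque


class Scheduler:
    """Hypothetical two-level scheduler: CI jobs first, rebuilds otherwise."""

    def __init__(self, packages):
        # Pending CI jobs, served in arrival order.
        self._ci_jobs = deque()
        # Background work never runs out: cycle over the package list forever.
        # Assumes `packages` is non-empty.
        self._rebuilds = itertools.cycle(packages)

    def submit_ci(self, job):
        """Queue a CI job; it will be served before any further rebuild."""
        self._ci_jobs.append(job)

    def next_job(self):
        """Return the next task for an idle machine as a (kind, name) pair."""
        if self._ci_jobs:
            return ("ci", self._ci_jobs.popleft())
        return ("rebuild", next(self._rebuilds))


if __name__ == "__main__":
    s = Scheduler(["libfoo", "libbar"])  # hypothetical package names
    print(s.next_job())   # no CI pending, so a background rebuild
    s.submit_ci("pr-1")
    print(s.next_job())   # the CI job is served ahead of the next rebuild
    print(s.next_job())   # back to background rebuilds
```

With this shape a machine is never idle: it falls back to rebuilds whenever the CI queue is empty, which is what makes a fixed pool useful even outside of bursts.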