Today I was chatting with my coworker about what happens when you oversaturate a server with containers, each with a small CPU quota. In k8s, since everything is container-based, you can literally assign a quota of less than one CPU, like 0.5 CPU (500 millicores) or even 0.001 CPU (1 millicore). That opened up a whole lot of possibilities.
One interesting comparison we covered is below.
Assume we have a CPU-intensive request that takes a single core 1 second to finish, and that 4 such requests arrive at the same time on a 2-core machine. In the first scenario, we assign two requests to the two cores, one each, and start the next two as soon as the first pair finishes. The first two requests each complete in 1 second, and because of the queuing, the second two take 2 seconds (1 second of waiting plus 1 second of work), so the average response time is (1 + 1 + 2 + 2) / 4 = 1.5 seconds. In the second scenario, assume we can slice the work into smaller pieces and share the compute fairly across all four requests at once. Surprisingly, every request now takes close to 2 seconds to finish, and the average is (2 + 2 + 2 + 2) / 4 = 2 seconds.
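To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The function names and the idealized fair-sharing model are just illustrations for this thought experiment, not any real scheduler:

```python
def fifo_completion_times(n_jobs, n_cores, job_seconds):
    """FIFO: each job runs alone on one core; later jobs wait for a free core."""
    core_free_at = [0.0] * n_cores          # when each core next becomes idle
    completions = []
    for _ in range(n_jobs):
        core = min(range(n_cores), key=lambda c: core_free_at[c])
        finish = core_free_at[core] + job_seconds
        core_free_at[core] = finish
        completions.append(finish)
    return completions


def fair_sharing_completion_times(n_jobs, n_cores, job_seconds):
    """Fair sharing: all jobs run concurrently, each getting an equal slice of the cores."""
    share = min(n_cores / n_jobs, 1.0)      # one request can't use more than one core here
    return [job_seconds / share] * n_jobs


fifo = fifo_completion_times(4, 2, 1.0)            # [1.0, 1.0, 2.0, 2.0] -> avg 1.5
shared = fair_sharing_completion_times(4, 2, 1.0)  # [2.0, 2.0, 2.0, 2.0] -> avg 2.0
print(sum(fifo) / len(fifo), sum(shared) / len(shared))
```

Both helpers assume all requests arrive at time zero and are identical, which matches the thought experiment above.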

Another interesting scenario still splits the work into smaller pieces, but this time we focus all the horsepower on one request at a time. Assuming each request parallelizes cleanly across both cores, the requests finish at 0.5, 1, 1.5, and 2 seconds, so the average is (0.5 + 1 + 1.5 + 2) / 4 = 1.25 seconds.
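Extending the same sketch, and again assuming the work parallelizes perfectly across both cores (which real requests often won't), the run-one-at-a-time policy looks like this:

```python
def run_to_completion_times(n_jobs, n_cores, job_seconds):
    """Run one job at a time on every core, assuming the work parallelizes perfectly."""
    per_job = job_seconds / n_cores          # 1 core-second spread over 2 cores = 0.5 s
    return [per_job * (i + 1) for i in range(n_jobs)]


focused = run_to_completion_times(4, 2, 1.0)   # [0.5, 1.0, 1.5, 2.0] -> avg 1.25
print(sum(focused) / len(focused))
```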
Just some food for thought. In the next chapter, it would be interesting to implement a scheduler and reproduce this behavior in a containerized environment.