In the previous post, we showed that flask.run can do more than one thing at a time using threaded and process arguments. In this article, let’s dig a little deeper into how Flask did it.
That magic must be attributed to two interesting implementations ForkingMixIn and ThreadingMixIn under the socketserver module. socketserver is a Python built-in library for networking and web server. If you have never used the library yourself, it is because most people use higher-level frameworks that actually use socketserver behind the scene. For example, Flask is using Werkzeug, Werkzeug is using http.server for serving, http.server uses socketserver. By looking into the source code of socketserver, it only uses very low level libraries like socket, os, threading, hence, having a knowledge how socketserver works definitely will help with the understanding how Python web serving works.
The implementation of ThreadingMixIn is very straightforward. Every time when `process_request` is called, a new Python thread is created and starts the thread.
I was not familiar with the concept of forking when I first read it. I thought it must be calling library like multiprocessing just as it did in ThreadingMixIn by calling threading library. After some research, it turned out forking is such an important concept in operating system that it is probably the most key concept behind process management. Even multiprocessing itself is probably using os.fork behind the scene.
If you have never used fork before the code of a if statement based on pid might be confusing. os.fork is function that the return value will be the process id. The magical part is after os.fork got called, a branded new process will be forked / cloned. Previously you had one, and now you have two cloned processes. In order to distinguish the old one from the new one, pid will be assigned to 0 in the new process so you know it is newly created. In the old process, instead of 0, the value of pid will be a process id of the newly created process.
Knowing that, we know to put the instructions for the parent process in the if statement, and put the instructions for the newly created child process into the else statement. That is exactly how the code was written, the parent process only keep track of child process id. The child process will finish the request and exit it with the proper status.
To make sure I fully understand how os.fork works. I wrote the following script. In fact, I ask a process to calculate factorial for some big number making sure it completely tie a CPU for a good period of time. If it is just multithreading sharing the same CPU, nothing will be accelerated.
I ran four factorial calculations in total, with a os.fork operation between each.
At first glance, the logs might be a bit hard to read so formatted them a bit into this spreadsheet. The 81(85) is the main process. And it took 9.6 seconds to calculate the factorial of 200K. Then once the fork is called, you can see the 81(86) got spawned and 85 immediately start calculating, right next line indicated the new spawn process recognized itself and also started calculating just < a us later (569 us = 671651-671082). Now we are two processes running in parallel, 85 and 86.
In the next round (separated by the red line), 85 finished first, spawned 88, and both of them got to work also within a ms (513us = 671850 – 671337). Then right after it, 86 finished, spawned 89 and both of got to work within a ms (571us = 685073-684502).
The same story repeat itself in the last round with 8 processes existing at the same time busy calculating factorial. One interesting observation is that all the first 3 rounds finish in similar time with about 9.6 seconds, then 19.68 > 19.2=2*9.6, and 29.24 > 28.8=3*9.6. The biggest difference is at the last round with (13.97=43.206446 – 29.239630), that is 45.5% (13.97/9.6) performance deterioration, why?
That is because CPUs were busy, I have a 6core i7 on my MacBook. If all 6 cores were busy doing work, that will finish 6 factorial calculation, if the rest 2 got assigned to two cores, it will take another 9.6 seconds so the total should be 19.2 seconds, which it did not. If everything is perfectly optimized, and all 6 cores all contribute to the calculation, it should take 8 * 9.6 / 6 = 12.8 seconds. So the theoretical minimum is 12.8 and maximum is 19.2 and the reality is 13.97. I have to say that is the proof that there is some cross CPU collaboration with some overhead.
Well, that is enough fun for today. In conclusion, we had a good view of the two implementations for threading based and process based mixin. We looked into the os.fork in more detail and demonstrated how forking got used to distribute computation heavy jobs.