March | 2022 | datafireball

It is very important to understand how fast is your Python code so that your ml endpoint is fast and efficient. In the modern ML application deployment, endpoints are usually deployed inside a container, the speed of your “def predict(input)” function usually directly determines your throughput and cost. To put it in plain words, your inference service might cost you $100K a year at the latency of 200ms. If the performance reduced to be 400ms, expect your cloud cost will simply double. (there could be opportunities where io bounded and CPU bounded steps are decoupled into different microservices to improve the efficiency)

Here is a list of ways to profile:

print(datetime.now())
Profiler: CProfile, yappi, vmprof

First, we need to have a simple block of code to profile, in the following block of code, we created a function myfunc that will call three different implementations of summations, between each call, there will also be a short period of sleep that varies between 0 and 3. And myfunc will be called in total 5 times. The expectation is that after the script got profiled, we will discover the func_slow will take a significant amount of the execution and by how much.

After the cProfile command is executed, the output will be dumped into a separate file profile.output which can be fed to other tools to analyze or visualize.