Marriage Penalty or Blessing

The US tax code gives people in a relationship a few options for filing taxes: before marriage, they file separately as single individuals, and once married, they can file jointly or even separately as a married couple. After some analysis, it turns out the tax code really does incentivize marriage in many ways, so to all the singles out there, go get married (just kidding). 🙂

1. Tax Schedule

You can easily find the tax schedules for different filing statuses on the internet. Attached are two screenshots from Forbes for 2022. The more you earn, the more tax you pay on the excess income: “$xxx plus a higher percentage of the excess over…”

If we look at the tax visually: with a flat tax rate you would expect a straight line shooting from the bottom left to the top right, so the fact that the lines have different segments and curl toward the top left is an indication that the marginal tax rate keeps getting higher.

2. Calculate Single and Married Couple Tax

Now that we have a good understanding of the tax schedule, we can wrap all the tax logic behind a function so that, for any given income, we can calculate the total tax owed.
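
A minimal sketch of such a function follows. The bracket thresholds below are the 2022 single and married-filing-jointly figures quoted from memory; verify them against the official schedule before relying on them.

```python
# Marginal tax: each rate applies only to the slice of income inside its bracket.
# Bracket thresholds are 2022 figures from memory -- double-check against the source.
SINGLE_2022 = [
    (10_275, 0.10), (41_775, 0.12), (89_075, 0.22), (170_050, 0.24),
    (215_950, 0.32), (539_900, 0.35), (float("inf"), 0.37),
]
MARRIED_2022 = [
    (20_550, 0.10), (83_550, 0.12), (178_150, 0.22), (340_100, 0.24),
    (431_900, 0.32), (647_850, 0.35), (float("inf"), 0.37),
]

def tax(income: float, brackets=SINGLE_2022) -> float:
    """Total tax owed on `income` under a list of (upper_bound, rate) brackets."""
    owed, lower = 0.0, 0.0
    for upper, rate in brackets:
        if income <= lower:
            break
        owed += (min(income, upper) - lower) * rate   # tax only the slice in this bracket
        lower = upper
    return owed
```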

If we imagine a couple where both partners have income, we can simulate how much tax they would pay filing separately as singles (don’t get married for tax reasons, stay single) vs. getting married. We can even iterate through every possible combination of the two incomes.
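
A sketch of that simulation, building on the tax function above (the income grid and step size are arbitrary choices):

```python
import pandas as pd

incomes = range(0, 500_001, 10_000)   # sweep both incomes over an arbitrary grid
rows = []
for p1 in incomes:
    for p2 in incomes:
        tax_p1 = tax(p1, SINGLE_2022)
        tax_p2 = tax(p2, SINGLE_2022)
        tax_single = tax_p1 + tax_p2                 # total owed filing as two singles
        tax_couple = tax(p1 + p2, MARRIED_2022)      # total owed married filing jointly
        rows.append({
            "p1": p1, "p2": p2,
            "tax_p1": tax_p1, "tax_p2": tax_p2,
            "tax_p1p2_single": tax_single,
            "tax_p1p2_couple": tax_couple,
            "marriage_tax": tax_couple - tax_single,  # positive = marriage penalty
            "percentage": (tax_couple - tax_single) / tax_single if tax_single else 0.0,
        })

df = pd.DataFrame(rows)
```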

In the code above, the p1 and p2 columns represent each person’s income, and tax_p1 and tax_p2 are the taxes they would pay filing as singles. tax_p1p2_single and tax_p1p2_couple are the total taxes they owe before and after marriage. To understand the numerical differences, we created two new columns: marriage_tax is the additional tax the couple would have to pay, so if you get the newlywed tax penalty the value is positive, and if you save money it is negative. The percentage column represents the proportional difference.

3. Plot and Understand the Difference

After the difference columns are populated, we can plot the heatmap using seaborn.

Each axis represents one individual’s income. If the two of you make the same amount of money, you sit on the diagonal line from top left to bottom right. If there is one big earner and the other makes significantly less (like a stay-at-home dad or mom), you get pushed away from the diagonal; the top and left-most edges are the extreme scenarios where there is only one earner in the relationship.

In the color scale below, blue means the percentage is very low, which indicates marriage saves a lot of money tax-wise. I created a mask to hide all the areas where there is no tax difference, which is the empty area mostly on the diagonal and the bottom right.
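
A sketch of the heatmap call, assuming the df built above (the colormap choice is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# One person's income per axis, cell value = proportional tax difference after marriage.
grid = df.pivot(index="p1", columns="p2", values="percentage")
mask = np.isclose(grid, 0)                      # hide cells with no tax difference

sns.heatmap(grid, mask=mask, cmap="coolwarm", center=0)
plt.title("Marriage penalty (+) vs. saving (-) as a fraction of single-filer tax")
plt.show()
```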

Some takeaways:

  • a couple can save up to 30% in tax by getting married instead of staying single. This happens when there is a big income gap between the two.
  • if there are two earners making roughly the same amount, marriage doesn’t make a big difference
  • very likely, it doesn’t “hurt” to get married (except in the small red triangle where both partners earn around 300K)

Disclaimer: this article is absolutely NOT meant to provide any official tax, financial, or relationship guidance.

Needle in a haystack

The Cambridge dictionary defines the phrase “a needle in a haystack” as “something that is impossible or extremely difficult to find, especially because the area that you have to search is too large”. If we take this seriously: exactly how difficult is it to find a needle in a haystack? Let’s be specific.

Needle

There are many needles for sale on Amazon. In one of the most popular listings, there are 23 needles in the pack weighing 0.317 oz in total. Converting to metric, 0.317 oz = 8.98 g, which means each needle weighs about 0.4 g on average. Depending on how much the packaging itself weighs, the real figure should be even less than 0.4 g.

When it comes to needle size, just like different clothes manufacturers have different size charts, different needle manufacturers have their own manuals explaining their sizing system. For example, John and James is a popular UK brand and they have a size chart here. The smallest needle I can find is their JJ195 for tapestry and cross stitch, size 28, with a length of 32.5mm and a diameter of 0.53mm. The volume of a cylinder of that shape is pi * (d/2)^2 * l = 7.17mm^3.

Plain steel’s density is about 7.85 g/cm3 (reference). The inferred weight is therefore about 0.06g, significantly smaller than the 0.4g guessed earlier. However, the earlier 0.4g estimate was based on relatively medium-sized needles. Even in JJ’s manual you can find many needles with larger diameters, ranging from 0.53 to 2.34mm, and if we scale every dimension of the needle up by a factor of only 2, the volume increases by a factor of 8. 0.06g * 8 ~ 0.48g, which doesn’t sound crazy at all.

To summarize, a small needle can weigh about 0.06g with a volume of about 7mm^3.

Haystack

For lack of a better term, a haystack is merely a stack of dried grass. They exist in many forms. On modern farms, hay is usually handled by industrial machines and packed into square or round bales of various sizes.

John Deere is one of the largest manufacturers of this kind of industrial machine. The machine that packs loose hay into compact bales is called a baler, for example the 450M round baler.

Interestingly, you can find the baler specifications in the user manual. The largest bale is probably packed by the 560M, with a weight of 1089kg and a volume of 4.13 m^3.

Methods

By Weight

needle ~ 0.06g, haystack ~ 1089kg.

haystack / needle = 1089 kg / 0.06 g = 1,089,000 g / 0.06 g = 18,150,000 ≈ 18M

By Volume

needle ~ 7.17mm^3, haystack ~ 4.13m^3

haystack / needle = 4.13 m^3 / 7.17 mm^3 = (4.13 * 1000^3 mm^3) / 7.17 mm^3 = 576,011,158 ≈ 0.6B
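
A quick sanity check of both ratios (note that 1 m^3 = 1000^3 mm^3 = 10^9 mm^3, which is why the volume ratio lands in the hundreds of millions rather than billions):

```python
needle_g, haystack_kg = 0.06, 1089        # weights from the sections above
needle_mm3, haystack_m3 = 7.17, 4.13      # volumes from the sections above

by_weight = haystack_kg * 1_000 / needle_g          # kg -> g
by_volume = haystack_m3 * 1_000**3 / needle_mm3     # m^3 -> mm^3

print(f"by weight: {by_weight:,.0f}")   # ~18,150,000   (~18M)
print(f"by volume: {by_volume:,.0f}")   # ~576,000,000  (~0.6B)
```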

Conclusion

Fun fact, there are about 7.8B people in the world, so by volume, finding a needle in a haystack is within an order of magnitude of the odds of finding one particular person on earth. Long story short, it is hard to find a needle in a haystack. Hold on to your needle and don’t drop it.

OBS live streaming

In this post, we will set up a live stream that broadcasts a live camera, a Chrome browser window playing YouTube, and a video game arranged in one view. The stream will be created by OBS and published to an Nginx RTMP server, and we will “watch our live stream” via VLC on another machine within the same private network.

A screenshot of the desktop setup with a lot of applications running.

OBS studio running with the three windows created.

Properly configure the stream to point to the RTMP server first.

Start the Docker container running Nginx with the RTMP plugin.

Point VLC at the RTMP stream URL to open the stream.

Side by side of the streamer’s view and watcher’s view.

Enjoy!


Python Code Profiling

It is very important to understand how fast your Python code is so that your ML endpoint is fast and efficient. In modern ML application deployment, endpoints are usually deployed inside a container, and the speed of your “def predict(input)” function usually directly determines your throughput and cost. In plain words: your inference service might cost you $100K a year at a latency of 200ms; if performance degrades to 400ms, expect your cloud bill to roughly double. (There could be opportunities to decouple IO-bound and CPU-bound steps into different microservices to improve efficiency.)

Here is a list of ways to profile:

  1. print(datetime.now())
  2. Profilers: cProfile, yappi, vmprof

First, we need a simple block of code to profile. In the following block, we create a function myfunc that calls three different implementations of a summation; between each call there is also a short sleep that varies between 0 and 3 seconds, and myfunc is called 5 times in total. The expectation is that after the script is profiled, we will discover how much of the execution func_slow takes, and by how much.
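
A minimal sketch matching that description (all function names other than func_slow are assumptions):

```python
import random
import time

def func_fast(n=1_000_000):
    return sum(range(n))                    # built-in sum over a range

def func_medium(n=1_000_000):
    return sum(i for i in range(n))         # generator expression, a bit slower

def func_slow(n=1_000_000):
    total = 0
    for i in range(n):                      # explicit Python loop, slowest of the three
        total += i
    return total

def myfunc():
    for f in (func_fast, func_medium, func_slow):
        f()
        time.sleep(random.uniform(0, 3))    # short sleep varying between 0 and 3 seconds

if __name__ == "__main__":
    for _ in range(5):                      # myfunc is called 5 times in total
        myfunc()
```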

After the cProfile command is executed, the output is dumped into a separate file, profile.output, which can be fed to other tools to analyze or visualize.
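
One way to produce and then inspect that dump (myscript.py is a placeholder name, and sorting by cumulative time is an arbitrary but common choice):

```python
# Produce the dump:  python -m cProfile -o profile.output myscript.py
import pstats

stats = pstats.Stats("profile.output")
stats.sort_stats("cumulative").print_stats(10)   # top 10 entries by cumulative time
```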

Sometimes the output is a bit hard to interpret and we would like to visualize it. There is a lightweight tool called snakeviz that we can use for visualization.

If you are using an IDE like PyCharm, this functionality is integrated.

Youtube 360 Video Download

Above is a YouTube video uploaded by J Utah, captured using an Insta360 Pro back in 2018. If you have never seen a 360 video before, it is an immersive experience where all angles are captured. Since a traditional display (monitor, cellphone) has limited real estate, it only shows the portion that fits the screen, and users can pan/zoom/rotate to look in all directions. If you have a VR device (like an Oculus or Google Cardboard), these videos create the best viewing experience.

Due to my internet speed, I downloaded the video locally using a site called Y2mate. Once downloaded, it is just a typical mp4 file that looks … a bit weird.

It is easy to notice the video is divided into two halves: the top half includes the left, front, and right views from the point of view of the driver. The lower half is like someone sitting on top of the car facing backwards, capturing the up, back, and down views.

It may be hard to picture how all the angles stitch together into a 360 view if this is the first time you have seen a layout like that. A quick way to understand it is to take a piece of paper and cut out two strips (1x3 each), representing the upper and lower halves. Then it becomes very easy to see how those 6 views fit together into a cube.

Cube mapping is a very popular environment-mapping method to represent the 360 space; after rearranging the tiles, you get something like this. I marked C90 for a clockwise 90-degree rotation and CC90 for counter-clockwise 90 degrees. (Note: the bottom view is basically the car’s hard top with some reflection of the top view, so my rotation may be wrong, as it is hard to read and all the buildings look similar.)
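
As a rough illustration of that rearranging step, here is a minimal numpy sketch. It assumes a frame already decoded into an HxWx3 array laid out as the 2x3 grid described above, with square tiles (w/3 == h/2) so the 90-degree rotations keep their shape; the tile order and rotations are my guesses and would need adjusting for the actual video.

```python
import numpy as np

def rearrange(frame: np.ndarray) -> np.ndarray:
    """Re-tile a 2x3 layout (left/front/right over up/back/down) into a 4x3 cubemap cross."""
    h, w, _ = frame.shape
    th, tw = h // 2, w // 3
    tile = lambda r, c: frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
    left, front, right = tile(0, 0), tile(0, 1), tile(0, 2)
    up, back, down = tile(1, 0), tile(1, 1), tile(1, 2)      # assumed order

    canvas = np.zeros((3 * th, 4 * tw, 3), dtype=frame.dtype)
    canvas[th:2 * th, 0:tw] = left
    canvas[th:2 * th, tw:2 * tw] = front
    canvas[th:2 * th, 2 * tw:3 * tw] = right
    canvas[th:2 * th, 3 * tw:4 * tw] = np.rot90(back, 2)     # 180 degrees, a guess
    canvas[0:th, tw:2 * tw] = np.rot90(up, -1)               # C90 (clockwise)
    canvas[2 * th:, tw:2 * tw] = np.rot90(down, 1)           # CC90 (counter-clockwise)
    return canvas
```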

Now we know the views are all captured and arranged in an organized way. Can we process the video and tweak the output format so it is recognized as a 360 video you can view in VLC?

The mapping and math behind it will be discussed in the next post.

KITTI dataset

KITTI is one of the most popular public datasets and industry benchmarks for autonomous-driving research. This article includes a data exploration of the published dataset so readers get a more intuitive understanding of how it is captured and how it should be used.

Let’s work backwards by diving straight into the dataset. There is a raw data section where you can download the data. At the bottom of the page, you can find different scenes and links to the downloads.

For example, here is one.

After the data is downloaded, we can see the synced data is 458MB, which is likely the 0.4GB that appears in the title. The extract file is ~50% bigger and probably contains all the raw data points before any processing. The calibration and tracklet files are small.

Calibration

Calibration is the process of resetting your sensors so they are benchmarked against known measurements for accuracy. Just like with a scale, calibration makes sure the scale reads 0 without any weight on it. It is the same for the different sensors in an AV.

There are 3 small plain text files: calib_cam_to_cam.txt, calib_imu_to_velo.txt, and calib_velo_to_cam.txt.

Tracklets

Sync

Extract

Images/Camera

The extract folder contains 6 subfolders: 4 contain the images captured by the cameras (image_00 through image_03), one the lidar point clouds (velodyne_points), and one the motion sensor data (oxts).

By putting the first image from each camera side by side, it is easy to tell the first two are greyscale and the next two are color images. We can then tell the left cameras from the right cameras by comparing their relative distance to landmarks like the light rail or the road lanes.

Inside each camera folder there is also a timestamp text file that stores the timestamps of when the images were captured. The frequency of each camera is about 10Hz, and all cameras seem to have started capturing “at the same time”: the earliest at 445ms and the latest at 454ms, a nominal difference of 9ms. However, we are not sure whether the 4 cameras share the same clock, and if not, whether the 4 clocks are perfectly synced.

IMU (Inertial measurement unit)

Interestingly, we can see the car’s velocity is -6.94 to the north and -11.36 to the east, so it is basically driving southwest, biased more to the west.

This also matches the forward velocity of 13.32184 m/s captured by the sensor itself, which is about 30 mph or 48 km/h.
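
A quick check of that relationship (the field names follow the oxts dataformat description, where vn and ve are the velocities toward north and east; the numbers are the ones quoted above):

```python
import math

vn, ve = -6.94, -11.36          # m/s toward north / east (negative = south / west)
speed = math.hypot(vn, ve)      # magnitude over ground
print(f"{speed:.2f} m/s = {speed * 3.6:.1f} km/h = {speed * 2.23694:.1f} mph")
# -> 13.31 m/s = 47.9 km/h = 29.8 mph, closely matching the recorded forward velocity
```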

Then it showed various types of accelerations.

LIDAR / Point clouds

The lidar was also capturing at a 10Hz frequency.
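
Each lidar sweep sits in the velodyne_points folder as a flat binary file of float32 values, four per point. A minimal loading sketch (the file path here is just the first frame of a synced drive and is an assumption about your local layout):

```python
import numpy as np

scan = np.fromfile("velodyne_points/data/0000000000.bin", dtype=np.float32)
points = scan.reshape(-1, 4)         # columns: x, y, z, reflectance
print(points.shape)                  # (N, 4), roughly 100k+ points per 10Hz sweep
```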

Does Divide & Conquer Help?

Today I was chatting with my coworker about what happens when you oversaturate a server with containers, each with a small CPU quota. Since k8s is container-based, you can literally assign CPU quotas of less than 1 CPU, like 0.5 CPU (500 millicores) or 0.001 CPU (1 millicore). That opens up a whole lot of possibilities.

One interesting scenario that we covered is below.
Assume we have a CPU-intensive request that takes a single core 1 second to finish, and that 4 such requests arrive at the same time on a 2-core machine. In scenario 1, we allocate the first two requests to the two cores, one each, and assign the next two once those finish. The first two requests cost 1 second each, and because of queuing, the second two take 2 seconds (1 second of waiting) to respond, so the average is 1.5 seconds. In scenario 2, assume we can slice the tasks into smaller pieces and the compute is shared fairly across all of them. Surprisingly, all four requests then take close to 2 seconds to finish, with the average being (2+2+2+2)/4 = 2.

Another interesting scenario: we still split the tasks into smaller pieces, but this time we focus all the horsepower on one request at a time. Now the finish times are 0.5, 1, 1.5, and 2 seconds, and the average is (0.5 + 1 + 1.5 + 2) / 4 = 1.25 seconds.
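
A tiny back-of-the-envelope model of the three policies, assuming perfectly divisible work and zero scheduling overhead:

```python
CORES, REQUESTS, WORK = 2, 4, 1.0    # 4 identical requests, 1 core-second each, 2 cores

# Policy 1: FIFO, one whole core per request; later requests wait in the queue.
fifo = [WORK * (i // CORES + 1) for i in range(REQUESTS)]        # [1, 1, 2, 2]

# Policy 2: fair sharing; every request progresses at CORES/REQUESTS of a core.
fair = [REQUESTS * WORK / CORES] * REQUESTS                      # [2, 2, 2, 2]

# Policy 3: split into pieces but pour all cores into one request at a time.
focus = [(i + 1) * WORK / CORES for i in range(REQUESTS)]        # [0.5, 1.0, 1.5, 2.0]

for name, finish in [("fifo", fifo), ("fair", fair), ("focus", focus)]:
    print(f"{name:5s} finish times {finish}, average {sum(finish) / len(finish)}")
```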

Just some food for thought. In the next chapter, it will be interesting to implement a scheduler and reproduce this in a containerized environment.

Python socketserver ForkingMixIn and ThreadingMixIn

In the previous post, we showed that flask.run can do more than one thing at a time using the threaded and processes arguments. In this article, let’s dig a little deeper into how Flask does it.

That magic must be attributed to two interesting implementations, ForkingMixIn and ThreadingMixIn, in the socketserver module. socketserver is a built-in Python library for building network servers. If you have never used it yourself, it is because most people use higher-level frameworks that rely on socketserver behind the scenes: Flask uses Werkzeug, Werkzeug uses http.server for serving, and http.server uses socketserver. Looking at the source code of socketserver, it only uses very low-level libraries like socket, os, and threading, so knowing how socketserver works definitely helps in understanding how Python web serving works.

ThreadingMixIn

The implementation of ThreadingMixIn is very straightforward: every time `process_request` is called, a new Python thread is created and started to handle the request.
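
A minimal usage sketch (not the actual socketserver source): mixing ThreadingMixIn into a TCPServer so every connection is handled on its own thread. The port and handler are arbitrary.

```python
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline()
        name = threading.current_thread().name.encode()
        self.wfile.write(b"[" + name + b"] " + line)   # echo back, tagged with the thread name

class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True          # don't block interpreter exit on in-flight handlers

with ThreadedTCPServer(("127.0.0.1", 9999), EchoHandler) as server:
    server.serve_forever()         # each incoming connection runs EchoHandler in a new thread
```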

ForkingMixIn

I was not familiar with the concept of forking when I first read it. I assumed it must be calling a library like multiprocessing, just as ThreadingMixIn calls the threading library. After some research, it turned out forking is such an important concept in operating systems that it is probably the key mechanism behind process management. Even multiprocessing itself is probably using os.fork behind the scenes.

If you have never used fork before, the if statement based on pid might be confusing. os.fork is a function whose return value is a process id. The magical part is that after os.fork is called, a brand new process is forked/cloned: previously you had one process, and now you have two. To distinguish the old process from the new one, pid is 0 in the new process so it knows it was just created, while in the old process pid holds the process id of the newly created child.

Knowing that, we put the instructions for the parent process in the if branch and the instructions for the newly created child process in the else branch. That is exactly how the code is written: the parent process only keeps track of the child process id, while the child process finishes the request and exits with the proper status.
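
A bare-bones sketch of that parent/child pattern (not the actual ForkingMixIn source):

```python
import os

pid = os.fork()
if pid:                       # parent branch: pid is the child's process id
    print(f"parent {os.getpid()} spawned child {pid}")
    os.waitpid(pid, 0)        # wait for (and reap) the child
else:                         # child branch: os.fork() returned 0
    print(f"child {os.getpid()} handling the request")
    os._exit(0)               # exit the child immediately with a status code
```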

os.fork

To make sure I fully understood how os.fork works, I wrote the following script. The idea is to have a process calculate the factorial of a large number, making sure it fully occupies a CPU for a good period of time. If it were just multiple threads sharing the same CPU, nothing would be accelerated.

I ran four factorial calculations in total, with an os.fork operation between each.
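
A minimal reconstruction of that script (the factorial size and log format are assumptions; the timings discussed below came from the original run on my machine):

```python
import math
import os
import time

def timed_factorial(n=200_000):
    start = time.time()
    math.factorial(n)                         # CPU-bound work that ties up one core
    now = time.time()
    print(f"pid={os.getpid()} finished factorial({n}) in {now - start:.2f}s at {now:.6f}")

timed_factorial()     # round 1: 1 process
os.fork()
timed_factorial()     # round 2: 2 processes running in parallel
os.fork()
timed_factorial()     # round 3: 4 processes
os.fork()
timed_factorial()     # round 4: 8 processes
```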

At first glance, the logs might be a bit hard to read, so I formatted them into this spreadsheet. 81(85) is the main process, and it took 9.6 seconds to calculate the factorial of 200K. Once fork is called, you can see 81(86) get spawned: 85 immediately starts calculating, and the very next line shows the newly spawned process recognizing itself and also starting to calculate just over half a millisecond later (569 us = 671651 - 671082). Now we have two processes running in parallel, 85 and 86.

In the next round (separated by the red line), 85 finished first and spawned 88, and both of them got to work within a millisecond (513 us = 671850 - 671337). Right after that, 86 finished and spawned 89, and again both got to work within a millisecond (571 us = 685073 - 684502).

The same story repeats itself in the last round, with 8 processes busy calculating factorials at the same time. One interesting observation is that the first 3 rounds each finish in about 9.6 seconds: the cumulative times are 9.6, then 19.68 (just over 19.2 = 2*9.6), then 29.24 (just over 28.8 = 3*9.6). The biggest difference is in the last round (13.97 s = 43.206446 - 29.239630), a 45.5% slowdown relative to the 9.6-second baseline (13.97/9.6). Why?

That is because the CPUs were busy; I have a 6-core i7 in my MacBook. If all 6 cores were occupied, they would finish 6 of the factorial calculations, and if the remaining 2 then got assigned to two free cores, they would take another 9.6 seconds, so the last round should have taken 19.2 seconds, which it did not. If everything were perfectly optimized and all 6 cores contributed to all 8 calculations, the round should take 8 * 9.6 / 6 = 12.8 seconds. So the theoretical minimum is 12.8 seconds, the maximum is 19.2 seconds, and the reality is 13.97 seconds. That looks like evidence that the work was spread across the CPUs, with some overhead on top of the ideal.

Well, that is enough fun for today. In conclusion, we got a good view of the two implementations of the thread-based and process-based mixins, looked into os.fork in more detail, and demonstrated how forking can be used to distribute computation-heavy jobs.