YouTube 360 Video Download

Above is a YouTube video uploaded by J Utah, captured using an Insta360 Pro back in 2018. If you have never seen a 360 video before, it is an immersive format in which every angle around the camera is captured. Because traditional displays (monitors, cellphones) have limited real estate, they show only the portion that fits on the screen, and users can pan, zoom and rotate to look in any direction. If you have a VR device (like an Oculus or Google Cardboard), these videos offer the best viewing experience.

Because of my slow internet connection, I downloaded this video locally using a site called Y2mate. Once downloaded, it is just a typical mp4 file that looks … a bit weird.

It is easy to notice that the video is divided into two halves. The top half contains the left, front and right views from the driver's point of view. The lower half is like the view of someone sitting on top of the car facing backwards, and it contains the up, back and down views.

If this is the first time you have seen a layout like this, it may be hard to picture how all the angles stitch together into a 360 view. A quick way to build intuition is to cut two 1×3 strips of paper, one for the upper half and one for the lower half, fold each into a U shape and interlock them. It then becomes easy to see how the six views fit together into a cube.
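The same "two strips of three" idea can be expressed in code. The sketch below cuts one frame into six named tiles. It assumes the frame is a list of pixel rows whose height divides by 2 and width by 3, and that the face order matches the description above (top row left/front/right, bottom row up/back/down); `split_faces` and `FACE_LAYOUT` are hypothetical names for illustration. A real frame decoded with OpenCV's `cv2.VideoCapture` arrives as a NumPy array, where the same slicing becomes `frame[r*h:(r+1)*h, c*w:(c+1)*w]`.

```python
# Face names in reading order, as described for this video:
# top row is left/front/right, bottom row is up/back/down (an assumption).
FACE_LAYOUT = [
    ["left", "front", "right"],
    ["up", "back", "down"],
]


def split_faces(frame):
    """Split a 3x2 cube-layout frame (a list of pixel rows) into six named tiles."""
    height, width = len(frame), len(frame[0])
    if height % 2 or width % 3:
        raise ValueError("frame must be 2 tiles tall and 3 tiles wide")
    tile_h, tile_w = height // 2, width // 3
    faces = {}
    for r, row_names in enumerate(FACE_LAYOUT):
        for c, name in enumerate(row_names):
            rows = frame[r * tile_h:(r + 1) * tile_h]
            faces[name] = [row[c * tile_w:(c + 1) * tile_w] for row in rows]
    return faces


if __name__ == "__main__":
    # Toy 4x6 "frame" where each pixel stores its own (row, col) position.
    toy = [[(y, x) for x in range(6)] for y in range(4)]
    print(split_faces(toy)["front"])  # [[(0, 2), (0, 3)], [(1, 2), (1, 3)]]
```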

Cube mapping is a very popular environment mapping method for representing 360° space. After rearranging the tiles, you get something like this. I marked C90 for a clockwise 90-degree rotation and CC90 for a counterclockwise 90-degree rotation. (Note: the bottom view is basically the car's hardtop, which shows some reflection of the top view. My rotation for it may be wrong, as it is hard to read and all the buildings look similar.)
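The C90 and CC90 labels correspond to simple 90-degree tile rotations. Below is a minimal sketch on tiles stored as lists of pixel rows. The `EXAMPLE_ROTATIONS` mapping is hypothetical and only shows the shape of the data; the real assignment has to be read off the rearranged frame. With NumPy arrays, `np.rot90(tile, k=-1)` and `np.rot90(tile, k=1)` do the same clockwise and counterclockwise rotations.

```python
def rotate_cw(tile):
    """Rotate a tile (list of pixel rows) 90 degrees clockwise (C90)."""
    return [list(row) for row in zip(*tile[::-1])]


def rotate_ccw(tile):
    """Rotate a tile 90 degrees counterclockwise (CC90)."""
    return [list(row) for row in zip(*tile)][::-1]


def apply_rotations(faces, rotations):
    """Return a new dict of faces; faces without a rotation pass through unchanged."""
    return {name: rotations.get(name, lambda t: t)(tile) for name, tile in faces.items()}


# Hypothetical mapping for illustration only: fill in the real C90/CC90
# labels from your own inspection of the frame.
EXAMPLE_ROTATIONS = {"up": rotate_cw, "down": rotate_ccw}

if __name__ == "__main__":
    tile = [[1, 2], [3, 4]]
    print(rotate_cw(tile))   # [[3, 1], [4, 2]]
    print(rotate_ccw(tile))  # [[2, 4], [1, 3]]
```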

Now we know that all the views are captured and arranged in an organized way. Can we process the video and tinker with the output format so it is recognized as a 360 video that you can view in VLC?

The mapping and math behind it will be discussed in the next post.

KITTI dataset

KITTI is one of the most popular public datasets and industry benchmarks for autonomous driving research. This article explores the published dataset so readers get a more intuitive understanding of how it was captured and how it should be used.

Let’s work backwards by diving straight into the dataset. The KITTI website has a raw data section where you can download the recordings. At the bottom of the page, you can find the different scenarios and links to the downloads.

For example, here is one.

After the data is downloaded, we can see that the synced data is 458 MB, which likely corresponds to the 0.4 GB that appears in the title. The extract file is about 50% bigger and probably contains all the raw data points before any processing. The calibration and tracklet files are small.


Calibration is the process of benchmarking your sensors against known measurements so that their readings are accurate. Think of a scale: calibration makes sure it reads 0 when nothing is on it. The same applies to the different sensors in an AV, and with several sensors on one car it also means knowing precisely where each sensor sits and how it is oriented relative to the others.

There are 3 small plain text files: calib_cam_to_cam.txt, calib_imu_to_velo.txt and calib_velo_to_cam.txt.
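Each calibration file is a list of "key: values" lines, for example R and T in calib_velo_to_cam.txt, or K_00 and P_rect_00 in calib_cam_to_cam.txt. The sketch below parses any of them into a dict. It assumes numeric values are whitespace-separated and matrices are stored row-major (so R reshapes to 3×3 and P_rect_00 to 3×4); please confirm against the KITTI devkit readme. `parse_calib` and `as_matrix` are hypothetical helper names.

```python
def parse_calib(text):
    """Parse KITTI-style 'key: value value ...' lines into a dict.

    Numeric values become lists of floats; anything that does not parse as
    numbers (e.g. the calib_time line) is kept as a stripped string.
    """
    calib = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        try:
            calib[key.strip()] = [float(v) for v in value.split()]
        except ValueError:
            calib[key.strip()] = value.strip()
    return calib


def as_matrix(values, rows, cols):
    """Reshape a flat row-major list of numbers into a list of rows."""
    if len(values) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(values)}")
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


# Usage (the path is a placeholder for your own download):
# with open("calib_velo_to_cam.txt") as f:
#     velo_to_cam = parse_calib(f.read())
# R = as_matrix(velo_to_cam["R"], 3, 3)  # rotation, lidar -> camera
# T = velo_to_cam["T"]                   # translation vector
```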





The extract folder contains 6 subfolders: 4 hold the images captured by the cameras (image_00 to image_03), one holds the lidar scans (velodyne_points) and one holds the motion sensor data (oxts).
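A quick way to get a feel for a drive is to count how many frames each sensor recorded. This sketch assumes the raw-data layout where every sensor folder has a data/ subfolder next to its timestamps file; the function name `inventory` and the example path are hypothetical.

```python
from pathlib import Path


def inventory(drive_dir):
    """Count the files under each sensor's data/ subfolder of one drive.

    Assumes the layout <drive>/<sensor>/data/..., e.g. image_00/data.
    Entries without a data/ subfolder (loose files, etc.) are skipped.
    """
    counts = {}
    for sensor_dir in sorted(Path(drive_dir).iterdir()):
        data_dir = sensor_dir / "data"
        if data_dir.is_dir():
            counts[sensor_dir.name] = sum(1 for p in data_dir.iterdir() if p.is_file())
    return counts


# Usage (placeholder path):
# print(inventory("path/to/drive_extract"))
```

Differing counts between folders would be one hint that the extract data has not been synchronized across sensors.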

Putting the first image from each camera side by side, it is easy to tell that the first two are greyscale and the next two are color. We can then tell the left cameras from the right ones by comparing where fixed landmarks, such as the light rail or the lane markings, appear in each image. This matches the folder naming: image_00 and image_01 are the left and right greyscale cameras, and image_02 and image_03 are the left and right color cameras.

Each camera folder also contains a timestamps.txt file that records when each image was captured. Each camera captured at about 10 Hz, and all cameras seem to have started capturing “at the same time”: the earliest first frame is at 445 ms and the latest at 454 ms, a nominal difference of 9 ms. However, we are not sure whether the 4 cameras share the same clock and, if not, whether the 4 clocks are perfectly synced.
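The capture rate and the start-time spread can be computed directly from the timestamp files. This sketch assumes lines of the form `YYYY-MM-DD HH:MM:SS.fffffffff` (nanosecond fractions, which Python's `datetime` cannot parse on its own, so the fraction is split off and added back). All function names are hypothetical. Note that it only compares the recorded numbers; it cannot tell whether the cameras share a clock.

```python
from datetime import datetime


def parse_kitti_timestamp(line):
    """Parse 'YYYY-MM-DD HH:MM:SS.fffffffff' into seconds since 1970-01-01.

    The time is treated as naive, which is fine for computing differences.
    """
    whole, _, frac = line.strip().partition(".")
    base = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    return (base - datetime(1970, 1, 1)).total_seconds() + float("0." + (frac or "0"))


def load_timestamps(path):
    """Read every non-empty line of a timestamps file as seconds."""
    with open(path) as f:
        return [parse_kitti_timestamp(line) for line in f if line.strip()]


def estimate_rate_hz(times):
    """Average capture frequency: 1 / mean gap between consecutive frames."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return len(gaps) / sum(gaps)


def start_spread_ms(first_times):
    """Spread between the earliest and latest first frames, in milliseconds."""
    return (max(first_times) - min(first_times)) * 1000


# Usage (folder names as in the extract; paths are placeholders):
# cams = ["image_00", "image_01", "image_02", "image_03"]
# series = {c: load_timestamps(f"{c}/timestamps.txt") for c in cams}
# print({c: round(estimate_rate_hz(t), 2) for c, t in series.items()})
# print(start_spread_ms([t[0] for t in series.values()]))
```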

IMU (Inertial measurement unit)

Interestingly, the motion sensor data shows a northward velocity (vn) of -6.94 m/s and an eastward velocity (ve) of -11.36 m/s, so the car is actually moving south and west at the same time: southwest, leaning more toward the west.
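Each line in oxts/data is a row of space-separated numbers, and the field names live in the dataformat.txt file shipped with the data. The sketch below maps a line onto those names. The field list is reproduced from memory of KITTI's dataformat.txt and should be checked against the copy in your download; `parse_oxts_line` is a hypothetical helper.

```python
# Field order as listed in KITTI's oxts dataformat.txt (verify against your copy).
OXTS_FIELDS = [
    "lat", "lon", "alt", "roll", "pitch", "yaw",         # position and attitude
    "vn", "ve", "vf", "vl", "vu",                         # velocities (m/s)
    "ax", "ay", "az", "af", "al", "au",                   # accelerations (m/s^2)
    "wx", "wy", "wz", "wf", "wl", "wu",                   # angular rates (rad/s)
    "pos_accuracy", "vel_accuracy",
    "navstat", "numsats", "posmode", "velmode", "orimode",
]


def parse_oxts_line(line):
    """Map one whitespace-separated oxts/data/*.txt line onto named fields."""
    values = [float(v) for v in line.split()]
    if len(values) != len(OXTS_FIELDS):
        raise ValueError(f"expected {len(OXTS_FIELDS)} values, got {len(values)}")
    return dict(zip(OXTS_FIELDS, values))


# Usage (placeholder path):
# with open("oxts/data/0000000000.txt") as f:
#     rec = parse_oxts_line(f.read())
# print(rec["vn"], rec["ve"], rec["vf"])
```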

The magnitude of these two components, √(6.94² + 11.36²) ≈ 13.31 m/s, also matches the forward velocity (vf) of 13.32184 m/s captured by the sensor itself, which is about 30 mph or 48 km/h.
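The arithmetic in the last two paragraphs can be reproduced in a few lines: the ground speed is the length of the (vn, ve) vector, and the direction of travel is its angle measured clockwise from north. The function names are hypothetical; the conversion factors are the standard 3.6 (m/s to km/h) and 1609.344 m per mile.

```python
import math


def ground_speed(vn, ve):
    """Horizontal speed (m/s) from the north and east velocity components."""
    return math.hypot(vn, ve)


def compass_bearing(vn, ve):
    """Direction of travel in degrees clockwise from north (0=N, 90=E, 180=S, 270=W)."""
    return math.degrees(math.atan2(ve, vn)) % 360


def ms_to_kmh(v):
    return v * 3.6


def ms_to_mph(v):
    return v * 3600 / 1609.344


if __name__ == "__main__":
    vn, ve, vf = -6.94, -11.36, 13.32184  # values quoted in the text above
    print(round(ground_speed(vn, ve), 2))              # 13.31, close to vf
    print(round(compass_bearing(vn, ve), 1))           # 238.6, between SW (225) and W (270)
    print(round(ms_to_kmh(vf)), round(ms_to_mph(vf)))  # 48 30
```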

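The record then lists various types of acceleration: ax, ay and az along the vehicle's own axes, and af, al and au along the forward, leftward and upward directions.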

LIDAR / Point Clouds

The lidar also captured at 10 Hz.
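To look at the points themselves, each scan has to be decoded. This sketch assumes the synced data's velodyne_points/data/*.bin format, where every point is four little-endian float32 values (x, y, z, reflectance); the extract data may store scans differently, so check your folder first. `read_velodyne_bin` is a hypothetical name. With NumPy the same read is `np.fromfile(path, dtype=np.float32).reshape(-1, 4)`.

```python
import struct


def read_velodyne_bin(path):
    """Read one .bin scan into a list of (x, y, z, reflectance) tuples.

    Assumes consecutive little-endian float32 values, four per point.
    """
    with open(path, "rb") as f:
        data = f.read()
    point_size = struct.calcsize("<4f")  # 16 bytes per point
    if len(data) % point_size:
        raise ValueError("file size is not a multiple of 16 bytes")
    return list(struct.iter_unpack("<4f", data))


# Usage (placeholder path):
# points = read_velodyne_bin("velodyne_points/data/0000000000.bin")
# print(len(points), points[0])
```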