There is plenty of image data out there in RGBA format (Red, Green, Blue and Alpha), in which Alpha represents transparency. By leaving the RGB channels empty and using Alpha alone to store the shape, the image acts like a layer that you can put on top of any other photo and that sort of “floats around”, just like lots of icons out there. However, this type of image isn’t necessarily ready to be fed directly into many machine learning frameworks, which usually work with RGB or greyscale. This post documents what I did to convert some of these alpha-only images into RGB images.
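To make the starting point concrete, here is a minimal sketch that builds a stand-in for such an alpha-only image with Pillow. The 200×200 size matches my images, but the circle shape and the variable names are made up purely for illustration.

```python
from PIL import Image, ImageDraw

# Hypothetical stand-in for an alpha-only icon: R, G and B are all zero
# and the shape lives entirely in the alpha channel.
icon = Image.new("RGBA", (200, 200), (0, 0, 0, 0))
draw = ImageDraw.Draw(icon)
draw.ellipse((50, 50, 150, 150), fill=(0, 0, 0, 255))

# getextrema() returns one (min, max) pair per channel, in R, G, B, A order.
extrema = icon.getextrema()
print(icon.mode, icon.size)
print(extrema)
```

Only the last pair, the one for alpha, should span more than a single value, which is what makes these images awkward for tools that look at RGB only.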
First, there are plenty of libraries out there that process images. I am merely sharing some of the work that I did, without comparing the performance of different approaches. The libraries that I will cover in this post are imageio and Pillow. I have read that imageio is supposed to be your first choice because it is well maintained while PIL isn’t anymore (Pillow is the maintained fork of PIL, and it is what I used here). However, I found that imageio is very easy to use but its functionality is limited and not as diverse as Pillow’s. Imageio, as the name indicates, deals with reading and writing images, while Pillow has more functionality for working with the image itself, like channels, data manipulation, etc.
My very first try was to convert the RGBA image into an array of shape (200, 200, 4), since my image is already a 200×200 square. Then I populate the RGB channels with the values of the alpha channel and drop the last channel, which is alpha. That way R, G and B are all equal and the photo should look black and white.
The code is simple. m is the object returned by PIL.Image.open. I first resize it to 256×256, which is more ML friendly, and then convert it into an array, data, which has the shape (65536, 4), one row per pixel (it would be (40000, 4) at the original 200×200 size). After assigning RGB and dropping A, we call reshape to turn it into 256×256×3 and reconstruct the image with “fromarray”. Here astype(np.uint8) caught me off guard a bit and felt like a rough edge in Pillow: fromarray will not build an RGB image unless the array holds 8-bit unsigned integers, so an array with a wider integer type has to be cast first. The invert at the end inverts the colors (black to white and white to black) so the image has the background that I like.
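The steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not the exact code I ran: the function name convert_rgb and the size parameter are hypothetical, and the array is built with np.array(m), which already yields uint8 values.

```python
import numpy as np
from PIL import Image, ImageOps


def convert_rgb(m, size=(256, 256)):
    """Hypothetical reconstruction: alpha-only RGBA image -> inverted RGB."""
    m = m.resize(size)
    # One row per pixel, one column per channel: shape (65536, 4) at 256x256.
    data = np.array(m).reshape(-1, 4)
    # Copy the alpha column into R, G and B (broadcast over three columns).
    data[:, :3] = data[:, 3:4]
    # Drop the alpha column, leaving shape (65536, 3).
    data = data[:, :3]
    # Back to height x width x 3. The cast is a no-op for this uint8 array,
    # but fromarray rejects wider integer dtypes for RGB data.
    rgb = data.reshape(size[1], size[0], 3).astype(np.uint8)
    # Invert so the opaque shape is black on a white background.
    return ImageOps.invert(Image.fromarray(rgb))
```

Called on an RGBA image, it returns a 256×256 RGB image in which fully opaque pixels come out black and fully transparent ones white.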
The code is easy to understand, but when I executed it, it was slow. I did not run any benchmark, but it was noticeably slower than some other preprocessing that I had run before.
In the end, I realized that there is already a built-in method called getchannel, so you can get a specific channel without dealing with arrays directly.
convert_rgb_fast does the same thing as the function above but executes much faster, likely because the first version did a lot of array assignment and index slicing, which I guess is not very efficient.
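A minimal sketch of what a getchannel-based convert_rgb_fast could look like is below. The body is an assumption on my part: it uses Image.merge to stack three copies of the alpha channel, which is one way to do it and not necessarily the only one.

```python
from PIL import Image, ImageOps


def convert_rgb_fast(m, size=(256, 256)):
    """Sketch: alpha-only RGBA image -> inverted RGB, without NumPy."""
    m = m.resize(size)
    # getchannel("A") returns the alpha channel as a single-band "L" image.
    alpha = m.getchannel("A")
    # Use the same band for R, G and B so the result looks black and white.
    rgb = Image.merge("RGB", (alpha, alpha, alpha))
    # Invert so the opaque shape is black on a white background.
    return ImageOps.invert(rgb)
```

Everything stays inside Pillow here, so there is no round trip through a NumPy array and no dtype to worry about.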
Also, by using getchannel, you can easily convert the image to a single channel, which is basically greyscale. The channels themselves don’t carry names, and RGBA is just a convention; if you only have one channel, most tools out there will assume it is greyscale.
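For the single-channel case, a sketch could be as short as the one below. The name convert_grey is hypothetical, and the invert step is carried over from the RGB version on the assumption that the same white background is wanted.

```python
from PIL import Image, ImageOps


def convert_grey(m, size=(256, 256)):
    """Sketch: alpha-only RGBA image -> inverted single-channel image."""
    # The alpha channel alone is already a one-band "L" (greyscale) image.
    alpha = m.resize(size).getchannel("A")
    # Invert so the opaque shape is black on a white background.
    return ImageOps.invert(alpha)
```

The result has mode "L", which Pillow treats as 8-bit greyscale.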
Image preprocessing is likely as important as training the model itself. The easy part of image processing is that it can be easily distributed using frameworks like MapReduce, Spark and others. Using Python is probably the easiest way for most data professionals, and we will find an opportunity in the future to demonstrate how to speed up the data cleaning.