SQLite – Architecture

I am planning to spend some time learning the internals of an open source database, and SQLite has been recommended to me as a “light” and well-written tool to get started with. Thanks to the SQLite community, there are several good tutorials, articles and thorough technical documents available on SQLite’s documentation page.

I started with the “Architecture of SQLite” article, which provides a very good high-level overview of how the components of SQLite fit together.

[Figure: SQLite architecture diagram]

Here is a great diagram highlighting the ecosystem of SQLite and its major components. At first glance, there are two major components that I am specifically interested in. The first is the SQL Compiler. I have never taken a true computer science class, and I assume “compilers” is a core class that covers this, so a better understanding of how a SQL compiler works will be a good starting point. The second is the Backend, which is the true “meat” of any database: how the data is laid out on disk efficiently, how searches happen, and how the layout can be optimized using canonical data structures and algorithms to provide outstanding performance. All of that is extremely interesting to me.
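One quick way to peek at what the SQL compiler hands over to the rest of the engine is SQLite’s EXPLAIN statement, which lists the bytecode program generated for a query. Below is a minimal sketch using Python’s built-in sqlite3 module; the table `t` and the query are made up purely for illustration, and the exact opcodes you see will vary between SQLite versions.

```python
import sqlite3

# In-memory database with a hypothetical table, just to have something to compile against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (name) VALUES (?)", [("a",), ("b",)])

# EXPLAIN returns one row per bytecode instruction:
# (addr, opcode, p1, p2, p3, p4, p5, comment)
program = conn.execute("EXPLAIN SELECT name FROM t WHERE id = 2").fetchall()
for addr, opcode, *operands in program:
    print(addr, opcode, operands)

opcodes = [row[1] for row in program]
conn.close()
```

Each printed row is one instruction for SQLite’s virtual machine, so the listing is roughly the boundary between the compiler half and the backend half of the diagram.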

I am not going to reiterate in this post too much of what that article has already stated. I will start with the Tokenizer in the next chapter, and hopefully come away with a better understanding of what a tokenizer is and how it is implemented in SQLite.

inode – metadata about a Linux VFS file

In the Linux virtual file system, every file or directory is mapped to an object called an “inode”. It is a data structure that stores the necessary metadata about the file, rather than the content of the file itself.
The YouTube video tutorial made by “theurbanpenguin” was super helpful for me in learning what an inode is.

A few handy commands:

ln file1 file2  # create a hard link file2 that points to the same inode (and content) as file1
stat file1      # print the inode information for file1
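The same experiment can be done from Python, which makes it easy to check that a hard link really shares the inode of the original file. This is only a small sketch: the file names are placeholders inside a temporary directory, and it assumes a file system that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, "file1")
    file2 = os.path.join(tmp, "file2")

    with open(file1, "w") as f:
        f.write("hello\n")

    # Equivalent of: ln file1 file2
    os.link(file1, file2)

    # Equivalent of: stat file1 / stat file2
    st1 = os.stat(file1)
    st2 = os.stat(file2)

    same_inode = st1.st_ino == st2.st_ino  # both names point at one inode
    link_count = st1.st_nlink              # number of names for that inode
    print(st1.st_ino, st2.st_ino, link_count)
```

The inode numbers match and the link count goes up by one, because a hard link is just a second directory entry for the same inode.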

Theano Installation on Windows

I am trying to install Theano on my Windows desktop so that I can use the GPU there.

I am following this tutorial and IT DOES NOT WORK out of the box!

I don’t think it was a complete waste of time, but going through that detailed tutorial was something of a data mining problem. I found fewer than 5 lines of it helpful, and they are included below.

I have already set up CUDA in Visual Studio before, so please refer to this post for the work that has already been done.

What I did for Theano only:

  1. Install Anaconda Python
  2. pip install theano
  3. conda install mingw libpython
  4. Download and install the Windows 10 SDK
  5. Add the following system environment variables (thanks to this NVIDIA post)
    1. LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.14393.0\um\x64;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.14393.0\ucrt\x64
    2. INCLUDE=C:\Program Files (x86)\Windows Kits\10\Include\10.0.14393.0\ucrt
  6. Create .theanorc.txt with the following content:
    [global]
    device = gpu
    floatX = float32

    [nvcc]
    flags=-LC:\Anaconda\libs
    compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin

    # Thanks to this Stackoverflow post

And here is a screenshot of it working!

[Screenshot: Theano running on the GPU]

Pay attention to the time.

CUDA – By Example – Julia set

I bought the book “CUDA by Example” by Jason Sanders and Edward Kandrot. It is a pretty well-written book that starts from hello world and really gets into how CUDA C works. For someone like me, who has not touched C for more than 5 years since college, it is a good choice.

Without purchasing the book, you can also start exploring CUDA’s capabilities by downloading the examples from the book’s website.

The following is one of the chapter examples, in which you program the GPU in parallel to calculate and plot the Julia set.

However, time flies, so you might need to tinker a bit, since the authors’ source code is not compatible with the latest CUDA toolkit.

Here are a few things I had to do on my side:

  1. add __device__ to the cuComplex constructor, which was missing.
  2. use <<<N, 1>>> instead of <<N,1>> for the kernel launch.
  3. tweak the environment variables in Visual Studio Community, thanks to this Stack Overflow post.
  4. in the end, I still had to manually drop glut32.dll into the debug folder where the .exe file resides.

[Figure: Julia set plotted using the GPU]

TODO:

I think I got the idea of parallel processing from this example. However, I tried tweaking the parameters in the hope that the performance difference between CPU and GPU would become obvious, and I did not manage to get a satisfying result: all the jobs finish really fast regardless of how large I make N or how far I raise the for loop count from 200. I need to do more work.
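For a rough CPU-side baseline to compare against, here is a NumPy sketch of the same escape-time calculation with a simple timer around it. Treat the constants as assumptions: c = -0.8 + 0.156i, the 1.5 scale and the escape limit of 1000 are what I recall from the book’s example, and `julia_mask` is a name I made up for this sketch. Only the 200 iterations come from the text above.

```python
import time
import numpy as np

def julia_mask(dim, c=complex(-0.8, 0.156), scale=1.5, max_iter=200, limit=1000.0):
    """Return a (dim, dim) boolean array: True where the point has not escaped."""
    half = dim / 2.0
    # Map pixel indices to coordinates in roughly [-scale, scale].
    coords = scale * (half - np.arange(dim)) / half
    # Real part varies along columns, imaginary part along rows.
    z = coords[np.newaxis, :] + 1j * coords[:, np.newaxis]
    inside = np.ones((dim, dim), dtype=bool)
    for _ in range(max_iter):
        # Only iterate points that are still inside; escaped points are frozen.
        z[inside] = z[inside] ** 2 + c
        inside &= (z.real ** 2 + z.imag ** 2) <= limit
    return inside

# Time a few image sizes to see how the CPU cost grows with N.
for dim in (64, 128, 256):
    start = time.perf_counter()
    mask = julia_mask(dim)
    elapsed = time.perf_counter() - start
    print(f"dim={dim}: {mask.sum()} points inside, {elapsed:.3f} s")
```

Timing only the computation, without the drawing and the copy back from the device, is probably a fairer way to compare against the kernel.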

Nvidia GeForce GTX 1050 Ti

I purchased this EVGA GTX 1050 Ti and it was just delivered. The installation was pretty straightforward, nothing more complex than opening the case and plugging in the new card. After you install the driver and the development kit, you should be good to go.

I confirmed that my installation is correct by running deviceQuery.exe and also by creating a project in Visual Studio.

[Screenshots: CUDA project in Visual Studio and deviceQuery output]

More about GPU programming coming in the future!

Lena – the origin of the de facto test image

(Warning: this post might contain nudity, which is inappropriate for an underage audience)

Many of you have probably seen or worked with this image at some point while working with image and signal processing.

[Image: the Lena test image]

It is a widely used image in the academic world and, of course, has been cited or referenced countless times in all kinds of video and blog tutorials. But have you ever asked who the young lady in this picture is, why everyone uses this picture, and how this image even made its way into such a dull, male-dominated circle? 🙂

1. Who is she?

This lady is Lena Söderberg, who appeared in Playboy magazine back in the 1970s. That photo was taken 40 to 50 years ago! It is actually not that hard to find copies of the original magazine photo on the public internet.

top-supermodels-of-computer-graphics-28-728

2. How and Why her?

In a nutshell, a gentleman named Alexander Sawchuk happened to come across a copy of the magazine and cropped the part above the shoulders to be scanned and later used in a paper.

“The Original Sin

Alexander Sawchuk estimates that it was in June or July of 1973 when he, then an assistant professor of electrical engineering at the USC Signal and Image Processing Institute (SIPI), along with a graduate student and the SIPI lab manager, was hurriedly searching the lab for a good image to scan for a colleague’s conference paper. They had tired of their stock of usual test images, dull stuff dating back to television standards work in the early 1960s. They wanted something glossy to ensure good output dynamic range, and they wanted a human face. Just then, somebody happened to walk in with a recent issue of Playboy.

The engineers tore away the top third of the centerfold so they could wrap it around the drum of their Muirhead wirephoto scanner, which they had outfitted with analog-to-digital converters (one each for the red, green, and blue channels) and a Hewlett Packard 2100 minicomputer. The Muirhead had a fixed resolution of 100 lines per inch and the engineers wanted a 512 ✕ 512 image, so they limited the scan to the top 5.12 inches of the picture, effectively cropping it at the subject’s shoulders.” – CMU

There is also a very good presentation on SlideShare that covers the origins of, and the interesting stories behind, some of the popular or even famous test images.

Anyway, the next time you see that image, you might infuse your research with a bit more imagination, now that you know the story behind it.

References:

[1] Lena Story

[2] Professional Communication Society Newsletter

[3] Lenna Wikipedia

[4] Top Model in Computer Graphics

A few Numpy Functions

Today I was taking the deep learning course from Udacity. They use numpy very often in that class, so here are some notes on a few handy numpy functions that I learned.

np.arange

arange is not a typo. It is a function very much like the built-in range function, but it returns an ndarray instead of a list, which is probably why it is called a(rray)range. Unlike range, it also accepts non-integer steps. Here is the source code for arange in case you are interested.

In [33]: np.arange(-1, 1, 0.5)
Out[33]: array([-1. , -0.5,  0. ,  0.5])
In [51]: np.arange(12).reshape((3,4))

Out[51]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
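To make the comparison with the built-in range concrete, here is a small side-by-side sketch. The variable names are mine, and np.linspace is thrown in only as a related function for when you know the number of points rather than the step.

```python
import numpy as np

# Built-in range gives a lazy sequence of ints; np.arange gives an ndarray.
as_list = list(range(0, 6, 2))
as_array = np.arange(0, 6, 2)

# np.arange accepts a float step; the stop value is excluded, as with range.
halves = np.arange(-1, 1, 0.5)

# np.linspace takes a number of points instead of a step and includes the endpoint.
points = np.linspace(-1, 1, 5)

print(as_list, as_array, halves, points)
```

With float steps the number of elements can occasionally be off by one because of rounding, which is why linspace is often the safer choice there.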

np.vstack

vstack, short for v(ertically) stack, stacks all the passed ndarrays/lists row by row into a new array.

In [37]: np.vstack([[1,2], [3,4], [5,6], [7,8]])

Out[37]: 
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

There are also sibling utility functions such as hstack and concatenate that have very similar usage.

In [38]: np.hstack([[1,2], [3,4], [5,6]])
Out[38]: array([1, 2, 3, 4, 5, 6])
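The difference between these functions is easier to see on 2-D inputs. Here is a small sketch with two made-up 2x2 arrays, `a` and `b`, showing that vstack and hstack behave like concatenate along axis 0 and axis 1 respectively.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

stacked_rows = np.vstack([a, b])   # shape (4, 2): b's rows go below a's
stacked_cols = np.hstack([a, b])   # shape (2, 4): b's columns go to the right of a's

# For 2-D arrays, concatenate with an explicit axis gives the same results.
same_as_vstack = np.concatenate([a, b], axis=0)
same_as_hstack = np.concatenate([a, b], axis=1)

print(stacked_rows)
print(stacked_cols)
```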

np.sum

sum is pretty straightforward: it adds up all the numbers. However, one pitfall I fell into was not paying attention to the ‘axis’ argument. axis=0 sums down each column (collapsing the rows), and axis=1 sums across each row (collapsing the columns).

In [53]: x

Out[53]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [54]: x.sum(0)

Out[54]: array([12, 15, 18, 21])

In [55]: x.sum(1)

Out[55]: array([ 6, 22, 38])
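Spelling out the axis keyword makes the intent easier to read. Here is a short sketch on the same 3x4 array; keepdims is an extra I find handy, since it keeps the result 2-D so it can be broadcast back against x.

```python
import numpy as np

x = np.arange(12).reshape((3, 4))

total = x.sum()                 # no axis: sum of every element
col_sums = x.sum(axis=0)        # collapse the rows: one sum per column
row_sums = x.sum(axis=1)        # collapse the columns: one sum per row

# keepdims=True keeps the collapsed axis with length 1, giving shape (3, 1).
row_sums_2d = x.sum(axis=1, keepdims=True)

print(total, col_sums, row_sums, row_sums_2d.shape)
```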