Theano Installation on Windows

I am trying to install Theano on my Windows desktop and be able to use the GPU there.

I am following this tutorial and it DOES NOT WORK out of the box!

I don’t think it is a complete waste of time, but going through that detailed tutorial is something of a data-mining problem. I found fewer than five lines of it helpful, and they are included below.

I have already set up CUDA in visual studio before, so please refer to this post for the work that has already been done.

What I did just for Theano:

  1. Install Anaconda Python
  2. pip install theano
  3. conda install mingw libpython
  4. Download the Windows 10 SDK and install it
  5. Add the following system environment variables (thanks to this NVIDIA post)
    1. LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.14393.0\um\x64;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.14393.0\ucrt\x64
    2. INCLUDE=C:\Program Files (x86)\Windows Kits\10\Include\10.0.14393.0\ucrt
  6. Create .theanorc.txt with the following contents:

    [global]
    device = gpu
    floatX = float32

    [nvcc]
    compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin

    # Thanks to this Stackoverflow post


And here is a screenshot of it working!



Pay attention to the time.

CUDA – By Example – Julia set

I bought the book “CUDA by Example” by Jason Sanders and Edward Kandrot. It is a pretty well-written book: they start from hello world and really get into how CUDA C works. For someone who has not touched C for more than five years since college, it is a good choice.

Without purchasing that book, you can also start exploring the CUDA capability by downloading the examples from the book’s website.

The following is one of the examples from the book, where you program the GPU in parallel to calculate and plot the Julia set.

However, time flies, so you might need to tinker around a bit, since the source code from the author is not compatible with the latest CUDA toolkit.

Here are a few things I had to do on my side:

  1. add __device__ to the cuComplex constructor, which was missing before.
  2. use <<<N, 1>>> instead of <<N,1>>
  3. tweak the environment variables in Visual Studio Community Express, thanks to this stackoverflow post.
  4. in the end, I still had to manually drop glut32.dll into the debug folder where the .exe file resides.

Julia set plotted using the GPU


I think I got the idea of parallel processing from this example. I tried tweaking the parameters, hoping the performance difference between CPU and GPU would become obvious, but I did not manage to get a satisfying result: all the jobs finish really fast regardless of how large I make N or how far I raise the for loop from 200. Need to do more work.
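For a CPU baseline to compare the kernel against, here is a rough numpy sketch of the same Julia-set iteration. This is my own re-implementation, not the book's CUDA code; the constant c = -0.8 + 0.156i and the escape threshold of 1000 are taken from the book's example, everything else (names, grid layout) is my choice.

```python
import time
import numpy as np

def julia_cpu(n, c=-0.8 + 0.156j, iters=200, scale=1.5):
    # Complex grid over [-scale, scale] x [-scale, scale]
    y, x = np.ogrid[-scale:scale:n * 1j, -scale:scale:n * 1j]
    z = (x + 1j * y).astype(np.complex128)
    in_set = np.ones(z.shape, dtype=bool)  # points that have not escaped yet
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(iters):
            z = np.where(in_set, z * z + c, z)
            in_set &= np.abs(z) < 1000  # same escape threshold as the book
    return in_set

n = 512
t0 = time.time()
img = julia_cpu(n)
print("N=%d: %.3fs, %d pixels still in the set" % (n, time.time() - t0, int(img.sum())))
```

Timing this for growing n against the GPU version should make the gap visible sooner than the book's fixed-size example does.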

Nvidia GeForce GTX 1050 Ti

I purchased this EVGA GTX 1050 Ti and it was recently delivered. The installation was pretty straightforward: nothing more complex than opening the case and plugging in the new card. After you install the driver and the development kit, you should be good to go.

I confirmed that my installation is correct by running the deviceQuery.exe and also by creating a project in VisualStudio.



More about GPU programming coming in the future!

Lena – the origin about the de facto test image

(Warning: this post might contain nudity which is inappropriate for underage audience) 

Many of you have probably seen or worked with this image at some point in your experience with image and signal processing.


It is a widely used image in the academic world and, of course, has been cited or referenced countless times in all kinds of video and blog tutorials. But have you ever asked who the young lady in this picture is, why everyone uses this picture, and even how this image found its way into such a dull and male-dominated circle 🙂

1. Who is she?

This lady is Lena Söderberg, who appeared in Playboy magazine back in the 1970s. That photo was taken almost 40 to 50 years ago! It is actually not that hard to find the raw magazine photo out on the public internet.


2. How and why her?

In a nutshell, a gentleman named Alexander Sawchuk happened upon a copy of the magazine and cropped the piece above the shoulders to be scanned, which was then used in a paper later on.

“The Original Sin

Alexander Sawchuk estimates that it was in June or July of 1973 when he, then an assistant professor of electrical engineering at the USC Signal and Image Processing Institute (SIPI), along with a graduate student and the SIPI lab manager, was hurriedly searching the lab for a good image to scan for a colleague’s conference paper. They had tired of their stock of usual test images, dull stuff dating back to television standards work in the early 1960s. They wanted something glossy to ensure good output dynamic range, and they wanted a human face. Just then, somebody happened to walk in with a recent issue of Playboy.

The engineers tore away the top third of the centerfold so they could wrap it around the drum of their Muirhead wirephoto scanner, which they had outfitted with analog-to-digital converters (one each for the red, green, and blue channels) and a Hewlett Packard 2100 minicomputer. The Muirhead had a fixed resolution of 100 lines per inch and the engineers wanted a 512 ✕ 512 image, so they limited the scan to the top 5.12 inches of the picture, effectively cropping it at the subject’s shoulders.” – CMU

There is also a very good presentation on slideshare where you can find the origins of all the interesting stories behind some of the popular or even famous images.

Anyway, the next time you see that image, you might infuse your research with a bit more imagination, knowing the story behind it.


[1] Lena Story

[2] Professional Communication Society Newsletter

[3] Lenna Wikipedia

[4] Top Model in Computer Graphics

A few Numpy Functions

Today I was taking the deep learning course from Udacity. They use numpy very often in that class, and here are my notes on a few handy numpy functions I learned.


arange is not a typo; it is simply a function very much like the built-in range, but it returns an ndarray instead of a list, which is probably why it is called a(rray)range. Here is the source code for arange in case you are interested.

In [33]: np.arange(-1, 1, 0.5)
Out[33]: array([-1. , -0.5,  0. ,  0.5])

In [51]: np.arange(12).reshape((3,4))
Out[51]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])


v(ertically)stack stacks all the passed ndarrays/lists into a new array.

In [37]: np.vstack([[1,2], [3,4], [5,6], [7,8]])
Out[37]:
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

At the same time, there are sibling utility functions like hstack, concatenate, etc. that have very similar usage.

In [38]: np.hstack([[1,2], [3,4], [5,6]])
Out[38]: array([1, 2, 3, 4, 5, 6])
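For instance, concatenate generalizes both vstack and hstack through its axis argument; a quick check of my own:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis=0 stacks rows (like vstack), axis=1 stacks columns (like hstack)
print(np.concatenate([a, b], axis=0))
print(np.concatenate([a, b], axis=1))
```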


sum is pretty straightforward: it sums up all the numbers. One pitfall I fell into, however, was not paying attention to the argument ‘axis‘: 0 means column-wise and 1 means row-wise.

In [53]: x
Out[53]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [54]: x.sum(0)
Out[54]: array([12, 15, 18, 21])

In [55]: x.sum(1)
Out[55]: array([ 6, 22, 38])


WAV – Deepdive into file format

WAV (WAVE, waveform audio file format) is one of the most popular audio file formats out there. Understanding how the format is structured and how the data is stored is the key to understanding audio. This blog post is my study note on the file format and a deep dive into a WAV file by looking at the hex code.

Here is the wiki page of what WAV is in general, and here is another tutorial from topherlee that really mapped out the file structure of a wav file. You can download a free sample wav from a site called wavsource for later analysis. (I downloaded the about_time.wav from here).

1. View Binary File

First, we need to figure out a way to view binary files. There are many tools out there, but I found a really cool way on stackoverflow. If you open the file in VIM, it will look like this:


Then you can run the command :% ! xxd, you will have a view like this:


There is nothing magical about VIM here; it works because of a Linux command called xxd that most people don’t know about. Diving into xxd is out of the scope of this post; that one line is all you need. To quit the hex view, run `:% ! xxd -r` and you are good to go.
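If you are not inside VIM, the same peek takes only a few lines of Python. A minimal sketch of my own (note that the sep argument of binascii.hexlify needs Python 3.8+):

```python
import binascii

def hexdump(data, width=16):
    """Format bytes roughly the way xxd does: offset, then hex pairs."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        lines.append("%08x: %s" % (i, binascii.hexlify(chunk, " ").decode()))
    return "\n".join(lines)

# First bytes of a canonical WAV header: "RIFF", a size field (zeroed here
# purely for illustration), then "WAVE"
print(hexdump(b"RIFF\x00\x00\x00\x00WAVEfmt "))
```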

2. Understanding the format header

Like most file formats, the first few bytes are a metadata header describing the file. I found a really helpful website that explains the structure of the WAV format. The following image shows the canonical WAVE file format in a simple way.


Attached is an image that I made 🙂 that explained every byte in the WAV file.


  1. 10 00 00 00 (little endian) ~ 0x00000010 ~ 16 (dec): subchunk 1 size
  2. 01 00 (little endian) ~ 0x0001 ~ 1: audio format, PCM
  3. 01 00 (little endian) ~ 0x0001 ~ 1: number of channels
  4. 11 2B 00 00 (little endian) ~ 0x00002B11 ~ 11,025 (dec): sample rate
  5. 08 00 (little endian) ~ 0x0008 ~ 8 (dec): bits per sample
  6. 80 70 00 00 (little endian) ~ 0x00007080 ~ 28,800 (dec): subchunk 2 size

A quick summary of all the metadata:

  • channels: 1
  • file size: 29 KB
  • 8 bits per sample
  • PCM audio format
  • sample rate: 11,025 (samples per second)
  • byte rate: 11,025

A few interesting math equations to help you understand those terminologies better:

  • Byte Rate = (Sample Rate * BitsPerSample * Channels) / 8
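The formula checks out against the numbers in the summary above:

```python
# Byte rate = (sample rate * bits per sample * channels) / 8
sample_rate = 11025      # samples per second
bits_per_sample = 8
channels = 1

byte_rate = sample_rate * bits_per_sample * channels // 8
print(byte_rate)  # → 11025, matching the byte rate in the metadata summary
```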

3. Understanding data format

First we need to refresh our freshman memory: what is a “signed short”? A short is simply a two-byte representation of a number, not as long as a four-byte integer, but good enough to represent our audio data: 16 bits is all we need. As a signed short, a positive integer is stored as it is; a negative number, however, is stored as its two’s complement. So what is “two’s complement”? Simply subtract the absolute value of the negative number from the range, in this case 2^16, i.e. the signed short of -N is 2^16 – N.

For example, for -32640 the signed-short encoding will be 2^16 – 32640 = 32896 (decimal) = 1000 0000 1000 0000 (binary) = 80 80 (hex).

One more example: 32639 stays as 32639 (decimal) = 0111 1111 0111 1111 (binary) = 7F 7F (hex).
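Both examples can be checked with Python's struct module, which does the two's-complement packing for you:

```python
import struct

# '<h' packs a little-endian signed short (2 bytes)
print(struct.pack("<h", -32640).hex())  # → '8080'
print(struct.pack("<h", 32639).hex())   # → '7f7f'

# And the two's complement done by hand: 2**16 - 32640
print(hex(2 ** 16 - 32640))  # → '0x8080'
```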

Now that we understand how to map a number in the range (-32,768, 32,767) to byte codes on disk and in memory, we can start looking into the data.

Here is some Python code to read in the WAV file and extract the data part.
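A minimal sketch with the standard library wave module. So that it runs anywhere, I build a tiny mono, 8-bit, 11025 Hz file in memory rather than shipping about_time.wav; with the real file you would simply call wave.open("about_time.wav", "rb").

```python
import io
import wave

# Build a tiny stand-in WAV in memory: 14 samples of 0x80, then one 0x7F
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(1)        # 1 byte = 8 bits per sample
    w.setframerate(11025)    # samples per second
    w.writeframes(bytes([0x80] * 14 + [0x7F]))

# Read it back and extract the metadata and the data chunk
buf.seek(0)
with wave.open(buf, "rb") as w:
    channels = w.getnchannels()
    sampwidth = w.getsampwidth()
    framerate = w.getframerate()
    nframes = w.getnframes()
    data = w.readframes(nframes)

print(channels, sampwidth, framerate, nframes)  # → 1 1 11025 15
print(data.hex())  # 14 bytes of 80, then 7f
```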

Looking into the data, we can see the first 14 samples are all -32640, and then the 15th sample changes to 32639. Based on the examples above, we know the byte representation of -32640 is 8080 (hex) and of 32639 is 7f7f (hex). Now we can revisit our VIM xxd view of the raw WAV file. We have covered the metadata part of the WAV file, and starting from the data chunk we can clearly see ‘8080’ repeated 14 times and then ‘7f7f’, and so on.

Now that we understand all the nitty-gritty details of WAV, let’s take a look at the data part and get a more intuitive understanding of “about_time”. The length of the data is 14400, which means 14400 samples at 16 bits per sample. From section one, we know the sample rate is 11025 samples/second, so theoretically the total length of the audio file should be around 14400/11025 = 1.3 seconds.

TODO: the actual length is about 3 seconds; something is wrong here.


Here I am plotting the first 500 samples, and you can easily “see” the sound. A few tips for interpreting the lines: really high and really low means loud; really “dense” means high frequency, i.e. high pitch. Of course, this is not a single sound, but a mixture.


Here is a histogram of the distribution of all the sample values. I don’t quite understand what exactly the value means. The magnitude of the sound? If that is the case, does that mean most of the values are either too loud or too quiet? I assume in this case it is simply the voice of a male reading some words, so mostly it will be either the guy’s voice or the quiet moments…

TODO: need to look into what the range (-32,768, 32,767) means.

In the end, here is a spectrum graph, for which I borrowed the source code mostly from here.


Well, this is what I have done so far; I need to do more research to build a better intuitive understanding of everything.


Python – Profiling cProfile

Recently I had a project with lots of raw data (time series data), where the output needed to be higher-level statistics, which requires aggregating the raw data. So here are the choices I had: either, for every request, pull the raw data and calculate on the fly, or preprocess all the raw data and store the stats somewhere else, so that when the user needs the data it is simply a lookup.

Since the data is pretty big and updated daily, batch preprocessing all of it is like boiling the ocean, and what is even worse, we would need to reboil the ocean every day. There is also the possibility that the user who requested this service won’t use that much data that frequently, which would result in a huge waste of computing power. On the other hand, calculating on the fly faces other challenges: you need to ensure your logic is well written and generic enough to succeed for all the parts. Most importantly, the performance needs to be fast enough to serve as a service. Historically, I wrote my Python code in the style of “hmm, it is fast” or “hmm, it is taking a long time”, with nothing more than the Linux “time python” command. Now I face the challenge of turning that calculation into something fast at a service level (<100ms). With a quantitative understanding of how much time each step takes and where the bottleneck is, we can strategically improve certain parts without switching to other programming languages (C, Java).

Then I learned that this type of analysis is called profiling:

“A profile is a set of statistics that describes how often and for how long various parts of the program executed.” – Python Documentation

cProfile is the de facto profiling tool for benchmarking Python code. It is not the most user-friendly tool, but once you have spent some time on it and become familiar with its syntax, you will have something like the Linux top command, but for your Python code.

import cProfile
cProfile.run('range(10)')
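Going one small step further, you can profile a real workload and sort the report with pstats. A sketch of my own; fib and the sort key are just illustrative choices:

```python
import cProfile
import io
import pstats

def fib(n):
    # deliberately slow recursive version, so there is something to measure
    return n if n < 2 else fib(n - 1) + fib(n - 2)

pr = cProfile.Profile()
pr.enable()
fib(20)
pr.disable()

out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(10)
print(out.getvalue())  # top 10 entries, sorted by cumulative time
```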

Either the python documentation or pyMOTW can help you get started quickly. Then I came across a blog post from Julien Danjou – Profiling Python using cProfile: a concrete case which introduced me to KCacheGrind.

If you are a Mac user like me, brew install everything following this instruction.

pip install pyprof2calltree, and then you will be good to go.

pyprof2calltree -k -i file_profile

In the end, you will have a beautiful visualized way of how long each step takes.
