Theano Installation on Windows

I am trying to install Theano on my Windows desktop and use the GPU from it.

I am following this tutorial and IT DOES NOT WORK out of the box!

I don’t think it is a complete waste of time, but going through that detailed tutorial is sort of a data mining problem. I found fewer than 5 lines of it helpful, and they are included below.

I had already set up CUDA in Visual Studio before this, so please refer to this post for the work that was already done.

What I did only for Theano:

  1. Install Anaconda Python
  2. pip install theano
  3. conda install mingw libpython
  4. Download the Windows 10 SDK and install it
  5. Add the following system environment variables (thanks to this NVIDIA post)
    1. LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.14393.0\um\x64;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.14393.0\ucrt\x64
    2. INCLUDE=C:\Program Files (x86)\Windows Kits\10\Include\10.0.14393.0\ucrt
  6. .theanorc.txt:
    [global]
    device = gpu
    floatX = float32
    [nvcc]
    flags=-LC:\Anaconda\libs
    compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin

    # Thanks to this Stackoverflow post
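The config file in step 6 uses INI syntax, so each section header ([global], [nvcc]) has to sit on its own line. Here is a minimal sketch for sanity-checking that layout with Python's standard configparser before starting Theano. The helper name parse_theanorc is made up for this illustration, the text is pasted in as a string rather than read from disk, and nothing here needs Theano installed.

```python
import configparser

# Contents of .theanorc.txt from step 6, flush left, one section header per line.
THEANORC = r"""
[global]
device = gpu
floatX = float32

[nvcc]
flags=-LC:\Anaconda\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin
"""


def parse_theanorc(text):
    """Parse .theanorc-style INI text into a nested dict (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


settings = parse_theanorc(THEANORC)
for section, options in settings.items():
    print(section, options)
```

If the two section headers get glued onto the neighbouring lines, the parse either fails or the [nvcc] options end up missing, which is an easy mistake to make when copying the file from a web page.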

 

And here is a screenshot of it working!

[Screenshot: gpu]

 

Pay attention to the time.

CUDA – By Example – Julia set

I bought a book, “CUDA by Example”, written by Jason Sanders and Edward Kandrot. It is a pretty well written book: the authors start from hello world and really get into how CUDA C works. For someone who has not touched C for more than 5 years since college, it is a good choice.

Even without purchasing the book, you can start exploring what CUDA can do by downloading the examples from the book’s website.

The following is one of the examples from the book, in which you program the GPU in parallel to calculate and plot the Julia set.

However, time flies, so you might need to tinker a bit, since the authors’ source code is not compatible with the latest CUDA toolkit.

Here are a few things that I had to do on my side:

  1. Add __device__ to the cuComplex constructor, where it was missing.
  2. Use <<<N, 1>>> instead of <<N,1>> for the kernel launch.
  3. Tweak the environment variables in Visual Studio Community Express, thanks to this Stack Overflow post.
  4. In the end, I still had to manually drop glut32.dll into the debug folder where the .exe file resides.

[Image: julia]

Julia set plotted using the GPU

TODO:

I think I got the idea of parallel processing from this example. However, when I tweaked the parameters hoping to make the performance difference between CPU and GPU obvious, I did not get a satisfying result: all the jobs finish really fast regardless of how large I make N or how far I raise the for loop count above 200. I need to do more work.
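One possible starting point for that comparison is a plain CPU baseline with explicit timing, so there is a number to put next to the GPU run. The sketch below is pure Python rather than the book's C code. The scale (1.5), the constant c = -0.8 + 0.156i and the escape threshold (1000) are written from my memory of the book's example and should be treated as assumptions. The names julia, render and time_render are just labels for this illustration.

```python
import time


def julia(x, y, dim, max_iter=200):
    """Return 1 if pixel (x, y) stays bounded for max_iter steps, else 0."""
    scale = 1.5                      # assumed zoom factor
    half = dim / 2
    a = complex(scale * (half - x) / half, scale * (half - y) / half)
    c = complex(-0.8, 0.156)         # assumed Julia constant
    for _ in range(max_iter):
        a = a * a + c
        if a.real * a.real + a.imag * a.imag > 1000:  # assumed escape threshold
            return 0
    return 1


def render(dim, max_iter=200):
    """Compute every pixel of a dim x dim image sequentially on the CPU."""
    return [julia(x, y, dim, max_iter) for y in range(dim) for x in range(dim)]


def time_render(dim, max_iter=200):
    """Return (elapsed_seconds, pixels) for one sequential render."""
    start = time.perf_counter()
    pixels = render(dim, max_iter)
    return time.perf_counter() - start, pixels


for dim in (32, 64):
    elapsed, pixels = time_render(dim)
    print(f"dim={dim}: {elapsed:.4f} s, {sum(pixels)} bounded pixels")
```

The work grows with dim squared times the iteration count, so raising both in step and timing only the compute part (not window creation or drawing) is probably the fairest way to see where the two start to diverge.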