I bought the book “CUDA by Example” by Jason Sanders and Edward Kandrot. It is a well-written book that starts from hello world and really gets into how CUDA C works. For someone who has not touched C for more than 5 years since college, it is a good choice.
Even without buying the book, you can start exploring CUDA by downloading the examples from the book’s website.
The following is one of the chapter examples, in which you program the GPU in parallel to calculate and plot the Julia set.
However, time flies, so you might need to tinker a bit, since the authors' source code is not compatible with the latest CUDA toolkit.
Here are a few things I had to do on my side:
- add __device__ to the cuComplex constructor, which was missing in the original code.
- use <<<N, 1>>> instead of <<N,1>> in the kernel launch.
- tweak the environment variable in Visual Studio Community Express, thanks to this Stack Overflow post.
- in the end, I still had to manually drop glut32.dll into the debug folder where the .exe file resides.
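
To make the first two fixes concrete, here is a minimal sketch of what they can look like. It is not the authors' file: I left out the GLUT drawing part and only count the points, and DIM, the constants, and the helper count_julia_points are my own assumptions for illustration. The dim3 grid plays the role of N in the launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define DIM 256  // image side length; illustrative value, not taken from the book

struct cuComplex {
    float r, i;
    // Fix 1: the constructor is called from device code, so it needs __device__.
    __device__ cuComplex(float a, float b) : r(a), i(b) {}
    __device__ float magnitude2() const { return r * r + i * i; }
    __device__ cuComplex operator*(const cuComplex& a) const {
        return cuComplex(r * a.r - i * a.i, i * a.r + r * a.i);
    }
    __device__ cuComplex operator+(const cuComplex& a) const {
        return cuComplex(r + a.r, i + a.i);
    }
};

// Returns 1 if the pixel (x, y) stays bounded after 200 iterations, else 0.
__device__ int julia(int x, int y) {
    const float scale = 1.5f;
    float jx = scale * (float)(DIM / 2 - x) / (DIM / 2);
    float jy = scale * (float)(DIM / 2 - y) / (DIM / 2);
    cuComplex c(-0.8f, 0.156f);  // assumed constant for illustration
    cuComplex a(jx, jy);
    for (int i = 0; i < 200; i++) {
        a = a * a + c;
        if (a.magnitude2() > 1000.0f) return 0;
    }
    return 1;
}

// One block per pixel, one thread per block.
__global__ void kernel(unsigned char* ptr) {
    int x = blockIdx.x;
    int y = blockIdx.y;
    ptr[x + y * gridDim.x] = (unsigned char)julia(x, y);
}

// Hypothetical helper: runs the kernel and returns how many pixels stayed
// bounded, or -1 if any CUDA call failed.
int count_julia_points() {
    static unsigned char host_bitmap[DIM * DIM];
    unsigned char* dev_bitmap = nullptr;
    if (cudaMalloc((void**)&dev_bitmap, DIM * DIM) != cudaSuccess) return -1;

    dim3 grid(DIM, DIM);
    kernel<<<grid, 1>>>(dev_bitmap);  // Fix 2: three angle brackets on each side

    cudaError_t err = cudaMemcpy(host_bitmap, dev_bitmap, DIM * DIM,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_bitmap);
    if (err != cudaSuccess) return -1;

    int count = 0;
    for (int k = 0; k < DIM * DIM; k++) count += host_bitmap[k];
    return count;
}

int main() {
    int count = count_julia_points();
    if (count < 0) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }
    printf("%d of %d points stayed bounded\n", count, DIM * DIM);
    return 0;
}
```

This should build with something like nvcc julia_sketch.cu (the file name is made up) and does not need glut32.dll, which makes it a handy check that the toolkit itself is set up correctly before fighting with the Visual Studio settings.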

Julia set plotted using the GPU
TODO:
I think I got the idea of parallel processing from this example. However, I tried tweaking the parameters in the hope that the performance difference between the CPU and the GPU would become obvious, and I did not get a satisfying result: all the jobs finish really fast regardless of how large I make N or how far I raise the for loop's limit from 200. I need to do more work.
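
One way to make the comparison measurable is to time both versions explicitly instead of judging by eye. Below is a rough sketch, assuming CUDA events for the GPU side and std::chrono for the CPU side. The names julia_point, julia_kernel, time_gpu and time_cpu are hypothetical helpers of mine, not the book's code, and the sizes in main are arbitrary. Kernel launches are asynchronous with respect to the host, so the sketch waits on the stop event before reading the elapsed time.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: the same iteration as the example, written with plain
// floats so the CPU loop and the GPU kernel can share it.
__host__ __device__ int julia_point(int x, int y, int dim, int max_iter) {
    const float scale = 1.5f;
    float ar = scale * (float)(dim / 2 - x) / (dim / 2);
    float ai = scale * (float)(dim / 2 - y) / (dim / 2);
    for (int i = 0; i < max_iter; i++) {
        float nr = ar * ar - ai * ai - 0.8f;   // real part of a*a + c
        float ni = 2.0f * ar * ai + 0.156f;    // imaginary part of a*a + c
        ar = nr;
        ai = ni;
        if (ar * ar + ai * ai > 1000.0f) return 0;
    }
    return 1;
}

__global__ void julia_kernel(unsigned char* out, int dim, int max_iter) {
    int x = blockIdx.x;
    int y = blockIdx.y;
    out[x + y * gridDim.x] = (unsigned char)julia_point(x, y, dim, max_iter);
}

// Returns the kernel time in milliseconds, or -1 if a CUDA call failed.
float time_gpu(unsigned char* host_out, int dim, int max_iter) {
    size_t bytes = (size_t)dim * dim;
    unsigned char* dev_out = nullptr;
    if (cudaMalloc((void**)&dev_out, bytes) != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    dim3 grid(dim, dim);
    cudaEventRecord(start, 0);
    julia_kernel<<<grid, 1>>>(dev_out, dim, max_iter);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the kernel has really finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaError_t err = cudaMemcpy(host_out, dev_out, bytes, cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev_out);
    return err == cudaSuccess ? ms : -1.0f;
}

// Returns the time of the plain double loop on the CPU in milliseconds.
double time_cpu(unsigned char* out, int dim, int max_iter) {
    auto t0 = std::chrono::steady_clock::now();
    for (int y = 0; y < dim; y++)
        for (int x = 0; x < dim; x++)
            out[x + y * dim] = (unsigned char)julia_point(x, y, dim, max_iter);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int dim = 1024;       // arbitrary size for the experiment
    const int max_iter = 2000;  // raised from 200 to give each pixel more work
    std::vector<unsigned char> cpu((size_t)dim * dim), gpu((size_t)dim * dim);

    double cpu_ms = time_cpu(cpu.data(), dim, max_iter);
    float gpu_ms = time_gpu(gpu.data(), dim, max_iter);
    if (gpu_ms < 0.0f) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }
    printf("CPU: %.3f ms, GPU kernel: %.3f ms\n", cpu_ms, gpu_ms);
    return 0;
}
```

I have not drawn any conclusions from numbers like these yet. The first kernel launch in a program may carry some start-up cost, so running time_gpu twice and keeping the second reading is probably fairer. The launch still uses one thread per block, as in the example, which is likely not the fastest configuration.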