I was following the “High performance functions with Rcpp” chapter in Hadley Wickham’s Advanced R. I ran an experiment in R with two functions: myRowSum, written in C++, and an equivalent written in plain R. As you can see from the code, they are very similar: same variable names, same for loop, same logic, and so on. However, I was totally blown away by the difference in total running time.
I created a dummy matrix with 100,000 rows and 9 columns.
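To make the setup concrete, here is a minimal standalone C++ sketch of the kind of loop a row-sum function boils down to. It is not the original myRowSum: it deliberately avoids the Rcpp headers so it compiles without R, and it assumes the matrix is passed as a flat vector in column-major order (the layout R uses for matrices). The names `my_row_sum` and `make_dummy` are hypothetical, and the dummy values (row index plus column index) are chosen only so the result is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row sums of an nrow x ncol matrix stored column-major, as R stores it:
// element (i, j) lives at x[i + j * nrow].
// Hypothetical stand-in for the post's myRowSum, not the original code.
std::vector<double> my_row_sum(const std::vector<double>& x,
                               std::size_t nrow, std::size_t ncol) {
    assert(x.size() == nrow * ncol);  // guard against a mismatched shape
    std::vector<double> out(nrow);
    for (std::size_t i = 0; i < nrow; ++i) {
        double total = 0.0;
        for (std::size_t j = 0; j < ncol; ++j) {
            total += x[i + j * nrow];
        }
        out[i] = total;
    }
    return out;
}

// Hypothetical helper: builds a dummy matrix where element (i, j) = i + j.
std::vector<double> make_dummy(std::size_t nrow, std::size_t ncol) {
    std::vector<double> x(nrow * ncol);
    for (std::size_t j = 0; j < ncol; ++j) {
        for (std::size_t i = 0; i < nrow; ++i) {
            x[i + j * nrow] = static_cast<double>(i + j);
        }
    }
    return x;
}
```

Calling `my_row_sum(make_dummy(100000, 9), 100000, 9)` produces one sum per row for a matrix of the size used here. In an actual Rcpp version the flat vector would presumably be replaced by an `Rcpp::NumericMatrix` argument and the function marked with `// [[Rcpp::export]]`, but the nested loop stays essentially the same.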
Then I wondered how a `vectorized` function in R would compare with Rcpp. Again, Rcpp beat the apply function, this time after I increased the data to 1 million rows by 10 columns. The plain R for loop, meanwhile, was still running after more than 10 minutes, and I had to stop it because I had no idea how long it was going to take.
This experiment totally changed some of my impressions, and I started to understand why people really hate for loops in R. Again, all of this happened in R and I never left the R environment: Rcpp makes it easy to boost the performance of R to C++ level.
If you are using R and find it slow, don’t blame R, blame yourself!