select
mpn,
REGEXP_REPLACE(upper(mpn), '[^a-zA-Z0-9]+', '') as mpn_norm
from
…
Monthly Archives: December 2014
R – memory management
- m1 <- 1:(10^9)
- m2 <- 1:(10^9); this exceeded the 12GB of physical memory and the machine started swapping, so I terminated the R process.
- restart R
- m1 <- 1:(10^8)
- m2 <- 1:(10^8)
- m3 <- 1:(10^9); then I ran `remove(list=ls())`, which removed the objects from the environment, but the memory was not released.
- `gc()` clears the memory by running garbage collection in R.
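
The steps above can be sketched on a much smaller scale. This is a minimal illustration, not the original session: the vector length (10^6 instead of 10^8 or 10^9) and the use of `object.size()` are assumptions chosen so that it runs quickly on any machine.

```r
# Assumption: 10^6 elements instead of the 10^8 / 10^9 used in the notes above.
# "+ 0L" forces a fully materialized integer vector (about 4 bytes per element).
m1 <- seq_len(10^6) + 0L
m2 <- seq_len(10^6) + 0L
print(object.size(m1), units = "Mb")   # size of a single vector

# Remove the objects; rm(list = ls()) would clear the whole environment instead.
rm(m1, m2)

# Explicitly trigger a garbage collection; gc() returns a matrix that
# summarises the memory R is currently using.
mem <- gc()
print(mem)
```

Comparing the `gc()` output before and after the `rm()` call is one way to see how much memory was actually handed back.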
R – Should I go C++ or should I ditch the for loop in R?
I was following the “High performance functions with Rcpp” chapter in Hadley Wickham’s Advanced R. I ran an experiment with two functions: myRowSum, written in C++, and an equivalent written in plain R. The two are very similar: same variable names, same for loop, same logic, etc. However, I was totally blown away by the difference in total running time.
I created a dummy matrix with 100,000 rows and 9 columns each.
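
The original code is not reproduced here, so the following is only a sketch of what such a comparison might look like. Only the name `myRowSum` comes from the post; the function bodies, the name `my_row_sum_r`, and the random test matrix are assumptions. The C++ part is skipped if the Rcpp package or a C++ compiler is not available.

```r
# Plain-R row sum with explicit nested loops (hypothetical reconstruction).
my_row_sum_r <- function(x) {
  n_row <- nrow(x)
  n_col <- ncol(x)
  out <- numeric(n_row)              # preallocate the result vector
  for (i in seq_len(n_row)) {
    total <- 0
    for (j in seq_len(n_col)) {
      total <- total + x[i, j]
    }
    out[i] <- total
  }
  out
}

# The same logic in C++ (hypothetical body for the post's myRowSum).
cpp_src <- '
NumericVector myRowSum(NumericMatrix x) {
  int nrow = x.nrow(), ncol = x.ncol();
  NumericVector out(nrow);
  for (int i = 0; i < nrow; i++) {
    double total = 0;
    for (int j = 0; j < ncol; j++) {
      total += x(i, j);
    }
    out[i] = total;
  }
  return out;
}'

# Dummy matrix with 100,000 rows and 9 columns, as in the post.
set.seed(1)
m <- matrix(runif(100000 * 9), nrow = 100000, ncol = 9)

print(system.time(res_r <- my_row_sum_r(m)))

# Compile and time the C++ version only when Rcpp and a compiler are present.
have_cpp <- requireNamespace("Rcpp", quietly = TRUE) &&
  tryCatch({
    Rcpp::cppFunction(cpp_src, env = globalenv())
    TRUE
  }, error = function(e) FALSE)

if (have_cpp) {
  print(system.time(res_cpp <- myRowSum(m)))
  stopifnot(isTRUE(all.equal(res_r, res_cpp)))
}
```

Timings depend heavily on the machine and R version, so the printed numbers should be read as relative rather than as a reproduction of the original result.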
Then I wondered how a `vectorized` function in R would compare with Rcpp. Again, Rcpp beat the apply function, this time after I increased the data to 1 million rows by 10 columns. The plain R for loop had been running for more than 10 minutes when I had to stop it, because I had no idea how long it was going to take.
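
As a rough illustration of the pure-R side of that comparison, the sketch below times `apply()` against the built-in `rowSums()`. It is not the original benchmark: the row count is reduced to 100,000 (an assumption, to keep the run short), and `rowSums()` is added as a reference point that the post does not mention. Note that `apply()` still loops over the rows at R level, whereas `rowSums()` does the whole loop in compiled code.

```r
set.seed(42)
n_row <- 100000   # assumption: smaller than the 1 million rows used in the post
m <- matrix(runif(n_row * 10), nrow = n_row, ncol = 10)

# apply() calls sum() once per row from R.
t_apply <- system.time(res_apply <- apply(m, 1, sum))

# rowSums() performs the same computation in compiled code.
t_rowsums <- system.time(res_rowsums <- rowSums(m))

# Elapsed seconds for each approach.
print(c(apply = t_apply[["elapsed"]], rowSums = t_rowsums[["elapsed"]]))

# Both approaches should agree on the result.
stopifnot(isTRUE(all.equal(res_apply, res_rowsums)))
```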
This experiment totally changed some of my impressions, and I started to understand why people really hate for loops in R. Again, all of this happened inside R; I never left the R environment. Rcpp makes it easy to boost the performance of R code to C++ level.
If you are using R and find it slow, don’t blame R, blame yourself!





