Author Archives: datafireball
Business – EBITDA (Earnings Before Interest, Taxes, Depreciation And Amortization)
EBITDA = Revenue – Expenses (excluding Interest, Taxes, Depreciation and Amortization)
Enterprise Software: EDIHQ
ODS and Teradata
Hive – Normalize String to Alphanumeric character only using regular expression
select
mpn,
REGEXP_REPLACE(upper(mpn), '[^a-zA-Z0-9]+', '') as mpn_norm
from
…
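To check the pattern outside Hive, here is a minimal C++ sketch of the same normalization. The function name `normalize_mpn` is hypothetical, and `std::regex` (ECMAScript syntax) stands in for Hive's Java regex engine. For a simple character class like this one the two should behave the same, but that is an assumption, not something the query guarantees.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper mirroring
//   REGEXP_REPLACE(upper(mpn), '[^a-zA-Z0-9]+', '')
// Assumes ASCII input; std::toupper here uses the default "C" locale.
std::string normalize_mpn(const std::string& mpn) {
    // Step 1: upper-case the input, like upper(mpn).
    std::string upper = mpn;
    std::transform(upper.begin(), upper.end(), upper.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });

    // Step 2: drop every run of non-alphanumeric characters.
    static const std::regex non_alnum("[^a-zA-Z0-9]+");
    return std::regex_replace(upper, non_alnum, "");
}
```

For example, `normalize_mpn("ab-12/34 x")` returns `AB1234X`.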
R – memory management
- m1 <- 1:(10^9)
- m2 <- 1:(10^9): this exceeded the 12GB of physical memory and the machine started swapping, so I terminated the R process.
- restart R
- m1 <- 1:(10^8)
- m2 <- 1:(10^8)
- m3 <- 1:(10^9): I then ran `remove(list=ls())`, which removed the objects from the environment, but the memory was not released.
- gc(): clears the memory by running R's garbage collection.
R – Should I go C++ or should I ditch the for loop in R?
I was following the “High performance functions with Rcpp” chapter in Hadley Wickham’s Advanced R. I ran an experiment in R with two functions: myRowSum, written in C++, and an equivalent written in plain R. The two are very similar: same variable names, same for loop, same logic. However, I was totally blown away by the difference in total run time.
I created a dummy matrix with 100,000 rows and 9 columns.
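The original listing is not reproduced in this excerpt. As a rough illustration only, the loop logic of a row-sum function might look like the sketch below. It is plain C++ with no Rcpp dependency, so the name `myRowSumSketch`, the flat row-major `std::vector` layout, and the explicit row and column counts are all assumptions and not the author's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the post's myRowSum (not the original Rcpp code).
// The matrix is assumed to be stored row-major in a flat vector, so element
// (i, j) lives at index i * n_cols + j.
std::vector<double> myRowSumSketch(const std::vector<double>& values,
                                   std::size_t n_rows,
                                   std::size_t n_cols) {
    std::vector<double> sums(n_rows, 0.0);
    for (std::size_t i = 0; i < n_rows; ++i) {
        double total = 0.0;
        // Inner loop walks one row; this mirrors the plain for loop
        // one would write in R.
        for (std::size_t j = 0; j < n_cols; ++j) {
            total += values[i * n_cols + j];
        }
        sums[i] = total;
    }
    return sums;
}
```

Calling it on the 2 x 3 matrix `{1, 2, 3, 4, 5, 6}` gives the row sums 6 and 15. A real Rcpp version would take an `Rcpp::NumericMatrix`, which R stores column-major, so the indexing would differ.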
Then I thought I should also try a `vectorized` function in R and see how it compares with Rcpp. Again, Rcpp beat the apply function after I increased the data to 1 million rows by 10 columns. After more than 10 minutes the for loop in R was still running, and I had to stop it because I had no idea how long it was going to take.
This experiment totally changed some of my impressions, and I started to understand why people really hate for loops in R. Again, all of this happened in R and I never left the R environment: Rcpp makes it easy to boost the performance of R to C++ level.
If you are using R and find it slow, don’t blame R, blame yourself!
Hadoop – Memory Overcommit
sublime
Coursera – OMG MMD Final is coming, are you ready!
The true power of Coursera is not only the accessibility of its learning materials but also the real pressure of its deadlines! Deadlines for homework, deadlines for quizzes, and now it is time for the final!






