R – Tsoutliers

tsoutliers is a package developed by Javier López-de-Lacalle, who also maintains other packages such as KFKSDS (Kalman Filter, Smoother and Disturbance Smoother), meboot (Maximum Entropy Bootstrap for Time Series), and stsm (Structural Time Series Models).

Within the tsoutliers package itself, outliers fall into a handful of categories; the four covered here are listed below. You can either dive into these two papers (paper1, paper2) or take a quick look at this IBM knowledge page for a one-sentence description of each term.

  1. IO (Innovational Outlier)
  2. AO (Additive Outlier)
  3. LS (Level Shift)
  4. TC (Transient Change)

In short, an AO is a type of outlier that affects only one observation, while the other three also affect the observations that follow the first outlier. An LS leads to a permanent shift. IO and TC look very similar when plotted, i.e., the initial impact dies out gradually over time. To figure out the difference between IO and TC you might need to read the papers, but as the author mentioned, “on a time series, the effect of an IO is more intricate than the effects of other types of outliers.”
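To make the shapes concrete, here is a minimal base R sketch that pushes a unit pulse through each type of effect. It does not use the tsoutliers package at all; the values of `n`, `t0`, `delta` and `phi` are my own illustrative assumptions, and the IO line assumes the underlying model is a plain AR(1), which is only one of many possible cases.

```r
# Sketch of how each outlier type propagates from a unit pulse at time t0.
# n, t0, delta and phi are illustrative values, not taken from the package.
n     <- 12
t0    <- 4
delta <- 0.7   # decay rate assumed for the transient change
phi   <- 0.5   # AR(1) coefficient assumed for the innovational outlier

pulse <- as.numeric(seq_len(n) == t0)

# AO: only the observation at t0 is affected.
ao <- pulse

# LS: the level moves at t0 and never comes back.
ls_effect <- cumsum(pulse)

# TC: the shock decays geometrically at rate delta.
tc <- as.numeric(stats::filter(pulse, filter = delta, method = "recursive"))

# IO: the shock passes through the model dynamics. For an AR(1) model this is
# a geometric decay at rate phi; for other ARIMA models the shape differs.
io <- as.numeric(stats::filter(pulse, filter = phi, method = "recursive"))

effects <- cbind(AO = ao, LS = ls_effect, TC = tc, IO = io)
print(round(effects, 3))
```

Under the AR(1) assumption the IO and TC columns are both geometric decays, which is why the two look so alike in a plot. With a richer ARIMA model the IO column would follow that model's dynamics instead, while the TC column would keep the same shape.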

Here is the mathematical representation of the four types of outliers.
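As far as I understand it, the usual formulation (the one generally attributed to Chen and Liu) writes the observed series as an ARIMA process plus a sum of outlier effects, where each type only differs in its polynomial L(B). The symbols below are the conventional ones and are my assumption about the notation: B is the backshift operator, I_t(t_j) is 1 at the outlier's time t_j and 0 elsewhere, ω_j is the outlier's magnitude, a_t is white noise, φ(B) and θ(B) are the AR and MA polynomials, α(B) collects the differencing (unit-root) terms, and δ is the decay rate of the transient change, with 0 < δ < 1.

```latex
% Observed series = outlier effects + ARIMA process
y_t^{*} = \sum_{j=1}^{m} \omega_j \, L_j(B) \, I_t(t_j)
        + \frac{\theta(B)}{\phi(B)\,\alpha(B)} \, a_t

% Effect polynomial for each of the four types
\text{IO: } L(B) = \frac{\theta(B)}{\phi(B)\,\alpha(B)}, \qquad
\text{AO: } L(B) = 1, \qquad
\text{LS: } L(B) = \frac{1}{1 - B}, \qquad
\text{TC: } L(B) = \frac{1}{1 - \delta B}
```

Read this way, AO leaves the pulse untouched, LS accumulates it into a step, TC lets it decay at rate δ, and IO sends it through the same dynamics as the series itself.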

 

Mac – Some notes on installing tsoutliers

I am trying to use the tsoutliers package for R, which I could only install from source. Based on my experience, the installation is not friendly at all. To make sure it succeeds, you need to have the necessary dependencies ready, such as a proper compiler.

There are indeed many different compilers available across platforms. I am using a Mac, and when I first started programming, my friends told me that an easy way to get a lot of the necessary developer tools is to install Xcode, from which you can download the “Command Line Tools”.

I did some research, and it seems there are two commonly used compilers available to Mac users: GCC, which comes from GNU, and Clang, which Apple backs. The simple reason why Apple moved away from the GNU compiler is licensing. GCC is GPL-licensed, which means software that incorporates GPL-licensed code needs to be open source too when it is distributed. Clang, however, is released under a permissive, BSD-style license, which allows the code to be built into proprietary software.
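Before attempting a source install, it can help to check from inside R whether either compiler is actually visible. The sketch below is only that, a sketch: `find_compilers` is a hypothetical helper of mine, and looking for the names `clang` and `gcc` on the PATH is an assumption about how your toolchain is exposed. The install call itself is left commented out so the snippet has no side effects.

```r
# Hypothetical helper: report which of the candidate compilers are on the PATH.
find_compilers <- function(candidates = c("clang", "gcc")) {
  paths <- Sys.which(candidates)   # "" for anything that is not found
  paths[nzchar(paths)]
}

compilers <- find_compilers()

if (length(compilers) == 0) {
  message("No compiler found on the PATH; a source install will likely fail.")
} else {
  print(compilers)
}

# Source install, commented out on purpose:
# install.packages("tsoutliers", type = "source")
```

If nothing is found, installing the Xcode command line tools first is probably the easiest fix on a Mac.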


R – tapply

A long time ago I read a post about the importance of understanding all the different types of *pply, like apply, lapply, tapply, vapply, etc. Today, I watched the awesomeness of tapply ‘LIVE’! Also, I really think people who understand R write code in an absolutely different style from most people’s “intuition”. Here is the Stack Overflow post that totally blew me away.
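For anyone who has not met tapply yet, here is a tiny example of what it does: it splits a vector by a grouping vector and applies a function to each piece. The data is made up for illustration and is not from the Stack Overflow post.

```r
# Made-up data: one temperature reading per row, with the city it came from.
temps  <- c(21, 25, 19, 30, 29)
cities <- c("Oslo", "Rome", "Oslo", "Rome", "Rome")

# Split temps by city and take the mean of each group in one call.
mean_by_city <- tapply(temps, cities, mean)
print(mean_by_city)
```

The result is a named array with one entry per city, which replaces the loop-and-accumulate code that “intuition” would usually reach for.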


R – How dare you use “==” in an if statement

Every language has plenty of things that don’t make much sense but are very easy for an inexperienced programmer to adopt and take for granted.

“==” in R is such a thing. I come from a Python background, and I took it for granted that you use “==” in an if statement whenever you compare two objects by value.

However, I just noticed that even the R documentation for “==” clearly warns that misunderstanding “==” can lead to disaster.
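Here is a short sketch of why “==” is risky as an if condition and what to reach for instead. The variable names are mine; `identical()` and `isTRUE(all.equal())` are the base R alternatives I would try first, not the only options.

```r
x <- c(1, 2)

# "==" is element-wise, so it may not give a single TRUE or FALSE.
elementwise <- x == 1            # a logical vector of length 2
na_result   <- NA == 1           # NA, which if() cannot use as a condition
float_equal <- 0.1 + 0.2 == 0.3  # FALSE because of floating-point rounding

# identical() always returns exactly one TRUE or FALSE.
same_object <- identical(x, c(1, 2))

# isTRUE(all.equal()) compares numbers with a small tolerance.
tolerant_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))

if (same_object) {
  message("x is exactly c(1, 2)")
}
```

Each of the first three comparisons would either fail or mislead inside an if statement, while the last two always produce a single logical value that is safe to branch on.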
