This post originates from this Stack Overflow question. It was the first time I had come across the term “Dynamic Time Warping”, and it turned out to be a really straightforward concept after reading this introduction from Macquarie University.
In one sentence: it matches the patterns in two series by finding the lowest-cost path that aligns them, moving forward through both series from the first pair of points to the last.
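library(dtw)
# A noisy sine wave as the query and a cosine wave as the reference (template)
idx <- seq(0, 6.28, len = 100)
query <- sin(idx) + runif(100) / 10
template <- cos(idx)
# Three-way plot: both series along the axes with the warping path between them
plot(dtw(query, template, keep = TRUE), type = "threeway")
# Two-way plot: both series drawn together with lines joining the matched points
plot(dtw(query, template, keep = TRUE, step = rabinerJuangStepPattern(6, "c")), type = "twoway", offset = -2)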
For example, let's start with the query data. It starts at a value of about 0, while the reference data starts at 1, so the first points are not a good match. We need to keep searching down the query sequence until we hit the value closest to 1, which is about a quarter of the way in (around index 25 to 30 of the query). That explains why the alignment starts out flat and horizontal: many query points are matched to the first reference point. From then on, the query data and the reference data line up pretty well, which is why the alignment plot is almost a perfect diagonal (it would be a perfect diagonal if you compared a series to itself). Finally, the query reaches a value of about 0 at index 100 while the reference still has points left. The path has to end at the top right corner, so the remaining reference points are all matched to the last query point, and that is why there is also a vertical line at the end.
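The reasoning above can be checked with a small sketch in base R. This is not how the dtw package is implemented: `simple_dtw` is a hypothetical helper written for illustration, it uses the absolute difference as the local cost and a plain unweighted step (diagonal, up, or right), which is simpler than the package's default step pattern, and the noise is left out of the query so the result is reproducible. The path it finds may therefore differ slightly from the plots.

```r
# Minimal DTW sketch (illustrative only, not the dtw package's algorithm).
simple_dtw <- function(query, reference) {
  n <- length(query)
  m <- length(reference)
  local_cost <- abs(outer(query, reference, "-"))

  # cum_cost[i + 1, j + 1] is the cheapest cost of aligning query[1:i]
  # with reference[1:j]; the Inf border keeps the path inside the grid.
  cum_cost <- matrix(Inf, n + 1, m + 1)
  cum_cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cum_cost[i + 1, j + 1] <- local_cost[i, j] +
        min(cum_cost[i, j], cum_cost[i, j + 1], cum_cost[i + 1, j])
    }
  }

  # Backtrack from the top right corner (n, m) to the bottom left (1, 1).
  i <- n
  j <- m
  index1 <- i
  index2 <- j
  while (i > 1 || j > 1) {
    # Predecessors of (i, j): diagonal, previous query point, previous reference point.
    candidates <- c(cum_cost[i, j], cum_cost[i, j + 1], cum_cost[i + 1, j])
    step <- which.min(candidates)
    if (step == 1) {
      i <- i - 1
      j <- j - 1
    } else if (step == 2) {
      i <- i - 1
    } else {
      j <- j - 1
    }
    index1 <- c(i, index1)
    index2 <- c(j, index2)
  }
  list(distance = cum_cost[n + 1, m + 1], index1 = index1, index2 = index2)
}

idx <- seq(0, 6.28, len = 100)
query <- sin(idx)       # noise omitted on purpose (assumption of this sketch)
reference <- cos(idx)
alignment <- simple_dtw(query, reference)

# Number of query points matched to the first reference point (the flat start)
print(sum(alignment$index2 == 1))
# Number of reference points matched to the last query point (the vertical end)
print(sum(alignment$index1 == length(query)))
```

The two counts give a rough measure of how long the horizontal and vertical segments of the path are for this simplified cost and step choice.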
After all these interesting math games and plots, we might need to spend some time figuring out how this can be applied to our data science life, right? Believe it or not, there is an article in the Journal of Statistical Software by Toni Giorgino, who is the author of the dtw package.
So you basically need to understand what index1 and index2 mean: they are the positions in the query and in the reference that are matched to each other along the optimal path. You can then build a mapping function from those two vectors that maps the input/query data onto the reference/template, and scale the input data in whatever way you want.
Here is a visualization of the optimal path:
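As a rough illustration of such a mapping function, here is a minimal sketch. The helper name `warp_to_reference` and the tiny hand-written path are assumptions made for this example. With the dtw package the two vectors would come from the `index1` and `index2` elements of the object returned by `dtw()`. Averaging the query values that share a reference position is only one possible choice.

```r
# Hypothetical helper: put the query on the reference's time axis by
# averaging every query value matched to the same reference index.
warp_to_reference <- function(query, index1, index2) {
  as.numeric(tapply(query[index1], index2, mean))
}

# A tiny hand-made alignment (illustrative values, not real dtw output).
# With the package you would use, for example:
#   alignment <- dtw(query, template)
#   index1 <- alignment$index1; index2 <- alignment$index2
query  <- c(0, 0.5, 1, 0.5, 0)
index1 <- c(1, 2, 3, 4, 5)   # positions in the query
index2 <- c(1, 1, 2, 3, 3)   # matched positions in the reference

warped <- warp_to_reference(query, index1, index2)
print(warped)  # 0.25, 1, 0.25: one value per reference position
```

The result has one value per reference index that appears in index2, so it can be compared or rescaled point by point against the reference. Some step patterns can skip reference points, in which case the output would be shorter than the reference.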