Time Series Similarity Using Dynamic Time Warping -Explained

Abhishek Mishra on 2020-11-04

Photo Credit : Trend Comparison

Find out why DTW is a very useful technique to compare two or more time series signals and add it to your time series analysis toolbox!!

1. Introduction

In this world which is getting dominated by Internet of Things (IoT), there is an increasing need to understand signals from devices installed in households, shopping malls, factories and offices. For example, any voice assistant detects, authenticates and interprets commands from humans even if it is slow or fast. Our voice tone tends to be different during different times of the day. In the early morning after we get up from bed, we interact with a slower, heavier and lazier tone compared to other times of the day. These devices treat the signals as time series and compare the peaks, troughs and slopes by taking into account the varying lags and phases in the signals to come up with a similarity score. One of the most common algorithms used to accomplish this is Dynamic Time Warping (DTW). It is a very robust technique to compare two or more Time Series by ignoring any shifts and speed.

As part of Walmart Real Estate team, I am working on understanding the energy consumption pattern of different assets like refrigeration units, dehumidifiers, lighting, etc. installed in the retail stores.This will help in improving quality of data collected from IoT sensors, detect and prevent faults in the systems and improve energy consumption forecasting and planning. This analysis involves time series of energy consumption during different times of a day i.e. different days of a week, weeks of a month and months of a year. Time series forecasting often gives bad predictions when there is sudden shift in phase of the energy consumption due to unknown factors. For example if the defrost schedule, items refresh routine for a refrigeration unit, or weather changes suddenly and are not captured to explain the phase shifts of energy consumption, it is important to detect these change points.

In the example below, the items refresh routine of a store has shifted by 2 hours on Tuesday leading the shift in peak energy consumption of refrigeration units and this information was not available to us for many such stores.

The peak at 2 am got shifted to 4 am. DTW when run recursively for consecutive days can identify the cases for which phase shift occurred without much change in shape of signals.

The training data can be restricted to Tuesday onwards to improve the prediction of energy consumption in future in this case as phase shift was detected on Tuesday. The setup improved the predictions substantially ( > 50%) for the stores for which the reason of shift was not known. This was not possible by traditional ways of one to one comparison of signals.

In this blog, I will explain how DTW algorithm works and throw some light on the calculation of the similarity score between two time series and its implementation in python. Most of the contents in this blog have been sourced from this paper, also mentioned in the references section below.

2. Why do we need DTW ?

Any two time series can be compared using euclidean distance or other similar distances on a one to one basis on time axis. The amplitude of first time series at time T will be compared with amplitude of second time series at time T. This will result into a very poor comparison and similarity score even if the two time series are very similar in shape but out of phase in time.

DTW compares amplitude of first signal at time T with amplitude of second signal at time T+1 and T-1 or T+2 and T-2. This makes sure it does not give low similarity score for signals with similar shape and different phase.

3. How it works?

Let us take two time series signals P and Q

Series 1 (P) : 1,4,5,10,9,3,2,6,8,4

Series 2 (Q): 1,7,3,4,1,10,5,4,7,4

Step 1 : Empty Cost Matrix Creation

Create an empty cost matrix M with x and y labels as amplitudes of the two series to be compared.

Step 2: Cost Calculation

Fill the cost matrix using the formula mentioned below starting from left and bottom corner.

M(i, j) = |P(i) — Q(j)| + min ( M(i-1,j-1), M(i, j-1), M(i-1,j) )

where

M is the matrix

i is the iterator for series P

j is the iterator for series Q

Let us take few examples (11,3 and 8 ) to illustrate the calculation as highlighted in the below table.

for 11,

|10 –4| + min( 5, 12, 5 )

= 6 + 5

= 11

Similarly for 3,

|4 –1| + min( 0 )

= 3+ 0

= 3

and for 8,

|1 –3| + min( 6)

= 2 + 6

= 8

The full table will look like this:

Step 3: Warping Path Identification

Identify the warping path starting from top right corner of the matrix and traversing to bottom left. The traversal path is identified based on the neighbour with minimum value.

In our example it starts with 15 and looks for minimum value i.e. 15 among its neighbours 18, 15 and 18.

The next number in the warping traversal path is 14. This process continues till we reach the bottom or the left axis of the table.

The final path will look like this:

Let this warping path series be called as d.

d = [15,15,14,13,11,9,8,8,4,4,3,0]

Step 4: Final Distance Calculation

Time normalised distance , D

where k is the length of the series d.

k = 12 in our case.

D = ( 15 + 15 + 14 + 13 + 11 + 9 + 8 + 8 + 4 + 4 + 3 + 0 ) /12

= 104/12

= 8.63

Let us take another example with two very similar time series with unit time shift difference.

Cost matrix and warping path will look like this.

DTW distance ,D =

( 0 + 0 + 0 + 0 + 0 +0 +0 +0 +0 +0 +0 ) /11

= 0

Zero DTW distance implies that the time series are very similar and that is indeed the case as observed in the plot.

3. Python Implementation

There are many libraries contributed in python. I have shared the links below.

dtw-python A comprehensive implementation of dynamic time warping (DTW) algorithms. DTW computes the optimal (least cumulative…pypi.org

dtw Dtw is a Python Module for computing Dynamic Time Warping distance. It can be used as a similarity measured between…pypi.org

However, for a better understanding of the algorithm it is a good practice to write the function yourself as per the code snippet below.

DTW.py

series1 = pd.Series([1,4,5,10,9,3,2,6,8,4])
series2 = pd.Series([1,7,3,4,1,10,5,4,7,4])

##Fill DTW Matrix
def fill_dtw_cost_matrix(s1, s2):
    l_s_1, l_s_2 = len(s1), len(s2)
    cost_matrix = np.zeros((l_s_1+1, l_s_2+1))
    for i in range(l_s_1+1):
        for j in range(l_s_2+1):
            cost_matrix[i, j] = np.inf
    cost_matrix[0, 0] = 0
    
    for i in range(1, l_s_1+1):
        for j in range(1, l_s_2+1):
            cost = abs(s1[i-1] - s2[j-1])
            #take last min from the window
            prev_min = np.min([cost_matrix[i-1, j], cost_matrix[i, j-1], cost_matrix[i-1, j-1]])
            cost_matrix[i, j] = cost + prev_min
    return cost_matrix
  
 ##Call DTW function
dtw_cost_matrix = fill_dtw_cost_matrix(series1,series2)

I have not focused much on the time and space complexity in this code. However natural implementation of DTW has a time and space complexity of O(M,N) where M and N are the lengths of the respective time series sequences between which DTW distance is to be calculated. Faster implementations like PrunedDTW, SparseDTW, FastDTW and MultiscaleDTW are also available.

4. Applications

5. References

Dynamic programming algorithm optimization for spoken word recognition - IEEE Journals & Magazine IEEE Xplore, delivering full text access to the world's highest quality technical literature in engineering and…ieeexplore.ieee.org

H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43–49, February 1978, doi: 10.1109/TASSP.1978.1163055.Understanding Dynamic Time Warping - The Databricks Blog Try this notebook in Databricks This blog is part 1 of our two-part series . To go to part 2, go to Using Dynamic Time…databricks.com

Dynamic Time Warping (DTW) Algorithm for Time Series Analysismedium.com

GenTXWarper - Mining gene expression time series Dynamic time warping (DTW) is a time series alignment algorithm developed originally for speech recognition(1) . It…www.psb.ugent.be

https://www.irit.fr/~Julien.Pinquier/Docs/TP_MABS/res/dtw-sakoe-chiba78.pdf

dtw-python A comprehensive implementation of dynamic time warping (DTW) algorithms. DTW computes the optimal (least cumulative…pypi.org