Scalable learning of time series classifiers

Time series classification is an important data analysis task. The largest dataset in the standard set of benchmark time series classification tasks, the UCR repository, contains approximately 10,000 series. We are working with the French Space Agency on classifying land usage from satellite images. This task requires learning from many millions of time series and classifying many billions. The prior state of the art does not scale to these magnitudes. We are developing new time series classification technologies that will.

Proximity Forest provides a significant advance on the state of the art in time series classification. By coupling the efficiency of divide-and-conquer tree classifiers with the effectiveness of similarity measures designed specifically for time series, Proximity Forest achieves very high accuracy at modest computational cost. An implementation can be downloaded here.
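
The core mechanism is easy to sketch. The following minimal Python sketch shows one Proximity Tree split, assuming Euclidean distance as a stand-in for the paper's pool of elastic time series measures; the function and variable names are illustrative rather than taken from the released implementation.

```python
import random
import numpy as np

def euclidean(a, b):
    # Stand-in distance; the actual method samples from a pool of
    # elastic time series measures.
    return float(np.linalg.norm(a - b))

def proximity_split(series, labels, measure=euclidean):
    """One Proximity Tree split: choose a random exemplar per class,
    then route each series down the branch of its nearest exemplar."""
    exemplars = {}
    for cls in set(labels):
        candidates = [s for s, l in zip(series, labels) if l == cls]
        exemplars[cls] = random.choice(candidates)
    branches = {cls: [] for cls in exemplars}
    for s, l in zip(series, labels):
        nearest = min(exemplars, key=lambda c: measure(s, exemplars[c]))
        branches[nearest].append((s, l))
    return exemplars, branches
```

A full Proximity Tree recurses on each branch until nodes are pure, and the forest ensembles many such trees, each built with fresh random exemplars and a freshly sampled similarity measure.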

Elastic Bands are our novel family of lower bounds for Dynamic Time Warping; they are tighter than the popular LB_Keogh while requiring similar computation time. An implementation can be downloaded here.
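
For context, here is a minimal sketch of the classic LB_Keogh bound that Elastic Bands tighten, assuming equal-length series and the squared-cost DTW convention; it is not the Elastic Bands code itself.

```python
import numpy as np

def lb_keogh(query, candidate, w):
    """LB_Keogh lower bound on DTW (squared point costs) between two
    equal-length series, with Sakoe-Chiba warping window half-width w."""
    n = len(query)
    total = 0.0
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        upper = query[lo:hi].max()   # envelope built around the query
        lower = query[lo:hi].min()
        c = candidate[i]
        # Charge only the candidate points that fall outside the envelope.
        if c > upper:
            total += (c - upper) ** 2
        elif c < lower:
            total += (c - lower) ** 2
    return total
```

In NN-DTW search the query envelope is computed once and reused against every candidate, and a candidate is pruned as soon as its bound exceeds the best DTW distance found so far.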

A blog post on the use of barycentric averaging in time series classification is available at: http://www.kdnuggets.com/2014/12/averaging-improves-accuracy-speed-time-series-classification.html. The code can be downloaded here: http://francois-petitjean.com/Research/ICDM2014-DTW/index.php. The slides for the ICDM 2014 paper can be downloaded here: http://francois-petitjean.com/Research/ICDM2014-DTW/Slides.pdf.
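
As a rough illustration of how DTW barycentric averaging (DBA) works, the sketch below performs one refinement step: align every series to the current average under DTW, then replace each point of the average by the mean of the points aligned to it. This is a simplified reading of the method with illustrative names, not the released code; in practice the step is iterated until the average stabilises.

```python
import numpy as np

def dtw_path(a, b):
    """Full DTW (squared point costs) with warping-path recovery."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from (n, m) to recover the optimal alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def dba_step(average, series):
    """One DBA iteration: pool the points of every series that DTW aligns
    to each position of the current average, then take their mean."""
    buckets = [[] for _ in average]
    for s in series:
        for i, j in dtw_path(average, s):
            buckets[i].append(s[j])
    return np.array([np.mean(b) for b in buckets])
```

Averages computed this way allow nearest-centroid classifiers to approach nearest-neighbour accuracy at a fraction of the classification cost, which is the result the blog post and the ICDM 2014 paper describe.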

The TSI software for the SDM 2017 paper on time series indexing can be downloaded here: https://github.com/ChangWeiTan/TSI. Slides for the SDM 2017 paper can be downloaded here: http://francois-petitjean.com/Research/SDM17-slides.pdf.

The software for the SDM 2018 paper on finding the best warping window, which won the conference's Best Research Paper Award, can be downloaded here: https://github.com/ChangWeiTan/FastWWSearch (Matlab version).
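
To make concrete what is being searched, the sketch below pairs window-constrained DTW with the naive leave-one-out search over all window sizes, the Omega(N^2·L^3) baseline that the paper's abstract describes FastWWSearch as replacing. It assumes equal-length series and uses illustrative names; it is not the FastWWSearch algorithm itself.

```python
import numpy as np

def cdtw(a, b, w):
    """DTW (squared point costs) constrained to a Sakoe-Chiba band
    of half-width w around the diagonal. Assumes equal-length series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[n, m]

def naive_best_window(series, labels, max_w):
    """Naive warping-window search: leave-one-out NN-DTW accuracy for
    every window size. This is the baseline that FastWWSearch replaces."""
    best_w, best_acc = 0, -1.0
    for w in range(max_w + 1):
        correct = 0
        for i, s in enumerate(series):
            dists = [cdtw(s, t, w) if j != i else np.inf
                     for j, t in enumerate(series)]
            if labels[int(np.argmin(dists))] == labels[i]:
                correct += 1
        acc = correct / len(series)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w
```

With N series of length L, this loop performs on the order of N^2·L constrained DTW computations, each costing up to O(L^2), which is why exact window learning was previously so expensive on datasets of even a few thousand series.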

Publications

Proximity Forest: an effective and scalable distance-based classifier for time series.
Lucas, B., Shifaz, A., Pelletier, C., O’Neill, L., Zaidi, N., Goethals, B., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, in press.
[DOI] [Bibtex] [Abstract]

@Article{LucasEtAl2019,
Title = {Proximity Forest: an effective and scalable distance-based classifier for time series},
Author = {Lucas, Benjamin
and Shifaz, Ahmed
and Pelletier, Charlotte
and O'Neill, Lachlan
and Zaidi, Nayyar
and Goethals, Bart
and Petitjean, Fran{\c{c}}ois
and Webb, Geoffrey I.},
Journal = {Data Mining and Knowledge Discovery},
Year = {in press},
Abstract = {Research into the classification of time series has made enormous progress in the last decade. The UCR time series archive has played a significant role in challenging and guiding the development of new learners for time series classification. The largest dataset in the UCR archive holds 10,000 time series only, which may explain why the primary research focus has been on creating algorithms that have high accuracy on relatively small datasets. This paper introduces Proximity Forest, an algorithm that learns accurate models from datasets with millions of time series, and classifies a time series in milliseconds. The models are ensembles of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values (and usually perform poorly on time series), Proximity Trees branch on the proximity of time series to one exemplar time series or another, allowing us to leverage the decades of work into developing relevant measures for time series. Proximity Forest gains both efficiency and accuracy by stochastic selection of both exemplars and similarity measures. Our work is motivated by recent time series applications that provide orders of magnitude more time series than the UCR benchmarks. Our experiments demonstrate that Proximity Forest is highly competitive on the UCR archive: it ranks among the most accurate classifiers while being significantly faster. We demonstrate on a 1M time series Earth observation dataset that Proximity Forest retains this accuracy on datasets that are many orders of magnitude greater than those in the UCR repository, while learning its models at least 100,000 times faster than current state-of-the-art models Elastic Ensemble and COTE.},
Doi = {10.1007/s10618-019-00617-3},
ISSN = {1573-756X},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}

Elastic bands across the path: A new framework and methods to lower bound DTW.
Tan, C. W., Petitjean, F., & Webb, G. I.
Proceedings of the 2019 SIAM International Conference on Data Mining, in press.
[URL] [Bibtex] [Abstract]

@InProceedings{TanEtAl19,
Title = {Elastic bands across the path: A new framework and methods to lower bound DTW},
Author = {Chang Wei Tan and Francois Petitjean and Geoffrey I. Webb},
Booktitle = {Proceedings of the 2019 SIAM International Conference on Data Mining},
Year = {in press},
Abstract = {There has been renewed recent interest in developing effective lower bounds for Dynamic Time Warping (DTW) distance between time series. These have many applications in time series indexing, clustering, forecasting, regression and classification. One of the key time series classification algorithms, the nearest neighbor algorithm with DTW distance (NN-DTW) is very expensive to compute, due to the quadratic complexity of DTW. Lower bound search can speed up NN-DTW substantially. An effective and tight lower bound quickly prunes off unpromising nearest neighbor candidates from the search space and minimises the number of the costly DTW computations. The speed up provided by lower bound search becomes increasingly critical as training set size increases. Different lower bounds provide different trade-offs between computation time and tightness. Most existing lower bounds interact with DTW warping window sizes. They are very tight and effective at smaller warping window sizes, but become looser as the warping window increases, thus reducing the pruning effectiveness for NN-DTW. In this work, we present a new class of lower bounds that are tighter than the popular Keogh lower bound, while requiring similar computation time. Our new lower bounds take advantage of the DTW boundary condition, monotonicity and continuity constraints to create a tighter lower bound. Of particular significance, they remain relatively tight even for large windows. A single parameter to these new lower bounds controls the speed-tightness trade-off. We demonstrate that these new lower bounds provide an exceptional balance between computation time and tightness for the NN-DTW time series classification task, resulting in greatly improved efficiency for NN-DTW lower bound search.},
Keywords = {time series},
Related = {scalable-time-series-classifiers},
Url = {https://arxiv.org/abs/1808.09617}
}

Efficient search of the best warping window for Dynamic Time Warping.
Tan, C. W., Herrmann, M., Forestier, G., Webb, G. I., & Petitjean, F.
Proceedings of the 2018 SIAM International Conference on Data Mining, 2018.
[PDF] [Bibtex] [Abstract]

@InProceedings{TanEtAl18,
Title = {Efficient search of the best warping window for Dynamic Time Warping},
Author = {Chang Wei Tan and Matthieu Herrmann and Germain Forestier and Geoffrey I. Webb and Francois Petitjean},
Booktitle = {Proceedings of the 2018 {SIAM} International Conference on Data Mining},
Year = {2018},
Abstract = {Time series classification maps time series to labels. The nearest neighbour algorithm (NN) using the Dynamic Time Warping (DTW) similarity measure is a leading algorithm for this task and a component of the current best ensemble classifiers for time series. However, NN-DTW is only a winning combination when its meta-parameter – its warping window – is learned from the training data. The warping window (WW) intuitively controls the amount of distortion allowed when comparing a pair of time series. With a training database of N time series of lengths L, a naive approach to learning the WW requires $\Omega(N^2 \cdot L^3)$ operations. This often translates in NN-DTW requiring days for training on datasets containing a few thousand time series only. In this paper, we introduce FastWWSearch: an efficient and exact method to learn WW. We show on 86 datasets that our method is always faster than the state of the art, with at least one order of magnitude and up to 1000x speed-up.},
Comment = {Best Research Paper Award},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}

Generating synthetic time series to augment sparse datasets.
Forestier, G., Petitjean, F., Dau, H. A., Webb, G. I., & Keogh, E.
IEEE International Conference on Data Mining (ICDM-17), pp. 865-870, 2017.
[PDF] [Bibtex]

@InProceedings{ForestierEtAl17,
Title = {Generating synthetic time series to augment sparse datasets},
Author = {Germain Forestier and Francois Petitjean and Hoang Anh Dau and Geoffrey I Webb and Eamonn Keogh},
Booktitle = {IEEE International Conference on Data Mining (ICDM-17)},
Year = {2017},
Pages = {865-870},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}

Indexing and classifying gigabytes of time series under time warping.
Tan, C. W., Webb, G. I., & Petitjean, F.
Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 282-290, 2017.
[PDF] [DOI] [Bibtex]

@InProceedings{TanEtAl17a,
Title = {Indexing and classifying gigabytes of time series under time warping},
Author = {Tan, Chang Wei and Webb, Geoffrey I and Petitjean, Fran{\c{c}}ois},
Booktitle = {Proceedings of the 2017 SIAM International Conference on Data Mining},
Year = {2017},
Organization = {SIAM},
Pages = {282-290},
Doi = {10.1137/1.9781611974973.32},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}

Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm.
Petitjean, F., Forestier, G., Webb, G. I., Nicholson, A. E., Chen, Y., & Keogh, E.
Knowledge and Information Systems, 47(1), 1-26, 2016.
[PDF] [URL] [Bibtex] [Abstract]

@Article{PetitjeanEtAl16a,
Title = {Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm},
Author = {F. Petitjean and G. Forestier and G.I. Webb and A.E. Nicholson and Y. Chen and E. Keogh},
Journal = {Knowledge and Information Systems},
Year = {2016},
Number = {1},
Pages = {1-26},
Volume = {47},
Abstract = {A concerted research effort over the past two decades has heralded significant improvements in both the efficiency and effectiveness of time series classification. The consensus that has emerged in the community is that the best solution is a surprisingly simple one. In virtually all domains, the most accurate classifier is the nearest neighbor algorithm with dynamic time warping as the distance measure. The time complexity of dynamic time warping means that successful deployments on resource-constrained devices remain elusive. Moreover, the recent explosion of interest in wearable computing devices, which typically have limited computational resources, has greatly increased the need for very efficient classification algorithms. A classic technique to obtain the benefits of the nearest neighbor algorithm, without inheriting its undesirable time and space complexity, is to use the nearest centroid algorithm. Unfortunately, the unique properties of (most) time series data mean that the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this paper we demonstrate that we can exploit a recent result by Petitjean et al. to allow meaningful averaging of “warped” time series, which then allows us to create super-efficient nearest “centroid” classifiers that are at least as accurate as their more computationally challenged nearest neighbor relatives. We demonstrate empirically the utility of our approach by comparing it to all the appropriate strawmen algorithms on the ubiquitous UCR Benchmarks and with a case study in supporting insect classification on resource-constrained sensors.},
Keywords = {time series},
Related = {scalable-time-series-classifiers},
Url = {http://link.springer.com/article/10.1007/s10115-015-0878-8}
}

Dynamic Time Warping Averaging of Time Series Allows Faster and More Accurate Classification.
Petitjean, F., Forestier, G., Webb, G. I., Nicholson, A., Chen, Y., & Keogh, E.
Proceedings of the 14th IEEE International Conference on Data Mining, pp. 470-479, 2014.
[PDF] [URL] [Bibtex] [Abstract]

@InProceedings{PetitjeanEtAl14b,
Title = {Dynamic Time Warping Averaging of Time Series Allows Faster and More Accurate Classification},
Author = {F. Petitjean and G. Forestier and G.I. Webb and A. Nicholson and Y. Chen and E. Keogh},
Booktitle = {Proceedings of the 14th {IEEE} International Conference on Data Mining},
Year = {2014},
Pages = {470-479},
Abstract = {Recent years have seen significant progress in improving both the efficiency and effectiveness of time series classification. However, because the best solution is typically the Nearest Neighbor algorithm with the relatively expensive Dynamic Time Warping as the distance measure, successful deployments on resource constrained devices remain elusive. Moreover, the recent explosion of interest in wearable devices, which typically have limited computational resources, has created a growing need for very efficient classification algorithms. A commonly used technique to glean the benefits of the Nearest Neighbor algorithm, without inheriting its undesirable time complexity, is to use the Nearest Centroid algorithm. However, because of the unique properties of (most) time series data, the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this work we show that we can exploit a recent result to allow meaningful averaging of 'warped' time series, and that this result allows us to create ultra-efficient Nearest 'Centroid' classifiers that are at least as accurate as their more lethargic Nearest Neighbor cousins.},
Comment = {One of nine papers invited to Knowledge and Information Systems journal ICDM-14 special issue},
Keywords = {time series},
Related = {scalable-time-series-classifiers},
Url = {http://dx.doi.org/10.1109/ICDM.2014.27}
}