Skip to content

Scalable learning of time series classifiers

Time series are a primary means of capturing dynamic processes. Driven by big data applications including mapping of land use from satellite observations over time, we are revolutionising time series classification by developing technologies that can learn from and accurately classify orders of magnitude larger time series collections than the previous state of the art.

MiniROCKET uses convolutional filters from deep learning to extract from time series diverse features of many types that have previously each been addressed by specialised techniques. MiniROCKET generates a wide variety of these filters and then uses them to extract features from each series. A simple linear classifer can learn from these features models that are as accurate as the prior state-of-the-art, but can do so in a fraction of the time and create models that classify with blistering speed. An implementation can be downloaded here. The paper can be downloaded here.

Proximity Forest provides a significant advance on the state of the art in time series classification. By coupling the efficiency of divide and conquer tree classifiers with the effectiveness of specialised similarity measures specifically designed for time series, Proximity Forest achieves very high accuracy for modest computation. An implementation can be downloaded here. The paper can be downloaded here.

InceptionTime brings the power of deep learning to time series classification. An implementation can be downloaded here. The paper can be downloaded here.

LB Webb and LB Enhanced are our novel lower bounds for Dynamic Time Warping that are both faster and tighter than the popular LB_Keogh. Implementation can be downloaded here and here. The papers can be downloaded here and here.

The following is a blog post on the use of Barycentric averaging in time series classification: http://www.kdnuggets.com/2014/12/averaging-improves-accuracy-speed-time-series-classification.html. The code can be downloaded here: http://francois-petitjean.com/Research/ICDM2014-DTW/index.php. The slides for the ICDM 2014 paper can be downloaded here: http://francois-petitjean.com/Research/ICDM2014-DTW/Slides.pdf.

The TSI software for the SDM 2017 paper on time series indexing can be downloaded here: https://github.com/ChangWeiTan/TSI. Slides for the SDM 2017 paper can be downloaded here: http://francois-petitjean.com/Research/SDM17-slides.pdf.

The software for the best paper award winning SDM 2018 paper on finding the best warping window can be downloaded here: https://github.com/ChangWeiTan/FastWWSearch (Matlab version).

Publications

Time series extrinsic regression.
Tan, C. W., Bergmeir, C., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, in press.
[DOI] [Bibtex] [Abstract]

@Article{RN3458,
author = {Tan, Chang Wei and Bergmeir, Christoph and Petitjean, François and Webb, Geoffrey I.},
journal = {Data Mining and Knowledge Discovery},
title = {Time series extrinsic regression},
year = {in press},
issn = {1573-756X},
abstract = {This paper studies time series extrinsic regression (TSER): a regression task of which the aim is to learn the relationship between a time series and a continuous scalar variable; a task closely related to time series classification (TSC), which aims to learn the relationship between a time series and a categorical class label. This task generalizes time series forecasting, relaxing the requirement that the value predicted be a future value of the input series or primarily depend on more recent values. In this paper, we motivate and study this task, and benchmark existing solutions and adaptations of TSC algorithms on a novel archive of 19 TSER datasets which we have assembled. Our results show that the state-of-the-art TSC algorithm Rocket, when adapted for regression, achieves the highest overall accuracy compared to adaptations of other TSC algorithms and state-of-the-art machine learning (ML) algorithms such as XGBoost, Random Forest and Support Vector Regression. More importantly, we show that much research is needed in this field to improve the accuracy of ML models. We also find evidence that further research has excellent prospects of improving upon these straightforward baselines.},
doi = {10.1007/s10618-021-00745-9},
keywords = {time series},
publisher = {Springer US},
related = {scalable-time-series-classifiers},
url = {https://rdcu.be/cgCAn},
}
ABSTRACT This paper studies time series extrinsic regression (TSER): a regression task of which the aim is to learn the relationship between a time series and a continuous scalar variable; a task closely related to time series classification (TSC), which aims to learn the relationship between a time series and a categorical class label. This task generalizes time series forecasting, relaxing the requirement that the value predicted be a future value of the input series or primarily depend on more recent values. In this paper, we motivate and study this task, and benchmark existing solutions and adaptations of TSC algorithms on a novel archive of 19 TSER datasets which we have assembled. Our results show that the state-of-the-art TSC algorithm Rocket, when adapted for regression, achieves the highest overall accuracy compared to adaptations of other TSC algorithms and state-of-the-art machine learning (ML) algorithms such as XGBoost, Random Forest and Support Vector Regression. More importantly, we show that much research is needed in this field to improve the accuracy of ML models. We also find evidence that further research has excellent prospects of improving upon these straightforward baselines.

A Bayesian-inspired, deep learning-based, semi-supervised domain adaptation technique for land cover mapping.
Lucas, B., Pelletier, C., Schmidt, D., Webb, G. I., & Petitjean, F.
Machine Learning, in press.
[DOI] [Bibtex] [Abstract]

@Article{lucas2021bayesian,
author = {Lucas, Benjamin and Pelletier, Charlotte and Schmidt, Daniel and Webb, Geoffrey I and Petitjean, Fran{\c{c}}ois},
journal = {Machine Learning},
title = {A Bayesian-inspired, deep learning-based, semi-supervised domain adaptation technique for land cover mapping},
year = {in press},
abstract = {Land cover maps are a vital input variable to many types of environmental research and management. While they can be produced automatically by machine learning techniques, these techniques require substantial training data to achieve high levels of accuracy, which are not always available. One technique researchers use when labelled training data are scarce is domain adaptation (DA)—where data from an alternate region, known as the source domain, are used to train a classifier and this model is adapted to map the study region, or target domain. The scenario we address in this paper is known as semi-supervised DA, where some labelled samples are available in the target domain. In this paper we present Sourcerer, a Bayesian-inspired, deep learning-based, semi-supervised DA technique for producing land cover maps from satellite image time series (SITS) data. The technique takes a convolutional neural network trained on a source domain and then trains further on the available target domain with a novel regularizer applied to the model weights. The regularizer adjusts the degree to which the model is modified to fit the target data, limiting the degree of change when the target data are few in number and increasing it as target data quantity increases. Our experiments on Sentinel-2 time series images compare Sourcerer with two state-of-the-art semi-supervised domain adaptation techniques and four baseline models. We show that on two different source-target domain pairings Sourcerer outperforms all other methods for any quantity of labelled target data available. In fact, the results on the more difficult target domain show that the starting accuracy of Sourcerer (when no labelled target data are available), 74.2%, is greater than the next-best state-of-the-art method trained on 20,000 labelled target instances.},
doi = {10.1007/s10994-020-05942-z},
keywords = {time series},
publisher = {Springer US},
related = {scalable-time-series-classifiers},
}
ABSTRACT Land cover maps are a vital input variable to many types of environmental research and management. While they can be produced automatically by machine learning techniques, these techniques require substantial training data to achieve high levels of accuracy, which are not always available. One technique researchers use when labelled training data are scarce is domain adaptation (DA)—where data from an alternate region, known as the source domain, are used to train a classifier and this model is adapted to map the study region, or target domain. The scenario we address in this paper is known as semi-supervised DA, where some labelled samples are available in the target domain. In this paper we present Sourcerer, a Bayesian-inspired, deep learning-based, semi-supervised DA technique for producing land cover maps from satellite image time series (SITS) data. The technique takes a convolutional neural network trained on a source domain and then trains further on the available target domain with a novel regularizer applied to the model weights. The regularizer adjusts the degree to which the model is modified to fit the target data, limiting the degree of change when the target data are few in number and increasing it as target data quantity increases. Our experiments on Sentinel-2 time series images compare Sourcerer with two state-of-the-art semi-supervised domain adaptation techniques and four baseline models. We show that on two different source-target domain pairings Sourcerer outperforms all other methods for any quantity of labelled target data available. In fact, the results on the more difficult target domain show that the starting accuracy of Sourcerer (when no labelled target data are available), 74.2%, is greater than the next-best state-of-the-art method trained on 20,000 labelled target instances.

Tight lower bounds for Dynamic Time Warping.
Webb, G. I., & Petitjean, F.
Pattern Recognition, 115, Art. no. 107895, 2021.
[DOI] [Bibtex] [Abstract]

@Article{WEBB2021107895,
author = {Geoffrey I. Webb and Fran\c{c}ois Petitjean},
journal = {Pattern Recognition},
title = {Tight lower bounds for Dynamic Time Warping},
year = {2021},
issn = {0031-3203},
volume = {115},
abstract = {Dynamic Time Warping (DTW) is a popular similarity measure for aligning and comparing time series. Due to DTW’s high computation time, lower bounds are often employed to screen poor matches. Many alternative lower bounds have been proposed, providing a range of different trade-offs between tightness and computational efficiency. LB_KEOGH provides a useful trade-off in many applications. Two recent lower bounds, LB_IMPROVED and LB_ENHANCED, are substantially tighter than LB_KEOGH. All three have the same worst case computational complexity—linear with respect to series length and constant with respect to window size. We present four new DTW lower bounds in the same complexity class. LB_PETITJEAN is substantially tighter than LB_IMPROVED, with only modest additional computational overhead. LB_WEBB is more efficient than LB_IMPROVED, while often providing a tighter bound. LB_WEBB is always tighter than LB_KEOGH. The parameter free LB_WEBB is usually tighter than LB_ENHANCED. A parameterized variant, LB_Webb_Enhanced, is always tighter than LB_ENHANCED. A further variant, LB_WEBB*, is useful for some constrained distance functions. In extensive experiments, LB_WEBB proves to be very effective for nearest neighbor search.},
articlenumber = {107895},
doi = {10.1016/j.patcog.2021.107895},
keywords = {time series},
related = {scalable-time-series-classifiers},
}
ABSTRACT Dynamic Time Warping (DTW) is a popular similarity measure for aligning and comparing time series. Due to DTW’s high computation time, lower bounds are often employed to screen poor matches. Many alternative lower bounds have been proposed, providing a range of different trade-offs between tightness and computational efficiency. LB_KEOGH provides a useful trade-off in many applications. Two recent lower bounds, LB_IMPROVED and LB_ENHANCED, are substantially tighter than LB_KEOGH. All three have the same worst case computational complexity—linear with respect to series length and constant with respect to window size. We present four new DTW lower bounds in the same complexity class. LB_PETITJEAN is substantially tighter than LB_IMPROVED, with only modest additional computational overhead. LB_WEBB is more efficient than LB_IMPROVED, while often providing a tighter bound. LB_WEBB is always tighter than LB_KEOGH. The parameter free LB_WEBB is usually tighter than LB_ENHANCED. A parameterized variant, LB_Webb_Enhanced, is always tighter than LB_ENHANCED. A further variant, LB_WEBB*, is useful for some constrained distance functions. In extensive experiments, LB_WEBB proves to be very effective for nearest neighbor search.

FastEE: Fast Ensembles of Elastic Distances for time series classification.
Tan, C. W., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, 34(1), 231-272, 2020.
[DOI] [Bibtex] [Abstract]

@Article{Tan2019,
Title = {FastEE: Fast Ensembles of Elastic Distances for time series classification},
Author = {Tan, Chang Wei
and Petitjean, Fran{\c{c}}ois
and Webb, Geoffrey I.},
Journal = {Data Mining and Knowledge Discovery},
Year = {2020},
Number = {1},
Pages = {231-272},
Volume = {34},
Abstract = {In recent years, many new ensemble-based time series classification (TSC) algorithms have been proposed. Each of them is significantly more accurate than their predecessors. The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is currently the most accurate TSC algorithm when assessed on the UCR repository. It is a meta-ensemble of 5 state-of-the-art ensemble-based classifiers. The time complexity of HIVE-COTE---particularly for training---is prohibitive for most datasets. There is thus a critical need to speed up the classifiers that compose HIVE-COTE. This paper focuses on speeding up one of its components: Ensembles of Elastic Distances (EE), which is the classifier that leverages on the decades of research into the development of time-dedicated measures. Training EE can be prohibitive for many datasets. For example, it takes a month on the ElectricDevices dataset with 9000 instances. This is because EE needs to cross-validate the hyper-parameters used for the 11 similarity measures it encompasses. In this work, Fast Ensembles of Elastic Distances is proposed to train EE faster. There are two versions to it. The exact version makes it possible to train EE 10 times faster. The approximate version is 40 times faster than EE without significantly impacting the classification accuracy. This translates to being able to train EE on ElectricDevices in 13 h.},
Day = {18},
Doi = {10.1007/s10618-019-00663-x},
ISSN = {1573-756X},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}
ABSTRACT In recent years, many new ensemble-based time series classification (TSC) algorithms have been proposed. Each of them is significantly more accurate than their predecessors. The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is currently the most accurate TSC algorithm when assessed on the UCR repository. It is a meta-ensemble of 5 state-of-the-art ensemble-based classifiers. The time complexity of HIVE-COTE–-particularly for training–-is prohibitive for most datasets. There is thus a critical need to speed up the classifiers that compose HIVE-COTE. This paper focuses on speeding up one of its components: Ensembles of Elastic Distances (EE), which is the classifier that leverages on the decades of research into the development of time-dedicated measures. Training EE can be prohibitive for many datasets. For example, it takes a month on the ElectricDevices dataset with 9000 instances. This is because EE needs to cross-validate the hyper-parameters used for the 11 similarity measures it encompasses. In this work, Fast Ensembles of Elastic Distances is proposed to train EE faster. There are two versions to it. The exact version makes it possible to train EE 10 times faster. The approximate version is 40 times faster than EE without significantly impacting the classification accuracy. This translates to being able to train EE on ElectricDevices in 13 h.

InceptionTime: Finding AlexNet for Time Series Classification.
Fawaz, H. I., Lucas, B., Forestier, G., Pelletier, C., Schmidt, D. F., Weber, J., Webb, G. I., Idoumghar, L., Muller, P., & Petitjean, F.
Data Mining and Knowledge Discovery, 34, 1936-1962, 2020.
[DOI] [Bibtex]

@Article{fawaz2019inceptiontime,
author = {Hassan Ismail Fawaz and Benjamin Lucas and Germain Forestier and Charlotte Pelletier and Daniel F. Schmidt and Jonathan Weber and Geoffrey I. Webb and Lhassane Idoumghar and Pierre-Alain Muller and François Petitjean},
journal = {Data Mining and Knowledge Discovery},
title = {InceptionTime: Finding AlexNet for Time Series Classification},
year = {2020},
pages = {1936-1962},
volume = {34},
doi = {10.1007/s10618-020-00710-y},
issue = {6},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {https://rdcu.be/b6TXh},
}
ABSTRACT 

TS-CHIEF: A Scalable and Accurate Forest Algorithm for Time Series Classification.
Shifaz, A., Pelletier, C., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, 34(3), 742-775, 2020.
[DOI] [Bibtex] [Abstract]

@Article{shifazetal2019,
Title = {TS-CHIEF: A Scalable and Accurate Forest Algorithm for Time Series Classification},
Author = {Shifaz, Ahmed and Pelletier, Charlotte and Petitjean, Francois and Webb, Geoffrey I},
Journal = {Data Mining and Knowledge Discovery},
Year = {2020},
Number = {3},
Pages = {742-775},
Volume = {34},
Abstract = {Time Series Classification (TSC) has seen enormous progress over the last two decades. HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles) is the current state of the art in terms of classification accuracy. HIVE-COTE recognizes that time series data are a specific data type for which the traditional attribute-value representation, used predominantly in machine learning, fails to provide a relevant representation. HIVE-COTE combines multiple types of classifiers: each extracting information about a specific aspect of a time series, be it in the time domain, frequency domain or summarization of intervals within the series. However, HIVE-COTE (and its predecessor, FLAT-COTE) is often infeasible to run on even modest amounts of data. For instance, training HIVE-COTE on a dataset with only 1500 time series can require 8 days of CPU time. It has polynomial runtime with respect to the training set size, so this problem compounds as data quantity increases. We propose a novel TSC algorithm, TS-CHIEF (Time Series Combination of Heterogeneous and Integrated Embedding Forest), which rivals HIVE-COTE in accuracy but requires only a fraction of the runtime. TS-CHIEF constructs an ensemble classifier that integrates the most effective embeddings of time series that research has developed in the last decade. It uses tree-structured classifiers to do so efficiently. We assess TS-CHIEF on 85 datasets of the University of California Riverside (UCR) archive, where it achieves state-of-the-art accuracy with scalability and efficiency. We demonstrate that TS-CHIEF can be trained on 130 k time series in 2 days, a data quantity that is beyond the reach of any TSC algorithm with comparable accuracy.},
Doi = {10.1007/s10618-020-00679-8},
Keywords = {time series},
Related = {scalable-time-series-classifiers},
Url = {https://arxiv.org/abs/1906.10329}
}
ABSTRACT Time Series Classification (TSC) has seen enormous progress over the last two decades. HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles) is the current state of the art in terms of classification accuracy. HIVE-COTE recognizes that time series data are a specific data type for which the traditional attribute-value representation, used predominantly in machine learning, fails to provide a relevant representation. HIVE-COTE combines multiple types of classifiers: each extracting information about a specific aspect of a time series, be it in the time domain, frequency domain or summarization of intervals within the series. However, HIVE-COTE (and its predecessor, FLAT-COTE) is often infeasible to run on even modest amounts of data. For instance, training HIVE-COTE on a dataset with only 1500 time series can require 8 days of CPU time. It has polynomial runtime with respect to the training set size, so this problem compounds as data quantity increases. We propose a novel TSC algorithm, TS-CHIEF (Time Series Combination of Heterogeneous and Integrated Embedding Forest), which rivals HIVE-COTE in accuracy but requires only a fraction of the runtime. TS-CHIEF constructs an ensemble classifier that integrates the most effective embeddings of time series that research has developed in the last decade. It uses tree-structured classifiers to do so efficiently. We assess TS-CHIEF on 85 datasets of the University of California Riverside (UCR) archive, where it achieves state-of-the-art accuracy with scalability and efficiency. We demonstrate that TS-CHIEF can be trained on 130 k time series in 2 days, a data quantity that is beyond the reach of any TSC algorithm with comparable accuracy.

ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels.
Dempster, A., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, 34, 1454-1495, 2020.
[DOI] [Bibtex] [Abstract]

@Article{dempster2020rocket,
author = {Angus Dempster and Francois Petitjean and Geoffrey I. Webb},
journal = {Data Mining and Knowledge Discovery},
title = {ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels},
year = {2020},
pages = {1454 - 1495},
volume = {34},
abstract = {Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification, we show that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods. Using this method, it is possible to train and test a classifier on all 85 ‘bake off’ datasets in the UCR archive in <2h, and it is possible to train a classifier on a large dataset of more than one million time series in approximately 1 h.},
doi = {10.1007/s10618-020-00701-z},
issue = {5},
keywords = {time series},
related = {scalable-time-series-classifiers},
}
ABSTRACT Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification, we show that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods. Using this method, it is possible to train and test a classifier on all 85 ‘bake off’ datasets in the UCR archive in <2h, and it is possible to train a classifier on a large dataset of more than one million time series in approximately 1 h.

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification.
Dempster, A., Schmidt, D. F., & Webb, G. I.
arxiv, 2012.08791, 2020.
[URL] [Bibtex] [Abstract]

@Article{DempsterEtAl2021,
author = {Dempster, Angus and Schmidt, Daniel F. and Webb, Geoffrey I.},
journal = {arxiv},
title = {MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification},
year = {2020},
pages = {2012.08791},
abstract = {Until recently, the most accurate methods for time series classification were limited by high computational complexity. ROCKET achieves state-of-the-art accuracy with a fraction of the computational expense of most existing methods by transforming input time series using random convolutional kernels, and using the transformed features to train a linear classifier. We reformulate ROCKET into a new method, MINIROCKET, making it up to 75 times faster on larger datasets, and making it almost deterministic (and optionally, with additional computational expense, fully deterministic), while maintaining essentially the same accuracy. Using this method, it is possible to train and test a classifier on all of 109 datasets from the UCR archive to state-of-the-art accuracy in less than 10 minutes. MINIROCKET is significantly faster than any other method of comparable accuracy (including ROCKET), and significantly more accurate than any other method of even roughly-similar computational expense. As such, we suggest that MINIROCKET should now be considered and used as the default variant of ROCKET.},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {https://arxiv.org/abs/2012.08791},
}
ABSTRACT Until recently, the most accurate methods for time series classification were limited by high computational complexity. ROCKET achieves state-of-the-art accuracy with a fraction of the computational expense of most existing methods by transforming input time series using random convolutional kernels, and using the transformed features to train a linear classifier. We reformulate ROCKET into a new method, MINIROCKET, making it up to 75 times faster on larger datasets, and making it almost deterministic (and optionally, with additional computational expense, fully deterministic), while maintaining essentially the same accuracy. Using this method, it is possible to train and test a classifier on all of 109 datasets from the UCR archive to state-of-the-art accuracy in less than 10 minutes. MINIROCKET is significantly faster than any other method of comparable accuracy (including ROCKET), and significantly more accurate than any other method of even roughly-similar computational expense. As such, we suggest that MINIROCKET should now be considered and used as the default variant of ROCKET.

Unsupervised Domain Adaptation Techniques for Classification of Satellite Image Time Series.
Lucas, B., Pelletier, C., Schmidt, D., Webb, G. I., & Petitjean, F.
IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, pp. 1074–1077, 2020.
[DOI] [Bibtex] [Abstract]

@InProceedings{lucas2020unsupervised,
author = {Lucas, Benjamin and Pelletier, Charlotte and Schmidt, Daniel and Webb, Geoffrey I and Petitjean, Fran{\c{c}}ois},
booktitle = {IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium},
title = {Unsupervised Domain Adaptation Techniques for Classification of Satellite Image Time Series},
year = {2020},
organization = {IEEE},
pages = {1074--1077},
abstract = {Land cover maps are vitally important to many elements of environmental management. However the machine learning algorithms used to produce them require a substantive quantity of labelled training data to reach the best levels of accuracy. When researchers wish to map an area where no labelled training data are available, one potential solution is to use a classifier trained on another geographical area and adapting it to the target location-this is known as Unsupervised Domain Adaptation (DA). In this paper we undertake the first experiments using unsupervised DA methods for the classification of satellite image time series (SITS) data. Our experiments draw the interesting conclusion that existing methods provide no benefit when used on SITS data, and that this is likely due to the temporal nature of the data and the change in class distributions between the regions. This suggests that an unsupervised domain adaptation technique for SITS would be extremely beneficial for land cover mapping.},
doi = {0.1109/IGARSS39084.2020.9324339},
keywords = {time series},
related = {scalable-time-series-classifiers},
}
ABSTRACT Land cover maps are vitally important to many elements of environmental management. However the machine learning algorithms used to produce them require a substantive quantity of labelled training data to reach the best levels of accuracy. When researchers wish to map an area where no labelled training data are available, one potential solution is to use a classifier trained on another geographical area and adapting it to the target location-this is known as Unsupervised Domain Adaptation (DA). In this paper we undertake the first experiments using unsupervised DA methods for the classification of satellite image time series (SITS) data. Our experiments draw the interesting conclusion that existing methods provide no benefit when used on SITS data, and that this is likely due to the temporal nature of the data and the change in class distributions between the regions. This suggests that an unsupervised domain adaptation technique for SITS would be extremely beneficial for land cover mapping.

Deep Learning for the Classification of Sentinel-2 Image Series.
Pelletier, C., Webb, G. I., & Petitjean, F.
IEEE International Geoscience And Remote Sensing Symposium, 2019.
[DOI] [Bibtex] [Abstract]

@InProceedings{PelletierEtAl19b,
Title = {Deep Learning for the Classification of Sentinel-2 Image Series},
Author = {Pelletier, Charlotte and Webb, Geoffrey I. and Petitjean, Francois},
Booktitle = {IEEE International Geoscience And Remote Sensing Symposium},
Year = {2019},
Month = {Jul},
Abstract = {Satellite image time series (SITS) have proven to be essential for accurate and up-to-date land cover mapping over large areas. Most works about SITS have focused on the use of traditional classification algorithms such as Random Forests (RFs). Deep learning algorithms have been very successful for supervised tasks, in particular for data that exhibit a structure between attributes, such as space or time. In this work, we compare for the first time RFs to the two leading deep learning algorithms for handling temporal data: Recurrent Neural Networks (RNNs) and temporal Convolutional Neural Networks (TempCNNs). We carry out a large experiment using Sentinel-2 time series. We compare both accuracy and computational times to classify 10,980 km 2 over Australia. The results highlights the good performance of TemCNNs that obtain the highest accuracy. They also show that RNNs might be less suited for large scale study as they have higher runtime complexity.},
Doi = {10.1109/IGARSS.2019.8900123},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}
ABSTRACT Satellite image time series (SITS) have proven to be essential for accurate and up-to-date land cover mapping over large areas. Most works about SITS have focused on the use of traditional classification algorithms such as Random Forests (RFs). Deep learning algorithms have been very successful for supervised tasks, in particular for data that exhibit a structure between attributes, such as space or time. In this work, we compare for the first time RFs to the two leading deep learning algorithms for handling temporal data: Recurrent Neural Networks (RNNs) and temporal Convolutional Neural Networks (TempCNNs). We carry out a large experiment using Sentinel-2 time series. We compare both accuracy and computational times to classify 10,980 km 2 over Australia. The results highlights the good performance of TemCNNs that obtain the highest accuracy. They also show that RNNs might be less suited for large scale study as they have higher runtime complexity.

Using Sentinel-2 Image Time Series to map the State of Victoria, Australia.
Pelletier, C., Ji, Z., Hagolle, O., Morse-McNabb, E., Sheffield, K., Webb, G. I., & Petitjean, F.
Proceedings 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images, MultiTemp 2019, 2019.
[DOI] [Bibtex] [Abstract]

@InProceedings{PelletierEtAl19c,
Title = {Using Sentinel-2 Image Time Series to map the State of Victoria, Australia},
Author = {Pelletier, C. and Ji, Z. and Hagolle, O. and Morse-McNabb, E. and Sheffield, K. and Webb, G. I. and Petitjean, F.},
Booktitle = {Proceedings 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images, MultiTemp 2019},
Year = {2019},
Abstract = {Sentinel-2 satellites are now acquiring images of the entire Earth every five days from 10 to 60 m spatial resolution. The supervised classification of this new optical image time series allows the operational production of accurate land cover maps over large areas. In this paper, we investigate the use of one year of Sentinel-2 data to map the state of Victoria in Australia. In particular, we produce two land cover maps using the most established and advanced algorithms in time series classification: Random Forest (RF) and Temporal Convolutional Neural Network (TempCNN). To our knowledge, these are the first land cover maps at 10 m spatial resolution for an Australian state.},
Doi = {10.1109/Multi-Temp.2019.8866921},
Keywords = {cartography;convolutional neural nets;geophysical image processing;image classification;image resolution;land cover;optical images;optical information processing;remote sensing;terrain mapping;time series;TempCNN;temporal convolutional neural network;random forest;land cover maps;Victoria state;Australian state;spatial resolution;time series classification;Sentinel-2 data;accurate land cover maps;operational production;optical image time series;supervised classification;Sentinel-2 satellites;Australia;sentinel-2 image time series;Radio frequency;Australia;Spatial resolution;Time series analysis;Agriculture;Convolutional neural networks;Sentinel-2 images;land cover map;time series;Temporal Convolutional Neural Networks;Random Forests}
}
ABSTRACT Sentinel-2 satellites are now acquiring images of the entire Earth every five days from 10 to 60 m spatial resolution. The supervised classification of this new optical image time series allows the operational production of accurate land cover maps over large areas. In this paper, we investigate the use of one year of Sentinel-2 data to map the state of Victoria in Australia. In particular, we produce two land cover maps using the most established and advanced algorithms in time series classification: Random Forest (RF) and Temporal Convolutional Neural Network (TempCNN). To our knowledge, these are the first land cover maps at 10 m spatial resolution for an Australian state.

Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series.
Pelletier, C., Webb, G. I., & Petitjean, F.
Remote Sensing, 11(5), Art. no. 523, 2019.
[DOI] [Bibtex] [Abstract]

@Article{PelletierEtAl19,
Title = {Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series},
Author = {Pelletier, Charlotte and Webb, Geoffrey I. and Petitjean, Francois},
Journal = {Remote Sensing},
Year = {2019},
Number = {5},
Volume = {11},
Abstract = {Latest remote sensing sensors are capable of acquiring high spatial and spectral Satellite Image Time Series (SITS) of the world. These image series are a key component of classification systems that aim at obtaining up-to-date and accurate land cover maps of the Earth’s surfaces. More specifically, current SITS combine high temporal, spectral and spatial resolutions, which makes it possible to closely monitor vegetation dynamics. Although traditional classification algorithms, such as Random Forest (RF), have been successfully applied to create land cover maps from SITS, these algorithms do not make the most of the temporal domain. This paper proposes a comprehensive study of Temporal Convolutional Neural Networks (TempCNNs), a deep learning approach which applies convolutions in the temporal dimension in order to automatically learn temporal (and spectral) features. The goal of this paper is to quantitatively and qualitatively evaluate the contribution of TempCNNs for SITS classification, as compared to RF and Recurrent Neural Networks (RNNs) —a standard deep learning approach that is particularly suited to temporal data. We carry out experiments on Formosat-2 scene with 46 images and one million labelled time series. The experimental results show that TempCNNs are more accurate than the current state of the art for SITS classification. We provide some general guidelines on the network architecture, common regularization mechanisms, and hyper-parameter values such as batch size; we also draw out some differences with standard results in computer vision (e.g., about pooling layers). Finally, we assess the visual quality of the land cover maps produced by TempCNNs.},
Articlenumber = {523},
Doi = {10.3390/rs11050523},
ISSN = {2072-4292},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}
ABSTRACT Latest remote sensing sensors are capable of acquiring high spatial and spectral Satellite Image Time Series (SITS) of the world. These image series are a key component of classification systems that aim at obtaining up-to-date and accurate land cover maps of the Earth’s surfaces. More specifically, current SITS combine high temporal, spectral and spatial resolutions, which makes it possible to closely monitor vegetation dynamics. Although traditional classification algorithms, such as Random Forest (RF), have been successfully applied to create land cover maps from SITS, these algorithms do not make the most of the temporal domain. This paper proposes a comprehensive study of Temporal Convolutional Neural Networks (TempCNNs), a deep learning approach which applies convolutions in the temporal dimension in order to automatically learn temporal (and spectral) features. The goal of this paper is to quantitatively and qualitatively evaluate the contribution of TempCNNs for SITS classification, as compared to RF and Recurrent Neural Networks (RNNs) —a standard deep learning approach that is particularly suited to temporal data. We carry out experiments on Formosat-2 scene with 46 images and one million labelled time series. The experimental results show that TempCNNs are more accurate than the current state of the art for SITS classification. We provide some general guidelines on the network architecture, common regularization mechanisms, and hyper-parameter values such as batch size; we also draw out some differences with standard results in computer vision (e.g., about pooling layers). Finally, we assess the visual quality of the land cover maps produced by TempCNNs.

Proximity Forest: an effective and scalable distance-based classifier for time series.
Lucas, B., Shifaz, A., Pelletier, C., O'Neill, L., Zaidi, N., Goethals, B., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, 33, 607-635, 2019.
[DOI] [Bibtex] [Abstract]

@Article{LucasEtAl2019,
Title = {Proximity Forest: an effective and scalable distance-based classifier for time series},
Author = {Lucas, Benjamin and Shifaz, Ahmed and Pelletier, Charlotte and O'Neill, Lachlan and Zaidi, Nayyar and Goethals, Bart and Petitjean, Francois and Webb, Geoffrey I.},
Journal = {Data Mining and Knowledge Discovery},
Year = {2019},
Pages = {607-635},
Volume = {33},
Abstract = {Research into the classification of time series has made enormous progress in the last decade. The UCR time series archive has played a significant role in challenging and guiding the development of new learners for time series classification. The largest dataset in the UCR archive holds 10,000  time series only; which may explain why the primary research focus has been on creating algorithms that have high accuracy on relatively small datasets. This paper introduces Proximity Forest, an algorithm that learns accurate models from datasets with millions of time series, and classifies a time series in milliseconds. The models are ensembles of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values (and usually perform poorly on time series), Proximity Trees branch on the proximity of time series to one exemplar time series or another; allowing us to leverage the decades of work into developing relevant measures for time series. Proximity Forest gains both efficiency and accuracy by stochastic selection of both exemplars and similarity measures. Our work is motivated by recent time series applications that provide orders of magnitude more time series than the UCR benchmarks. Our experiments demonstrate that Proximity Forest is highly competitive on the UCR archive: it ranks among the most accurate classifiers while being significantly faster. We demonstrate on a 1M time series Earth observation dataset that Proximity Forest retains this accuracy on datasets that are many orders of magnitude greater than those in the UCR repository, while learning its models at least 100,000 times faster than current state-of-the-art models Elastic Ensemble and COTE.},
Doi = {10.1007/s10618-019-00617-3},
ISSN = {1573-756X},
Keywords = {time series},
Related = {scalable-time-series-classifiers},
Url = {http://em.rdcu.be/wf/click?upn=lMZy1lernSJ7apc5DgYM8bHEJ10ZZvhuXh3IDzKuxwo-3D_EiQ5rJCaH7kHXE05jBLNILvdOnEvtpll2LtM3EMuDoQOPhStTqn2GD-2B7F6FuAgI5OTQz622MbTs21-2FhZD0JMq0DthrX5gL70NXPgfox249pWyTtZBzPrvlIg-2BMKdhnK-2FOdKSDsnyU2URIgjJCzSq-2BXTFaMywtQFXF0NjWgHfcePj0hCP1-2FVU7eEZuKJZkyZOmx5Ct5XYj1-2FVGZITf2I1fccopJ0d20GbG7tHnRatKCU0vVEsLavea2pstZO3rT1-2Fo-2Fh5hjuMhScoCGcKvA5pGA-3D-3D}
}
ABSTRACT Research into the classification of time series has made enormous progress in the last decade. The UCR time series archive has played a significant role in challenging and guiding the development of new learners for time series classification. The largest dataset in the UCR archive holds 10,000  time series only; which may explain why the primary research focus has been on creating algorithms that have high accuracy on relatively small datasets. This paper introduces Proximity Forest, an algorithm that learns accurate models from datasets with millions of time series, and classifies a time series in milliseconds. The models are ensembles of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values (and usually perform poorly on time series), Proximity Trees branch on the proximity of time series to one exemplar time series or another; allowing us to leverage the decades of work into developing relevant measures for time series. Proximity Forest gains both efficiency and accuracy by stochastic selection of both exemplars and similarity measures. Our work is motivated by recent time series applications that provide orders of magnitude more time series than the UCR benchmarks. Our experiments demonstrate that Proximity Forest is highly competitive on the UCR archive: it ranks among the most accurate classifiers while being significantly faster. We demonstrate on a 1M time series Earth observation dataset that Proximity Forest retains this accuracy on datasets that are many orders of magnitude greater than those in the UCR repository, while learning its models at least 100,000 times faster than current state-of-the-art models Elastic Ensemble and COTE.

Exploring Data Quantity Requirements for Domain Adaptation in the Classification of Satellite Image Time Series.
Lucas, B., Pelletier, C., Inglada, J., Schmidt, D., Webb, G. I., & Petitjean, F.
Proceedings 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images, MultiTemp 2019, 2019.
[DOI] [Bibtex] [Abstract]

@InProceedings{LucasEtAl2019b,
Title = {Exploring Data Quantity Requirements for Domain Adaptation in the Classification of Satellite Image Time Series},
Author = {Lucas, B. and Pelletier, C. and Inglada, J. and Schmidt, D. and Webb, G. I. and Petitjean, F},
Booktitle = {Proceedings 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images, MultiTemp 2019},
Year = {2019},
Publisher = {IEEE, Institute of Electrical and Electronics Engineers},
Abstract = {Land cover maps are a vital input variable in all types of environmental research and management. However the modern state-of-The-Art machine learning techniques used to create them require substantial training data to produce optimal accuracy. Domain Adaptation is one technique researchers might use when labelled training data are unavailable or scarce. This paper looks at the result of training a convolutional neural network model on a region where data are available (source domain), and then adapting this model to another region (target domain) by retraining it on the available labelled data, and in particular how these results change with increasing data availability. Our experiments performing domain adaptation on satellite image time series, draw three interesting conclusions: (1) a model trained only on data from the source domain delivers 73.0% test accuracy on the target domain; (2) when all of the weights are retrained on the target data, over 16,000 instances were required to improve upon the accuracy of the source-only model; and (3) even if sufficient data is available in the target domain, using a model pretrained on a source domain will result in better overall test accuracy compared to a model trained on target domain data only-88.9% versus 84.7%.},
Doi = {10.1109/Multi-Temp.2019.8866898},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}
ABSTRACT Land cover maps are a vital input variable in all types of environmental research and management. However the modern state-of-The-Art machine learning techniques used to create them require substantial training data to produce optimal accuracy. Domain Adaptation is one technique researchers might use when labelled training data are unavailable or scarce. This paper looks at the result of training a convolutional neural network model on a region where data are available (source domain), and then adapting this model to another region (target domain) by retraining it on the available labelled data, and in particular how these results change with increasing data availability. Our experiments performing domain adaptation on satellite image time series, draw three interesting conclusions: (1) a model trained only on data from the source domain delivers 73.0% test accuracy on the target domain; (2) when all of the weights are retrained on the target data, over 16,000 instances were required to improve upon the accuracy of the source-only model; and (3) even if sufficient data is available in the target domain, using a model pretrained on a source domain will result in better overall test accuracy compared to a model trained on target domain data only-88.9% versus 84.7%.

Elastic bands across the path: A new framework and methods to lower bound DTW.
Tan, C. W., Petitjean, F., & Webb, G. I.
Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 522-530, 2019.
[URL] [Bibtex] [Abstract]

@InProceedings{TanEtAl19,
Title = {Elastic bands across the path: A new framework and methods to lower bound DTW},
Author = {Tan, Chang Wei and Petitjean, Francois and Webb, Geoffrey I.},
Booktitle = {Proceedings of the 2019 SIAM International Conference on Data Mining},
Year = {2019},
Pages = {522-530},
Abstract = {There has been renewed recent interest in developing effective lower bounds for Dynamic Time Warping (DTW) distance between time series. These have many applications in time series indexing, clustering, forecasting, regression and classification. One of the key time series classification algorithms, the nearest neighbor algorithm with DTW distance (NN-DTW) is very expensive to compute, due to the quadratic complexity of DTW. Lower bound search can speed up NN-DTW substantially. An effective and tight lower bound quickly prunes off unpromising nearest neighbor candidates from the search space and minimises the number of the costly DTW computations. The speed up provided by lower bound search becomes increasingly critical as training set size increases. Different lower bounds provide different trade-offs between computation time and tightness. Most existing lower bounds interact with DTW warping window sizes. They are very tight and effective at smaller warping window sizes, but become looser as the warping window increases, thus reducing the pruning effectiveness for NN-DTW. In this work, we present a new class of lower bounds that are tighter than the popular Keogh lower bound, while requiring similar computation time. Our new lower bounds take advantage of the DTW boundary condition, monotonicity and continuity constraints to create a tighter lower bound. Of particular significance, they remain relatively tight even for large windows. A single parameter to these new lower bounds controls the speed-tightness trade-off. We demonstrate that these new lower bounds provide an exceptional balance between computation time and tightness for the NN-DTW time series classification task, resulting in greatly improved efficiency for NN-DTW lower bound search.},
Keywords = {time series},
Related = {scalable-time-series-classifiers},
Url = {https://arxiv.org/abs/1808.09617}
}
ABSTRACT There has been renewed recent interest in developing effective lower bounds for Dynamic Time Warping (DTW) distance between time series. These have many applications in time series indexing, clustering, forecasting, regression and classification. One of the key time series classification algorithms, the nearest neighbor algorithm with DTW distance (NN-DTW) is very expensive to compute, due to the quadratic complexity of DTW. Lower bound search can speed up NN-DTW substantially. An effective and tight lower bound quickly prunes off unpromising nearest neighbor candidates from the search space and minimises the number of the costly DTW computations. The speed up provided by lower bound search becomes increasingly critical as training set size increases. Different lower bounds provide different trade-offs between computation time and tightness. Most existing lower bounds interact with DTW warping window sizes. They are very tight and effective at smaller warping window sizes, but become looser as the warping window increases, thus reducing the pruning effectiveness for NN-DTW. In this work, we present a new class of lower bounds that are tighter than the popular Keogh lower bound, while requiring similar computation time. Our new lower bounds take advantage of the DTW boundary condition, monotonicity and continuity constraints to create a tighter lower bound. Of particular significance, they remain relatively tight even for large windows. A single parameter to these new lower bounds controls the speed-tightness trade-off. We demonstrate that these new lower bounds provide an exceptional balance between computation time and tightness for the NN-DTW time series classification task, resulting in greatly improved efficiency for NN-DTW lower bound search.

Efficient search of the best warping window for Dynamic Time Warping.
Tan, C. W., Herrmann, M., Forestier, G., Webb, G. I., & Petitjean, F.
Proceedings of the 2018 SIAM International Conference on Data Mining, pp. 459-467, 2018.
[PDF] [Bibtex] [Abstract]

@InProceedings{TanEtAl18,
Title = {Efficient search of the best warping window for Dynamic Time Warping},
Author = {Tan, Chang Wei and Herrmann, Matthieu and Forestier, Germain and Webb, Geoffrey I. and Petitjean, Francois},
Booktitle = {Proceedings of the 2018 {SIAM} International Conference on Data Mining},
Year = {2018},
Pages = {459-467},
Abstract = {Time series classification maps time series to labels. The nearest neighbour algorithm (NN) using the Dynamic Time Warping (DTW) similarity measure is a leading algorithm for this task and a component of the current best ensemble classifiers for time series. However, NN-DTW is only a winning combination when its meta-parameter – its warping window – is learned from the training data. The warping window (WW) intuitively controls the amount of distortion allowed when comparing a pair of time series. With a training database of N time series of lengths L, a naive approach to learning the WW requires Omega(N2·L3) operations. This often translates in NN-DTW requiring days for training on datasets containing a few thousand time series only. In this paper, we introduce FastWWSearch: an efficient and exact method to learn WW. We show on 86 datasets that our method is always faster than the state of the art, with at least one order of magnitude and up to 1000x speed-up.},
Comment = {Best Research Paper Award},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}
ABSTRACT Time series classification maps time series to labels. The nearest neighbour algorithm (NN) using the Dynamic Time Warping (DTW) similarity measure is a leading algorithm for this task and a component of the current best ensemble classifiers for time series. However, NN-DTW is only a winning combination when its meta-parameter – its warping window – is learned from the training data. The warping window (WW) intuitively controls the amount of distortion allowed when comparing a pair of time series. With a training database of N time series of lengths L, a naive approach to learning the WW requires Omega(N2·L3) operations. This often translates in NN-DTW requiring days for training on datasets containing a few thousand time series only. In this paper, we introduce FastWWSearch: an efficient and exact method to learn WW. We show on 86 datasets that our method is always faster than the state of the art, with at least one order of magnitude and up to 1000x speed-up.

Generating synthetic time series to augment sparse datasets.
Forestier, G., Petitjean, F., Dau, H. A., Webb, G. I., & Keogh, E.
IEEE International Conference on Data Mining (ICDM-17), pp. 865-870, 2017.
[PDF] [Bibtex]

@InProceedings{ForestierEtAl17,
Title = {Generating synthetic time series to augment sparse datasets},
Author = {Forestier, Germain and Petitjean, Francois and Dau, Hoang Anh and Webb, Geoffrey I and Keogh, Eamonn},
Booktitle = {IEEE International Conference on Data Mining (ICDM-17)},
Year = {2017},
Pages = {865-870},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}
ABSTRACT 

Indexing and classifying gigabytes of time series under time warping.
Tan, C. W., Webb, G. I., & Petitjean, F.
Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 282-290, 2017.
[PDF] [DOI] [Bibtex]

@InProceedings{TanEtAl17a,
Title = {Indexing and classifying gigabytes of time series under time warping},
Author = {Tan, Chang Wei and Webb, Geoffrey I. and Petitjean, Francois},
Booktitle = {Proceedings of the 2017 SIAM International Conference on Data Mining},
Year = {2017},
Organization = {SIAM},
Pages = {282-290},
Doi = {10.1137/1.9781611974973.32},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}
ABSTRACT 

Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm.
Petitjean, F., Forestier, G., Webb, G. I., Nicholson, A. E., Chen, Y., & Keogh, E.
Knowledge and Information Systems, 47(1), 1-26, 2016.
[PDF] [URL] [Bibtex] [Abstract]

@Article{PetitjeanEtAl16a,
author = {Petitjean, F. and Forestier, G. and Webb, G. I. and Nicholson, A. E. and Chen, Y. and Keogh, E.},
journal = {Knowledge and Information Systems},
title = {Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm},
year = {2016},
number = {1},
pages = {1-26},
volume = {47},
abstract = {A concerted research effort over the past two decades has heralded significant improvements in both the efficiency and effectiveness of time series classification. The consensus that has emerged in the community is that the best solution is a surprisingly simple one. In virtually all domains, the most accurate classifier is the nearest neighbor algorithm with dynamic time warping as the distance measure. The time complexity of dynamic time warping means that successful deployments on resource-constrained devices remain elusive. Moreover, the recent explosion of interest in wearable computing devices, which typically have limited computational resources, has greatly increased the need for very efficient classification algorithms. A classic technique to obtain the benefits of the nearest neighbor algorithm, without inheriting its undesirable time and space complexity, is to use the nearest centroid algorithm. Unfortunately, the unique properties of (most) time series data mean that the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this paper we demonstrate that we can exploit a recent result by Petitjean et al. to allow meaningful averaging of “warped� time series, which then allows us to create super-efficient nearest “centroid� classifiers that are at least as accurate as their more computationally challenged nearest neighbor relatives. We demonstrate empirically the utility of our approach by comparing it to all the appropriate strawmen algorithms on the ubiquitous UCR Benchmarks and with a case study in supporting insect classification on resource-constrained sensors.},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {http://link.springer.com/article/10.1007/s10115-015-0878-8},
}
ABSTRACT A concerted research effort over the past two decades has heralded significant improvements in both the efficiency and effectiveness of time series classification. The consensus that has emerged in the community is that the best solution is a surprisingly simple one. In virtually all domains, the most accurate classifier is the nearest neighbor algorithm with dynamic time warping as the distance measure. The time complexity of dynamic time warping means that successful deployments on resource-constrained devices remain elusive. Moreover, the recent explosion of interest in wearable computing devices, which typically have limited computational resources, has greatly increased the need for very efficient classification algorithms. A classic technique to obtain the benefits of the nearest neighbor algorithm, without inheriting its undesirable time and space complexity, is to use the nearest centroid algorithm. Unfortunately, the unique properties of (most) time series data mean that the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this paper we demonstrate that we can exploit a recent result by Petitjean et al. to allow meaningful averaging of “warped� time series, which then allows us to create super-efficient nearest “centroid� classifiers that are at least as accurate as their more computationally challenged nearest neighbor relatives. We demonstrate empirically the utility of our approach by comparing it to all the appropriate strawmen algorithms on the ubiquitous UCR Benchmarks and with a case study in supporting insect classification on resource-constrained sensors.

Dynamic Time Warping Averaging of Time Series Allows Faster and More Accurate Classification.
Petitjean, F., Forestier, G., Webb, G. I., Nicholson, A., Chen, Y., & Keogh, E.
Proceedings of the 14th IEEE International Conference on Data Mining, pp. 470-479, 2014.
[PDF] [URL] [Bibtex] [Abstract]

@InProceedings{PetitjeanEtAl14b,
author = {Petitjean, F. and Forestier, G. and Webb, G. I. and Nicholson, A. and Chen, Y. and Keogh, E.},
booktitle = {Proceedings of the 14th {IEEE} International Conference on Data Mining},
title = {Dynamic Time Warping Averaging of Time Series Allows Faster and More Accurate Classification},
year = {2014},
pages = {470-479},
abstract = {Recent years have seen significant progress in improving both the efficiency and effectiveness of time series classification. However, because the best solution is typically the Nearest Neighbor algorithm with the relatively expensive Dynamic Time Warping as the distance measure, successful deployments on resource constrained devices remain elusive. Moreover, the recent explosion of interest in wearable devices, which typically have limited computational resources, has created a growing need for very efficient classification algorithms. A commonly used technique to glean the benefits of the Nearest Neighbor algorithm, without inheriting its undesirable time complexity, is to use the Nearest Centroid algorithm. However, because of the unique properties of (most) time series data, the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this work we show that we can exploit a recent result to allow meaningful averaging of 'warped' times series, and that this result allows us to create ultra-efficient Nearest 'Centroid' classifiers that are at least as accurate as their more lethargic Nearest Neighbor cousins.},
comment = {One of nine papers invited to Knowledge and Information Systems journal ICDM-14 special issue},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {http://dx.doi.org/10.1109/ICDM.2014.27},
}
ABSTRACT Recent years have seen significant progress in improving both the efficiency and effectiveness of time series classification. However, because the best solution is typically the Nearest Neighbor algorithm with the relatively expensive Dynamic Time Warping as the distance measure, successful deployments on resource constrained devices remain elusive. Moreover, the recent explosion of interest in wearable devices, which typically have limited computational resources, has created a growing need for very efficient classification algorithms. A commonly used technique to glean the benefits of the Nearest Neighbor algorithm, without inheriting its undesirable time complexity, is to use the Nearest Centroid algorithm. However, because of the unique properties of (most) time series data, the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this work we show that we can exploit a recent result to allow meaningful averaging of 'warped' times series, and that this result allows us to create ultra-efficient Nearest 'Centroid' classifiers that are at least as accurate as their more lethargic Nearest Neighbor cousins.