Scalable learning of time series classifiers

Time series describe dynamic processes. Driven by big data applications, including mapping land use from satellite observations over time, our award-winning research is revolutionising time series classification by developing technologies that can learn from and accurately classify time series collections that are orders of magnitude larger than the previous state of the art could handle.

ROCKET and its successors MiniROCKET, MultiROCKET and HYDRA use convolutional filters from deep learning to extract diverse time series features of types that have previously each been addressed by specialised techniques. ROCKET generates a large number of these filters and uses them to extract features from each series. From these features a simple linear classifier can learn models that are as accurate as the prior state of the art, but does so in a fraction of the time, and the resulting models classify with blistering speed. Angus Dempster received The Computing Research and Education Association of Australasia Distinguished Dissertation Award for this research. An implementation can be downloaded here. The most recent paper can be found here. Angus' video explaining ROCKET and its successors can be found here.
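The idea behind ROCKET can be illustrated with a short NumPy sketch. This is a simplified illustration under stated assumptions, not the released implementation: it omits padding, uses only two pooled features per kernel (the maximum and the proportion of positive values), and fits a plain closed-form ridge classifier. The function names `make_kernels`, `transform`, `fit_ridge` and `predict` are hypothetical, and the sketch assumes equal-length univariate series of length at least 11.

```python
import numpy as np


def make_kernels(n_kernels, series_length, rng):
    """Draw random 1-D convolutional kernels as (weights, bias, dilation)."""
    kernels = []
    for _ in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(size=length)
        weights -= weights.mean()  # mean-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # Sample the dilation on an exponential scale, capped so that the
        # dilated kernel still fits inside the series (no padding here).
        max_exponent = np.log2((series_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0.0, max_exponent))
        kernels.append((weights, bias, dilation))
    return kernels


def transform(X, kernels):
    """Map each series (row of X) to two pooled features per kernel."""
    n_series, series_length = X.shape
    features = np.empty((n_series, 2 * len(kernels)))
    for k, (weights, bias, dilation) in enumerate(kernels):
        span = (len(weights) - 1) * dilation
        # Each row of `idx` holds the positions covered by one dilated window.
        idx = (np.arange(series_length - span)[:, None]
               + dilation * np.arange(len(weights))[None, :])
        conv = X[:, idx] @ weights + bias
        features[:, 2 * k] = conv.max(axis=1)             # max pooling
        features[:, 2 * k + 1] = (conv > 0).mean(axis=1)  # proportion positive
    return features


def fit_ridge(F, y, alpha=1.0):
    """Closed-form ridge regression onto +1/-1 class indicator targets."""
    classes = np.unique(y)
    Y = np.where(y[:, None] == classes[None, :], 1.0, -1.0)
    mu, sd = F.mean(axis=0), F.std(axis=0) + 1e-8
    Z = (F - mu) / sd
    W = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ Y)
    return classes, mu, sd, W


def predict(model, F):
    """Predict the class whose indicator target scores highest."""
    classes, mu, sd, W = model
    return classes[np.argmax(((F - mu) / sd) @ W, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 100)
    # Toy data: noisy sine waves at two different frequencies.
    X = np.stack([np.sin(2 * np.pi * f * t) + 0.1 * rng.normal(size=t.size)
                  for f in [2, 2, 2, 5, 5, 5]])
    y = np.array([0, 0, 0, 1, 1, 1])
    kernels = make_kernels(100, X.shape[1], rng)
    model = fit_ridge(transform(X, kernels), y)
    print(predict(model, transform(X, kernels)))
```

The kernels are never trained; only the linear model on top of the pooled features is fitted, which is where the speed advantage described above comes from.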

Proximity Forest provides a significant advance on the state of the art in time series classification. By coupling the efficiency of divide and conquer tree classifiers with the effectiveness of similarity measures designed specifically for time series, Proximity Forest achieves very high accuracy for modest computation. An implementation can be downloaded here. The most recent paper can be found here.
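A single tree-node split of this kind might be sketched as follows. This is a simplified illustration, not the released implementation: it assumes equal-length univariate series, uses windowed Dynamic Time Warping as the only similarity measure (a window of 0 reduces it to squared Euclidean distance), and picks the best of a few random candidate splits by Gini impurity. The helper names `dtw`, `random_split`, `gini_after_split` and `best_split` are hypothetical.

```python
import numpy as np


def dtw(a, b, window):
    """Squared-cost DTW restricted to a band of half-width `window`."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def random_split(X, y, rng):
    """Pick a random window and one random exemplar per class, then route
    every series to the branch of its nearest exemplar."""
    window = int(rng.integers(0, X.shape[1] // 4 + 1))
    labels = list(np.unique(y))
    exemplars = {c: X[rng.choice(np.flatnonzero(y == c))] for c in labels}
    branch = np.array([
        labels[int(np.argmin([dtw(x, exemplars[c], window) for c in labels]))]
        for x in X
    ])
    return window, exemplars, branch


def gini_after_split(y, branch):
    """Size-weighted Gini impurity of the class labels in each branch."""
    total = 0.0
    for b in np.unique(branch):
        yb = y[branch == b]
        _, counts = np.unique(yb, return_counts=True)
        p = counts / counts.sum()
        total += len(yb) / len(y) * (1.0 - np.sum(p ** 2))
    return total


def best_split(X, y, rng, n_candidates=5):
    """Draw several random candidate splits and keep the purest one."""
    candidates = [random_split(X, y, rng) for _ in range(n_candidates)]
    return min(candidates, key=lambda s: gini_after_split(y, s[2]))
```

Applying `best_split` recursively to the series in each branch would grow one tree; a forest would repeat this with different random draws and combine the trees by voting.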

TS-CHIEF builds upon Proximity Forest, enhancing its proximity-based methods by integrating interval statistics and dictionary techniques. An implementation can be found here and the paper here.

InceptionTime brings the power of deep learning to time series classification. An implementation can be downloaded here. The paper can be downloaded here.

LB_Webb and LB_Enhanced are our novel lower bounds for Dynamic Time Warping that are both faster and tighter than the popular LB_Keogh. Implementations can be downloaded here and here. The papers can be found here and here.
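To show how a lower bound is used, the sketch below implements the baseline LB_Keogh (not LB_Webb or LB_Enhanced) and applies it to skip full Dynamic Time Warping computations during nearest neighbour search. It is a minimal illustration that assumes equal-length univariate series and squared pointwise costs; the function names `dtw`, `envelope`, `lb_keogh` and `nearest_neighbour` are hypothetical.

```python
import numpy as np


def dtw(a, b, window):
    """Squared-cost DTW restricted to a band of half-width `window`."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def envelope(b, window):
    """Running minimum and maximum of `b` over the warping window."""
    n = len(b)
    lower = np.array([b[max(0, i - window): i + window + 1].min()
                      for i in range(n)])
    upper = np.array([b[max(0, i - window): i + window + 1].max()
                      for i in range(n)])
    return lower, upper


def lb_keogh(a, b, window):
    """Sum of squared distances from `a` to the envelope around `b`.
    This never exceeds dtw(a, b, window)."""
    lower, upper = envelope(b, window)
    above = np.clip(a - upper, 0.0, None)
    below = np.clip(lower - a, 0.0, None)
    return float(np.sum(above ** 2 + below ** 2))


def nearest_neighbour(query, candidates, window):
    """1-NN search that skips DTW whenever the bound rules a candidate out."""
    best_idx, best_dist, n_skipped = -1, np.inf, 0
    for idx, candidate in enumerate(candidates):
        if lb_keogh(query, candidate, window) >= best_dist:
            n_skipped += 1  # cannot beat the best match found so far
            continue
        dist = dtw(query, candidate, window)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist, n_skipped
```

A tighter bound lets the search skip more candidates, and a cheaper bound lowers the cost of every check, which is why both properties matter.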

The following is a blog post on the use of Barycentric averaging in time series classification: http://www.kdnuggets.com/2014/12/averaging-improves-accuracy-speed-time-series-classification.html. The code can be downloaded here: http://francois-petitjean.com/Research/ICDM2014-DTW/index.php. The slides for the ICDM 2014 paper can be downloaded here: http://francois-petitjean.com/Research/ICDM2014-DTW/Slides.pdf.

The TSI software for the SDM 2017 paper on time series indexing can be downloaded here: https://github.com/ChangWeiTan/TSI. Slides for the SDM 2017 paper can be found here: http://francois-petitjean.com/Research/SDM17-slides.pdf.

The software for the SDM 2018 paper on finding the best warping window, which won the best paper award, can be downloaded here: https://github.com/ChangWeiTan/FastWWSearch (Matlab version).

Publications

A Bayesian-inspired, deep learning-based, semi-supervised domain adaptation technique for land cover mapping.
Lucas, B., Pelletier, C., Schmidt, D., Webb, G. I., & Petitjean, F.
Machine Learning, 112, 1941-1973, 2023.

@Article{lucas2021bayesian,
author = {Lucas, Benjamin and Pelletier, Charlotte and Schmidt, Daniel and Webb, Geoffrey I and Petitjean, Fran{\c{c}}ois},
journal = {Machine Learning},
title = {A Bayesian-inspired, deep learning-based, semi-supervised domain adaptation technique for land cover mapping},
year = {2023},
pages = {1941-1973},
volume = {112},
abstract = {Land cover maps are a vital input variable to many types of environmental research and management. While they can be produced automatically by machine learning techniques, these techniques require substantial training data to achieve high levels of accuracy, which are not always available. One technique researchers use when labelled training data are scarce is domain adaptation (DA) - where data from an alternate region, known as the source domain, are used to train a classifier and this model is adapted to map the study region, or target domain. The scenario we address in this paper is known as semi-supervised DA, where some labelled samples are available in the target domain. In this paper we present Sourcerer, a Bayesian-inspired, deep learning-based, semi-supervised DA technique for producing land cover maps from satellite image time series (SITS) data. The technique takes a convolutional neural network trained on a source domain and then trains further on the available target domain with a novel regularizer applied to the model weights. The regularizer adjusts the degree to which the model is modified to fit the target data, limiting the degree of change when the target data are few in number and increasing it as target data quantity increases. Our experiments on Sentinel-2 time series images compare Sourcerer with two state-of-the-art semi-supervised domain adaptation techniques and four baseline models. We show that on two different source-target domain pairings Sourcerer outperforms all other methods for any quantity of labelled target data available. In fact, the results on the more difficult target domain show that the starting accuracy of Sourcerer (when no labelled target data are available), 74.2%, is greater than the next-best state-of-the-art method trained on 20,000 labelled target instances.},
doi = {10.1007/s10994-020-05942-z},
keywords = {time series, earth observation analytics},
publisher = {Springer US},
related = {earth-observation-analytics},
}

Smooth Perturbations for Time Series Adversarial Attacks.
Pialla, G., Fawaz, H. I., Devanne, M., Weber, J., Idoumghar, L., Muller, P., Bergmeir, C., Schmidt, D., Webb, G. I., & Forestier, G.
Proceedings of the 2022 Pacific-Asia Conference on Knowledge Discovery and Data Mining, Cham, pp. 485-496, 2022.

@InProceedings{10.1007/978-3-031-05933-9_38,
author = {Pialla, Gautier and Fawaz, Hassan Ismail and Devanne, Maxime and Weber, Jonathan and Idoumghar, Lhassane and Muller, Pierre-Alain and Bergmeir, Christoph and Schmidt, Daniel and Webb, Geoffrey I. and Forestier, Germain},
booktitle = {Proceedings of the 2022 Pacific-Asia Conference on Knowledge Discovery and Data Mining},
title = {Smooth Perturbations for Time Series Adversarial Attacks},
year = {2022},
address = {Cham},
editor = {Gama, Jo{\~a}o and Li, Tianrui and Yu, Yang and Chen, Enhong and Zheng, Yu and Teng, Fei},
pages = {485-496},
publisher = {Springer International Publishing},
abstract = {Adversarial attacks represent a threat to every deep neural network. They are particularly effective if they can perturb a given model while remaining undetectable. They have been initially introduced for image classifiers, and are well studied for this task. For time series, few attacks have yet been proposed. Most that have are adaptations of attacks previously proposed for image classifiers. Although these attacks are effective, they generate perturbations containing clearly discernible patterns such as sawtooth and spikes. Adversarial patterns are not perceptible on images, but the attacks proposed to date are readily perceptible in the case of time series. In order to generate stealthier adversarial attacks for time series, we propose a new attack that produces smoother perturbations. We find that smooth perturbations are harder to detect by the naked eye. We also show how adversarial training can improve model robustness against this attack, thus making models less vulnerable.},
doi = {10.1007/978-3-031-05933-9_38},
isbn = {978-3-031-05933-9},
keywords = {time series},
related = {scalable-time-series-classifiers},
}

MultiRocket: multiple pooling operators and transformations for fast and effective time series classification.
Tan, C. W., Dempster, A., Bergmeir, C., & Webb, G. I.
Data Mining and Knowledge Discovery, 36, 1623-1646, 2022.

@Article{Tan2022,
author = {Tan, Chang Wei and Dempster, Angus and Bergmeir, Christoph and Webb, Geoffrey I.},
journal = {Data Mining and Knowledge Discovery},
title = {MultiRocket: multiple pooling operators and transformations for fast and effective time series classification},
year = {2022},
issn = {1573-756X},
pages = {1623-1646},
volume = {36},
abstract = {We propose MultiRocket, a fast time series classification (TSC) algorithm that achieves state-of-the-art accuracy with a tiny fraction of the time and without the complex ensembling structure of many state-of-the-art methods. MultiRocket improves on MiniRocket, one of the fastest TSC algorithms to date, by adding multiple pooling operators and transformations to improve the diversity of the features generated. In addition to processing the raw input series, MultiRocket also applies first order differences to transform the original series. Convolutions are applied to both representations, and four pooling operators are applied to the convolution outputs. When benchmarked using the University of California Riverside TSC benchmark datasets, MultiRocket is significantly more accurate than MiniRocket, and competitive with the best ranked current method in terms of accuracy, HIVE-COTE 2.0, while being orders of magnitude faster.},
doi = {10.1007/s10618-022-00844-1},
keywords = {time series},
related = {scalable-time-series-classifiers},
}

Ultra fast warping window optimization for Dynamic Time Warping.
Tan, C. W., Herrmann, M., & Webb, G. I.
IEEE International Conference on Data Mining (ICDM-21), pp. 589-598, 2021.

@InProceedings{TanEtAlUltraFast2021,
author = {Tan, Chang Wei and Herrmann, Matthieu and Webb, Geoffrey I.},
booktitle = {IEEE International Conference on Data Mining (ICDM-21)},
title = {Ultra fast warping window optimization for Dynamic Time Warping},
year = {2021},
pages = {589-598},
doi = {10.1109/ICDM51629.2021.00070},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {https://changweitan.com/research/UltraFastWWSearch.pdf},
}

Early abandoning and pruning for elastic distances including dynamic time warping.
Herrmann, M., & Webb, G. I.
Data Mining and Knowledge Discovery, 35(6), 2577-2601, 2021.

@Article{Herrmann_2021,
author = {Matthieu Herrmann and Geoffrey I. Webb},
journal = {Data Mining and Knowledge Discovery},
title = {Early abandoning and pruning for elastic distances including dynamic time warping},
year = {2021},
number = {6},
pages = {2577-2601},
volume = {35},
doi = {10.1007/s10618-021-00782-4},
keywords = {time series},
publisher = {Springer Science and Business Media {LLC}},
related = {scalable-time-series-classifiers},
url = {https://rdcu.be/cuoN0},
}

Live fuel moisture content estimation from MODIS: A deep learning approach.
Zhu, L., Webb, G. I., Yebra, M., Scortechini, G., Miller, L., & Petitjean, F.
ISPRS Journal of Photogrammetry and Remote Sensing, 179, 81-91, 2021.

@Article{ZHU202181,
author = {Liujun Zhu and Geoffrey I. Webb and Marta Yebra and Gianluca Scortechini and Lynn Miller and Francois Petitjean},
journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
title = {Live fuel moisture content estimation from MODIS: A deep learning approach},
year = {2021},
issn = {0924-2716},
pages = {81-91},
volume = {179},
abstract = {Live fuel moisture content (LFMC) is an essential variable to model fire danger and behaviour. This paper presents the first application of deep learning to LFMC estimation based on the historical LFMC ground samples of the Globe-LFMC database, as a step towards operational daily LFMC mapping in the Contiguous United States (CONUS). One-year MODerate resolution Imaging Spectroradiometer (MODIS) time series preceding each LFMC sample were extracted as the primary data source for training. The proposed temporal convolutional neural network for LFMC (TempCNN-LFMC) comprises three 1-D convolutional layers that learn the multi-scale temporal dynamics (features) of one-year MODIS time series specific to LFMC estimation. The learned features, together with a few auxiliary variables (e.g., digital elevation model), are then passed to three fully connected layers to extract the non-linear relationships with LFMC. In the primary training and validation scenario, the neural network was trained using samples from 2002 to 2013 and then adopted to estimating the LFMC from 2014 to 2018, achieving an overall root mean square error (RMSE) of 25.57% and a correlation coefficient (R) of 0.74. Good consistency on spatial patterns and temporal trends of accuracy was observed. The trained model achieved a similar RMSE of 25.98%, 25.20% and 25.93% for forest, shrubland, and grassland, respectively, without requiring prior information on the vegetation type.},
doi = {10.1016/j.isprsjprs.2021.07.010},
keywords = {time series, Live fuel moisture content, earth observation analytics, MODIS, Convolutional neural network, Time series analysis, Fire risk, Fire danger},
related = {earth-observation-analytics},
}

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification.
Dempster, A., Schmidt, D. F., & Webb, G. I.
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 248-257, 2021.

@InProceedings{dempsteretal21kdd,
author = {Angus Dempster and Daniel F. Schmidt and Geoffrey I. Webb},
booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
title = {{MINIROCKET:} {A} Very Fast (Almost) Deterministic Transform for Time Series Classification},
year = {2021},
pages = {248-257},
abstract = {Until recently, the most accurate methods for time series classification were limited by high computational complexity. ROCKET achieves state-of-the-art accuracy with a fraction of the computational expense of most existing methods by transforming input time series using random convolutional kernels, and using the transformed features to train a linear classifier. We reformulate ROCKET into a new method, MINIROCKET, making it up to 75 times faster on larger datasets, and making it almost deterministic (and optionally, with additional computational expense, fully deterministic), while maintaining essentially the same accuracy. Using this method, it is possible to train and test a classifier on all of 109 datasets from the UCR archive to state-of-the-art accuracy in less than 10 minutes. MINIROCKET is significantly faster than any other method of comparable accuracy (including ROCKET), and significantly more accurate than any other method of even roughly-similar computational expense. As such, we suggest that MINIROCKET should now be considered and used as the default variant of ROCKET.},
doi = {10.1145/3447548.3467231},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {https://arxiv.org/abs/2012.08791},
}

Time series extrinsic regression.
Tan, C. W., Bergmeir, C., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, 35(3), 1032-1060, 2021.

@Article{tan2021regression,
author = {Tan, Chang Wei and Bergmeir, Christoph and Petitjean, Francois and Webb, Geoffrey I.},
journal = {Data Mining and Knowledge Discovery},
title = {Time series extrinsic regression},
year = {2021},
issn = {1573-756X},
number = {3},
pages = {1032-1060},
volume = {35},
abstract = {This paper studies time series extrinsic regression (TSER): a regression task of which the aim is to learn the relationship between a time series and a continuous scalar variable; a task closely related to time series classification (TSC), which aims to learn the relationship between a time series and a categorical class label. This task generalizes time series forecasting, relaxing the requirement that the value predicted be a future value of the input series or primarily depend on more recent values. In this paper, we motivate and study this task, and benchmark existing solutions and adaptations of TSC algorithms on a novel archive of 19 TSER datasets which we have assembled. Our results show that the state-of-the-art TSC algorithm Rocket, when adapted for regression, achieves the highest overall accuracy compared to adaptations of other TSC algorithms and state-of-the-art machine learning (ML) algorithms such as XGBoost, Random Forest and Support Vector Regression. More importantly, we show that much research is needed in this field to improve the accuracy of ML models. We also find evidence that further research has excellent prospects of improving upon these straightforward baselines.},
doi = {10.1007/s10618-021-00745-9},
keywords = {time series},
publisher = {Springer US},
related = {scalable-time-series-classifiers},
url = {https://rdcu.be/cgCAn},
}

Tight lower bounds for Dynamic Time Warping.
Webb, G. I., & Petitjean, F.
Pattern Recognition, 115, Art. no. 107895, 2021.

@Article{WEBB2021107895,
author = {Geoffrey I. Webb and Fran\c{c}ois Petitjean},
journal = {Pattern Recognition},
title = {Tight lower bounds for Dynamic Time Warping},
year = {2021},
issn = {0031-3203},
volume = {115},
abstract = {Dynamic Time Warping (DTW) is a popular similarity measure for aligning and comparing time series. Due to DTW's high computation time, lower bounds are often employed to screen poor matches. Many alternative lower bounds have been proposed, providing a range of different trade-offs between tightness and computational efficiency. LB_KEOGH provides a useful trade-off in many applications. Two recent lower bounds, LB_IMPROVED and LB_ENHANCED, are substantially tighter than LB_KEOGH. All three have the same worst case computational complexity - linear with respect to series length and constant with respect to window size. We present four new DTW lower bounds in the same complexity class. LB_PETITJEAN is substantially tighter than LB_IMPROVED, with only modest additional computational overhead. LB_WEBB is more efficient than LB_IMPROVED, while often providing a tighter bound. LB_WEBB is always tighter than LB_KEOGH. The parameter free LB_WEBB is usually tighter than LB_ENHANCED. A parameterized variant, LB_Webb_Enhanced, is always tighter than LB_ENHANCED. A further variant, LB_WEBB*, is useful for some constrained distance functions. In extensive experiments, LB_WEBB proves to be very effective for nearest neighbor search.},
articlenumber = {107895},
doi = {10.1016/j.patcog.2021.107895},
keywords = {time series},
related = {scalable-time-series-classifiers},
}

TS-CHIEF: A Scalable and Accurate Forest Algorithm for Time Series Classification.
Shifaz, A., Pelletier, C., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, 34(3), 742-775, 2020.
Third Most Highly Cited Paper Published in Data Mining and Knowledge Discovery in 2020

@Article{shifazetal2019,
author = {Shifaz, Ahmed and Pelletier, Charlotte and Petitjean, Francois and Webb, Geoffrey I},
journal = {Data Mining and Knowledge Discovery},
title = {TS-CHIEF: A Scalable and Accurate Forest Algorithm for Time Series Classification},
year = {2020},
number = {3},
pages = {742-775},
volume = {34},
abstract = {Time Series Classification (TSC) has seen enormous progress over the last two decades. HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles) is the current state of the art in terms of classification accuracy. HIVE-COTE recognizes that time series data are a specific data type for which the traditional attribute-value representation, used predominantly in machine learning, fails to provide a relevant representation. HIVE-COTE combines multiple types of classifiers: each extracting information about a specific aspect of a time series, be it in the time domain, frequency domain or summarization of intervals within the series. However, HIVE-COTE (and its predecessor, FLAT-COTE) is often infeasible to run on even modest amounts of data. For instance, training HIVE-COTE on a dataset with only 1500 time series can require 8 days of CPU time. It has polynomial runtime with respect to the training set size, so this problem compounds as data quantity increases. We propose a novel TSC algorithm, TS-CHIEF (Time Series Combination of Heterogeneous and Integrated Embedding Forest), which rivals HIVE-COTE in accuracy but requires only a fraction of the runtime. TS-CHIEF constructs an ensemble classifier that integrates the most effective embeddings of time series that research has developed in the last decade. It uses tree-structured classifiers to do so efficiently. We assess TS-CHIEF on 85 datasets of the University of California Riverside (UCR) archive, where it achieves state-of-the-art accuracy with scalability and efficiency. We demonstrate that TS-CHIEF can be trained on 130 k time series in 2 days, a data quantity that is beyond the reach of any TSC algorithm with comparable accuracy.},
comment = {Third Most Highly Cited Paper Published in Data Mining and Knowledge Discovery in 2020},
doi = {10.1007/s10618-020-00679-8},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {https://rdcu.be/c1zg6},
}

ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels.
Dempster, A., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, 34, 1454-1495, 2020.
Second Most Highly Cited Paper Published in Data Mining and Knowledge Discovery in 2020

@Article{dempster2020rocket,
author = {Angus Dempster and Francois Petitjean and Geoffrey I. Webb},
journal = {Data Mining and Knowledge Discovery},
title = {ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels},
year = {2020},
pages = {1454-1495},
volume = {34},
abstract = {Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification, we show that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods. Using this method, it is possible to train and test a classifier on all 85 'bake off' datasets in the UCR archive in <2h, and it is possible to train a classifier on a large dataset of more than one million time series in approximately 1 h.},
comment = {Second Most Highly Cited Paper Published in Data Mining and Knowledge Discovery in 2020},
doi = {10.1007/s10618-020-00701-z},
issue = {5},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {https://rdcu.be/c1zg4},
}
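The ROCKET abstract describes features produced by random convolutional kernels and passed to a simple linear classifier. The following is a minimal pure-Python sketch of such a transform, not the authors' implementation. The function names `make_kernels`, `apply_kernel` and `transform` are hypothetical. The kernel lengths, mean-centred Gaussian weights, uniform bias and the two pooled features (maximum and proportion of positive values) follow the description in the ROCKET paper. As simplifying assumptions, dilation is drawn uniformly and padding is omitted.

```python
import random


def make_kernels(num_kernels, series_length, seed=0):
    """Draw random kernels as (weights, bias, dilation) tuples."""
    rng = random.Random(seed)
    kernels = []
    for _ in range(num_kernels):
        length = rng.choice((7, 9, 11))
        weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
        mean = sum(weights) / length
        weights = [w - mean for w in weights]  # mean-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # Simplification: uniform integer dilation that keeps the dilated
        # kernel inside the series (assumes series_length >= 11).
        max_dilation = max(1, (series_length - 1) // (length - 1))
        dilation = rng.randint(1, max_dilation)
        kernels.append((weights, bias, dilation))
    return kernels


def apply_kernel(series, kernel):
    """Convolve one series with one kernel; return (max, PPV)."""
    weights, bias, dilation = kernel
    span = (len(weights) - 1) * dilation
    outputs = [
        bias + sum(w * series[start + j * dilation] for j, w in enumerate(weights))
        for start in range(len(series) - span)
    ]
    # PPV: proportion of positive values in the convolution output.
    ppv = sum(1 for o in outputs if o > 0) / len(outputs)
    return max(outputs), ppv


def transform(dataset, kernels):
    """Map each series to a flat feature vector with two features per kernel."""
    return [
        [value for kernel in kernels for value in apply_kernel(series, kernel)]
        for series in dataset
    ]
```

The resulting feature vectors would then be given to a linear classifier such as a ridge classifier, which is not shown here.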

FastEE: Fast Ensembles of Elastic Distances for time series classification.
Tan, C. W., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, 34(1), 231-272, 2020.

@Article{Tan2019,
author = {Tan, Chang Wei and Petitjean, Fran{\c{c}}ois and Webb, Geoffrey I.},
journal = {Data Mining and Knowledge Discovery},
title = {FastEE: Fast Ensembles of Elastic Distances for time series classification},
year = {2020},
issn = {1573-756X},
number = {1},
pages = {231-272},
volume = {34},
abstract = {In recent years, many new ensemble-based time series classification (TSC) algorithms have been proposed. Each of them is significantly more accurate than their predecessors. The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is currently the most accurate TSC algorithm when assessed on the UCR repository. It is a meta-ensemble of 5 state-of-the-art ensemble-based classifiers. The time complexity of HIVE-COTE---particularly for training---is prohibitive for most datasets. There is thus a critical need to speed up the classifiers that compose HIVE-COTE. This paper focuses on speeding up one of its components: Ensembles of Elastic Distances (EE), which is the classifier that leverages on the decades of research into the development of time-dedicated measures. Training EE can be prohibitive for many datasets. For example, it takes a month on the ElectricDevices dataset with 9000 instances. This is because EE needs to cross-validate the hyper-parameters used for the 11 similarity measures it encompasses. In this work, Fast Ensembles of Elastic Distances is proposed to train EE faster. There are two versions to it. The exact version makes it possible to train EE 10 times faster. The approximate version is 40 times faster than EE without significantly impacting the classification accuracy. This translates to being able to train EE on ElectricDevices in 13h.},
doi = {10.1007/s10618-019-00663-x},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {https://rdcu.be/c1y5a},
}

InceptionTime: Finding AlexNet for Time Series Classification.
Fawaz, H. I., Lucas, B., Forestier, G., Pelletier, C., Schmidt, D. F., Weber, J., Webb, G. I., Idoumghar, L., Muller, P., & Petitjean, F.
Data Mining and Knowledge Discovery, 34, 1936-1962, 2020.
Clarivate Web of Science Highly Cited Paper 2022, 2023
Most Highly Cited Paper Published In Data Mining and Knowledge Discovery in 2020

@Article{fawaz2019inceptiontime,
author = {Hassan Ismail Fawaz and Benjamin Lucas and Germain Forestier and Charlotte Pelletier and Daniel F. Schmidt and Jonathan Weber and Geoffrey I. Webb and Lhassane Idoumghar and Pierre-Alain Muller and Francois Petitjean},
journal = {Data Mining and Knowledge Discovery},
title = {InceptionTime: Finding AlexNet for Time Series Classification},
year = {2020},
pages = {1936-1962},
volume = {34},
abstract = {This paper brings deep learning to the forefront of research into time series classification (TSC). TSC is the area of machine learning tasked with the categorization (or labelling) of time series. The last few decades of work in this area have led to significant progress in the accuracy of classifiers, with the state of the art now represented by the HIVE-COTE algorithm. While extremely accurate, HIVE-COTE cannot be applied to many real-world datasets because of its high training time complexity of $O(N^2 \cdot T^4)$ for a dataset with N time series of length T. For example, it takes HIVE-COTE more than 8 days to learn from a small dataset with N = 1500 time series of short length T = 46. Meanwhile deep learning has received enormous attention because of its high accuracy and scalability. Recent approaches to deep learning for TSC have been scalable, but less accurate than HIVE-COTE. We introduce InceptionTime, an ensemble of deep Convolutional Neural Network models, inspired by the Inception-v4 architecture. Our experiments show that InceptionTime is on par with HIVE-COTE in terms of accuracy while being much more scalable: not only can it learn from 1500 time series in one hour but it can also learn from 8M time series in 13 h, a quantity of data that is fully out of reach of HIVE-COTE.},
comment = {Clarivate Web of Science Highly Cited Paper 2022, 2023},
comment2 = {Most Highly Cited Paper Published In Data Mining and Knowledge Discovery in 2020},
doi = {10.1007/s10618-020-00710-y},
issue = {6},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {https://rdcu.be/b6TXh},
}

Unsupervised Domain Adaptation Techniques for Classification of Satellite Image Time Series.
Lucas, B., Pelletier, C., Schmidt, D., Webb, G. I., & Petitjean, F.
IEEE International Geoscience and Remote Sensing Symposium, pp. 1074–1077, 2020.

@InProceedings{lucas2020unsupervised,
author = {Lucas, Benjamin and Pelletier, Charlotte and Schmidt, Daniel and Webb, Geoffrey I and Petitjean, Fran{\c{c}}ois},
booktitle = {IEEE International Geoscience and Remote Sensing Symposium},
title = {Unsupervised Domain Adaptation Techniques for Classification of Satellite Image Time Series},
year = {2020},
organization = {IEEE},
pages = {1074--1077},
abstract = {Land cover maps are vitally important to many elements of environmental management. However, the machine learning algorithms used to produce them require a substantive quantity of labelled training data to reach the best levels of accuracy. When researchers wish to map an area where no labelled training data are available, one potential solution is to use a classifier trained on another geographical area and adapt it to the target location; this is known as Unsupervised Domain Adaptation (DA). In this paper we undertake the first experiments using unsupervised DA methods for the classification of satellite image time series (SITS) data. Our experiments draw the interesting conclusion that existing methods provide no benefit when used on SITS data, and that this is likely due to the temporal nature of the data and the change in class distributions between the regions. This suggests that an unsupervised domain adaptation technique for SITS would be extremely beneficial for land cover mapping.},
doi = {10.1109/IGARSS39084.2020.9324339},
keywords = {time series, earth observation analytics},
related = {scalable-time-series-classifiers},
}

Deep Learning for the Classification of Sentinel-2 Image Series.
Pelletier, C., Webb, G. I., & Petitjean, F.
IEEE International Geoscience And Remote Sensing Symposium, 2019.

@InProceedings{PelletierEtAl19b,
author = {Pelletier, Charlotte and Webb, Geoffrey I. and Petitjean, Francois},
booktitle = {IEEE International Geoscience And Remote Sensing Symposium},
title = {Deep Learning for the Classification of Sentinel-2 Image Series},
year = {2019},
month = {Jul},
abstract = {Satellite image time series (SITS) have proven to be essential for accurate and up-to-date land cover mapping over large areas. Most works about SITS have focused on the use of traditional classification algorithms such as Random Forests (RFs). Deep learning algorithms have been very successful for supervised tasks, in particular for data that exhibit a structure between attributes, such as space or time. In this work, we compare for the first time RFs to the two leading deep learning algorithms for handling temporal data: Recurrent Neural Networks (RNNs) and temporal Convolutional Neural Networks (TempCNNs). We carry out a large experiment using Sentinel-2 time series. We compare both accuracy and computational times to classify 10,980 km$^2$ over Australia. The results highlight the good performance of TempCNNs, which obtain the highest accuracy. They also show that RNNs might be less suited for large-scale studies as they have higher runtime complexity.},
doi = {10.1109/IGARSS.2019.8900123},
keywords = {time series, earth observation analytics},
related = {earth-observation-analytics},
}

Elastic bands across the path: A new framework and methods to lower bound DTW.
Tan, C. W., Petitjean, F., & Webb, G. I.
Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 522-530, 2019.

@InProceedings{TanEtAl19,
Title = {Elastic bands across the path: A new framework and methods to lower bound DTW},
Author = {Tan, Chang Wei and Petitjean, Francois and Webb, Geoffrey I.},
Booktitle = {Proceedings of the 2019 SIAM International Conference on Data Mining},
Year = {2019},
Pages = {522-530},
Abstract = {There has been renewed recent interest in developing effective lower bounds for Dynamic Time Warping (DTW) distance between time series. These have many applications in time series indexing, clustering, forecasting, regression and classification. One of the key time series classification algorithms, the nearest neighbor algorithm with DTW distance (NN-DTW) is very expensive to compute, due to the quadratic complexity of DTW. Lower bound search can speed up NN-DTW substantially. An effective and tight lower bound quickly prunes off unpromising nearest neighbor candidates from the search space and minimises the number of the costly DTW computations. The speed up provided by lower bound search becomes increasingly critical as training set size increases. Different lower bounds provide different trade-offs between computation time and tightness. Most existing lower bounds interact with DTW warping window sizes. They are very tight and effective at smaller warping window sizes, but become looser as the warping window increases, thus reducing the pruning effectiveness for NN-DTW. In this work, we present a new class of lower bounds that are tighter than the popular Keogh lower bound, while requiring similar computation time. Our new lower bounds take advantage of the DTW boundary condition, monotonicity and continuity constraints to create a tighter lower bound. Of particular significance, they remain relatively tight even for large windows. A single parameter to these new lower bounds controls the speed-tightness trade-off. We demonstrate that these new lower bounds provide an exceptional balance between computation time and tightness for the NN-DTW time series classification task, resulting in greatly improved efficiency for NN-DTW lower bound search.},
Keywords = {time series},
Related = {scalable-time-series-classifiers},
Url = {https://arxiv.org/abs/1808.09617}
}
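The abstract above uses the Keogh lower bound as its baseline and stresses the role of the DTW warping window. The sketch below illustrates that baseline only, not the new bounds the paper proposes. It assumes equal-length series and a squared pointwise cost with no final square root. The function names `dtw`, `lb_keogh` and `nearest_neighbour` are hypothetical.

```python
def dtw(a, b, window):
    """Window-constrained DTW with squared pointwise cost (equal lengths)."""
    n = len(a)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(n, i + window) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][n]


def lb_keogh(query, candidate, window):
    """Keogh lower bound on dtw(query, candidate, window)."""
    n = len(candidate)
    total = 0.0
    for i, q in enumerate(query):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        upper = max(candidate[lo:hi])  # envelope of the candidate
        lower = min(candidate[lo:hi])
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return total


def nearest_neighbour(query, candidates, window):
    """NN-DTW search that skips candidates whose bound rules them out."""
    best_index, best_dist = -1, float("inf")
    for index, candidate in enumerate(candidates):
        if lb_keogh(query, candidate, window) >= best_dist:
            continue  # pruned: the true DTW distance cannot be smaller
        dist = dtw(query, candidate, window)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist
```

The bound costs linear time per candidate against the quadratic cost of DTW, so every pruned candidate saves a full DTW computation. As the window grows the envelope widens and the bound loosens, which is the weakness the abstract describes.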

Using Sentinel-2 Image Time Series to map the State of Victoria, Australia.
Pelletier, C., Ji, Z., Hagolle, O., Morse-McNabb, E., Sheffield, K., Webb, G. I., & Petitjean, F.
Proceedings 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images, MultiTemp 2019, 2019.

@InProceedings{PelletierEtAl19c,
author = {Pelletier, C. and Ji, Z. and Hagolle, O. and Morse-McNabb, E. and Sheffield, K. and Webb, G. I. and Petitjean, F.},
booktitle = {Proceedings 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images, MultiTemp 2019},
title = {Using Sentinel-2 Image Time Series to map the State of Victoria, Australia},
year = {2019},
abstract = {Sentinel-2 satellites are now acquiring images of the entire Earth every five days at 10 to 60 m spatial resolution. The supervised classification of this new optical image time series allows the operational production of accurate land cover maps over large areas. In this paper, we investigate the use of one year of Sentinel-2 data to map the state of Victoria in Australia. In particular, we produce two land cover maps using the most established and advanced algorithms in time series classification: Random Forest (RF) and Temporal Convolutional Neural Network (TempCNN). To our knowledge, these are the first land cover maps at 10 m spatial resolution for an Australian state.},
doi = {10.1109/Multi-Temp.2019.8866921},
keywords = {cartography;convolutional neural nets;geophysical image processing;image classification;image resolution;land cover;optical images;optical information processing;remote sensing;terrain mapping;time series;TempCNN;temporal convolutional neural network;random forest;land cover maps;Victoria state;Australian state;spatial resolution;time series classification;Sentinel-2 data;accurate land cover maps;operational production;optical image time series;supervised classification;Sentinel-2 satellites;Australia;sentinel-2 image time series;Radio frequency;Australia;Spatial resolution;Time series analysis;Agriculture;Convolutional neural networks;Sentinel-2 images;land cover map;time series;Temporal Convolutional Neural Networks;Random Forests;earth observation analytics},
related = {earth-observation-analytics},
}

Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series.
Pelletier, C., Webb, G. I., & Petitjean, F.
Remote Sensing, 11(5), Art. no. 523, 2019.
Clarivate Web of Science Highly Cited Paper 2021 - 2023

@Article{PelletierEtAl19,
author = {Pelletier, Charlotte and Webb, Geoffrey I. and Petitjean, Francois},
journal = {Remote Sensing},
title = {Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series},
year = {2019},
issn = {2072-4292},
number = {5},
volume = {11},
abstract = {The latest remote sensing sensors are capable of acquiring high spatial and spectral Satellite Image Time Series (SITS) of the world. These image series are a key component of classification systems that aim at obtaining up-to-date and accurate land cover maps of the Earth’s surfaces. More specifically, current SITS combine high temporal, spectral and spatial resolutions, which makes it possible to closely monitor vegetation dynamics. Although traditional classification algorithms, such as Random Forest (RF), have been successfully applied to create land cover maps from SITS, these algorithms do not make the most of the temporal domain. This paper proposes a comprehensive study of Temporal Convolutional Neural Networks (TempCNNs), a deep learning approach which applies convolutions in the temporal dimension in order to automatically learn temporal (and spectral) features. The goal of this paper is to quantitatively and qualitatively evaluate the contribution of TempCNNs for SITS classification, as compared to RF and Recurrent Neural Networks (RNNs), a standard deep learning approach that is particularly suited to temporal data. We carry out experiments on a Formosat-2 scene with 46 images and one million labelled time series. The experimental results show that TempCNNs are more accurate than the current state of the art for SITS classification. We provide some general guidelines on the network architecture, common regularization mechanisms, and hyper-parameter values such as batch size; we also draw out some differences with standard results in computer vision (e.g., about pooling layers). Finally, we assess the visual quality of the land cover maps produced by TempCNNs.},
articlenumber = {523},
comment = {Clarivate Web of Science Highly Cited Paper 2021 - 2023},
doi = {10.3390/rs11050523},
keywords = {time series, earth observation analytics},
related = {earth-observation-analytics},
}
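The abstract above describes convolutions applied along the temporal dimension of a multi-band series, so that each filter responds to a temporal pattern across spectral bands. The sketch below shows a single such filter followed by a ReLU. The function name `temporal_conv` is hypothetical and the weights in the usage example are fixed by hand. A real network would stack many filters of this kind and learn their weights.

```python
def temporal_conv(series, weights, bias=0.0):
    """Slide one filter along the time axis of a multi-band series.

    series:  list of T time steps, each a list of B band values
    weights: list of K filter taps, each a list of B band weights
    Returns T - K + 1 ReLU activations.
    """
    k = len(weights)
    outputs = []
    for start in range(len(series) - k + 1):
        total = bias
        for tap in range(k):
            for band_value, w in zip(series[start + tap], weights[tap]):
                total += band_value * w
        outputs.append(max(0.0, total))  # ReLU
    return outputs
```

For example, with the two-band series `[[1, 0], [2, 1], [3, 0], [4, 1]]`, the filter `[[-1, 0], [1, 0]]` measures the step-to-step increase in the first band and yields `[1.0, 1.0, 1.0]`.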

Proximity Forest: an effective and scalable distance-based classifier for time series.
Lucas, B., Shifaz, A., Pelletier, C., O'Neill, L., Zaidi, N., Goethals, B., Petitjean, F., & Webb, G. I.
Data Mining and Knowledge Discovery, 33, 607-635, 2019.

@Article{LucasEtAl2019,
author = {Lucas, Benjamin and Shifaz, Ahmed and Pelletier, Charlotte and O'Neill, Lachlan and Zaidi, Nayyar and Goethals, Bart and Petitjean, Francois and Webb, Geoffrey I.},
journal = {Data Mining and Knowledge Discovery},
title = {Proximity Forest: an effective and scalable distance-based classifier for time series},
year = {2019},
issn = {1573-756X},
pages = {607-635},
volume = {33},
abstract = {Research into the classification of time series has made enormous progress in the last decade. The UCR time series archive has played a significant role in challenging and guiding the development of new learners for time series classification. The largest dataset in the UCR archive holds only 10,000 time series, which may explain why the primary research focus has been on creating algorithms that have high accuracy on relatively small datasets. This paper introduces Proximity Forest, an algorithm that learns accurate models from datasets with millions of time series, and classifies a time series in milliseconds. The models are ensembles of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values (and usually perform poorly on time series), Proximity Trees branch on the proximity of time series to one exemplar time series or another, allowing us to leverage the decades of work into developing relevant measures for time series. Proximity Forest gains both efficiency and accuracy by stochastic selection of both exemplars and similarity measures. Our work is motivated by recent time series applications that provide orders of magnitude more time series than the UCR benchmarks. Our experiments demonstrate that Proximity Forest is highly competitive on the UCR archive: it ranks among the most accurate classifiers while being significantly faster. We demonstrate on a 1M time series Earth observation dataset that Proximity Forest retains this accuracy on datasets that are many orders of magnitude greater than those in the UCR repository, while learning its models at least 100,000 times faster than current state-of-the-art models Elastic Ensemble and COTE.},
doi = {10.1007/s10618-019-00617-3},
keywords = {time series},
related = {scalable-time-series-classifiers},
url = {https://rdcu.be/blB8E},
}
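The abstract above describes trees that branch on the proximity of a series to one exemplar or another, with exemplars and similarity measures chosen stochastically. The sketch below illustrates that idea in pure Python and is not the published implementation. All function names are hypothetical. Two simple distances stand in for the elastic time series measures. Each node draws a single random split and does not evaluate alternatives.

```python
import random
from collections import Counter


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# Stand-ins (an assumption) for the elastic time series measures.
MEASURES = (squared_euclidean, manhattan)


def build_tree(series, labels, rng):
    """Grow one proximity tree by random choice of measure and exemplars."""
    counts = Counter(labels)
    majority = counts.most_common(1)[0][0]
    if len(counts) == 1:
        return {"label": majority}
    measure = rng.choice(MEASURES)
    # One randomly chosen exemplar per class present at this node.
    exemplars = [
        rng.choice([s for s, y in zip(series, labels) if y == cls])
        for cls in sorted(counts)
    ]
    branches = [([], []) for _ in exemplars]
    for s, y in zip(series, labels):
        k = min(range(len(exemplars)), key=lambda i: measure(s, exemplars[i]))
        branches[k][0].append(s)
        branches[k][1].append(y)
    if any(len(xs) == len(series) for xs, _ in branches):
        return {"label": majority}  # split separated nothing: stop here
    children = [
        build_tree(xs, ys, rng) if xs else {"label": majority}
        for xs, ys in branches
    ]
    return {"measure": measure, "exemplars": exemplars, "children": children}


def classify_tree(tree, s):
    """Follow the branch of the nearest exemplar until a leaf is reached."""
    while "label" not in tree:
        k = min(
            range(len(tree["exemplars"])),
            key=lambda i: tree["measure"](s, tree["exemplars"][i]),
        )
        tree = tree["children"][k]
    return tree["label"]


def build_forest(series, labels, num_trees=10, seed=0):
    rng = random.Random(seed)
    return [build_tree(series, labels, rng) for _ in range(num_trees)]


def classify(forest, s):
    """Majority vote over the trees."""
    votes = Counter(classify_tree(tree, s) for tree in forest)
    return votes.most_common(1)[0][0]
```

Classifying a series costs one distance computation per exemplar at each node on the path to a leaf, rather than one per training series as in nearest-neighbour search.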

Exploring Data Quantity Requirements for Domain Adaptation in the Classification of Satellite Image Time Series.
Lucas, B., Pelletier, C., Inglada, J., Schmidt, D., Webb, G. I., & Petitjean, F.
Proceedings 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images, MultiTemp 2019, 2019.

@InProceedings{LucasEtAl2019b,
author = {Lucas, B. and Pelletier, C. and Inglada, J. and Schmidt, D. and Webb, G. I. and Petitjean, F.},
booktitle = {Proceedings 10th International Workshop on the Analysis of Multitemporal Remote Sensing Images, MultiTemp 2019},
title = {Exploring Data Quantity Requirements for Domain Adaptation in the Classification of Satellite Image Time Series},
year = {2019},
publisher = {IEEE, Institute of Electrical and Electronics Engineers},
abstract = {Land cover maps are a vital input variable in all types of environmental research and management. However, the modern state-of-the-art machine learning techniques used to create them require substantial training data to produce optimal accuracy. Domain adaptation is one technique researchers might use when labelled training data are unavailable or scarce. This paper looks at the result of training a convolutional neural network model on a region where data are available (source domain), and then adapting this model to another region (target domain) by retraining it on the available labelled data, and in particular how these results change with increasing data availability. Our experiments performing domain adaptation on satellite image time series draw three interesting conclusions: (1) a model trained only on data from the source domain delivers 73.0% test accuracy on the target domain; (2) when all of the weights are retrained on the target data, over 16,000 instances were required to improve upon the accuracy of the source-only model; and (3) even if sufficient data are available in the target domain, using a model pretrained on a source domain will result in better overall test accuracy compared to a model trained on target domain data only (88.9% versus 84.7%).},
doi = {10.1109/Multi-Temp.2019.8866898},
keywords = {time series, earth observation analytics},
related = {earth-observation-analytics},
}

Efficient search of the best warping window for Dynamic Time Warping.
Tan, C. W., Herrmann, M., Forestier, G., Webb, G. I., & Petitjean, F.
Proceedings of the 2018 SIAM International Conference on Data Mining, pp. 459-467, 2018.
Best Research Paper Award

@InProceedings{TanEtAl18,
author = {Tan, Chang Wei and Herrmann, Matthieu and Forestier, Germain and Webb, Geoffrey I. and Petitjean, Francois},
booktitle = {Proceedings of the 2018 {SIAM} International Conference on Data Mining},
title = {Efficient search of the best warping window for Dynamic Time Warping},
year = {2018},
pages = {459-467},
abstract = {Time series classification maps time series to labels. The nearest neighbour algorithm (NN) using the Dynamic Time Warping (DTW) similarity measure is a leading algorithm for this task and a component of the current best ensemble classifiers for time series. However, NN-DTW is only a winning combination when its meta-parameter, the warping window, is learned from the training data. The warping window (WW) intuitively controls the amount of distortion allowed when comparing a pair of time series. With a training database of N time series of length L, a naive approach to learning the WW requires $\Omega(N^2 \cdot L^3)$ operations. This often translates into NN-DTW requiring days for training on datasets containing only a few thousand time series. In this paper, we introduce FastWWSearch: an efficient and exact method to learn WW. We show on 86 datasets that our method is always faster than the state of the art, with at least one order of magnitude and up to 1000x speed-up.},
comment = {Best Research Paper Award},
keywords = {time series},
related = {scalable-time-series-classifiers},
}
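To make the cost of the naive approach mentioned in the abstract concrete, the sketch below implements DTW with a warping window and the brute-force leave-one-out search over every window size. It is not FastWWSearch, only the baseline the abstract contrasts it with; the function names and the toy data are hypothetical, and equal-length series are assumed.

```python
def dtw_window(a, b, w):
    """DTW cost (sum of squared differences) restricted to |i - j| <= w."""
    inf = float("inf")
    n = len(a)  # equal-length series assumed in this sketch
    prev = [inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[n]


def loocv_accuracy(series, labels, w):
    """Leave-one-out accuracy of 1-NN DTW for a given window w."""
    correct = 0
    for i, query in enumerate(series):
        best_label, best_dist = None, float("inf")
        for j, candidate in enumerate(series):
            if i == j:
                continue
            d = dtw_window(query, candidate, w)
            if d < best_dist:
                best_dist, best_label = d, labels[j]
        correct += best_label == labels[i]
    return correct / len(series)


def naive_best_window(series, labels):
    """Score every window from 0 to L - 1; return the smallest best one."""
    length = len(series[0])
    scores = [loocv_accuracy(series, labels, w) for w in range(length)]
    return scores.index(max(scores)), scores


series = [
    [0, 3, 0, 0, 0, 0],    # "peak", early peak
    [0, 0, 0, 0, 3, 0],    # "peak", same shape shifted by three steps
    [1, 1, 1, 1, 1, 1],    # "flat"
    [1, 1, 1, 1, 1, 1.5],  # "flat"
]
labels = ["peak", "peak", "flat", "flat"]
best_w, scores = naive_best_window(series, labels)
```

The three nested levels (every window, every pair of series, one DTW computation per pair) are where the N^2 and L^3 factors come from. In the toy data, windows narrower than the shift between the two peaked series cannot align their peaks, so those series are misclassified until the window is wide enough.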

Generating synthetic time series to augment sparse datasets.
Forestier, G., Petitjean, F., Dau, H. A., Webb, G. I., & Keogh, E.
IEEE International Conference on Data Mining (ICDM-17), pp. 865-870, 2017.

@InProceedings{ForestierEtAl17,
Title = {Generating synthetic time series to augment sparse datasets},
Author = {Forestier, Germain and Petitjean, Francois and Dau, Hoang Anh and Webb, Geoffrey I and Keogh, Eamonn},
Booktitle = {IEEE International Conference on Data Mining (ICDM-17)},
Year = {2017},
Pages = {865-870},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}

Indexing and classifying gigabytes of time series under time warping.
Tan, C. W., Webb, G. I., & Petitjean, F.
Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 282-290, 2017.

@InProceedings{TanEtAl17a,
Title = {Indexing and classifying gigabytes of time series under time warping},
Author = {Tan, Chang Wei and Webb, Geoffrey I. and Petitjean, Francois},
Booktitle = {Proceedings of the 2017 SIAM International Conference on Data Mining},
Year = {2017},
Organization = {SIAM},
Pages = {282-290},
Doi = {10.1137/1.9781611974973.32},
Keywords = {time series},
Related = {scalable-time-series-classifiers}
}

Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm.
Petitjean, F., Forestier, G., Webb, G. I., Nicholson, A. E., Chen, Y., & Keogh, E.
Knowledge and Information Systems, 47(1), 1-26, 2016.

@Article{PetitjeanEtAl16a,
author = {Petitjean, F. and Forestier, G. and Webb, G. I. and Nicholson, A. E. and Chen, Y. and Keogh, E.},
journal = {Knowledge and Information Systems},
title = {Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm},
year = {2016},
number = {1},
pages = {1-26},
volume = {47},
abstract = {A concerted research effort over the past two decades has heralded significant improvements in both the efficiency and effectiveness of time series classification. The consensus that has emerged in the community is that the best solution is a surprisingly simple one. In virtually all domains, the most accurate classifier is the nearest neighbor algorithm with dynamic time warping as the distance measure. The time complexity of dynamic time warping means that successful deployments on resource-constrained devices remain elusive. Moreover, the recent explosion of interest in wearable computing devices, which typically have limited computational resources, has greatly increased the need for very efficient classification algorithms. A classic technique to obtain the benefits of the nearest neighbor algorithm, without inheriting its undesirable time and space complexity, is to use the nearest centroid algorithm. Unfortunately, the unique properties of (most) time series data mean that the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this paper we demonstrate that we can exploit a recent result by Petitjean et al. to allow meaningful averaging of 'warped' time series, which then allows us to create super-efficient nearest 'centroid' classifiers that are at least as accurate as their more computationally challenged nearest neighbor relatives. We demonstrate empirically the utility of our approach by comparing it to all the appropriate strawmen algorithms on the ubiquitous UCR Benchmarks and with a case study in supporting insect classification on resource-constrained sensors.},
doi = {10.1007/s10115-015-0878-8},
keywords = {time series},
related = {scalable-time-series-classifiers},
}
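The nearest 'centroid' idea in the abstract above can be sketched as follows. The averaging step is written in the spirit of DTW barycenter averaging: each class member is aligned to the current average with DTW, and every point of the average is replaced by the mean of the values aligned to it. This is a simplified illustration rather than the authors' code; the function names, the fixed iteration count, and initialising from the first class member are assumptions.

```python
def dtw_path(a, b):
    """Return (DTW cost, list of aligned (i, j) index pairs) for a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which points were aligned.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    return cost[n][m], path


def dba_update(average, members):
    """One refinement: align each member to the average, then re-average."""
    buckets = [[] for _ in average]
    for series in members:
        _, path = dtw_path(average, series)
        for i, j in path:
            buckets[i].append(series[j])
    return [sum(bucket) / len(bucket) for bucket in buckets]


def dba(members, iterations=5):
    """Average a class under DTW, starting from its first member (sketch choice)."""
    average = list(members[0])
    for _ in range(iterations):
        average = dba_update(average, members)
    return average


def fit_centroids(series, labels):
    """Compute one DTW-averaged centroid per class."""
    return {label: dba([s for s, l in zip(series, labels) if l == label])
            for label in set(labels)}


def predict(query, centroids):
    """Classify by the centroid with the smallest DTW cost to the query."""
    return min(centroids, key=lambda label: dtw_path(query, centroids[label])[0])


train = [[0, 0, 4, 0, 0, 0], [0, 0, 0, 4, 0, 0], [0, 4, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1], [1, 2, 1, 1, 1, 1]]
train_labels = ["peak", "peak", "peak", "flat", "flat"]
centroids = fit_centroids(train, train_labels)
prediction = predict([0, 0, 0, 0, 4, 0], centroids)
```

Classification then needs one DTW computation per class rather than one per training series, which is the efficiency argument the abstract makes. A plain pointwise mean of the three shifted "peak" series would instead smear the peak across three positions, which is why the averaging is done under the DTW alignment.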

Dynamic Time Warping Averaging of Time Series Allows Faster and More Accurate Classification.
Petitjean, F., Forestier, G., Webb, G. I., Nicholson, A., Chen, Y., & Keogh, E.
Proceedings of the 14th IEEE International Conference on Data Mining, pp. 470-479, 2014.
ICDM 2023 10-year Highest Impact Paper Award
One of nine papers invited to Knowledge and Information Systems journal ICDM-14 special issue

@InProceedings{PetitjeanEtAl14b,
author = {Petitjean, F. and Forestier, G. and Webb, G. I. and Nicholson, A. and Chen, Y. and Keogh, E.},
booktitle = {Proceedings of the 14th {IEEE} International Conference on Data Mining},
title = {Dynamic Time Warping Averaging of Time Series Allows Faster and More Accurate Classification},
year = {2014},
pages = {470-479},
abstract = {Recent years have seen significant progress in improving both the efficiency and effectiveness of time series classification. However, because the best solution is typically the Nearest Neighbor algorithm with the relatively expensive Dynamic Time Warping as the distance measure, successful deployments on resource constrained devices remain elusive. Moreover, the recent explosion of interest in wearable devices, which typically have limited computational resources, has created a growing need for very efficient classification algorithms. A commonly used technique to glean the benefits of the Nearest Neighbor algorithm, without inheriting its undesirable time complexity, is to use the Nearest Centroid algorithm. However, because of the unique properties of (most) time series data, the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this work we show that we can exploit a recent result to allow meaningful averaging of 'warped' time series, and that this result allows us to create ultra-efficient Nearest 'Centroid' classifiers that are at least as accurate as their more lethargic Nearest Neighbor cousins.},
comment = {ICDM 2023 10-year Highest Impact Paper Award},
comment2 = {One of nine papers invited to Knowledge and Information Systems journal ICDM-14 special issue},
doi = {10.1109/ICDM.2014.27},
keywords = {time series},
related = {scalable-time-series-classifiers},
}