The world is dynamic, in a constant state of flux, while machine learning systems typically learn static models from historical data. Failure to account for the dynamic nature of the world may result in sub-optimal performance when these models of the past are used to predict the present or future. This research investigates this phenomenon of concept drift and how it is best addressed.

Our software for generating synthetic data streams with abrupt drift can be downloaded here.

Our system for describing the concept drift present in real-world data can be downloaded here.

Our Extremely Fast Decision Tree (EFDT) module for MOA can be downloaded here.

### Publications

An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams.

Pratama, M., Pedrycz, W., & Webb, G. I.

IEEE Transactions on Fuzzy Systems, 28(7), 1315-1328, 2020.

[Bibtex]

```
@Article{Pratama19,
author = {Pratama, M. and Pedrycz, W. and Webb, G. I.},
journal = {IEEE Transactions on Fuzzy Systems},
title = {An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams},
year = {2020},
issn = {1063-6706},
number = {7},
pages = {1315-1328},
volume = {28},
doi = {10.1109/TFUZZ.2019.2939993},
keywords = {Concept Drift},
related = {learning-from-non-stationary-distributions},
}
```

**ABSTRACT**

PCA-based drift and shift quantification framework for multidimensional data.

Goldenberg, I., & Webb, G. I.

Knowledge and Information Systems, 62, 2835-2854, 2020.

[Bibtex] [Abstract]

```
@Article{Goldenberg2020,
author = {Goldenberg, Igor and Webb, Geoffrey I.},
journal = {Knowledge and Information Systems},
title = {PCA-based drift and shift quantification framework for multidimensional data},
year = {2020},
pages = {2835-2854},
volume = {62},
abstract = {Concept drift is a serious problem confronting machine learning systems in a dynamic and ever-changing world. In order to manage concept drift it may be useful to first quantify it by measuring the distance between distributions that generate data before and after a drift. There is a paucity of methods to do so in the case of multidimensional numeric data. This paper provides an in-depth analysis of the PCA-based change detection approach, identifies shortcomings of existing methods and shows how this approach can be used to measure a drift, not merely detect it.},
doi = {10.1007/s10115-020-01438-3},
groups = {[giwebb:]},
keywords = {Concept Drift},
related = {learning-from-non-stationary-distributions},
}
```

**ABSTRACT** Concept drift is a serious problem confronting machine learning systems in a dynamic and ever-changing world. In order to manage concept drift it may be useful to first quantify it by measuring the distance between distributions that generate data before and after a drift. There is a paucity of methods to do so in the case of multidimensional numeric data. This paper provides an in-depth analysis of the PCA-based change detection approach, identifies shortcomings of existing methods and shows how this approach can be used to measure a drift, not merely detect it.

Adaptive Online Extreme Learning Machine by Regulating Forgetting Factor by Concept Drift Map.

Yu, H., & Webb, G. I.

Neurocomputing, 343, 141-153, 2019.

[Bibtex] [Abstract]

```
@Article{yuetal2019,
Title = {Adaptive Online Extreme Learning Machine by Regulating Forgetting Factor by Concept Drift Map},
Author = {Yu, Hualong and Webb, Geoffrey I},
Journal = {Neurocomputing},
Year = {2019},
Pages = {141-153},
Volume = {343},
Abstract = {In online-learning, the data is incrementally received and the distributions from which it is drawn may keep changing over time. This phenomenon is widely known as concept drift. Such changes may affect the generalization of a learned model to future data. This problem may be exacerbated by the form of the drift itself changing over time. Quantitative measures to describe and analyze the concept drift have been proposed in previous work. A description composed from these measures is called a concept drift map. We believe that these maps could be useful for guiding how much knowledge in the old model should be forgotten. Therefore, this paper presents an adaptive online learning model that uses a concept drift map to regulate the forgetting factor of an extreme learning machine. Specifically, when a batch of new instances are labeled, the distribution of each class on each attribute is firstly estimated, and then it is compared with the distribution estimated in the previous batch to calculate the magnitude of concept drift, which is further used to regulate the forgetting factor and to update the learning model. Therefore, the novelty of this paper lies in that a quantitative distance metric between two distributions constructed on continuous attribute space is presented to construct concept drift map which can be further associated with the forgetting factor to make the learning model adapt the concept drift. Experimental results on several benchmark stream data sets show the proposed model is generally superior to several previous algorithms when classifying a variety of data streams subject to drift, indicating its effectiveness and feasibility.},
Doi = {10.1016/j.neucom.2018.11.098},
Keywords = {Concept Drift},
Publisher = {Elsevier},
Related = {learning-from-non-stationary-distributions}
}
```

**ABSTRACT** In online-learning, the data is incrementally received and the distributions from which it is drawn may keep changing over time. This phenomenon is widely known as concept drift. Such changes may affect the generalization of a learned model to future data. This problem may be exacerbated by the form of the drift itself changing over time. Quantitative measures to describe and analyze the concept drift have been proposed in previous work. A description composed from these measures is called a concept drift map. We believe that these maps could be useful for guiding how much knowledge in the old model should be forgotten. Therefore, this paper presents an adaptive online learning model that uses a concept drift map to regulate the forgetting factor of an extreme learning machine. Specifically, when a batch of new instances are labeled, the distribution of each class on each attribute is firstly estimated, and then it is compared with the distribution estimated in the previous batch to calculate the magnitude of concept drift, which is further used to regulate the forgetting factor and to update the learning model. Therefore, the novelty of this paper lies in that a quantitative distance metric between two distributions constructed on continuous attribute space is presented to construct concept drift map which can be further associated with the forgetting factor to make the learning model adapt the concept drift. Experimental results on several benchmark stream data sets show the proposed model is generally superior to several previous algorithms when classifying a variety of data streams subject to drift, indicating its effectiveness and feasibility.

Survey of distance measures for quantifying concept drift and shift in numeric data.

Goldenberg, I., & Webb, G. I.

Knowledge and Information Systems, 60(2), 591-615, 2019.

[Bibtex] [Abstract]

```
@Article{Goldenberg2018,
Title = {Survey of distance measures for quantifying concept drift and shift in numeric data},
Author = {Goldenberg, Igor and Webb, Geoffrey I.},
Journal = {Knowledge and Information Systems},
Year = {2019},
Number = {2},
Pages = {591-615},
Volume = {60},
Abstract = {Deployed machine learning systems are necessarily learned from historical data and are often applied to current data. When the world changes, the learned models can lose fidelity. Such changes to the statistical properties of data over time are known as concept drift. Similarly, models are often learned in one context, but need to be applied in another. This is called concept shift. Quantifying the magnitude of drift or shift, especially in the context of covariate drift or shift, or unsupervised learning, requires use of measures of distance between distributions. In this paper, we survey such distance measures with respect to their suitability for estimating drift and shift magnitude between samples of numeric data.},
Doi = {10.1007/s10115-018-1257-z},
ISSN = {0219-3116},
Keywords = {Concept Drift},
Related = {learning-from-non-stationary-distributions}
}
```

**ABSTRACT** Deployed machine learning systems are necessarily learned from historical data and are often applied to current data. When the world changes, the learned models can lose fidelity. Such changes to the statistical properties of data over time are known as concept drift. Similarly, models are often learned in one context, but need to be applied in another. This is called concept shift. Quantifying the magnitude of drift or shift, especially in the context of covariate drift or shift, or unsupervised learning, requires use of measures of distance between distributions. In this paper, we survey such distance measures with respect to their suitability for estimating drift and shift magnitude between samples of numeric data.

Extremely Fast Decision Tree.

Manapragada, C., Webb, G. I., & Salehi, M.

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, pp. 1953–1962, 2018.

[Bibtex] [Abstract]

```
@InProceedings{ManapragadaEtAl18,
Title = {Extremely Fast Decision Tree},
Author = {Manapragada, Chaitanya and Webb, Geoffrey I. and Salehi, Mahsa},
Booktitle = {Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
Year = {2018},
Address = {New York, NY, USA},
Pages = {1953--1962},
Publisher = {ACM},
Series = {KDD '18},
Abstract = {We introduce a novel incremental decision tree learning algorithm, Hoeffding Anytime Tree, that is statistically more efficient than the current state-of-the-art, Hoeffding Tree. We demonstrate that an implementation of Hoeffding Anytime Tree---"Extremely Fast Decision Tree", a minor modification to the MOA implementation of Hoeffding Tree---obtains significantly superior prequential accuracy on most of the largest classification datasets from the UCI repository. Hoeffding Anytime Tree produces the asymptotic batch tree in the limit, is naturally resilient to concept drift, and can be used as a higher accuracy replacement for Hoeffding Tree in most scenarios, at a small additional computational cost.},
Acmid = {3220005},
Doi = {10.1145/3219819.3220005},
ISBN = {978-1-4503-5552-0},
Keywords = {Concept Drift},
Location = {London, United Kingdom},
Related = {learning-from-non-stationary-distributions}
}
```

**ABSTRACT** We introduce a novel incremental decision tree learning algorithm, Hoeffding Anytime Tree, that is statistically more efficient than the current state-of-the-art, Hoeffding Tree. We demonstrate that an implementation of Hoeffding Anytime Tree–-"Extremely Fast Decision Tree", a minor modification to the MOA implementation of Hoeffding Tree–-obtains significantly superior prequential accuracy on most of the largest classification datasets from the UCI repository. Hoeffding Anytime Tree produces the asymptotic batch tree in the limit, is naturally resilient to concept drift, and can be used as a higher accuracy replacement for Hoeffding Tree in most scenarios, at a small additional computational cost.

On the Inter-Relationships among Drift Rate, Forgetting Rate, Bias/Variance Profile and Error.

Zaidi, N. A., Webb, G. I., Petitjean, F., & Forestier, G.

arxiv, 1801.09354, 2018.

[Bibtex] [Abstract]

```
@Article{ZaidiEtAl18b,
Title = {On the Inter-Relationships among Drift Rate,
Forgetting Rate, Bias/Variance Profile and Error},
Author = {Zaidi, Nayyar A. and Webb, Geoffrey I. and Petitjean, Francois and Forestier, Germain},
Journal = {arxiv},
Year = {2018},
Pages = {1801.09354},
Abstract = {We propose two general and falsifiable hypotheses about expectations on generalization error when learning in the context of concept drift. One posits that as drift rate increases, the forgetting rate that minimizes generalization error will also increase and vice versa. The other posits that as a learner's forgetting rate increases, the bias/variance profile that minimizes generalization error will have lower variance and vice versa. These hypotheses lead to the concept of the sweet path, a path through the 3-d space of alternative drift rates, forgetting rates and bias/variance profiles on which generalization error will be minimized, such that slow drift is coupled with low forgetting and low bias, while rapid drift is coupled with fast forgetting and low variance. We present experiments that support the existence of such a sweet path. We also demonstrate that simple learners that select appropriate forgetting rates and bias/variance profiles are highly competitive with the state-of-the-art in incremental learners for concept drift on real-world drift problems.},
Keywords = {Concept Drift},
Related = {learning-from-non-stationary-distributions},
Url = {https://arxiv.org/abs/1801.09354}
}
```

**ABSTRACT** We propose two general and falsifiable hypotheses about expectations on generalization error when learning in the context of concept drift. One posits that as drift rate increases, the forgetting rate that minimizes generalization error will also increase and vice versa. The other posits that as a learner's forgetting rate increases, the bias/variance profile that minimizes generalization error will have lower variance and vice versa. These hypotheses lead to the concept of the sweet path, a path through the 3-d space of alternative drift rates, forgetting rates and bias/variance profiles on which generalization error will be minimized, such that slow drift is coupled with low forgetting and low bias, while rapid drift is coupled with fast forgetting and low variance. We present experiments that support the existence of such a sweet path. We also demonstrate that simple learners that select appropriate forgetting rates and bias/variance profiles are highly competitive with the state-of-the-art in incremental learners for concept drift on real-world drift problems.

Analyzing concept drift and shift from sample data.

Webb, G. I., Lee, L. K., Goethals, B., & Petitjean, F.

Data Mining and Knowledge Discovery, 32(5), 1179-1199, 2018.

[Bibtex] [Abstract]

```
@Article{WebbEtAl18,
Title = {Analyzing concept drift and shift from sample data},
Author = {Webb, Geoffrey I and Lee, Loong Kuan and Goethals, Bart and Petitjean, Francois},
Journal = {Data Mining and Knowledge Discovery},
Year = {2018},
Number = {5},
Pages = {1179-1199},
Volume = {32},
Abstract = {Concept drift and shift are major issues that greatly affect the accuracy and reliability of many real-world applications of machine learning. We propose a new data mining task, concept drift mapping - the description and analysis of instances of concept drift or shift. We argue that concept drift mapping is an essential prerequisite for tackling concept drift and shift. We propose tools for this purpose, arguing for the importance of quantitative descriptions of drift and shift in marginal distributions. We present quantitative concept drift mapping techniques, along with methods for visualizing their results. We illustrate their effectiveness for real-world applications across energy-pricing, vegetation monitoring and airline scheduling.},
Doi = {10.1007/s10618-018-0554-1},
Keywords = {Concept Drift},
Related = {learning-from-non-stationary-distributions},
Url = {http://rdcu.be/IUTI}
}
```

**ABSTRACT** Concept drift and shift are major issues that greatly affect the accuracy and reliability of many real-world applications of machine learning. We propose a new data mining task, concept drift mapping - the description and analysis of instances of concept drift or shift. We argue that concept drift mapping is an essential prerequisite for tackling concept drift and shift. We propose tools for this purpose, arguing for the importance of quantitative descriptions of drift and shift in marginal distributions. We present quantitative concept drift mapping techniques, along with methods for visualizing their results. We illustrate their effectiveness for real-world applications across energy-pricing, vegetation monitoring and airline scheduling.

Characterizing Concept Drift.

Webb, G. I., Hyde, R., Cao, H., Nguyen, H. L., & Petitjean, F.

Data Mining and Knowledge Discovery, 30(4), 964-994, 2016.

[Bibtex] [Abstract]

```
@Article{WebbEtAl16,
author = {Webb, G. I. and Hyde, R. and Cao, H. and Nguyen, H. L. and Petitjean, F.},
journal = {Data Mining and Knowledge Discovery},
title = {Characterizing Concept Drift},
year = {2016},
number = {4},
pages = {964-994},
volume = {30},
abstract = {Most machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or as it is commonly called concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.},
doi = {10.1007/s10618-015-0448-4},
keywords = {Concept Drift},
related = {learning-from-non-stationary-distributions},
url = {https://rdcu.be/7vMN},
}
```

**ABSTRACT** Most machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or as it is commonly called concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.

Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data.

Webb, G. I.

Proceedings of the 14th IEEE International Conference on Data Mining, pp. 1031-1036, 2014.

[Bibtex] [Abstract]

```
@InProceedings{Webb14,
author = {Webb, G. I.},
booktitle = {Proceedings of the 14th {IEEE} International Conference on Data Mining},
title = {Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data},
year = {2014},
pages = {1031-1036},
abstract = {Discretization of streaming data has received surprisingly
little attention. This might be because streaming data
require incremental discretization with cutpoints that may vary
over time and this is perceived as undesirable. We argue, to
the contrary, that it can be desirable for a discretization to
evolve in synchronization with an evolving data stream, even
when the learner assumes that attribute values. meanings remain
invariant over time. We examine the issues associated with
discretization in the context of distribution drift and develop
computationally efficient incremental discretization algorithms.
We show that discretization can reduce the error of a classical
incremental learner and that allowing a discretization to drift in
synchronization with distribution drift can further reduce error.},
doi = {10.1109/ICDM.2014.123},
keywords = {Concept Drift and Discretization and Incremental Learning and Stream mining},
related = {learning-from-non-stationary-distributions},
}
```

**ABSTRACT** Discretization of streaming data has received surprisingly little attention. This might be because streaming data require incremental discretization with cutpoints that may vary over time and this is perceived as undesirable. We argue, to the contrary, that it can be desirable for a discretization to evolve in synchronization with an evolving data stream, even when the learner assumes that attribute values. meanings remain invariant over time. We examine the issues associated with discretization in the context of distribution drift and develop computationally efficient incremental discretization algorithms. We show that discretization can reduce the error of a classical incremental learner and that allowing a discretization to drift in synchronization with distribution drift can further reduce error.