Concept Drift: Learning From Non-Stationary Distributions

The world is in a constant state of flux, while machine learning systems typically learn static models from historical data. Failure to account for this dynamism can result in sub-optimal performance when models of the past are used to predict the present or the future. This research investigates the phenomenon of concept drift and how it is best addressed.

Our software for generating synthetic data streams with abrupt drift can be downloaded here.

Our system for describing the concept drift present in real-world data can be downloaded here.
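
For intuition about what an abrupt drift looks like in a synthetic stream, here is a minimal Python sketch. It is not the generator above; the drift point, the concept and its inversion are all illustrative choices:

import numpy as np

rng = np.random.default_rng(0)

def abrupt_drift_stream(n=10_000, drift_point=5_000):
    # Toy stream: the labelling of x flips at drift_point (abrupt drift).
    for t in range(n):
        x = rng.uniform(-1.0, 1.0, size=2)
        y = int(x[0] + x[1] > 0)   # concept before the drift
        if t >= drift_point:
            y = 1 - y              # inverted concept after the drift
        yield x, y

# The first 5,000 instances follow one concept, the remainder its inverse.
stream = abrupt_drift_stream()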

Publications

Analyzing concept drift and shift from sample data.
Webb, G. I., Lee, L. K., Goethals, B., & Petitjean, F.
Data Mining and Knowledge Discovery, in press.
[PDF] [DOI] [Bibtex] [Abstract]

@Article{WebbEtAl18,
Title = {Analyzing concept drift and shift from sample data},
Author = {Geoffrey I Webb and Loong Kuan Lee and Bart Goethals and Francois Petitjean},
Journal = {Data Mining and Knowledge Discovery},
Year = {in press},
Abstract = {Concept drift and shift are major issues that greatly affect the accuracy and reliability of many real-world applications of machine learning. We propose a new data mining task, concept drift mapping - the description and analysis of instances of concept drift or shift. We argue that concept drift mapping is an essential prerequisite for tackling concept drift and shift. We propose tools for this purpose, arguing for the importance of quantitative descriptions of drift and shift in marginal distributions. We present quantitative concept drift mapping techniques, along with methods for visualizing their results. We illustrate their effectiveness for real-world applications across energy-pricing, vegetation monitoring and airline scheduling.},
Doi = {10.1007/s10618-018-0554-1},
Keywords = {Concept Drift},
Related = {learning-from-non-stationary-distributions},
Url = {http://www.springer.com/home?SGWID=0-0-1003-0-0&aqId=3479512&download=1&checkval=4f39c8d4d1ed469cfe8e3354b97e7a0b},
Urltext = {Link to prepublication draft}
}
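
To give a flavour of the kind of quantitative drift description this paper argues for, the sketch below measures drift in a single discrete marginal as the total variation distance between the empirical distributions of two sample windows. It is a minimal illustration under simplifying assumptions (discrete values, fixed windows), not the released system:

from collections import Counter

def total_variation(sample_a, sample_b):
    # Total variation distance between the empirical distributions of
    # two samples of discrete values: a common drift-magnitude measure.
    pa, pb = Counter(sample_a), Counter(sample_b)
    na, nb = len(sample_a), len(sample_b)
    support = set(pa) | set(pb)
    return 0.5 * sum(abs(pa[v] / na - pb[v] / nb) for v in support)

# Marginal drift in one attribute between two (hypothetical) time windows:
window_2016 = ["low", "low", "high", "mid", "low"]
window_2017 = ["high", "high", "mid", "high", "low"]
print(total_variation(window_2016, window_2017))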

Extremely Fast Decision Tree.
Manapragada, C., Webb, G. I., & Salehi, M.
arXiv:1802.08780, 2018.
[URL] [Bibtex] [Abstract]

@Article{ManapragadaEtAl18,
Title = {Extremely Fast Decision Tree},
Author = {Chaitanya Manapragada and Geoffrey I. Webb and Mahsa Salehi},
Journal = {arXiv},
Year = {2018},
Pages = {1802.08780},
Abstract = {We introduce a novel incremental decision tree learning algorithm, Hoeffding Anytime Tree, that is statistically more efficient than the current state-of-the-art, Hoeffding Tree. We demonstrate that an implementation of Hoeffding Anytime Tree---"Extremely Fast Decision Tree", a minor modification to the MOA implementation of Hoeffding Tree---obtains significantly superior prequential accuracy on most of the largest classification datasets from the UCI repository. Hoeffding Anytime Tree produces the asymptotic batch tree in the limit, is naturally resilient to concept drift, and can be used as a higher accuracy replacement for Hoeffding Tree in most scenarios, at a small additional computational cost.},
Keywords = {Concept Drift},
Url = {https://arxiv.org/abs/1802.08780}
}
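
The "prequential accuracy" reported in this paper is the accuracy of interleaved test-then-train evaluation. A minimal sketch of that loop follows; the predict_one/learn_one interface is an assumption in the style of common stream-learning libraries, not necessarily the MOA implementation used in the paper:

def prequential_accuracy(model, stream):
    # Interleaved test-then-train: predict on each instance first,
    # then use the same instance to update the model incrementally.
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict_one(x) == y)
        model.learn_one(x, y)   # e.g. a Hoeffding-style incremental tree
        total += 1
    return correct / total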

On the Inter-Relationships among Drift Rate, Forgetting Rate, Bias/Variance Profile and Error.
Zaidi, N. A., Webb, G. I., Petitjean, F., & Forestier, G.
arXiv:1801.09354, 2018.
[URL] [Bibtex] [Abstract]

@Article{ZaidiEtAl18b,
Title = {On the Inter-Relationships among Drift Rate, Forgetting Rate, Bias/Variance Profile and Error},
Author = {Nayyar A. Zaidi and Geoffrey I. Webb and Francois Petitjean and Germain Forestier},
Journal = {arXiv},
Year = {2018},
Pages = {1801.09354},
Abstract = {We propose two general and falsifiable hypotheses about expectations on generalization error when learning in the context of concept drift. One posits that as drift rate increases, the forgetting rate that minimizes generalization error will also increase and vice versa. The other posits that as a learner's forgetting rate increases, the bias/variance profile that minimizes generalization error will have lower variance and vice versa. These hypotheses lead to the concept of the sweet path, a path through the 3-d space of alternative drift rates, forgetting rates and bias/variance profiles on which generalization error will be minimized, such that slow drift is coupled with low forgetting and low bias, while rapid drift is coupled with fast forgetting and low variance. We present experiments that support the existence of such a sweet path. We also demonstrate that simple learners that select appropriate forgetting rates and bias/variance profiles are highly competitive with the state-of-the-art in incremental learners for concept drift on real-world drift problems.},
Keywords = {Concept Drift},
Url = {https://arxiv.org/abs/1801.09354}
}
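
As a concrete illustration of a "forgetting rate", the sketch below maintains a class-probability estimate with exponential decay of old counts; the class name and parameterization are hypothetical. A low forgetting rate yields a low-variance but slowly adapting estimate, while a high rate tracks rapid drift at the cost of variance, which is exactly the trade-off the sweet-path hypotheses describe:

class ForgettingClassPrior:
    # Class-probability estimate with exponential forgetting.
    # forgetting in [0, 1): 0 gives a plain running average (no
    # forgetting); values near 1 weight recent labels heavily.
    def __init__(self, n_classes, forgetting=0.01):
        self.counts = [1.0] * n_classes   # Laplace-smoothed counts
        self.decay = 1.0 - forgetting

    def update(self, y):
        # Decay all past evidence, then count the new label.
        self.counts = [c * self.decay for c in self.counts]
        self.counts[y] += 1.0

    def proba(self):
        total = sum(self.counts)
        return [c / total for c in self.counts]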

Characterizing Concept Drift.
Webb, G. I., Hyde, R., Cao, H., Nguyen, H. L., & Petitjean, F.
Data Mining and Knowledge Discovery, 30(4), 964-994, 2016.
[PDF] [DOI] [Bibtex] [Abstract]

@Article{WebbEtAl16,
Title = {Characterizing Concept Drift},
Author = {G.I. Webb and R. Hyde and H. Cao and H.L. Nguyen and F. Petitjean},
Journal = {Data Mining and Knowledge Discovery},
Year = {2016},
Number = {4},
Pages = {964-994},
Volume = {30},
Abstract = {Most machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or as it is commonly called concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.},
Doi = {10.1007/s10618-015-0448-4},
Keywords = {Concept Drift},
Related = {learning-from-non-stationary-distributions},
Url = {http://arxiv.org/abs/1511.03816},
Urltext = {Link to prepublication draft}
}
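
One simple instance of the quantitative analysis this framework supports is tracing drift magnitude over time as a distribution distance between consecutive windows. The sketch below does this for class-distribution drift using the Hellinger distance; the window size, choice of distance and discrete-label assumption are all illustrative:

import numpy as np

def hellinger(p, q):
    # Hellinger distance between two discrete distributions (each sums to 1).
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def drift_profile(labels, window=500, n_classes=2):
    # Distance between the class distributions of consecutive windows:
    # spikes suggest abrupt drift, sustained plateaus gradual drift.
    profile = []
    for start in range(0, len(labels) - 2 * window + 1, window):
        a = labels[start:start + window]
        b = labels[start + window:start + 2 * window]
        pa = np.bincount(a, minlength=n_classes) / window
        pb = np.bincount(b, minlength=n_classes) / window
        profile.append(hellinger(pa, pb))
    return profile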

Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data.
Webb, G. I.
Proceedings of the 14th IEEE International Conference on Data Mining, pp. 1031-1036, 2014.
[PDF] [DOI] [Bibtex] [Abstract]

@InProceedings{Webb14,
Title = {Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data},
Author = {G.I. Webb},
Booktitle = {Proceedings of the 14th {IEEE} International Conference on Data Mining},
Year = {2014},
Pages = {1031-1036},
Abstract = {Discretization of streaming data has received surprisingly little attention. This might be because streaming data require incremental discretization with cutpoints that may vary over time and this is perceived as undesirable. We argue, to the contrary, that it can be desirable for a discretization to evolve in synchronization with an evolving data stream, even when the learner assumes that attribute values' meanings remain invariant over time. We examine the issues associated with discretization in the context of distribution drift and develop computationally efficient incremental discretization algorithms. We show that discretization can reduce the error of a classical incremental learner and that allowing a discretization to drift in synchronization with distribution drift can further reduce error.},
Doi = {10.1109/ICDM.2014.123},
Keywords = {Concept Drift and Discretization and Incremental Learning and Stream mining},
Related = {learning-from-non-stationary-distributions}
}
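
The idea of a discretization that evolves with the stream can be illustrated by a naive sliding-window equal-frequency discretizer, sketched below. Recomputing the cutpoints from scratch on every update is deliberately simple; this is not one of the computationally efficient algorithms developed in the paper:

from bisect import bisect_right
from collections import deque

class WindowQuantileDiscretizer:
    # Equal-frequency cutpoints are recomputed from the most recent
    # `window` values, so the bins drift along with the distribution
    # of the incoming stream. Naive O(window log window) per update.
    def __init__(self, n_bins=10, window=1000):
        self.n_bins = n_bins
        self.values = deque(maxlen=window)
        self.cutpoints = []

    def update(self, x):
        self.values.append(x)
        s = sorted(self.values)
        # n_bins - 1 cutpoints at the window's empirical quantiles.
        self.cutpoints = [s[len(s) * i // self.n_bins]
                          for i in range(1, self.n_bins)]

    def discretize(self, x):
        # Bin index in 0 .. n_bins - 1 under the current cutpoints.
        return bisect_right(self.cutpoints, x)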