Discretization for Naive Bayes

Naive Bayes has distinct requirements of discretization due to its attribute independence assumption. This work provides theoretical analysis of those requirements and new techniques that improve classification accuracy.

PKIDiscretize is a standard Weka component that implements our proportional k-interval discretization technique.

Publications

A Comparative Study of Bandwidth Choice in Kernel Density Estimation for Naive Bayesian Classification.
Liu, B., Yang, Y., Webb, G. I., & Boughton, J.
Proceedings of the 13th Pacific-Asia Conference, PAKDD 2009, Berlin/Heidelberg, pp. 302-313, 2009.
[Bibtex] → Download PDF → Access on publisher site

@InProceedings{LiuYangWebbBoughton09,
author = {Liu, B. and Yang, Y. and Webb, G. I. and Boughton, J.},
booktitle = {Proceedings of the 13th {Pacific}-{Asia} Conference, PAKDD 2009},
title = {A Comparative Study of Bandwidth Choice in Kernel Density Estimation for Naive Bayesian Classification},
year = {2009},
address = {Berlin/Heidelberg},
pages = {302-313},
publisher = {Springer},
keywords = {Conditional Probability Estimation and AODE and Discretization for Naive Bayes},
location = {Bangkok, Thailand},
related = {discretization-for-naive-bayes},
doi = {10.1007/2F978-3-642-01307-2_29},
}

ABSTRACT

Discretization for Naive-Bayes Learning: Managing Discretization Bias and Variance.
Yang, Y., & Webb, G. I.
Machine Learning, 74(1), 39-74, 2009.
[Bibtex] [Abstract] → Access on publisher site

@Article{YangWebb09,
author = {Yang, Y. and Webb, G. I.},
journal = {Machine Learning},
title = {Discretization for Naive-Bayes Learning: Managing Discretization Bias and Variance},
year = {2009},
number = {1},
pages = {39-74},
volume = {74},
abstract = {Quantitative attributes are usually discretized in Naive-Bayes learning. We establish simple conditions under which discretization is equivalent to use of the true probability density function during naive-Bayes learning. The use of different discretization techniques can be expected to affect the classification bias and variance of generated naive-Bayes classifiers, effects we name discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error. In particular, we supply insights into managing discretization bias and variance by adjusting the number of intervals and the number of training instances contained in each interval. We accordingly propose proportional discretization and fixed frequency discretization, two efficient unsupervised discretization methods that are able to effectively manage discretization bias and variance. We evaluate our new techniques against four key discretization methods for naive-Bayes classifiers. The experimental results support our theoretical analyses by showing that with statistically significant frequency, naive-Bayes classifiers trained on data discretized by our new methods are able to achieve lower classification error than those trained on data discretized by current established discretization methods.},
address = {Netherlands},
doi = {10.1007/s10994-008-5083-5},
keywords = {Discretization for Naive Bayes and Conditional Probability Estimation and AODE},
publisher = {Springer},
related = {discretization-for-naive-bayes},
}

ABSTRACT Quantitative attributes are usually discretized in Naive-Bayes learning. We establish simple conditions under which discretization is equivalent to use of the true probability density function during naive-Bayes learning. The use of different discretization techniques can be expected to affect the classification bias and variance of generated naive-Bayes classifiers, effects we name discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error. In particular, we supply insights into managing discretization bias and variance by adjusting the number of intervals and the number of training instances contained in each interval. We accordingly propose proportional discretization and fixed frequency discretization, two efficient unsupervised discretization methods that are able to effectively manage discretization bias and variance. We evaluate our new techniques against four key discretization methods for naive-Bayes classifiers. The experimental results support our theoretical analyses by showing that with statistically significant frequency, naive-Bayes classifiers trained on data discretized by our new methods are able to achieve lower classification error than those trained on data discretized by current established discretization methods.

Incremental Discretization for Naive-Bayes Classifier.
Lu, J., Yang, Y., & Webb, G. I.
Lecture Notes in Computer Science 4093: Proceedings of the Second International Conference on Advanced Data Mining and Applications (ADMA 2006), Berlin, pp. 223-238, 2006.
[Bibtex] [Abstract] → Download PDF

@InProceedings{LuYangWebb06,
author = {Lu, J. and Yang, Y. and Webb, G. I.},
booktitle = {Lecture Notes in Computer Science 4093: Proceedings of the Second International Conference on Advanced Data Mining and Applications (ADMA 2006)},
title = {Incremental Discretization for Naive-Bayes Classifier},
year = {2006},
address = {Berlin},
editor = {Li, Xue and Zaiane, Osmar R. and Li, Zhanhuai},
pages = {223-238},
publisher = {Springer},
abstract = {Naive-Bayes classifiers (NB) support incremental learning. However, the lack of effective incremental discretization methods has been hindering NB's incremental learning in face of quantitative data. This problem is further compounded by the fact that quantitative data are everywhere, from temperature readings to share prices. In this paper, we present a novel incremental discretization method for NB, incremental flexible frequency discretization (IFFD). IFFD discretizes values of a quantitative attribute into a sequence of intervals of flexible sizes. It allows online insertion and splitting operation on intervals. Theoretical analysis and experimental test are conducted to compare IFFD with alternative methods. Empirical evidence suggests that IFFD is efficient and effective. NB coupled with IFFD achieves a rapport between high learning efficiency and high classification accuracy in the context of incremental learning.},
keywords = {Conditional Probability Estimation and Discretization for Naive Bayes and Incremental Learning and Stream Mining},
location = {Xian, China},
related = {discretization-for-naive-bayes},
}

ABSTRACT Naive-Bayes classifiers (NB) support incremental learning. However, the lack of effective incremental discretization methods has been hindering NB's incremental learning in face of quantitative data. This problem is further compounded by the fact that quantitative data are everywhere, from temperature readings to share prices. In this paper, we present a novel incremental discretization method for NB, incremental flexible frequency discretization (IFFD). IFFD discretizes values of a quantitative attribute into a sequence of intervals of flexible sizes. It allows online insertion and splitting operation on intervals. Theoretical analysis and experimental test are conducted to compare IFFD with alternative methods. Empirical evidence suggests that IFFD is efficient and effective. NB coupled with IFFD achieves a rapport between high learning efficiency and high classification accuracy in the context of incremental learning.

Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers.
Yang, Y., & Webb, G. I.
Lecture Notes in Artificial Intelligence Vol. 2637: Proceedings of the Seventh Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'03), Berlin/Heidelberg, pp. 501-512, 2003.
[Bibtex] [Abstract] → Download PDF

@InProceedings{YangWebb03,
author = {Yang, Y. and Webb, G. I.},
booktitle = {Lecture Notes in Artificial Intelligence Vol. 2637: Proceedings of the Seventh {Pacific}-{Asia} Conference on Knowledge Discovery and Data Mining (PAKDD'03)},
title = {Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers},
year = {2003},
address = {Berlin/Heidelberg},
editor = {Whang, K-Y. and Jeon, J. and Shim, K. and Srivastava, J.},
pages = {501-512},
publisher = {Springer-Verlag},
abstract = {The use of different discretization techniques can be expected to affect the bias and variance of a learning algorithm. We call such an effect discretization bias and variance. Proportional k-interval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number proportional to the number of training instances. Theoretical analysis suggests that this is desirable for naive-Bayes classifiers. However PKID has sub-optimal performance when learning from small training data. We argue that this is because PKID equally weighs bias reduction and variance reduction. But for small data, variance reduction can contribute more to lower learning error and thus should be given greater weight than bias reduction. Accordingly we propose weighted proportional k-interval discretization (WPKID), which establishes a more suitable bias and variance trade-off for small data while allowing additional training data to be used to reduce both bias and variance. Our experiments demonstrate that for naive-Bayes classifiers, WPKID improves upon PKID for smaller datasets with significant frequency; and WPKID delivers lower classification error significantly more often than not in comparison to the other three leading alternative discretization techniques studied.},
keywords = {Discretization for Naive Bayes},
location = {Seoul, Korea},
related = {discretization-for-naive-bayes},
}

ABSTRACT The use of different discretization techniques can be expected to affect the bias and variance of a learning algorithm. We call such an effect discretization bias and variance. Proportional k-interval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number proportional to the number of training instances. Theoretical analysis suggests that this is desirable for naive-Bayes classifiers. However PKID has sub-optimal performance when learning from small training data. We argue that this is because PKID equally weighs bias reduction and variance reduction. But for small data, variance reduction can contribute more to lower learning error and thus should be given greater weight than bias reduction. Accordingly we propose weighted proportional k-interval discretization (WPKID), which establishes a more suitable bias and variance trade-off for small data while allowing additional training data to be used to reduce both bias and variance. Our experiments demonstrate that for naive-Bayes classifiers, WPKID improves upon PKID for smaller datasets with significant frequency; and WPKID delivers lower classification error significantly more often than not in comparison to the other three leading alternative discretization techniques studied.

On Why Discretization Works for Naive-Bayes Classifiers.
Yang, Y., & Webb, G. I.
Lecture Notes in Artificial Intelligence Vol. 2903: Proceedings of the 16th Australian Conference on Artificial Intelligence (AI 03), Berlin/Heidelberg, pp. 440-452, 2003.
[Bibtex] [Abstract] → Download PDF

@InProceedings{YangWebb03c,
Title = {On Why Discretization Works for Naive-Bayes Classifiers},
Author = {Yang, Y. and Webb, G. I.},
Booktitle = {Lecture Notes in Artificial Intelligence Vol. 2903: Proceedings of the 16th Australian Conference on Artificial Intelligence (AI 03)},
Year = {2003},
Address = {Berlin/Heidelberg},
Editor = {Gedeon, T.D. and Fung, L.C.C.},
Pages = {440-452},
Publisher = {Springer},
Abstract = {We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed. We discuss the factors that might affect naive-Bayes classification error under discretization. We suggest that the use of different discretization techniques can affect the classification bias and variance of the generated classifiers, an effect named discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error},
Keywords = {Discretization for Naive Bayes},
Location = {Perth, Australia},
Related = {discretization-for-naive-bayes}
}

ABSTRACT We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed. We discuss the factors that might affect naive-Bayes classification error under discretization. We suggest that the use of different discretization techniques can affect the classification bias and variance of the generated classifiers, an effect named discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error

Non-Disjoint Discretization for Naive-Bayes Classifiers.
Yang, Y., & Webb, G. I.
Proceedings of the Nineteenth International Conference on Machine Learning (ICML '02), San Francisco, pp. 666-673, 2002.
[Bibtex] [Abstract] → Download PDF

@InProceedings{YangWebb02b,
Title = {Non-Disjoint Discretization for Naive-Bayes Classifiers},
Author = {Yang, Y. and Webb, G. I.},
Booktitle = {Proceedings of the Nineteenth International Conference on Machine Learning (ICML '02)},
Year = {2002},
Address = {San Francisco},
Editor = {Sammut, C. and Hoffmann, A.G.},
Pages = {666-673},
Publisher = {Morgan Kaufmann},
Abstract = {Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new discretization method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of its discretized interval to obtain more reliable probability estimation. It also adjusts the number and size of discretized intervals to the number of training instances, seeking an appropriate trade-off between bias and variance of probability estimation. We justify NDD in theory and test it on a wide cross-section of datasets. Our experimental results suggest that for naive-Bayes classifiers, NDD works better than alternative discretization approaches.},
Audit-trail = {Posted by Ying at http://www.cs.uvm.edu/~yyang/ndd.pdf No link on GW page - 9/2/05 requested permission},
Keywords = {Discretization for Naive Bayes},
Location = {Sydney, Australia},
Related = {discretization-for-naive-bayes}
}

ABSTRACT Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new discretization method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of its discretized interval to obtain more reliable probability estimation. It also adjusts the number and size of discretized intervals to the number of training instances, seeking an appropriate trade-off between bias and variance of probability estimation. We justify NDD in theory and test it on a wide cross-section of datasets. Our experimental results suggest that for naive-Bayes classifiers, NDD works better than alternative discretization approaches.

A Comparative Study of Discretization Methods for Naive-Bayes Classifiers.
Yang, Y., & Webb, G. I.
Proceedings of the 2002 Pacific Rim Knowledge Acquisition Workshop (PKAW'02), Tokyo, pp. 159-173, 2002.
[Bibtex] [Abstract] → Download PDF

@InProceedings{YangWebb02a,
Title = {A Comparative Study of Discretization Methods for Naive-Bayes Classifiers},
Author = {Yang, Y. and Webb, G. I.},
Booktitle = {Proceedings of the 2002 {Pacific} Rim Knowledge Acquisition Workshop (PKAW'02)},
Year = {2002},
Address = {Tokyo},
Editor = {Yamaguchi, T. and Hoffmann, A. and Motoda, H. and Compton, P.},
Pages = {159-173},
Publisher = {Japanese Society for Artificial Intelligence},
Abstract = {Discretization is a popular approach to handling numeric attributes in machine learning. We argue that the requirements for effective discretization differ between naive-Bayes learning and many other learning algorithms. We evaluate the effectiveness with naive-Bayes classifiers of nine discretization methods, equal width discretization (EWD), equal frequency discretization (EFD), fuzzy discretization (FD), entropy minimization discretization (EMD), iterative discretization (ID), proportional k-interval discretization (PKID), lazy discretization (LD), non-disjoint discretization (NDD) and weighted proportional k-interval discretization (WPKID). It is found that in general naive-Bayes classifiers trained on data preprocessed by LD, NDD or WPKID achieve lower classification error than those trained on data preprocessed by the other discretization methods. But LD can not scale to large data. This study leads to a new discretization method, weighted non-disjoint discretization (WNDD) that combines WPKID and NDD's advantages. Our experiments show that among all the rival discretization methods, WNDD best helps naive-Bayes classifiers reduce average classification error.},
Keywords = {Discretization for Naive Bayes},
Location = {Tokyo, Japan},
Related = {discretization-for-naive-bayes}
}

ABSTRACT Discretization is a popular approach to handling numeric attributes in machine learning. We argue that the requirements for effective discretization differ between naive-Bayes learning and many other learning algorithms. We evaluate the effectiveness with naive-Bayes classifiers of nine discretization methods, equal width discretization (EWD), equal frequency discretization (EFD), fuzzy discretization (FD), entropy minimization discretization (EMD), iterative discretization (ID), proportional k-interval discretization (PKID), lazy discretization (LD), non-disjoint discretization (NDD) and weighted proportional k-interval discretization (WPKID). It is found that in general naive-Bayes classifiers trained on data preprocessed by LD, NDD or WPKID achieve lower classification error than those trained on data preprocessed by the other discretization methods. But LD can not scale to large data. This study leads to a new discretization method, weighted non-disjoint discretization (WNDD) that combines WPKID and NDD's advantages. Our experiments show that among all the rival discretization methods, WNDD best helps naive-Bayes classifiers reduce average classification error.

Proportional K-Interval Discretization for Naive-Bayes Classifiers.
Yang, Y., & Webb, G. I.
Lecture Notes in Computer Science 2167: Proceedings of the 12th European Conference on Machine Learning (ECML'01), Berlin/Heidelberg, pp. 564-575, 2001.
[Bibtex] [Abstract] → Download PDF

@InProceedings{YangWebb01,
author = {Yang, Y. and Webb, G. I.},
booktitle = {Lecture Notes in Computer Science 2167: Proceedings of the 12th European Conference on Machine Learning (ECML'01)},
title = {Proportional K-Interval Discretization for Naive-Bayes Classifiers},
year = {2001},
address = {Berlin/Heidelberg},
editor = {DeRaedt, L. and Flach, P. A.},
pages = {564-575},
publisher = {Springer-Verlag},
abstract = {This paper argues that two commonly-used discretization approaches, fixed k-interval discretization and entropy-based discretization have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances, thus seeks an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory, as well as test it on a wide cross-section of datasets. Our experimental results suggest that in comparison to its alternatives, PKID provides naive-Bayes classifiers competitive classification performance for smaller datasets and better classification performance for larger datasets.},
keywords = {Discretization for Naive Bayes},
location = {Freiburg, Germany},
related = {discretization-for-naive-bayes},
}

ABSTRACT This paper argues that two commonly-used discretization approaches, fixed k-interval discretization and entropy-based discretization have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances, thus seeks an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory, as well as test it on a wide cross-section of datasets. Our experimental results suggest that in comparison to its alternatives, PKID provides naive-Bayes classifiers competitive classification performance for smaller datasets and better classification performance for larger datasets.