**Bias and variance** provide a powerful conceptual tool for analyzing classification performance. Previous approaches to conducting bias-variance experiments have provided little control over the types of data distribution from which bias and variance are estimated. I have developed new techniques for bias-variance analysis that provide greater control over the data distribution. Experiments show that the type of distribution used for bias-variance experiments can greatly affect the results obtained.

### Publications

Estimating bias and variance from data.

Webb, G. I., & Conilione, P.

(2004). Unpublished manuscript.

```
@Unpublished{WebbConilione04,
Title = {Estimating bias and variance from data},
Author = {Webb, Geoffrey I and Conilione, Paul},
Note = {Unpublished manuscript},
Year = {2004},
Abstract = {The bias-variance decomposition of error provides useful insights into the error performance of a classifier as it is applied to different types of learning task. Most notably, it has been used to explain the extraordinary effectiveness of ensemble learning techniques. It is important that the research community have effective tools for assessing such explanations. To this end, techniques have been developed for estimating bias and variance from data. The most widely deployed of these uses repeated sub-sampling with a holdout set. We argue, with empirical support, that this approach has serious limitations. First, it provides very little flexibility in the types of distributions of training sets that may be studied. It requires that the training sets be relatively small and that the degree of variation between training sets be very circumscribed. Second, the approach leads to bias and variance estimates that have high statistical variance and hence low reliability. We develop an alternative method that is based on cross-validation. We show that this method allows far greater flexibility in the types of distribution that are examined and that the estimates derived are much more stable. Finally, we show that changing the distributions of training sets from which bias and variance estimates are drawn can alter substantially the bias and variance estimates that are derived.},
Keywords = {Learning from large datasets and Bias-Variance}
}
```

**ABSTRACT** The bias-variance decomposition of error provides useful insights into the error performance of a classifier as it is applied to different types of learning task. Most notably, it has been used to explain the extraordinary effectiveness of ensemble learning techniques. It is important that the research community have effective tools for assessing such explanations. To this end, techniques have been developed for estimating bias and variance from data. The most widely deployed of these uses repeated sub-sampling with a holdout set. We argue, with empirical support, that this approach has serious limitations. First, it provides very little flexibility in the types of distributions of training sets that may be studied. It requires that the training sets be relatively small and that the degree of variation between training sets be very circumscribed. Second, the approach leads to bias and variance estimates that have high statistical variance and hence low reliability. We develop an alternative method that is based on cross-validation. We show that this method allows far greater flexibility in the types of distribution that are examined and that the estimates derived are much more stable. Finally, we show that changing the distributions of training sets from which bias and variance estimates are drawn can alter substantially the bias and variance estimates that are derived.
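As a rough illustration of what "estimating bias and variance from data" can involve, the sketch below computes one common zero-one-loss decomposition (the Kohavi and Wolpert form) from the predictions that several classifiers, each trained on a different training sample, make on a shared set of test instances. It is not the procedure from the manuscript: the function name `kw_bias_variance`, the input layout, and the assumption of noise-free labels (so that irreducible error is folded into bias) are all choices made for this example only.

```python
from collections import Counter


def kw_bias_variance(true_labels, predictions_per_round):
    """Estimate squared bias and variance for zero-one loss.

    true_labels: labels of the test instances.
    predictions_per_round: one list of predicted labels per training round,
        each aligned with true_labels.
    Assumption (illustrative): labels are noise-free, so the irreducible
    error term is folded into the bias term.
    """
    n_rounds = len(predictions_per_round)
    n_items = len(true_labels)
    bias_sq = 0.0
    variance = 0.0
    for i, y in enumerate(true_labels):
        # Empirical distribution of the predictions made for instance i.
        counts = Counter(preds[i] for preds in predictions_per_round)
        probs = {label: k / n_rounds for label, k in counts.items()}
        sum_sq = sum(p * p for p in probs.values())
        p_true = probs.get(y, 0.0)
        # 0.5 * sum_c (1[c == y] - p(c))^2, expanded algebraically.
        bias_sq += 0.5 * (1.0 - 2.0 * p_true + sum_sq)
        # 0.5 * (1 - sum_c p(c)^2)
        variance += 0.5 * (1.0 - sum_sq)
    return bias_sq / n_items, variance / n_items


# Hypothetical data: two test instances, four training rounds.
labels = ["a", "b"]
rounds = [["a", "b"], ["a", "a"], ["b", "a"], ["a", "a"]]
bias_sq, variance = kw_bias_variance(labels, rounds)
print(bias_sq, variance)
```

Under this definition the two terms sum to the mean error rate over rounds, which gives a quick sanity check on any implementation. How the training samples are drawn (sub-sampling with a holdout set, cross-validation, or something else) is decided outside this function, and that choice is what the abstract argues can change the estimates.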

The Need for Low Bias Algorithms in Classification Learning From Large Data Sets.

Brain, D., & Webb, G. I.

Lecture Notes in Computer Science 2431: Principles of Data Mining and Knowledge Discovery: Proceedings of the Sixth European Conference (PKDD 2002), Berlin/Heidelberg, pp. 62-73, 2002.

```
@InProceedings{BrainWebb02,
Title = {The Need for Low Bias Algorithms in Classification Learning From Large Data Sets},
Author = {D. Brain and G. I. Webb},
Booktitle = {Lecture Notes in Computer Science 2431: Principles of Data Mining and Knowledge Discovery: Proceedings of the Sixth European Conference (PKDD 2002)},
Year = {2002},
Address = {Berlin/Heidelberg},
Pages = {62-73},
Publisher = {Springer-Verlag},
Abstract = {This paper reviews the appropriateness for application to large data sets of standard machine learning algorithms, which were mainly developed in the context of small data sets. Sampling and parallelization have proved useful means for reducing computation time when learning from large data sets. However, such methods assume that algorithms that were designed for use with what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm to optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management, rather than variance management.},
Audit-trail = {http://link.springer.de/link/service/series/0558/bibs/2431/24310062.htm},
Keywords = {Learning from large datasets and Bias-Variance},
Location = {Helsinki, Finland},
Related = {learning-from-large-datasets}
}
```

**ABSTRACT** This paper reviews the appropriateness for application to large data sets of standard machine learning algorithms, which were mainly developed in the context of small data sets. Sampling and parallelization have proved useful means for reducing computation time when learning from large data sets. However, such methods assume that algorithms that were designed for use with what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm to optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm: the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management, rather than variance management.

MultiBoosting: A Technique for Combining Boosting and Wagging.

Webb, G. I.

Machine Learning, 40(2), 159-196, 2000.

```
@Article{Webb00a,
Title = {MultiBoosting: A Technique for Combining Boosting and Wagging},
Author = {G. I. Webb},
Journal = {Machine Learning},
Year = {2000},
Number = {2},
Pages = {159-196},
Volume = {40},
Abstract = {MultiBoosting is an extension to the highly successful AdaBoost technique for forming decision committees. MultiBoosting can be viewed as combining AdaBoost with wagging. It is able to harness both AdaBoost's high bias and variance reduction with wagging's superior variance reduction. Using C4.5 as the base learning algorithm, Multi-boosting is demonstrated to produce decision committees with lower error than either AdaBoost or wagging significantly more often than the reverse over a large representative cross-section of UCI data sets. It offers the further advantage over AdaBoost of suiting parallel execution.},
Address = {Netherlands},
Audit-trail = {27/10/03 requested permission to post pp pdf. 28/10/03 Permission granted by Kluwer. PDF posted 30/10/03},
Doi = {10.1023/A:1007659514849},
Keywords = {MultiBoosting and Boosting and Bias-Variance},
Publisher = {Springer},
Related = {multiboosting-and-multi-strategy-ensemble-learning}
}
```

**ABSTRACT** MultiBoosting is an extension to the highly successful AdaBoost technique for forming decision committees. MultiBoosting can be viewed as combining AdaBoost with wagging. It is able to harness both AdaBoost's high bias and variance reduction with wagging's superior variance reduction. Using C4.5 as the base learning algorithm, Multi-boosting is demonstrated to produce decision committees with lower error than either AdaBoost or wagging significantly more often than the reverse over a large representative cross-section of UCI data sets. It offers the further advantage over AdaBoost of suiting parallel execution.

On The Effect of Data Set Size on Bias And Variance in Classification Learning.

Brain, D., & Webb, G. I.

Proceedings of the Fourth Australian Knowledge Acquisition Workshop (AKAW-99), Sydney, pp. 117-128, 1999.

```
@InProceedings{BrainWebb99,
Title = {On The Effect of Data Set Size on Bias And Variance in Classification Learning},
Author = {D. Brain and G. I. Webb},
Booktitle = {Proceedings of the Fourth {Australian} Knowledge Acquisition Workshop ({AKAW}-99)},
Year = {1999},
Address = {Sydney},
Editor = {D. Richards and G. Beydoun and A. Hoffmann and P. Compton },
Pages = {117-128},
Publisher = {The University of New South Wales},
Abstract = {With the advent of data mining, machine learning has come of age and is now a critical technology in many businesses. However, machine learning evolved in a different research context to that in which it now finds itself employed. A particularly important problem in the data mining world is working effectively with large data sets. However, most machine learning research has been conducted in the context of learning from very small data sets. To date most approaches to scaling up machine learning to large data sets have attempted to modify existing algorithms to deal with large data sets in a more computationally efficient and effective manner. But is this necessarily the best method? This paper explores the possibility of designing algorithms specifically for large data sets. Specifically, the paper looks at how increasing data set size affects bias and variance error decompositions for classification algorithms. Preliminary results of experiments to determine these effects are presented, showing that, as hypothesized, variance can be expected to decrease as training set size increases. No clear effect of training set size on bias was observed. These results have profound implications for data mining from large data sets, indicating that developing effective learning algorithms for large data sets is not simply a matter of finding computationally efficient variants of existing learning algorithms.},
Audit-trail = {*},
Keywords = {Learning from large datasets and Bias-Variance},
Location = {Sydney, Australia},
Related = {learning-from-large-datasets}
}
```

**ABSTRACT** With the advent of data mining, machine learning has come of age and is now a critical technology in many businesses. However, machine learning evolved in a different research context to that in which it now finds itself employed. A particularly important problem in the data mining world is working effectively with large data sets. However, most machine learning research has been conducted in the context of learning from very small data sets. To date most approaches to scaling up machine learning to large data sets have attempted to modify existing algorithms to deal with large data sets in a more computationally efficient and effective manner. But is this necessarily the best method? This paper explores the possibility of designing algorithms specifically for large data sets. Specifically, the paper looks at how increasing data set size affects bias and variance error decompositions for classification algorithms. Preliminary results of experiments to determine these effects are presented, showing that, as hypothesized, variance can be expected to decrease as training set size increases. No clear effect of training set size on bias was observed. These results have profound implications for data mining from large data sets, indicating that developing effective learning algorithms for large data sets is not simply a matter of finding computationally efficient variants of existing learning algorithms.