Generality is predictive of prediction accuracy.

Manipulation of generality, through appropriate generalization and specialization, can modify classifier performance in predictable and useful ways.

Publications

Generality is Predictive of Prediction Accuracy.
Webb, G. I., & Brain, D.
LNAI State-of-the-Art Survey series, 'Data Mining: Theory, Methodology, Techniques, and Applications', Berlin/Heidelberg, pp. 1-13, 2006.

@InProceedings{WebbBrain05,
author = {Webb, G. I. and Brain, D.},
booktitle = {LNAI State-of-the-Art Survey series, 'Data Mining: Theory, Methodology, Techniques, and Applications'},
title = {Generality is Predictive of Prediction Accuracy},
year = {2006},
address = {Berlin/Heidelberg},
note = {An earlier version of this paper was published in the Proceedings of PKAW 2002, pp. 117--130},
pages = {1--13},
publisher = {Springer},
abstract = {During knowledge acquisition it frequently occurs that multiple alternative potential rules all appear equally credible. This paper addresses the dearth of formal analysis about how to select between such alternatives. It presents two hypotheses about the expected impact of selecting between classification rules of differing levels of generality in the absence of other evidence about their likely relative performance on unseen data. We argue that the accuracy on unseen data of the more general rule will tend to be closer to that of a default rule for the class than will that of the more specific rule. We also argue that in comparison to the more general rule, the accuracy of the more specific rule on unseen cases will tend to be closer to the accuracy obtained on training data. Experimental evidence is provided in support of these hypotheses. These hypotheses can be useful for selecting between rules in order to achieve specific knowledge acquisition objectives.},
keywords = {Generality},
related = {generality-is-predictive-of-prediction-accuracy},
}

Generality is Predictive of Prediction Accuracy.
Webb, G. I., & Brain, D.
Proceedings of the 2002 Pacific Rim Knowledge Acquisition Workshop (PKAW'02), Tokyo, pp. 117-130, 2002.

@InProceedings{WebbBrain02,
author = {Webb, G. I. and Brain, D.},
booktitle = {Proceedings of the 2002 {Pacific} Rim Knowledge Acquisition Workshop (PKAW'02)},
title = {Generality is Predictive of Prediction Accuracy},
year = {2002},
address = {Tokyo},
editor = {Yamaguchi, T. and Hoffmann, A. and Motoda, H. and Compton, P.},
pages = {117--130},
publisher = {Japanese Society for Artificial Intelligence},
abstract = {There has been a dearth of research into the relative impacts of alternative high level learning biases. This paper presents two hypotheses about the expected impact of selecting between classification rules of differing levels of generality in the absence of other evidence about their likely relative performance on unseen data. It is argued that the accuracy on unseen data of the more general rule will tend to be closer to that of a default rule for the class than will that of the more specific rule. It is also argued that the accuracy on unseen cases of the more specific rule will tend to be closer to the accuracy obtained on training data than will the accuracy of the more general rule. Experimental evidence is provided in support of these hypotheses. We argue that these hypotheses can be of use in selecting appropriate learning biases to achieve specific learning objectives.},
keywords = {Generality},
location = {Tokyo, Japan},
related = {generality-is-predictive-of-prediction-accuracy},
}

Cost Sensitive Specialisation.
Webb, G. I.
Lecture Notes in Computer Science Vol. 1114. Topics in Artificial Intelligence: Proceedings of the Fourth Pacific Rim International Conference on Artificial Intelligence (PRICAI'96), Berlin/Heidelberg, pp. 23-34, 1996.

@InProceedings{Webb96d,
Title = {Cost Sensitive Specialisation},
Author = {Webb, G. I.},
Booktitle = {Lecture Notes in Computer Science Vol. 1114. Topics in Artificial Intelligence: Proceedings of the Fourth {Pacific} Rim International Conference on Artificial Intelligence (PRICAI'96)},
Year = {1996},
Address = {Berlin/Heidelberg},
Editor = {Foo, N.Y. and Goebel, R.},
Pages = {23--34},
Publisher = {Springer-Verlag},
Abstract = {Cost-sensitive specialization is a generic technique for misclassification cost sensitive induction. This technique involves specializing aspects of a classifier associated with high misclassification costs and generalizing those associated with low misclassification costs. It is widely applicable and simple to implement. It could be used to augment the effect of standard cost-sensitive induction techniques. It should directly extend to test application cost sensitive induction tasks. Experimental evaluation demonstrates consistent positive effects over a range of misclassification cost sensitive learning tasks.},
Keywords = {Cost Sensitive Learning and Generality},
Location = {Cairns, Australia},
Related = {generality-is-predictive-of-prediction-accuracy}
}

Generality Is More Significant Than Complexity: Toward An Alternative To Occam's Razor.
Webb, G. I.
Artificial Intelligence: Sowing the Seeds for the Future, Proceedings of Seventh Australian Joint Conference on Artificial Intelligence (AI'94), Singapore, pp. 60-67, 1994.

@InProceedings{Webb94b,
Title = {Generality Is More Significant Than Complexity: Toward An Alternative To Occam's Razor},
Author = {Webb, G. I.},
Booktitle = {Artificial Intelligence: Sowing the Seeds for the Future, Proceedings of Seventh Australian Joint Conference on Artificial Intelligence (AI'94)},
Year = {1994},
Address = {Singapore},
Editor = {Zhang, C. and Debenham, J. and Lukose, D.},
Pages = {60--67},
Publisher = {World Scientific},
Abstract = {Occam's Razor is widely employed in machine learning to select between classifiers with equal empirical support. This paper presents the theorem of decreasing inductive power: that, all other things being equal, if two classifiers a and b cover identical cases from the training set and a is a generalisation of b, a has higher probability than b of misclassifying a previously unsighted case. This theorem suggests that, to the contrary of Occam's Razor, generality, not complexity, should be used to select between classifiers with equal empirical support. Two studies are presented. The first study demonstrates that the theorem of decreasing inductive power holds for a number of commonly studied learning problems and for a number of different means of manipulating classifier generality. The second study demonstrates that generality provides a more consistent indicator of predictive accuracy in the context of a default rule than does complexity. These results suggest that the theorem of decreasing predictive power provides a suitable theoretical framework for the development of learning biases for use in selecting between classifiers with identical empirical support.},
Keywords = {Occams Razor and Rule Learning and Generality},
Location = {Armidale, NSW, Australia},
Related = {occams-razor-in-machine-learning}
}
