Decision tree grafting

Grafting is a postprocess that adds tests and nodes to existing decision trees, resulting in bagging-like variance reduction while providing a single interpretable decision tree. It was developed as part of my exploration of the use of Occam's Razor in machine learning.

The first grafting algorithm added new branches that did not change the tree's classification of any training data [Webb, 1996]. This constraint was imposed so as to create as clear a demonstration that it was not true that all other things being equal a less complex classifier should be more accurate than a more complex one. Subsequent algorithms [Webb, 1998, 1999] were intended as practical approaches to machine learning and allowed branches that reclassified training examples that the original tree misclassified.

The original decision tree grafting software was implemented as an extension to C4.5. It can be downloaded here (requires C4.5 release 6). Decision tree grafting is also implemented in the J48Graft component of the Weka machine learning workbench.

Decision tree grafting was first conceived as a way of demonstrating that of two models that perform identically on training data, the simplest is not most likely to have the lowest accuracy. However, it has also proven to be a practical learning algorithm. Indeed, its Weka implementation has been used in a number of scientific applications, including those listed here.

Publications

Decision Tree Grafting From The All Tests But One Partition.
Webb, G. I.
Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI 99), San Francisco, pp. 702-707, 1999.
[Bibtex] [Abstract] → Download PDF

@InProceedings{Webb99,
Title = {Decision Tree Grafting From The All Tests But One Partition},
Author = {Webb, G. I.},
Booktitle = {Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence ({IJCAI} 99)},
Year = {1999},
Address = {San Francisco},
Editor = {Dean, T.},
Pages = {702-707},
Publisher = {Morgan Kaufmann},
Abstract = {Decision tree grafting adds nodes to an existing decision tree with the objective of reducing prediction error. A new grafting algorithm is presented that considers one set of training data only for each leaf of the initial decision tree, the set of cases that fail at most one test on the path to the leaf. This new technique is demonstrated to retain the error reduction power of the original grafting algorithm while dramatically reducing compute time and the complexity of the inferred tree. Bias/variance analysis reveal that the original grafting technique operated primarily by variance reduction while the new technique reduces both bias and variance.},
Audit-trail = {PDF posted with the permission of {IJCAI} Inc},
Keywords = {Decision Tree Learning and Decision Tree Grafting and Occams Razor},
Location = {Stockholm, Sweden},
Related = {decision-tree-grafting}
}

ABSTRACT Decision tree grafting adds nodes to an existing decision tree with the objective of reducing prediction error. A new grafting algorithm is presented that considers one set of training data only for each leaf of the initial decision tree, the set of cases that fail at most one test on the path to the leaf. This new technique is demonstrated to retain the error reduction power of the original grafting algorithm while dramatically reducing compute time and the complexity of the inferred tree. Bias/variance analysis reveal that the original grafting technique operated primarily by variance reduction while the new technique reduces both bias and variance.

The Problem of Missing Values in Decision Tree Grafting.
Webb, G. I.
Lecture Notes in Computer Science Vol. 1502: Advanced Topics in Artificial Intelligence, Selected Papers from the Eleventh Australian Joint Conference on Artificial Intelligence (AI '98), Berlin, pp. 273-283, 1998.
[Bibtex] [Abstract] → Download PDF

@InProceedings{Webb98,
Title = {The Problem of Missing Values in Decision Tree Grafting},
Author = {Webb, G. I.},
Booktitle = {Lecture Notes in Computer Science Vol. 1502: Advanced Topics in Artificial Intelligence, Selected Papers from the Eleventh Australian Joint Conference on Artificial Intelligence (AI '98)},
Year = {1998},
Address = {Berlin},
Editor = {Antoniou, G. and Slaney, J.K.},
Pages = {273-283},
Publisher = {Springer-Verlag},
Abstract = {Decision tree grafting adds nodes to inferred decision trees. Previous research has demonstrated that appropriate grafting techniques can improve predictive accuracy across a wide cross-selection of domains. However, previous decision tree grafting systems are demonstrated to have a serious deficiency for some data sets containing missing values. This problem arises due to the method for handling missing values employed by C4.5, in which the grafting systems have been embedded. This paper provides an explanation of and solution to the problem. Experimental evidence is presented of the efficacy of this solution.},
Keywords = {Decision Tree Learning and Decision Tree Grafting and Occams Razor},
Location = {Brisbane, Australia},
Related = {decision-tree-grafting}
}

ABSTRACT Decision tree grafting adds nodes to inferred decision trees. Previous research has demonstrated that appropriate grafting techniques can improve predictive accuracy across a wide cross-selection of domains. However, previous decision tree grafting systems are demonstrated to have a serious deficiency for some data sets containing missing values. This problem arises due to the method for handling missing values employed by C4.5, in which the grafting systems have been embedded. This paper provides an explanation of and solution to the problem. Experimental evidence is presented of the efficacy of this solution.

Decision Tree Grafting.
Webb, G. I.
Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI 97), San Francisco, pp. 846-851, 1997.
[Bibtex] [Abstract] → Download PDF

@InProceedings{Webb97,
Title = {Decision Tree Grafting},
Author = {Webb, G. I.},
Booktitle = {Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence ({IJCAI} 97)},
Year = {1997},
Address = {San Francisco},
Pages = {846-851},
Publisher = {Morgan Kaufmann},
Abstract = {This paper extends recent work on decision tree grafting. Grafting is an inductive process that adds nodes to inferred decision trees. This process is demonstrated to frequently improve predictive accuracy. Superficial analysis might suggest that decision tree grafting is the direct reverse of pruning. To the contrary, it is argued that the two processes are complementary. This is because, like standard tree growing techniques, pruning uses only local information, whereas grafting uses non-local information. The use of both pruning and grafting in conjunction is demonstrated to provide the best general predictive accuracy over a representative selection of learning tasks.},
Audit-trail = {PDF posted with the permission of IJCAI Inc},
Keywords = {Decision Trees and Decision Tree Grafting and Occams Razor},
Location = {Nagoya, Japan},
Related = {decision-tree-grafting}
}

ABSTRACT This paper extends recent work on decision tree grafting. Grafting is an inductive process that adds nodes to inferred decision trees. This process is demonstrated to frequently improve predictive accuracy. Superficial analysis might suggest that decision tree grafting is the direct reverse of pruning. To the contrary, it is argued that the two processes are complementary. This is because, like standard tree growing techniques, pruning uses only local information, whereas grafting uses non-local information. The use of both pruning and grafting in conjunction is demonstrated to provide the best general predictive accuracy over a representative selection of learning tasks.

Further Experimental Evidence Against The Utility Of Occam's Razor.
Webb, G. I.
Journal of Artificial Intelligence Research, 4, 397-417, 1996.
[Bibtex] [Abstract] → Access on publisher site

@Article{Webb96b,
Title = {Further Experimental Evidence Against The Utility Of Occam's Razor},
Author = {Webb, G. I.},
Journal = {Journal of Artificial Intelligence Research},
Year = {1996},
Pages = {397-417},
Volume = {4},
Abstract = {This paper presents new experimental evidence against the utility of Occam's razor. A systematic procedure is presented for post-processing decision trees produced by C4.5. This procedure was derived by rejecting Occam's razor and instead attending to the assumption that similar objects are likely to belong to the same class. It increases a decision tree's complexity without altering the performance of that tree on the training data from which it is inferred. The resulting more complex decision trees are demonstrated to have, on average, for a variety of common learning tasks, higher predictive accuracy than the less complex original decision trees. This result raises considerable doubt about the utility of Occam's razor as it is commonly applied in modern machine learning.},
Address = {Menlo Park, CA},
Doi = {10.1613/jair.228},
Keywords = {Decision Trees and Decision Tree Grafting and Occams Razor},
Publisher = {AAAI Press},
Related = {occams-razor-in-machine-learning}
}

ABSTRACT This paper presents new experimental evidence against the utility of Occam's razor. A systematic procedure is presented for post-processing decision trees produced by C4.5. This procedure was derived by rejecting Occam's razor and instead attending to the assumption that similar objects are likely to belong to the same class. It increases a decision tree's complexity without altering the performance of that tree on the training data from which it is inferred. The resulting more complex decision trees are demonstrated to have, on average, for a variety of common learning tasks, higher predictive accuracy than the less complex original decision trees. This result raises considerable doubt about the utility of Occam's razor as it is commonly applied in modern machine learning.