The first grafting algorithm added new branches that did not change the tree’s classification of any training data [Webb, 1996]. This constraint was imposed so as to create as clear a demonstration that it was not true that all other things being equal a less complex classifier should be more accurate than a more complex one. Subsequent algorithms [Webb, 1998, 1999] were intended as practical approaches to machine learning and allowed branches that reclassified training examples that the original tree misclassified.
The original decision tree grafting software was implemented as an extension to C4.5. It can be downloaded here (requires C4.5 release 6). Decision tree grafting is also implemented in the J48Graft component of the Weka machine learning workbench.
Decision tree grafting was first conceived as a way of demonstrating that of two models that perform identically on training data, the simplest is not most likely to have the lowest accuracy. However, it has also proven to be a practical learning algorithm. Indeed, its Weka implementation has been used in a number of scientific applications, including those listed here.