Feature Construction

Feature construction (also known as constructive induction or attribute discovery) augments data with derived features. Such features can strengthen a data analysis pipeline by capturing relevant relationships within the data that downstream processes cannot otherwise model or exploit. They may also support explainable AI by making explicit relationships that would otherwise remain implicit and difficult to comprehend.
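As a minimal illustration of this idea, the sketch below uses invented toy data in which the class depends on the relationship between two attributes. A single-attribute threshold rule (a hypothetical stand-in for a simple downstream learner) cannot separate the classes on either original attribute, but can on a constructed difference attribute. The data, the helper name `best_threshold_accuracy`, and the choice of a difference feature are all assumptions made for illustration.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of a one-attribute rule `value > t` (or its negation)."""
    best = 0.0
    for t in sorted(set(values)):
        predicted = [v > t for v in values]
        hits = sum(p == y for p, y in zip(predicted, labels))
        # Allow the rule to be inverted, so take the better of the two.
        acc = max(hits, len(labels) - hits) / len(labels)
        best = max(best, acc)
    return best


# Toy data (invented): the class is True exactly when a > b.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
labels = [a > b for a, b in rows]

a_values = [a for a, _ in rows]
b_values = [b for _, b in rows]
diff_values = [a - b for a, b in rows]  # constructed feature

acc_a = best_threshold_accuracy(a_values, labels)
acc_b = best_threshold_accuracy(b_values, labels)
acc_diff = best_threshold_accuracy(diff_values, labels)

print(acc_a, acc_b, acc_diff)
```

The derived attribute makes the relationship between `a` and `b` explicit, which is also what makes the resulting rule easier to read.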

Our pioneering research in the early 1990s demonstrated that feature construction can enable machine learning systems to learn classifiers that are both more accurate and less complex across a range of classification tasks.

Publications

Empirical Function Attribute Construction in Classification Learning.
Yip, S., & Webb, G. I.
Artificial Intelligence: Sowing the Seeds for the Future, Proceedings of Seventh Australian Joint Conference on Artificial Intelligence (AI'94), Singapore, pp. 29-36, 1994.

@InProceedings{YipWebb94a,
Title = {Empirical Function Attribute Construction in Classification Learning},
Author = {S. Yip and G. I. Webb},
Booktitle = {Artificial Intelligence: Sowing the Seeds for the Future, Proceedings of Seventh Australian Joint Conference on Artificial Intelligence (AI'94)},
Year = {1994},
Address = {Singapore},
Editor = {C. Zhang and J. Debenham and D. Lukose},
Pages = {29-36},
Publisher = {World Scientific},
Abstract = {The merits of incorporating feature construction to assist selective induction in learning hard concepts are well documented. This paper introduces the notion of function attributes and reports a method of incorporating functional regularities in classifiers. Training sets are preprocessed with this method before submission to a selective induction classification learning system. The method, referred to as FAFA (function attribute finding), is characterised by finding bivariate functions that contribute to the discrimination between classes and then transforming them to function attributes as additional attributes of the data set. The value of each function attribute equals the deviation of each example from the value obtained by applying that function to the example. The expanded data set is then submitted to classification learning. Evaluation with published and artificial data shows that this method can improve classifiers in terms of predictive accuracy and complexity.},
Keywords = {Constructive Induction},
Location = {Armidale, NSW, Australia},
Related = {feature-construction}
}
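The function-attribute idea described in the abstract above can be sketched as follows. This is not the FAFA implementation: the choice of a straight line as the bivariate function, the least-squares fit on a single class, the use of the absolute deviation, and the names `fit_line` and `add_function_attribute` are all assumptions made for illustration, as is the toy data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def add_function_attribute(examples, target_class):
    """Append |y - f(x)| to every example, with f fitted on one class only."""
    in_class = [(x, y) for x, y, label in examples if label == target_class]
    slope, intercept = fit_line(
        [x for x, _ in in_class], [y for _, y in in_class]
    )
    return [
        (x, y, abs(y - (slope * x + intercept)), label)
        for x, y, label in examples
    ]


# Toy data (invented): "pos" examples lie on y = 2x, "neg" examples do not.
examples = [
    (1.0, 2.0, "pos"), (2.0, 4.0, "pos"), (3.0, 6.0, "pos"), (4.0, 8.0, "pos"),
    (1.0, 5.0, "neg"), (2.0, 1.0, "neg"), (3.0, 9.0, "neg"), (4.0, 3.0, "neg"),
]

# The expanded data set would then be passed to a classification learner.
expanded = add_function_attribute(examples, "pos")

for row in expanded:
    print(row)
```

In this toy case the new attribute is near zero for the class that follows the fitted function and clearly larger for the other class, so a selective induction learner could separate them with a single test on the added attribute.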

Incorporating Canonical Discriminate Attributes in Classification Learning.
Yip, S., & Webb, G. I.
Proceedings of the Tenth Biennial Canadian Artificial Intelligence Conference (AI-94), San Francisco, pp. 63-70, 1994.

@InProceedings{YipWebb94b,
Title = {Incorporating Canonical Discriminate Attributes in Classification Learning},
Author = {S. Yip and G. I. Webb},
Booktitle = {Proceedings of the Tenth Biennial Canadian Artificial Intelligence Conference (AI-94)},
Year = {1994},
Address = {San Francisco},
Editor = {R. Elio},
Pages = {63-70},
Publisher = {Morgan Kaufmann},
Abstract = {This paper describes a method for incorporating canonical discriminant attributes in classification machine learning. Though decision trees and rules have semantic appeal when building expert systems, the merits of discriminant analysis are well documented. For data sets on which discriminant analysis obtains significantly better predictive accuracy than symbolic machine learning, the incorporation of canonical discriminant attributes can benefit machine learning. The process starts by applying canonical discriminant analysis to the training set. The canonical discriminant attributes are included as additional attributes. The expanded data set is then subjected to machine learning. This enables linear combinations of numeric attributes to be incorporated in the classifiers that are learnt. Evaluation on the data sets on which discriminant analysis performs better than most machine learning systems, such as the Iris flowers and Waveform data sets, shows that incorporating the power of discriminant analysis in machine classification learning can significantly improve the predictive accuracy and reduce the complexity of classifiers induced by machine learning systems.},
Keywords = {Constructive Induction},
Location = {Banff, Canada},
Related = {feature-construction}
}
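The pipeline in the abstract above (compute discriminant attributes on the training set, append them, then learn) can be sketched for the simplest case. With two classes, canonical discriminant analysis yields a single discriminant direction, which coincides up to scale with Fisher's linear discriminant, w = Sw^-1 (mean_a - mean_b). The sketch below hard-codes two numeric attributes and inverts the 2x2 within-class scatter matrix by hand; the function names and toy data are assumptions for illustration and this is not the authors' implementation.

```python
def mean_vector(rows):
    """Mean of each of the two numeric attributes."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def within_scatter(rows, mean):
    """2x2 scatter matrix of rows around their class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """w = Sw^-1 (mean_a - mean_b) for two numeric attributes."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = within_scatter(class_a, ma), within_scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Closed-form inverse of a 2x2 matrix applied to diff.
    return [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]


# Toy data (invented): the classes overlap on each attribute taken alone.
class_a = [(1.0, 2.0), (2.0, 3.2), (3.0, 3.9), (4.0, 5.1)]
class_b = [(1.0, 0.1), (2.0, 0.9), (3.0, 2.2), (4.0, 2.8)]

w = fisher_direction(class_a, class_b)

# Append the discriminant score as an additional attribute.
expanded = [
    (x, y, w[0] * x + w[1] * y, label)
    for rows, label in ((class_a, "a"), (class_b, "b"))
    for x, y in rows
]

scores_a = [score for _, _, score, label in expanded if label == "a"]
scores_b = [score for _, _, score, label in expanded if label == "b"]
print(min(scores_a), max(scores_b))
```

Because the added attribute is a linear combination of the numeric attributes, a tree or rule learner that tests one attribute at a time can use it to express an oblique class boundary in a single test.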

Discriminate Attribute Finding in Classification Learning.
Yip, S., & Webb, G. I.
Proceedings of the Fifth Australian Joint Conference on Artificial Intelligence (AI'92), Singapore, pp. 374-379, 1992.

@InProceedings{YipWebb92b,
Title = {Discriminate Attribute Finding in Classification Learning},
Author = {S. Yip and G. I. Webb},
Booktitle = {Proceedings of the Fifth Australian Joint Conference on Artificial Intelligence (AI'92)},
Year = {1992},
Address = {Singapore},
Editor = {A. Adams and L. Sterling},
Pages = {374-379},
Publisher = {World Scientific},
Abstract = {This paper describes a method for extending domain models in classification learning by deriving new attributes from existing ones. The process starts by examining examples of different classes which have overlapping ranges in all of their numeric attribute values. Based on existing attributes, new attributes which enhance the distinguishability of a class are created. These additional attributes are then used in the subsequent classification learning process. The research revealed that this method can enable relationships between attributes to be incorporated in the classification procedures and, depending on the nature of data, significantly increase the coverage of class descriptions, improve the accuracy of classifying novel instances and reduce the number of clauses in class description when compared to classification learning alone. Evaluation with the data on iris flower classification showed that the classification accuracy is slightly improved and the number of clauses in the class description is significantly reduced.},
Keywords = {Constructive Induction},
Location = {Hobart, Tas., Australia},
Related = {feature-construction}
}

Function Finding in Classification Learning.
Yip, S., & Webb, G. I.
Proceedings of the Second Pacific Rim International Conference on Artificial Intelligence (PRICAI '92), Berlin, pp. 555-561, 1992.

@InProceedings{YipWebb92a,
Title = {Function Finding in Classification Learning},
Author = {S. Yip and G. I. Webb},
Booktitle = {Proceedings of the Second {Pacific} Rim International Conference on Artificial Intelligence (PRICAI '92)},
Year = {1992},
Address = {Berlin},
Pages = {555-561},
Publisher = {Springer-Verlag},
Abstract = {The paper describes a method for extending domain models in classification learning by deriving new attributes from existing attributes. The process starts by finding functional regularities within each class. Such regularities are then treated as additional attributes in the subsequent classification learning process. The research revealed that these techniques can reduce the number of clauses required to describe each class, enable functional regularities between attributes to be incorporated in the classification procedures and, depending on the nature of data, significantly increase the coverage of class descriptions and improve the accuracy of classifying novel instances when compared to classification learning alone.},
Keywords = {Constructive Induction},
Location = {Seoul, Korea},
Related = {feature-construction}
}
