

Impact Rules

Impact Rules (also known as quantitative association rules) provide analysis similar to that of association rules, except that the consequent is the distribution of a numeric target variable. Impact Rules support data segmentation aimed at optimising a numeric outcome.
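As a rough illustration of this idea, the Python sketch below scores single-condition rules of the form `attr = value -> profit` on a small invented table. For each antecedent it reports the coverage, the mean of the target among covered records, and an impact score. Here impact is assumed to be the coverage times the difference between that mean and the overall mean, which equals the summed deviation of the covered records from the overall mean. The data, the function name `impact_rules` and this choice of measure are assumptions for illustration. A full impact rule system would also search conjunctive antecedents and could report other distributional statistics.

```python
from statistics import mean

# Invented example data: categorical attributes plus a numeric target ("profit").
records = [
    {"region": "north", "channel": "web", "profit": 12.0},
    {"region": "north", "channel": "store", "profit": 8.0},
    {"region": "south", "channel": "web", "profit": 3.0},
    {"region": "south", "channel": "store", "profit": 1.0},
    {"region": "north", "channel": "web", "profit": 14.0},
    {"region": "south", "channel": "web", "profit": 2.0},
]


def impact_rules(rows, target):
    """Score every single-condition antecedent 'attr = value' against a numeric target."""
    overall = mean(r[target] for r in rows)
    rules = []
    for attr in (a for a in rows[0] if a != target):
        for value in sorted({r[attr] for r in rows}):
            covered = [r[target] for r in rows if r[attr] == value]
            cov_mean = mean(covered)
            # Assumed impact measure: summed deviation of covered records from the overall mean.
            impact = len(covered) * (cov_mean - overall)
            rules.append(((attr, value), len(covered), cov_mean, impact))
    # Largest (most positive) impact first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


for (attr, value), cover, m, imp in impact_rules(records, "profit"):
    print(f"{attr}={value} -> profit: cover={cover}, mean={m:.2f}, impact={imp:+.2f}")
```

On this toy table, region = north ranks first with impact +14.00 and region = south last with -14.00. The impacts for each attribute sum to zero because every record falls under exactly one of its values.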

Publications

Efficiently Identifying Exploratory Rules’ Significance.
Huang, S., & Webb, G. I.
LNAI State-of-the-Art Survey series, ‘Data Mining: Theory, Methodology, Techniques, and Applications’, Berlin/Heidelberg, pp. 64-77, 2006.

@InProceedings{HuangWebb05b,
Title = {Efficiently Identifying Exploratory Rules' Significance},
Author = {S. Huang and G.I. Webb},
Booktitle = {LNAI State-of-the-Art Survey series, 'Data Mining: Theory, Methodology, Techniques, and Applications'},
Year = {2006},
Address = {Berlin/Heidelberg},
Note = {An earlier version of this paper was published in S.J. Simoff and G.J. Williams (Eds.), Proceedings of the Third Australasian Data Mining Conference (AusDM04), Cairns, Australia. Sydney: University of Technology, pages 169-182.},
Pages = {64-77},
Publisher = {Springer},
Abstract = {How to efficiently discard potentially uninteresting rules in exploratory rule discovery is one of the important research foci in data mining. Many researchers have presented algorithms to automatically remove potentially uninteresting rules utilizing background knowledge and user-specified constraints. Identifying the significance of exploratory rules using a significance test is desirable for removing rules that may appear interesting by chance, hence providing the users with a more compact set of resulting rules. However, applying statistical tests to identify significant rules requires considerable computation and data access in order to obtain the necessary statistics. The situation gets worse as the size of the database increases. In this paper, we propose two approaches for improving the efficiency of significant exploratory rule discovery. We also evaluate the experimental effect in impact rule discovery which is suitable for discovering exploratory rules in very large, dense databases.},
Doi = {10.1007/11677437_6},
Keywords = {Association Rule Discovery and statistically sound discovery and OPUS and Impact Rules},
Related = {statistically-sound-association-discovery}
}
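The statistics that a significance test on an impact rule needs can be gathered cheaply. The sketch below is a generic illustration, not either of the two approaches proposed in the paper. It keeps only the count, sum and sum of squares of the target (enough to recover the mean and variance), obtains the statistics for uncovered records by subtraction instead of a second pass, and compares covered with uncovered records using a normal approximation to Welch's two-sample test, which is only reasonable when both groups are large. The names `Stats`, `welch_p_value` and `is_significant` and the default alpha = 0.05 are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class Stats:
    """Sufficient statistics of a numeric target: count, sum and sum of squares."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    def add(self, x: float) -> None:
        self.n += 1
        self.s += x
        self.ss += x * x

    def __sub__(self, other: "Stats") -> "Stats":
        # The statistics are additive, so "all records minus covered records"
        # describes the uncovered records without another pass over the data.
        return Stats(self.n - other.n, self.s - other.s, self.ss - other.ss)

    def mean(self) -> float:
        return self.s / self.n

    def var(self) -> float:
        # Sample variance from running sums; simple, but less numerically stable
        # than Welford's method, hence the clamp at zero.
        return max(0.0, (self.ss - self.s * self.s / self.n) / (self.n - 1))


def welch_p_value(a: Stats, b: Stats) -> float:
    """Two-sided p-value for a difference in means (normal approximation to Welch's t)."""
    se = sqrt(a.var() / a.n + b.var() / b.n)
    if se == 0:
        return 1.0 if a.mean() == b.mean() else 0.0
    z = (a.mean() - b.mean()) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(rows, covers, target, alpha=0.05):
    """One pass over the data: does the rule's target mean differ from the rest?"""
    total, covered = Stats(), Stats()
    for r in rows:
        total.add(r[target])
        if covers(r):
            covered.add(r[target])
    return welch_p_value(covered, total - covered) < alpha


# Invented data: records with "even" set have a target about 10 units higher.
rows = [{"even": i % 2 == 0, "y": (10.0 if i % 2 == 0 else 0.0) + i % 5} for i in range(200)]
print(is_significant(rows, lambda r: r["even"], "y"))
```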

Pruning Derivative Partial Rules During Impact Rule Discovery.
Huang, S., & Webb, G. I.
Lecture Notes in Computer Science Vol. 3518: Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2005), Berlin/Heidelberg, pp. 71-80, 2005.

@InProceedings{HuangWebb05a,
Title = {Pruning Derivative Partial Rules During Impact Rule Discovery},
Author = {S. Huang and G.I. Webb},
Booktitle = {Lecture Notes in Computer Science Vol. 3518: Proceedings of the 9th {Pacific}-{Asia} Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2005)},
Year = {2005},
Address = {Berlin/Heidelberg},
Editor = {T.B. Ho and D. Cheung and H. Liu},
Pages = {71-80},
Publisher = {Springer},
Abstract = {Because exploratory rule discovery works with data that is only a sample of the phenomena to be investigated, some resulting rules may appear interesting only by chance. Techniques have been developed for automatically discarding statistically insignificant exploratory rules that cannot survive a hypothesis test with regard to their ancestors. We call such insignificant rules derivative extended rules. In this paper, we argue that there is another type of derivative exploratory rule, which is derivative with regard to its children. We also argue that a considerable number of such derivative partial rules cannot be successfully removed using existing rule pruning techniques. We propose a new technique to address this problem. Experiments are done in impact rule discovery to evaluate the effect of this derivative partial rule filter. Results show that the inherent problem of too many resulting rules in exploratory rule discovery is alleviated.},
Keywords = {Association Rule Discovery and statistically sound discovery and OPUS and Impact Rules},
Location = {Hanoi, Vietnam},
Related = {impact-rules}
}
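One way to picture a derivative partial rule: a general rule `a -> y` may look interesting only because its specialisation `a AND b -> y` is. The hypothetical check below sets aside the records covered by the child rule and then tests whether the rest of the parent's records still differ from the records outside the parent. This is an illustrative reading of the idea under stated assumptions (a normal approximation to Welch's test, alpha = 0.05, invented function names and data), not the filter defined in the paper.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_p(xs, ys):
    """Two-sided p-value, normal approximation to Welch's t-test (large samples only)."""
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    if se == 0:
        return 1.0 if mean(xs) == mean(ys) else 0.0
    return 2 * (1 - NormalDist().cdf(abs(mean(xs) - mean(ys)) / se))


def covers(row, conditions):
    """True when the row satisfies every (attribute, value) condition."""
    return all(row[a] == v for a, v in conditions)


def is_derivative_partial(rows, parent, child_extra, target, alpha=0.05):
    """Hypothetical filter: does the parent's effect vanish once its child's records are removed?"""
    child = parent + [child_extra]
    remainder = [r[target] for r in rows if covers(r, parent) and not covers(r, child)]
    outside = [r[target] for r in rows if not covers(r, parent)]
    # If what is left of the parent is indistinguishable from the background,
    # treat the parent's apparent effect as derived from the child rule.
    return welch_p(remainder, outside) >= alpha


# Invented data: y is raised only where a AND b hold, so "a -> y" looks
# interesting solely because of its specialisation "a AND b -> y".
rows = [
    {"a": i % 2 == 0, "b": i % 3 == 0, "y": (10.0 if i % 6 == 0 else 0.0) + i % 5}
    for i in range(600)
]
print(is_derivative_partial(rows, [("a", True)], ("b", True), "y"))  # prints True
```

If instead `a` alone raised `y`, the remainder would still differ from the background and the parent rule would be kept.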

Discarding Insignificant Rules During Impact Rule Discovery in Large, Dense Databases.
Huang, S., & Webb, G. I.
Proceedings of the Fifth SIAM International Conference on Data Mining (SDM’05) [short paper], Philadelphia, PA, pp. 541-545, 2005.

@InProceedings{HuangWebb05,
Title = {Discarding Insignificant Rules During Impact Rule Discovery in Large, Dense Databases},
Author = {S. Huang and G.I. Webb},
Booktitle = {Proceedings of the Fifth {SIAM} International Conference on Data Mining ({SDM}'05) [short paper]},
Year = {2005},
Address = {Philadelphia, PA},
Editor = {H. Kargupta and C. Kamath and J. Srivastava and A. Goodman},
Pages = {541-545},
Publisher = {Society for Industrial and Applied Mathematics},
Abstract = {Considerable progress has been made on how to reduce the number of spurious exploratory rules with quantitative attributes. However, little has been done for rules with undiscretized quantitative attributes. It is argued that propositional rules cannot effectively describe the interactions between quantitative and qualitative attributes. Aumann and Lindell proposed quantitative association rules to provide a better description of such relationships, together with a rule pruning technique. Since their technique is based on the frequent itemset framework, it is not suitable for rule discovery in large, dense databases. In this paper, an efficient technique for automatically discarding insignificant rules during rule discovery is proposed, based on the OPUS search algorithm. Experiments demonstrate that the algorithm we propose can efficiently remove potentially uninteresting rules even in very large, dense databases.},
Keywords = {Association Rule Discovery and statistically sound discovery and OPUS and Impact Rules},
Location = {Newport Beach, CA},
Related = {impact-rules}
}
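A common form of significance-based pruning compares a rule with its more general relatives. The sketch below applies one such filter under assumed details: an antecedent is kept only if the target mean on its records differs significantly (normal approximation to Welch's test, alpha = 0.05) from the records that each immediate generalisation covers but the rule itself does not. It enumerates antecedents exhaustively with `itertools.combinations`, so it shows only a filtering criterion, not the OPUS search that lets the paper's approach scale to large, dense databases. The function names, thresholds and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_p(xs, ys):
    """Two-sided p-value, normal approximation to Welch's t-test (large samples only)."""
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    if se == 0:
        return 1.0 if mean(xs) == mean(ys) else 0.0
    return 2 * (1 - NormalDist().cdf(abs(mean(xs) - mean(ys)) / se))


def cover(rows, antecedent):
    """Indices of rows satisfying every (attr, value) condition; all rows if empty."""
    return {i for i, r in enumerate(rows) if all(r[a] == v for a, v in antecedent)}


def significant_rules(rows, conditions, target, max_len=2, min_cover=10, alpha=0.05):
    """Keep an antecedent only if it differs significantly from every immediate generalisation."""
    kept = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(conditions, k):
            inside = cover(rows, antecedent)
            if len(inside) < min_cover:
                continue
            xs = [rows[i][target] for i in inside]
            productive = True
            for dropped in antecedent:
                general = [c for c in antecedent if c != dropped]
                # Compare with records the generalisation covers but this rule does not.
                ys = [rows[i][target] for i in cover(rows, general) - inside]
                if len(ys) < 2 or welch_p(xs, ys) >= alpha:
                    productive = False
                    break
            if productive:
                kept.append((antecedent, len(inside), mean(xs)))
    return kept


# Invented data: y depends on a only; b and the conjunction a AND b add nothing.
rows = [
    {"a": i % 2 == 0, "b": i % 3 == 0, "y": (10.0 if i % 2 == 0 else 0.0) + i % 5}
    for i in range(600)
]
for antecedent, n, m in significant_rules(rows, [("a", True), ("b", True)], "y"):
    print(antecedent, n, m)
```

Here `a AND b -> y` is discarded because its records look just like the other records covered by `a`, which is the kind of redundancy such filters target.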

Discovering Associations with Numeric Variables.
Webb, G. I.
Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001) [short paper], New York, pp. 383-388, 2001.

@InProceedings{Webb01a,
Title = {Discovering Associations with Numeric Variables},
Author = {G. I. Webb},
Booktitle = {Proceedings of the Seventh {ACM} {SIGKDD} International Conference on Knowledge Discovery and Data Mining (KDD-2001) [short paper]},
Year = {2001},
Address = {New York},
Editor = {F. Provost and R. Srikant},
Pages = {383-388},
Publisher = {The Association for Computing Machinery},
Abstract = {This paper further develops Aumann and Lindell's [3] proposal for a variant of association rules for which the consequent is a numeric variable. It is argued that these rules can discover useful interactions with numeric data that cannot be discovered directly using traditional association rules with discretization. Alternative measures for identifying interesting rules are proposed. Efficient algorithms are presented that enable these rules to be discovered for dense data sets for which application of Aumann and Lindell's algorithm is infeasible.},
Keywords = {Impact Rules and OPUS and Association Rule Discovery},
Location = {San Francisco, CA},
Related = {impact-rules},
Url = {http://dl.acm.org/authorize?19861}
}
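The abstract reports efficient algorithms for dense data. The sketch below does not reproduce them, but it shows why bound-based search helps. It is a simplified depth-first branch-and-bound, loosely in the spirit of OPUS, that finds the k antecedents with the highest impact, taken here (as an assumption) to be coverage times (covered mean minus overall mean). Because any specialisation covers a subset of its parent's records, its impact cannot exceed the sum of the positive deviations on the parent's records, so a branch whose bound does not beat the current k-th best is pruned. The function name, data and parameters are hypothetical, and OPUS's reordering of the search space and its other pruning rules are omitted.

```python
import heapq
from itertools import count
from statistics import mean


def top_k_impact(rows, conditions, target, k=3, max_len=3):
    """Depth-first branch-and-bound search for the k antecedents with the highest impact."""
    mu = mean(r[target] for r in rows)
    best = []      # min-heap of (impact, tie_breaker, antecedent)
    tie = count()  # keeps heap entries comparable when impacts are equal

    def search(antecedent, idx, start):
        for j in range(start, len(conditions)):
            attr, value = conditions[j]
            sub = [i for i in idx if rows[i][attr] == value]
            if not sub:
                continue
            new = antecedent + (conditions[j],)
            impact = sum(rows[i][target] - mu for i in sub)
            if len(best) < k:
                heapq.heappush(best, (impact, next(tie), new))
            elif impact > best[0][0]:
                heapq.heapreplace(best, (impact, next(tie), new))
            # Optimistic bound: a specialisation covers a subset of `sub`, so its
            # impact is at most the sum of the positive deviations in `sub`.
            bound = sum(max(rows[i][target] - mu, 0.0) for i in sub)
            if len(new) < max_len and (len(best) < k or bound > best[0][0]):
                search(new, sub, j + 1)

    search((), list(range(len(rows))), 0)
    return [(impact, ante) for impact, _, ante in sorted(best, reverse=True)]


# Invented example data.
records = [
    {"region": "north", "channel": "web", "profit": 12.0},
    {"region": "north", "channel": "store", "profit": 8.0},
    {"region": "south", "channel": "web", "profit": 3.0},
    {"region": "south", "channel": "store", "profit": 1.0},
    {"region": "north", "channel": "web", "profit": 14.0},
    {"region": "south", "channel": "web", "profit": 2.0},
]
conditions = [("region", "north"), ("region", "south"), ("channel", "web"), ("channel", "store")]
for impact, antecedent in top_k_impact(records, conditions, "profit", k=3):
    print(f"{impact:+.2f}", antecedent)
```

On this table the search never expands region = south, because none of its records lie above the overall mean, so its bound is zero.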