Data Scientist


Filtered Top-k Association Discovery

I have pioneered association discovery techniques that seek the k most useful associations within constraints imposed by user-specified filters, rather than applying the minimum-support constraint more commonly used in association discovery.  Many of these techniques are included in my Magnum Opus association discovery software, which is now a core component of BigML.
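As a rough illustration of the idea, the following sketch ranks item pairs by an interestingness measure and keeps the k best that pass a user-supplied filter, with no minimum-support threshold involved. It is a brute-force toy, not the search used in Magnum Opus or OPUSMiner. The choice of leverage as the measure, the restriction to pairs, and the names top_k_pairs and passes_filter are all assumptions made for this example.

```python
from itertools import combinations
import heapq


def top_k_pairs(transactions, k, passes_filter=lambda a, b, lev: True):
    """Return up to k (leverage, item_a, item_b) tuples, best first.

    Illustrative brute force only: every pair of items is scored, pairs
    rejected by the (hypothetical) passes_filter callback are dropped,
    and the k highest-leverage survivors are returned.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    # Relative frequency of each single item.
    sup = {i: sum(1 for t in transactions if i in t) / n for i in items}

    scored = []
    for a, b in combinations(items, 2):
        joint = sum(1 for t in transactions if a in t and b in t) / n
        # Leverage: observed joint frequency minus the frequency
        # expected if the two items were independent.
        lev = joint - sup[a] * sup[b]
        if passes_filter(a, b, lev):
            scored.append((lev, a, b))
    return heapq.nlargest(k, scored)


transactions = [
    {"a", "b"}, {"a", "b"}, {"a", "b", "c"},
    {"c"}, {"c", "d"}, {"d"},
]
best = top_k_pairs(transactions, k=3)
for lev, a, b in best:
    print(f"{a},{b}: leverage={lev:.3f}")
```

The user controls the size of the result through k and the filter, rather than by guessing a support threshold in advance.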

I argue that minimum-support constraints can lead to poor results because minimum support often has little relevance to how interesting an association is.  It is not feasible to set it low enough to capture all potentially interesting associations, nor is it possible to set it high enough to remove uninteresting associations.

I have pioneered statistically sound filters that can apply any of the many standard statistical hypothesis tests to filter out associations that are unlikely to be interesting.  These techniques have proved very effective in practice.
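A minimal sketch of one such filter follows, assuming a one-sided Fisher exact test on the 2x2 contingency table of two items. The test chosen, the function names fisher_exact_greater and passes_test, and the threshold alpha are assumptions for illustration and are not taken from the software described here. In particular, the sketch tests a single association in isolation and makes no adjustment for the number of associations examined.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = transactions containing both items, b = first item only,
    c = second item only, d = neither. Returns the probability of
    seeing a co-occurrence count of at least a under independence,
    with the table margins held fixed (hypergeometric upper tail).
    """
    n = a + b + c + d
    row1 = a + b          # transactions containing the first item
    col1 = a + c          # transactions containing the second item
    upper = min(row1, col1)
    tail = sum(
        comb(row1, x) * comb(n - row1, col1 - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(n, col1)


def passes_test(a, b, c, d, alpha=0.05):
    """Keep an association only if its p-value is at most alpha (hypothetical threshold)."""
    return fisher_exact_greater(a, b, c, d) <= alpha


p_strong = fisher_exact_greater(3, 0, 0, 3)   # items always occur together
p_none = fisher_exact_greater(1, 1, 1, 1)     # no evidence of association
print(p_strong, p_none)
```

A filter of this kind could be plugged into a top-k search as the acceptance test, so that only associations passing the hypothesis test compete for the k places.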

The OPUSMiner filtered-top-k itemset discovery software can be downloaded here.

Publications