Data Scientist

Association discovery

Association discovery includes association mining, pattern mining, association rule discovery, subgroup discovery, emerging pattern discovery and contrast discovery.

I have pioneered association discovery techniques that seek the most useful associations, rather than applying the minimum-support constraint more commonly used in the field. Many of these techniques are included in my Magnum Opus software, which is now incorporated in BigML. Magnum Opus has been widely used in scientific research.
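To make the contrast with minimum-support filtering concrete, here is a minimal sketch that ranks item pairs by an interestingness measure and keeps the top k, with no support threshold involved. It is an illustration only and not the Magnum Opus algorithm: the choice of leverage as the measure, the function names, and the toy transactions are all assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def top_k_pairs_by_leverage(transactions, k):
    """Return the k item pairs with the highest leverage (hypothetical helper).

    leverage(a, b) = supp({a, b}) - supp({a}) * supp({b}),
    i.e. how much more often the pair co-occurs than independence predicts.
    """
    items = sorted(set().union(*transactions))
    scored = []
    for a, b in combinations(items, 2):
        expected = support({a}, transactions) * support({b}, transactions)
        scored.append((support({a, b}, transactions) - expected, (a, b)))
    # Highest leverage first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Toy data, invented for illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
    {"milk", "tea"},
    {"tea"},
]
top = top_k_pairs_by_leverage(transactions, 2)
for leverage, pair in top:
    print(pair, round(leverage, 3))
```

The user specifies only how many associations are wanted, so a pair is kept because it ranks highly on the chosen measure and not because it clears a frequency cut-off. This brute-force enumeration is only practical for tiny examples.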

Because association discovery considers very large numbers of patterns, most techniques suffer a high risk of type-1 error, that is, of finding patterns that appear interesting only due to chance artifacts of the process by which the sample data were generated. Most attempts to control this risk have done so at the cost of a high risk of type-2 error, that is, of falsely rejecting non-spurious patterns. I have pioneered strategies for statistically sound pattern discovery, which strictly control type-1 error during association discovery without the level of type-2 error risk suffered by previous approaches.
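As a rough illustration of what controlling type-1 error across many candidate patterns can look like, the sketch below tests each candidate association with a one-sided Fisher exact test and applies a Bonferroni-style adjustment for the number of patterns considered. This is a generic textbook correction used purely as an example; it is not necessarily the strategy implemented in OPUSMiner or Skopus. The function names, contingency tables, and search-space size are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Gives the probability of seeing a or more co-occurrences when the
    row and column totals are fixed and the two items are independent.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    tail = sum(
        comb(row1, x) * comb(n - row1, col1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return tail / total


def significant_patterns(tables, search_space_size, alpha=0.05):
    """Keep patterns whose p-value passes a Bonferroni-adjusted threshold.

    search_space_size is the number of patterns that were (or could have
    been) examined, which is an assumption the caller must supply.
    """
    threshold = alpha / search_space_size
    return [
        name for name, table in tables.items()
        if fisher_exact_greater(*table) <= threshold
    ]


# Hypothetical contingency tables (a, b, c, d) for two candidate patterns.
tables = {
    "strong": (20, 0, 0, 20),  # items always occur together
    "weak": (6, 4, 4, 6),      # mild co-occurrence, easily due to chance
}
accepted = significant_patterns(tables, search_space_size=1000)
print(accepted)
```

Dividing the significance level by the size of the search space bounds the chance of accepting any spurious pattern, but the threshold becomes very strict as the search space grows, which is exactly the type-2 error cost described above.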

The OPUSMiner statistically sound itemset discovery software can be downloaded here. An R package is available here.

The Skopus statistically sound sequential pattern discovery software can be downloaded here.

ACM SIGKDD 2014 Tutorial on Statistically Sound Pattern Discovery, with Wilhelmiina Hämäläinen.