Data Scientist

Statistically sound association discovery

Association discovery includes association rule discovery, k-optimal rule discovery, emerging pattern discovery and contrast discovery. These methods explore large pattern spaces to identify all patterns that satisfy user-specified criteria with respect to given data. Because so many patterns are considered, they typically suffer a high risk of type-1 error, that is, of finding patterns that appear interesting only because of chance artifacts of the process by which the sample data were generated. Most attempts to control this risk have done so at the cost of a high risk of type-2 error, that is, of falsely rejecting non-spurious patterns. I have developed strategies that strictly control type-1 error during association discovery without incurring the high risk of type-2 error suffered by previous approaches. Many of these techniques are included in my Magnum Opus association discovery software, which is now a core component of BigML.
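The multiple-testing problem described above can be illustrated with a small simulation. The sketch below generates a dataset in which every item is independent, so any "significant" pairwise association is by construction a type-1 error. It tests every item pair with a one-sided Fisher exact test and compares an uncorrected significance level against a plain Bonferroni correction (alpha divided by the number of tests). Bonferroni is a standard, deliberately conservative technique chosen here only for illustration; it is not the method used in Magnum Opus, OPUSMiner or Skopus, and its conservativeness is exactly what drives the type-2 error cost mentioned above. The function names and default parameters are hypothetical.

```python
import math
import random


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for positive association in a 2x2 table.

    a = records containing both X and Y, b = X only, c = Y only, d = neither.
    Returns P(both-count >= a) under the hypergeometric null with fixed margins.
    """
    n_x = a + b          # records containing X
    n_y = a + c          # records containing Y
    n = a + b + c + d
    denom = math.comb(n, n_y)
    upper = min(n_x, n_y)
    tail = sum(math.comb(n_x, k) * math.comb(n - n_x, n_y - k)
               for k in range(a, upper + 1))
    return tail / denom


def count_false_discoveries(n_records=200, n_items=30, p_item=0.3,
                            alpha=0.05, seed=0):
    """Hypothetical demo: test all item pairs in data with NO real associations.

    Returns (number of tests, uncorrected discoveries, Bonferroni discoveries).
    Every discovery is a false positive because items are generated independently.
    """
    rng = random.Random(seed)
    data = [[rng.random() < p_item for _ in range(n_items)]
            for _ in range(n_records)]
    pvalues = []
    for i in range(n_items):
        for j in range(i + 1, n_items):
            a = sum(1 for r in data if r[i] and r[j])
            b = sum(1 for r in data if r[i] and not r[j])
            c = sum(1 for r in data if r[j] and not r[i])
            d = n_records - a - b - c
            pvalues.append(fisher_exact_greater(a, b, c, d))
    m = len(pvalues)
    uncorrected = sum(1 for p in pvalues if p <= alpha)
    bonferroni = sum(1 for p in pvalues if p <= alpha / m)
    return m, uncorrected, bonferroni


if __name__ == "__main__":
    m, unc, bon = count_false_discoveries()
    print(f"{m} tests: {unc} uncorrected false discoveries, {bon} after Bonferroni")
```

With 30 independent items there are 435 candidate pairs, so an uncorrected test at alpha = 0.05 can be expected to accept a handful of them (up to about 5%) purely by chance, whereas the Bonferroni threshold makes even one false discovery unlikely. The price is a much stricter threshold for genuine associations too, which is the type-2 error trade-off the paragraph above describes.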

The OPUSMiner statistically sound itemset discovery software can be downloaded here.

The Skopus statistically sound sequential pattern discovery software can be downloaded here.

ACM SIGKDD 2014 Tutorial, with Wilhelmiina Hämäläinen.