Data Scientist


Author: Geoff Webb

Finding real associations with R

Our OPUS Miner package is now available in R. It finds statistically significant complex interactions in data. We would value your feedback. It can be downloaded from https://cran.r-project.org/package=opusminer.

Elected to ACM SIGKDD Board of Directors

I am honoured to have been elected to the Board of the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). I am looking forward to working with …

Encyclopedia of Machine Learning and Data Mining

We are delighted to have the second edition of the highly successful Encyclopedia of Machine Learning go live. The revised and expanded second edition has been re-titled the Encyclopedia of …

Usama Fayyad and my keynote addresses at Practical Big Data 2017

My talk starts at 1:11:00. Posted by Geoff Webb on Thursday, February 2, 2017.

Encyclopedia of Machine Learning still most downloaded Springer Reference

It is good to see that the first edition of our Encyclopedia of Machine Learning is still serving the community. #SpringerRefCountdown! #1 most downloaded entry last month: https://t.co/hZ2j7FJQmN from our …

Two awards in one week!

I am honoured to have received the Australian Computer Society’s ICT Researcher of the Year Award and the Australasian Artificial Intelligence Distinguished Research Contributions Award.

Fun KDD video

https://www.youtube.com/watch?v=FBlhhebFhTI

CfP ECMLPKDD Workshop on Statistically Sound Data Mining, due July 15

Join us in Riva del Garda on Monday, September 19, 2016 for the Second ECMLPKDD Workshop on Statistically Sound Data Mining. The proceedings will be published in the JMLR: Workshop and …

Statistical testing of hypothesis streams and cascades

Statistical hypothesis testing was developed in an age when calculations were performed by hand and individual hypotheses were considered in isolation. In the modern information era, it is often desirable …

Did you realise that it is dangerous to assume data are interval scale?

Many machine learning algorithms make an implicit assumption that numeric data are interval scale, specifically, that a unit difference between values has the same meaning irrespective of the magnitude of …
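As a rough illustration of why the interval-scale assumption matters, here is a minimal Python sketch. It is not taken from the post: the numbers, the `nearest` helper and the choice of a base-10 log rescaling are all assumptions made for this example. It shows that a distance-based notion of "nearest" can change when the same values are expressed on a different, equally order-preserving scale.

```python
import math

def nearest(query, candidates, transform=lambda v: v):
    """Return the candidate closest to `query` after applying `transform`.

    `transform` is a hypothetical rescaling of the raw values; the identity
    treats the raw numbers as interval scale.
    """
    return min(candidates, key=lambda c: abs(transform(c) - transform(query)))

# Illustrative values only (not from the post).
candidates = [1.0, 100.0]
query = 40.0

# On the raw scale, 40 is 39 units from 1 and 60 units from 100.
raw_neighbour = nearest(query, candidates)

# On a log10 scale, 40 maps to about 1.60, which is closer to 2 (100) than to 0 (1).
log_neighbour = nearest(query, candidates, transform=math.log10)

print(raw_neighbour, log_neighbour)
```

Both scales preserve the ordering of the values, yet the nearest neighbour differs. Any method that relies on differences between values is therefore implicitly committing to one particular scale.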

Variety in recent publications

I have had a nice variety of recent papers! Concept drift: Characterizing Concept Drift. Webb, G. I., Hyde, R., Cao, H., Nguyen, H. L., & Petitjean, F. Data Mining and Knowledge …