About Geoff Webb

Data Scientist

The Australian Research Council has funded the following two projects:

Electronic skin nanopatches for continuous blood pressure monitoring
Investigators:
Prof Wenlong Cheng (Chief Investigator)
Prof Andrew Tonkin (Chief Investigator)
A/Prof Bing Wang (Chief Investigator)
Prof Geoffrey Webb (Chief Investigator)
Dr Stephen Wang (Chief Investigator)
Prof David Kaye (Partner Investigator)
Dr Yijia Li (Partner Investigator)
Mr Paul Carboon (Partner Investigator)
Summary: This project aims to develop soft, thin, wearable and non-invasive heart health monitors that monitor blood pressure continuously, anytime and anywhere, using an electronic skin technology platform with the world’s thinnest gold nanowires. Nanotechnologists, electrical engineers, clinicians, information technologists and industrial designers will collaborate to develop blood pressure correlation algorithms and evaluate sensing performance. New knowledge and commercial technologies will help make Australia's medical technology industry a competitive global leader in wearable technology.
Funding: $380,000

Legal and social dynamics of ebook lending in Australia’s public libraries
Investigators:
Dr Rebecca Giblin (Chief Investigator)
A/Prof Kimberlee Weatherall (Chief Investigator)
Prof Julian Thomas (Chief Investigator)
Prof Geoffrey Webb (Chief Investigator)
Summary: This project aims to develop an evidence base of quantitative and qualitative data about how eBooks are used in libraries. EBooks have tremendous beneficial potential, particularly for Australians in remote areas and those with impaired mobility or vision. However, libraries’ rights to acquire and lend them are more restricted than for physical books. Libraries and legal, social and data science researchers will investigate eBook lending practices and understand their social impacts. The project will identify ways of reforming policy, law, and practice to help libraries fulfil their public interest missions. This project is expected to enable libraries to extract more value from existing public investments.
Funding: $252,000

Since 1999, Magnum Opus has been a leading data mining tool making association discovery better and faster for everyone.

BigML is the best platform for Machine Learning on the internet.

G.I. Webb & Associates are excited to be partnering with BigML to provide the best tools and environment for association discovery.

As a result, G.I. Webb & Associates are no longer offering new Magnum Opus licenses or downloads. We will continue supporting our licensees as usual.

BigML.com

 

We are delighted to receive the SDM15 Best Research Paper Honorable Mention award.

The Awards Committee of the Society for Industrial and Applied Mathematics (SIAM) International Conference on Data Mining (SDM15) selected 4 papers for awards from nearly 400 submissions.

View the presentation here.

And here is a link to the paper and its bibliographic details:

    Petitjean, F., & Webb, G. I. (2015). Scaling log-linear analysis to datasets with thousands of variables. Proceedings of the 2015 SIAM International Conference on Data Mining, pp. 469-477.

    @InProceedings{PetitjeanWebb15,
    Title = {Scaling log-linear analysis to datasets with thousands of variables},
    Author = {F. Petitjean and G. I. Webb},
    Booktitle = {Proceedings of the 2015 {SIAM} International Conference on Data Mining},
    Year = {2015},
    Pages = {469--477},
    Abstract = {Association discovery is a fundamental data mining task. The primary statistical approach to association discovery between variables is log-linear analysis. Classical approaches to log-linear analysis do not scale beyond about ten variables. We have recently shown that, if we ensure that the graph supporting the log-linear model is chordal, log-linear analysis can be applied to datasets with hundreds of variables without sacrificing the statistical soundness [21]. However, further scalability remained limited, because state-of-the-art techniques have to examine every edge at every step of the search. This paper makes the following contributions: 1) we prove that only a very small subset of edges has to be considered at each step of the search; 2) we demonstrate how to efficiently find this subset of edges and 3) we show how to efficiently keep track of the best edges to be subsequently added to the initial model. Our experiments, carried out on real datasets with up to 2000 variables, show that our contributions make it possible to gain about 4 orders of magnitude, making log-linear analysis of datasets with thousands of variables possible in seconds instead of days.},
    Comment = {Best Research Paper Honorable Mention Award},
    Keywords = {Association Rule Discovery and statistically sound discovery and scalable graphical models and Learning from large datasets and DP140100087},
    Related = {scalable-graphical-modeling},
    Url = {http://epubs.siam.org/doi/pdf/10.1137/1.9781611974010.53}
    }

View our panel on Video Lectures:


A Data Scientist’s Guide to Making Money from Start-ups
Geoff Webb, Foster Provost, Ron Bekkerman, Oren Etzioni, Usama Fayyad, Claudia Perlich

We also wrote a paper based on the panel discussion:

    Provost, F., Webb, G. I., Bekkerman, R., Etzioni, O., Fayyad, U., & Perlich, C. (2014). A Data Scientist's Guide to Start-Ups. Big Data, 2(3), 117-128.

    @Article{ProvostEtAl14,
    Title = {A Data Scientist's Guide to Start-Ups},
    Author = {F. Provost and G. I. Webb and R. Bekkerman and O. Etzioni and U. Fayyad and C. Perlich},
    Journal = {Big Data},
    Year = {2014},
    Number = {3},
    Pages = {117--128},
    Volume = {2},
    Abstract = {In August 2013, we held a panel discussion at the KDD 2013 conference in Chicago on the subject of data science, data scientists, and start-ups. KDD is the premier conference on data science research and practice. The panel discussed the pros and cons for top-notch data scientists of the hot data science start-up scene. In this article, we first present background on our panelists. Our four panelists have unquestionable pedigrees in data science and substantial experience with start-ups from multiple perspectives (founders, employees, chief scientists, venture capitalists). For the casual reader, we next present a brief summary of the experts' opinions on eight of the issues the panel discussed. The rest of the article presents a lightly edited transcription of the entire panel discussion.},
    Keywords = {Big Data},
    Url = {http://dx.doi.org/10.1089/big.2014.0031}
    }