Geoff Webb: Data Science Software

Magnum Opus is commercial association discovery software that implements many of my association discovery techniques. It is now a core component in BigML.

OPUS Miner is an open source implementation of the OPUS Miner algorithm, which applies OPUS search for Filtered Top-k Association Discovery of Self-Sufficient Itemsets. An R package is available here.

An implementation of impact rules can be downloaded here.

Chordalysis implements our approaches to scalable learning of graphical models.

We have contributed numerous components to the Weka machine learning workbench. These include:

AnDE: averaged n-dependence estimators, an efficient technique for relaxing the attribute-independence assumption of naive Bayes. [papers]
AODE: averaged one-dependence estimators, AnDE with n=1. [papers]
AODEsr: AODE with subsumption resolution. [papers]
BVDecomposeSegCVSub: Bias-variance decomposition using the sub-sampled cross-validation procedure. [paper]
J48Graft: adds grafting to J48. [papers]
LBR: lazy Bayesian rules, a lazy learning approach to lessening the attribute-independence assumption of naive Bayes. [papers]
MultiBoostAB: an ensemble learning technique that combines boosting and bagging, attaining much of the former's superior bias reduction together with much of the latter's superior variance reduction. [papers]
PKIDiscretize: proportional k-interval discretization, a discretization technique for naive Bayes. [papers]
WANBIA: a system that uses naive Bayes to precondition logistic regression. [papers]
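To make the AODE entry in the list above concrete, here is a minimal pure-Python sketch of the idea for categorical attributes. It is not the Weka component: the class name `TinyAODE`, the parameter `min_parent_count`, the Laplace smoothing, and the fallback to class priors when no parent qualifies are all assumptions made for illustration. Each attribute value in the instance takes a turn as a "super-parent", and the joint estimates from the resulting one-dependence models are summed.

```python
from collections import Counter


class TinyAODE:
    """Illustrative averaged one-dependence estimator (hypothetical helper)."""

    def __init__(self, min_parent_count=1):
        # A value acts as a parent only if seen at least this often in training.
        self.m = min_parent_count

    def fit(self, X, y):
        self.n = len(X)
        self.classes = sorted(set(y))
        # Number of distinct values per attribute, used for Laplace smoothing.
        self.card = [len({row[j] for row in X}) for j in range(len(X[0]))]
        self.class_count = Counter(y)
        self.value_count = Counter()   # key: (i, x_i)
        self.parent_count = Counter()  # key: (c, i, x_i)
        self.pair_count = Counter()    # key: (c, i, x_i, j, x_j)
        for row, c in zip(X, y):
            for i, xi in enumerate(row):
                self.value_count[(i, xi)] += 1
                self.parent_count[(c, i, xi)] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.pair_count[(c, i, xi, j, xj)] += 1
        return self

    def predict_proba(self, row):
        scores = {}
        for c in self.classes:
            total = 0.0
            for i, xi in enumerate(row):
                if self.value_count[(i, xi)] < self.m:
                    continue  # too rare to serve as a parent
                pc = self.parent_count[(c, i, xi)]
                # Smoothed estimate of P(y=c, x_i).
                term = (pc + 1) / (self.n + len(self.classes) * self.card[i])
                # Multiply by smoothed P(x_j | y=c, x_i) for every other attribute.
                for j, xj in enumerate(row):
                    if j != i:
                        pair = self.pair_count[(c, i, xi, j, xj)]
                        term *= (pair + 1) / (pc + self.card[j])
                total += term
            # Dividing by the number of parents is the same for every class,
            # so the normalisation below absorbs it.
            scores[c] = total
        norm = sum(scores.values())
        if norm == 0.0:
            # Assumption of this sketch: with no usable parent, use class priors.
            scores = {c: self.class_count[c] + 1 for c in self.classes}
            norm = sum(scores.values())
        return {c: s / norm for c, s in scores.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# XOR-style data: each attribute alone is uninformative about the class,
# but conditioning on one attribute makes the other decisive.
X_demo = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
y_demo = [a ^ b for a, b in X_demo]
aode_demo = TinyAODE().fit(X_demo, y_demo)
```

The XOR-style data is chosen because the class depends on an interaction between two attributes, which a model assuming full attribute independence cannot represent, while conditioning on a single parent attribute is enough to capture it.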

SKDB is an open source C++ implementation of Selective KDB. A refinement that uses Hierarchical Dirichlet Processes to obtain exceptional predictive accuracy can be downloaded here.

SASANDE is an open source C++ implementation of Sample-based Attribute Selective AnDE.

ALR is an open source implementation of our big models for big data learning algorithm.

EBNC implements our algorithms for Efficient Parameter Learning of Bayesian Networks.

Softmax Logistic Regression for both continuous and discrete data, with TRON, quasi-Newton and conjugate gradient optimisers.
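For readers unfamiliar with the model, the following is a small pure-Python sketch of softmax (multinomial) logistic regression. It deliberately swaps in plain batch gradient descent rather than TRON, quasi-Newton or conjugate gradient, and it is not the downloadable package: the function names `softmax`, `fit_softmax` and `predict_softmax`, the learning rate and the epoch count are all assumptions made for illustration. Discrete attributes are assumed to be one-hot encoded before fitting.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of scores."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def fit_softmax(X, y, n_classes, lr=0.1, epochs=200):
    """Fit weights by batch gradient descent on the mean cross-entropy loss."""
    d = len(X[0])
    n = len(X)
    # One weight vector per class; the last entry of each is the bias.
    W = [[0.0] * (d + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            p = softmax([sum(w * v for w, v in zip(W[k], xb))
                         for k in range(n_classes)])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                for j in range(d + 1):
                    grad[k][j] += err * xb[j] / n
        for k in range(n_classes):
            for j in range(d + 1):
                W[k][j] -= lr * grad[k][j]
    return W


def predict_softmax(W, x):
    """Return the class probabilities for one instance."""
    xb = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])


# A discrete attribute with three values, one-hot encoded; value k implies class k.
X_onehot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]] * 4
y_onehot = [0, 1, 2] * 4
W_demo = fit_softmax(X_onehot, y_onehot, n_classes=3)
```

Second-order and quasi-Newton optimisers typically need far fewer passes over the data than this first-order loop; the sketch only shows the objective being optimised.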

The Knowledge Factory is an expert system development environment that incorporates interactive rule induction. The Knowledge Factory works with you to produce and refine expert systems.

C4.5X is a set of source files that extends C4.5 release 6 to add decision tree grafting.

Our software for generating synthetic data streams with abrupt drift can be downloaded here.

Our system for describing the concept drift present in real-world data can be downloaded here.

Our packages for time series classification include ROCKET, MiniROCKET, MultiROCKET, Barycentric averaging, fast indexing (Matlab version), and fast window size selection. Our Tempo repository contains fast C++ implementations with Python bindings of many time series distance measures and lower bounds, together with the Proximity Forest time series classifier.
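As an illustration of what a time series distance measure and a matching lower bound look like, here is a plain Python sketch of dynamic time warping (DTW) with a warping window, together with the LB_Keogh lower bound. It does not use or reproduce the Tempo API: the function names `dtw` and `lb_keogh` are assumptions of this sketch, both return sums of squared differences (no square root), and `lb_keogh` assumes the two series have equal length.

```python
import math


def dtw(a, b, window):
    """Squared-cost DTW restricted to |i - j| <= window (Sakoe-Chiba band)."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must be wide enough to reach the end
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[m]


def lb_keogh(query, candidate, window):
    """LB_Keogh: cost of the query falling outside the candidate's envelope."""
    total = 0.0
    n = len(candidate)
    for i, q in enumerate(query):
        segment = candidate[max(0, i - window):min(n, i + window + 1)]
        upper, lower = max(segment), min(segment)
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return total


series_a = [0, 1, 2, 3, 2, 1, 0]
series_b = [0, 0, 1, 2, 3, 2, 1]  # series_a delayed by one step
```

In nearest-neighbour search, a cheap bound such as `lb_keogh` can be computed first, and the full `dtw` skipped whenever the bound already exceeds the best distance found so far. With a window of 0 the DTW above reduces to squared Euclidean distance, which is why choosing the window size matters.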

One of my MDS students, Jieshen Huang, implemented impact rules in Python. The source can be found here.