Translational Data Science in Health

I investigate applications of data science to health, from the analysis of medical imaging to diagnostics. My research with the Alfred Hospital led to a revision of the Medical Emergency Team protocols that saves $500,000 per annum while improving clinical outcomes.

Publications

A Radiograph Dataset for the Classification, Localization, and Segmentation of Primary Bone Tumors.
Yao, S., Huang, Y., Wang, X., Zhang, Y., Paixao, I. C., Wang, Z., Chai, C. L., Wang, H., Lu, D., Webb, G. I., Li, S., Guo, Y., Chen, Q., & Song, J.
Scientific Data, 12, Art. no. 88, 2025.

@Article{Yao2025,
author = {Yao, Shunhan and Huang, Yuanxiang and Wang, Xiaoyu and Zhang, Yiwen and Paixao, Ian Costa and Wang, Zhikang and Chai, Charla Lu and Wang, Hongtao and Lu, Dinggui and Webb, Geoffrey I and Li, Shanshan and Guo, Yuming and Chen, Qingfeng and Song, Jiangning},
journal = {Scientific Data},
title = {A Radiograph Dataset for the Classification, Localization, and Segmentation of Primary Bone Tumors},
year = {2025},
issn = {2052-4463},
volume = {12},
articlenumber = {88},
doi = {10.1038/s41597-024-04311-y},
keywords = {health},
publisher = {Springer Science and Business Media LLC},
related = {health},
}

The application of artificial intelligence to accelerate G protein-coupled receptor drug discovery.
Nguyen, A. T. N., Nguyen, D. T. N., Koh, H. Y., Toskov, J., MacLean, W., Xu, A., Zhang, D., Webb, G. I., May, L. T., & Halls, M. L.
British Journal of Pharmacology, 181(14), 2371-2384, 2024.

@Article{Nguyen,
author = {Nguyen, Anh T. N. and Nguyen, Diep T. N. and Koh, Huan Yee and Toskov, Jason and MacLean, William and Xu, Andrew and Zhang, Daokun and Webb, Geoffrey I. and May, Lauren T. and Halls, Michelle L.},
journal = {British Journal of Pharmacology},
title = {The application of artificial intelligence to accelerate G protein-coupled receptor drug discovery},
year = {2024},
number = {14},
pages = {2371-2384},
volume = {181},
abstract = {The application of artificial intelligence (AI) approaches to drug discovery for G protein-coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand-GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process faster, smarter and cheaper, we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery.},
doi = {10.1111/bph.16140},
keywords = {health, artificial intelligence, deep learning, drug discovery, G protein-coupled receptor, machine learning},
related = {health},
}

Predicting Pseudomonas aeruginosa drug resistance using artificial intelligence and clinical MALDI-TOF mass spectra.
Nguyen, H., Peleg, A. Y., Song, J., Antony, B., Webb, G. I., Wisniewski, J. A., Blakeway, L. V., Badoordeen, G. Z., Theegala, R., Zisis, H., Dowe, D. L., & Macesic, N.
mSystems, 2024.

@Article{Nguyen2024,
author = {Nguyen, Hoai-An and Peleg, Anton Y. and Song, Jiangning and Antony, Bhavna and Webb, Geoffrey I. and Wisniewski, Jessica A. and Blakeway, Luke V. and Badoordeen, Gnei Z. and Theegala, Ravali and Zisis, Helen and Dowe, David L. and Macesic, Nenad},
journal = {mSystems},
title = {Predicting Pseudomonas aeruginosa drug resistance using artificial intelligence and clinical MALDI-TOF mass spectra},
year = {2024},
issn = {2379-5077},
month = aug,
abstract = {Matrix-assisted laser desorption/ionization–time of flight mass spectrometry (MALDI-TOF MS) is widely used in clinical microbiology laboratories for bacterial identification but its use for detection of antimicrobial resistance (AMR) remains limited. Here, we used MALDI-TOF MS with artificial intelligence (AI) approaches to successfully predict AMR in Pseudomonas aeruginosa, a priority pathogen with complex AMR mechanisms. The highest performance was achieved for modern β-lactam/β-lactamase inhibitor drugs, namely, ceftazidime/avibactam and ceftolozane/tazobactam. For these drugs, the model demonstrated area under the receiver operating characteristic curve (AUROC) of 0.869 and 0.856, specificity of 0.925 and 0.897, and sensitivity of 0.731 and 0.714, respectively. As part of this work, we developed dynamic binning, a feature engineering technique that effectively reduces the high-dimensional feature set and has wide-ranging applicability to MALDI-TOF MS data. Compared to conventional feature engineering approaches, the dynamic binning method yielded highest performance in 7 of 10 antimicrobials. Moreover, we showcased the efficacy of transfer learning in enhancing the AUROC performance for 8 of 11 antimicrobials. By assessing the contribution of features to the model's prediction, we identified proteins that may contribute to AMR mechanisms. Our findings demonstrate the potential of combining AI with MALDI-TOF MS as a rapid AMR diagnostic tool for Pseudomonas aeruginosa.},
doi = {10.1128/msystems.00789-24},
editor = {Yang, Yu-Liang},
keywords = {health},
publisher = {American Society for Microbiology},
related = {health},
}

COVID-19 restrictions and the incidence and prevalence of prescription opioid use in Australia - a nation-wide study.
Jung, M., Lukose, D., Nielsen, S., Bell, S. J., Webb, G. I., & Ilomaki, J.
British Journal of Clinical Pharmacology, 89(2), 914-920, 2023.

@Article{Jung,
author = {Jung, Monica and Lukose, Dickson and Nielsen, Suzanne and Bell, J. Simon and Webb, Geoffrey I. and Ilomaki, Jenni},
journal = {British Journal of Clinical Pharmacology},
title = {COVID-19 restrictions and the incidence and prevalence of prescription opioid use in Australia - a nation-wide study},
year = {2023},
number = {2},
pages = {914-920},
volume = {89},
abstract = {The COVID-19 pandemic has disrupted seeking and delivery of healthcare. Different Australian jurisdictions implemented different COVID-19 restrictions. We used Australian national pharmacy dispensing data to conduct interrupted time series analyses to examine the incidence and prevalence of opioid dispensing in different jurisdictions. Following nationwide COVID-19 restrictions, the incidence dropped by -0.40 [-0.50, -0.31], -0.33 [-0.46, -0.21] and -0.21 [-0.37, -0.04] /1000 people/week and prevalence dropped by -0.85 [-1.39, -0.31], -0.54 [-1.01, -0.07] and -0.62 [-0.99, -0.25] /1000 people/week in Victoria, New South Wales and other jurisdictions, respectively. Incidence and prevalence increased by 0.29 [0.13, 0.44] and 0.72 [0.11, 1.33] /1000 people/week, respectively in Victoria post-lockdown; no significant changes were observed in other jurisdictions. No significant changes were observed in the initiation of long-term opioid use in any jurisdictions. More stringent restrictions coincided with more pronounced reductions in overall opioid initiation, but initiation of long-term opioid use did not change.},
doi = {10.1111/bcp.15577},
keywords = {health, opioids, chronic pain, drug utilisation, medication safety, quality use of medicines},
related = {health},
}

Rapid Identification of Protein Formulations with Bayesian Optimisation.
Huynh, V., Say, B., Vogel, P., Cao, L., Webb, G. I., & Aleti, A.
2023 International Conference on Machine Learning and Applications (ICMLA), pp. 776-781, 2023.

@InProceedings{Huynh2023,
author = {Huynh, Viet and Say, Buser and Vogel, Peter and Cao, Lucy and Webb, Geoffrey I and Aleti, Aldeida},
booktitle = {2023 International Conference on Machine Learning and Applications (ICMLA)},
title = {Rapid Identification of Protein Formulations with Bayesian Optimisation},
year = {2023},
pages = {776-781},
creationdate = {2024-03-21T10:48:29},
doi = {10.1109/ICMLA58977.2023.00113},
keywords = {health, Bioinformatics, Drugs, Proteins, Industries, Metalearning, Transportation, Stability analysis, Bayes methods, Protein buffer optimisation, Bayesian optimisation},
}

EHR-QC: A streamlined pipeline for automated electronic health records standardisation and preprocessing to predict clinical outcomes.
Ramakrishnaiah, Y., Macesic, N., Webb, G., Peleg, A. Y., & Tyagi, S.
Journal of Biomedical Informatics, 104509, 2023.

@Article{Ramakrishnaiah2023,
author = {Yashpal Ramakrishnaiah and Nenad Macesic and Geoff Webb and Anton Y. Peleg and Sonika Tyagi},
journal = {Journal of Biomedical Informatics},
title = {EHR-QC: A streamlined pipeline for automated electronic health records standardisation and preprocessing to predict clinical outcomes},
year = {2023},
issn = {1532-0464},
pages = {104509},
abstract = {The adoption of electronic health records (EHRs) has created opportunities to analyze historical data for predicting clinical outcomes and improving patient care. However, non-standardized data representations and anomalies pose major challenges to the use of EHRs in digital health research. To address these challenges, we have developed EHR-QC, a tool comprising two modules: the data standardization module and the preprocessing module. The data standardization module migrates source EHR data to a standard format using advanced concept mapping techniques, surpassing expert curation in benchmarking analysis. The preprocessing module includes several functions designed specifically to handle healthcare data subtleties. We provide automated detection of data anomalies and solutions to handle those anomalies. We believe that the development and adoption of tools like EHR-QC is critical for advancing digital health. Our ultimate goal is to accelerate clinical research by enabling rapid experimentation with data-driven observational research to generate robust, generalisable biomedical knowledge.},
creationdate = {2023-10-12T10:22:50},
doi = {10.1016/j.jbi.2023.104509},
keywords = {Digital health, Electronic health records, EHR, Clinical outcome prediction, Machine learning},
related = {health},
}

Did Australia's COVID-19 restrictions impact statin incidence, prevalence or adherence?
Livori, A. C., Lukose, D., Bell, S. J., Webb, G. I., & Ilomaki, J.
Current Problems in Cardiology, 101576, 2022.

@Article{Livori2022,
author = {Adam C Livori and Dickson Lukose and J Simon Bell and Geoffrey I Webb and Jenni Ilomaki},
journal = {Current Problems in Cardiology},
title = {Did Australia's COVID-19 restrictions impact statin incidence, prevalence or adherence?},
year = {2022},
issn = {0146-2806},
pages = {101576},
abstract = {Objective
COVID-19 restrictions may have an unintended consequence of limiting access to cardiovascular care. Australia implemented adaptive interventions (e.g. telehealth consultations, digital image prescriptions, continued dispensing, medication delivery) to maintain medication access. This study investigated whether COVID-19 restrictions in different jurisdictions coincided with changes in statin incidence, prevalence and adherence.
Methods
Analysis of a 10% random sample of national medication claims data from January 2018 to December 2020 was conducted across three Australian jurisdictions. Weekly incidence and prevalence were estimated by dividing the number of statin initiations and any statin dispensing by the Australian population aged 18-99 years. Statin adherence was analysed across the jurisdictions and years, with adherence categorised as <40%, 40-79% and >=80% based on dispensings per calendar year.
Results
Overall, 309,123, 315,703 and 324,906 people were dispensed and 39029, 39816, and 44979 initiated statins in 2018, 2019 and 2020 respectively. Two waves of COVID-19 restrictions in 2020 coincided with no meaningful change in statin incidence or prevalence per week when compared to 2018 and 2019. Incidence increased 0.3% from 23.7 to 26.2 per 1000 people across jurisdictions in 2020 compared to 2019. Prevalence increased 0.14% from 158.5 to 159.9 per 1000 people across jurisdictions in 2020 compared to 2019. The proportion of adults with >=80% adherence increased by 3.3% in Victoria, 1.4% in NSW and 1.8% in other states and territories between 2019 and 2020.
Conclusions
COVID-19 restrictions did not coincide with meaningful changes in the incidence, prevalence or adherence to statins suggesting adaptive interventions succeeded in maintaining access to cardiovascular medications.},
doi = {10.1016/j.cpcardiol.2022.101576},
keywords = {Statin, drug utilisation, medication adherence, cardiovascular, cardiology, health},
related = {health},
}

Cell graph neural networks enable the precise prediction of patient survival in gastric cancer.
Wang, Y., Wang, Y. G., Hu, C., Li, M., Fan, Y., Otter, N., Sam, I., Gou, H., Hu, Y., Kwok, T., Zalcberg, J., Boussioutas, A., Daly, R. J., Montúfar, G., Liò, P., Xu, D., Webb, G. I., & Song, J.
npj Precision Oncology, 6(1), Art. no. 45, 2022.

@Article{Wang2022,
author = {Wang, Yanan and Wang, Yu Guang and Hu, Changyuan and Li, Ming and Fan, Yanan and Otter, Nina and Sam, Ikuan and Gou, Hongquan and Hu, Yiqun and Kwok, Terry and Zalcberg, John and Boussioutas, Alex and Daly, Roger J. and Mont{\'u}far, Guido and Li{\`o}, Pietro and Xu, Dakang and Webb, Geoffrey I. and Song, Jiangning},
journal = {npj Precision Oncology},
title = {Cell graph neural networks enable the precise prediction of patient survival in gastric cancer},
year = {2022},
issn = {2397-768X},
number = {1},
volume = {6},
abstract = {Gastric cancer is one of the deadliest cancers worldwide. An accurate prognosis is essential for effective clinical assessment and treatment. Spatial patterns in the tumor microenvironment (TME) are conceptually indicative of the staging and progression of gastric cancer patients. Using spatial patterns of the TME by integrating and transforming the multiplexed immunohistochemistry (mIHC) images as Cell-Graphs, we propose a graph neural network-based approach, termed Cell-Graph Signature or CGSignature, powered by artificial intelligence, for the digital staging of TME and precise prediction of patient survival in gastric cancer. In this study, patient survival prediction is formulated as either a binary (short-term and long-term) or ternary (short-term, medium-term, and long-term) classification task. Extensive benchmarking experiments demonstrate that the CGSignature achieves outstanding model performance, with Area Under the Receiver Operating Characteristic curve of 0.960+/-0.01, and 0.771+/-0.024 to 0.904+/-0.012 for the binary- and ternary-classification, respectively. Moreover, Kaplan-Meier survival analysis indicates that the 'digital grade' cancer staging produced by CGSignature provides a remarkable capability in discriminating both binary and ternary classes with statistical significance (P value <=0.0001), significantly outperforming the AJCC 8th edition Tumor Node Metastasis staging system. Using Cell-Graphs extracted from mIHC images, CGSignature improves the assessment of the link between the TME spatial patterns and patient prognosis. Our study suggests the feasibility and benefits of such an artificial intelligence-powered digital staging system in diagnostic pathology and precision oncology.},
articlenumber = {45},
doi = {10.1038/s41698-022-00285-5},
keywords = {health},
related = {health},
url = {https://rdcu.be/cQeFD},
}

OCTID: a one-class learning-based Python package for tumor image detection.
Wang, Y., Yang, L., Webb, G. I., Ge, Z., & Song, J.
Bioinformatics, 37(21), 3986-3988, 2021.

@Article{10.1093/bioinformatics/btab416,
author = {Wang, Yanan and Yang, Litao and Webb, Geoffrey I and Ge, Zongyuan and Song, Jiangning},
journal = {Bioinformatics},
title = {{OCTID}: a one-class learning-based {Python} package for tumor image detection},
year = {2021},
issn = {1367-4803},
number = {21},
pages = {3986-3988},
volume = {37},
abstract = {{Tumor tile selection is a necessary prerequisite in patch-based cancer whole slide image analysis, which is labor-intensive and requires expertise. Whole slides are annotated as tumor or tumor free, but tiles within a tumor slide are not. As all tiles within a tumor free slide are tumor free, these can be used to capture tumor-free patterns using the one-class learning strategy. We present a Python package, termed OCTID, which combines a pretrained convolutional neural network (CNN) model, Uniform Manifold Approximation and Projection (UMAP) and one-class support vector machine to achieve accurate tumor tile classification using a training set of tumor free tiles. Benchmarking experiments on four H&E image datasets achieved remarkable performance in terms of F1-score (0.90+/-0.06), Matthews correlation coefficient (0.93+/-0.05) and accuracy (0.94+/-0.03). Detailed information can be found in the Supplementary File. Supplementary data are available at Bioinformatics online.}},
doi = {10.1093/bioinformatics/btab416},
keywords = {health},
related = {health},
}

HEAL: an automated deep learning framework for cancer histopathology image analysis.
Wang, Y., Coudray, N., Zhao, Y., Li, F., Hu, C., Zhang, Y., Imoto, S., Tsirigos, A., Webb, G. I., Daly, R. J., & Song, J.
Bioinformatics, 37(22), 4291-4295, 2021.

@Article{Wang2021,
author = {Wang, Yanan and Coudray, Nicolas and Zhao, Yun and Li, Fuyi and Hu, Changyuan and Zhang, Yao-Zhong and Imoto, Seiya and Tsirigos, Aristotelis and Webb, Geoffrey I and Daly, Roger J and Song, Jiangning},
journal = {Bioinformatics},
title = {{HEAL}: an automated deep learning framework for cancer histopathology image analysis},
year = {2021},
number = {22},
pages = {4291-4295},
volume = {37},
abstract = {{Digital pathology supports analysis of histopathological images using deep learning methods at a large-scale. However, applications of deep learning in this area have been limited by the complexities of configuration of the computational environment and of hyperparameter optimization, which hinder deployment and reduce reproducibility. Here, we propose HEAL, a deep learning-based automated framework for easy, flexible, and multi-faceted histopathological image analysis. We demonstrate its utility and functionality by performing two case studies on lung cancer and one on colon cancer. Leveraging the capability of Docker, HEAL represents an ideal end-to-end tool to conduct complex histopathological analysis and enables deep learning in a broad range of applications for cancer image analysis. Supplementary data are available at Bioinformatics online.}},
doi = {10.1093/bioinformatics/btab380},
keywords = {health},
publisher = {Oxford University Press ({OUP})},
related = {health},
}

Toward Electronic Surveillance of Invasive Mold Diseases in Hematology-Oncology Patients: An Expert System Combining Natural Language Processing of Chest Computed Tomography Reports, Microbiology, and Antifungal Drug Data.
Ananda-Rajah, M. R., Bergmeir, C., Petitjean, F., Slavin, M. A., Thursky, K. A., & Webb, G. I.
JCO Clinical Cancer Informatics, (1), 1-10, 2017.

@Article{Ananda-RajahEtAl17,
author = {Ananda-Rajah, Michelle R. and Bergmeir, Christoph and Petitjean, Francois and Slavin, Monica A. and Thursky, Karin A. and Webb, Geoffrey I.},
journal = {JCO Clinical Cancer Informatics},
title = {Toward Electronic Surveillance of Invasive Mold Diseases in Hematology-Oncology Patients: An Expert System Combining Natural Language Processing of Chest Computed Tomography Reports, Microbiology, and Antifungal Drug Data},
year = {2017},
number = {1},
pages = {1-10},
abstract = {Prospective epidemiologic surveillance of invasive mold disease (IMD) in hematology patients is hampered by the absence of a reliable laboratory prompt. This study develops an expert system for electronic surveillance of IMD that combines probabilities using natural language processing (NLP) of computed tomography (CT) reports with microbiology and antifungal drug data to improve prediction of IMD. Methods Microbiology indicators and antifungal drug dispensing data were extracted from hospital information systems at three tertiary hospitals for 123 hematology-oncology patients. Of this group, 64 case patients had 26 probable/proven IMD according to international definitions, and 59 patients were uninfected controls. Derived probabilities from NLP combined with medical expertise identified patients at high likelihood of IMD, with remaining patients processed by a machine-learning classifier trained on all available features. Results Compared with the baseline text classifier, the expert system that incorporated the best performing algorithm (naive Bayes) improved specificity from 50.8\% (95\% CI, 37.5\% to 64.1\%) to 74.6\% (95\% CI, 61.6\% to 85.0\%), reducing false positives by 48\% from 29 to 15; improved sensitivity slightly from 96.9\% (95\% CI, 89.2\% to 99.6\%) to 98.4\% (95\% CI, 91.6\% to 100\%); and improved receiver operating characteristic area from 73.9\% (95\% CI, 67.1\% to 80.6\%) to 92.8\% (95\% CI, 88\% to 97.5\%). Conclusion An expert system that uses multiple sources of data (CT reports, microbiology, antifungal drug dispensing) is a promising approach to continuous prospective surveillance of IMD in the hospital, and demonstrates reduced false notifications (positives) compared with NLP of CT reports alone. Our expert system could provide decision support for IMD surveillance, which is critical to antifungal stewardship and improving supportive care in cancer.},
doi = {10.1200/CCI.17.00011},
eprint = {https://doi.org/10.1200/CCI.17.00011},
keywords = {health},
owner = {giwebb},
related = {health},
timestamp = {2017.09.07},
}

ABSTRACT Prospective epidemiologic surveillance of invasive mold disease (IMD) in hematology patients is hampered by the absence of a reliable laboratory prompt. This study develops an expert system for electronic surveillance of IMD that combines probabilities using natural language processing (NLP) of computed tomography (CT) reports with microbiology and antifungal drug data to improve prediction of IMD. Methods: Microbiology indicators and antifungal drug dispensing data were extracted from hospital information systems at three tertiary hospitals for 123 hematology-oncology patients. Of this group, 64 case patients had 26 probable/proven IMD according to international definitions, and 59 patients were uninfected controls. Derived probabilities from NLP combined with medical expertise identified patients at high likelihood of IMD, with remaining patients processed by a machine-learning classifier trained on all available features. Results: Compared with the baseline text classifier, the expert system that incorporated the best performing algorithm (naive Bayes) improved specificity from 50.8% (95% CI, 37.5% to 64.1%) to 74.6% (95% CI, 61.6% to 85.0%), reducing false positives by 48% from 29 to 15; improved sensitivity slightly from 96.9% (95% CI, 89.2% to 99.6%) to 98.4% (95% CI, 91.6% to 100%); and improved receiver operating characteristic area from 73.9% (95% CI, 67.1% to 80.6%) to 92.8% (95% CI, 88% to 97.5%). Conclusion: An expert system that uses multiple sources of data (CT reports, microbiology, antifungal drug dispensing) is a promising approach to continuous prospective surveillance of IMD in the hospital, and demonstrates reduced false notifications (positives) compared with NLP of CT reports alone. Our expert system could provide decision support for IMD surveillance, which is critical to antifungal stewardship and improving supportive care in cancer.
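
As a quick consistency check on the figures quoted in the abstract above, the following minimal sketch recomputes specificity, sensitivity, and the relative reduction in false positives from raw counts. The control count (59), case count (64), and false-positive counts (29 and 15) come from the abstract. The true-positive counts (62 and 63) are not stated there; they are assumptions inferred from the reported sensitivities. The function and variable names are illustrative only.

```python
def specificity(false_positives, negatives):
    """Fraction of uninfected controls that were correctly not flagged."""
    return (negatives - false_positives) / negatives


def sensitivity(true_positives, positives):
    """Fraction of case patients that were correctly flagged."""
    return true_positives / positives


CONTROLS = 59  # uninfected controls (from the abstract)
CASES = 64     # case patients (from the abstract)

# False-positive counts are stated in the abstract: 29 (baseline) and 15 (expert system).
baseline_spec = specificity(29, CONTROLS)
expert_spec = specificity(15, CONTROLS)
fp_reduction = (29 - 15) / 29

# ASSUMPTION: true-positive counts inferred from the reported sensitivities.
baseline_sens = sensitivity(62, CASES)
expert_sens = sensitivity(63, CASES)

print(f"specificity: {baseline_spec:.1%} -> {expert_spec:.1%}")
print(f"sensitivity: {baseline_sens:.1%} -> {expert_sens:.1%}")
print(f"false positives reduced by {fp_reduction:.0%}")
```

Under these assumptions the recomputed values agree with the percentages reported in the abstract.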

Designing a more efficient, effective and safe Medical Emergency Team (MET) service using data analysis.
Bergmeir, C., Bilgrami, I., Bain, C., Webb, G. I., Orosz, J., & Pilcher, D.
PLoS ONE, 12(12), Art. no. e0188688, 2017.

@Article{BergmeirEtAl2017,
author = {Bergmeir, Christoph and Bilgrami, Irma and Bain, Christopher and Webb, Geoffrey I and Orosz, Judit and Pilcher, David},
journal = {PLoS ONE},
title = {Designing a more efficient, effective and safe Medical Emergency Team (MET) service using data analysis},
year = {2017},
number = {12},
volume = {12},
articlenumber = {e0188688},
doi = {10.1371/journal.pone.0188688},
keywords = {health},
related = {health},
}

Identifying markers of pathology in SAXS data of malignant tissues of the brain.
Siu, K. K. W., Butler, S. M., Beveridge, T., Gillam, J. E., Hall, C. J., Kaye, A. H., Lewis, R. A., Mannan, K., McLoughlin, G., Pearson, S., Round, A. R., Schultke, E., Webb, G. I., & Wilkinson, S. J.
Nuclear Instruments and Methods in Physics Research A, 548, 140-146, 2005.

@Article{SiuEtAl05,
author = {Siu, K. K. W. and Butler, S. M. and Beveridge, T. and Gillam, J. E. and Hall, C. J. and Kaye, A. H. and Lewis, R. A. and Mannan, K. and McLoughlin, G. and Pearson, S. and Round, A. R. and Schultke, E. and Webb, G. I. and Wilkinson, S. J.},
journal = {Nuclear Instruments and Methods in Physics Research A},
title = {Identifying markers of pathology in SAXS data of malignant tissues of the brain},
year = {2005},
pages = {140-146},
volume = {548},
abstract = {Conventional neuropathological analysis for brain malignancies is heavily reliant on the observation of morphological abnormalities, observed in thin, stained sections of tissue. Small Angle X-ray Scattering (SAXS) data provide an alternative means of distinguishing pathology by examining the ultra-structural (nanometer length scales) characteristics of tissue. To evaluate the diagnostic potential of SAXS for brain tumors, data was collected from normal, malignant and benign tissues of the human brain at station 2.1 of the Daresbury Laboratory Synchrotron Radiation Source and subjected to data mining and multivariate statistical analysis. The results suggest SAXS data may be an effective classifier of malignancy.},
doi = {10.1016/j.nima.2005.03.081},
keywords = {health},
publisher = {Elsevier},
related = {health},
}

ABSTRACT Conventional neuropathological analysis for brain malignancies is heavily reliant on the observation of morphological abnormalities, observed in thin, stained sections of tissue. Small Angle X-ray Scattering (SAXS) data provide an alternative means of distinguishing pathology by examining the ultra-structural (nanometer length scales) characteristics of tissue. To evaluate the diagnostic potential of SAXS for brain tumors, data was collected from normal, malignant and benign tissues of the human brain at station 2.1 of the Daresbury Laboratory Synchrotron Radiation Source and subjected to data mining and multivariate statistical analysis. The results suggest SAXS data may be an effective classifier of malignancy.

A Case Study in Feature Invention for Breast Cancer Diagnosis Using X-Ray Scatter Images.
Butler, S. M., Webb, G. I., & Lewis, R. A.
Lecture Notes in Artificial Intelligence Vol. 2903: Proceedings of the 16th Australian Conference on Artificial Intelligence (AI 03), Berlin/Heidelberg, pp. 677-685, 2003.

@InProceedings{ButlerWebbLewis03,
author = {Butler, S. M. and Webb, G. I. and Lewis, R. A.},
booktitle = {Lecture Notes in Artificial Intelligence Vol. 2903: Proceedings of the 16th Australian Conference on Artificial Intelligence (AI 03)},
title = {A Case Study in Feature Invention for Breast Cancer Diagnosis Using X-Ray Scatter Images},
year = {2003},
address = {Berlin/Heidelberg},
editor = {Gedeon, T.D. and Fung, L.C.C.},
pages = {677-685},
publisher = {Springer},
abstract = {X-ray mammography is the current method for screening for breast cancer, and like any technique, has its limitations. Several groups have reported differences in the X-ray scattering patterns of normal and tumour tissue from the breast. This gives rise to the hope that X-ray scatter analysis techniques may lead to a more accurate and cost effective method of diagnosing breast cancer which lends itself to automation. This is a particularly challenging exercise due to the inherent complexity of the information content in X-ray scatter patterns from complex heterogenous tissue samples. We use a simple naive Bayes classifier, coupled with Equal Frequency Discretization (EFD) as our classification system. High-level features are extracted from the low-level pixel data. This paper reports some preliminary results in the ongoing development of this classification method that can distinguish between the diffraction patterns of normal and cancerous tissue, with particular emphasis on the invention of features for classification.},
doi = {10.1007/978-3-540-24581-0_58},
keywords = {health},
related = {health},
}

ABSTRACT X-ray mammography is the current method for screening for breast cancer, and like any technique, has its limitations. Several groups have reported differences in the X-ray scattering patterns of normal and tumour tissue from the breast. This gives rise to the hope that X-ray scatter analysis techniques may lead to a more accurate and cost effective method of diagnosing breast cancer which lends itself to automation. This is a particularly challenging exercise due to the inherent complexity of the information content in X-ray scatter patterns from complex heterogenous tissue samples. We use a simple naive Bayes classifier, coupled with Equal Frequency Discretization (EFD) as our classification system. High-level features are extracted from the low-level pixel data. This paper reports some preliminary results in the ongoing development of this classification method that can distinguish between the diffraction patterns of normal and cancerous tissue, with particular emphasis on the invention of features for classification.
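
The abstract above names a naive Bayes classifier coupled with Equal Frequency Discretization (EFD). The following is a minimal sketch of that general combination, not the authors' implementation: the class name, the Laplace smoothing, the number of bins, and the toy feature values are all assumptions chosen for illustration, and the two numeric features are hypothetical stand-ins for high-level features extracted from scatter images.

```python
import math
from bisect import bisect_right
from collections import Counter


def efd_cut_points(values, n_bins):
    """Cut points that place roughly equal numbers of training values in each bin."""
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def discretize(value, cuts):
    """Map a numeric value to the index of the bin it falls into."""
    return bisect_right(cuts, value)


class DiscreteNaiveBayes:
    """Naive Bayes over EFD-discretized features (illustrative sketch only)."""

    def __init__(self, n_bins=3):
        self.n_bins = n_bins

    def fit(self, rows, labels):
        n_features = len(rows[0])
        # One set of cut points per feature, learned from the training data.
        self.cuts = [
            efd_cut_points([row[j] for row in rows], self.n_bins)
            for j in range(n_features)
        ]
        self.class_counts = Counter(labels)
        self.bin_counts = Counter()  # (label, feature index, bin index) -> count
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.bin_counts[(label, j, discretize(value, self.cuts[j]))] += 1
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for j, value in enumerate(row):
                b = discretize(value, self.cuts[j])
                # ASSUMPTION: Laplace smoothing so unseen bins do not zero the product.
                score += math.log(
                    (self.bin_counts[(label, j, b)] + 1) / (count + self.n_bins)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: two made-up features per tissue sample.
rows = [(0.1, 1.0), (0.2, 1.2), (0.3, 0.9), (0.8, 2.1), (0.9, 2.4), (1.0, 2.2)]
labels = ["normal"] * 3 + ["tumour"] * 3

model = DiscreteNaiveBayes(n_bins=3).fit(rows, labels)
print(model.predict((0.15, 1.1)))
print(model.predict((0.95, 2.3)))
```

Discretizing by equal frequency rather than equal width keeps every bin populated, so the per-bin probability estimates rest on a similar number of training samples.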

Application Of Machine Learning To A Renal Biopsy Data-Base.
Agar, J., & Webb, G. I.
Nephrology, Dialysis and Transplantation, 7, 472-478, 1992.

@Article{AgarWebb92,
author = {Agar, J. and Webb, G. I.},
journal = {Nephrology, Dialysis and Transplantation},
title = {Application Of Machine Learning To A Renal Biopsy Data-Base},
year = {1992},
pages = {472-478},
volume = {7},
abstract = {This pilot study has applied machine learning (artificial intelligence derived qualitative analysis procedures) to yield non-invasive techniques for the assessment and interpretation of clinical and laboratory data in glomerular disease. To evaluate the appropriateness of these techniques, they were applied to subsets of a small database of 284 case histories and the resulting procedures evaluated against the remaining cases. Over such evaluations, the following average diagnostic accuracies were obtained: microscopic polyarteritis, 95.37\%; minimal lesion nephrotic syndrome, 96.50\%; immunoglobulin A nephropathy, 81.26\%; minor changes, 93.66\%; lupus nephritis, 96.27\%; focal glomerulosclerosis, 92.06\%; mesangial proliferative glomerulonephritis, 92.56\%; and membranous nephropathy, 92.56\%. Although in general the new diagnostic system is not yet as accurate as the histological evaluation of renal biopsy specimens, it shows promise of adding a further dimension to the diagnostic process. When the machine learning techniques are applied to a larger database, greater diagnostic accuracy should be obtained. It may allow accurate non-invasive diagnosis of some cases of glomerular disease without the need for renal biopsy. This may reduce both the cost and the morbidity of the investigation of glomerular disease and may be of particular value in situations where renal biopsy is considered hazardous or contraindicated.},
address = {Oxford UK},
audit-trail = {28/10/03 Link to abstract only at this stage available via Oxford Press.},
keywords = {Rule Learning, health},
publisher = {Oxford University Press},
related = {health},
url = {http://ndt.oxfordjournals.org/content/7/6/472.abstract},
}

ABSTRACT This pilot study has applied machine learning (artificial intelligence derived qualitative analysis procedures) to yield non-invasive techniques for the assessment and interpretation of clinical and laboratory data in glomerular disease. To evaluate the appropriateness of these techniques, they were applied to subsets of a small database of 284 case histories and the resulting procedures evaluated against the remaining cases. Over such evaluations, the following average diagnostic accuracies were obtained: microscopic polyarteritis, 95.37%; minimal lesion nephrotic syndrome, 96.50%; immunoglobulin A nephropathy, 81.26%; minor changes, 93.66%; lupus nephritis, 96.27%; focal glomerulosclerosis, 92.06%; mesangial proliferative glomerulonephritis, 92.56%; and membranous nephropathy, 92.56%. Although in general the new diagnostic system is not yet as accurate as the histological evaluation of renal biopsy specimens, it shows promise of adding a further dimension to the diagnostic process. When the machine learning techniques are applied to a larger database, greater diagnostic accuracy should be obtained. It may allow accurate non-invasive diagnosis of some cases of glomerular disease without the need for renal biopsy. This may reduce both the cost and the morbidity of the investigation of glomerular disease and may be of particular value in situations where renal biopsy is considered hazardous or contraindicated.

The Application of Machine Learning to the Diagnosis of Glomerular Disease.
Webb, G. I., & Agar, J.
Proceedings of the IJCAI Workshop W.15: Representing Knowledge in Medical Decision Support Systems, pp. 8.1-8.8, 1991.

@InProceedings{WebbAgar91,
author = {Webb, G. I. and Agar, J.},
booktitle = {Proceedings of the {IJCAI} Workshop W.15: Representing Knowledge in Medical Decision Support Systems},
title = {The Application of Machine Learning to the Diagnosis of Glomerular Disease},
year = {1991},
editor = {Sarmeinto, C.},
pages = {8.1-8.8},
abstract = {A pilot study has applied the DLG machine learning algorithm to create expert systems for the assessment and interpretation of clinical and laboratory data in glomerular disease. Despite the limited size of the data-set and major deficiencies in the information recorded therein, for one of the conditions examined in this study, microscopic polyarteritis, a consistent diagnostic accuracy of 100\% was obtained. With expansion of the data base, it is possible that techniques will be derived that provide accurate non-invasive diagnosis of some cases of glomerular disease, thus obviating the need for renal biopsy. Success in this project will result in significant reductions in both the cost and the morbidity associated with the investigation of glomerular disease.},
audit-trail = {Reconstructed paper posted May 2006},
keywords = {Rule Learning, health},
location = {Sydney, Australia},
related = {health},
}

ABSTRACT A pilot study has applied the DLG machine learning algorithm to create expert systems for the assessment and interpretation of clinical and laboratory data in glomerular disease. Despite the limited size of the data-set and major deficiencies in the information recorded therein, for one of the conditions examined in this study, microscopic polyarteritis, a consistent diagnostic accuracy of 100% was obtained. With expansion of the data base, it is possible that techniques will be derived that provide accurate non-invasive diagnosis of some cases of glomerular disease, thus obviating the need for renal biopsy. Success in this project will result in significant reductions in both the cost and the morbidity associated with the investigation of glomerular disease.