Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic

Kagerbauer, Simone Maria

Ulm, Bernhard

Podtschaske, Armin Horst

Andonov, Dimislav Ivanov

Blobner, Manfred

Jungwirth, Bettina

Graessner, Martin

Download

12911_2024_Article_2428.pdf (1.78 MB)

Name

Value

Title

Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic

Author(s)

Publisher

BioMed Central

Year of publication

2024

Publication type

Article

Published online

2024-02-02

Published in

http://lobid.org/resources/99370673400706441#!

Source reference

24(1):34

Copyright year

2024

License

https://creativecommons.org/licenses/by/4.0/

Publisher's version

https://doi.org/10.1186/s12911-024-02428-z
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10877890/

Publication status

Postprint, publisher's version

Review status

Peer-reviewed

Language of publication

English

Abstract/Summary

Background

Concept drift and covariate shift lead to a degradation of machine learning (ML) models. The objective of our study was to characterize sudden data drift as caused by the COVID pandemic. Furthermore, we investigated whether certain methods applied during model training can prevent model degradation caused by data drift.

Methods

We trained different ML models with the H2O AutoML method on a dataset comprising 102,666 cases of surgical patients collected in the years 2014–2019 to predict postoperative mortality using preoperatively available data. The models applied were a Generalized Linear Model with regularization, a Default Random Forest, a Gradient Boosting Machine, eXtreme Gradient Boosting, Deep Learning and Stacked Ensembles comprising all base models. Further, we modified the original models by applying three different methods when training on the original pre-pandemic dataset: (1) we weighted older data less heavily, (2) we used only the most recent data for model training and (3) we performed a z-transformation of the numerical input parameters. Afterwards, we tested model performance on a pre-pandemic and an in-pandemic dataset not used in the training process, and analysed common features.

Results

The resulting models showed excellent areas under the receiver operating characteristic curve and acceptable precision-recall curves when tested on a dataset from January-March 2020, but significant degradation when tested on a dataset collected during the first wave of the COVID pandemic from April-May 2020. When comparing the probability distributions of the input parameters, significant differences between pre-pandemic and in-pandemic data were found. The endpoint of our models, in-hospital mortality after surgery, did not differ significantly between pre- and in-pandemic data and was about 1% in each case. However, the models varied considerably in the composition of their input parameters. None of our applied modifications prevented a loss of performance, although they produced very different models that used a wide variety of parameters.

Conclusions

Our results show that none of the easy-to-implement training measures we tested can prevent model deterioration in the case of sudden external events. Therefore, we conclude that, in the presence of concept drift and covariate shift, close monitoring and critical review of model predictions are necessary.
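The three training-time modifications listed under Methods, and the comparison of input distributions mentioned under Results, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the weighting scheme, the recency cut-off or the statistical test used, so the exponential half-life decay (`half_life_years`), the year cut-off (`first_year`) and the two-sample Kolmogorov-Smirnov statistic below are hypothetical choices, as are all function names.

```python
from statistics import mean, pstdev


def recency_weights(years, reference_year, half_life_years=2.0):
    """(1) Down-weight older cases.

    Hypothetical scheme: a case's weight halves every `half_life_years`
    before `reference_year`; the study's actual weighting is not stated here.
    """
    return [0.5 ** ((reference_year - y) / half_life_years) for y in years]


def most_recent(rows, year_key, first_year):
    """(2) Keep only cases collected in or after `first_year` (hypothetical cut-off)."""
    return [r for r in rows if r[year_key] >= first_year]


def fit_zscore(values):
    """(3a) Estimate mean and population standard deviation on training data only."""
    mu = mean(values)
    sd = pstdev(values)
    # Guard against constant columns, which would otherwise divide by zero.
    return mu, (sd if sd > 0 else 1.0)


def apply_zscore(values, mu, sd):
    """(3b) Standardize with the training statistics so test data is transformed identically."""
    return [(v - mu) / sd for v in values]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x so ties are handled correctly.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    pre = [1.0, 2.0, 3.0, 4.0]
    during = [3.0, 4.0, 5.0, 6.0]
    print(recency_weights([2014, 2016, 2018, 2019], reference_year=2019))
    mu, sd = fit_zscore(pre)
    print(apply_zscore(during, mu, sd))  # shifted data keeps a non-zero mean
    print(ks_statistic(pre, during))  # → 0.5
```

In H2O, a per-row weight column such as the one produced by `recency_weights` could be supplied through the `weights_column` argument of `H2OAutoML.train`. The toy run also shows why standardization alone cannot remove a covariate shift: when the test data is transformed with training statistics, a shifted mean simply reappears as a non-zero mean after the transformation.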

Subject indexing

gnd 1206347392	COVID-19
local	Covariate shift
local	Algorithms [MeSH]
local	Humans [MeSH]
local	Data shift
local	AutoML
local	COVID-19
local	Hospital Mortality [MeSH]
local	Pandemics [MeSH]
local	Research
local	Concept drift
local	Machine Learning [MeSH]
local	Model deterioration
local	COVID-19/epidemiology [MeSH]

Subject classification (DDC)

Medicine and health

List of contributors

https://frl.publisso.de/adhoc/uri/S2FnZXJiYXVlciwgU2ltb25lIE1hcmlh
https://frl.publisso.de/adhoc/uri/VWxtLCBCZXJuaGFyZA==
https://frl.publisso.de/adhoc/uri/UG9kdHNjaGFza2UsIEFybWluIEhvcnN0
https://frl.publisso.de/adhoc/uri/QW5kb25vdiwgRGltaXNsYXYgSXZhbm92
https://frl.publisso.de/adhoc/uri/QmxvYm5lciwgTWFuZnJlZA==
https://frl.publisso.de/adhoc/uri/SnVuZ3dpcnRoLCBCZXR0aW5h
https://frl.publisso.de/adhoc/uri/R3JhZXNzbmVyLCBNYXJ0aW4=

Note

DeepGreen-ID: 96d442fae7234b96a8a0e6f11582f889; metadata provided by: DeepGreen (https://www.oa-deepgreen.de/api/v1/), LIVIVO search scope life sciences (http://z3950.zbmed.de:6210/livivo), Crossref Unified Resource API (https://api.crossref.org/swagger-ui/index.html), to.science.api (https://frl.publisso.de/), ZDB JSON-API (beta) (https://zeitschriftendatenbank.de/api/), lobid, data infrastructure for libraries (https://lobid.org/resources/search)

Label

frl:6506961

Funder

Grant number

Funding program

Files

Susceptibility of AutoML mortality prediction algorithms to model drift caused by the COVID pandemic

Funding

Funder	German Federal Ministry for Economic Affairs and Energy
Funding program	-
Grant number	-

Funder	Universität Ulm
Funding program	-
Grant number	-

Object type

article

Described by

@id	frl:6506961.rdf
Created on	2025-02-06T12:52:43.452+0100
Created by	322
Describes	frl:6506961
Last modified	2025-09-12T14:50:18.054+0200
Object last modified	Fri Sep 12 14:50:18 CEST 2025

See also

frl:6506961

OAI ID

oai:frl.publisso.de:frl:6506961

Metadata visibility

public

Data visibility

public

