Predictive attributes for developing long COVID: A study using machine learning and real-world data from primary care physicians in Germany

Kessler, Roman; Philipp, Jos; Wilfer, Joanna; Kostev, Karel

doi:10.3390/jcm12103511


Journal Article

MPS-Authors

Kessler, Roman
Max Planck Research Group Learning in Early Childhood, MPI for Human Cognitive and Brain Sciences, Max Planck Society;

Fulltext (public)

Kessler_2023.pdf
(Publisher version), 2MB

Citation

Kessler, R., Philipp, J., Wilfer, J., & Kostev, K. (2023). Predictive attributes for developing long COVID: A study using machine learning and real-world data from primary care physicians in Germany. Journal of Clinical Medicine, 12(10): 3511. doi:10.3390/jcm12103511.

Cite as: https://hdl.handle.net/21.11116/0000-000D-3E48-E

Abstract

(1) Background: In the present study, we used patient medical histories from a panel of primary care practices in Germany to predict post-COVID-19 conditions in patients after a COVID-19 diagnosis and to evaluate the factors associated with these conditions using machine learning methods. (2) Methods: Data retrieved from the IQVIA™ Disease Analyzer database were used. Patients with at least one COVID-19 diagnosis between January 2020 and July 2022 were selected for inclusion in the study. Age, sex, and the complete history of diagnoses and prescription data before COVID-19 infection at the respective primary care practice were extracted for each patient. A gradient boosting classifier (LGBM) was deployed. The prepared design matrix was randomly divided into training (80%) and test (20%) data. After optimizing the hyperparameters of the LGBM classifier by maximizing the F2 score, model performance was evaluated using several test metrics. We calculated SHAP values to evaluate the importance of the individual features and, more importantly, the direction of influence of each feature in our dataset, i.e., whether it is positively or negatively associated with a diagnosis of long COVID. (3) Results: In the training and test data sets, respectively, the model showed a high recall (sensitivity) of 81% and 72% and a high specificity of 80% and 80%; this was offset, however, by a moderate precision of 8% and 7% and an F2 score of 0.28 and 0.25. The most common predictive features identified using SHAP included COVID-19 variant, physician practice, age, distinct number of diagnoses and therapies, sick days ratio, sex, vaccination rate, somatoform disorders, migraine, back pain, asthma, malaise and fatigue, as well as cough preparations.
(4) Conclusions: The present exploratory study is an initial investigation into predicting the risk of developing long COVID after COVID-19 infection from patient histories recorded in electronic medical records before infection in primary care practices in Germany, using machine learning. Notably, we identified several predictive features for the development of long COVID in patient demographics and their medical histories.
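
The relationship between the reported recall, precision, and F2 score can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names `fbeta` and `precision_from_rates` are hypothetical, the inputs are the rounded percentages from the abstract (so the results only approximate the reported 0.28 and 0.25), and the 2% prevalence in the second function call is an illustrative assumption that the abstract does not state.

```python
def fbeta(precision, recall, beta=2.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    With beta=2, recall is weighted more heavily than precision.
    """
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


def precision_from_rates(sensitivity, specificity, prevalence):
    """Precision implied by sensitivity, specificity, and outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Rounded values taken from the abstract (training set, test set).
    print("F2 train:", round(fbeta(0.08, 0.81), 3))
    print("F2 test: ", round(fbeta(0.07, 0.72), 3))

    # ASSUMPTION: a 2% prevalence is hypothetical, chosen only to show how
    # a rare outcome keeps precision low despite 80% specificity.
    print("Implied precision:", round(precision_from_rates(0.72, 0.80, 0.02), 3))
```

Because the F2 score weights recall four times as heavily as precision in the denominator, a model tuned on it can reach high sensitivity while precision stays low, which is the pattern expected when the predicted outcome is rare.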