Machine learning approaches classify clinical malaria outcomes based on haematological parameters

Background Malaria is still a major global health burden, with more than 3.2 billion people in 91 countries remaining at risk of the disease. Accurately distinguishing malaria from other diseases, especially uncomplicated malaria (UM) from non-malarial infections (nMI), remains a challenge. Furthermore, the success of rapid diagnostic tests (RDTs) is threatened by Pfhrp2/3 deletions and decreased sensitivity at low parasitaemia. Analysis of haematological indices can be used to support the identification of possible malaria cases for further diagnosis, especially in travellers returning from endemic areas. As a new application for precision medicine, we aimed to evaluate machine learning (ML) approaches that can accurately classify nMI, UM, and severe malaria (SM) using haematological parameters. Methods We obtained haematological data from 2,207 participants collected in Ghana: nMI (n = 978), SM (n = 526), and UM (n = 703). Six different ML approaches were tested, to select the best approach. An artificial neural network (ANN) with three hidden layers was used for multi-classification of UM, SM, and uMI. Binary classifiers were developed to further identify the parameters that can distinguish UM or SM from nMI. Local interpretable model-agnostic explanations (LIME) were used to explain the binary classifiers. Results The multi-classification model had greater than 85% training and testing accuracy to distinguish clinical malaria from nMI. To distinguish UM from nMI, our approach identified platelet counts, red blood cell (RBC) counts, lymphocyte counts, and percentages as the top classifiers of UM with 0.801 test accuracy (AUC = 0.866 and F1 score = 0.747). To distinguish SM from nMI, the classifier had a test accuracy of 0.96 (AUC = 0.983 and F1 score = 0.944) with mean platelet volume and mean cell volume being the unique classifiers of SM. Random forest was used to confirm the classifications, and it showed that platelet and RBC counts were the major classifiers of UM, regardless of possible confounders such as patient age and sampling location. Conclusion The study provides proof of concept methods that classify UM and SM from nMI, showing that the ML approach is a feasible tool for clinical decision support. In the future, ML approaches could be incorporated into clinical decision-support algorithms for the diagnosis of acute febrile illness and monitoring response to acute SM treatment particularly in endemic settings.


Background
In 2018, there were 228 million cases of malaria worldwide, 93% of which occurred in the African region [1]. Furthermore, approximately 450,000 deaths were reported, of which 61% were children under 5 years old [1]. According to WHO 2018 report, over 2.7 billion US dollars were spent towards various control and elimination efforts to address the global burden of malaria [1]. This includes over 2.74 billion doses of artemisinin-based combination therapies, procured in 2017 [1]. Unfortunately, incorrect diagnosis leads to incorrect treatment. It can increase the chances of antimalarial drug resistance, or for false negative diagnosis, it may result in misdiagnosis of malaria, inappropriate treatment, and progress to severe disease or death [2][3][4]. The gold standard for malaria diagnosis is microscopy, which requires extensive training, but rapid diagnostic tests (RDTs) have become the frontline diagnostic tools for malaria because of their ease of use at point-of-care [5].
One drawback of RDTs is the emergence of gene deletions of the target antigen, histidine-rich protein (Pfhrp2/3) in the parasite genome, which render parasites undetectable by the most common RDTs [6]. Other challenges include insufficient sensitivity to detect lowlevel parasitaemia and the number of tests which need to be performed per positive result in settings with declining or low transmission [2,6,7]. Different problems are faced in non-endemic countries, where imported malaria must be suspected as a possible cause of fever before an RDT or microscopy would be performed in the first place, and failure to identify cases at first contact with health services often results in worse clinical outcomes [8,9]. Therefore, improved and complementary malaria diagnostic techniques are required, which can overcome some or all of these limitations.
Complete blood counts (CBCs) are the most commonly performed laboratory test in most hospitals in both developing and developed countries. The CBC is usually relied upon to provide clues for the diagnosis of patients where advanced methods for detection of specific diseases are lacking, with a parameter such as decreased platelet counts often associated with severe malaria (SM) [10,11]. In addition, haemoglobin (Hb) levels are very important for the classification of SM cases [12]. Indeed, the changes in haematological parameters during clinical malaria have been studied extensively to aid in the understanding of disease pathogenesis [13][14][15][16][17][18][19]. However, the potential and diagnostic value of haematological parameters measured by commonly available automated haematology analysers has not been fully studied using unbiased approaches such as machine learning (ML) techniques. These haematological parameters have the potential to be used in differentiating clinical malaria from other febrile illnesses, especially in areas where the reliability of RDTs is challenged by the high prevalence of Pfhrp2/3 deletion mutant parasites.
ML approaches use algorithms based on statistical assumptions and mathematical rules to learn patterns and produce meaningful classifications based on the association of each variable with the disease outcome [20][21][22][23][24]. These classifications can then be applied to new disease cases to make classifications on the most probable cause. This classification capability of ML has not been extensively implemented in the diagnosis of clinical malaria. To date, only a single study has reported the use of ML to diagnose malaria using clinical history and symptoms captured verbally and visually [25]. The sample size (n = 376) was very small to deduce meaningful classifications, and the author concluded that more work would be needed [25]. Despite this, there have been far reaching studies on the application of ML in other areas of malaria research [26][27][28][29][30]. The diagnosis of malaria using ML on clinical datasets has been impaired by the lack of large data, as well as difficulty in data curation. Moreover, classical modelling is prone to over-fitting or under-fitting of data [31], but recent approaches such as imputation, encoding, centering and scaling of variables, and model optimization [24] enable augmented use of ML in malaria classification.
We hypothesized that we can classify clinical malaria and non-malarial infections (nMI) with an ML approach. We first collected and curated data from 2,207 patients including nMI (n = 978), uncomplicated malaria (UM) (n = 703), and SM (n = 526). We generated ML models to classify clinical malaria (UM and SM) from nMI using haematological parameters.

Study population and sample collection
Standards for Reporting Diagnostic Accuracy Studies (STARD) guidelines [32,33] were followed in this study. The current study utilizes unpublished data of 526 patients from a previous case-control study of SM conducted by the Navrongo Health Research Centre (NHRC) located in the Kassena-Nankana Districts (KNDs) in the Upper East Region of Northern Ghana. In the original study, children with acute febrile symptoms admitted to the Navrongo War Memorial Hospital (NWMH), the only referral facility in the KNDs, were evaluated for inclusion into the study from August to December 2002 and May 2003 to April 2004. Full details of the study procedure, inclusion criteria, and demographic and clinical characteristics of SM cases may be reviewed in Oduro et al. [34].
In brief, the inclusion criteria for SM cases were (1) all children between 6 and 59 months who had fever (or history of fever in the past 24 h) and were admitted to the NWMH, (2) residence in the Navrongo Health and Demographic Surveillance System area [34], and (3) willingness of parents/caregivers to offer informed consent. Criteria for SM diagnosis and enrollment into the original study were classified as having SM by the WHO standard guidelines that include haemoglobin < 5 g/dL or haematocrit < 15% [34,35]. Ethical approval for the SM study was obtained from the NHRC Institutional Review Board (IRB), Noguchi Memorial Institute of Medical Research (NMIMR) IRB, Naval Medical Research Center IRB, and Ghana Health Service (GHS) Ethics Review Committee (ERC). Informed consent was obtained and documented, followed by administration of a questionnaire about the presenting symptoms and clinical examinations. Participants who did not consent and meet the study inclusion criteria and those who had reported taking antimalarial treatment in the past 2 weeks were excluded from the study, while those who turned out to be malaria negative by standard microscopy were withdrawn from the study. All study samples were taken prior to initiation of treatment except for samples taken for clinical monitoring during admission or for followup after discharge from the hospital.
The nMI and UM participants were recruited in a hospital-based cross-sectional study involving two hospitals: Kintampo North-Municipal Hospital, Kintampo, and Ledzokuku Krowor Municipal Assembly Hospital (LEKMA), Accra. The inclusion criteria were (1) outpatient children 1-15 years old, (2) presenting with fever or history of fever in the past 24 h or axillary temperature ≥ 38°C, (3) and (4) signed informed consent by self (adolescents) and parent/guardian. The exclusion criterion was participants with known chronic disease or history of antimalarial drug use in the past 2 weeks. Ethical approval was also obtained from NMIMR-IRB, GHS-ERC, and Kintampo Health Research Centre (KHRC) ERC. A case was defined as nMI if the individual presenting to the hospital was malaria negative by either RDTs, Taqman array, or microscopy. Clinical data such as age, sex, and body temperature and symptoms such as fever were collected on recruitment.

Sample collection procedure
Venous blood was collected in the ante-cubital fossa. Tourniquet was not applied beyond 1 min during venesection to avoid haemo-concentration, which could give erroneous results for all parameters measured. Samples were taken mostly between 8 am and 12 pm to avoid variations due to individuals' activity (such as rehydration and food intake). Samples (5 mL) were taken into K3 EDTA tubes (BD Vacutainer; Becton Dickinson, NJ, USA). Samples that could not get analysed within 2 h from the time of collection were stabilized at 2-8°C to avoid changes that could occur in some haematological parameters should the sample be left on the bench for more than 3 h. Samples were analysed not later than 24 h from the time of sample storage at 2-8°C. No capillary blood sample was taken during the study as it presents with subtle variations from venous blood parameters. CBC analysis was performed using the automated ABX Micros 60 haematology analyser, which measures white blood cell parameters, red blood parameters, and platelet parameters (Additional file 1: Table  S1). Data were manually cross-referenced twice for accuracy to ensure consistency in sample collection procedures.

Statistical classifier: median split
Kernel density estimation, which is a non-parametric technique, was used to estimate the probability density function of each haematological parameter and kernel distribution for each parameter between nMI, UM, and SM and visualized using density plots in R (R version 4.0.2). The median value within each diagnostic group (nMI, UM, and SM) was computed, and the mean of any two group medians was used for 'median split' to generate a dichotomous variable for each parameter (low and high levels representing below and above median, respectively) [36]. Contingency tables were used to summarize the relationship between clinical diagnosis (nMI, UM, and SM), and each dichotomous parameter. The generalized linear models for predictive analysis were used to explain the relationship between the clinical diagnosis and the dichotomous parameter. Odds ratios were computed through the exponent of the regression coefficients (logits) to estimate the strength of the relationship. Any OR with 95% confidence interval (CI) that includes a null value (1.0) indicated that the parameter was not significantly associated with clinical diagnosis. ANOVA was used to compare the model with the null model and chi-square test used to compute the degree of significance. All the analyses were done in Rsoftware (R version 4.0.2).

Data pre-processing and normalization
A multivariate imputation via chained equations (MICE) plot was used to visualize the missing observations in the data. It was difficult to determine whether the missing values were missing 'completely at random' or 'missing at random' or 'not at random' to enable selection of the imputation method. Therefore, the demographic/ clinical data and microscopy results were not imputed and were not used for modelling. The majority of the haematological parameters had less than 5% missing data, and the missing values were imputed using MICE package in R. Each variable in the training and test data was transformed using the Yeo-Johnson function, centred to have a mean of 0, and scaled to have a standard deviation of 1. The original dataset (before preprocessing and normalization) is available in Additional file 2: Table S2.

Machine learning
Six ML algorithms were evaluated to identify the best algorithm that can classify the binary data. These include partial least squares (PLS) logistic regression, multiple adaptive regression splines (MARS), random forest, decision trees, support vector machine, and artificial neural networks (ANN). PLS logistic regression was implemented by reducing the dimension of haematological parameters so as to increase accuracy. We used 10-fold cross-validation while tuning through 16 principal components (PC), whereby the optimal model used 2 PC. The optimal hyperparameters for MARS (with crossvalidation) were determined in a grid search of 30 different combinations of 3rd degree and sampling 1000 terms to retain the final model [37]. Decision tree was implemented with the rpart function, which performs auto-tuning with an optimal subtree of 10 total tree splits. Random forest and support vector machines were implemented by first performing a grid search to identify the optimal hyperparameters followed by classification analysis. Three ANN were developed, one multiclassification ANN (nMI vs UM vs SM) and two binary classifications denoted as ANN (UM and nMI) and ANN (SM and nMI). For each ML model, the data were split into 80% training and 20% testing. The outcome was the clinical diagnosis of the participant (as concluded by the attending clinicians) having either UM or nMI or SM. Haemoglobin and haematocrit levels were not included in the modelling because they are used to support the diagnosis of malaria [10,19,35,38].

Hyperparameter tuning for artificial neural networks
The ANN was composed of an input layer of 15 haematological parameters. The loss was computed using categorical cross-entropy for the multi-classifier and binary cross-entropy for binary classifiers, while accuracy was used as the main evaluation metric. During training, the 80% training data was further split into 70% training and 30% validation with randomization ( Fig. 1). Tensor board visualizations were used to check the dynamic graphs of our training and test metrics. Hyperparameters were tuned to identify the optimal model parameters for each classification. A hyper-grid was developed that adjusts the model capacity, normalization term, kernel regularization, and learning rate. To maximize the validation error performance, we tuned 12, 32, 64, 128, 256, and 512 rectified linear units (ReLU) in three hidden layers. We used batch normalization on each hidden layer for gradient propagation and performance improvement. We varied the dropout rate from 0.1, 0.2, 0.3, and 0.4 in all the three layers to identify the best dropout regularization that prevents the model from latching to happenstance patterns that are not significant. We used 'Adam' as the optimizer, but we varied the learning rate (0.1, 0.05, 0.001, and 0.0001) to find a global minimum. The tfruns R package was used to implement the hyper-grid in R-software, using 500 epochs, batch size of 64, and validation split of 0.3. These Keras models were initialized for all the three classifications to select the optimal model.

Model evaluations
Yardstick package was used to perform classifications on the test data as well as compute the performance of the model. The confusion matrix, accuracy, area under the receiver operating characteristic curve (AUC), precision and recall, and F1 score were the metrics used to evaluate performance. The F1 score is a measure of test data accuracy, which is a weighted average between precision and recall. To explain the model, we used local interpretable model-agnostic explanations (LIME Package in R) [39]. The classification model was set up, and an 'explainer' of the classifying model was initiated using the training data and the model output classifications. The explainer was used to explain the results of the test dataset as classification explanations (feature weights). The feature weights were used to build a heatmap for each ANN indicating how each feature explains the model.

Effect of patient age and sampling location on the model predictions
To test if patient age and sampling location significantly affected the models, we used three models: (1) a model for all the UM and nMI cases (n = 1681), (2) a model for UM and nMI from Kintampo cases only (n = 726), and (3) a model for only Kintampo cases and ages under 4 years (n = 416). We tested the possibility of using the ANN to evaluate the models but there was some level of over-fitting and under-fitting of the 2nd and 3rd models, due to sample size limitation. Therefore, random forest was subsequently used, because of (1) its robustness to smaller sample size with minimal over-fitting of the data and (2) its ability to reduce the high variance from decision trees by combining several trees into one ensemble tree [40].

Statistical analysis
The clinical categorical data was analysed using Pearson's chi-square while the continuous data such as the haematological parameters were analysed using the Kruskal-Wallis test with Dunn's post hoc tests across the three groups (UM, SM, and nMI). All tests were two sided, and statistical significance was set at P < 0.05 for all analyses with adjustment for multiple testing. Data analyses were performed using R-software (R version 4.0.2), R-studio (version 1.1), and Python (version 2.7). The R codes with the methods, including the curated data files, can be found on github: https://github.com/ misita-falcon/Machine-Learning-in-Clinical-Malaria.

Haematological parameters vary between nMI, UM, and SM
Median values for all the haematological parameters were significantly different among nMI, UM, and SM (P < 0.001) ( Table 2), but most of the parameters do not show distinct distributions between the different clinical diagnosis groups (Fig. 3). More so, Dunn's post hoc tests indicated that platelet distribution width, percentage neutrophils, and percentage lymphocytes were not significantly different between the nMI and SM (Table 2). Similarly, the pairwise comparisons showed that mean cell volume, neutrophil count, and mean platelet volume were not significantly different between nMI and UM ( Table 2). Despite the statistical test, we hypothesized that the median differences for each parameter cannot be used to confidently classify the disease outcomes.
To further confirm this hypothesis, the median was used to split the variables into categorical variables (low and high levels). The relationship or predictive value of the categorical parameters to accurately classify the clinical diagnosis was determined using contingency tables (Additional file 2: Table S3). The percentage number of individuals who had low levels of each parameter and were classified with nMI ranged from 29 to 70% (UM group) and 7 to 82% (SM group) (Fig. 4a). Comparatively, the percentage of individuals who had low levels of each parameter and were classified with UM ranged between 30 and 71%, while the percentage of individuals who were classified with SM ranged between 17 and 91% (Fig. 4b). There were similar trends for the percentage number of individuals who had high levels of each  parameter and were classified with either nMI, UM, or SM (Fig. 4c, d).
Additionally, we determined whether the levels could predict whether an individual has UM or SM using odds ratios. First, we predicted UM, and majority of the parameters were associated with clinical diagnosis of UM and nMI (P < 0.001), except mean cell volume, lymphocyte percentage, mixed cell counts, and neutrophil counts (Additional file 1: Table S4). The parameters that were not associated for nMI-SM category were lymphocyte counts, mean corpuscular Hb, lymphocyte percentage, mixed cell counts, and neutrophil counts (Additional file 1: Table S4). Furthermore, some of the haematological parameters had a 95% confidence interval that included the null value (1) when evaluating the odds ratios, which signifies that they are not significantly associated with clinical diagnosis (Additional file 1: Table S4).

Machine learning attained over 77.7% accuracy in classifying clinical malaria from nMI
Since there is no clear distinction between the distributions and the inability of the median-based categories to clearly classify the participant's clinical diagnosis, we sought to evaluate six ML approaches to classify clinical (See figure on previous page.) Fig. 2 Clinical manifestations using laboratory diagnosis compared to various suspected infections by clinicians. Blood, urine, and stool samples were collected from majority of the individuals who were categorized as nMI. Cultures of either blood, urine, or stool were performed, depending on the clinician's request and the suspected illness. The suspected organisms were categorized as either bacteria, viral, fungi, and protozoan or a combination of bacteria/protozoan, fungi/protozoan, viral/protozoan, and viral/bacteria. Laboratory results confirmed only 4% of the cases with the majority being undetermined/not available/not known (96%, 937/978). The major organisms determined to be present include dengue virus, Staphylococcus aureus, Salmonella typhi, Streptococcus pneumonia, and Coxiella burnetii. a shows the absolute counts of each diagnosed organism coloured by the suspected organisms while b shows the proportion of each diagnosed organism coloured by the suspected organism. HIV stands for Human immunodeficiency virus, URTI for upper respiratory tract infection, Pf for Plasmodium falciparum and SPP for species Table 2 Comparison of median and interquartile ranges in haematology values measured in nMI, UM, and SM cases    Table  S5). The SVM approach and the ANN generated the overall best classification outcome. Hyperparameter tuning for the ANN (n = 55,290 combinations) showed that the optimal model for multiclassification had 0.862 training accuracy with a model capacity of 3 layers (128, 64, and 16), with dropouts of 0.4 for layer 1, 0.3 for layer 2, and 0.4 for layer 3, and learning rate of 0.001 (as represented in Additional file 1: Fig. S2). The optimal model (n = 55,290 combinations) for ANN (nMI vs SM) with 0.985 accuracy had a model capacity of 3 layers (16, 128, and 256 RELU units, respectively), the dropout rate was 0.2 and 0.4 for the first two layers and the last layer had 0.1, and a learning rate of 0.0001. The optimal model (n = 55,290 combinations) for ANN (nMI vs UM) with 0.856 training accuracy had a model capacity of 3 hidden layers of 256, 64, and 16 RELU units, respectively; the dropout rate was 0.1 for the first and last layer and 0.3 for the second layer, and a learning rate of 0.0001. Training and validation history plots for the ANN showed good levelling off for accuracy and loss, as well as acceptable divergence between training loss/accuracy and validation loss/accuracy for all the three models (Additional file 1: Fig. S3).
Also, the history plots suggest that there was near zero over-fitting or under-fitting of the data as indicated by closeness of the training and validation curves (Additional file 1: Fig. S3). The ANN (UM vs. nMI) achieved 0.856 training accuracy and 0.842 validation accuracy, while the testing accuracy of the model was 0.801 (kappa 0.583) ( Table 3). The training and testing accuracies demonstrate the confidence of the networks in classifying UM. The ANN (SM vs nMI) achieved a higher accuracy (≥ 0.96) for training, validation, and testing accuracy (Table 3). Both ANN had an F1 score of above 0.747, which means the model can be used for the Fig. 4 Non-symmetrical predictive values of clinical diagnosis using median split (high vs low levels) of each haematological parameter. A 'median split' was used to divide each quantitative parameter into categorical variables by the median value (calculated as a mean of nMI and UM or SM median value shown in Table 2). The predictive values are calculated from contingency tables (Additional file 2: Table S3). a The percentage predictive value in predicting nMI from low levels. b Percentage predictive value in predicting SM or UM using the low levels. c Percentage predictive value in predicting nMI using high levels. d Predictive values of UM or SM using high levels classification of clinical malaria (Table 3). Since the binary classifiers had the best performance, we also performed a multi-classification analysis to assess the ability of the ANN to differentiate among UM, SM, and nMI. The data available for the multi-classification model was 2,207 samples, which were split to 80% training (n = 1,766) and 20% testing (n = 441). The training data was further split to 70% (n = 1,236) training and 30% (n = 530) cross-validation with accuracies of 0.862 and 0.828, respectively. The test accuracy was 0.853; kappa, precision, recall, and F1 score of the model was > 0.768 ( Table 3). The accuracy of multi-classification model provides confidence in the binary classifications.

Diagnostic value of the models using ROC curves
Having shown the accuracy of the models, we determined the ROC curves of ANN (UM vs nMI) and ANN (SM vs nMI) to show the diagnostic ability of these binary classifiers. Both classifiers had very good performance with an AUC of 0.866 for ANN (UM vs nMI) and AUC of 0.983 for ANN (SM vs nMI) (Fig. 5 and Table 3). This showed that the models could be used to distinguish individuals with SM or UM from those with nMI. The cut-offs for UM show that there is a trade-off in sensitivity and specificity as the cut-off increases or decreases, which is not the case for SM. These results could frame the clinical utility of the models and provide a benchmark for future studies.

Platelet and RBC counts classify clinical malaria from nonmalaria infections
The models were investigated to identify which haematological parameters were classified to be important for either SM, UM, or nMI using local interpretable modelagnostic explanations (LIME). Case by case analysis of the individuals showed that some haematological parameters are important classifiers of UM (Additional file 1: Fig. S4). Case by case analysis was merged into heatmap to generate a consolidated picture of useful parameters for classification (Fig. 6). The top three parameters that had low feature weights for UM are platelet counts, RBC counts, and lymphocyte percentages (Fig. 6a). Based on the order of importance, the top three parameters that were important for SM classification include RBC counts, platelet counts, and mean platelet volume (Fig. 6b). This shows that both platelet and RBC counts are important parameters for clinical malaria while the lymphocyte percentages were unique for UM. These parameters might be used to classify clinical malaria cases from nMI, with a very good diagnostic ability as shown by the ROC analysis (Fig. 5).

Patient age and sampling location do not affect the model classifications
We further tested if the models are agnostic to age and location variance. There was a significant difference in patient age between nMI and UM (P < 0.001), but there  Platelet and RBC counts identified as classifiers of both UM and SM. The Keras model was explained using local interpretable modelagnostic explanations (LIME Package in R-software). The explainer results of the test data, which are represented as feature weights, were extracted from the explainer and used to plot the heatmaps to show a consolidated picture of the importance of each haematological parameter. The weights that are < − 0.1 indicate that they are low during UM or SM. a The heatmap shows that platelet, RBC, and lymphocyte percentages/counts can classify UM and b shows the haematological parameters that can classify SM, and they include RBC counts, platelet counts, mean platelet volume, and mean cell volume was no significant difference in samples within Kintampo as well as children under the age of 4 years (Fig. 7a, c, e; Additional file 1: Fig. S5 & S6). The performance accuracy of the random forest models was 0.806, 0.767, and 0.768 for models 1, 2, and 3, respectively (Fig. 7b, d, f). The most important parameters that were featured across the three models were platelet and RBC counts, which are similar to the top two parameters identified by the ANN. Therefore, the data illustrates that age and location do not affect model classifications, and the platelet or RBC counts determined by ANN can be used to reliably classify clinical malaria from nMI in these datasets.

Discussion
Automated CBC is one of the blood tests routinely performed for children presenting to health facilities with fever. However, CBC analysis generates a significant amount of data on a range of haematological parameters, and the data is underutilized with only Hb and Hct levels being routinely used as an indicator of clinical malaria. Thus, an automated algorithm to detect malaria based on the haematological parameters as outlined in this study could have great value as a complementary malaria diagnostic strategy, particularly at frontline health centres where CBC is routinely performed. Such an algorithm also has the added value of enabling the monitoring of treatment outcomes for in-patients.
In malaria-endemic settings, malaria rapid diagnostic tests (mRDTs) have revolutionized diagnosis and significantly reduced presumptive treatment, particularly in rural settings where trained microscopists are lacking [3]. However, reports of emerging Pfhrp2/3 gene deletions threaten the future reliability of the RDTs. False negative RDT results are also known to occur in lowdensity infections [2,6,7]. Thus, an approach that is automated and agnostic to parasite genetic variation is critical both as a fail-safe and a surveillance strategy for false negative mRDTs (which might occur due to supply chain mismanagement or gene variation) [6]. In very low transmission settings, ML models have the potential to replace the primary use of mRDTs when the diagnostic yield of mRDTs becomes very low (i.e. many mRDTs needed to detect a single case of malaria). In non-endemic settings where malaria may occur in immigrants and non-immune travellers, the  Fig. 7 Classification of haematological parameters using random forest shows that patient age and sampling location do not affect the ML models. Three models were generated: a a model for all the UM and nMI cases (n = 1681), which show a significant difference in patient age, while b shows the impurity-based measurement of the feature importance of the model; c a model for UM and nMI from Kintampo cases only (n = 756), which do not show any significant difference between the patient age, and d shows the feature importance of the model; and e a model for only Kintampo cases and ages under 4 years, whereby there was no significant difference between the nMI and UM (n = 416) and f shows the feature importance of the model. The samples for each model were split 80% for training and 20% for testing. The accuracy of the models was 0.806, 0.767, and 0.768, respectively. The most important feature across the three models was platelet and RBC counts models may allow another fail-safe mechanism in case the diagnosis of malaria was not suspected by clinicians and malaria RDT or microscopy was not performed. Despite these advantages, there would be a little extra cost associated with incorporating the algorithm and an automated message into haematology analyser output, a message that can prompt clinicians to consider malaria in the presence of suggestive haematological features. Previous ML studies have looked into haematological parameters more generally and to classify sickle cell anaemia using deep convolutional networks [41,42], but did not classify clinical diagnosis. For the first time, ML approaches that can classify infections in children based on haematological parameters have been generated. Six different ML methods were evaluated, and they were all shown to classify clinical malaria from nMI with high accuracy especially the SVM and the ANN. We used the ANN to deconvolute the results: it identified platelet and RBC counts as the top features in classifying both UM or SM from nMI. Low RBC counts can be attributed to extensively parasitized RBCs, which are sequestered during SM [43]. This highlights the significance of RBC counts during Plasmodium falciparum malaria infections. In most occasions except cerebral malaria, SM is associated with anaemia due to RBC lysis during parasite invasion as well as many other RBC abnormalities [44]. This makes the diagnosis of SM much easier than UM, whereby one parameter, such as Hb level of < 5 g/dL, can diagnose or classify the disease.
Cohen et al. analysed data from 680,964 individuals with fever and confirmed that majority of antimalarial drugs are given to malaria-negative individuals [45]. Overtreatment indicates that most nMI can go without being treated, for their true cause, which is also possible for UM and this can lead to drug resistance. Therefore, the difference between febrile outpatient infections is far more challenging, especially between nMI and UM due to similarity in clinical presentations. In large population studies, values of studied metrics can be significant but they do not necessarily distinguish the populations as either nMI or UM as observed in this study. But, using the ML approach shown here, distinguishing the nMI and UM can be improved by combining all haematological parameters and learning the data-patterns before making classifications. The predictions made by ML are more accurate and reliable and can be improved by analysing more datasets. Lymphocyte counts/percentage were identified to be affected during UM and can be used to distinguish UM from nMI, mainly because individuals with malaria generally have a distinct immune response compared to nMI individuals [27,46,47].
Previous work in our laboratory showed differences in haematological presentation among areas of varying transmission intensity in Ghana [48]. To show that differences in age and transmission zones (sampling location) are not driving our diagnostic classifications, we down-sampled the data and used random forest to perform the classifications. The results showed that platelet and RBC counts were the key features in classifying UM and nMI regardless of age and sampling location of the participants. There were differences in the top three important features between the random forest and ANN, but this could be due to the differences in the approach of each algorithm [23,49]. This illustrates that patient age and location do not substantially influence the diagnostic classifications in this study. The ROC curves further showed that the models could be used for diagnosis with very reliable AUC values.
There are limitations to be considered in the use of this ML approach in routine diagnosis and the generalization of our approach. First, the models can distinguish between nMI and clinical malaria, but whether they can be used to distinguish the clinical disease state will depend on the pre-test probability or prevalence of malaria in different endemic settings. Second, all study subjects being Ghanaian children may limit the generalizability of the models to other countries; this is also the case for the limited range of SM manifestations in our dataset and the spectrum of laboratory-confirmed nMI. The few nMI cases that were clearly diagnosed and still grouped/retained as nMI may also present minimal bias to the models. Lastly, the study did not have adults > 15 years to comparatively understand the role of age in differentiating clinical malaria based on haematological parameters. Therefore, we recommend that more studies are needed to inform the broader utility of this work. Despite that only 4.6% (75/1645) of the cases were discordant between microscopy and RDT, probably due to hrp2/hrp3 deletions, although there is an insignificant chance that misclassification of malaria could have had an impact on our study. These limitations will be taken into account for further studies to inform the broader clinical utility of this work.

Conclusions
Fever is the most common symptom reported in sSA, and correct diagnosis of the implicated pathogen is of high importance for precision medicine. Personalized treatment reduces overtreatment, decreases malaria mortality and antimalarial resistance. This report demonstrates proof-of-principle that ML can be used to distinguish clinical malaria from nMI using routine haematological data. Case by case analysis showed that the models can make classifications based on combination of three parameters: platelet and RBC counts, lymphocyte counts/percentage, and mean platelet volume. These could be used for precision diagnosis of an individual's risk of having malaria, to inform the need for confirmatory diagnosis by microscopy or to prompt investigation for other diagnoses when malaria is unlikely. Further work is to calibrate and improve the classification capability of the model using more data from other geographical and transmission settings, demographic groups, coinfections, and different disease severities. Our findings hold promise for the design of clinical software to support the diagnosis of malaria in the WHO African region and might also prove useful for the diagnosis of malaria in returning travellers from nonendemic countries.
Additional file 1: Table S1. The list of haematological parameters adopted from laboratory procedure manual by the CDC [50]. Table S4. The odds ratio of median categories providing the odd of being diagnosed with either nMI, UM, and SM. The median categories were; low and high levels. Table S5. Performance evaluation of six machine learning models to classify clinical malaria outcomes. Fig. S1. Word cloud of clinical manifestations using clinicians/doctors notes or suspected infections. Fig. S2. Artificial Neural Network Schematic. Fig. S3. Plot for the training and validation history of the ANN. Fig. S4. Case by case analysis of the classification capability of the ML models. Fig. S5. Density estimates of the haematological parameters between nMI and UM cases for sub-sampled data from Kintampo only. Fig. S6. Density estimates of the haematological parameters between nMI, and UM cases for subsampled data from Kintampo only, as well limit of children under 4 years of age.
Additional file 2: Table S2. Clinical and raw haematological data of study participants. Table S3. The median splits predictive values on the clinical diagnostic categories (nMI, UM, and SM) of the study participants.