
BACKGROUND: Case finding in Kenya faces multiple challenges. As Kenya approaches its first 95 target, the overall positivity rate among the remaining population is below 1%. Moreover, testing volumes face downward pressure due to supply disruptions and a desire to redeploy resources. Given these trends, a team in Kenya developed and deployed a machine learning risk profiling model to maximize yield from scarce testing resources.
METHODS: Using client-level data captured in electronic medical records (EMR) from June to November 2022, we applied supervised machine learning algorithms to predict the probability that a client would test positive for HIV. Although data was available from earlier periods, client behavioral variables were available only from June 2022. The dataset included 167,511 test results, of which 5,718 (3.4%) were positive. Of approximately 70 variables, 30 were dropped because data was missing for most observations or because of zero variance. To address missing data among the remaining variables, we generated three versions of the dataset and modeled each separately: (1) rules-based imputation, using the mean and mode; (2) modeled imputation, using Multiple Imputation by Chained Equations (MICE); and (3) no imputation. For each version, we added binary flags indicating whether each value was present or missing in the original record. We applied a train-validate-test split with 60-20-20 proportions. All imputation parameters were calculated from the 60% training subset alone to avoid leakage. Predictor variables included demographic and behavioral variables, plus location-specific estimates of risk factor prevalence. We applied XGBoost, Random Forest, and AdaBoost algorithms, implemented in R, each with a comprehensive hyperparameter grid search.
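The leakage-avoidance step described in the methods can be illustrated with a minimal sketch. It is written in Python for illustration only (the study itself was implemented in R), and it covers only the rules-based mean imputation variant, not MICE. The column names (`age`, `partners`), the toy records, and the helper functions are hypothetical and do not come from the study data.

```python
import random
from statistics import mean


def split_60_20_20(rows, seed=0):
    """Shuffle rows and split them into train/validate/test (60-20-20)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 6 // 10
    n_val = len(rows) * 2 // 10
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_mean_imputer(train_rows, columns):
    """Compute per-column means from the training subset only."""
    return {
        c: mean(r[c] for r in train_rows if r[c] is not None)
        for c in columns
    }


def apply_imputer(rows, means):
    """Add a binary missing flag per column, then fill gaps with train means."""
    out = []
    for r in rows:
        new = dict(r)
        for c, m in means.items():
            new[c + "_missing"] = 1 if r[c] is None else 0
            if r[c] is None:
                new[c] = m
        out.append(new)
    return out


# Hypothetical toy records: every fourth record lacks "age".
records = [
    {"age": None if i % 4 == 0 else 20 + i, "partners": i % 3}
    for i in range(10)
]

train, validate, test = split_60_20_20(records)

# Imputation parameters come from the 60% training subset alone.
means = fit_mean_imputer(train, ["age", "partners"])

train_imp = apply_imputer(train, means)
validate_imp = apply_imputer(validate, means)
test_imp = apply_imputer(test, means)
```

The same fitted `means` are reused for the validation and test subsets, so no statistic from held-out records influences the values the model sees during training.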
RESULTS: The XGBoost model performed best among all model types (AUC 0.878). The model successfully concentrated positive tests among the highest risk scores: 75% of all positive tests occurred among the 18.6% of tests with the highest risk scores. Table 1 below shows results by risk category.

| Risk Category | Number of Tests | Cumulative Percent of All Tests | Number of Positive Tests | Positivity Rate | Cumulative Percent of Positive Tests |
|---|---|---|---|---|---|
| Highest Risk | 1,876 | 5.6% | 572 | 30.5% | 50% |
| High Risk | 4,350 | 18.6% | 286 | 6.6% | 75% |
| Medium Risk | 12,681 | 56.4% | 228 | 1.8% | 95% |
| Low Risk | 14,594 | 100% | 57 | 0.4% | 100% |
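The percentage columns in Table 1 follow directly from the test and positive counts in each risk category. The short Python sketch below reproduces that arithmetic from the counts reported in the table; the function and variable names are illustrative, and Python is used only for illustration (the study itself was implemented in R).

```python
# (risk category, number of tests, number of positive tests) from Table 1
tiers = [
    ("Highest Risk", 1876, 572),
    ("High Risk", 4350, 286),
    ("Medium Risk", 12681, 228),
    ("Low Risk", 14594, 57),
]


def summarize(tiers):
    """Compute positivity rate and cumulative percentages for each category."""
    total_tests = sum(tests for _, tests, _ in tiers)
    total_pos = sum(pos for _, _, pos in tiers)
    cum_tests = 0
    cum_pos = 0
    rows = []
    for name, tests, pos in tiers:
        cum_tests += tests
        cum_pos += pos
        rows.append({
            "category": name,
            "positivity_pct": 100 * pos / tests,
            "cum_tests_pct": 100 * cum_tests / total_tests,
            "cum_pos_pct": 100 * cum_pos / total_pos,
        })
    return rows


summary = summarize(tiers)
for row in summary:
    print(
        f'{row["category"]}: positivity {row["positivity_pct"]:.1f}%, '
        f'cumulative tests {row["cum_tests_pct"]:.1f}%, '
        f'cumulative positives {row["cum_pos_pct"]:.0f}%'
    )
```

Read cumulatively, the first two rows show that testing only the 18.6% of clients with the highest scores would have captured about 75% of the positive results in this table.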

CONCLUSIONS: Machine learning risk profiling improves care providers' efficiency by maximizing yield while minimizing the number of tests performed. This implementation supports Kenya’s continued progress toward controlling the HIV epidemic in an efficient and responsive manner.
