Novel statistical approaches to identify risk factors for soil-transmitted helminth infection

Soil-transmitted helminths (STH) are parasitic intestinal worms that infect almost a quarter of the global population. Sustainable control of STH requires understanding the complex interaction of factors contributing to transmission. Identifying STH risk factors has mainly relied on logistic regression models, where the underlying assumption of independence between variables is not always satisfied. Previously demonstrated risk factors, including water, sanitation and hygiene (WASH) and socioeconomic status, are intrinsically linked, as are environmental factors such as soil, climate and land attributes. Although many studies have investigated such associations, the same risk factors are not consistently identified. Alternative statistical techniques such as recursive partitioning and Bayesian networks can handle correlated data, but there are no published studies comparing these methods with logistic regression in the context of STH infection.

Baseline cross-sectional data from school-aged children in the (S)WASH-D for Worms study were used to compare the risk factors identified by modelling the same data with mixed-effects logistic regression, recursive partitioning and Bayesian networks. Outcomes were infection with Ascaris spp. and any hookworm species (Necator americanus, Ancylostoma duodenale, and A. ceylanicum).

Logistic regression identified fewer significant risk factors overall, and some were omitted due to correlation. For Ascaris spp., vegetation was identified across all techniques, while for hookworm only cleaning oneself with water after defecating was identified across all techniques. For both outcomes, recursive partitioning identified the most WASH and demographic risk factors, while Bayesian networks identified the most environmental risk factors. Model performance was similar across all techniques, with higher sensitivity and lower specificity for Ascaris spp. than for hookworm. In addition, the classification trees produced by recursive partitioning visualised potentially at-risk population sub-groups. Bayesian networks could visualise relationships between variables and also model scenarios in which the probability of responses for a variable of interest was modified and the resulting change in infection predicted.
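
As a rough illustration of the scenario modelling described for Bayesian networks, the sketch below builds a toy three-node network (hygiene practice and vegetation as parents of infection) and recomputes the marginal infection probability after changing the distribution of one parent. The network structure, variable names and every probability are hypothetical values chosen for illustration only; they are not taken from the study and do not reflect its fitted model.

```python
from itertools import product


def infection_probability(p_hygiene, p_vegetation, cpt):
    """Marginal P(infected) for a toy network: hygiene -> infected <- vegetation.

    p_hygiene    -- P(child cleans with water after defecating), hypothetical
    p_vegetation -- P(child lives in a high-vegetation area), hypothetical
    cpt          -- dict mapping (hygiene, vegetation) to P(infected | parents)
    """
    total = 0.0
    for hygiene, vegetation in product((True, False), repeat=2):
        p_h = p_hygiene if hygiene else 1.0 - p_hygiene
        p_v = p_vegetation if vegetation else 1.0 - p_vegetation
        # Sum over all parent states, weighting by their joint probability.
        total += p_h * p_v * cpt[(hygiene, vegetation)]
    return total


# Hypothetical conditional probability table (NOT study results).
cpt = {
    (True, True): 0.20,
    (True, False): 0.10,
    (False, True): 0.40,
    (False, False): 0.25,
}

# Baseline: half of children report the hygiene practice.
baseline = infection_probability(0.5, 0.6, cpt)
# Scenario: the hygiene practice is raised to 90% of children.
scenario = infection_probability(0.9, 0.6, cpt)

print(f"baseline P(infected) = {baseline:.3f}")
print(f"scenario P(infected) = {scenario:.3f}")
```

In practice a dedicated Bayesian network package would learn the structure and probabilities from data; the hand-written table here only shows the mechanics of modifying one variable and reading off the predicted change in infection.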

This study adds to the limited body of evidence exploring alternative data modelling approaches in identifying risk factors for STH infection. Our findings suggest these approaches can provide novel insights for more robust interpretation.

About Jessica

Jessica is an Honours student at the Research School of Population Health, supervised by A/Prof. Susana Vaz Nery, A/Prof. Alice Richardson and Dr. Naomi Clarke, with support from A/Prof. Colleen Lau and Dr. Helen Mayfield. Throughout her degree, Jessica has pursued several interests from biology to science communication. She has previously completed research projects with the Research School of Biology, CRAHW, CMHR and DGH.