Can artificial intelligence outperform traditional regression in health utility mapping? Our research explores how Symbolic Regression (SR), an AI-driven approach that automatically discovers mathematical relationships, compares with Ordinary Least Squares (OLS) regression when mapping quality-of-life data to EQ-5D utilities.
The study aimed to evaluate whether AI-based Symbolic Regression can produce more accurate and interpretable mapping algorithms than standard regression models. Specifically, we focused on converting EORTC QLQ-C30 data into EQ-5D utility values for patients with non-small cell lung cancer (NSCLC), a common and critical task in health-economic modelling.
We combined data from three randomized controlled trials:
- Jang (2010)
- Crott (2018)
- Khan & Morris (2014)
We then compared four Symbolic Regression platforms:
- TuringBot
- Mathematica® (Data Modeler®)
- GPlearn
- PySR
Each model was evaluated using R², root mean squared error (RMSE), mean absolute error (MAE), and model complexity to assess both accuracy and interpretability.
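As a rough illustration of the accuracy metrics, the sketch below computes R², RMSE, and MAE for a set of predicted utilities against observed ones. The function name `mapping_metrics` and the utility values are hypothetical, chosen only for the example; they are not the study's code or data. Model complexity is left out because each SR platform measures it in its own way.

```python
import numpy as np


def mapping_metrics(observed, predicted):
    """Return R², RMSE and MAE for predicted versus observed utilities."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted

    # R² = 1 - (residual sum of squares / total sum of squares)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)

    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
    }


# Made-up EQ-5D utilities for illustration only; not data from the study.
observed = [0.80, 0.60, 0.70, 0.50]
predicted = [0.75, 0.65, 0.70, 0.40]

metrics = mapping_metrics(observed, predicted)
print(metrics)
```

Running the same function on the predictions of each candidate model (OLS or any SR platform) gives a like-for-like comparison, since all three metrics depend only on the observed and predicted utilities.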
We found that:
- Mathematica’s Genetic Programming–based SR consistently outperformed both OLS and other SR platforms.
- It achieved up to 26% higher R², produced mean utility estimates closer to the observed data, and maintained comparable variability.
- SR methods were more effective at identifying key explanatory variables, offering valuable insights into underlying health-state relationships.
Symbolic Regression, particularly via Mathematica’s Genetic Programming, offers a powerful and flexible alternative to traditional regression approaches for mapping studies.
By improving predictive accuracy and transparency, SR has the potential to reshape how utility mapping is conducted in health technology assessment (HTA).
Future research will explore Bayesian, neural network, and transformer-based AI methods across a wider range of disease areas to further advance mapping science.