Interpretable and accurate prediction models for metagenomics data
Edi Prifti, Yann Chevaleyre, Blaise Hanczar, Eugeni Belda, Antoine Danchin, Karine Clément, Jean-Daniel Zucker
Received Date: 17th September 2018
Biomarker discovery from metagenomic data is increasingly used for patient diagnosis, prognosis and risk evaluation. Selected groups of microbial features provide signatures that characterize host disease states such as cancer or cardio-metabolic diseases. Yet current predictive models stemming from machine learning still behave as black boxes. Moreover, they seldom generalize well when learned on small datasets. Here, we introduce an original approach that focuses on three models inspired by microbial ecosystem interactions: the addition, subtraction, and ratio of microbial taxon abundances. Although extremely simple, these models perform surprisingly well, with performance comparable to or better than that of Random Forest, SVM or Elastic Net. Besides being interpretable, such models allow biological information to be distilled from the core predictive variables. Collectively, this approach supports diagnostic decisions that are both reliable and trustworthy, in line with the societal and legal pressures that require explainable AI models in the medical domain.
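To make the three model families concrete, the following is a minimal sketch of how an addition, subtraction, or ratio score over taxon abundances could be computed and thresholded. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names, the taxon labels, the `eps` guard against division by zero, and the use of a single decision threshold are all hypothetical.

```python
from typing import Mapping, Sequence


def addition_score(abundance: Mapping[str, float], taxa: Sequence[str]) -> float:
    """Sum the abundances of the selected taxa (assumed form of the addition model)."""
    return sum(abundance[t] for t in taxa)


def subtraction_score(
    abundance: Mapping[str, float],
    plus: Sequence[str],
    minus: Sequence[str],
) -> float:
    """Summed abundance of one taxon group minus that of a second group (assumed form)."""
    return addition_score(abundance, plus) - addition_score(abundance, minus)


def ratio_score(
    abundance: Mapping[str, float],
    numerator: Sequence[str],
    denominator: Sequence[str],
    eps: float = 1e-9,  # hypothetical guard so an empty or absent group does not divide by zero
) -> float:
    """Summed abundance of one taxon group divided by that of a second group (assumed form)."""
    return addition_score(abundance, numerator) / (
        addition_score(abundance, denominator) + eps
    )


def classify(score: float, threshold: float) -> bool:
    """Assign the positive class when the score exceeds the threshold (assumed decision rule)."""
    return score > threshold


# Hypothetical relative abundances for a single sample.
sample = {"taxon_a": 0.5, "taxon_b": 0.25, "taxon_c": 0.125}

added = addition_score(sample, ["taxon_a", "taxon_b"])
difference = subtraction_score(sample, ["taxon_a", "taxon_b"], ["taxon_c"])
ratio = ratio_score(sample, ["taxon_a", "taxon_b"], ["taxon_c"])
is_positive = classify(difference, threshold=0.5)
```

In a sketch like this, the whole model is just the list of taxa in each group plus one threshold, which is what would make such a predictor readable by a clinician or biologist.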
Read in full at bioRxiv.
This is an abstract of a preprint hosted on an independent third-party site. It has not been peer reviewed but is currently under consideration at Nature Communications.