Enhancing Robustness in Differential Abundance Testing for Microbiome Data Analysis through a Consensus-Based Approach

Author(s): Francesc Català Moll, Marc Noguera-Julian, Roger Paredes

Affiliation(s): IrsiCaixa AIDS Research Institute, Hospital Universitari Germans Trias i Pujol, Campus Can Ruti, Badalona, Spain

Introduction: Differential abundance (DA) testing in microbiome data is challenging for both parametric and non-parametric statistical methods because the data are sparse, highly variable, and compositional. Microbiome-specific methods either rely on classical distributional models or explicitly account for compositionality. However, these approaches strike different balances between sensitivity and specificity, so when a single method is applied to real microbiome data, where the truly differentially abundant features are unknown, its type I and type II error rates are difficult to assess.
Results: To improve the robustness and reproducibility of DA testing in microbiome data, we developed the ‘dar’ R package, available on GitHub. ‘dar’ automates statistical testing under different distributional assumptions and theoretical frameworks in a fully customizable way, using state-of-the-art methods such as ANCOM, DESeq2, and LEfSe. It can combine the results of these methods into a consensus through a majority-vote mechanism in which the user can assign a weight to each method. The package can also export and import the parameters of each test and the settings of the consensus strategy as structured text files, making results easier to share and reproduce. Applied to the metaHIV dataset with all available methods and default parameters, ‘dar’ identified a total of 24 differentially abundant microbial features. Notably, it reproduced results validated in the original study by detecting species of the genera Bacteroides and Prevotella, which play a crucial role in stratifying men who have sex with men (MSM) and non-MSM.
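The weighted majority vote described in the Results can be illustrated with a minimal Python sketch. This is not the ‘dar’ API (‘dar’ is an R package) and not necessarily its exact rule; it assumes that each method adds its weight to every feature it flags, that a feature enters the consensus when its share of the total weight strictly exceeds a threshold (0.5 by default, i.e. a majority), and that methods without an explicit weight count 1.0. `ConsensusConfig`, `weighted_consensus`, the JSON layout, and the toy taxon names are hypothetical; the JSON round-trip only mimics the idea of exporting and importing consensus settings as a text file.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Set


@dataclass
class ConsensusConfig:
    # Hypothetical settings: per-method weights and the weighted vote
    # fraction a feature must strictly exceed to be called DA.
    weights: Dict[str, float]
    threshold: float = 0.5

    def to_json(self) -> str:
        # Serialize the settings so they can be saved and shared as text.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ConsensusConfig":
        data = json.loads(text)
        return cls(weights=data["weights"], threshold=data["threshold"])


def weighted_consensus(
    calls: Dict[str, Set[str]], config: ConsensusConfig
) -> Dict[str, float]:
    """Return {feature: weighted vote fraction} for consensus DA features.

    `calls` maps each method name to the set of features it flagged as DA.
    Methods missing from config.weights get a default weight of 1.0.
    """
    weights = {m: config.weights.get(m, 1.0) for m in calls}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total method weight must be positive")

    # Accumulate the weight of every method that flagged each feature.
    scores: Dict[str, float] = {}
    for method, features in calls.items():
        for feature in features:
            scores[feature] = scores.get(feature, 0.0) + weights[method]

    # Keep features whose weighted vote fraction exceeds the threshold.
    return {
        feature: score / total
        for feature, score in sorted(scores.items())
        if score / total > config.threshold
    }


if __name__ == "__main__":
    # Toy calls from three methods; the taxon names are placeholders.
    calls = {
        "ancom": {"taxon_A", "taxon_B"},
        "deseq2": {"taxon_A", "taxon_B", "taxon_C"},
        "lefse": {"taxon_A"},
    }
    config = ConsensusConfig(weights={"ancom": 2.0, "deseq2": 1.0, "lefse": 1.0})
    print(weighted_consensus(calls, config))  # {'taxon_A': 1.0, 'taxon_B': 0.75}
    restored = ConsensusConfig.from_json(config.to_json())
    print(restored == config)  # True
```

In this toy run, doubling the weight of one method lets a feature flagged by two of three methods reach 0.75 of the total weight, while a feature flagged by a single method stays at 0.25 and is excluded. A strict "greater than" comparison is used so that an exact tie does not count as a majority; that choice is an assumption of this sketch.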
Conclusion: The ‘dar’ package for differential abundance testing in microbiome data improves the robustness and reproducibility of results by automating statistical testing under multiple distributional assumptions and theoretical frameworks. It uses state-of-the-art methods and can generate consensus results through a user-customizable, weighted majority-vote mechanism. Its application to the metaHIV dataset, where it reproduced previously validated results, underscores its effectiveness.