Benchmark of single-cell batch correction methods available in the R and Python ecosystems.


Author(s): Elena Zuin, Chiara Romualdi, Davide Risso, Gabriele Sales

Affiliation(s): Department of Biology, University of Padova, Italy



Single-cell datasets often include samples collected in multiple laboratories and under different conditions, leading to complex batch effects. This unwanted technical variation overlaps with the biological effects of interest and confounds downstream analyses. A key challenge in the study of single-cell data is to align datasets correctly while preserving biological variation. The term ‘data integration’ has been coined to describe this type of batch effect removal between samples from independent donors. Existing methods rely on different mathematical approaches and, unfortunately, produce highly dissimilar results. To gain a deeper understanding of their strengths and weaknesses, we performed a benchmark study of a selection of the most popular and the most recent approaches available in the R and Python ecosystems. We evaluated the performance of these methods using metrics such as normalized mutual information (NMI) and the adjusted Rand index (ARI), which gauge their ability to remove batch effects while preserving biological signals. We also measured their computational efficiency. Lastly, we created an R package that wraps all tested methods and metrics to facilitate further studies.
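
As an illustration of the two metrics named above, the following minimal Python sketch computes ARI and NMI between two labelings of the same cells. It is not the code of our R package: the function names, the toy labels, and the choice of arithmetic-mean normalization for NMI are assumptions made for this example. A common usage, also assumed here, is to compare cluster assignments obtained after integration with known cell type annotations, where values close to 1 suggest that biological structure was preserved.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same cells (hypothetical helper)."""
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: labelings agree trivially
    # Count pairs of cells that share a label in both, in a, and in b.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / total_pairs  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (pairs_both - expected) / (maximum - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_a, labels_b):
    """NMI normalized by the arithmetic mean of the two entropies (assumed)."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[i] * count_b[j]))
        for (i, j), c in joint.items()
    )
    mean_entropy = (entropy(labels_a) + entropy(labels_b)) / 2
    return 1.0 if mean_entropy == 0 else mutual_info / mean_entropy


# Toy example: invented cell type annotations and two invented clusterings.
cell_types = ["T", "T", "B", "B"]
clusters_matching = [1, 1, 0, 0]    # same grouping, different label names
clusters_unrelated = [0, 1, 0, 1]   # each cluster mixes both cell types

print(adjusted_rand_index(cell_types, clusters_matching))
print(normalized_mutual_info(cell_types, clusters_matching))
print(adjusted_rand_index(cell_types, clusters_unrelated))
print(normalized_mutual_info(cell_types, clusters_unrelated))
```

Both metrics are invariant to how the clusters are named, so the first clustering scores 1 on each. The second clustering carries no information about cell type: its NMI is 0 and its ARI is negative, since ARI is corrected for chance and can fall below 0.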