Annotating the Human Cell Atlas with HPCell: an extensible high-performance-computing grammar for omic analyses
Author(s): Stefano Mangiola, Jiayi Si
Affiliation(s): Adelaide University
Single-cell and spatial omic technologies have transformed biological research. The vast amount of data they generate strains bioinformatics pipelines and makes it hard for a single user to keep pace with the rapidly evolving needs of impactful, data-oriented research. One option is to use static, Snakemake-like workflows, which live outside R. However, these have a steep learning curve and are time-consuming to customise. Alternatively, single-machine, single-task parallelisation is possible within R analysis pipelines, which lowers the learning curve but limits scalability. _What if users could deploy familiar multitask, pipe-based ( |> ) analysis pipelines to high-performance computing without learning about workflow managers?_ We introduce HPCell, an extensible, user-friendly analysis grammar for high-performance computing on omic data. HPCell is based on targets, a powerful R-native workflow manager. Being completely R-native allows HPCell to leverage the user-friendly R ecosystem and integrate fully with the piping system ( |> ). As a result, intuitive, user-defined pipelines are converted internally into massively parallel, multitask workflows. HPCell aims to empower computational biologists to analyse large-scale novel and public datasets using familiar syntax. We demonstrated HPCell's capability by reannotating the Human Cell Atlas (Mangiola et al., bioRxiv 2023), and we will present our plan to use HPCell to create a Human Cell Atlas annotation/knowledge hub. Following the success of the crowd-research community initiative that powered tidyomics development (Hutchison and Keyes et al., Nat Methods 2024), we will open a discussion with the Bioconductor community to start a similar initiative for HPCell.