Conformal inference for cell type prediction leveraging the cell ontology


Author(s): Daniela Corbetta, Livio Finos, Davide Risso

Affiliation(s): Department of Statistical Sciences, University of Padova



Single-cell RNA sequencing technologies have advanced rapidly in recent years, generating a large and diverse collection of datasets. Many annotated datasets are now readily accessible and serve as valuable references for annotating cells in unannotated datasets from similar tissues. Typically, a model is chosen and trained on the reference data to predict the label of a new, unannotated cell in the query dataset. Alternatively, users can employ methods such as SingleR [1], which annotates cells by computing correlations between the gene expression profiles of query cells and each reference sample. These methods commonly provide a point prediction of the cell label, along with estimated probabilities or scores for each cell type in the reference dataset. Relying solely on point predictions can be problematic when the corresponding estimated probability is low, since the resulting classification is unreliable. To address this issue, we propose returning prediction sets that include multiple labels, with the set size reflecting the confidence in the point prediction. In particular, we propose a method based on conformal risk control [2] that accounts for the inherent relationships among cell types, encoded as graph-structured constraints available through the cell ontology. The final output is a prediction set aligned with the cell ontology structure: the less certain the point prediction, the broader the classification. The method ensures that the true label falls within the set with a user-chosen probability, regardless of the performance of the underlying model. By incorporating cell ontology information, it returns prediction sets that capture the hierarchical relationships between cell types, offering more informative and interpretable predictions. We demonstrate the effectiveness of our approach through an application to real single-cell data and illustrate its benefit in terms of interpretability compared with standard methods.

[1] Aran, D., Looney, A. P., Liu, L., Wu, E., Fong, V., Hsu, A., Chak, S., et al. (2019). “Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage.” Nat. Immunol. 20 (2): 163–72.

[2] Angelopoulos, A. N., Bates, S., Fisch, A., Lei, L., & Schuster, T. (2022). “Conformal risk control.” arXiv preprint arXiv:2208.02814.
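
To make the idea of ontology-aligned prediction sets concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical four-leaf toy ontology (`PARENT`, `LEAVES`), synthetic class probabilities, and one particular set construction chosen for illustration: the set contains all leaves under the deepest ancestor of the point prediction whose total estimated probability reaches a threshold `lam`. The threshold is then calibrated with the conformal risk control rule of [2] for a miscoverage loss bounded by 1. All function names here are illustrative.

```python
import numpy as np

# Hypothetical toy ontology (child -> parent); not the real Cell Ontology.
PARENT = {
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
    "monocyte": "myeloid cell",
    "macrophage": "myeloid cell",
    "lymphocyte": "cell",
    "myeloid cell": "cell",
}
LEAVES = ["T cell", "B cell", "monocyte", "macrophage"]


def ancestors(node):
    """Path from `node` up to the root, with `node` itself first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def leaves_under(node):
    """Set of leaf labels that have `node` on their path to the root."""
    return {leaf for leaf in LEAVES if node in ancestors(leaf)}


def prediction_set(probs, lam):
    """Leaves under the deepest ancestor of the argmax with mass >= lam."""
    top = LEAVES[int(np.argmax(probs))]
    for node in ancestors(top):
        members = leaves_under(node)
        mass = sum(probs[LEAVES.index(leaf)] for leaf in members)
        if mass >= lam:
            return members
    return set(LEAVES)  # numerical fallback: return the whole ontology


def miss_rate(cal_probs, cal_labels, lam):
    """Empirical fraction of calibration cells whose label is not covered."""
    return float(np.mean([y not in prediction_set(p, lam)
                          for p, y in zip(cal_probs, cal_labels)]))


def calibrate(cal_probs, cal_labels, alpha, n_grid=101):
    """Smallest grid value of lam with (n * risk + 1) / (n + 1) <= alpha."""
    n = len(cal_labels)
    for lam in np.linspace(0.0, 1.0, n_grid):
        if (n * miss_rate(cal_probs, cal_labels, lam) + 1) / (n + 1) <= alpha:
            return float(lam)
    return 1.0


# Synthetic calibration data, for illustration only: labels are drawn from
# the same probabilities that the hypothetical classifier reports.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(len(LEAVES)), size=200)
cal_labels = [LEAVES[rng.choice(len(LEAVES), p=p)] for p in cal_probs]

alpha = 0.1
lam_hat = calibrate(cal_probs, cal_labels, alpha)

# A confident query cell keeps a narrow set; an uncertain one is widened
# to an internal node of the ontology or to the root.
print(lam_hat)
print(prediction_set(np.array([0.97, 0.01, 0.01, 0.01]), lam_hat))
print(prediction_set(np.array([0.40, 0.35, 0.15, 0.10]), lam_hat))
```

Because the candidate sets are nested along the path from the predicted leaf to the root, the miscoverage loss cannot increase as `lam` grows, which is the monotonicity condition that conformal risk control relies on. Other ways of scoring internal nodes would fit the same calibration step.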