Relative scBFA performance is positively correlated with dataset size and with technical noise. a Gene detection rate (GDR) as a function of the number of cells for the 14 benchmarks, processed using either HVG or HEG selection. Group I benchmarks are the datasets in which scBFA is a top performer, and group II benchmarks are the datasets in which scBFA is a poor performer. Note that each of the 14 benchmarks is represented twice (once under HVG selection and once under HEG selection). Additional file 1: Table S6 lists the membership of each benchmark in group I or group II. b Same as a, except that mean-dispersion trends are estimated and visualized for each of the datasets from group I and group II, under the HVG and HEG selection criteria. c Difference in performance (Matthews correlation coefficient, MCC) of cell type classifiers between HVG and HEG selection, shown for each benchmark and each method. Performance is assessed through cross-validation of cell type classifiers trained on scRNA-seq data in the respective embedding space of each method, and is averaged across all numbers of latent dimensions tested. d The corresponding GDR under the two gene selection criteria. Note that HEG selection yields systematically higher GDR than HVG selection.
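To make the quantity plotted in panels a and d concrete, the following is a minimal sketch of how a GDR could be computed for one dataset under two gene selections. It assumes GDR is the average fraction of selected genes with a non-zero count per cell, and it stands in for HEG and HVG selection with simple rankings by mean and by variance; the function names, the `by` parameter, and the toy count matrix are hypothetical, and this is not necessarily the selection procedure used for the benchmarks.

```python
import numpy as np

def gene_detection_rate(counts):
    """Average, over cells, of the fraction of genes with a non-zero count.

    `counts` is a cells x genes matrix. This definition is an assumption
    made for illustration.
    """
    detected = np.asarray(counts) > 0
    return float(detected.mean(axis=1).mean())

def top_genes(counts, k, by="mean"):
    """Indices of the k genes with the largest mean (HEG-like stand-in)
    or the largest variance (simplified HVG-like stand-in)."""
    counts = np.asarray(counts, dtype=float)
    score = counts.mean(axis=0) if by == "mean" else counts.var(axis=0)
    return np.argsort(score)[::-1][:k]

# Hypothetical toy data: 4 cells x 4 genes.
toy = np.array([
    [5,  0, 1, 0],
    [5,  0, 1, 0],
    [5, 30, 1, 0],
    [5,  0, 1, 1],
])

heg = top_genes(toy, 2, by="mean")       # ranks genes by mean expression
hvg = top_genes(toy, 2, by="variance")   # ranks genes by variance

gdr_heg = gene_detection_rate(toy[:, heg])
gdr_hvg = gene_detection_rate(toy[:, hvg])
print(f"GDR (HEG-like): {gdr_heg:.3f}, GDR (HVG-like): {gdr_hvg:.3f}")
```

In this toy matrix the mean-based selection keeps a gene detected in every cell, while the variance-based selection keeps two sparsely detected genes, so the HEG-like GDR is the larger of the two. The example only illustrates the calculation and is not evidence for the trend reported in panel d.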