Figure 5 | Genome Biology

Figure 5

From: The importance of study design for detecting differentially abundant features in high-throughput experiments

Comparison of different data normalization approaches. (a) Simulated benchmark datasets generated using EDDA. Normalization by the sum of counts for all non-differentially-abundant (non-DA) entities serves as the measure of ideal performance. In general, upper-quartile normalization (UQN) improved over the default normalization (improvement shown by the checked box); cases where it did not are marked with a solid line. Mode normalization always improved over both UQN and the default normalization for the DAT (improvement shown by the solid box). Experiment-specific parameters are A: PDA = (26% UP, 10% DOWN), AP = Wu et al.; B: PDA = (35% UP, 15% DOWN); C: PDA = (40% DOWN), SM = Multinomial; and D: the same as in Figure 4b. Unless stated otherwise, common parameters are NR = 3, EC = 1,000, FC = Log-normal (1.5, 1), ND = 500 per entity, AP = HBR, SM = Full, and SV = 0.5. (b) Comparison on real datasets, highlighting the robustness of mode normalization. Shown is the overlap fraction (used to measure robustness) of the top 500 differentially abundant genes (sorted by P value using edgeR) between comparisons using the original full-size libraries and comparisons in which one of the libraries is down-sampled (to 5% or 10% of its original size). Bar plots show the average of five runs, and error bars represent one standard deviation.
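
As a rough illustration of two quantities named in the caption, the sketch below computes upper-quartile scaling factors and the overlap fraction between two top-k gene lists. It is a minimal sketch under stated assumptions, not the EDDA or edgeR implementation: the function names `upper_quartile_factors` and `overlap_fraction` are hypothetical, entities with zero counts in every sample are dropped before taking the 75th percentile (a common convention, assumed here), and P values are assumed to be supplied as plain arrays indexed by gene. Mode normalization is not sketched because the caption does not describe its procedure.

```python
import numpy as np


def upper_quartile_factors(counts):
    """Per-sample scaling factors from the 75th percentile of counts.

    counts: array of shape (entities, samples) with non-negative counts.
    Entities with zero counts in all samples are excluded (assumption).
    Factors are rescaled so that their mean is 1.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = counts[counts.sum(axis=1) > 0]
    upper_quartiles = np.percentile(expressed, 75, axis=0)
    return upper_quartiles / upper_quartiles.mean()


def overlap_fraction(pvals_full, pvals_down, top_k=500):
    """Fraction of the top_k genes (smallest P values) shared by two analyses.

    pvals_full: P values from the comparison using full-size libraries.
    pvals_down: P values from the comparison with a down-sampled library.
    Both arrays must list genes in the same order.
    """
    top_full = set(np.argsort(pvals_full, kind="stable")[:top_k].tolist())
    top_down = set(np.argsort(pvals_down, kind="stable")[:top_k].tolist())
    return len(top_full & top_down) / top_k


# Toy example: the second sample is sequenced at twice the depth of the first.
toy_counts = np.array([[10, 20], [20, 40], [30, 60], [0, 0]])
factors = upper_quartile_factors(toy_counts)
normalized = toy_counts / factors  # columns become comparable after scaling

# Toy P values for four genes in the full and down-sampled comparisons.
toy_overlap = overlap_fraction(
    [0.01, 0.02, 0.5, 0.9], [0.02, 0.6, 0.01, 0.9], top_k=2
)
```

In the toy data the two samples differ only in depth, so dividing by the factors makes the two columns identical. An overlap fraction of 1 would mean the down-sampled comparison recovers exactly the same top-k genes as the full-size comparison.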
