Comparison of false positive and false negative rates for LEfSe and for the KW test alone on synthetic data. Both methods used α = 0.05 throughout. Each of the three synthetic datasets comprises 100 samples divided into two classes, and each class contains two subclasses of 25 samples. Each sample has 1,000 synthetic features standing in for microbial taxa, pathways, and similar measurements; half are negative (not biomarkers) and half are positive (biomarkers). (a) False positive and false negative rates of LEfSe and KW as the difference between class means increases. Negative features are normally distributed with μ = 10,000 and σ = 100 in both classes; positive features have class means that differ by increasing amounts. (b) Performance as the within-class standard deviation varies, with the difference between class means fixed at 2,000. (c) Performance as the standard deviation increases within inconsistent subclasses. Negative features have subclasses sampled from the same normal distribution (and thus do not represent consistent biomarkers). Positive features are distributed as in (b). In all cases, LEfSe accepts a small increase in false negatives in order to achieve a false positive rate near zero, with the goal of ensuring that the biomarkers it reports, which have large effect sizes, are both reproducible and biologically interpretable.
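A minimal sketch of the KW-only arm of panel (a), assuming that positive features are drawn as N(10,000, 100) in one class and N(10,000 + Δ, 100) in the other. The caption states neither the positive features' σ nor exactly how their means are shifted, so both are assumptions. The sketch uses the two-group Kruskal-Wallis H statistic with the chi-square (1 degree of freedom) approximation and no tie correction. The helper names (`average_ranks`, `kruskal_wallis_two_groups`, `simulate_kw_rates`) are hypothetical, and LEfSe's additional steps (per-subclass Wilcoxon tests and LDA effect-size filtering) are not reproduced. The KW test ignores subclass labels, and in (a) the subclasses share their class's distribution, so the sketch draws each class of 50 samples directly.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks of `values`, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Kruskal-Wallis H for two groups (no tie correction).

    The p-value uses the chi-square approximation with 1 degree of freedom:
    P(chi2_1 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    """
    ranks = average_ranks(list(a) + list(b))
    n, n_a, n_b = len(ranks), len(a), len(b)
    r_a = sum(ranks[:n_a])
    r_b = sum(ranks[n_a:])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / n_a + r_b ** 2 / n_b) - 3 * (n + 1)
    p = math.erfc(math.sqrt(max(h, 0.0) / 2))
    return h, p


def simulate_kw_rates(delta, n_features=1000, n_per_class=50,
                      mu=10_000.0, sigma=100.0, alpha=0.05, seed=0):
    """Return (false positive rate, false negative rate) of KW alone.

    The first half of the features are negatives (identical class distributions).
    The second half are positives whose second-class mean is shifted by `delta`
    (an assumed generative model, not taken from the caption).
    """
    rng = random.Random(seed)
    half = n_features // 2
    false_pos = false_neg = 0
    for f in range(n_features):
        positive = f >= half
        shift = delta if positive else 0.0
        class1 = [rng.gauss(mu, sigma) for _ in range(n_per_class)]
        class2 = [rng.gauss(mu + shift, sigma) for _ in range(n_per_class)]
        _, p = kruskal_wallis_two_groups(class1, class2)
        if not positive and p < alpha:
            false_pos += 1  # negative feature wrongly called significant
        elif positive and p >= alpha:
            false_neg += 1  # positive feature missed
    return false_pos / half, false_neg / (n_features - half)
```

For example, `simulate_kw_rates(0.0)` should give a false positive rate near α = 0.05, because KW alone calls about 5% of true negatives significant. A large shift such as `simulate_kw_rates(2000.0)` separates the classes completely, driving the false negative rate to zero. The false positive rate stays near α, which is the residual error that LEfSe's extra filtering steps aim to remove.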