Importance plots (probability density distributions of the gene importance scores calculated by LeFE) for the smoker versus nonsmoker dataset. Shown are representative distributions for three gene categories (red curves) and their corresponding negative control gene sets (black curves). The curves were smoothed using the default settings of the 'density' function in R. The shifted secondary peaks (red arrows) in the aldehyde metabolism and glutathione metabolism distributions reflect genes that are important to the Random Forest models. The viral life cycle distribution shows no secondary peak, so this category does not appear to be associated with smoking. See Results for further details.
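For readers who want to reproduce this kind of smoothing outside R, the following is a minimal sketch of a Gaussian kernel density estimate using the same default bandwidth rule as R's 'density' function (Silverman's rule of thumb, `bw.nrd0`). It is not the code used to draw the figure: the function names `bw_nrd0`, `density_grid`, and `gaussian_kde` and the example scores are hypothetical, and the density is evaluated directly rather than through the binned FFT approximation that R applies.

```python
import math
import statistics


def bw_nrd0(x):
    """Silverman's rule-of-thumb bandwidth (the default rule in R's density())."""
    sd = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    # Quartiles with linear interpolation, matching R's default quantile type 7.
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:
        # Fallbacks when the data have no spread: sd, then |x[0]|, then 1.
        spread = sd or abs(x[0]) or 1.0
    return 0.9 * spread * len(x) ** -0.2


def density_grid(x, bw, n=512, cut=3):
    """Evenly spaced grid of n points reaching `cut` bandwidths past the data range."""
    lo, hi = min(x) - cut * bw, max(x) + cut * bw
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def gaussian_kde(x, grid, bw):
    """Evaluate a Gaussian kernel density estimate of x at each grid point."""
    norm = len(x) * bw * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((t - xi) / bw) ** 2) for xi in x) / norm
        for t in grid
    ]


# Hypothetical importance scores: most genes near zero, a few shifted upward.
scores = [-0.2, -0.1, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 1.8, 2.0, 2.1]
bw = bw_nrd0(scores)
grid = density_grid(scores, bw)
dens = gaussian_kde(scores, grid, bw)
```

Plotting `dens` against `grid` for a gene category and for its negative control set would give one red and one black curve of the kind shown in the figure. A secondary peak appears only if enough scores are shifted relative to the bandwidth, so the visibility of such peaks depends on the smoothing settings.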