Improved characterization of medically relevant fungi in the human respiratory tract using next-generation sequencing

Background: Fungi are important pathogens but challenging to enumerate using next-generation sequencing because of low absolute abundance in many samples and high levels of fungal DNA from contaminating sources.
Results: Here, we analyze fungal lineages present in the human airway using an improved method for contamination filtering. We use DNA quantification data, which are routinely acquired during DNA library preparation, to annotate output sequence data and to improve the identification and filtering of contaminants. We compare fungal and bacterial communities from healthy subjects, HIV+ subjects, and lung transplant recipients, providing a gradient of increasing lung impairment for comparison. We use deep sequencing to characterize ribosomal RNA (rRNA) gene segments from fungi and bacteria in DNA extracted from bronchoalveolar lavage (BAL) and oropharyngeal wash (OW) samples. Comparison to clinical culture data documents improved detection after applying the filtering procedure.
Conclusions: We find increased representation of medically relevant organisms, including Candida, Cryptococcus, and Aspergillus, in subjects with increasingly severe pulmonary and immunologic deficits. We analyze covariation of fungal and bacterial taxa, and find that oropharyngeal communities rich in Candida are also rich in mitis group Streptococci, a community pattern associated with pathogenic polymicrobial biofilms. Thus, using this approach, it is possible to characterize fungal communities in the human respiratory tract more accurately and explore their interactions with bacterial communities in health and disease.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0487-y) contains supplementary material, which is available to authorized users.

1 Fungal diversity and community composition

1.1 Number of subjects and study groups
Table 1 shows the subjects in each study group. As described in the main text and Additional file 1, subjects 3D03-3D06 were also sampled in group 3B more than a year before the 3D sample, using a two-scope rather than a one-scope bronchoscopy procedure. Table 2 shows the number of each sample type appearing in the study groups. Table 3 shows the number of contamination control samples not associated with a particular subject.

Number of reads, OTUs, and fungal genera detected
Across the 355 samples in the main study, we obtained a median of 798 reads per sample. A total of 1801 OTUs were formed, representing 277 fungal genera. Table 5 summarizes the taxonomic assignments for non-fungal OTUs. To represent the contamination control samples, we show all contamination control samples excluding those collected from bronchoscope pre-washes (included samples are listed in Table 3). Genera represented by more than 100 reads in this sample set are shown in the heatmap. To supplement the fungal proportions shown in the main text, we produce a set of figures here showing the proportions in each study group. Figures 1-8 show the proportions of fungi in each sample of the study, summarized at the genus level. In each figure, genera shown are represented by more than 100 reads. Table 7 lists the median number of fungal species observed per sample in each body site, at a sampling depth of 300 reads per sample.

α-diversity of ITS samples
The Shannon diversity of fungal communities is plotted for each sample in Figure 9. Above about 100 reads per sample, the Shannon diversity is approximately constant.
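
As an illustration of the quantity plotted in Figure 9, the following is a minimal sketch of a Shannon diversity calculation from per-OTU read counts. The function name, the example counts, and the use of the natural logarithm are assumptions made for illustration; the text does not state which logarithm base or software was used.

```python
import math


def shannon_diversity(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over OTUs with nonzero counts.

    `counts` is a sequence of read counts, one entry per OTU, for one sample.
    The natural logarithm is an assumption; other bases only rescale H.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts per OTU for a single sample (not study data).
example_counts = [50, 30, 20, 0]
example_h = shannon_diversity(example_counts)
```

A sample dominated by a single OTU gives a value near zero, while a sample with reads spread evenly over many OTUs gives a larger value.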

β-diversity of ITS samples
To compare the composition of fungi in BAL, OW, and contamination control samples, we computed the Jaccard and Bray-Curtis distances between each pair of samples. Jaccard distance is based on the fraction of OTUs shared between two samples, considering only presence or absence, while Bray-Curtis distance measures the normalized difference in OTU abundances between samples. Figure 10 shows the Jaccard distances between samples, ordinated by principal coordinates analysis (PCoA). A PERMANOVA test for a difference in group centroids showed a significant difference for OW samples vs. contamination controls (Table 8), but no significant difference for BAL vs. contamination controls (Table 9).
A PCoA plot of Bray-Curtis distance is shown in Figure 11. A PERMANOVA test of Bray-Curtis distance yielded a significant difference between OW and contamination control samples (Table 10) but not between BAL and contamination control samples (Table 11).
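
The two distance measures used above can be sketched as follows. This is an illustrative implementation of the standard definitions, not the code used in the study. The function names and example vectors are hypothetical, and whether Bray-Curtis was computed on raw counts or on proportions is not stated in the text.

```python
def jaccard_distance(a, b):
    """Jaccard distance on presence/absence: 1 - |shared OTUs| / |OTUs in either|.

    `a` and `b` are abundance vectors over the same ordered list of OTUs.
    """
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis_distance(a, b):
    """Bray-Curtis distance: sum(|a_i - b_i|) / sum(a_i + b_i)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical OTU abundance vectors for two samples (not study data).
sample_1 = [1, 1, 0]
sample_2 = [0, 1, 1]
d_jaccard = jaccard_distance(sample_1, sample_2)
d_bray = bray_curtis_distance(sample_1, sample_2)
```

Because Jaccard distance ignores abundance, a rare contaminant OTU counts as much as a dominant one. Bray-Curtis distance weights OTUs by abundance.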

Reproducibility of ITS results in repeat extractions
To investigate the reproducibility of ITS sequencing results across repeat extractions from the same source material, we re-extracted, re-amplified, and re-sequenced material from 18 samples. The sample types, listed in Table 12, include bronchoscope pre-wash, oropharyngeal wash, and BAL.
The proportions of fungal genera recovered in repeat extractions are shown in Figure 12. The abundance of each genus is shown in Figure 13 after conversion to PicoGreen-corrected abundance.
For each set of replicate samples, we asked whether OTUs appearing at more than 50% proportion in one replicate were present across all replicate samples (Table 13). We followed this analysis by modeling the probability of observing OTUs across repeat extractions as a function of OTU proportion or PicoGreen-corrected abundance. Figure 14 shows the best fit for each predictor, using a generalized additive model with cubic spline smoothing. The model parameters are listed in Table 14 and Table 15. The PicoGreen-corrected abundance model provides a better fit to the observed data.
Figure 4: Observed proportions of fungal genera in subjects undergoing bronchoscopy for a variety of purposes.
Figure 6: Observed proportions of fungal genera in contamination control samples from bronchoscope pre-wash (non-transplant subjects).
Table 6: Percentage of fungal OTUs classified at each taxonomic rank.
To provide further insight into the ability of OTU proportion or PicoGreen-corrected abundance to predict the appearance of OTUs across replicate extractions, we generated an ROC curve for each predictor. The curves are plotted in Figure 15. The area under the curve (AUC) is 0.71 for proportions and 0.78 for PicoGreen-corrected abundance, indicating that the latter quantity is a better predictor for observing OTUs in repeat fungal extractions.
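
For readers less familiar with the AUC, the sketch below computes it from its rank-based interpretation: the probability that a randomly chosen positive case (here, an OTU that reappears across replicates) receives a higher predictor value than a randomly chosen negative case. The function name and the example values are hypothetical, and this is not the software used to produce Figure 15.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `scores` are predictor values (for example OTU proportion or corrected
    abundance); `labels` are truthy for positives and falsy for negatives.
    Ties between a positive and a negative count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictor values and outcomes (not study data).
example_scores = [0.8, 0.4, 0.6, 0.2]
example_labels = [1, 1, 0, 0]
example_auc = roc_auc(example_scores, example_labels)
```

An AUC of 0.5 corresponds to a predictor with no discriminating ability, and 1.0 to perfect separation of the two classes.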
Scatterplots of the OTU proportions for each pair of replicate samples are given in Figures 16-33.

2 Analysis of PicoGreen-corrected OTU abundances

Genus-specific OTU abundances
We next created genus-specific contamination thresholds for some of the more common genera observed in contamination control samples, to see if they would be compatible with the global threshold derived from all genera together. Figure 34 shows the nonzero PicoGreen-corrected abundances for genera appearing in at least 10 contamination control samples. For each genus appearing in the figure, we estimated a 95% abundance threshold and computed confidence intervals using a bootstrap approach. The results are listed in Table 16. For every genus but one, the genus-specific contamination threshold was consistent with the global threshold. For Cladosporium, the genus-specific threshold was found to be about 3 times higher than the global threshold. Cladosporium was the most commonly occurring genus in contamination control samples, which allowed us to generate a relatively narrow confidence interval. Figure 35 shows the level of agreement between ITS sequencing and clinical culture results, as evaluated by Cohen's Kappa.
OTUs absent in one sample are shown along the axis.
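
The genus-specific threshold estimate described above can be sketched as a percentile bootstrap. This sketch assumes that the "95% abundance threshold" is the 95th percentile of nonzero abundances in contamination controls and that the confidence interval is a percentile interval; neither detail is confirmed by the text. The function names, the number of resamples, and the example values are hypothetical.

```python
import random


def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bootstrap_threshold(abundances, q=95.0, n_boot=2000, alpha=0.05, seed=0):
    """Estimate the q-th percentile and a (1 - alpha) percentile bootstrap interval.

    `abundances` would be the nonzero corrected abundances of one genus in
    contamination control samples (assumed input; not taken from the study).
    Returns (estimate, lower bound, upper bound).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(abundances)
    estimate = percentile(abundances, q)
    boots = []
    for _ in range(n_boot):
        # Resample the observations with replacement and recompute the threshold.
        resample = [rng.choice(abundances) for _ in range(n)]
        boots.append(percentile(resample, q))
    lower = percentile(boots, 100.0 * alpha / 2.0)
    upper = percentile(boots, 100.0 * (1.0 - alpha / 2.0))
    return estimate, lower, upper


# Hypothetical abundances for one genus (not study data).
example_abundances = [float(x) for x in range(1, 21)]
example_result = bootstrap_threshold(example_abundances, n_boot=200)
```

A genus observed in many control samples yields more observations per resample, which tends to narrow the interval. This is consistent with the relatively narrow confidence interval reported for Cladosporium.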