Genomic DNA for H. sapiens NA18507 (Yoruban) was prepared by the Coriell Institute. E. coli CC118 (MC1000 (araD139 Δ(ara leu)7697 ΔlacX74 phoAΔ20 galE galK thi rpsE rpoB argE recA1)) genomic DNA, and CRW10 and PA1 phage genomic DNA, were extracted using Qiagen (Valencia, CA, USA) MidiPrep buffers P1, P2, and P3, cleaned with phenol, purified from low melting point agarose, and dissolved in Tris/EDTA (TE) buffer. YH1 genomic DNA was extracted from a lymphoblastoid cell line of Yanhuang using proteinase K and phenol/chloroform and further subjected to RNase treatment/purification. The D. melanogaster genomic DNA was extracted from whole bodies of several individuals using the Puregene blood core kit B (Qiagen). The molecular weights of both Drosophila and YH1 genomic DNAs were confirmed to be larger than 23 kb by gel electrophoresis (not shown), with no detectable degradation or RNA/protein contamination, and the DNAs were quantified with the Quant-iT dsDNA HS assay kit (0.2-100 ng; Invitrogen, Q32854, Carlsbad, CA, USA); an aliquot of each was diluted to 25 ng/μl for use in library construction. P. aeruginosa PAO1 strains were selected for tobramycin resistance at 16 mg/l (41 strains), ciprofloxacin resistance at 4 mg/l (47 strains), or no antibiotic resistance (8 strains). DNA was then isolated using the Wizard® SV 96 genomic DNA purification system (Promega, Madison, WI, USA). Concentrations of isolated DNA were measured using a Nanodrop (Thermo Scientific, Waltham, MA, USA).
Shearing was performed on two 5-μg aliquots of E. coli CC118 genomic DNA brought up to 200 μl with TE. Sonication was performed by two 15-min treatments with a Bioruptor sonicator (Wolf Laboratories, Pocklington, UK) at maximum settings in a cold room, switching out the water between treatments to keep samples cool. After sonication, each sample was cleaned up using a QIAquick PCR purification kit, eluting in 30 μl Buffer EB (Qiagen). CRW10 and PA1 phage DNA was fragmented using 5 μg aliquots brought to 100 μl with TE and mixed with 500 μl nebulization buffer (Roche/454, Branford, CT, USA) in a nebulizer cup. Nebulization was carried out using 45 psi (310 kPa) nitrogen for 1 min on ice. The sheared DNA was cleaned and concentrated using MinElute columns and eluting in Buffer EB (Qiagen).
Enzymatic fragmentation was performed on six 1 μg aliquots of genomic DNA from both E. coli CC118 and H. sapiens NA18507 added to 2 μl 10× fragmentation reaction buffer, 0.2 μl 100× BSA (NEBiolabs, Ipswich, MA, USA), 2 μl NEB fragmentase enzyme, and nuclease-free water (Ambion, Austin, TX, USA) to 20 μl. Reactions were gently vortexed and spun down, then incubated on ice for 5 min followed by a time-course incubation at 37°C, removing samples at 15, 20, 30, 45, 60, and 120 min for each of the two sets. Reactions were stopped by placing on ice followed by purification using a QIAquick PCR purification kit, eluting in 30 μl buffer EB (Qiagen). A 4 μl aliquot of each sample was run on a Novex TBE gel (Invitrogen) to observe size distribution. The 60-min time point showed the desired size range, and duplicate samples were prepared by the same method for each organism. Fragmentation of phage DNAs was performed using 1 μg aliquots of PA1 and CRW10 in a 20 μl reaction volume including fragmentation reaction buffer to 1×, BSA, and 2 μl NEB fragmentase enzyme (NEB). The reaction was incubated on ice for 5 min followed by incubation at 37°C for 20 min, stopped with 5 μl 500 mM EDTA, and cleaned using MinElute columns (Qiagen).
Post-fragmentation library preparation
Post-fragmentation library preparation on the duplicate samples of E. coli CC118 sonication and fragmentase (60 min), and H. sapiens NA18507 fragmentase (60 min), was carried out as per standard Illumina methods, including a size selection using Novex TBE gels (Invitrogen), excising the 400-500 bp band. Final PCR amplification was carried out on a Bio-Rad (Hercules, CA, USA) MiniOpticon using SYBR Green I as a dye to monitor amplification. 1 μl of each final library was run on a Novex TBE gel (Invitrogen) for library size confirmation. Nebulized or fragmentase-treated phage samples were size-selected using SPRI beads (Beckman Coulter, Danvers, MA, USA) and used to construct libraries according to standard protocols, including end-polishing, adaptor ligation, fill-in, and single-strand isolation. Adaptor sequences included multiplex identifiers (barcodes).
Transposase-based library preparation
Transposase-based library preparation for E. coli CC118 and H. sapiens NA18507 Illumina-compatible libraries used 50 ng of genomic DNA brought up to 15 μl in nuclease-free water (Ambion) followed by the addition of 4 μl 5× LMW Nextera reaction buffer and 1 μl Nextera enzyme mix (Illumina-compatible; Epicentre), followed by a gentle vortex and brief centrifugation. Each reaction tube was incubated at 55°C in a thermocycler with a heated lid for 5 min followed by placement on ice and immediate purification using a QIAquick PCR purification kit and elution in 20 μl buffer EB (Qiagen). Suppression PCR was then carried out using 10 μl of the eluate as template with 11.5 μl nuclease-free water (Ambion), 25 μl 2× Nextera PCR buffer, 0.5 μl SYBR Green, 1 μl 50× Nextera primer cocktail (Illumina-compatible), 1 μl 50× Nextera adaptor 2 (barcodes 1-2 for E. coli and 3-4 for H. sapiens), and 1 μl Nextera PCR enzyme. The reaction was cycled in a Bio-Rad MiniOpticon to monitor the reaction under the following conditions: (1×) 3:00 min at 72°C and (1×) 0:30 min at 95°C, followed by 13 cycles of [0:10 min at 95°C, 0:30 min at 62°C, 3:00 min at 72°C] for E. coli barcodes 1 and 2 and H. sapiens barcodes 3 and 4 (12 cycles for H. sapiens barcode 4). 1 μl of each post-PCR library was electrophoresed through a Novex TBE gel (Invitrogen) for library size confirmation. Size selection of the E. coli CC118 transposase libraries was carried out at Epicentre Biotechnology using Agencourt AMPure (> 300 bp size selection), Zymo DNA purification (no size selection), or Caliper (350 ± 10% bp size selection) methods. The D. melanogaster library was constructed by pooling two standard Nextera reactions following the manufacturer's protocol (Epicentre). For each reaction, 50 ng genomic DNA was initially tagmented (in vitro transposase-catalyzed adaptor insertion) at 55°C for 5 min, and then MinElute purified. This was followed by PCR amplification for 12 cycles under the same conditions as for the H. sapiens and E. coli libraries. A 400-450 bp gel-based size selection was carried out prior to sequencing.
A total of seven H. sapiens YH1 libraries were constructed, differing in mass of DNA, number of PCR cycles, and selected DNA fragment size. These included two (about 500 bp and about 550 bp) produced from pooling five standard Nextera reactions, three (400-500 bp, 500-550 bp and 550-600 bp) produced from pooling two modified reactions with nine cycles of PCR enrichment, and another two libraries (300-500 bp and 500-650 bp) from a single tagmentation reaction using 500 ng starting DNA with five cycles of PCR enrichment. The insert-size distribution and final yields for the Drosophila and H. sapiens YH1 libraries were validated separately using a 2100 Bioanalyzer (DNA 1000 and 7500 kit; Agilent, Santa Clara, CA, USA) and quantitative PCR.
P. aeruginosa PAO1 Illumina-compatible shotgun libraries were prepared for each strain using Epicentre Biotechnologies' Nextera DNA sample preparation kits with a customized, unique 9 bp barcode sequence for each strain. The tagmentation reaction consisted of 200 ng PAO1 DNA, 25 μl Nextera high molecular weight buffer, 1 μl Nextera transposase enzyme, and water to a total volume of 20 μl. The reaction was incubated for 5 min at 55°C, cleaned using Qiagen MinElute columns, and eluted in 11 μl water. PCR reactions included 5 μl of the fragmented DNA, 17 μl water, 25 μl Nextera PCR buffer, 1 μl Nextera PCR enzyme, 1 μl of a Nextera primer cocktail containing two short primers (at 10 μM each) and one long Illumina-compatible adaptor (at 5 μM), and 1 μl of the barcode-containing Illumina adaptor (at 5 μM). PCR conditions were the same as above with 12 cycles of amplification, followed by MinElute clean-up as before. Samples were run on a Novex TBE polyacrylamide gel to confirm library quality, and DNA concentrations were measured using a Nanodrop.
For the Roche (454)-compatible libraries, standard Nextera reaction conditions were used with 50 ng CRW10 (barcode 11) or PA1 (barcode 10) bacteriophage DNA, 454-Titanium compatible kit components and standard PCR methods, cycling 15 times. The PCR products were purified using Qiagen MinElute columns. Library fragment sizes were assessed using an Agilent Bioanalyzer DNA1000 chip.
Targeted sequence capture of the human exome
Libraries were prepared by transposase-catalyzed adaptor insertion by previously described methods using 50 ng genomic DNA (BK229.03, SFARI-SSC), 1 μl transposomes, 4 μl 5× HMW buffer, and water to 20 μl. Samples were incubated at 55°C for 5 min, then cleaned up (AMPure) and eluted in 20 μl, followed by the addition of 25 μl 2× Nextera PCR buffer, 1 μl 50× Nextera primer cocktail, 1 μl Nextera PCR enzyme, 0.5 μl 100× SYBR Green, and 1 μl of a barcoded adaptor (Table S4 in Additional file 3) with water to 50 μl. Reactions were carried out on a Bio-Rad MiniOpticon using recommended cycling conditions for 12 rounds. Each tube was cleaned up (AMPure, Agencourt, Boston, MA, USA) and checked for size and quantity on an Agilent Bioanalyzer DNA 1000 chip. One sample was selected for capture using all of the 414.6 ng for hybridization to Nimblegen (Madison, WI, USA) SeqCap EZ Exome probes v1.0 as per Nimblegen protocols using custom blockers (Nextera_Block1: 5'-AAT GAT ACG GCG ACC ACC GAG ATC TAC ACG CCT CCC TCG CGC CAT CAG AGA TGT GTA TAA GAG ACA G-3', Nextera_Block1_REV: 5'-CTG TCT CTT ATA CAC ATC TCT GAT GGC GCG AGG GAG GCG TGT AGA TCT CGG TGG TCG CCG TAT CAT T-3', Nextera_Block2: 5'-CAA GCA GAA GAC GGC ATA CGA GAT CGG TCT GCC TTG CCA GCC CGC TCA GAG ATG TGT ATA AGA GAC AG-3', Nextera_Block2_REV: 5'-CTG TCT CTT ATA CAC ATC TCT GAG CGG GCT GGC AAG GCA GAC CGA TCT CGT ATG CCG TCT TCT GCT TG-3') for 72 h at 47°C. After hybridization, washing was performed as per Nimblegen protocols with streptavidin-coupled magnetic beads. Finally, PCR amplification was performed on the exome-captured library (Post_Cap_Short_For_Amp: 5'-AAT GAT ACG GCG ACC ACC GAG ATC T-3', Post_Cap_Short_Rev_Amp: 5'-CAA GCA GAA GAC GGC ATA CGA GAT-3'; 1× [0:30 min at 98°C], 17× [0:10 min at 98°C, 0:30 min at 65°C, 0:45 min at 72°C]) followed by clean-up (AMPure) and sequencing on an Illumina GAIIx SE36 run.
PCR-free library preparation
Adaptor sequences (NoPCR1: 5'-AAT GAT ACG GCG ACC ACC GAG ATC TAC ACG CCT CCC TCG CGC CAT CAG AGA TGT GTA TAA GAG ACA G-3', and NoPCR2: 5'-CAA GCA GAA GAC GGC ATA CGA GAT CGG TCT GCC TTG CCA GCC CGC TCA GAG ATG TGT ATA AGA GAC AG-3') were designed to contain the original 'Nextera' adaptor sequences, but with an additional 5' overhang of either P1 or P2 on adaptor 1 or adaptor 2, respectively (that is, the sequences required for compatibility with cluster PCR on the Illumina flow cell), thus eliminating the need to add them during a PCR step. The 5' phosphorylated reverse complement of the 19 bp mosaic end (ME: 5'-Phos-CTG TCT CTT ATA CAC ATC T-3') sequence was hybridized to NoPCR1/2 by combining 5 μl each of NoPCR1 and NoPCR2 with 10 μl ME reverse complement, all at 100 μM, with 80 μl TE, followed by denaturation at 95°C for 5 min and then slow cooling to room temperature, for a final annealed adaptor concentration of 10 μM. Transposomes were assembled by combining 5 μl annealed adaptors at 10 μM with 5 μl 100% glycerol and 10 μl Ez-Tn5 transposase (Epicentre) and incubating at room temperature for 20 min.
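The adaptor design and annealing arithmetic above can be checked with a short sketch. The sequences are copied from the text with spaces removed; the helper names `reverse_complement` and `final_concentration_um` are illustrative and not part of the original protocol.

```python
# Illustrative check of the NoPCR adaptor design (helper names are hypothetical).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def final_concentration_um(stock_um: float, volume_ul: float, total_ul: float) -> float:
    """Concentration after diluting volume_ul of stock into total_ul."""
    return stock_um * volume_ul / total_ul

# Sequences from the text, spaces removed.
NOPCR1 = "AATGATACGGCGACCACCGAGATCTACACGCCTCCCTCGCGCCATCAGAGATGTGTATAAGAGACAG"
NOPCR2 = "CAAGCAGAAGACGGCATACGAGATCGGTCTGCCTTGCCAGCCCGCTCAGAGATGTGTATAAGAGACAG"
ME_REVCOMP = "CTGTCTCTTATACACATCT"  # 5'-phosphorylated oligo, 19 bp

# The ME reverse complement should anneal to the 3' end of both adaptors.
mosaic_end = reverse_complement(ME_REVCOMP)
anneals_to_both = NOPCR1.endswith(mosaic_end) and NOPCR2.endswith(mosaic_end)

# 5 ul of each adaptor at 100 uM in 5 + 5 + 10 + 80 = 100 ul total.
total_ul = 5 + 5 + 10 + 80
annealed_um = 2 * final_concentration_um(100, 5, total_ul)
```

Under these assumptions each adaptor is at 5 μM and the ME oligo at 10 μM after mixing, which is consistent with the stated 10 μM of annealed adaptor in total.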
Tagmentation was carried out using previously prepared E. coli (CC118) or human (NA18507) genomic DNA using either 100 or 200 ng of DNA, 5 μl prepared transposomes, 2 μl 5× Nextera HMW buffer (Epicentre), and water to 10 μl. Reactions were incubated at 55°C for 5 min, followed by the addition of 25 μl 2× FailSafe PCR master mix (Epicentre), 1 μl FailSafe DNA polymerase (Epicentre), and 14 μl nuclease-free water (Ambion), and subsequent 5 min incubation at 72°C for nick translation. Tubes were then cleaned up using Qiaquick MinElute PCR purification columns (Qiagen), eluting in 12 μl buffer EB.
For each reaction, 2 μl was used as template for a real-time PCR on a MiniOpticon (Bio-Rad) using 0.5 μl SYBR Green, 25 μl 2× Nextera PCR master mix, 1 μl Nextera PCR enzyme and nuclease-free water to 50 μl. Alongside the NoPCR reactions, libraries of known concentration were used as template at successive dilutions to serve as a standard for rough library quantification. Standard Nextera cycling conditions were used, without the initial 72°C extension step. PCR reactions were cleaned up with Qiaquick PCR purification columns and run on a Novex 6% TBE PAGE gel (Invitrogen) for library size verification. After quantification, libraries were sequenced as per standard Illumina GAIIx protocol as a paired-end 36 bp run.
Low input transposase-based library preparation
For the 500 pg and 100 pg E. coli (CC118) libraries and 10 pg human library (NA18507, Coriell), genomic DNA (in 1 μl volume) was incubated with 1 μl Nextera Illumina-compatible transposomes (Epicentre) at a 1 to 50 dilution (1 μl Nextera enzyme, 24 μl TE, 25 μl 100% glycerol), 1 μl 5× Nextera HMW buffer, and 2 μl nuclease-free water (Ambion). To avoid contamination, all dilutions and reaction preparations were carried out in a PCR hood. Reactions were incubated at 55°C for 5 min followed by addition of 25 μl 2× Nextera PCR buffer, 0.5 μl SYBR Green, 1 μl 50× Nextera primer cocktail, and 1 μl 0.5 μM barcode adaptor 2 (barcodes A6, A9, or A4 for 500 pg and 100 pg E. coli DNA, or 10 pg human DNA, respectively) and cycled under standard Nextera conditions in a MiniOpticon (Bio-Rad) real-time PCR thermocycler. Both reactions were removed after 20 cycles and cleaned up using Qiaquick MinElute columns, eluting in 20 μl EB. Libraries were run on a 6% Novex TBE PAGE gel (Invitrogen) for size verification and sequenced as barcoded spike-ins as per standard Illumina GAIIx protocol as a paired-end 101 bp (plus 9 bp barcode) run for E. coli libraries and a paired-end 36 bp (plus 9 bp barcode) run for human.
Direct colony-based library preparation
Fusion-Blue chemically competent E. coli (Clontech, Mountain View, CA, USA) were transformed with pUC19 bearing a 2 kbp insert of human genomic DNA, and then plated on Luria broth (LB) + ampicillin. A small number of cells were picked from a single bacterial colony with a 10 μl pipette tip, and then transferred by dipping into 15 μl nuclease-free H2O. The suspended cells were heat-lysed at 95°C for 5 min, then placed on ice for 2 min. Nextera 5× LMW reaction buffer and enzyme (Illumina-compatible; Epicentre) were added to the sample, followed by brief mixing and incubation at 55°C for 5 min. The reaction was then stopped by heating to 70°C for 15 min. Sequencing-compatible primer sites were added in a 50 μl PCR reaction using 5 μl of the transposase reaction directly as template without intervening purification. PCR was carried out with 31.6 μl H2O, 10 μl Kapa 2G robust A buffer 5× (Kapa Biosystems, Cape Town, South Africa), 1 μl dNTP mix (10 mM each), 0.25 μl SYBR Green 100×, 1 μl 50× Nextera primer cocktail, 1 μl 50× Nextera adaptor 2, and 0.20 μl Kapa 2G robust polymerase; cycling conditions were as described by Epicentre. The amplification reaction was cleaned up with a Qiaquick PCR clean-up column (Qiagen) and eluted into 50 μl EB.
Sequencing of the H. sapiens NA18507, E. coli CC118, and D. melanogaster libraries was done on an Illumina Genome Analyzer IIx as paired-end 36, 36, and 45 bp runs, respectively, using standard read primers for sonication and fragmentase libraries, run in individual lanes, and Nextera read primers for Nextera libraries. H. sapiens YH1 libraries were run on an Illumina HiSeq2000 as paired-end 90 bp run using Nextera read primers. Phage libraries contained library-specific barcodes and were run as multiplexed samples using GS FLX Titanium sequencing protocols. P. aeruginosa libraries were pooled by combining 100 ng of each strain library, and sequenced on an Illumina Genome Analyzer IIx with a paired-end 76-cycle run.
Short read mapping
Short read mapping was done on the E. coli CC118, D. melanogaster and H. sapiens (NA18507 & YH1) Illumina GAIIx or HiSeq2000 sequenced samples by converting the raw sequence files to fastq format and then mapping to the hg18 (NCBI36) reference using the BWA alignment software. After mapping, PCR duplicates were removed, as were read-pairs with an insert size shorter than the read length.
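The post-mapping filters can be sketched as follows. This is a minimal illustration, not the pipeline used here: it assumes that a PCR duplicate is a read-pair sharing chromosome, outer coordinates, and orientation with an earlier pair, and the `Pair` record and `filter_pairs` function are hypothetical names.

```python
from typing import NamedTuple

class Pair(NamedTuple):
    chrom: str
    start: int   # leftmost mapped coordinate of the pair (1-based, inclusive)
    end: int     # rightmost mapped coordinate of the pair (inclusive)
    strand: str  # orientation of the first read

def filter_pairs(pairs, read_length):
    """Drop pairs whose insert is shorter than the read length, then drop duplicates."""
    seen = set()
    kept = []
    for pair in pairs:
        insert_size = pair.end - pair.start + 1
        if insert_size < read_length:
            continue  # insert shorter than a single read
        if pair in seen:
            continue  # assumed PCR duplicate: identical coordinates and strand
        seen.add(pair)
        kept.append(pair)
    return kept
```

For example, with 36 bp reads, two identical 200 bp pairs collapse to one and a 21 bp pair is discarded.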
Long read assembly
Long read assembly from bacteriophage samples sequenced on the Roche GS FLX Ti was done using Roche's newbler assembler under default parameters. Individual reads from each dataset were mapped against the assembled genome using gsMapper (Roche Software Release: 2.3 (091027_1459)).
Fragmentation site characterization
Fragmentation site characterization was carried out by stacking all regions of the genome flanking forward strand mapping start locations and the reverse complement of reverse strand start locations, followed by calculating nucleotide frequencies at each position relative to the fragmentation site, thus generating a positional weight matrix (PWM). The PWMs were then imported into the SeqLogo (Oliver Bembom, Dept. of Biostatistics, University of California, Berkeley, 2008) package for Bioconductor in R and used to generate positional information content (IC) and sequence logos using the equation outlined by T. D. Schneider et al.:

$$ IC(w) = \log_2 J + \sum_{j=1}^{J} p_{wj} \log_2 p_{wj} $$

where J is the number of symbols in the alphabet (4; A, C, G, or T), and p_{wj} is the observed frequency of base j at position w. This equation does not factor in the background nucleotide frequencies.
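As a hedged illustration of this calculation, the sketch below builds per-position base frequencies from a stack of equal-length flanking windows and computes IC as log2(J) plus the sum over bases of p·log2(p), with no background correction. The analysis here used the SeqLogo package in R; the Python function names are illustrative only, and each position is assumed to contain at least one A, C, G, or T.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def position_frequencies(windows):
    """Per-position base frequencies from equal-length sequence windows."""
    pwm = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows)
        total = sum(counts[b] for b in ALPHABET)  # ignores N and other symbols
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm

def information_content(column):
    """IC in bits for one position: log2(J) + sum_j p_j * log2(p_j)."""
    ic = math.log2(len(ALPHABET))
    for p in column.values():
        if p > 0:  # p * log2(p) tends to 0 as p tends to 0
            ic += p * math.log2(p)
    return ic

# Toy windows: position 0 is invariant, position 1 is uniform.
toy = ["ACGT", "AGGT", "ATGA", "AAGC"]
ic_per_position = [information_content(col) for col in position_frequencies(toy)]
```

An invariant position yields the maximum of 2 bits and a position with uniform base usage yields 0 bits.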
Normalization of Illumina GAIIx coverage
Normalization of Illumina GAIIx coverage for the E. coli (non-size-selected) data was done by dividing the coverage at each position in the genome by the total number of mapped bases and then multiplying by a constant (the average number of mapped bases was close to 1 Gb, therefore 10^9 was used as the constant).
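A minimal sketch of this normalization, assuming that the total number of mapped bases equals the sum of the per-position coverage; `normalize_coverage` is an illustrative name.

```python
def normalize_coverage(coverage, scale=1e9):
    """Scale per-position coverage so that the library totals `scale` mapped bases."""
    total_mapped_bases = sum(coverage)
    if total_mapped_bases == 0:
        return [0.0 for _ in coverage]
    return [depth * scale / total_mapped_bases for depth in coverage]
```

After this step, libraries sequenced to different depths can be compared position by position on a common scale.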
Coverage distribution histograms
Coverage distribution histograms were generated by calculating the number of times each base of the genome was sequenced and plotting the frequency of each level of coverage.
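The calculation can be sketched as below, assuming mapped reads are available as 0-based, half-open intervals on a single reference sequence. Both function names are illustrative.

```python
from collections import Counter

def per_base_coverage(genome_length, intervals):
    """Depth at each base, given 0-based half-open (start, end) read intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        diff[start] += 1  # coverage rises where a read begins
        diff[end] -= 1    # and falls just past where it ends
    coverage = []
    depth = 0
    for i in range(genome_length):
        depth += diff[i]
        coverage.append(depth)
    return coverage

def coverage_histogram(coverage):
    """Map each coverage level to the number of bases observed at that level."""
    return dict(Counter(coverage))
```

Plotting the histogram values against their coverage levels gives the distribution described above.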
Coverage by G+C content
Coverage by G+C content plots were generated by binning the reference genome into 500 bp bins for E. coli, 10 kbp bins for human, and 1 kbp bins for Drosophila (other sizes were also investigated, resulting in very similar distributions) and calculating the G+C content of each bin, followed by plotting the coverage of that bin.
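A sketch of the binning step, under two assumptions that are not stated in the text: the coverage of a bin is summarized as its mean per-base depth, and bins consisting only of ambiguous bases are skipped. The function name is illustrative.

```python
def gc_and_coverage_by_bin(reference, coverage, bin_size):
    """Return (G+C fraction, mean coverage) for consecutive bins of the reference."""
    results = []
    for start in range(0, len(reference), bin_size):
        seq = reference[start:start + bin_size].upper()
        depths = coverage[start:start + bin_size]
        called = sum(seq.count(base) for base in "ACGT")
        if called == 0:
            continue  # bin contains only N or other ambiguous symbols
        gc_fraction = (seq.count("G") + seq.count("C")) / called
        results.append((gc_fraction, sum(depths) / len(depths)))
    return results
```

With `bin_size` set to 500, 10,000, or 1,000, this corresponds to the bin sizes used for E. coli, human, and Drosophila, respectively.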
Library complexity was calculated by random sampling of 50,000 read-pairs without replacement and plotting the number of uniquely occurring read-pairs versus the total number of sampled read-pairs.
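The sampling procedure can be illustrated as follows. This sketch assumes each read-pair is represented by a hashable key such as its mapped coordinates; the function name, the `step` parameter, and the fixed seed are illustrative choices.

```python
import random

def complexity_curve(pairs, sample_size=50_000, step=1_000, seed=0):
    """Unique read-pairs seen versus read-pairs sampled (without replacement)."""
    rng = random.Random(seed)
    sample = rng.sample(pairs, min(sample_size, len(pairs)))
    seen = set()
    curve = []
    for n_sampled, pair in enumerate(sample, start=1):
        seen.add(pair)
        if n_sampled % step == 0 or n_sampled == len(sample):
            curve.append((n_sampled, len(seen)))
    return curve
```

A library with little duplication gives a curve close to the diagonal, whereas a low-complexity library flattens as sampling proceeds.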
Insertion size distributions
Insertion size distributions were generated by taking the distance between the start mapping location of the first read and the end mapping location of the second read for every read-pair and plotting the frequency of occurrences of each insert size.
SNP calls for the YH1 genome were generated using the SAMtools variant caller with a maximum coverage of 1,000 and minimum quality score of 30. Prior to variant calling, read-pairs with an insert size less than 90 bp and reads not properly paired were removed to reduce noise. Calls were then compared with the SNPs reported by Wang et al. (2008) and to dbSNP build 129.
Cell-line-discordant SNP calls
In order to minimize false calls due to mapping errors, a repeat-masked version of hg18 was used along with further masking with respect to mappability according to the UCSC Genome Browser 'Rosetta 35mer uniqueness' and excluding all regions with a score of 0 (this score means that the sequence maps perfectly to multiple locations in the genome). This track was used because it was generated using the BWA aligner that was used in our analysis, and because the original YH1 sequence data is made up of 36 bp reads. Out of this newly masked genome, positions were called that had a SNP quality score in the cell line over 30, a reference call in blood over 30, and coverage less than 100× in both datasets. Of those with a quality score of 50 in both for their calls, 100 were randomly chosen for validation.
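The selection criteria can be expressed as a small filter. This is a sketch under stated assumptions: the `Site` record and its field names are hypothetical, masking is assumed to have been applied already, and "a quality score of 50 in both" is interpreted here as at least 50, which the text does not specify.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    chrom: str
    pos: int
    cell_is_snp: bool   # SNP called in the cell line
    cell_qual: int
    cell_depth: int
    blood_is_ref: bool  # reference call in blood
    blood_qual: int
    blood_depth: int

def is_discordant(site, min_qual=30, max_depth=100):
    """SNP in the cell line, reference in blood, both over min_qual, both under max_depth."""
    return (site.cell_is_snp and site.blood_is_ref
            and site.cell_qual > min_qual and site.blood_qual > min_qual
            and site.cell_depth < max_depth and site.blood_depth < max_depth)

def pick_for_validation(sites, n=100, strict_qual=50, seed=0):
    """Randomly choose up to n discordant sites with quality >= strict_qual in both."""
    candidates = [s for s in sites
                  if is_discordant(s)
                  and s.cell_qual >= strict_qual and s.blood_qual >= strict_qual]
    return random.Random(seed).sample(candidates, min(n, len(candidates)))
```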
Validation was carried out using a mass spectrometry assay (Sequenom, San Diego, CA, USA). Primers for PCR amplification and extension were successfully designed for 100 mutation sites using the Sequenom MassArray Assay Design v3.1. PCR amplification, shrimp alkaline phosphatase treatment of unincorporated dNTPs, probe extension and resin desalting were carried out in sequence using the conditions described elsewhere. Sequenom genotyping was performed in parallel for genomic DNA from YH1 blood and the same batch of lymphoblastoid cell lines as was used for sequencing. A negative control and technical replicate were also run in parallel for each typed position. Genotyping of all 100 testing sites passed the filter criteria of: (1) no failing extension, (2) no false positive in the negative control, (3) consistency between two technical replicates. Genotyping was further performed for the 100 positions in the YH primary lymphoblastoid cell lines using the same method, with 98 meeting the above filter criteria.
Sequence reads from transposase-based libraries subjected to human exome capture were aligned to the human reference (hg18) using BWA. Each aligned base was deemed to be on target if it was within 100 bases of a targeted sequence. At each position within target regions, coverage was assessed and any position with a depth of at least one was considered covered. Comparison to standard exome methods was made by trimming read 1 of a PE76 lane from a GAIIx down to 36 bases and aligning it the same way. Because the standard library had fewer reads mapped, 28 million reads were randomly taken from both libraries and the above analysis was performed. Complexity was interrogated by taking an equivalent number of on-target SE36 reads generated by each method and calculating the percentage with unique start-points.
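The on-target test and the start-point complexity measure can be sketched as follows, assuming targets are given as 0-based, half-open intervals on one chromosome and that a start-point is identified by chromosome, position, and strand. The function names are illustrative, and the linear scan over targets is for clarity rather than speed.

```python
def is_on_target(pos, targets, flank=100):
    """True if a 0-based position lies within `flank` bases of any target interval."""
    return any(start - flank <= pos < end + flank for start, end in targets)

def unique_start_fraction(read_starts):
    """Fraction of reads with a distinct (chrom, start, strand) start-point."""
    read_starts = list(read_starts)
    if not read_starts:
        return 0.0
    return len(set(read_starts)) / len(read_starts)
```

Multiplying the returned fraction by 100 gives the percentage of reads with unique start-points.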
Barcode design yielded a set of 96 × 9 bp sequences in which each 9 bp sequence contained no homopolymer run of three or more bases, had a GC content of ≤ 60%, was an edit distance of at least four away from all other members of the set of 96, and screened negative when compared with other adaptor and primer sequences used here. Also, we took care to ensure that each base (A, G, C, or T) was represented at least once in each 9 bp barcode, and at least once in each position along the 9 bp.
Barcode deconvolution for pooled, multiplexed Pseudomonas samples was carried out by computing the Levenshtein edit distance between the obtained index read and each of the 96 barcode sequences used. The corresponding read-pair was assigned to a barcode when that barcode was within edit distance of 2 of the index read, with the next closest matching barcode being at least two further edits away.
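The barcode design rules and the deconvolution rule can be sketched together. This is an illustrative implementation under assumptions, not the code used here: all function names are hypothetical, the screen against adaptor and primer sequences and the per-position base check are omitted, and the assignment rule is read as "best distance at most 2, and second-best distance at least 2 greater than the best".

```python
import re
from itertools import combinations

def levenshtein(a, b):
    """Levenshtein edit distance by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]

def passes_sequence_rules(barcode):
    """No homopolymer run of 3 or more, GC content <= 60%, every base present."""
    no_homopolymer = re.search(r"(.)\1\1", barcode) is None
    gc_fraction = (barcode.count("G") + barcode.count("C")) / len(barcode)
    all_bases_present = all(base in barcode for base in "ACGT")
    return no_homopolymer and gc_fraction <= 0.60 and all_bases_present

def min_pairwise_distance(barcodes):
    """Smallest edit distance between any two barcodes (the design requires >= 4)."""
    return min(levenshtein(a, b) for a, b in combinations(barcodes, 2))

def assign_barcode(index_read, barcodes, max_dist=2, min_gap=2):
    """Return the matching barcode, or None if the read is too distant or ambiguous."""
    ranked = sorted((levenshtein(index_read, bc), bc) for bc in barcodes)
    best_dist, best = ranked[0]
    if best_dist > max_dist:
        return None
    if len(ranked) > 1 and ranked[1][0] - best_dist < min_gap:
        return None  # runner-up barcode is too close to call
    return best
```

Because the designed set has a minimum pairwise edit distance of four, an index read carrying a single error remains at least three edits from every other barcode and is therefore still assigned under this rule.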
All sequence data described here is being deposited in the NCBI Sequence Read Archive (SRA) under accession SRP004087.