A comparative evaluation of genome assembly reconciliation tools

Background The majority of eukaryotic genomes are unfinished due to the algorithmic challenges of assembling them. A variety of assembly and scaffolding tools are available, but it is not always obvious which tool or parameters to use for a specific genome size and complexity. It is, therefore, common practice to produce multiple assemblies using different assemblers and parameters, then select the best one for public release. A more compelling approach would allow one to merge multiple assemblies with the intent of producing a higher quality consensus assembly, which is the objective of assembly reconciliation. Results Several assembly reconciliation tools have been proposed in the literature, but their strengths and weaknesses have never been compared on a common dataset. We fill this need with this work, in which we report on an extensive comparative evaluation of several tools. Specifically, we evaluate contiguity, correctness, coverage, and the duplication ratio of the merged assembly compared to the individual assemblies provided as input. Conclusions None of the tools we tested consistently improved the quality of the input GAGE and synthetic assemblies. Our experiments show an increase in contiguity in the consensus assembly when the original assemblies already have high quality. In terms of correctness, the quality of the results depends on the specific tool, as well as on the quality and the ranking of the input assemblies. In general, the number of misassemblies ranges from being comparable to the best of the input assembly to being comparable to the worst of the input assembly. Electronic supplementary material The online version of this article (doi:10.1186/s13059-017-1213-3) contains supplementary material, which is available to authorized users.


A Comparative Evaluation of Assembly Reconciliation Tools: Supplementary Material
Hind Alhakami, Hamid Mirebrahim and Stefano Lonardi

Supplementary Note 1: Parameter tuning
For the results in the Main Text and in Supplementary Note 3, all assembly reconciliation tools were run with default parameters. In this Supplementary Note we explore how other parameter settings affected the experimental results. Each tool has its own set of parameters, as briefly described next.
• CISA has three main parameters, namely the minimum contig cutoff, the maximum number of consecutive N's, and the maximum unaligned gap (default values 100 bp, 10 bp, 0.95 quintile, respectively); we changed the minimum contig cutoff to 200 bp and 500 bp and the maximum gap size to 100 and 200; we also tried scaffolds as inputs.
• For GAA we focused on two parameters, namely the minimum contig cutoff and the maximum tip size (default values of 100 bp and 90 bp, respectively); we changed the contig cutoff size to 200 bp and 500 bp and the maximum tip size allowed to 15 bp and 50 bp.
• GAM NGS has three main parameters, namely the minimum number of reads to build a block, the block coverage filtering, and the minimum block length; for these parameters the authors suggest using 10 reads, 0.75, and 200 bp, respectively, for bacterial genomes, and 50 reads, 0.75, and 500 bp for Hg chr14; since there was no option to change the minimum block size, we explored the other two parameters; we used the default values of at least 50 reads per block and 0.75 block coverage filter; we also tried setting the read support to 10 and 30 with 0.75 block coverage filter, as well as read support of 10 and 50 with 0.95 block coverage filter.
• For MIX we focused on two parameters, namely the minimum alignment length and the minimum contig cutoff (default 500 bp); according to the documentation, if these two parameters are not specified MIX is supposed to check thresholds from 0 to 2000 with a step of 50; we tried this option, but only got results with default settings; the authors of MIX recommend a minimum alignment of 500 bp and a minimum contig cutoff of 0 bp for bacterial genomes (which is what we used); in addition we tried (i) minAlign=50 and minctg=100, (ii) minAlign=50 and minctg=200, and (iii) minAlign=100 and minctg=500.
Experimental results for all these parameter sets are reported in the Supplemental Tables. Observe that for S. aureus, Metassembler and GAM NGS maintained the same statistics for all parameter configurations, with the exception of a slight variation in the size of the assembly. CISA produced changes only when the minimum contig cutoff increased to 500 bp, with contigs as input. In this case, both genome and gene coverage improved but the contiguity decreased with respect to other configurations.
With scaffolds as inputs, the contiguity increased but the genome fraction was lower than 50% in most cases. In GARM we observed a small variation in the number of mismatches and indels and an insignificant change in the genome coverage.

Supplementary Note 2: Gene coverage analysis
We used the reference genomes and their corresponding gene annotations. First, we created a BLAST database for each of the GAGE reference genome assemblies and each of the merged output assemblies. Then, we used BLASTn to align the primary sequence of each gene against each database (using default parameters). For each hit reported in BLASTn output, we chose the best ranked alignment with 75% minimum identity. The total gene coverage reported is the cumulative sum of the coverage of each hit minus any overlaps between the hits.

Supplementary Note 3: Experimental results on GAGE assemblies (observations on Supplemental Tables S4-S18)

3.1. High contiguity, high correctness inputs (GAGE)
In the first set of experiments, the objective was to explore the contiguity/correctness tradeoff. Specifically, we wanted to test the ability of reconciliation tools to take advantage of the contiguity of the first input assembly and the correctness of the second in order to create a merged assembly with a number of misassemblies comparable to the second assembly and a contiguity comparable to the first assembly. The two input assemblies to be merged were chosen so that one has a high N50 value (but possibly a relatively high number of misassembly errors) and the other has few misassembly errors (and possibly a lower N50). Supplemental Table S4 reports the results of merging the SOAPdenovo assembly (high N50) with the ABySS assembly (low misassembly errors) for the three chosen genomes. Since the assembly produced by ABySS on the R. sphaeroides genome has more misassembly errors than the assembly generated by SOAPdenovo, we also considered the results on R. sphaeroides reported in Supplemental Table S5, where the input assemblies were produced by ALLPATHS-LG and SGA. The SOAPdenovo assembly was used as the "master" assembly in all tools that distinguish the assembly inputs.
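For reference, the gene coverage statistic quoted in the observations below (see Supplementary Note 2) amounts to taking the union of the hit intervals on each gene, so that overlapping hits are counted once. The following Python sketch illustrates that step only. It assumes the hits have already been parsed from the BLASTn output, filtered at 75% identity, and expressed as (start, end) pairs in gene coordinates (1-based, inclusive); the function names are ours and are not part of any tool.

```python
def covered_bases(hits):
    """Return the number of gene positions covered by at least one hit.

    `hits` is an iterable of (start, end) pairs in gene coordinates,
    1-based and inclusive (an assumption of this sketch).
    """
    total = 0
    cur_start = cur_end = None
    # Normalize reversed pairs, then sweep the intervals left to right.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hits):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping hit: extend the run so shared bases count once.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_coverage(hits, gene_length):
    """Fraction of the gene covered by the union of its hits."""
    return covered_bases(hits) / gene_length
```

For example, two hits spanning positions 1-100 and 51-150 of a 200 bp gene cover 150 positions rather than 200, because the 50 shared positions are subtracted.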
Observe in Supplemental Table S4 that on the S. aureus genome, all tools increase the contiguity by less than 3%, although the number of contigs decreased by 7 − 30% (except for GAA). While none of the tools was able to improve assembly errors compared to the ABySS assembly, GAA and MIX produced more errors than SOAPdenovo. CISA produced the lowest number of misassemblies (13% less than SOAPdenovo) at the cost of a 4% decrease in genome and gene coverage. Otherwise, GAM NGS and Metassembler maintained quality statistics close to that of SOAPdenovo.
In this and the rest of the experiments below, GAA consistently produced assemblies with predictable statistics. In the vast majority of the cases, GAA created a merged assembly in which the number of contigs, the size of the resulting assembly, and the number of misassemblies were very close to the sum of those statistics for the input assemblies. GAA's gene coverage was typically low in S. aureus and R. sphaeroides (not as much on Hg chr14, where the gene coverage was generally high in comparison to other merged assemblies), while the percentage of covered genome was relatively high. While GAA's N50 was low, in terms of NGA50 the contiguity was at least as good as the most contiguous input assembly. In fact for Hg chr14, GAA increased NGA50 by 19 − 123% except for one case in which the increase was negligible.
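One plausible reading of the gap between GAA's N50 and NGA50 is that N50 is measured against the size of the merged assembly, which for GAA is close to the sum of the inputs, whereas NGA50 is measured against the reference genome size. The Python sketch below illustrates this effect with made-up contig lengths, using N50 and NG50. NGA50 additionally requires aligning contigs to the reference and breaking them at misassemblies, which is not modeled here; the function names and toy numbers are ours.

```python
def nx50(lengths, target_size):
    """Largest L such that contigs of length >= L sum to at least half of target_size.

    Returns None when the contigs cannot reach half of target_size.
    """
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= target_size:
            return length
    return None


def n50(lengths):
    """N50: the target is the total size of the assembly itself."""
    return nx50(lengths, sum(lengths))


def ng50(lengths, genome_size):
    """NG50: the target is the size of the reference genome."""
    return nx50(lengths, genome_size)


# Toy data (not from the paper): two assemblies of a 1000 bp genome.
genome_size = 1000
contiguous = [400, 300, 200, 100]
fragmented = [100] * 10
merged = contiguous + fragmented  # assembly size is the sum of the inputs

print(n50(contiguous), n50(merged), ng50(merged, genome_size))
```

In this toy case the N50 of the concatenation drops from 300 to 100, while its NG50 stays at 300, the value of the more contiguous input.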
When the input was composed of scaffolds, all tools improved contiguity by less than 5%, and reduced the number of scaffolds by 12 − 92%, with GARM reporting the highest decrease. GARM was the only tool that significantly increased N50 and produced the lowest number of misassemblies; however, GARM's merged assembly covered less than 40% of the reference sequence and less than one third of the genes. In contrast, MIX's merged assembly covered 94% of the genes despite (i) including only about 44% of the reference genome and (ii) decreasing the contiguity by 48%.
If we exclude the number of contigs and NGA50, all the other assembly statistics for GAM NGS and Metassembler are very similar to SOAPdenovo. None of the tools was able to reduce the number of misassembly errors compared to ABySS; in fact, CISA and MIX produced more errors than SOAPdenovo.
Despite the fact that ABySS's assembly for R. sphaeroides had a higher number of misassembly errors than SOAPdenovo, none of the merged assemblies improved on the number of misassemblies compared to SOAPdenovo. Except for GAA, the number of misassembly errors produced by all tools was closer to the master (SOAPdenovo). As expected, tools that rely on the master assembly had a lower number of misassemblies than those that did not rank the inputs. With scaffolds as inputs, changes in NGA50 were negligible for all tools except for CISA. With contigs as inputs, GAM NGS improved the contiguity by at most 11%, Metassembler and MIX increased it by 2%, and CISA dropped it by 85%. CISA also increased the number of contigs by 18%, and decreased genome and gene coverage by about 45%. GAM NGS's assembly covered less than one quarter of the genome and about one fifth of the gene sequences, but its output had quality statistics similar to SOAPdenovo (with a 5% decrease in scaffolds). MIX and Metassembler decreased the number of scaffolds by 30% and 39%, respectively; otherwise, they maintained contiguity and coverage statistics within 1% of SOAPdenovo. GARM significantly improved the contiguity in terms of N50 but maintained the same NGA50 as SOAPdenovo. GARM decreased genome and gene coverage by 11%.
With contigs as inputs, GAM NGS maintained the same genome and gene coverage as SOAPdenovo. MIX and Metassembler produced comparable results, namely (i) they both reduced the number of contigs by nearly one quarter, (ii) increased N50 by 10%, (iii) maintained the same genome coverage, and (iv) decreased gene coverage by less than 2%.
In the majority of the cases, experimental results obtained with ALLPATHS-LG (high N50) and SGA (low misassembly errors) on the R. sphaeroides genome (reported in Supplemental Table S5) followed similar patterns to the ones we observed in Supplemental Table S4. CISA increased the number of contigs, but decreased the contiguity, genome and gene coverage (although the reduction was far less this time). GAA followed the same general pattern mentioned earlier. GAM NGS did not increase contiguity but rather maintained it close to that of the master assembly. Metassembler and MIX also did not increase contiguity, but they reduced the number of contigs, as well as genome and gene coverage. ZORRO worked for this experiment: it increased the number of contigs, decreased contiguity by 10%, but retained genome and gene coverage of ALLPATHS-LG. ZORRO's merged assembly is the only one that achieved a smaller number of misassembly errors than ALLPATHS-LG (but still higher than SGA).
With scaffolds as input assemblies, CISA again reduced the number of contigs and produced an assembly with low genome and gene coverage. GAM NGS reduced the number of contigs slightly but retained the quality statistics of the master assembly. Observe in Supplemental Table S4 that GARM improved N50 by 57% although it retained NGA50 close to SOAPdenovo (the master assembly). Observe in Supplemental Table S5 that GARM maintained ALLPATHS-LG's contiguity statistics. In both experiments GARM decreased genome and gene coverage; on the positive side, the consensus assembly had about 85% fewer scaffolds compared to the master.
Experimental results on Hg chr14 with contigs as input assemblies (Supplemental Table S4) show that (i) GAM NGS slightly improved contiguity, (ii) Metassembler maintained contiguity with fewer contigs, (iii) GAA crashed, and (iv) the number of misassemblies was closer to SOAPdenovo. With scaffolds as inputs, GARM drastically reduced the number of contigs, but also decreased the genome coverage by 7%. GAM NGS and Metassembler produced assemblies with quality statistics close to SOAPdenovo except for a 26% decrease in the number of contigs for Metassembler.

3.2. Reordering the inputs (GAGE)
As mentioned above, some of the assembly reconciliation tools assume that the first input assembly is the master assembly, and should be "trusted" more (we call these tools asymmetric). The goal of this set of experiments is to determine how the quality of the merged assembly depends on the specific order of the inputs.
To determine how the ranking affected the results, we repeated the same experiments reported in the previous section but switched the order of the inputs. A comparative analysis of the results in Supplemental Table S4 and Supplemental  Table S6 prompts a few observations. First, we note that CISA, MIX, and GARM are symmetric (i.e., they do not require users to rank the inputs, see Main Text Table 1), hence they are expected to be unaffected by the reordering. Experimental results confirm that CISA and GARM are indeed unaffected. The reordering however affected MIX results, albeit only slightly.
For S. aureus, MIX's contiguity statistics (N50 and NGA50) and genome coverage were not affected by the reordering of the inputs. However, we observed (i) a 2% decrease in gene coverage, (ii) a small difference in the number of contigs (±1), and (iii) a small change in the number of misassemblies, although still higher than SOAPdenovo in both cases.
On the R. sphaeroides genome, all statistics remained unchanged except for the number of misassemblies that increased after reordering. In addition, with contigs as inputs we did not observe an increase in NGA50 after the reordering.
Despite the fact that GAA requires input ranking, the results for S. aureus and R. sphaeroides were similar. The output statistics of GAA followed the general pattern mentioned in the previous section. For Hg chr14, GAA crashed in one ordering but not in the other. For all three genomes, GAM NGS and Metassembler produced consensus assemblies with quality statistics close to the master assembly.
Note that the merged assemblies have higher contiguity in Supplemental Table S4, in which the master has higher N50. In contrast, the number of misassemblies was lower in Supplemental Table S6 for both S. aureus and Hg chr14, in which the master had fewer errors (with the exception of MIX). Merged assemblies for R. sphaeroides had higher contiguity and a lower number of misassemblies in the ordering in which the master had both a higher N50 and a lower number of misassemblies (see Supplemental Table S4).

High-quality inputs (GAGE)
In the third set of experiments we tested the ability of the reconciliation tools to merge two high quality assemblies. We selected two highly contiguous assemblies (i.e., small number of contigs and scaffolds, high N50 values) with a low number of misassembly errors. Supplemental Table S7 shows the result of merging assemblies produced by ALLPATHS-LG as the first input and either MSR-CA, SOAPdenovo, or CABOG as the second assembly.
Observe that for S. aureus with contigs as inputs, GAM NGS produced an improved assembly that (i) had no misassemblies, (ii) was 66% more contiguous, and (iii) covered the same portions of the genome and the genes. The next best assembly was by Metassembler with a 107% increase in contiguity and a 51% decrease in the number of contigs, but it had a slight increase in the number of misassemblies compared to ALLPATHS-LG. MIX also improved the contiguity by 107% (N50), but due to the high number of misassemblies (higher than MSR-CA) the increase in contiguity dropped to 4% when aligned to the reference. MIX's gene coverage also dropped by 37%. CISA improved contiguity by 11%, and reduced the number of contigs by nearly a half, but it produced a number of errors higher than ALLPATHS-LG. CISA also decreased genome and gene coverage. ZORRO decreased contiguity by 30% and increased the number of contigs by 22%, although it maintained genome and gene coverage.
With scaffolds as inputs, ALLPATHS-LG has no misassemblies, a lower N50 than MSR-CA but higher NGA50. In general, asymmetric tools produced a lower number of misassemblies and decreased the N50. For instance, GAM NGS maintained quality statistics of ALLPATHS-LG. Although ZORRO is asymmetric it decreased contiguity by more than 90%. On the other hand, symmetric tools had a higher number of misassemblies. GARM achieved the highest increase of NGA50 (16%).
The contiguity of the merged assemblies improved by 11-108% with the exception of ZORRO, which decreased the contiguity by 30%. GARM increased contiguity the most (108%) at the expense of (i) an additional 12% duplication rate, (ii) a number of misassemblies close to MSR-CA, and (iii) a 10% decrease in gene coverage. MIX introduced no misassemblies, but covered only 25% of the genome and gene sequences. Notably, both GAM NGS and Metassembler (i) improved contiguity by 66.5%, (ii) reduced the number of contigs, (iii) introduced no misassemblies, and (iv) maintained gene coverage. These are two rare examples in which we observed an unquestionable improvement in the merged assembly.
On the R. sphaeroides genome, the two input assemblies had almost the same number of misassemblies but the assembly produced by SOAPdenovo was much less fragmented. Only MIX, Metassembler and GARM increased N50 by 37%, 43%, and 69%, respectively (with only Metassembler increasing NGA50 significantly). All other tools decreased the contiguity. In terms of correctness, ZORRO and CISA (using scaffolds as inputs) reduced the number of misassemblies but also decreased the contiguity by 99% and 60%, respectively. Other tools produced merged assemblies with a number of misassemblies not better than the inputs. GARM improved the contiguity by 38% while CISA increased it by less than 2%. GARM, CISA, and MIX reduced the number of contigs by 48%, 51%, and 60%, respectively, but also decreased genome and gene coverage. MIX is the only tool that reduced the number of misassemblies, but again its assembly only covered about half of the genome. None of the tools improved both contiguity and the number of misassemblies.
In Hg chr14, GAA decreased the contiguity by 8%, but it improved the NGA50 by 76%, and increased the gene coverage by 13%. Nevertheless, it had a 198% inflation rate and produced a number of misassemblies equal to the sum of the number of misassemblies in the two inputs. GAM NGS reduced the number of contigs by 10%, improved the contiguity (39% increase in N50, 28% increase in NGA50), slightly reduced the number of misassemblies, but decreased the gene coverage by 11%. Metassembler produced quality statistics that are very close to ALLPATHS-LG.
With scaffolds as inputs, GAM NGS and Metassembler maintained similar quality statistics to ALLPATHS-LG, with the exception of the number of contigs (Metassembler decreased it by 33%) and gene coverage (GAM NGS and Metassembler decreased by 18% and 51%, respectively). GARM improved N50 but decreased NGA50 by 9%. It also increased the number of misassemblies and decreased genome and gene coverage. GARM improved the contiguity by 128% and reduced the number of contigs in half at the cost of 14% inflation and about 41% increase in the number of misassemblies. GAA and GAM NGS improved the contiguity by 76% and 28%, but only GAA increased the gene coverage.

Highly-fragmented inputs (GAGE)
The goal of this set of experiments was to evaluate the performance of assembly reconciliation tools when provided with two highly fragmented input assemblies. Input assemblies were selected to have a high percentage of contigs shorter than 200 bp, a high number of contigs and scaffolds, and low N50.
Supplemental Table S8 shows the results of merging the ABySS assembly and the SGA assembly. Observe that when we used contigs as inputs, ABySS had a higher contiguity than SGA (except in Hg chr14). The opposite, however, was observed when scaffolds were provided as input. In S. aureus and R. sphaeroides with contigs as inputs, all tools increased N50 except for GAA. In terms of NGA50, only asymmetric tools maintained or improved over NGA50 of the better input assembly (in S. aureus we observed up to 8% increase, and up to 17% in R. sphaeroides). However, in Hg chr14 (with contigs as inputs) only GAM NGS improved the N50. In terms of NGA50, GAA produced a 123% increase over SGA, while GAM NGS did not improve it over SGA, but it increased it by 33% over ABySS.
With scaffolds as inputs, we observed a decrease in N50 except for MIX and GARM (when SGA inputs are scaffolds). MIX, GARM, and CISA are symmetric tools, hence they are expected to perform better than other tools when the non-master input has better quality. CISA, however, produced inferior results with scaffolds as inputs in most experiments. It turns out that CISA with default parameters breaks scaffolds into contigs when a scaffold contains more than ten consecutive occurrences of Ns. MIX and GARM enhanced or maintained N50 of SGA. In terms of NGA50, MIX maintained it, while GARM slightly decreased it compared to SGA (yet still higher than ABySS). The number of contigs decreased although it remained relatively high in the majority of the cases. CISA had more than 80% decrease in the number of contigs with scaffolds as inputs, but the genome coverage was poor. GARM reduced the number of contigs by 74-91%, regardless of the genome coverage.
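The effect of CISA's default handling of Ns can be illustrated with a short Python sketch that splits a scaffold wherever it contains more than ten consecutive Ns. This is not CISA's code, only an illustration of the behavior described above; the function name and the `max_n` argument are hypothetical.

```python
import re


def split_scaffold(sequence, max_n=10):
    """Split a scaffold at every run of more than `max_n` consecutive Ns.

    Runs of `max_n` or fewer Ns are left inside the resulting contigs.
    """
    gap = re.compile("N{%d,}" % (max_n + 1), flags=re.IGNORECASE)
    # Drop empty pieces produced by leading or trailing gaps.
    return [piece for piece in gap.split(sequence) if piece]
```

Under this rule a scaffold with a gap of 11 or more Ns is returned as separate contigs, which would explain why scaffold-level contiguity is not carried over into the merged assembly.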

De Bruijn vs. string graph assembly (GAGE)
Here we tested the effect of merging assemblies generated using different assembly strategies. Specifically, we merged an assembly generated by an assembler that uses a de Bruijn graph with an assembly produced by an assembler based on the string graph. Supplemental Table S5 shows the result of merging an assembly produced by ALLPATHS-LG (based on the de Bruijn graph) with an assembly produced by SGA (based on the string graph). Overall, GAM NGS, Metassembler, and MIX maintained similar assembly statistics as ALLPATHS-LG.
Note that S. aureus input assemblies (as contigs) had only one misassembly. The merged assemblies also have one misassembly, with the exception of GAA (two) and ZORRO (none). ZORRO corrected the assembly error without affecting N50 but at the price of a 17% increase in the number of contigs. CISA also increased the number of contigs, decreased NGA50 by 49%, and decreased the gene coverage by 15%. With scaffolds as inputs, ALLPATHS-LG's assembly has no assembly errors. In fact, observe that all merged assemblies did not have any misassemblies. GARM produced only 3 scaffolds and increased N50 by 31% but kept NGA50 close to ALLPATHS-LG, while decreasing less than 6% of genome and gene coverage. CISA covered less than 40% of the genome, while ZORRO decreased the contiguity by 99%.
On R. sphaeroides with contigs as inputs, CISA and ZORRO decreased the contiguity by 34% and 10%, respectively. CISA decreased genome and gene coverage by 8%, while ZORRO maintained ALLPATHS-LG's coverage. GAM NGS and Metassembler slightly reduced the number of contigs; otherwise they maintained ALLPATHS-LG's quality statistics. All tools produced a relatively high number of misassemblies (similar to ALLPATHS-LG). With scaffolds as inputs, the assembly statistics of CISA, ZORRO, and GARM followed the same pattern as on S. aureus. All assemblies, with the exception of CISA and ZORRO, had a number of misassemblies closer to ALLPATHS-LG. CISA again covered less than one fifth of the genome and ZORRO decreased the contiguity by 99%. GARM produced only four contigs but decreased the genome coverage by less than 5%. GAM NGS, Metassembler, and MIX produced consensus assemblies with quality statistics comparable to ALLPATHS-LG.
In Hg chr14 (with contigs as inputs) GAM NGS and Metassembler reduced the number of contigs by 4% and 2%, respectively. GAM NGS increased NGA50 by 2%. With scaffolds as inputs, GAM NGS and Metassembler maintained assembly statistics close to ALLPATHS-LG except for the fact that Metassembler reduced the number of contigs by 21%. GARM reduced the number of contigs by 83%, maintained genome and gene coverage but increased the number of misassemblies by 9% (compared to ALLPATHS-LG) and decreased NGA50 by 9%.
GARM increased contiguity by 58%, while other tools improved it by less than 3%. GAM NGS and Metassembler produced about the same number of misassembly errors as the higher of the two inputs. GARM improved NGA50 the most, but also increased the number of misassemblies by 42% and had 31% inflation rate.

Multiple inputs (GAGE)
In this set of experiments we tested the ability of the tools to merge more than two assemblies. When an assembly reconciliation tool allowed no more than two assemblies in input (see Main Text Table 1 for a list), we merged them in an iterative fashion. For instance, to merge three assemblies, we first merged two assemblies, then merged the result with the third assembly. Metassembler uses a similar strategy: when the user provides multiple assemblies the tool iteratively performs pairwise reconciliation, where the output of one iteration is the input of the next. The ordering of the input assemblies was chosen based on the feature response curve (FR curve), which is an assembly quality metric proposed in [1]. The FR curve represents the dependency between contigs that contain no more than τ features and the corresponding genome coverage. The x-axis represents τ and the y-axis represents genome coverage: the "steeper" the curve, the better the assembly. We used the FR curves in [2] to determine the merging order of the GAGE assemblies, starting with the assemblies with highest quality. Results for an alternative ordering are discussed in the next section. For tools that allowed merging more than two assemblies (e.g., CISA and MIX), the merging was done in one step from the original assemblies. Here we were interested in measuring the contiguity and correctness of the resulting assemblies as the number of input assemblies increases.

Figure S1: Experimental results on merging more than two assemblies (as contigs) ordered by the FRCurve score (R. sphaeroides, genome size 4,603,060 bp). The Figure reports on quality of merged assembly compared to the input assemblies. Tools were run using default parameters, unless otherwise noted.
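The iterative merging strategy described in this section is essentially a left fold over the ordered list of assemblies. A minimal Python sketch follows. The `merge_pair` argument stands for a call to whichever pairwise reconciliation tool is being tested; it is a placeholder and not an actual tool interface. Passing the intermediate result as the first (master) argument of the next merge is an assumption of this sketch, and the toy merger in the usage example only concatenates lists of contig names.

```python
def merge_iteratively(assemblies, merge_pair):
    """Merge an ordered list of assemblies two at a time.

    The first two assemblies are merged first; each intermediate result is
    then merged with the next assembly in the list. The intermediate result
    is passed as the first argument of `merge_pair` (assumed here to be the
    master position). Returns the list of results, one per iteration.
    """
    if len(assemblies) < 2:
        raise ValueError("need at least two assemblies")
    results = []
    current = assemblies[0]
    for next_assembly in assemblies[1:]:
        current = merge_pair(current, next_assembly)
        results.append(current)
    return results


def toy_merge(master, other):
    """Hypothetical stand-in for a tool: keep the master, add unseen contigs."""
    return master + [contig for contig in other if contig not in master]


# Assemblies are listed from highest to lowest quality (toy data).
steps = merge_iteratively([["a", "b"], ["b", "c"], ["d"]], toy_merge)
print(steps)
```

Keeping every intermediate result makes it possible to report quality statistics after each iteration, as done in the tables and figures for this set of experiments.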
Supplemental Tables S9, S10, S11 and Supplemental Figures S1 and S2, show the experimental results for S. aureus, R. sphaeroides and Hg chr14, respectively, when inputs are contigs. First observe that in several cases, the process of iterative merging did not complete.
On S. aureus and R. sphaeroides, CISA generally improved the contiguity and decreased the number of contigs as the number of merged assemblies increased. The number of errors and the percentage of genome covered fluctuated over the iterations. As the number of merged assemblies increased, CISA increased the duplication rate and decreased the percentage of covered genes. GAA did not produce assembly files for the first iteration. Although GAA did not work for this particular ordering it did produce results for the alternative ordering reported in the next section.
In S. aureus and R. sphaeroides, GAM NGS's contiguity improved over successive iterations, but the number of misassembly errors did not decrease (it stayed close to the first master input in all iterations). On the positive side, (i) the number of contigs was relatively small, (ii) the percentage of genome covered was relatively high, and (iii) gene coverage was relatively high, although slightly lower than the best gene coverage in the input assemblies. In contrast, the percentage of gene coverage decreased for Hg chr14. Although the genome coverage and contiguity were high, the number of misassemblies was also relatively high. GAM NGS increased NGA50 by at least 70% compared to CABOG.

Figure S3: Experimental results on merging more than two assemblies (as scaffolds) ordered by the FRCurve score (S. aureus, genome size 2,903,081 bp). The Figure reports on quality of merged assembly compared to the input assemblies. Tools were run using default parameters, unless otherwise noted.
In S. aureus, Metassembler's contiguity improved and the number of contigs decreased over successive iterations, but the number of misassemblies also increased. Metassembler maintained high genome and gene coverage, although slightly lower than the best gene coverage in the input assemblies. In R. sphaeroides, Metassembler's assembly did not improve after the fourth iteration. Note that NGA50 was lower than BAMBUS2 and SOAPdenovo. Metassembler's assembly had low genome and gene coverage and the number of misassemblies was about the average of the inputs. In Hg chr14, the number of contigs and misassembly errors were low and decreased over successive iterations. Contiguity, genome and gene coverage were high, but slightly decreased over successive iterations.
MIX maintained a low number of misassemblies in most iterations but suffered from low genome and gene coverage. Also, NGA50 was relatively poor. Since the genome coverage in most iterations was less than 50% of the reference, no NGA50 was reported for those iterations. On the S. aureus genome, the coverage was less than 50% in all iterations but it steadily improved with increasing number of inputs. On R. sphaeroides, the genome coverage was below 50% with four or more inputs.

Figure S4: Experimental results on merging more than two assemblies (as scaffolds) ordered by the FRCurve score (R. sphaeroides, genome size 4,603,060 bp). The Figure reports on quality of merged assembly compared to the input assemblies. Tools were run using default parameters, unless otherwise noted.

Figure S5: Experimental results on merging more than two assemblies (as scaffolds) ordered by the FRCurve score (Hg chr14, genome size: 107,349,540). The Figure reports on quality of merged assembly compared to the input assemblies. Tools were run using default parameters, unless otherwise noted.
ZORRO frequently failed to produce results. When it worked, it increased genome and gene coverage. Contiguity usually started high, then fluctuated over iterations. ZORRO produced relatively high number of contigs and misassemblies (somewhat in between the values of the inputs).
We repeated the same experiment but with scaffolds as inputs. Results are reported in Supplemental Tables S12, S13, and S14 and Supplemental Figures S3, S4, and S5. CISA's results show that after a certain number of input assemblies, increasing the number of inputs did not affect the results significantly. From that point forward, it generally improved the contiguity and reduced the number of contigs as the number of merged assemblies increased, at the cost of decreased genome and gene coverage and about 25% inflation rate. The number of misassemblies was within the range of the input assemblies. CISA reached stability with four inputs on S. aureus and three inputs on R. sphaeroides.
MIX maintained a low number of contigs, albeit this number fluctuated in R. sphaeroides with increasing number of inputs. MIX also produced a high duplication ratio. On S. aureus, MIX produced a high number of misassemblies which generally increased as the number of inputs increased. It maintained high genome coverage but gene coverage was poor in comparison to the inputs. It also maintained high contiguity except for the last iteration. On R. sphaeroides, the number of misassemblies was also relatively high but it fluctuated as the number of inputs increased. Genome coverage increased steadily but gene coverage decreased. It also maintained high contiguity, achieving the best NGA50 for less than five inputs.
ZORRO produced a high number of contigs and a low number of misassemblies on S. aureus and R. sphaeroides. It maintained a high genome coverage but slightly decreased gene coverage. Contiguity was poor and generally decreased over successive iterations.
GAM NGS maintained results very close to the first input throughout all iterations on S. aureus, R. sphaeroides, and Hg chr14. In the latter genome, GAM NGS contiguity generally improved in successive iterations but so did the number of misassemblies.
Metassembler maintained similar quality statistics to CABOG on Hg chr14, although the number of contigs slightly decreased over successive iterations. On R. sphaeroides, Metassembler also maintained CABOG's quality statistics with a slight decrease of (i) number of contigs, (ii) number of misassemblies, (iii) genome and gene coverage, and (iv) contiguity, as the number of iterations increased. On S. aureus, Metassembler also maintained quality statistics close but not identical to MSR-CA. In general, Metassembler produced a small number of contigs. Also, as the number of inputs increased, the number of misassemblies slightly decreased and the contiguity slightly improved.

Multiple inputs (alternative ordering)
In this set of experiments we tested the ability of the tools to merge more than two assemblies using an alternative ordering to the FR curves used in the main text. Recall that when an assembly reconciliation tool allowed no more than two assemblies as input (see Table 1 in the main text for a list), we merged them in an iterative fashion starting from the most contiguous assemblies (see main text for more details). Supplemental Tables S15, S16, S17, and S18 show the experimental results for S. aureus, R. sphaeroides (two tables), and Hg chr14, respectively, on this alternative ordering. Supplemental Figures S6-S8 summarize the results with respect to contiguity and correctness. First observe that, similar to what we observed for the ordering based on FR curves, in many instances the process of iterative merging did not complete.
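The iterative scheme for tools limited to two inputs can be sketched as a left fold over the ranked list of assemblies, matching the (1+2)+3 notation used in the table notes. This is a minimal illustration under stated assumptions, not the scripts used in the study: `merge_iteratively`, `label_merge`, and the `Assembly` alias are hypothetical names, and passing the previous merged result as the first input of the next merge is an assumption.

```python
from typing import Callable, List, Sequence

Assembly = str  # hypothetical stand-in for an assembly (a label or a FASTA path)


def merge_iteratively(
    ranked: Sequence[Assembly],
    merge_pair: Callable[[Assembly, Assembly], Assembly],
) -> List[Assembly]:
    """Fold a ranked list of assemblies through a two-input merger.

    Returns every intermediate result: 1+2, then (1+2)+3, and so on.
    """
    if len(ranked) < 2:
        raise ValueError("need at least two assemblies to merge")
    results: List[Assembly] = []
    current = ranked[0]
    for nxt in ranked[1:]:
        # Assumption: the previous merged result is the first input of the next merge.
        current = merge_pair(current, nxt)
        results.append(current)
    return results


def label_merge(first: Assembly, second: Assembly) -> Assembly:
    """Toy merger that only records the order of operations."""
    return f"({first}+{second})"
```

Calling `merge_iteratively(["1", "2", "3"], label_merge)` returns the labels of the two iterations in order. In practice `merge_pair` would wrap a call to one of the reconciliation tools, and a failure at any iteration (for example, an empty output file) would stop the chain, which is why many iterative runs did not complete.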
On S. aureus and R. sphaeroides, CISA generally increased the contiguity and decreased the number of contigs as the number of merged assemblies increased. The number of errors and the percentage of genome covered fluctuated over the iterations. While the percentage of covered genes peaked with three input assemblies, CISA increased the duplication rate as the number of merged assemblies increased. GAA, in contrast, increased contiguity, the number of errors, the duplication rate, and the covered genome fraction as the number of merged assemblies increased.
In S. aureus, R. sphaeroides, and Hg chr14, GAA produced a monotonic increase in duplication rate at successive iterations, while misassemblies seemed to be the union of those present in the input assemblies. GAA's contiguity did not increase over successive iterations, but the genome coverage was relatively high, while gene coverage was very low in both S. aureus and R. sphaeroides.
Figure S7: Experimental results on merging more than two assemblies (as contigs), alternative ordering (R. sphaeroides, genome size 4,603,060 bp). The figure reports on quality of merged assembly compared to the input assemblies. Tools were run using default parameters, unless otherwise noted.
GAM NGS's contiguity increased over successive iterations, but the number of misassemblies did not decrease. On the positive side, the number of misassemblies was small and the percentage of genome covered was high. In S. aureus and R. sphaeroides, gene coverage was high, although slightly lower than the best gene coverage in the input assemblies. In contrast, the percentage of gene coverage decreased for Hg chr14.
GARM increased the contiguity over successive iterations but also inflated the resulting assembly. The number of misassemblies and the genome/gene coverage fluctuated. The percentage of gene coverage decreased in Hg chr14. In R. sphaeroides, GARM crashed after the third iteration. Note that in the second iteration of S. aureus only 26 contigs covered nearly 93% of the genome with 91% gene coverage, no misassemblies, and no inflation.
In S. aureus, Metassembler maintained a low error rate and NGA50 (with the exception of Hg chr14) over successive iterations (although NGA50 was consistently low). In Hg chr14, NGA50 was low and also decreasing over iterations. In R. sphaeroides, genome and gene coverage for Metassembler was low with respect to input assemblies.
Figure S8: Experimental results on merging more than two assemblies (as contigs), alternative ordering (Hg chr14, genome size 107,349,540 bp). The figure reports on quality of merged assembly compared to the input assemblies. Tools were run using default parameters, unless otherwise noted.
MIX maintained a low number of misassemblies in most iterations but suffered from low genome and gene coverage. Its NGA50 fluctuated over successive iterations, but it was relatively poor. Since the genome coverage in some iterations was less than 50% of the reference, no NGA50 was reported for those iterations.
ZORRO frequently failed to produce results. When it worked, it increased the percentage of genome coverage and gene coverage, and it did not increase duplication.

Supplementary Note 4: Time and Space Analysis
As noted previously, all experiments were performed on a Linux Ubuntu 12.10 server with a 20-core Intel Xeon E5-2690 v2 CPU (3 GHz) and 512 GB of RAM. Multithreading was used when available.
First, we measured the usage of computational resources to merge two input assemblies. The graphs in Supplemental Figure S9 illustrate the average (wall clock) run time, the average percentage of processor utilization (where 100% indicates full utilization of one core), and the average memory usage required by each tool to perform each experiment on the four genomes. The averages are over all the tested pairs of GAGE assemblies for that genome. Error bars indicate the minimum and maximum.
Second, we measured the usage of computational resources as a function of the number of input assemblies using CISA and MIX, which are the only tools that can merge more than two input assemblies. The graphs in Supplemental Figure S10 show the (wall clock) run time, processor utilization (where 100% indicates full utilization of one core), and memory usage as the number of input assemblies increases.
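As a rough illustration of how such measurements can be summarized, the sketch below derives processor utilization from CPU time and wall clock time, and computes the mean, minimum, and maximum used for bars and error bars. The text does not state how utilization was measured; dividing total CPU time by elapsed time is a common convention and is assumed here. The `Run` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Run:
    """One tool run on one pair of assemblies (hypothetical record layout)."""
    wall_seconds: float
    cpu_seconds: float  # user + system CPU time, summed over all threads
    peak_mem_mb: float


def cpu_utilization(run: Run) -> float:
    """Percent utilization, where 100.0 means one core kept fully busy."""
    return 100.0 * run.cpu_seconds / run.wall_seconds


def summarize(values: List[float]) -> Dict[str, float]:
    """Mean for the bar height, minimum and maximum for the error bars."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}
```

Under this convention a multithreaded tool that keeps four cores busy for the whole run reports about 400%, so values above 100% indicate effective use of multithreading.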

Supplementary Note 5: Large genomes (GAGE)
Figure S10: Wall clock run time, processor utilization (where 100% indicates full utilization of one core), and memory usage as the number of input assemblies increases (for CISA and MIX).
To test the ability of these tools to scale to large eukaryotic genomes, we used GAGE's assemblies for Bombus impatiens. We selected the two input assemblies where most of the tools were able to complete. A high quality reference genome is unavailable for Bombus impatiens, so the statistics we reported were produced by the GAGE script. In addition to the usual assembly statistics, GAGE computes the e-size, which is the expected size of a contig (or scaffold), computed as $\sum_c L_c^2 / G$, where the sum is over all contigs $c$, $G$ is the expected genome length, and $L_c$ is the length of contig $c$ [3].
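The e-size formula translates directly into a few lines of code. The sketch below is only an illustration of the definition, not the GAGE script itself; the function name `e_size` and the optional `min_length` cutoff are hypothetical.

```python
from typing import Sequence


def e_size(lengths: Sequence[int], genome_size: int, min_length: int = 0) -> float:
    """E-size: sum of squared contig lengths divided by the genome length.

    lengths:     contig (or scaffold) lengths L_c.
    genome_size: expected genome length G.
    min_length:  hypothetical cutoff; shorter sequences are ignored.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(length * length for length in lengths if length >= min_length) / genome_size
```

For a toy genome of 100 bp assembled into contigs of 50, 30, and 20 bp, the e-size is (2500 + 900 + 400) / 100 = 38. Because each length is squared, long contigs dominate the sum, so the statistic rewards contiguity without depending on a single threshold the way N50 does.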
Results are reported in Supplemental Table S19, in which only contigs and scaffolds of 500 bp or longer were considered. Observe that GARM reduced the number of contigs and increased both N50 and the e-size in all experiments. GAM NGS did not work for one of the experiments. In the others, it decreased the number of contigs in all but one experiment. GAM NGS always improved N50, and increased the e-size in all but one experiment. GAA did not work for two of the experiments. When it worked, it did not reduce the number of contigs, but it increased both N50 and the e-size. Lastly, Metassembler decreased N50 and the e-size in three out of four experiments. Metassembler reduced the number of contigs in half of the experiments.

Supplementary Note 6: Limitations
• MIX and CISA: we did not run these two tools on the Hg chr14 dataset because they were designed for bacteria-sized genomes and would not handle such a large input
• GARM: while GARM's manual claims that the tool can accept two contigs, two scaffolds, or a contig/scaffold combination as input, we were only able to run the tool successfully using one contig and one scaffold; in most cases, running with two contigs produced an empty FASTA file, while using two scaffolds produced FASTA files with all nucleotides set to N

Supplementary Note 7: Usage of reads
Some of the tools can take advantage of the raw reads, in addition to the input assemblies. For GAA, while the paper mentions using paired-end reads for error correction, there is no option to provide them. Therefore, we did not use them for GAA. We used reads in these cases:
• GAM NGS: we used paired-end reads with a 155-180 bp insert (Library 1 in GAGE)
• Metassembler: for bacterial genomes we used the available short-jump library (insert size of 3500 bp); for Hg chr14 we used the available long-jump library (insert size of approximately 35 kbp); and for Bombus impatiens we used the available short-jump library 2 (insert size of approximately 8 kbp)
• ZORRO: we used paired-end reads with a 155-180 bp insert (Library 1 in GAGE)

Supplemental Table S4: Contiguity-correctness experimental results. Assembly reconciliation tools are given two input assemblies to merge, of which the first has high contiguity and the second has high correctness. The table reports on quality of merged assembly compared to the two input assemblies. Notes: (c) indicates that the assembly is composed of contigs, (s) indicates that the assembly is composed of scaffolds; all reported statistics are for contigs; the numbers of mismatches/indels/Ns are per 100 kbp; tools were run using default parameters, unless otherwise noted.
Supplemental Table S6: Contiguity-correctness experimental results. Assembly reconciliation tools are given as input the same two assemblies as in Supplemental Table S4, but the order is swapped. The table reports on quality of merged assembly compared to the two input assemblies. Notes: (c) indicates that the assembly is composed of contigs, (s) indicates that the assembly is composed of scaffolds; all reported statistics are for contigs; the numbers of mismatches/indels/Ns are per 100 kbp; tools were run using default parameters, unless otherwise noted.
Produced an empty assembly file
Supplemental Table S10: Experimental results on merging more than two assemblies (contigs) ordered by the FRCurve score (R. sphaeroides, genome size 4,603,060 bp). The table reports on quality of merged assembly compared to the two input assemblies. Notes: all reported statistics are for contigs; the numbers of mismatches/indels/Ns are per 100 kbp; tools were run using default parameters, unless otherwise noted; (1+2)+3 means that assemblies 1 and 2 were merged first, the result of which was then merged with assembly 3.
Produced an empty assembly file
Supplemental Table S11: Experimental results on merging more than two assemblies (contigs) ordered by the FRCurve score (Hg chr14, genome size 107,349,540 bp). The table reports on quality of merged assembly compared to the two input assemblies. Notes: all reported statistics are for contigs; the numbers of mismatches/indels/Ns are per 100 kbp; tools were run using default parameters, unless otherwise noted; (1+2)+3 means that assemblies 1 and 2 were merged first, the result of which was then merged with assembly 3.
Supplemental Table S12: Experimental results on merging more than two assemblies (scaffolds) ordered by the FRCurve score (S. aureus, genome size 2,903,081 bp). The table reports on quality of merged assembly compared to the two input assemblies. Notes: all reported statistics are for scaffolds; the numbers of mismatches/indels/Ns are per 100 kbp; tools were run using default parameters, unless otherwise noted; (1+2)+3 means that assemblies 1 and 2 were merged first, the result of which was then merged with assembly 3.
Supplemental Table S13: Experimental results on merging more than two assemblies (scaffolds) ordered by the FRCurve score (R. sphaeroides, genome size 4,603,060 bp). The table reports on quality of merged assembly compared to the two input assemblies. Notes: all reported statistics are for scaffolds; the numbers of mismatches/indels/Ns are per 100 kbp; tools were run using default parameters, unless otherwise noted; (1+2)+3 means that assemblies 1 and 2 were merged first, the result of which was then merged with assembly 3.
Produced an empty assembly file
Supplemental Table S16: Experimental results on merging more than two assemblies (as contigs) with an alternative ordering (R. sphaeroides, genome size 4,603,060 bp). The table reports on quality of merged assembly compared to the two input assemblies. Notes: all reported statistics are for contigs; the numbers of mismatches/indels/Ns are per 100 kbp; tools were run using default parameters, unless otherwise noted; (1+2)+3 means that assemblies 1 and 2 were merged first, the result of which was then merged with assembly 3.
Did not produce assembly files
Supplemental Table S17: Experimental results on merging more than two assemblies (as contigs) with an alternative ordering (R. sphaeroides, genome size 4,603,060 bp). The table reports on quality of merged assembly compared to the two input assemblies. Notes: all reported statistics are for contigs; the numbers of mismatches/indels/Ns are per 100 kbp; tools were run using default parameters, unless otherwise noted; (1+2)+3 means that assemblies 1 and 2 were merged first, the result of which was then merged with assembly 3.

Figure S12: Experimental results on merging assemblies produced by assemblers based on the de Bruijn graph compared to string graph (top row for input contigs, bottom row for input scaffolds); tools were run using default parameters.