TY - JOUR AU - Altschul, S. F. AU - Gish, W. AU - Miller, W. AU - Myers, E. W. AU - Lipman, D. J. PY - 1990 DA - 1990// TI - Basic local alignment search tool JO - J Mol Biol VL - 215 UR - https://doi.org/10.1016/S0022-2836(05)80360-2 DO - 10.1016/S0022-2836(05)80360-2 ID - Altschul1990 ER - TY - STD TI - GenBank and WGS Statistics. http://www.ncbi.nlm.nih.gov/genbank/statistics. Accessed 31 May 2016. UR - http://www.ncbi.nlm.nih.gov/genbank/statistics ID - ref2 ER - TY - JOUR AU - Stephens, Z. D. AU - Lee, S. Y. AU - Faghri, F. AU - Campbell, R. H. AU - Zhai, C. AU - Efron, M. J. PY - 2015 DA - 2015// TI - Big data: astronomical or genomical? JO - PLoS Biol VL - 13 UR - https://doi.org/10.1371/journal.pbio.1002195 DO - 10.1371/journal.pbio.1002195 ID - Stephens2015 ER - TY - STD TI - Broder AZ. On the resemblance and containment of documents. Compression and Complexity of Sequences 1997 - Proceedings 1998:21–29. ID - ref4 ER - TY - STD TI - Indyk P, Motwani R. Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. Dallas, TX: ACM; 1998. ID - ref5 ER - TY - CHAP AU - Broder, A. Z. PY - 2000 DA - 2000// TI - Identifying and filtering near-duplicate documents BT - COM ’00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching PB - Springer CY - London UR - https://doi.org/10.1007/3-540-45123-4_1 DO - 10.1007/3-540-45123-4_1 ID - Broder2000 ER - TY - STD TI - Chum O, Philbin J, Zisserman A. Near Duplicate Image Detection: min-Hash and tf-idf Weighting. In: Proceedings of the British Machine Vision Conference 2008. Durham, UK: British Machine Vision Association and Society for Pattern Recognition; 2008. ID - ref7 ER - TY - JOUR AU - Narayanan, M. AU - Karp, R. M. 
PY - 2004 DA - 2004// TI - Gapped local similarity search with provable guarantees JO - Algorithms in Bioinformatics, Proceedings VL - 3240 UR - https://doi.org/10.1007/978-3-540-30219-3_7 DO - 10.1007/978-3-540-30219-3_7 ID - Narayanan2004 ER - TY - JOUR AU - Berlin, K. AU - Koren, S. AU - Chin, C. S. AU - Drake, J. P. AU - Landolin, J. M. AU - Phillippy, A. M. PY - 2015 DA - 2015// TI - Assembling large genomes with single-molecule sequencing and locality-sensitive hashing JO - Nat Biotechnol VL - 33 UR - https://doi.org/10.1038/nbt.3238 DO - 10.1038/nbt.3238 ID - Berlin2015 ER - TY - CHAP AU - Yang, X. AU - Zola, J. AU - Aluru, S. PY - 2011 DA - 2011// TI - Parallel metagenomic sequence clustering via sketching and maximal quasi-clique enumeration on map-reduce clouds BT - Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International. IEEE UR - https://doi.org/10.1109/IPDPS.2011.116 DO - 10.1109/IPDPS.2011.116 ID - Yang2011 ER - TY - CHAP AU - Drew, J. AU - Hahsler, M. PY - 2014 DA - 2014// TI - Strand: fast sequence comparison using mapreduce and locality sensitive hashing BT - Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics PB - ACM CY - Newport Beach, CA ID - Drew2014 ER - TY - CHAP AU - Rasheed, Z. AU - Rangwala, H. PY - 2013 DA - 2013// TI - A Map-Reduce Framework for Clustering Metagenomes BT - 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum: IEEE ID - Rasheed2013 ER - TY - JOUR AU - Vinga, S. AU - Almeida, J. PY - 2003 DA - 2003// TI - Alignment-free sequence comparison-a review JO - Bioinformatics VL - 19 UR - https://doi.org/10.1093/bioinformatics/btg005 DO - 10.1093/bioinformatics/btg005 ID - Vinga2003 ER - TY - JOUR AU - Haubold, B. 
PY - 2014 DA - 2014// TI - Alignment-free phylogenetics and population genetics JO - Brief Bioinform VL - 15 UR - https://doi.org/10.1093/bib/bbt083 DO - 10.1093/bib/bbt083 ID - Haubold2014 ER - TY - JOUR AU - Blaisdell, B. E. PY - 1986 DA - 1986// TI - A measure of the similarity of sets of sequences not requiring sequence alignment JO - Proc Natl Acad Sci U S A VL - 83 UR - https://doi.org/10.1073/pnas.83.14.5155 DO - 10.1073/pnas.83.14.5155 ID - Blaisdell1986 ER - TY - CHAP AU - Torney, D. C. AU - Burks, C. AU - Davison, D. AU - Sirotkin, K. M. ED - Bell, G. I. ED - Marr, T. G. PY - 1990 DA - 1990// TI - Computation of d2: a measure of sequence dissimilarity BT - Computers and DNA: the proceedings of the Interface between Computation Science and Nucleic Acid Sequencing Workshop, held December 12 to 16, 1988 in Santa Fe, New Mexico PB - Addison-Wesley Pub. Co CY - Redwood City ID - Torney1990 ER - TY - JOUR AU - Lippert, R. A. AU - Huang, H. AU - Waterman, M. S. PY - 2002 DA - 2002// TI - Distributional regimes for the number of k-word matches between two random sequences JO - Proc Natl Acad Sci U S A VL - 99 UR - https://doi.org/10.1073/pnas.202468099 DO - 10.1073/pnas.202468099 ID - Lippert2002 ER - TY - JOUR AU - Yang, K. AU - Zhang, L. PY - 2008 DA - 2008// TI - Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction JO - Nucleic Acids Res VL - 36 UR - https://doi.org/10.1093/nar/gkn075 DO - 10.1093/nar/gkn075 ID - Yang2008 ER - TY - JOUR AU - Deloger, M. AU - Karoui, M. AU - Petit, M. A. PY - 2009 DA - 2009// TI - A genomic distance based on MUM indicates discontinuity between most bacterial species and genera JO - J Bacteriol VL - 191 UR - https://doi.org/10.1128/JB.01202-08 DO - 10.1128/JB.01202-08 ID - Deloger2009 ER - TY - JOUR AU - Yi, H. AU - Jin, L. 
PY - 2013 DA - 2013// TI - Co-phylog: an assembly-free phylogenomic approach for closely related organisms JO - Nucleic Acids Res VL - 41 UR - https://doi.org/10.1093/nar/gkt003 DO - 10.1093/nar/gkt003 ID - Yi2013 ER - TY - JOUR AU - Haubold, B. AU - Klotzl, F. AU - Pfaffelhuber, P. PY - 2015 DA - 2015// TI - andi: fast and accurate estimation of evolutionary distances between closely related genomes JO - Bioinformatics VL - 31 UR - https://doi.org/10.1093/bioinformatics/btu815 DO - 10.1093/bioinformatics/btu815 ID - Haubold2015 ER - TY - JOUR AU - Fan, H. AU - Ives, A. R. AU - Surget-Groba, Y. AU - Cannon, C. H. PY - 2015 DA - 2015// TI - An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data JO - BMC Genomics VL - 16 UR - https://doi.org/10.1186/s12864-015-1647-5 DO - 10.1186/s12864-015-1647-5 ID - Fan2015 ER - TY - JOUR AU - Konstantinidis, K. T. AU - Tiedje, J. M. PY - 2005 DA - 2005// TI - Genomic insights that advance the species definition for prokaryotes JO - Proc Natl Acad Sci U S A VL - 102 UR - https://doi.org/10.1073/pnas.0409727102 DO - 10.1073/pnas.0409727102 ID - Konstantinidis2005 ER - TY - JOUR AU - Schatz, M. C. AU - Phillippy, A. M. PY - 2012 DA - 2012// TI - The rise of a digital immune system JO - Gigascience VL - 1 UR - https://doi.org/10.1186/2047-217X-1-4 DO - 10.1186/2047-217X-1-4 ID - Schatz2012 ER - TY - JOUR AU - Pruitt, K. D. AU - Tatusova, T. AU - Brown, G. R. AU - Maglott, D. R. PY - 2012 DA - 2012// TI - NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy JO - Nucleic Acids Res VL - 40 UR - https://doi.org/10.1093/nar/gkr1079 DO - 10.1093/nar/gkr1079 ID - Pruitt2012 ER - TY - JOUR AU - Saitou, N. AU - Nei, M. PY - 1987 DA - 1987// TI - The neighbor-joining method: a new method for reconstructing phylogenetic trees JO - Mol Biol Evol VL - 4 ID - Saitou1987 ER - TY - JOUR AU - Miller, W. AU - Rosenbloom, K. AU - Hardison, R. C. AU - Hou, M. 
AU - Taylor, J. AU - Raney, B. PY - 2007 DA - 2007// TI - 28-way vertebrate alignment and conservation track in the UCSC Genome Browser JO - Genome Res VL - 17 UR - https://doi.org/10.1101/gr.6761107 DO - 10.1101/gr.6761107 ID - Miller2007 ER - TY - JOUR AU - Perelman, P. AU - Johnson, W. E. AU - Roos, C. AU - Seuanez, H. N. AU - Horvath, J. E. AU - Moreira, M. A. PY - 2011 DA - 2011// TI - A molecular phylogeny of living primates JO - PLoS Genet VL - 7 UR - https://doi.org/10.1371/journal.pgen.1001342 DO - 10.1371/journal.pgen.1001342 ID - Perelman2011 ER - TY - JOUR AU - Kuhner, M. K. AU - Felsenstein, J. PY - 1994 DA - 1994// TI - A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates JO - Mol Biol Evol VL - 11 ID - Kuhner1994 ER - TY - JOUR AU - Loman, N. J. AU - Quick, J. AU - Simpson, J. T. PY - 2015 DA - 2015// TI - A complete bacterial genome assembled de novo using only nanopore sequencing data JO - Nat Methods VL - 12 UR - https://doi.org/10.1038/nmeth.3444 DO - 10.1038/nmeth.3444 ID - Loman2015 ER - TY - JOUR AU - Song, L. AU - Florea, L. AU - Langmead, B. PY - 2014 DA - 2014// TI - Lighter: fast and memory-efficient sequencing error correction without counting JO - Genome Biol VL - 15 UR - https://doi.org/10.1186/s13059-014-0509-9 DO - 10.1186/s13059-014-0509-9 ID - Song2014 ER - TY - JOUR AU - Seth, S. AU - Valimaki, N. AU - Kaski, S. AU - Honkela, A. PY - 2014 DA - 2014// TI - Exploration and retrieval of whole-metagenome sequencing samples JO - Bioinformatics VL - 30 UR - https://doi.org/10.1093/bioinformatics/btu340 DO - 10.1093/bioinformatics/btu340 ID - Seth2014 ER - TY - JOUR AU - Maillet, N. AU - Lemaitre, C. AU - Chikhi, R. AU - Lavenier, D. AU - Peterlongo, P. PY - 2012 DA - 2012// TI - Compareads: comparing huge metagenomic experiments JO - BMC Bioinformatics VL - 13 UR - https://doi.org/10.1186/1471-2105-13-S19-S10 DO - 10.1186/1471-2105-13-S19-S10 ID - Maillet2012 ER - TY - CHAP AU - Maillet, N. 
AU - Collet, G. AU - Vannier, T. AU - Lavenier, D. AU - Peterlongo, P. PY - 2014 DA - 2014// TI - COMMET: comparing and combining multiple metagenomic datasets BT - 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): IEEE ID - Maillet2014 ER - TY - JOUR AU - Rusch, D. B. AU - Halpern, A. L. AU - Sutton, G. AU - Heidelberg, K. B. AU - Williamson, S. AU - Yooseph, S. PY - 2007 DA - 2007// TI - The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific JO - PLoS Biol VL - 5 UR - https://doi.org/10.1371/journal.pbio.0050077 DO - 10.1371/journal.pbio.0050077 ID - Rusch2007 ER - TY - JOUR PY - 2012 DA - 2012// TI - Structure, function and diversity of the healthy human microbiome JO - Nature VL - 486 UR - https://doi.org/10.1038/nature11234 DO - 10.1038/nature11234 ID - ref36 ER - TY - JOUR AU - Qin, J. AU - Li, R. AU - Raes, J. AU - Arumugam, M. AU - Burgdorf, K. S. AU - Manichanh, C. PY - 2010 DA - 2010// TI - A human gut microbial gene catalogue established by metagenomic sequencing JO - Nature VL - 464 UR - https://doi.org/10.1038/nature08821 DO - 10.1038/nature08821 ID - Qin2010 ER - TY - JOUR AU - Freedman, M. J. AU - Nissim, K. AU - Pinkas, B. PY - 2004 DA - 2004// TI - Efficient private matching and set intersection JO - Advances in Cryptology - Eurocrypt 2004, Proceedings VL - 3027 UR - https://doi.org/10.1007/978-3-540-24676-3_1 DO - 10.1007/978-3-540-24676-3_1 ID - Freedman2004 ER - TY - CHAP AU - Cristofaro, E. AU - Faber, S. AU - Gasti, P. AU - Tsudik, G. PY - 2012 DA - 2012// TI - Genodroid: are privacy-preserving genomic tests ready for prime time? BT - Proceedings of the 2012 ACM workshop on Privacy in the electronic society PB - ACM CY - Raleigh, NC ID - Cristofaro2012 ER - TY - JOUR AU - Solomon, B. AU - Kingsford, C. PY - 2015 DA - 2015// TI - Large-scale search of transcriptomic read sets with sequence bloom trees JO - bioRxiv ID - Solomon2015 ER - TY - JOUR AU - Fofanov, Y. 
AU - Luo, Y. AU - Katili, C. AU - Wang, J. AU - Belosludtsev, Y. AU - Powdrill, T. PY - 2004 DA - 2004// TI - How independent are the appearances of n-mers in different genomes? JO - Bioinformatics VL - 20 UR - https://doi.org/10.1093/bioinformatics/bth266 DO - 10.1093/bioinformatics/bth266 ID - Fofanov2004 ER - TY - JOUR AU - Roberts, M. AU - Hayes, W. AU - Hunt, B. R. AU - Mount, S. M. AU - Yorke, J. A. PY - 2004 DA - 2004// TI - Reducing storage requirements for biological sequence comparison JO - Bioinformatics VL - 20 UR - https://doi.org/10.1093/bioinformatics/bth408 DO - 10.1093/bioinformatics/bth408 ID - Roberts2004 ER - TY - JOUR AU - Roberts, M. AU - Hunt, B. R. AU - Yorke, J. A. AU - Bolanos, R. A. AU - Delcher, A. L. PY - 2004 DA - 2004// TI - A preprocessor for shotgun assembly of large genomes JO - J Comput Biol VL - 11 UR - https://doi.org/10.1089/cmb.2004.11.734 DO - 10.1089/cmb.2004.11.734 ID - Roberts2004 ER - TY - JOUR AU - Deorowicz, S. AU - Kokot, M. AU - Grabowski, S. AU - Debudaj-Grabysz, A. PY - 2015 DA - 2015// TI - KMC 2: fast and resource-frugal k-mer counting JO - Bioinformatics VL - 31 UR - https://doi.org/10.1093/bioinformatics/btv022 DO - 10.1093/bioinformatics/btv022 ID - Deorowicz2015 ER - TY - JOUR AU - Wood, D. E. AU - Salzberg, S. L. PY - 2014 DA - 2014// TI - Kraken: ultrafast metagenomic sequence classification using exact alignments JO - Genome Biol VL - 15 UR - https://doi.org/10.1186/gb-2014-15-3-r46 DO - 10.1186/gb-2014-15-3-r46 ID - Wood2014 ER - TY - JOUR AU - Patrascu, M. AU - Thorup, M. PY - 2012 DA - 2012// TI - The power of simple tabulation hashing JO - J ACM VL - 59 UR - https://doi.org/10.1145/2220357.2220361 DO - 10.1145/2220357.2220361 ID - Patrascu2012 ER - TY - JOUR AU - Ukkonen, E. 
PY - 1992 DA - 1992// TI - Approximate string-matching with Q-grams and maximal matches JO - Theor Comput Sci VL - 92 UR - https://doi.org/10.1016/0304-3975(92)90143-4 DO - 10.1016/0304-3975(92)90143-4 ID - Ukkonen1992 ER - TY - STD TI - Bar-Yossef Z, Jayram TS, Kumar R, Sivakumar D, Trevisan L. Counting distinct elements in a data stream. In: Proceedings of the 6th International Workshop on Randomization and Approximation Techniques. Springer-Verlag; 2002. p. 1–10. ID - ref48 ER - TY - JOUR AU - Phillippy, A. M. AU - Schatz, M. C. AU - Pop, M. PY - 2008 DA - 2008// TI - Genome assembly forensics: finding the elusive mis-assembly JO - Genome Biol VL - 9 UR - https://doi.org/10.1186/gb-2008-9-3-r55 DO - 10.1186/gb-2008-9-3-r55 ID - Phillippy2008 ER - TY - JOUR AU - Felsenstein, J. PY - 1989 DA - 1989// TI - PHYLIP - Phylogeny Inference Package (Version 3.2) JO - Cladistics VL - 5 ID - Felsenstein1989 ER - TY - STD TI - UCSC multiz20way. http://hgdownload.cse.ucsc.edu/goldenPath/hg38/multiz20way/. Accessed 31 May 2016. UR - http://hgdownload.cse.ucsc.edu/goldenPath/hg38/multiz20way/ ID - ref51 ER - TY - STD TI - HMP Illumina WGS Reads. http://hmpdacc.org/HMIWGS/all/. Accessed 31 May 2016. UR - http://hmpdacc.org/HMIWGS/all/ ID - ref52 ER - TY - STD TI - HMP Illumina WGS Assemblies. http://hmpdacc.org/HMASM/all/. Accessed 31 May 2016. UR - http://hmpdacc.org/HMASM/all/ ID - ref53 ER - TY - STD TI - MetaHIT assemblies. http://www.bork.embl.de/~arumugam/Qin_et_al_2010/. Accessed 31 May 2016. UR - http://www.bork.embl.de/~arumugam/Qin_et_al_2010/ ID - ref54 ER - TY - JOUR AU - Li, H. AU - Durbin, R. PY - 2009 DA - 2009// TI - Fast and accurate short read alignment with Burrows-Wheeler transform JO - Bioinformatics VL - 25 UR - https://doi.org/10.1093/bioinformatics/btp324 DO - 10.1093/bioinformatics/btp324 ID - Li2009 ER - TY - STD TI - Cap’n Proto. https://capnproto.org. Accessed 31 May 2016. UR - https://capnproto.org/ ID - ref56 ER - TY - STD TI - MurmurHash3. 
https://code.google.com/p/smhasher. Accessed 31 May 2016. UR - https://code.google.com/p/smhasher ID - ref57 ER - TY - BOOK AU - Gough, B. PY - 2009 DA - 2009// TI - GNU scientific library reference manual PB - Network Theory Ltd. CY - Godalming ID - Gough2009 ER - TY - STD TI - Open Bloom Filter Library. https://code.google.com/p/bloom. Accessed 31 May 2016. UR - https://code.google.com/p/bloom ID - ref59 ER - TY - BOOK AU - Siek, J. G. AU - Lee, L.-Q. AU - Lumsdaine, A. PY - 2001 DA - 2001// TI - The Boost Graph Library: User Guide and Reference Manual PB - Pearson Education CY - New York, NY ID - Siek2001 ER - TY - JOUR AU - Shannon, P. AU - Markiel, A. AU - Ozier, O. AU - Baliga, N. S. AU - Wang, J. T. AU - Ramage, D. PY - 2003 DA - 2003// TI - Cytoscape: a software environment for integrated models of biomolecular interaction networks JO - Genome Res VL - 13 UR - https://doi.org/10.1101/gr.1239303 DO - 10.1101/gr.1239303 ID - Shannon2003 ER - TY - JOUR AU - Kamada, T. AU - Kawai, S. PY - 1989 DA - 1989// TI - An algorithm for drawing general undirected graphs JO - Inform Process Lett VL - 31 UR - https://doi.org/10.1016/0020-0190(89)90102-6 DO - 10.1016/0020-0190(89)90102-6 ID - Kamada1989 ER - TY - JOUR AU - Bankevich, A. AU - Nurk, S. AU - Antipov, D. AU - Gurevich, A. A. AU - Dvorkin, M. AU - Kulikov, A. S. PY - 2012 DA - 2012// TI - SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing JO - J Comput Biol VL - 19 UR - https://doi.org/10.1089/cmb.2012.0021 DO - 10.1089/cmb.2012.0021 ID - Bankevich2012 ER -