
Table 1 Summary of STAT Human Sequence Removal Tool Results

From: STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions

 

| Sample type | Human RNA_Seq: bronchoalveolar lavage fluid | Human RNA_Seq: bronchoalveolar lavage fluid | SARS-CoV-2 Amplicon | SARS-CoV-2 Amplicon |
|---|---|---|---|---|
| Accession | SRR11092056 | SRR11092057 | SRR13402847 | SRR13444106 |
| Total spots | 5239723 | 5184909 | 216859 | 471848 |
| Total spots remaining | 438796 | 501436 | 216720 | 470934 |
| Total spots removed | 4800927 | 4683473 | 139 | 914 |
| Human spots remaining | 26265 | 25384 | 20 | 2 |
| Conserved lineage spots | 27217 | 29507 | 70 | 13 |
| Total length (kbp) of human spot alignments | 3684 | 3508 | < 3 | < 1 |

  1. Summary of results for SRA accessions subjected to the STAT Human Sequence Removal Tool (see Human Contamination Identification and Removal in “Methods”). “Total Spots Remaining” is the count of spots found in the output (fastq) file; subtracting this count from the total spots gives “Total Spots Removed”.
  2. We define “Human Spots” as those where all hits (up to the top five) are identified as human with eValue < 1e−10. “Conserved Lineage Spots” are those with a human top hit (lowest eValue) where human is not the only organism among hits with eValue < 1e−10, and where all spot hits have either an identical eValue or the greatest has eValue < 1e−14. “Total Length of Human Spot Alignments” is the sum of the lengths of the top alignments for all human spots remaining.
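
The two footnotes can be read as a small set of rules. The Python sketch below is one possible reading, not the authors' implementation. It checks that “Total spots removed” equals total minus remaining for each accession in the table, and classifies a spot from its hits. The `Hit` class, the `classify_spot` function, the organism label `"Homo sapiens"`, and the treatment of ties for the top hit are all assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Counts copied from Table 1: accession -> (total, remaining, removed).
TABLE = {
    "SRR11092056": (5239723, 438796, 4800927),
    "SRR11092057": (5184909, 501436, 4683473),
    "SRR13402847": (216859, 216720, 139),
    "SRR13444106": (471848, 470934, 914),
}


def spots_removed(total: int, remaining: int) -> int:
    """Footnote 1: removed = total spots - spots found in the output file."""
    return total - remaining


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one alignment hit of a spot."""
    organism: str
    evalue: float


def classify_spot(hits, human="Homo sapiens"):
    """Classify a spot per footnote 2 (one interpretation, see lead-in)."""
    top = sorted(hits, key=lambda h: h.evalue)[:5]  # up to top five hits
    if not top:
        return "other"
    significant = [h for h in top if h.evalue < 1e-10]

    # Human spot: every considered hit is human and below the threshold.
    if len(significant) == len(top) and all(h.organism == human for h in top):
        return "human"

    evalues = [h.evalue for h in top]
    best = min(evalues)
    # Assumption: with tied lowest eValues, any tied human hit counts as top.
    top_is_human = any(h.organism == human and h.evalue == best for h in top)
    # Human is not the only organism among significant hits.
    mixed = any(h.organism != human for h in significant)
    # All eValues identical, or even the greatest is below 1e-14.
    tight = len(set(evalues)) == 1 or max(evalues) < 1e-14

    if top_is_human and mixed and tight:
        return "conserved_lineage"
    return "other"


if __name__ == "__main__":
    for acc, (total, remaining, removed) in TABLE.items():
        ok = spots_removed(total, remaining) == removed
        print(acc, "consistent" if ok else "MISMATCH", f"{removed / total:.2%} removed")
```

The classifier takes plain (organism, eValue) pairs, so it does not depend on any particular aligner's output format; parsing real alignment output into `Hit` records is left out.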