EAGER: efficient ancient genome reconstruction

Table 6 Comparison of DeDup with the SAMtools rmdup method

Percentage	Method	Var calls	cov(fold)	cov(%)	refCall / Δ
1	NoRMDup	1	1.16	1.02	33,277
1	DeDup	1	1.16	1.01	−207
1	rmdup	1	1.16	0.98	−1,362
2	NoRMDup	11	2.33	10.17	332,395
2	DeDup	11	2.33	10.14	−1,051
2	rmdup	11	2.32	9.85	−10,563
4	NoRMDup	55	4.7	49.82	1,628,172
4	DeDup	55	4.69	49.73	−2,978
4	rmdup	55	4.64	49.10	−23,481
5	NoRMDup	80	5.89	66.85	2,184,874
5	DeDup	80	5.88	66.77	−2,889
5	rmdup	78	5.8	66.19	−21,761
6	NoRMDup	91	7.06	78.85	2,576,795
6	DeDup	91	7.05	78.78	−2,219
6	rmdup	89	6.94	78.31	−17,500
7	NoRMDup	102	8.26	86.68	2,832,796
7	DeDup	102	8.24	86.62	−1,931
7	rmdup	101	8.09	86.29	−12,650
70	NoRMDup	114	82.58	98.39	3,215,440
70	DeDup	114	80.84	98.39	0
70	rmdup	114	68.87	98.39	−52
80	NoRMDup	114	94.38	98.4	3,215,840
80	DeDup	114	92.11	98.4	−2
80	rmdup	114	76.89	98.4	−54
90	NoRMDup	114	106.23	98.42	3,216,400
90	DeDup	114	103.36	98.42	0
90	rmdup	114	84.62	98.42	−30
100	NoRMDup	114	118.03	98.43	3,216,748
100	DeDup	114	114.51	98.43	−1
100	rmdup	114	92.02	98.43	−30

The first column gives the percentage of reads randomly drawn from the Jorgen625 leprosy data set, which has a genome size of 3,268,202 base pairs. Var calls is the number of variant positions that were called. cov(fold) and cov(%) give the fold coverage and the percentage of the genome covered, respectively. refCall is the number of reference calls that were made: NoRMDup rows list the absolute count, while DeDup and rmdup rows list Δ, the difference from the non-de-duplicated sample at the same sub-sampling level (negative values indicate fewer reference calls after duplicate removal). All other positions of the genome have been filtered out. A position was called confidently if it had a coverage of at least fivefold, a variant quality of at least 30, and a minimum allele frequency of 90 %. NoRMDup refers to not applying any duplicate removal to the corresponding sample.
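The calling thresholds and the Δ column described in the legend can be illustrated with a minimal Python sketch. This is not the EAGER implementation: the `Position` record, its field names, and both helper functions are hypothetical, and the sketch assumes that a single quality value is compared against the threshold of 30 and that Δ is added to the NoRMDup count to recover an absolute refCall value.

```python
from dataclasses import dataclass

# Thresholds taken from the table legend.
MIN_COVERAGE = 5              # at least fivefold coverage
MIN_QUALITY = 30              # variant quality of at least 30
MIN_ALLELE_FREQUENCY = 0.90   # minimum allele frequency of 90 %


@dataclass
class Position:
    """Hypothetical per-position summary; field names are assumptions."""
    coverage: int
    quality: float
    allele_frequency: float


def is_confident_call(pos: Position) -> bool:
    """Return True if the position passes all three legend thresholds."""
    return (
        pos.coverage >= MIN_COVERAGE
        and pos.quality >= MIN_QUALITY
        and pos.allele_frequency >= MIN_ALLELE_FREQUENCY
    )


def absolute_ref_calls(normdup_count: int, delta: int) -> int:
    """Recover an absolute refCall count from a Δ entry.

    Assumes Δ = (count after duplicate removal) - (NoRMDup count).
    """
    return normdup_count + delta


# A position with sixfold coverage passes; one with fourfold coverage does not.
passes = is_confident_call(Position(coverage=6, quality=45.0, allele_frequency=0.95))
fails = is_confident_call(Position(coverage=4, quality=45.0, allele_frequency=0.95))

# 100 % row of the table: NoRMDup = 3,216,748 and rmdup Δ = -30.
rmdup_full = absolute_ref_calls(3_216_748, -30)
```

Under these assumptions, the 100 % rmdup row corresponds to 3,216,718 reference calls, and the 1 % rmdup row (33,277 with Δ = −1,362) to 31,915.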

ISSN: 1474-760X