
Table 1 Status of Release 3

From: Finishing a whole-genome shotgun: Release 3 of the Drosophila melanogaster euchromatic genome sequence

   

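| Chromosomal region | Group | Size (bp) | Physical map gaps: number | Physical map gaps: location | Physical map gaps: estimated maximum size† | Finished BACs | Unfinished BACs | Sequence gaps‡ | Release 2 sequence (bp) | Estimated error rate*: 10^4 to 10^5 | Estimated error rate*: 10^5 to 10^6 | Estimated error rate*: >10^6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X (1-11) | HGSC | 13,053,575 | 1 | 9EF | 150 kb | 85 | 14§ | 22 | 234,520¶ | 2 | 16 | 241 |
| X (12-20) | LBNL | 8,921,907 | 1 | 20B2 | 200 kb | 73¥ | 2# | 2 | 0 | 19 | 76 | 84 |
| 2L | LBNL | 22,217,931 | 1 | 39D | ~500 kb - 1 Mb | 177 | 2** | 3 | 0 | 14 | 30 | 397 |
| 2R | LBNL | 20,302,755 | 2 | 42B | 100 kb | 159 | 4†† | 5 | 0 | 11 | 56 | 335 |
| | | | | 57B | 300 kb | | | | | | | |
| 3L | HGSC | 23,352,213 | 1 | 64C | 100 kb | 175 | 9‡‡ | 11 | 47,653§§ | 6 | 50 | 409 |
| 3R | LBNL | 27,890,790 | 0 | NA | NA | 235 | 0 | 0 | 0 | 8 | 119 | 430 |
| 4 | LBNL | 1,237,864 | 1 | 102F | 100 kb | 14 | 0 | 1 | 0 | 3 | 7 | 13 |
| Total | | 116,914,271 | 7 | | | 917 | 31 | 44 | | 63 | 354 | 1,909 |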

*Estimated error rates were determined for 100-kb bins, chosen to overlap by 50 kb. Estimated error rates were determined for bins containing sequence or physical map gaps. However, gaps represented by Ns in the sequence did not contribute to the estimated error rate; thus, the error rate reflects only those sequences present.

†In situ hybridization of flanking clones to polytene chromosomes and estimates of DNA content per band [47] allowed us to estimate the maximum size of the clone gaps. All of the gaps are in regions of tandem repeats, and the flanking BACs extending into the gap might contain sufficient amounts of the repeat to lead to a misleading in situ mapping result. Therefore, we also examined the next BAC in the tiling path, not containing the repeat, to ensure we were using a unique sequence probe. Four BACs are listed for each gap, two on each side, in the order they occur in the genome. The gap at 9EF is flanked by BACR48E06 (location, 9C2-E1), BACR10I17 (ND) and BACR26N01 (9F1-10A2), BACR17B23 (10A1-2). The gap at 20B2 is flanked by BACR23I18 (19F3-A2), BACR22O16 (20A3-B2) and BACR06L03 (20B2-C2), BACR05K22 (20C1-2). The gap at 39D was sized by estimating the histone repeat copy number [16]. The estimate from the flanking BACs, BACR34H23 (39A6-C3) and BACR03L08 (39F1-2), is 400 kb. The gap at 42B is flanked by BACR13P06 (42A3-19), BACR36A03 (42B1-2) and BACR28N07 (42B1-3), BACR01C10 (42B3-C6). The gap at 57B is flanked by BACR03N16 (57A1-4), BACR08P05 (57A5-B3), and BACR10P16 (cytology 57B2-6), BACR04E05 (57B4-6). The gap at 64C is flanked by BACR23H09 (64B15-C2), BACR17L24 (64C1-4), and BACR12G07 (64C5-12), BACR12P14 (64C9-12). The gap at 102F is flanked by BACR13D24 (102D6-E6), BACR22J20 (102E3-F2), and BACH59K20 (102F1-5), BACN05O16 (cross-hybridizes to all telomeres, consistent with its location at the chromosome end).

‡This number includes all instances where we inserted a string of Ns to indicate missing sequences; it is the sum of physical map gaps and gaps due to failure to complete the sequence of cloned regions. In some cases a single physical map gap results in more than one sequence gap. For example, all three sequence gaps on 2L are found in the unfinished BACs that extend into the histone repeat region and four of the five sequence gaps on 2R are found in the unfinished BACs that extend into the repeat region of 42B. Excluding the physical map gaps, the gaps on X 1-11 total 60.6 kb; the gaps on 2R total 1,549 bp; the gaps on 3L total 26.2 kb, excluding the two gaps mapping to heterochromatin. There are no gaps, other than physical map gaps, on 2L, 4 or X 12-20.

§The Release 3 sequence of chromosome X 1-11 includes sequence from 14 unfinished BAC clones. Each of these BACs contains one or two regions of repeat sequence that are difficult to resolve. Eight of the unfinished clones contain Foldback (BACR40O10, BAC23M02, BACR19G09, BACR26B05, BACR29A04), multiple or rearranged roo (BACR17E02, BACR46E23) or 412 (BACR07P13) elements. Six of the clones (BACR01A14, BACR17E02, BACR19D19, BACR25I09, BACR29B18, BACR39C15) contain duplications of other, uncharacterized, repeats. BACR13J02 is the most distal clone in Release 3, extending the Release 2 assembly by approximately 15 kb. Seven of the 14 BACs that were unfinished at the time of Release 3 have since been finished. Five clones (CHORI 22340I08, BACR32E02, CHORI 221-14P20, CHORI 221-17A11 and CHORI 223-05O10) have been added to the tiling path to span the genomic regions that are still represented by Release 2 sequences (see ¶); these BACs were not sequenced for Release 3. 366 bp of sequence (coordinate 3.4 Mb, cytology 3EF) are not contained within a BAC but are spanned by 10-kb genomic clones.
The EDGP identified two clones, BACR37M19 and BACR20K04, as mapping to this region [12], but we determined that their end sequences align elsewhere. The BAC clone coverage of the X chromosome is expected to be lower than the BAC clone coverage of the autosomes and may explain the BAC clone gap in 3EF. BACs whose names begin with CHORI are derived from a library made with randomly sheared DNA [48].

¶Four Release 2 segments not covered in finished BACs were used to produce the Release 3 sequence (see Materials and methods, Arm assembly and overlap verification): 18.3 kb starting at position 1,262,967 bp; 104 kb starting at position 3,412,482 bp; 12.2 kb starting at position 9,489,057 bp; 99.7 kb starting at position 10,462,912 bp. The latter segment extends into the clone gap at 9EF.

¥The last 36 kb of sequence at the centromeric end of the X chromosome are not contained within a BAC and are derived from a phrap assembly using WGS traces and the complete sequence of two 10-kb genomic clones.

#One of the two unfinished BACs (BACR22O16) extends into the physical map gap and the second (BACR39I01) contains a sequence gap resulting from our inability to assemble a difficult repetitive region that includes at least eight copies of a 4.7-kb tandem repeat having similarity to a degenerate mdg3 transposable element lacking LTRs.

**These two unfinished BACs (BACR05D08 and BACR43O11) flank and extend into the 1-Mb histone gene cluster.

††Three unfinished BACs (BACR48D05, BACR03A06 and BACR36A03) extend into the gap at 42B and one unfinished BAC (BACR08P05) extends into the 57B gap.

‡‡The nine unfinished BACs are BACR31B14, BACR43N11, BACR27G13, BACR29O22, BACR01D04, BACR01B21, BACR09G21, BACR30I05 and BACR34K23. BACR31B14, BACR43N11, and BACR27G13 contain sequence gaps that are a consequence of transposable elements (FB or roo) with complex internal rearrangements, tandem repeats or deletions.
Two BACs, BACR29O22 and BACR01B21, contain a roo and a Doc element, respectively, and were not completed. One sequence gap in BACR01D04 is the result of a small misassembly that could not be resolved. Three other sequence gaps are in an unfinished segment of clone BACR34K23. Three (BACR09G21, BACR30I05 and BACR27G13) of the nine BACs that were unfinished at the time of Release 3 are now finished. Five clones (BACR29A07, CHORI 223-12D09, BACR15L14, CHORI221-06A19 and BACR03B05) have been added to the tiling path to span the genomic regions that are still represented by Release 2 sequences (see §§); these BACs were not sequenced for Release 3. The addition of BACR29A07 to the tiling path corrects an inversion in Release 3 at the 3L centromere. The BAC order is now BACR17M18, BACR29A07, BACR22B15 and BACR34K23. In addition, there are 13 finished BACs from 3L that have been submitted to GenBank with unresolved tandem repeat annotations, in accordance with the G16 finishing standards for the human genome project [49].

§§Three Release 2 segments not covered in finished BACs were used to produce the Release 3 sequence (see Materials and methods, Arm assembly and overlap verification): 10.8 kb starting at position 1; 18.9 kb starting at position 5,065,167; 12.6 kb starting at position 23,339,636 bp. The 18.9 kb sequence extends into the 64C clone gap. The 12.6 kb sequence contains two gaps mapping to BACR30H12.
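
The footnote on estimated error rates (*) describes 100-kb bins that overlap by 50 kb, with positions filled by Ns excluded from the estimate. The following Python sketch illustrates that binning scheme only. The function names are hypothetical, and the per-base error probabilities it takes as input are an assumption made for illustration; the table does not state how per-base errors were estimated.

```python
def overlapping_bins(length, bin_size=100_000, step=50_000):
    """Return half-open (start, end) bins covering [0, length).

    Consecutive bins start `step` bases apart, so with the defaults
    each 100-kb bin overlaps its neighbour by 50 kb. The last bin is
    truncated at the end of the sequence.
    """
    if length <= 0:
        return []
    bins = []
    start = 0
    while True:
        end = min(start + bin_size, length)
        bins.append((start, end))
        if end >= length:
            break
        start += step
    return bins


def bin_error_rate(bases, error_probs):
    """Mean per-base error probability over the non-N positions of a bin.

    `error_probs` is an assumed input (one probability per base).
    Positions holding N (gap placeholders) are skipped, so the rate
    reflects only sequence that is present. Returns None for a bin
    made up entirely of Ns.
    """
    kept = [p for b, p in zip(bases, error_probs) if b not in ("N", "n")]
    return sum(kept) / len(kept) if kept else None
```

For a 250-kb sequence, `overlapping_bins(250_000)` yields four bins starting at 0, 50,000, 100,000 and 150,000; a bin that spans a gap still receives a rate, computed from its non-N bases alone.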