
Table 2 Presence of genes in gene clusters of all available finished and unfinished genome sequences

From: The ESAT-6 gene cluster of Mycobacterium tuberculosis and other high G+C Gram-positive bacteria

    

Presence and names of genes in each species

| Gene family | Description | Protein size in M. tuberculosis (amino acids) | ESAT-6 cluster region | M. tuberculosis H37Rv | M. tuberculosis CDC1551 (CSU#93) | M. tuberculosis* 210 | M. bovis* AF2122/97 (spoligotype 9) | M. bovis* BCG Pasteur 1173P2 |
|---|---|---|---|---|---|---|---|---|
| A | ABC transporter family signature, 19-27% homology | 283 | 1 | Rv3866 | MT3980 | ND | MB851A | No sequence data |
|  |  | 276 | 2 | Rv3889c | MT4004 | MTB12A | MB727.3A (partly deleted #) | No sequence data |
|  |  | 295 | 3 | Rv0289 | MT0302 | MTB203A | MB548A | No sequence data |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 300 | 5 | Rv1794 | MT1843 | MTB196A | MB557A | No sequence data |
| B | AAA+ class ATPases, CBXX/CFQX family, SpoVK, 1× ATP/GTP-binding site, 29-39% homology | 573 | 1 | Rv3868 | MT3981 | MTB44B | MB851B | No sequence data |
|  |  | 619 | 2 | Rv3884c | MT3999 | MTB12B | MB727.1B | No sequence data |
|  |  | 631 | 3 | Rv0282 | MT0295 | MTB23B | MB672B | No sequence data |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 610 | 5 | Rv1798 | MT1847 | MTB196B | MB542B | No sequence data |
| C | Amino-terminal transmembrane protein, possible ATP/GTP-binding motif, 31-41% homology | 480 | 1 | Rv3869 | MT3982 | MTB44C | MB851C | No sequence data |
|  |  | 495 | 2 | Rv3895c | MT4011 | MTB136C | MB780.1C | No sequence data |
|  |  | 538 | 3 | Rv0283 | MT0296 | MTB23C | MB672C | No sequence data |
|  |  | 470 | 4 | Rv3450c | MT3556 | MTB45C | MB493.1C | No sequence data |
|  |  | 506 | 5 | Rv1782 | MT1832 | MTB46C | MB771.1C | No sequence data |
| D | DNA segregation ATPase, ftsK chromosome partitioning protein, SpoIIIE, yukA, 3× ATP/GTP-binding sites, 2× amino-terminal transmembrane protein, 28-39% homology | 747 + 591 | 1 | Rv3870+71 | MT3983+85 | MTB44Da+Db | MB851D | MB851D (partly deleted) |
|  |  | 1396 | 2 | Rv3894c | MT4010 | MTB3D | MB780.1D | No sequence data |
|  |  | 1330 | 3 | Rv0284 | MT0297 | MTB23D | MB672D | No sequence data |
|  |  | 1236 | 4 | Rv3447c | MT3553 | MTB45D | MB585.1D | No sequence data |
|  |  | 435 + 932 | 5 | Rv1783+84 | MT1833 | MTB46Da+Db | MB771.1D | No sequence data |
| E | PE, 18-90% homology | 99 | 1 | Rv3872 | MT3986 | MTB44E | MB851E | Deleted |
|  |  | 77 | 2 | Rv3893c | MT4008 | MTB3E | MB780.1E | No sequence data |
|  |  | 102 | 3 | Rv0285 | MT0298 | MTB23E | MB389E | No sequence data |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 99 & 99 | 5 | Rv1788 & 91 | MT1837 & 40 | MTB196Ea & Eb | MB771.0E & MB557E | No sequence data |
| F | PPE, 19-88% homology | 368 | 1 | Rv3873 | MT3987 | MTB44F | MB851F | Deleted |
|  |  | 399 | 2 | Rv3892c | MT4007 | MTB3F | MB780.1F | No sequence data |
|  |  | 513 | 3 | Rv0286 | MT0299 | MTB472F | MB528F | No sequence data |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 365, 393 & 350 | 5 | Rv1787 & 89 & 90 | MT1836 & 38 & 39 | MTB196Fa & Fb & Fc | MB771.0Fa & Fb & MB557F | No sequence data |
| G | lhp or CFP-10, also MTSA-10, grouped into ESAT-6 family, potent secreted T-cell antigens, 9-32% homology | 100 | 1 | Rv3874 | MT3988 | MTB44G | MB851G | Deleted |
|  |  | 107 | 2 | Rv3891c | MT4006 | MTB12G | MB727.3G | No sequence data |
|  |  | 97 | 3 | Rv0287 | MT0300 | MTB472G | MB548G | No sequence data |
|  |  | 125 | 4 | Rv3445c | MT3550 | MTB45G | MB585.0G | No sequence data |
|  |  | 98 | 5 | Rv1792 (Stop) | MT1841 (Stop) | MTB196G (Stop) | MB557G | No sequence data |
| H | ESAT-6 family, cfp7, L45 or l-esat, also Mtb9.9 family, potent secreted T-cell antigens, 15-27% homology | 95 | 1 | Rv3875 | MT3989 | MTB44H | MB851H † | Deleted |
|  |  | 95 | 2 | Rv3890c | MT4005 | MTB12H | MB727.3H | No sequence data |
|  |  | 96 | 3 | Rv0288 | MT0301 | MTB203H | MB548H | No sequence data |
|  |  | 100 | 4 | Rv3444c | MT3549 | MTB45H | MB585.0H | No sequence data |
|  |  | 94 | 5 | Rv1793 | MT1842 | MTB196H | MB557H | No sequence data |
| I | ATPases involved in chromosome partitioning, 1× ATP/GTP-binding motif, 33% homology | 666 | 1 | Rv3876 | MT3990 | MTB60I | MB477I | Deleted |
|  |  | 341 | 2 | Rv3888c | MT4003 | MTB12I | Deleted # | No sequence data |
|  |  | - | 3 | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | - | 5 | No duplication | No duplication | No duplication | No duplication | No duplication |
| J | Integral inner membrane protein, binding-protein-dependent transport systems inner membrane component signature, putative transporter protein, 19-27% homology | 511 | 1 | Rv3877 | MT3991 | MTB369J | MB477J | Deleted |
|  |  | 509 | 2 | Rv3887c | MT4002 | MTB12J | MB727.3J (partly deleted #) | No sequence data |
|  |  | 472 | 3 | Rv0290 | MT0303 | MTB203J | MB548J | No sequence data |
|  |  | 467 | 4 | Rv3448 | MT3554 | MTB45J | MB585.1J | No sequence data |
|  |  | 503 | 5 | Rv1795 | MT1844 | MTB196J | MB506J | No sequence data |
| K | Mycosins, subtilisin-like cell-wall associated serine proteases, 43-49% homology | 446 | 1 | Rv3883c | MT3998 | MTB12Ka | MB727.0K | No sequence data |
|  |  | 550 | 2 | Rv3886c | MT4001 (Frame) | MTB12Kb | MB727.2K | No sequence data |
|  |  | 461 | 3 | Rv0291 | MT0304 | MTB203K | MB548K | No sequence data |
|  |  | 455 | 4 | Rv3449 | MT3555 | MTB45K | MB585.1K | No sequence data |
|  |  | 585 | 5 | Rv1796 | MT1845 | MTB196K | MB506K | No sequence data |
| L | 2× amino-terminal transmembrane protein, 16-27% homology | 462 | 1 | Rv3882c | MT3997 | MTB12La | MB727.0L | No sequence data |
|  |  | 537 | 2 | Rv3885c | MT4000 (Frame) | MTB12Lb | MB727.2L | No sequence data |
|  |  | 331 | 3 | Rv0292 | MT0305 | MTB203L | MB694.0L | No sequence data |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 406 | 5 | Rv1797 | MT1846 | MTB196L | MB542L | No sequence data |

    

Presence and names of genes in each species

| Gene family | Description | Protein size in M. tuberculosis (amino acids) | ESAT-6 cluster region | M. leprae TN | M. avium* 104 | M. paratuberculosis K 10 | M. smegmatis* MC2 155 | C. diphtheriae* NCTC13129 | S. coelicolor A3(2) |
|---|---|---|---|---|---|---|---|---|---|
| A | ABC transporter family signature, 19-27% homology | 283 | 1 | ML0057 (pseudo) | ND | ND | MS29A | ND | ND |
|  |  | 276 | 2 | MLabc (pseudo)‡ | MA138A | MP3889c | ND | ND | ND |
|  |  | 295 | 3 | ML2530 | MA141A | MP0289 | MS32A | ND | ND |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 300 | 5 | ML1540 | MA310A | MP1794 | ND | ND | ND |
| B | AAA+ class ATPases, CBXX/CFQX family, SpoVK, 1× ATP/GTP-binding site, 29-39% homology | 573 | 1 | ML0055 | ND | ND | MS29B | ND | ND |
|  |  | 619 | 2 | ML0039 (pseudo) | MA177B | MP3884c | ND | ND | ND |
|  |  | 631 | 3 | ML2537 | MA78B | MP0282 | MS32B | ND | ND |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 610 | 5 | ML1536 | MA310B | MP1798 | ND | ND | ND |
| C | Amino-terminal transmembrane protein, possible ATP/GTP-binding motif, 31-41% homology | 480 | 1 | ML0054 | ND | ND | MS29C | ND | ND |
|  |  | 495 | 2 | Deleted | MA144C | MP3895c | ND | ND | ND |
|  |  | 538 | 3 | ML2536 | MA78C | MP0283 | MS32C | ND | ND |
|  |  | 470 | 4 | Deleted | MA94C | MP3450c | MS8C | CORDmem | SC3C3.07 |
|  |  | 506 | 5 | ML1544 | MA221C | MP1782 | ND | ND | ND |
| D | DNA segregation ATPase, ftsK chromosome partitioning protein, SpoIIIE, yukA, 3× ATP/GTP-binding sites, 2× amino-terminal transmembrane protein, 28-39% homology | 747 + 591 | 1 | ML0053+52 | ND | ND | MS29D (Stop$) | ND | ND |
|  |  | 1396 | 2 | Deleted | MA144D | MP3894c | ND | ND | ND |
|  |  | 1330 | 3 | ML2535 | MA78D | MP0284 | MS32D | ND | ND |
|  |  | 1236 | 4 | Deleted | MA504D | MP3447c | MS8D | CORDyuk | SC3C3.20c |
|  |  | 435 + 932 | 5 | ML1543 | MA221D | MP1783 | ND | ND | ND |
| E | PE, 18-90% homology | 99 | 1 | Deleted | ND | ND | MS29E | ND | ND |
|  |  | 77 | 2 | Deleted | MA138E | MP3893c | ND | ND | ND |
|  |  | 102 | 3 | ML2534 | MA78E | MP0285 | MS32E | ND | ND |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 99 & 99 | 5 | Deleted | MA310Ea & Eb | MP1788 & 91 | ND | ND | ND |
| F | PPE, 19-88% homology | 368 | 1 | ML0051 | ND | ND | MS29F | ND | ND |
|  |  | 399 | 2 | Deleted | MA138F | MP3892c | ND | ND | ND |
|  |  | 513 | 3 | ML2533 (pseudo) | MA78F | MP0286 | MS32F | ND | ND |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 365, 393 & 350 | 5 | Deleted | MA310Fa & Fb & Fc | MP1787 & 89 & 90 | ND | ND | ND |
| G | lhp or CFP-10, also MTSA-10, grouped into ESAT-6 family, potent secreted T-cell antigens, 9-32% homology | 100 | 1 | ML0050 | ND | ND | MS29G | ND | SC3C3.10 and SC3C3.11(c) |
|  |  | 107 | 2 | Deleted | MA138G | MP3891c § | ND | ND | ND |
|  |  | 97 | 3 | ML2532 | MA141G | MP0287 | MS32G | ND | ND |
|  |  | 125 | 4 | Deleted | MA319G | MP3445c | MS8G | CORDcfp10 | ND |
|  |  | 98 | 5 | MLcfp (pseudo)‡ | MA310G | MP1792 | ND | ND | ND |
| H | ESAT-6 family, cfp7, L45 or l-esat, also Mtb9.9 family, potent secreted T-cell antigens, 15-27% homology | 95 | 1 | ML0049 | ND | ND | MS29H | ND | SC3C3.10 and SC3C3.11¶ |
|  |  | 95 | 2 | ML0034 (pseudo) | MA138H | MP3890c § | ND | ND | ND |
|  |  | 96 | 3 | ML2531 | MA141H | MP0288 | MS32H | ND | ND |
|  |  | 100 | 4 | ML0363 | MA319H | MP3444c | MS8H | CORDesat6 | ND |
|  |  | 94 | 5 | MLesat (pseudo)‡ | MA310H | MP1793 | ND | ND | ND |
| I | ATPases involved in chromosome partitioning, 1× ATP/GTP-binding motif, 33% homology | 666 | 1 | ML0048 | ND | ND | MS29I | ND | SC3C3.03c |
|  |  | 341 | 2 | ML0035 (pseudo) | MA138I | MP3888c | ND | ND | ND |
|  |  | - | 3 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | - | 5 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| J | Integral inner membrane protein, binding-protein-dependent transport systems inner membrane component signature, putative transporter protein, 19-27% homology | 511 | 1 | ML0047 | ND | ND | MS29J | ND | ND |
|  |  | 509 | 2 | ML0036 (pseudo) | MA138J | MP3887c | ND | ND | ND |
|  |  | 472 | 3 | ML2529 | MA141J | MP0290 | MS32J | ND | ND |
|  |  | 467 | 4 | Deleted | MA504J | MP3448 | MS8J | CORDtransporter | SC3C3.21 |
|  |  | 503 | 5 | ML1539 | MA310J | MP1795 | ND | ND | ND |
| K | Mycosins, subtilisin-like cell-wall associated serine proteases, 43-49% homology | 446 | 1 | ML0041 | ND | ND | MS65K | ND | ND |
|  |  | 550 | 2 | ML0037 (pseudo) | MA177K | MP3886c | ND | ND | ND |
|  |  | 461 | 3 | ML2528 | MA141K | MP0291 | MS32K | ND | ND |
|  |  | 455 | 4 | Deleted | MA439K | MP3449 | MS8K | CORDsub | SC3C3.17c and SC3C3.08 |
|  |  | 585 | 5 | ML1538 | MA310K | MP1796 | ND | ND | ND |
| L | 2× amino-terminal transmembrane protein, 16-27% homology | 462 | 1 | ML0042 | ND | ND | MS65L | ND | ND |
|  |  | 537 | 2 | ML0038 (pseudo) | MA177L | MP3885c | ND | ND | ND |
|  |  | 331 | 3 | ML2527 | MA81L | MP0292 | MS32L | ND | ND |
|  |  | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
|  |  | 406 | 5 | ML1537 | MA310L | MP1797 | ND | ND | ND |

Other region-specific genes of known functions (not assigned to a family)

| Region | Gene | Description |
|---|---|---|
| Region 5 (not present in M. smegmatis, C. diphtheriae and S. coelicolor) | Rv1785c | Probable member of the cytochrome P450 family (pseudogene in M. leprae) |
|  | Rv1786 | Probable ferredoxin (pseudogene in M. leprae) |

Other region-specific genes of unknown functions (not assigned to a family)

| Region | Gene | Description |
|---|---|---|
| Region 1 (deleted in M. avium and M. paratuberculosis, not present in C. diphtheriae and S. coelicolor) | Rv3867 | Unknown, annotated as part of MT3980 (Rv3866) in M. tuberculosis CDC1551 sequence with a frameshift (functional in M. leprae) |
|  | Rv3878 | Unknown, some similarity to PPE family, deleted with RD1 deletion region in M. bovis BCG (pseudogene in M. leprae) |
|  | Rv3879c | Unknown, repetitive, highly proline-rich N-terminus, deleted with RD1 deletion region in M. bovis BCG (pseudogene in M. leprae) |
|  | Rv3880c | Unknown (functional in M. leprae) |
|  | Rv3881c | Unknown (pseudogene in M. leprae) |
| Region 4 (not present in S. coelicolor) | Rv3446c | Unknown, may contain a possible ABC transporter signature (deleted in M. leprae) |

*Names of genes of these organisms were given arbitrarily by the authors of this paper.
†Gene not identified by BLAST, data obtained from [1], GenBank accession no. U34848 and AAC44033.
‡The gene is present in the sequence, but not annotated (name given arbitrarily by authors of this paper).
§Genes identified by BLAST as well as data obtained from GenBank, accession no. AJ250015.
¶Orthologs in S. coelicolor are equally similar to family G and H.
ND, not detected: not necessarily absent from the genome, but possibly not detected because of the unfinished sequencing process.
No duplication, no duplication of this gene is present in this region.
No sequence data, no sequence data are available for this organism; published deletion information is included ([1] and others).
Deleted, deleted from the genome of this particular species or strain (# = deleted in only some strains of this species).
Frame, frameshift.
Stop, in-frame stop codon.
Stop$, stop codon corresponds to the stop codon in M. tuberculosis H37Rv that splits the gene into Rv3870 and Rv3871.
Pseudo, confirmed pseudogene due to multiple frameshifts and stop codons.