Table 3 Comparison of prevalent compositionally biased regions for the whole proteome, translated intergenic DNA, known proteins, hypothetical proteins and dORFs in budding yeast

From: A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes

(a) Proteome

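| Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
|---|---|---|---|---|---|
| S | 37,006 | S | 18,502 | S | 10,630 |
| E | 21,163 | E | 9,147 | T | 5,900 |
| L | 18,064 | T | 6,836 | E | 4,704 |
| K | 17,067 | N | 6,462 (9.3) | Q | 3,924 (10.4) |
| N | 15,577 (7.4) | Q | 5,212 (7.5) | N | 3,745 (10.0) |
| A | 13,974 | K | 4,280 | P | 2,049 |
| G | 12,927 | P | 3,831 | K | 1,910 |
| D | 10,004 | L | 3,512 | D | 1,292 |
| P | 9,892 | D | 3,176 | G | 961 |
| T | 9,866 | A | 2,473 | A | 916 |
| F | 8,934 | G | 2,115 | L | 554 |
| Q | 8,689 (4.1) | C | 810 | C | 256 |
| I | 6,939 | F | 764 | R | 204 |
| R | 5,333 | H | 662 | H | 195 |
| V | 4,121 | R | 509 | M | 163 |
| C | 3,293 | I | 264 | F | 94 |
| Y | 2,960 | Y | 262 | V | 90 |
| H | 2,645 | M | 245 | Y | 33 |
| W | 2,009 | V | 150 | W | 0 |
| M | 850 | W | 0 | I | 0 |
| Total | 211,313 | Total | 69,212 | Total | 37,620 |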

(b) Translated igDNA*

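| Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
|---|---|---|---|---|---|
| F | 28,949 | F | 5,692 | F | 1,211 |
| C | 10,074 | C | 1,280 | H | 602 |
| K | 7,800 | H | 908 | V | 490 |
| R | 7,551 | V | 814 | T | 448 |
| Y | 6,450 | K | 753 | C | 377 |
| L | 6,283 | Y | 690 | L | 366 |
| I | 3,789 | T | 681 | Y | 282 |
| H | 3,157 | P | 675 | P | 243 |
| P | 1,650 | R | 594 | S | 222 |
| S | 1,613 | L | 576 | K | 186 |
| V | 1,566 | S | 380 | I | 185 |
| T | 1,299 | G | 380 | R | 178 |
| G | 1,136 | I | 353 | N | 173 (3.2) |
| N | 798 (0.9) | W | 299 | G | 166 |
| W | 746 | N | 242 (1.7) | W | 98 |
| Q | 498 (0.6) | Q | 125 (0.9) | Q | 51 (1.0) |
| M | 282 | E | 85 | E | 39 |
| A | 268 | M | 26 | D | 16 |
| E | 241 | D | 16 | M | 15 |
| D | 16 | A | 0 | A | 0 |
| Total | 84,166 | Total | 14,569 | Total | 5,348 |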

(c) Known yeast proteins†

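| Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
|---|---|---|---|---|---|
| S | 27,539 | S | 15,328 | S | 9,819 |
| E | 17,519 | E | 8,074 | T | 5,900 |
| L | 13,928 | N | 5,716 (9.9) | E | 4,289 |
| K | 13,785 | T | 5,413 | N | 3,551 (11.9) |
| N | 12,854 (7.7) | Q | 4,520 (7.8) | Q | 3,348 (11.3) |
| A | 12,482 | K | 3,653 | K | 1,723 |
| G | 11,783 | L | 2,864 | P | 1,669 |
| D | 1,934 | L | 595 | P | 170 |
| P | 1,883 | P | 453 | G | 62 |
| Q | 7,299 (4.4) | A | 2,434 | G | 899 |
| P | 7,045 | G | 1,969 | L | 451 |
| F | 6,154 | C | 608 | C | 207 |
| I | 5,495 | H | 530 | H | 162 |
| R | 3,973 | R | 447 | R | 155 |
| V | 3,415 | F | 443 | M | 113 |
| C | 2,400 | I | 264 | F | 78 |
| Y | 2,158 | Y | 218 | V | 0 |
| H | 1,536 | M | 195 | Y | 0 |
| W | 1,484 | V | 60 | W | 0 |
| M | 656 | W | 0 | I | 0 |
| Total | 166,920 (13) | Total | 57,938 (38) | Total | 33,070 (19) |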

(d) Hypothetical yeast proteins†

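| Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
|---|---|---|---|---|---|
| S | 8,621 | S | 2,958 | T | 1,240 |
| L | 3,905 | T | 1,423 | S | 772 |
| E | 3,630 | E | 1,073 | Q | 576 (13.7) |
| K | 3,043 | Q | 680 (6.8) | E | 415 |
| F | 2,747 | N | 664 (6.6) | D | 262 |
| N | 2,506 (6.4) | K | 602 | N | 194 (4.6) |
| T | 2,050 | D | 600 | K | 187 |
| D | 1,934 | L | 595 | P | 170 |
| P | 1,883 | P | 453 | G | 62 |
| A | 1,386 | F | 321 | V | 55 |
| I | 1,267 | C | 202 | M | 50 |
| R | 1,264 | G | 146 | L | 50 |
| Q | 1,171 (3.0) | H | 106 | R | 49 |
| G | 882 | R | 62 | C | 49 |
| C | 863 | V | 55 | Y | 33 |
| H | 528 | M | 50 | H | 33 |
| W | 514 | Y | 44 | F | 16 |
| Y | 512 | A | 14 | A | 0 |
| V | 389 | V | 0 | W | 0 |
| M | 179 | W | 0 | I | 0 |
| Total | 39,274 (16) | Total | 10,048 (221) | Total | 4,213 (150) |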

(e) dORFs

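| Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
|---|---|---|---|---|---|
| R | 459 | R | 254 | R | 254 |
| H | 307 | L | 204 | L | 204 |
| S | 288 | T | 138 | H | 122 |
| G | 271 | Q | 129 (11.0) | T | 120 |
| L | 248 | H | 122 | C | 99 |
| Q | 225 (6.8) | C | 99 | Q | 74 (8.3) |
| T | 208 | S | 82 | N | 23 (2.6) |
| N | 172 (5.2) | P | 72 | A | 0 |
| F | 168 | Y | 50 | D | 0 |
| C | 163 | N | 23 (2.0) | E | 0 |
| V | 151 | A | 0 | F | 0 |
| A | 149 | D | 0 | G | 0 |
| D | 111 | E | 0 | I | 0 |
| I | 98 | F | 0 | K | 0 |
| P | 84 | G | 0 | P | 0 |
| Y | 67 | I | 0 | S | 0 |
| E | 45 | K | 0 | V | 0 |
| K | 37 | M | 0 | Y | 0 |
| W | 23 | V | 0 | W | 0 |
| M | 14 | W | 0 | M | 0 |
| Total | 3,288 | Total | 1,173 | Total | 896 |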

*Translated igDNA ('intergenic DNA') is conceptually translated in six frames. For analysis of intergenic DNA in budding yeast, we used the 'Not Feature' file of sequences in FASTA format distributed by SGD (this contains all genomic DNA that does not overlap an annotated feature [32]). This set of nucleotide sequences was conceptually translated in all six reading frames, and the amino-acid compositional biases were tallied up as for the annotated budding-yeast proteome.

A dORF is an open reading frame that is disrupted by one or more frameshifts or premature stop codons, and which is likely to be a pseudogene. A data set of dORFs has been derived previously for the budding-yeast genome [9].

†In the totals for known and hypothetical proteins, the number of bias residues per residue of protein is given in parentheses.
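The six-frame conceptual translation mentioned in the footnote can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`reverse_complement`, `translate`, `six_frame_translations`) are hypothetical, the standard genetic code is assumed, stop codons are written as `*`, and any codon containing a non-ACGT character is assumed to translate to `X`. It covers only the translation step, not the subsequent tallying of compositional bias.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base; trailing partial codons are dropped."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for ambiguous codons (assumption)
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> list[str]:
    """Three forward frames followed by three reverse-complement frames."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


if __name__ == "__main__":
    for frame in six_frame_translations("ATGGCCTAA"):
        print(frame)
```

Each translated frame could then be passed to whatever bias-detection procedure is applied to the annotated proteome, so that intergenic and protein-coding sequence are scored in the same way.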