Table 3 Comparison of prevalent compositionally biased regions for the whole proteome, translated intergenic DNA, known proteins, hypothetical proteins and dORFs in budding yeast

From: A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes

(a) Proteome
Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³
S 37,006 S 18,502 S 10,630
E 21,163 E 9,147 T 5,900
L 18,064 T 6,836 E 4,704
K 17,067 N 6,462 (9.3) Q 3,924 (10.4)
N 15,577 (7.4) Q 5,212 (7.5) N 3,745 (10.0)
A 13,974 K 4,280 P 2,049
G 12,927 P 3,831 K 1,910
D 10,004 L 3,512 D 1,292
P 9,892 D 3,176 G 961
T 9,866 A 2,473 A 916
F 8,934 G 2,115 L 554
Q 8,689 (4.1) C 810 C 256
I 6,939 F 764 R 204
R 5,333 H 662 H 195
V 4,121 R 509 M 163
C 3,293 I 264 F 94
Y 2,960 Y 262 V 90
H 2,645 M 245 Y 33
W 2,009 V 150 W 0
M 850 W 0 I 0
Total 211,313 Total 69,212 Total 37,620
(b) Translated igDNA*
Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³
F 28,949 F 5,692 F 1,211
C 10,074 C 1,280 H 602
K 7,800 H 908 V 490
R 7,551 V 814 T 448
Y 6,450 K 753 C 377
L 6,283 Y 690 L 366
I 3,789 T 681 Y 282
H 3,157 P 675 P 243
P 1,650 R 594 S 222
S 1,613 L 576 K 186
V 1,566 S 380 I 185
T 1,299 G 380 R 178
G 1,136 I 353 N 173 (3.2)
N 798 (0.9) W 299 G 166
W 746 N 242 (1.7) W 98
Q 498 (0.6) Q 125 (0.9) Q 51 (1.0)
M 282 E 85 E 39
A 268 M 26 D 16
E 241 D 16 M 15
D 16 A 0 A 0
Total 84,166 Total 14,569 Total 5,348
(c) Known yeast proteins
Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³
S 27,539 S 15,328 S 9,819
E 17,519 E 8,074 T 5,900
L 13,928 N 5,716 (9.9) E 4,289
K 13,785 T 5,413 N 3,551 (11.9)
N 12,854 (7.7) Q 4,520 (7.8) Q 3,348 (11.3)
A 12,482 K 3,653 K 1,723
G 11,783 L 2,864 P 1,669
Q 7,299 (4.4) A 2,434 G 899
P 7,045 G 1,969 L 451
F 6,154 C 608 C 207
I 5,495 H 530 H 162
R 3,973 R 447 R 155
V 3,415 F 443 M 113
C 2,400 I 264 F 78
Y 2,158 Y 218 V 0
H 1,536 M 195 Y 0
W 1,484 V 60 W 0
M 656 W 0 I 0
Total 166,920 (13) Total 57,938 (38) Total 33,070 (19)
(d) Hypothetical yeast proteins
Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³
S 8,621 S 2,958 T 1,240
L 3,905 T 1,423 S 772
E 3,630 E 1,073 Q 576 (13.7)
K 3,043 Q 680 (6.8) E 415
F 2,747 N 664 (6.6) D 262
N 2,506 (6.4) K 602 N 194 (4.6)
T 2,050 D 600 K 187
D 1,934 L 595 P 170
P 1,883 P 453 G 62
A 1,386 F 321 V 55
I 1,267 C 202 M 50
R 1,264 G 146 L 50
Q 1,171 (3.0) H 106 R 49
G 882 R 62 C 49
C 863 V 55 Y 33
H 528 M 50 H 33
W 514 Y 44 F 16
Y 512 A 14 A 0
V 389 I 0 W 0
M 179 W 0 I 0
Total 39,274 (16) Total 10,048 (221) Total 4,213 (150)
(e) dORFs
Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³
R 459 R 254 R 254
H 307 L 204 L 204
S 288 T 138 H 122
G 271 Q 129 (11.0) T 120
L 248 H 122 C 99
Q 225 (6.8) C 99 Q 74 (8.3)
T 208 S 82 N 23 (2.6)
N 172 (5.2) P 72 A 0
F 168 Y 50 D 0
C 163 N 23 (2.0) E 0
V 151 A 0 F 0
A 149 D 0 G 0
D 111 E 0 I 0
I 98 F 0 K 0
P 84 G 0 P 0
Y 67 I 0 S 0
E 45 K 0 V 0
K 37 M 0 Y 0
W 23 V 0 W 0
M 14 W 0 M 0
Total 3,288 Total 1,173 Total 896
*Translated igDNA ('intergenic DNA') is intergenic DNA conceptually translated in all six reading frames. For the analysis of intergenic DNA in budding yeast, we used the 'Not Feature' file of FASTA-format sequences distributed by SGD, which contains all genomic DNA that does not overlap an annotated feature [32]. The amino-acid compositional biases of these translations were tallied as for the annotated budding-yeast proteome. A dORF is an open reading frame that is disrupted by one or more frameshifts or premature stop codons and is likely to be a pseudogene; a data set of dORFs was derived previously for the budding-yeast genome [9]. In the totals for known and hypothetical proteins, the number of bias residues per residue of protein is given in parentheses.
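The six-frame conceptual translation described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the standard nuclear genetic code, represents stop codons as `*` and codons containing ambiguous bases as `X`, and the function names (`reverse_complement`, `translate`, `six_frame_translations`) are hypothetical. The scoring of compositional bias (Pbias) is not reproduced here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate one reading frame, ignoring a trailing partial codon.

    Codons that are not in the table (e.g. containing 'N') become 'X'.
    """
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> list[str]:
    """Return translations of frames +1, +2, +3, then -1, -2, -3."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [
        translate(strand[offset:])
        for strand in strands
        for offset in range(3)
    ]


if __name__ == "__main__":
    for frame in six_frame_translations("ATGAAATTTTAG"):
        print(frame)
```

Each intergenic sequence yields six peptide strings, so every nucleotide contributes to both strands. This double use is one reason the translated igDNA counts are not directly comparable to counts from annotated proteins without normalisation.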