A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes

Harrison, Paul M; Gerstein, Mark

doi:10.1186/gb-2003-4-6-r40

Table 3 Comparison of prevalent compositionally biased regions for the whole proteome, translated intergenic DNA, known proteins, hypothetical proteins and dORFs in budding yeast


(a) Proteome
P_bias < 1 × 10^-5		P_bias < 1 × 10^-9		P_bias < 1 × 10^-13
S	37,006	S	18,502	S	10,630
E	21,163	E	9,147	T	5,900
L	18,064	T	6,836	E	4,704
K	17,067	N	6,462 (9.3)	Q	3,924 (10.4)
N	15,577 (7.4)	Q	5,212 (7.5)	N	3,745 (10.0)
A	13,974	K	4,280	P	2,049
G	12,927	P	3,831	K	1,910
D	10,004	L	3,512	D	1,292
P	9,892	D	3,176	G	961
T	9,866	A	2,473	A	916
F	8,934	G	2,115	L	554
Q	8,689 (4.1)	C	810	C	256
I	6,939	F	764	R	204
R	5,333	H	662	H	195
V	4,121	R	509	M	163
C	3,293	I	264	F	94
Y	2,960	Y	262	V	90
H	2,645	M	245	Y	33
W	2,009	V	150	W	0
M	850	W	0	I	0
Total	211,313	Total	69,212	Total	37,620
(b) Translated igDNA*
P_bias < 1 × 10^-5		P_bias < 1 × 10^-9		P_bias < 1 × 10^-13
F	28,949	F	5,692	F	1,211
C	10,074	C	1,280	H	602
K	7,800	H	908	V	490
R	7,551	V	814	T	448
Y	6,450	K	753	C	377
L	6,283	Y	690	L	366
I	3,789	T	681	Y	282
H	3,157	P	675	P	243
P	1,650	R	594	S	222
S	1,613	L	576	K	186
V	1,566	S	380	I	185
T	1,299	G	380	R	178
G	1,136	I	353	N	173 (3.2)
N	798 (0.9)	W	299	G	166
W	746	N	242 (1.7)	W	98
Q	498 (0.6)	Q	125 (0.9)	Q	51 (1.0)
M	282	E	85	E	39
A	268	M	26	D	16
E	241	D	16	M	15
D	16	A	0	A	0
Total	84,166	Total	14,569	Total	5,348
(c) Known yeast proteins^†
P_bias < 1 × 10^-5		P_bias < 1 × 10^-9		P_bias < 1 × 10^-13
S	27,539	S	15,328	S	9,819
E	17,519	E	8,074	T	5,900
L	13,928	N	5,716 (9.9)	E	4,289
K	13,785	T	5,413	N	3,551 (11.9)
N	12,854 (7.7)	Q	4,520 (7.8)	Q	3,348 (11.3)
A	12,482	K	3,653	K	1,723
G	11,783	L	2,864	P	1,669
Q	7,299 (4.4)	A	2,434	G	899
P	7,045	G	1,969	L	451
F	6,154	C	608	C	207
I	5,495	H	530	H	162
R	3,973	R	447	R	155
V	3,415	F	443	M	113
C	2,400	I	264	F	78
Y	2,158	Y	218	V	0
H	1,536	M	195	Y	0
W	1,484	V	60	W	0
M	656	W	0	I	0
Total	166,920 (13)	Total	57,938 (38)	Total	33,070 (19)
(d) Hypothetical yeast proteins^†
P_bias < 1 × 10^-5		P_bias < 1 × 10^-9		P_bias < 1 × 10^-13
S	8,621	S	2,958	T	1,240
L	3,905	T	1,423	S	772
E	3,630	E	1,073	Q	576 (13.7)
K	3,043	Q	680 (6.8)	E	415
F	2,747	N	664 (6.6)	D	262
N	2,506 (6.4)	K	602	N	194 (4.6)
T	2,050	D	600	K	187
D	1,934	L	595	P	170
P	1,883	P	453	G	62
A	1,386	F	321	V	55
I	1,267	C	202	M	50
R	1,264	G	146	L	50
Q	1,171 (3.0)	H	106	R	49
G	882	R	62	C	49
C	863	V	55	Y	33
H	528	M	50	H	33
W	514	Y	44	F	16
Y	512	A	14	A	0
V	389	I	0	W	0
M	179	W	0	I	0
Total	39,274 (16)	Total	10,048 (221)	Total	4,213 (150)
(e) dORFs
P_bias < 1 × 10^-5		P_bias < 1 × 10^-9		P_bias < 1 × 10^-13
R	459	R	254	R	254
H	307	L	204	L	204
S	288	T	138	H	122
G	271	Q	129 (11.0)	T	120
L	248	H	122	C	99
Q	225 (6.8)	C	99	Q	74 (8.3)
T	208	S	82	N	23 (2.6)
N	172 (5.2)	P	72	A	0
F	168	Y	50	D	0
C	163	N	23 (2.0)	E	0
V	151	A	0	F	0
A	149	D	0	G	0
D	111	E	0	I	0
I	98	F	0	K	0
P	84	G	0	P	0
Y	67	I	0	S	0
E	45	K	0	V	0
K	37	M	0	Y	0
W	23	V	0	W	0
M	14	W	0	M	0
Total	3,288	Total	1,173	Total	896

*Translated igDNA (intergenic DNA) is intergenic DNA conceptually translated in all six reading frames. For the budding-yeast analysis, we used the 'Not Feature' FASTA file distributed by SGD, which contains all genomic DNA that does not overlap an annotated feature [32], and tallied the amino-acid compositional biases of its six-frame translation as for the annotated budding-yeast proteome. A dORF is an open reading frame that is disrupted by one or more frameshifts or premature stop codons and is likely to be a pseudogene; a data set of dORFs has been derived previously for the budding-yeast genome [9]. ^†In the totals for known and hypothetical proteins, the figure in parentheses is the number of protein residues per bias residue. In all panels, the figures in parentheses after N and Q give that residue's percentage of the column total.
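The six-frame conceptual translation described in the footnote can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses the standard genetic code, maps any codon containing a character other than A, C, G or T to 'X', writes stop codons as '*', and drops a trailing partial codon. How the original analysis handled stops and ambiguous bases is not stated in this table. The function names are hypothetical.

```python
# Minimal six-frame conceptual translation (illustrative sketch).
# Assumptions: standard genetic code; non-ACGT codons -> 'X'; stops -> '*'.

BASES = "TCAG"
# Standard code in TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon from position 0; a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna: str) -> list[str]:
    """Return translations of frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    forward = [translate(dna[f:]) for f in range(3)]
    reverse = [translate(rc[f:]) for f in range(3)]
    return forward + reverse


print(six_frame_translate("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```

Each of the six resulting peptide strings could then be scanned for compositional bias in the same way as an annotated protein sequence.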

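The figures in parentheses after N and Q can be checked against the column totals. The sketch below uses counts copied from panel (e) (dORFs, P_bias < 1 × 10^-13), omitting residues with a count of 0. It confirms that the per-residue counts add up to the printed total of 896 and reproduces the printed shares for Q (8.3) and N (2.6). The variable and function names are illustrative only.

```python
# Counts copied from Table 3(e), dORFs, P_bias < 1 x 10^-13 column
# (residues with a count of 0 are omitted).
DORF_1E13 = {"R": 254, "L": 204, "H": 122, "T": 120, "C": 99, "Q": 74, "N": 23}
DORF_1E13_TOTAL = 896


def percent_of_total(count: int, total: int) -> float:
    """Share of a column total, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The per-residue counts should add up to the printed column total.
assert sum(DORF_1E13.values()) == DORF_1E13_TOTAL

for residue in ("Q", "N"):
    print(residue, percent_of_total(DORF_1E13[residue], DORF_1E13_TOTAL))
# Q 8.3
# N 2.6
```

The same function applied to panel (a) at P_bias < 1 × 10^-9 (N = 6,462 of 69,212) gives the printed 9.3.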