Fig. 1 | Genome Biology

Fig. 1

From: Enhanced protein isoform characterization through long-read proteogenomics


Challenges of protein isoform identification using MS-based proteomics. a Many peptides detected in MS-based proteomics map to multiple protein isoforms. Indistinguishable protein isoforms are represented as protein groups. This ambiguity limits the utility of MS-based proteomics for isoform detection. b MS-based proteomics search algorithms assume that the isoforms in the reference database match the isoforms expressed in the sample. Isoforms in the light blue boxes are those annotated in a reference database. Isoforms in the light pink boxes are those actually expressed in a sample (which is unknowable using current technologies). When the reference and sample isoforms are concordant (“Match”), protein identification can be accurate. c–f Reference-sample discordances can result in inferred proteins that are ambiguous or incorrect. Schematics of cases in which a sample contains a subset of the isoforms in the reference database (“Subset,” c), all of the reference isoforms plus additional novel isoforms not found in the reference database (“Superset,” d), a subset of the reference isoforms plus additional novel isoforms (“Partial Overlap,” e), or only novel isoforms (“Distinct,” f). g Comparison of short versus long reads for proteogenomics analysis. Short-read RNA-Seq provides fragmented evidence of transcript isoforms, whereas long-read RNA-Seq provides full-length transcript sequences that can be used to predict full-length protein isoforms.
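The five reference-sample relationships in panels b–f can be read as set relations between the isoforms annotated in the reference database and the isoforms expressed in the sample. The following is a minimal Python sketch of that reading, not a method from the article. It assumes isoforms are represented by hypothetical string identifiers and that the sample's isoform set is known, which the caption notes is not possible with current technologies.

```python
from enum import Enum


class Concordance(Enum):
    """Reference-sample relationships named in panels b-f (labels from the caption)."""
    MATCH = "Match"
    SUBSET = "Subset"
    SUPERSET = "Superset"
    PARTIAL_OVERLAP = "Partial Overlap"
    DISTINCT = "Distinct"


def classify(reference: set[str], sample: set[str]) -> Concordance:
    """Classify how a sample's isoform set relates to a reference isoform set.

    Assumes a non-empty sample set; isoform IDs are hypothetical labels.
    """
    shared = reference & sample   # isoforms both annotated and expressed
    novel = sample - reference    # expressed isoforms absent from the reference

    if sample == reference:
        return Concordance.MATCH
    if not shared:
        # Nothing in the sample is annotated: only novel isoforms.
        return Concordance.DISTINCT
    if not novel:
        # Every expressed isoform is annotated, but some annotated ones are absent.
        return Concordance.SUBSET
    if shared == reference:
        # All annotated isoforms are expressed, plus novel ones.
        return Concordance.SUPERSET
    # Some annotated isoforms expressed, some missing, plus novel ones.
    return Concordance.PARTIAL_OVERLAP


if __name__ == "__main__":
    ref = {"iso1", "iso2", "iso3"}
    for sample in ({"iso1", "iso2", "iso3"}, {"iso1", "iso2"},
                   {"iso1", "iso2", "iso3", "novel1"}, {"iso1", "novel1"},
                   {"novel1", "novel2"}):
        print(sorted(sample), classify(ref, sample).value)
```

In each of the four discordant cases, a search restricted to the reference database either considers isoforms the sample does not express or cannot represent the novel ones. This is why inferred proteins may be ambiguous or incorrect.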
