- Short Report
- Open Access
Interferon inducible pseudouridine modification in human mRNA by quantitative nanopore profiling
Genome Biology volume 22, Article number: 330 (2021)
Pseudouridine (Ψ) is an abundant mRNA modification in mammalian transcriptome, but its functions have remained elusive due to the difficulty of transcriptome-wide mapping. We develop a nanopore native RNA sequencing method for quantitative Ψ prediction (NanoPsu) that utilizes native content training, machine learning modeling, and single-read linkage analysis. Biologically, we find interferon inducible Ψ modifications in interferon-stimulated gene transcripts which are consistent with a role of Ψ in enabling efficacy of mRNA vaccines.
Pseudouridine (Ψ) is the second most abundant mRNA modification in the mammalian transcriptome as measured by quantitative mass spectrometry  and may exert many cellular functions. For example, Ψ incorporation in synthetic, transfected reporter mRNA increases translation  through decreased activation of the RNA-dependent protein kinase (PKR) . The innate immune evading property of Ψ (and its methylated derivative N1-methyl-Ψ ) in mRNA is essential to the remarkable immunogenicity of successful COVID-19 mRNA vaccines .
Functional exploration and mechanistic investigation of mRNA Ψ modification requires appropriate mapping methods. Illumina sequencing of Ψ in mRNA relies on chemical RNA treatments that induce stop, mutation, or deletion signatures in cDNA synthesis [1, 5,6,7,8]. Many computational methods have been developed to map mRNA Ψ sites [9,10,11,12,13,14,15,16,17,18,19,20]. However, mRNA Ψ mapping is inconsistent among these studies, in part due to the high false positives and negatives generated by the chemical treatments. The read-length limitation of Illumina sequencing also narrows the possibility to examine Ψ usage in mRNA splice isoforms and the linkage of multiple Ψ sites in single molecules.
The emergence of nanopore sequencing enables direct interrogation of RNA modifications [21,22,23]. Additionally, nanopore sequencing can extend to the full length of the mRNA , revealing all modified sites in single RNA isoforms . Both signal strength and dwell time have been used to identify Ψ . Recently, a nanopore direct RNA sequencing method, nanoRMS was developed by Novoa and co-workers that employs characteristic base-calling “error” signatures in the nanopore data for Ψ mapping . NanoRMS identified new Ψ sites in mitochondrial rRNA, small nuclear RNA, small nucleolar RNA, and mRNA under normal and stress conditions in yeast and further, predicted stoichiometry via supervised learning. Although nanoRMS prediction of Ψ site incorporation using a threshold for base mismatch frequency is straightforward, it is unclear whether this approach can be applied to the mammalian transcriptomes, which are much larger than yeast, can contain introns and occur in multiple isoforms. For example, the standard Tombo software for nanopore data analysis is ineffective with spliced reads. Also, even though nanoRMS collects features from single reads, the single read features were averaged before Ψ prediction, erasing single molecule Ψ site incorporation information.
Results and discussion
The key to nanopore identification of RNA modification is to generate training data from known modification sites. Modification training data generation, however, requires reads from long transcripts with distinct sequence contexts at the modification site. To maximize our ability to obtain nanopore training data from as many distinct Ψ sites as possible, we generated a mixture of rRNAs from human, yeast, Caenorhabditis elegans, Drosophila, and from human fecal bacteria (Fig. 1a). We Illumina sequenced half of the mixture after fragmentation, using the bisulfite reaction  to map rRNA Ψ sites, providing a total of 2142 Ψ sites (Additional file 1: Fig. S1a, Additional file 2: Table S1). In Illumina sequencing of the bisulfite method, Ψ sites are found by RT deletions which enable identification and quantitative assessment of closely spaced rRNA Ψ sites; these sites are more difficult to assess using the more commonly used carbodiimide method that identifies Ψ sites by RT stops. Sequencing the remaining sample via direct RNA nanopore sequencing, we found that 640 of these Ψ sites passed our filter of 20 read coverage for further analysis (Additional file 1: Fig. S1b). The lower number of Ψ sites in nanopore sequencing was in part derived from the 3′ bias of the nanopore sequencing library design where all reads start from the 3′ end of the rRNA. These 640 sites were combined with 689 randomly chosen unmodified U sites as the training data set (Additional file 1: Fig. S1c). The modified and unmodified U sites contained 236 of the 256 NN(Ψ/U)NN different 5mer contexts.
We next extracted nanopore signal features suitable for Ψ identification. NanoRMS  found that Ψ had negligible effect on nanopore current signals, which means it is hard to directly identify Ψ from the current squiggles like m6A [23, 25, 28]. However, distinct features could be found for Ψ identification. For instance, like NanoRMS, we found that apparent mutation to C is a prominent signature for Ψ modification, with apparent deletion also significant for some Ψ sites (Figs. 1b, c). In total, we examined 12 features of base calling errors and found that Ψ sites tend to have lower base quality mean values and standard deviation (Fig. 1d). The ternary plot of mutation signatures confirmed Ψ sites having a strong preference to be read as a C but not A or G (Fig. 1e). The significance of these features was quantified in the correlation matrix of all features and the modification labels (Fig. 1f). We made an extremely randomized trees (EXT) model to carry out Ψ probability prediction for each U site. To decide the combination of features included in the model, we added one feature at a time in the order of their correlation strength with the modification label. This revealed that the performance of Ψ calling maximized when all 12 features were included (Fig. 1g). Using the optimized parameters of our EXT model, its performance was evaluated by the testing set with an area under curve (AUC) of 0.9383 (Fig. 1h, Additional file 1: Fig. S1d). We named our method nanopore investigation of pseudouridines or “NanoPsu.”
Interferons (IFNs), cytokines produced by nearly all cell types during viral and other microbial infections, play crucial roles regulating immune response . mRNA vaccines incorporate Ψ or m1Ψ to evade host cell foreign RNA sensing and enhance mRNA translation. Do endogenous mRNAs also use the same strategy through Ψ modification? IFNs can induce the expression of more than a thousand interferon stimulated gene (ISG) transcripts. ISGs include protein kinase R (PKR) which phosphorylates eIF2α to reduce global translation. It is well established that Ψ-modified reporter mRNA activates PKR much less than the same unmodified mRNA and is translated at much higher levels . We therefore hypothesize that ISG transcripts may have elevated levels of Ψ modification to enhance translation in the presence of PKR.
We tested this hypothesis by treating cells with either IFN-γ or IFN-β followed by nanopore sequencing (Additional file 1: Fig. S2a). IFN treatments worked well as determined by upregulation of surface MHC class I (Additional file 1: Fig. S2b). The mRNA expression levels of the biological replicates were highly correlated (Additional file 1: Fig. S2c). For improved coverage, we combined the nanopore data from the biological replicates for downstream analysis (Additional file 1: Fig. S2d). We found strongly upregulated mRNA transcripts upon IFN treatment that belong to the ISG genes with the expected gene ontology of interferon signaling pathway and viral defense (Fig. 2a, Additional file 1: Fig. S3a). These results indicate the feasibility of using nanopore sequencing to study the interferon response transcriptome.
We used the EXT model to predict Ψ modification probabilities in the transcriptome. In total, ~2.6 million U sites were analyzed in each transcriptome (Additional file 1: Fig. S3b). We found a “RAΨU” motif and the previous revealed  “GUΨC” motif among top Ψ sites in the untreated sample (Additional file 1: Fig. S3c). The Ψ sites belonging to “median” or higher groups in the previous study  showed significantly higher predicted Ψ probabilities than other U sites in the untreated sample (Additional file 1: Fig. S3d), indicating that our method provides valid prediction of Ψ. For the 500 sites with the highest probability of Ψ modification, the three samples shared some but also had distinct sites (Additional file 1: Fig. S3e). However, IFN treated samples had a wider range of GO terms than the untreated sample (Fig. 2b), suggesting that Ψ modification becomes more widespread to transcripts belonging to more diverse cellular processes. Going beyond the top 500 probable Ψ sites, globally the upregulated gene transcripts had higher average modification probabilities for IFN treated vs. untreated samples, with the magnitude of increase strongly correlated with the expression fold change (Figs. 2c, d, Additional file 1: Fig. S3f). Increased Ψ modification probability in a mRNA transcript could be attributed to increased number of Ψ sites and/or increased modification fraction of modified sites. The top 50 genes with highest increase in Ψ modification probability were related to the interferon pathway and anti-viral response (Fig. 2e), they included 88.5% of all genes with >10-fold increase and 60.9% of all genes with >5-fold increase in mRNA expression (Fig. 2f). These results are consistent with increased Ψ modification in the transcriptome upon interferon treatment enhancing ISG function.
We used a RT-qPCR method to validate the increased Ψ level in the ISG transcripts. Our method takes advantage of the standard Ψ detection method using N-cyclohexyl-N′-(2-morpholinoethyl) carbodiimide (CMC). The Ψ-CMC adduct introduces a RT stop in cDNA synthesis which reduces the amount of cDNA product compared to the control reaction without CMC. The differential amount of the cDNA product can then be precisely measured using real-time qPCR (scheme in Additional file 1: Fig. S3g). We first showed that the actin mRNA did not change its abundance nor its Ψ level, making it an appropriate internal control for comparing the Ψ levels of the ISG transcripts (Fig. 2g, Additional file 1: Fig. S3h, left panels). Ψ level increase in the ISG15 mRNA upon interferon treatment was validated upon normalization of its expression level and to the actin mRNA within the same sample (Fig. 2g, Additional file 1: Fig. S3h, right panels).
We performed single-read analysis for quantitative Ψ stoichiometry prediction and investigation of linking modification states of Ψ sites in single molecules of a mRNA transcript. Our training set contained the data points from previously reported , 100% modified human rRNA Ψ sites and randomly selected unmodified U sites. We used the same set of features and the same EXT algorithm, while we replaced all the “rate” features (like “mismatch ratio”) with “indicator” features (like “mismatch-or-not”) (Additional file 1: Fig. S4a). We tested the Ψ stoichiometry prediction from single reads on 22 partially modified Ψ sites (5–85%) and found that the predicted stoichiometry matched well with the previous reported stoichiometry obtained by LC/MS  (Fig. 2h).
A new aspect of our method is the ability to perform single-read analysis that links occurrence of multiple Ψ sites in individual mRNA transcripts. We examined whether pairs of Ψ site modifications are linked either positively or negatively, meaning whether the modification state of site 2 is affected by the modification state of site 1 and vice versa. We selected 31 positions in the B2M transcript (which encodes the common small subunit of MHC class I molecules) and checked pairwise linkage by two sample Kolmogorov-Smirnov test. In most cases, the maximum distance D value from two sample K-S test was small (Additional file 1: Fig. S4b), which is consistent with the presence of Ψ at site 1 being independent of Ψ at site 2, these two sites are not linked. An example of a specific unlinked pair is shown in Fig. 2i, j (right panels). A few pairs of sites had high D values, but most of those were immediately adjacent Ψ sites. The pairs of Ψ sites with negative linkage tend to avoid each other in the same mRNA molecule (Additional file 1: Fig. S4c). An example of a specific negatively linked pair is shown in Fig. 2i, j (left panels) where simultaneous Ψ occurrence at both sites is very rare. This result indicates that the modification of Ψ at two sites in single molecule transcripts is negatively related for some and completely independent for others. Upon IFN treatment, the linkage between some sites in the B2M transcript became more prominent (Fig. 2k), suggesting that IFN-induced Ψ installation has stronger co-dependency.
In summary, we generated a supervised-learning-based protocol to predict Ψ modification in the human transcriptome and analyzed Ψ on single reads which allows for the evaluation of stoichiometry and linkage between distal Ψ sites in the same mRNA molecule. Human genome contains 13 confirmed and putative Ψ installation enzymes , suggesting that Ψ installation is a highly robust and dynamic process in human cells. How these enzymes coordinate or antagonize their activities remains to be determined. We found a biological response of Ψ modification change in endogenous mRNA upon IFN treatment which is consistent with Ψ playing a role in IFN signaling pathway and viral defense.
Stool sample collection and total RNA extraction
Stool specimens were self-collected by 1 female volunteer using a commercial “toilet hat” stool specimen collection kit (Fisherbrand Commode Specimen Collection System; Thermo Fisher Scientific, 02-544-208). Specimens were immediately transported to the laboratory (<1h) and thoroughly homogenized. A 100 mg of stool was transferred into a cryovial using a sterile spatula, and 700 μl RNAlater Stabilization solution was added. Specimens were stored at −80°C until extraction.
RNAlater was first removed from stool sample by centrifugation at 17,200 rcf for 10 min at 4°C. Pelleted material was lysed in 400 μL of 0.3M NaOAc/HOAc, 10mM EDTA, and pH 4.8 with an equal volume of acetate-saturated phenol to chloroform pH 4.5 (Invitrogen, AM9722). After addition of 1.0 mm glass lysing beads (Bio-Spec Products, 11079110) in a 1:1 ratio (bead to sample weight), samples were placed in a reciprocating bead beater (Mini-Beadbeater-16, Bio-Spec Products) for two 1-min intervals on maximum intensity. Samples were centrifuged at 17,200 rcf for 15 min at 4°C before re-extraction and isopropanol precipitation of total RNA. Pellets were washed with 75% ethanol before resuspension in an acid-buffered elution buffer (10mM NaOAc, 1mM EDTA, pH 4.8).
rRNA mixture sample preparation
A mixture of human HEK293T, yeast BY4741 strain, Drosophila S2 cells, and C. elegans whole animal and stool microbiome total RNA was made by mixing 1 μg RNA from each model organism sample and 8 μg total RNA from a stool microbiome sample. ZYMO RNA Clean & Concentrator-5 (R1013) kit was used on this mixture to remove all small RNAs <200nt. The final sample was eluted with 20 μl RNase-Free H2O. The mixture was split into two halves. One half was used for Illumina sequencing (see below). For nanopore sequencing, the other half was polyadenylated by yeast Poly(A) Polymerase (ThermoFisher 74225Z25KU) by incubation with 0.48 mM ATP, 20 U/μL Poly(A) Polymerase, and 1x Poly(A) Polymerase Reaction Buffer at 37°C for 15 min. The product was size selected using ZYMO RNA Clean & Concentrator-5 (R1013) kit, and RNA molecules >200nt were retained. The sample was eluted with 20 μL RNase-free H2O. Then, ~500 ng of this rRNA mixture was used for nanopore direct RNA seq library preparation and nanopore direct RNA sequencing described below.
rRNA mixture Illumina sequencing and mapping
For Illumina sequencing, bisulfite treatment was performed as described previously . Ψ modification was identified through the deletion at the Ψ site in the sequencing data. Raw reads were demultiplexed via a 4nt barcode on read 2 using je suite  with the following parameters: je demultiplex F1=#read1 F2=$read2 BF=$barcode_key BPOS=BOTH BM=READ_2 LEN=6:4 O=$output. Only read 2 from paired-end reads were mapped with bowtie2 (version: bowtie2-126.96.36.199-linux-x86_64)  using the following parameters: bowtie2 -x $reference -U $read2 -S $ouput -q -p 10 --local --no-unal. Reads were mapped to either a set of rRNA from model organisms or a set of bacterial rRNA reads: rfam family RF02541 (bacterial large subunit) and RF00177 (bacterial small subunit). SAM files from bacterial rRNAs were processed with a custom python script to count the total number of reads mapping to each sequence. Only sequences with >1000 reads were processed further. Model organism rRNA sequences from human (NCBI: NR_003286.4, NR_003287.4), yeast (RNACentral: URS00005F2C2D_559292, URS000061F377_4932), C. elegans (RNACentral: URS00005A42AA_6239, URS00008C9AB9_6239), and Drosophila (RNACentral: URS000030AF9A_7227, URS000008C6A9_7227) to form a reference genome for bowtie mapping. Bowtie2 output “sam” files were converted to sorted bam files with samtools . IGV was used to calculate deletion rates with the following parameters: igvtools  count -z 5 -w 1 -e 250 --bases $input $output $reference. Custom python scripts were used to reformat the “wig” file.
Nanopore direct RNA seq library preparation and sequencing
The library preparation followed the protocol of Direct RNA Sequencing Kit (SQK-RNA002) provided by Oxford Nanopore Technology. Briefly, ~500 ng of Poly(A)+ RNA sample was used for each run. Each single run contained one biological replicate of one sample. The RT Adaptor (RTA) was ligated to the 3′ end of Poly(A)+ RNA by T4 DNA ligase (NEB M0202S) and then reverse transcribed by SuperScript III Reverse Transcriptase (ThermoFisher 12574018). The RNA was purified by 1.8x RNAClean XP beads (72 μL) (Beckman Coulter A63987) and then the RNA Adaptor (RMX) was ligated to the 3′ end of Poly(A)+ RNA using T4 DNA ligase (NEB M0202S) and then the RNA was purified with 1x RNAClean XP beads (40 μL). The sample was eluted with 21 μl Elution Buffer. Then, the sample was loaded onto a R9.4.1 flow cell (FLO-MIN106D) in a MinION sequencer. Each flow cell was sequenced for 72 h.
Nanopore data pre-processing
All raw fast5 files generated during sequencing were uploaded to Midway2 cluster for the following steps. Reads were base called by guppy base caller (version 3.2.2+9fe0a78) with min_qscore 7. The reads were aligned to by minimap2 (version 2.18-r1015)  with parameters -ax splice -uf -k14. The rRNA mixture reads are aligned to the same reference as the rRNA Illumina seq data described above. The human mRNA reads are aligned to the hg38 human genome reference (GRCh38.p13). The mapped reads were piled up to the reference chromosomes by samtools (v1.11). The “error” features were extracted from the mpileup files by customized python scripts (https://github.com/sihaohuanguc/Nanopore_psU).
For nanopore seq data of rRNA, all sites mapped to “T” in the reference with >20 coverage made up the data pool. 640 Ψ sites revealed by Illumina sequencing and 689 randomly selected U sites from the data pool made up the model training dataset. The dataset was divided into 60% training set, 20% validation set, and 20% testing set. The Ψ modification prediction models were generated by training set and validated with the validation set by extremely randomized trees (EXT) models with 1–12 features and customized parameters. Then, the models were applied to predict Ψ modification probabilities of the testing set and evaluated by AUC of ROC (receiver operating characteristic) curves derived from the predicted probabilities of the testing set. The final model used EXT algorithm (n_estimators=200, criterion= “gini”, max_depth=None, min_samples_split=2) with 12 features.
HeLa cell culture and interferon treatment
HeLa cells (ATCC, authenticated and tested for mycoplasma contamination) were cultured in the presence of 500 U/mL human interferon gamma (IFN γ, Peprotech), 500 U/mL human interferon beta (IFN β, Peprotech), or left untreated, with biological duplicates for each. Cells were incubated for 24 h, and an aliquot of each was processed for flow cytometry. Cells were washed into a flow cytometry staining buffer (FBS-containing RPMI and Hanks’ Balanced Salt Solution) containing the anti-pan-MHC-I antibody W6/32 (BioXcell) conjugated with AlexaFluor 647 (Invitrogen). Cells were then washed 3× and analyzed by a Fortessa X-20 (BD Biosciences) to determine upregulation of MHC class I. The rest of the cells were used for RNA extraction via the RNeasy Mini kit (Qiagen) following the manufacturer’s protocol. RNA was eluted in pure water and quantified by Nanodrop (Thermo). PolyA+ RNA from 50 μg total RNA of each sample was extracted by Promega PolyATtract® mRNA Isolation Systems Z5310. Each sample was eluted with 15 μL H2O.
Prediction of Ψ in HeLa samples
The raw data of two replicates for the untreated, IFN γ-treated, and IFN β-treated samples were merged after aligned to the hg38 human genome reference (GRCh38.p13). The merged samples were down sampled so that they have almost the same number of reads and are directly comparable. The Ψ modification probabilities of all sites mapped to “T” in the reference with over 20 coverage were evaluated by the EXT model generated with the rRNA mixture sample. The coverage independence of Ψ probability was examined by down sampling all sites of the samples to similar coverages (expectation = 30) using different random seeds. We found that the change in mean Ψ probability of the transcripts maintained the same after down sampling. The coverage completeness of the transcripts was checked by counting the U sites predicted in the samples . For the untreated sample, the U sites within 5′UTR, CDS, and 3′UTR represented 2.43%, 42.84%, and 54.73% of all U sites, respectively. The gene information was provided by the comprehensive gene annotation file (gencode.v37.annotation.gff3) in the GENCODE database (https://www.gencodegenes.org) . The gene ontology (GO) analysis was performed using the Gene Ontology Resource (http://geneontology.org) [40, 41]. The sequence logo plots were generated by MEME (https://meme-suite.org/meme/tools/meme) .
CMC-mediated RT-qPCR (CRP) validation of Ψ level in mRNA transcripts
qPCR primers were designed using NCBI Primer-BLAST tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/). Two to 3 sets of primers were selected to cover the 3′ end, middle, and 5′ end region of the whole transcript. qPCR was performed with TaqMan style fluorescent probes. Probes for each PCR primer pair were designed using IDT PrimerQuest tool (https://www.idtdna.com/pages/tools/primerquest) and examined using NCBI nucleotide BLAST. Primers and probes were purchased from IDT. Actin (NM_001101.5) and ISG15 (NM_005101.4) transcripts were selected for Ψ validation. Below is the list of the sequences of qPCR primers and probes.
ISG15 primer1-Forward: GTGGACAAATGCGACGAACC
ISG15 primer1-Reverse: ATTTCCGGCCCTTGATCCTG
ISG15 probe1: 5′-/56-FAM/TCC TGG TGA/ZEN/GGA ATA ACA AGG GCC/3IABkFQ/-3′
ISG15 primer2-Forward: GCGCAGATCACCCAGAAGAT
ISG15 primer2-Reverse: GTTCGTCGCATTTGTCCACC
ISG15 probe2: 5′-/56-FAM/TTC CAG CAG/ZEN/CGT CTG GCT GT/3IABkFQ/-3′
ISG15 primer3-Forward: CAGCGAACTCATCTTTGCCAG
ISG15 primer3-Reverse: GACACCTGGAATTCGTTGCC
ISG15 probe3: 5′-/56-FAM/TGG GAC CTG/ZEN/ACG GTG AAG ATG C/3IABkFQ/-3′
ACTB primer1-Forward: ACAGGAAGTCCCTTGCCATC
ACTB primer1-Reverse: CAGTGTACAGGTAAGCCCTGG
ACTB probe1:5′-/56-FAM/ACA CGA AAG/ZEN/CAATGCTATCACCTCCC/31ABkFQ/-3′
ACTB primer2-Forward: AGATGTGGATCAGCAAGCAGG
ACTB primer2-Reverse: GGGGGATGCTCGCTCCA
ACTB probe2: 5'-/56-FAM/TCG TCC ACC/ZEN/GCA AAT GCT TCT AGG/31ABkFQ/-3′
CMC-mediated RT-qPCR (CRP) experiment
CMC [N-cyclohexyl-N′-(2-morpholinoethyl) carbodiimide] treatment was done as previously described . 1.5 μg of untreated, IFNβ-treated, and IFNγ-treated total RNA in 12 μl was mixed with 24 μl TEU buffer (50 mM Tris-HCl (pH 8.3), 4 mM EDTA, 7 M urea) in microcentrifuge tubes. Four microliters of freshly made 1 M CMC (Sigma, C1011) in TEU buffer or 4 μl TEU buffer was added to each sample for +CMC or −CMC treatment, respectively. The sample mixture in 40 μl 0.7× TEU was incubated at 37°C for 1 h. The mixture was diluted to 200 μl with 160 μl of 50 mM KOAc (pH 7) and 200 mM KCl. One microliter of 5 μg/μl glycogen and 550 μl ethanol were added to the mixture to precipitate RNA at −80°C for >2 h. The mixture was then centrifuged at highest speed (17000×g) for 30 min. The RNA precipitate was mixed with 500 μl 75% ethanol and kept at −80°C for >2 h followed by centrifugation at 17000×g for 30 min. The washing step was repeated once. The RNA precipitate was mixed with 50 μl of 50 mM Na2CO3 and 2 mM EDTA (pH 10.4) and incubated at 37°C for 6 h to remove CMC-U/CMC-G adducts. The RNA was purified using Zymo RNA Clean and Concentrator column (Zymo, R1014) with in-column DNase treatment by following the manufacturer’s manual. The RNA was eluted in 11 μl sterile H2O. The concentration of the ±CMC treated RNA was measured using Nanodrop, and equal amount (~300 ng) of total RNA was used for RT-qPCR experiment.
Eleven microliters of 300 ng ±CMC-treated total RNA from untreated/IFNβ/IFNγ samples were mixed with 1 μl 50 μM 5′T22VN (V=A,C,G, N=A,C,G,T) primer (IDT) and 1 μl 10 mM dNTP mix. The mixtures were incubated at 65°C in thermal cycler for 5 mins followed by incubation at room temperature for 3 min. The PCR tubes were kept on ice until the addition of the SuperScript IV RT mix. 7 μl RT mix was prepared for each sample by combining 4 μl 5× SSIV Buffer, 1 μl 100 mM DTT, 1 μl RNaseOUT RNase inhibitor, and 1 μl SSIV reverse transcriptase. 7 μl RT mix was added to each PCR tube. The tubes were incubated at 55°C in thermal cycler for 1.5 h. The PCR tubes were then incubated at 80°C for 10 min followed by incubation on ice immediately to deactivate RT. 45 μl sterile H2O was added to each tube to dilute the RT mixture to 65 μl, and 2 μl was used for qPCR reaction.
qPCR reaction was performed in 10 μl consisting of 5 μl 2× PrimeTime Gene Expression Master Mix (IDT, 1055772), 2 μl RT mix, and 3 μl primer and probe mix. Three microliters of primer and probe mix (1.5 μM each PCR primer and 0.6 μM probe) was first added into each well of 384-well plate or 96-well plate. RT mix of each sample and 2× PrimeTime Gene Expression Master Mix were mixed at 2:5 ratio to make master mix based on the number of qPCR reactions for each sample. Seven microliters of the template and PrimeTime master mix were then added to each well. The plate was spun on a swing bucket plate centrifuge at 3000 RPM for 2 min. qPCR reaction was performed on Bio-Rad CFX384 or CFX96 qPCR machine for 40 cycles. Cq/CT values were obtained for follow-up data analysis.
Relative Ψ levels for ISG15 transcript was calculated using ACTB-1 as internal reference. First, we obtained ΔCq(-) = Cq(ISG15,-CMC) - Cq(ACTB,-CMC), and ΔCq(+) = Cq(ISG15,+CMC) - Cq(ACTB,+CMC); then, we obtained ΔΔCq(ISG15) = ΔCq(+) - ΔCq(-). The relative Ψ level is represented as 2^ ΔΔCq(ISG15).
Single read Ψ prediction model training
The 100% modified human rRNA sites were reported in a previously work measured by quantitative LC/MS . A basic assumption was that all reads in our human rRNA sample would have Ψ at the reported 100% modified sites and U at the reported completely unmodified sites. The dataset for training contained 25 100% Ψ sites with 49,437 data points and 26 randomly selected U sites with 50,922 data points. The dataset was divided into 60% training set, 20% validation set, and 20% testing set. Features were extracted from each base in each read. The features describing the ratios in bulk prediction model were replaced with features indicating the mismatching and indel states of the base. The Ψ modification prediction models were generated by training set and validated with the validation set using the EXT algorithm (n_estimators=200, criterion= “gini”, max_depth=None, min_samples_split=2) with 10 features, which are insertion_ot_not, insertion_length, deletion_or_not, deletion_length, deleted_site_or_not, mismatch_or_not, mutate_to_A, mutate_to_C, mutate_to_G, base quality score. The AUC value for the prediction of testing set was 0.8269. To further evaluate the model, Ψ modification probabilities of data points from 22 previously reported , partially modified human rRNA Ψ sites (modification fraction from 5 to 85%) were predicted. The base was viewed as Ψ when the probability was larger than 0.5 and as U when the probability was less than 0.5. The stoichiometry of each site was calculated as the number of predicted Ψ bases divided by the coverage of the site.
Single read Ψ analysis in HeLa samples
The Ψ probabilities of all U residues in selected genes were predicted with the protocol above. To investigate the linkage of multiple Ψ on single reads, each read was indexed so that the U data points with the same read index were from the same read. Ψ probabilities of residues of a certain site were fitted by Gaussian mixture model (GMM) with 2 components. The sites with abs(μ1-μ2)>0.5 and λ1 and λ2>0.05 were selected for following analysis. When doing pair wise linkage analysis, the reads were assigned into “Ψ” and “U” groups when it had >95% posterior probability for one population in the GMM for site 1. To evaluate whether there was a difference in the Ψ probabilities distribution of site 2 upon the presence or absence of Ψ at site 1, two sample Kolmogorov-Smirnov test was performed on the Ψ probabilities cumulative distribution curves of site 2 in the “Ψ” and “U” groups with an output of the maximum distance D value and p value. The R library to do a two-sample Kolmogorov-Smirnov test was from GitHub (https://rdrr.io/github/happyrabbit/DataScienceR/man/pairwise_ks_test.html).
Availability of data and materials
The datasets generated and analyzed during the current study are available in the NCBI GEO database under the accession GSE180656 . The scripts for “NanoPsu” Ψ prediction package are available on GitHub  (https://github.com/sihaohuanguc/Nanopore_psU, GNU General Public License v2.0) or Zenodo  (DOI: 10.5281/zenodo.5711328).
Li X, Zhu P, Ma S, Song J, Bai J, Sun F, et al. Chemical pulldown reveals dynamic pseudouridylation of the mammalian transcriptome. Nat Chem Biol. 2015;11(8):592–7. https://doi.org/10.1038/nchembio.1836.
Karikó K, Muramatsu H, Welsh FA, Ludwig J, Kato H, Akira S, et al. Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol Ther. 2008;16(11):1833–40. https://doi.org/10.1038/mt.2008.200.
Anderson BR, Muramatsu H, Nallagatla SR, Bevilacqua PC, Sansing LH, Weissman D, et al. Incorporation of pseudouridine into mRNA enhances translation by diminishing PKR activation. Nucleic Acids Res. 2010;38(17):5884–92. https://doi.org/10.1093/nar/gkq347.
Jackson LA, Anderson EJ, Rouphael NG, Roberts PC, Makhene M, Coler RN, et al. An mRNA vaccine against SARS-CoV-2—preliminary report. New England J Med. 2020;383(20):1920–31. https://doi.org/10.1056/NEJMoa2022483.
Carlile TM, Rojas-Duran MF, Zinshteyn B, Shin H, Bartoli KM, Gilbert WV. Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells. Nature. 2014;515(7525):143–6. https://doi.org/10.1038/nature13802.
Schwartz S, Bernstein DA, Mumbach MR, Jovanovic M, Herbst RH, León-Ricardo BX, et al. Transcriptome-wide mapping reveals widespread dynamic-regulated pseudouridylation of ncRNA and mRNA. Cell. 2014;159(1):148–62. https://doi.org/10.1016/j.cell.2014.08.028.
Zhou KI, Clark WC, Pan DW, Eckwahl MJ, Dai Q, Pan T. Pseudouridines have context-dependent mutation and stop rates in high-throughput sequencing. RNA Biol. 2018;15(7):892–900. https://doi.org/10.1080/15476286.2018.1462654.
Khoddami V, Yerra A, Mosbruger TL, Fleming AM, Burrows CJ, Cairns BR. Transcriptome-wide profiling of multiple RNA modifications simultaneously at single-base resolution. Proc Natl Acad Sci. 2019;116(14):6784–9. https://doi.org/10.1073/pnas.1817334116.
Li F, Guo X, Jin P, Chen J, Xiang D, Song J, et al. Porpoise: a new approach for accurate prediction of RNA pseudouridine sites. Brief Bioinform. 2021;22(6). https://doi.org/10.1093/bib/bbab245.
Salem DH, Acevedo D, Daulatabad SV, Mir Q, Janga SC. Penguin: a tool for predicting pseudouridine sites in direct RNA nanopore sequencing data. bioRxiv. 2021. https://doi.org/10.1101/2021.03.31.437901.
Li Y-H, Zhang G, Cui Q. PPUS: a web server to predict PUS-specific pseudouridine sites. Bioinformatics. 2015;31(20):3362–4. https://doi.org/10.1093/bioinformatics/btv366.
Chen W, Tang H, Ye J, Lin H, Chou K-C. iRNA-PseU: identifying RNA pseudouridine sites. Mol Ther Nucleic Acids. 2016;5:e332. https://doi.org/10.1038/mtna.2016.37.
He J, Fang T, Zhang Z, Huang B, Zhu X, Xiong Y. PseUI: pseudouridine sites identification based on RNA sequence information. BMC Bioinform. 2018;19(1):1–11. https://doi.org/10.1186/s12859-018-2321-0.
Tahir M, Tayara H, Chong KT. iPseU-CNN: identifying RNA pseudouridine sites using convolutional neural networks. Mol Ther Nucleic Acids. 2019;16:463–70. https://doi.org/10.1016/j.omtn.2019.03.010.
Liu K, Chen W, Lin H. XG-PseU: an eXtreme gradient boosting based method for identifying pseudouridine sites. Mol Genet Genom. 2020;295(1):13–21. https://doi.org/10.1007/s00438-019-01600-9.
Bi Y, Jin D, Jia C. EnsemPseU: identifying pseudouridine sites with an ensemble approach. Ieee Access. 2020;8:79376–82. https://doi.org/10.1109/ACCESS.2020.2989469.
Lv Z, Zhang J, Ding H, Zou Q. RF-PseU: a random forest predictor for RNA pseudouridine sites. Front Bioeng Biotechnol. 2020;8:134. https://doi.org/10.3389/fbioe.2020.00134.
Khan SM, He F, Wang D, Chen Y, Xu D. MU-PseUDeep: a deep learning method for prediction of pseudouridine sites. Comput Struct Biotechnol J. 2020;18:1877–83. https://doi.org/10.1016/j.csbj.2020.07.010.
Song B, Tang Y, Wei Z, Liu G, Su J, Meng J, et al. PIANO: a web server for pseudouridine-site (Ψ) identification and functional annotation. Front Genet. 2020;11:88. https://doi.org/10.3389/fgene.2020.00088.
Song B, Chen K, Tang Y, Ma J, Meng J, Wei Z. PSI-MOUSE: predicting mouse pseudouridine sites from sequence and genome-derived features. Evol Bioinform. 2020;16:1176934320925752. https://doi.org/10.1177/1176934320925752.
Garalde DR, Snell EA, Jachimowicz D, Sipos B, Lloyd JH, Bruce M, et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods. 2018;15(3):201–6. https://doi.org/10.1038/nmeth.4577.
Liu H, Begik O, Lucas MC, Ramirez JM, Mason CE, Wiener D, et al. Accurate detection of m 6 A RNA modifications in native RNA sequences. Nat Comm. 2019;10(1):1–9. https://doi.org/10.1038/s41467-019-11713-9.
Workman RE, Tang AD, Tang PS, Jain M, Tyson JR, Razaghi R, et al. Nanopore native RNA sequencing of a human poly (A) transcriptome. Nat Methods. 2019;16(12):1297–305. https://doi.org/10.1038/s41592-019-0617-2.
Drexler HL, Choquet K, Churchman LS. Splicing kinetics and coordination revealed by direct nascent RNA sequencing through nanopores. Mol Cell. 2020;77:985–98. e988. https://doi.org/10.1016/j.molcel.2019.11.017.
Lorenz DA, Sathe S, Einstein JM, Yeo GW. Direct RNA sequencing enables m6A detection in endogenous transcript isoforms at base-specific resolution. RNA. 2020;26(1):19–28. https://doi.org/10.1261/rna.072785.119.
Fleming AM, Mathewson NJ, Howpay Manage SA, Burrows CJ. Nanopore dwell time analysis permits sequencing and conformational assignment of pseudouridine in SARS-CoV-2. ACS Central Sci. 2021;7(10):1707–17. https://doi.org/10.1021/acscentsci.1c00788.
Begik O, Lucas MC, Pryszcz LP, Ramirez JM, Medina R, Milenkovic I, et al. Quantitative profiling of pseudouridylation dynamics in native RNAs with nanopore sequencing. Nat Biotechnol. 2021;39(10):1–14. https://doi.org/10.1038/s41587-021-00915-6.
Jenjaroenpun P, Wongsurawat T, Wadley TD, Wassenaar TM, Liu J, Dai Q, et al. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res. 2021;49(2):e7. https://doi.org/10.1093/nar/gkaa620.
Lee AJ, Ashkar AA. The dual nature of type I and type II interferons. Front Immunol. 2018;9:2061. https://doi.org/10.3389/fimmu.2018.02061.
Safra M, Nir R, Farouq D, Slutskin IV, Schwartz S. TRUB1 is the predominant pseudouridine synthase acting on mammalian mRNA via a predictable and conserved code. Genome Res. 2017;27(3):393–406. https://doi.org/10.1101/gr.207613.116.
Taoka M, Nobe Y, Yamaki Y, Sato K, Ishikawa H, Izumikawa K, et al. Landscape of the complete RNA chemical modifications in the human 80S ribosome. Nucleic Acids Res. 2018;46(18):9289–98. https://doi.org/10.1093/nar/gky811.
Borchardt EK, Martinez NM, Gilbert WV. Regulation and function of RNA pseudouridylation in human cells. Ann Rev Genet. 2020;54(1):309–36. https://doi.org/10.1146/annurev-genet-112618-043830.
Girardot C, Scholtalbers J, Sauer S, Su S-Y, Furlong EE. Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers. BMC Bioinform. 2016;17(1):1–6. https://doi.org/10.1186/s12859-016-1284-2.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature methods. 2012;9(4):357–9. https://doi.org/10.1038/nmeth.1923.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. https://doi.org/10.1093/bioinformatics/btp352.
Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–6. https://doi.org/10.1038/nbt.1754.
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. https://doi.org/10.1093/bioinformatics/bty191.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. https://doi.org/10.1093/bioinformatics/btq033.
Frankish A, Diekhans M, Jungreis I, Lagarde J, Loveland JE, Mudge JM, et al. GENCODE 2021. Nucleic Acids Res. 2021;49(D1):D916–23. https://doi.org/10.1093/nar/gkaa1087.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9. https://doi.org/10.1038/75556.
Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Research. 2021;49(D1):D325–34. https://doi.org/10.1093/nar/gkaa1113.
Bailey TL, Johnson J, Grant CE, Noble WS. The MEME suite. Nucleic Acids Res. 2015;43(W1):W39–49. https://doi.org/10.1093/nar/gkv416.
Zhang W, Eckwahl MJ, Zhou KI, Pan T. Sensitive and quantitative probing of pseudouridine modification in mRNA and long noncoding RNA. Rna. 2019;25(9):1218–25. https://doi.org/10.1261/rna.072124.119.
Huang S, Zhang W, Katanski CD, Dersh D, Dai Q, Lolans K, Yewdell J, Eran AM, Pan T. Interferon inducible pseudouridine modification in human mRNA by quantitative nanopore profiling. GSE180656. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE180656 (2021).
Huang S, Zhang W, Katanski CD, Dersh D, Dai Q, Lolans K, Yewdell J, Eran AM, Pan T. Nanopore_psU. Github. https://github.com/sihaohuanguc/Nanopore_psU (2021)
Huang S, Zhang W, Katanski CD, Dersh D, Dai Q, Lolans K, Yewdell J, Eran AM, Pan T. Nanopore_psU. https://zenodo.org/record/5711328#.YZaoBy1h2Tc (2021)
We thank Jordan Brown and Dr. Heng-Chi Lee for the C. elegans total RNA.
Peer review information
Barbara Cheifet was the primary editor of this article and managed the editorial process and peer review in collaboration with the rest of the editorial team.
The review history is available as Additional file 3.
Twitter handle: @merenbey (A. Murat Eren)
This work was supported by the grant from the NIH (RM1 HG008935 to T.P.). D.D. and J.W.Y. are supported by the Division of Intramural Research, NIAID. A.M.E. and K.L. were supported by an NIH NIDDK grant (RC2 DK122394).
Ethics approval and consent to participate
This study was reviewed and approved by the University of Chicago Ethics Committee and by the University of Chicago Institutional Review Board (IRB18-1539). Written and informed consent was obtained for all participants.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Huang, S., Zhang, W., Katanski, C.D. et al. Interferon inducible pseudouridine modification in human mRNA by quantitative nanopore profiling. Genome Biol 22, 330 (2021). https://doi.org/10.1186/s13059-021-02557-y
- Nanopore sequencing
- Machine learning