Determination of isoform-specific RNA structure with nanopore long reads thumbnail

Determination of isoform-specific RNA structure with nanopore long reads

Abstract

Current methods for determining RNA structure with short-read sequencing cannot capture most differences between distinct transcript isoforms. Here we present RNA structure analysis using nanopore sequencing (PORE-cupine), which combines structure probing using chemical modifications with direct long-read RNA sequencing and machine learning to detect secondary structures in cellular RNAs. PORE-cupine also captures global structural features, such as RNA-binding-protein binding sites and reactivity differences at single-nucleotide variants. We show that shared sequences in different transcript isoforms of the same gene can fold into different structures, highlighting the importance of long-read sequencing for obtaining phase information. We also demonstrate that structural differences between transcript isoforms of the same gene lead to differences in translation efficiency. By revealing isoform-specific RNA structure, PORE-cupine will deepen understanding of the role of structures in controlling gene regulation.

Data availability

Raw sequencing data and reactivity profiles can be downloaded from the Gene Expression Omnibus under accession number GSE133361. Source data are provided with this paper.

Code availability

Source code for all scripts (R version 3.4.1) and commands used for analysis can be found at http://github.com/awjga/PORE-cupine.

References

  1. 1.

    Wan, Y., Kertesz, M., Spitale, R. C., Segal, E. & Chang, H. Y. Understanding the transcriptome through RNA structure. Nat. Rev. Genet. 12, 641–655 (2011).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  2. 2.

    Kertesz, M. et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  3. 3.

    Wan, Y. et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  4. 4.

    Wan, Y. et al. Genome-wide measurement of RNA folding energies. Mol. Cell 48, 169–181 (2012).

    PubMed 
    PubMed Central 
    Article 
    CAS 

    Google Scholar
     

  5. 5.

    Siegfried, N. A., Busan, S., Rice, G. M., Nelson, J. A. & Weeks, K. M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  6. 6.

    Spitale, R. C. et al. Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519, 486–490 (2015).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  7. 7.

    Lucks, J. B. et al. Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl Acad. Sci. USA 108, 11063–11068 (2011).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  8. 8.

    Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  9. 9.

    Zubradt, M. et al. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat. Methods 14, 75–82 (2017).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  10. 10.

    Ding, Y. et al. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700 (2013).

    PubMed 
    Article 
    CAS 
    PubMed Central 

    Google Scholar
     

  11. 11.

    Strobel, E. J., Yu, A. M. & Lucks, J. B. High-throughput determination of RNA structures. Nat. Rev. Genet. 19, 615–634 (2018).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  12. 12.

    Sharon, D., Tilgner, H., Grubert, F. & Snyder, M. A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–1014 (2013).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  13. 13.

    Au, K. F. et al. Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl Acad. Sci. USA 110, E4821–E4830 (2013).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  14. 14.

    Koren, S. et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol. 30, 693–700 (2012).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  15. 15.

    Tilgner, H. et al. Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events. Nat. Biotechnol. 33, 736–742 (2015).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  16. 16.

    Garalde, D. R. et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods 15, 201–206 (2018).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  17. 17.

    Oikonomopoulos, S., Wang, Y. C., Djambazian, H., Badescu, D. & Ragoussis, J. Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations. Sci. Rep. 6, 31602 (2016).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  18. 18.

    Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  19. 19.

    Parker, M. T. et al. Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification. eLife 9, e49658 (2020).

  20. 20.

    Liu, H. et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat. Commun. 10, 4079 (2019).

    PubMed 
    PubMed Central 
    Article 
    CAS 

    Google Scholar
     

  21. 21.

    Weeks, K. M. Advances in RNA structure analysis by chemical probing. Curr. Opin. Struct. Biol. 20, 295–304 (2010).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  22. 22.

    Spitale, R. C. et al. RNA SHAPE analysis in living cells. Nat. Chem. Biol. 9, 18–20 (2013).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  23. 23.

    Sachsenmaier, N., Handl, S., Debeljak, F. & Waldsich, C. Mapping RNA structure in vitro using nucleobase-specific probes. Methods Mol. Biol. 1086, 79–94 (2014).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  24. 24.

    Guo, F., Gooding, A. R. & Cech, T. R. Structure of the Tetrahymena ribozyme: base triple sandwich and metal ion at the active site. Mol. Cell 16, 351–362 (2004).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  25. 25.

    Winkler, W., Nahvi, A. & Breaker, R. R. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419, 952–956 (2002).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  26. 26.

    Jambhekar, A. et al. Unbiased selection of localization elements reveals cis-acting determinants of mRNA bud localization in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 102, 18005–18010 (2005).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  27. 27.

    Sexton, A. N., Wang, P. Y., Rutenberg-Schoenberg, M. & Simon, M. D. Interpreting reverse transcriptase termination and mutation events for greater insight into the chemical probing of RNA. Biochemistry 56, 4713–4721 (2017).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  28. 28.

    Li, F. et al. Global analysis of RNA secondary structure in two metazoans. Cell. Rep. 1, 69–82 (2012).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  29. 29.

    Sun, L. et al. RNA structure maps across mammalian cellular compartments. Nat. Struct. Mol. Biol. 26, 322–330 (2019).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  30. 30.

    Wilbert, M. L. et al. LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance. Mol. Cell 48, 195–206 (2012).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  31. 31.

    Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  32. 32.

    Pan, Q., Shai, O., Lee, L. J., Frey, B. J. & Blencowe, B. J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  33. 33.

    Moqtaderi, Z., Geisberg, J. V. & Struhl, K. Extensive structural differences of closely related 3′ mRNA isoforms: links to Pab1 binding and mRNA stability. Mol. Cell 72, 849–861 (2018).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  34. 34.

    Floor, S. N. & Doudna, J. A. Tunable protein synthesis by transcript isoforms in human cells. eLife 5, e10921 (2016).

  35. 35.

    Aw, J. G. et al. In vivo mapping of eukaryotic RNA interactomes reveals principles of higher-order organization and regulation. Mol. Cell 62, 603–617 (2016).

    CAS 
    PubMed 
    Article 
    PubMed Central 

    Google Scholar
     

  36. 36.

    Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  37. 37.

    Mustoe, A. M. et al. Pervasive regulatory functions of mRNA structure revealed by high-resolution SHAPE probing. Cell 173, 181–195 (2018).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  38. 38.

    Das, R., Laederach, A., Pearlman, S. M., Herschlag, D. & Altman, R. B. SAFA: semi-automated footprinting analysis software for high-throughput quantification of nucleic acid footprinting experiments. RNA 11, 344–354 (2005).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  39. 39.

    Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  40. 40.

    Sovic, I. et al. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat. Commun. 7, 11307 (2016).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  41. 41.

    Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).

    PubMed 
    PubMed Central 
    Article 
    CAS 

    Google Scholar
     

  42. 42.

    Shah, A., Qian, Y., Weyn-Vanhentenryck, S. M. & Zhang, C. CLIP Tool Kit (CTK): a flexible and robust pipeline to analyze CLIP sequencing data. Bioinformatics 33, 566–567 (2017).

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  43. 43.

    Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

  44. 44.

    Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).

    CAS 
    PubMed 
    PubMed Central 
    Article 

    Google Scholar
     

Download references

Acknowledgements

We thank members of the Wan and Tan labs and F. Yao, H. M. Loh, C. C. Khor and M. Sikic for helpful discussions. Y.W. is supported by funding from A*STAR (A*STAR investigatorship 1630700155), the National Research Foundation Singapore (NRF2019-NRF-ISF003-2970 and CRP21-2018-0101), the EMBO Young Investigatorship and the CIFAR global scholarship.

Author information

Author notes

  1. These authors contributed equally: Jong Ghut Ashley Aw, Shaun W. Lim, Jia Xu Wang.

Affiliations

  1. Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore

    Jong Ghut Ashley Aw, Shaun W. Lim, Jia Xu Wang, Finnlay R. P. Lambert, Wen Ting Tan, Yu Zhang, Pornchai Kaewsapsak, Meng How Tan & Yue Wan

  2. Computational and Systems Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore

    Yang Shen, Chenhao Li & Niranjan Nagarajan

  3. Genome Technologies Platform, Genome Institute of Singapore, A*STAR, Singapore, Singapore

    Sarah B. Ng

  4. Skin Research Institute of Singapore, A*STAR, Immunos, Singapore

    Leah A. Vardy

  5. School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore

    Meng How Tan

  6. Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore

    Niranjan Nagarajan & Yue Wan

  7. School of Biological Sciences, Nanyang Technological University, Singapore, Singapore

    Yue Wan

Contributions

Y.W. conceived the project. Y.W., N.N., M.H.T., B.S.N. and L.V. designed the experiments and analysis. S.W.L. and J.X.W. performed the experiments with help from P.K. J.G.A.A. and Y.S. performed the computational analysis with help from C.L. and E.P.K. Y.W. organized and wrote the paper with J.G.A.A., S.W.L. and all other authors.

Corresponding authors

Correspondence to
Niranjan Nagarajan or Yue Wan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Chemical structures of RNA structure probing compounds, associated reaction products, mapped length and statistics of error rates.

a, Chemical structures of RNA structure probing compounds. Side chains for the carbodiimide of CMCT are highlighted and abbreviated as R’ and R’ for part (b). b, RNA nucleotide triphosphates with chemical adducts formed from reaction with structure probing compounds. Adducts are highlighted in green. c, Median lengths of mapped nanopore reads for unmodified and modified Tetrahymena RNA with different structure probing compounds. d, e, f, Boxplots showing the frequency of mismatch (d), deletion (e) and insertion (f) rates for different structure probing chemicals on Tetrahymena RNA, as compared to unmodified RNA. P-values were calculated using the two-sided Wilcoxon Rank Sum test. h–j, Boxplots showing the AUC-ROC performance of mismapping (h), deletion (i) and insertion (j) rates for the different compounds on the Tetrahymena RNA secondary structure. P-values were calculated using two-sided Wilcoxon rank-sum test. c-j, 6962-42107 reads from different libraries were used for comparisons (Supplementary Table 1). The middle, lower and upper boundary lines in the boxplot correspond to median, first and third quartiles. The upper whisker extends to the largest value no further than 1.5 × IQR from the hinge (where IQR is the inter-quartile range) and the lower whisker extends to the smallest value at most 1.5 × IQR of the hinge.
Source data

Extended Data Fig. 2 Distribution of mismatches, insertions, deletions along Tetrahymena RNA sequence.

Line plots of normalized number of mismatches (a), deletions (b) and insertions (c) caused by the different compounds and unmodified, along the length of the Tetrahymena RNA sequence. The red bars on top of the plots indicate the location of single-stranded bases in the secondary structure.

Extended Data Fig. 3 Error characterization of the modifications along the secondary structure of the Tetrahymena RNA.

a, Positions and intensity of mismatches (red), deletions (green) and insertions (purple) caused by the different chemical compounds are mapped along the secondary structure of the Tetrahymena RNA. b, Percentage of observed bases (upper number) and corresponding P-values were shown (lower number) for each observation. P-value was calculated using two-sided chi-square test for all modified versus unmodified comparisons.
Source data

Extended Data Fig. 4 Schematic of the bioinformatic workflow of PORE-cupine and characteristics of direct RNA sequencing signal.

Sequenced reads were basecalled using Albacore or Guppy, and mapped to the reference sequences using Graphmap. We used Nanopolish to align the raw signals and to extract current features which were used to train unmodified data using SVM. We then filter for reads that are longer than 50% of annotated lengths and for transcripts that have at least 100 reads in the modified library and 200 reads in the unmodified library for downstream analysis. b, Normalized current dwell time for single-stranded regions on Tetrahymena RNA modified with NAI-N3. With footprinting gels as a guide, the top 10% of the single-stranded regions on Tetrahymena RNA were selected for these plots. c-e, Normalized current mean (c), standard deviation (d) and dwell time (e) distributions for all positions on unmodified Tetrahymena RNA and RNA modified with NAI-N3. f, Bioanalyzer traces of in vitro transcribed, full-length, unmodified, and NAI-N3 (100 mM for 5 mins or 25 mins) modified Tetrahymena RNA. g, Mapping rates for modified versus unmodified Tetrahymena RNA. The total number of sequenced reads for unmodified, NAI-N3 (5 mins), and NAI-N3(25 mins) are 20149, 51760, and 22155 reads respectively. The percentage of mapped reads for unmodified, NAI-N3 (5 mins), and NAI-N3 (25 mins) are 75%, 81%, and 17% respectively. h, Density plots showing the distribution of lengths of sequenced unmodified and modified Tetrahymena RNA. Top: unmodified and NAI-N3 modified (100 mM, 5 min) RNA. Bottom: unmodified and NAI-N3 modified (100 mM, 25 min) RNA. i, Coverage of reads mapping to Tetrahymena RNA along its length, for unmodified (top), NAI-N3 modified (100 mM, 5 min)(middle) and extended NAI-N3 modified (100 mM, 25 min) RNA (bottom).

Extended Data Fig. 5 Optimization of PORE-cupine using 11 RNAs as training set.

a, Scatterplot showing the distribution of normalized base reactivity between N = 2 biological replicates of modified Tetrahymena RNA. R = 0.97, CI95% = [0.97,0.98] (Pearson correlation). P-value=2.5×10-262, two-tailed Student’s T-test. b, Distribution of current mean and standard deviation for a unimodal (left) and bimodal (right) position in two biological replicates. c, AUC-ROC performance of the correlation of NAI-N3 reactivities of the training set based on PORE-cupine versus footprinting from 11 transcripts. d,e, Comparison of PORE-cupine reactivity and traditional footprinting. Two replicates of gels were shown for Tetrahymena RNA (d, R = 0.80) and lysine riboswitch (e, R = 0.74). Lane 1 of the footprinting gels show A (left, Tetrahymena) or G (right, Tetrahymena and lysine) ladder. Lane 2 shows unmodified RNA, and lane 3 shows NAI-N3 modified RNA. Quantification of the bands on the gels was done using SAFA. Pearson correlation was used to compare between SAFA and PORE-cupine signals. f, List of RNAs used for training and test. g, Scatter plot of per-base reactivity in two biological replicates of the three test RNAs. P-value = 0 using two-tailed Student’s T-test. R = 0.877, CI95% = [0.87, 0.89], by Pearson correlation. h-j, Line plots showing the per-base reactivity along the length of three test RNAs, for two biological replicates. R > = 0.89, using Pearson correlation. k, Boxplot showing the performance of the SVM parameters on the 3 test RNAs, based on training on the Tetrahymena RNA (left) or on 11 RNAs (right, footprinting gels). l, AUC-ROC performance of SVM parameters on 3 test RNAs (red, based on our current 11 training RNAs) versus test RNAs after random selection of 11/14 RNAs as training, for 20 times. m, Boxplot showing the performance of all, unimodal and bimodal positions on test RNAs using AUC-ROC based on footprinting gels from 3 transcripts. In c, k-m, the middle, lower and upper boundary lines in the boxplot correspond to median, first and third quartiles. The upper whisker extends to the largest value no further than 1.5 × IQR from the hinge (where IQR is the inter-quartile range) and the lower whisker extends to the smallest value at most 1.5 × IQR of the hinge. Outliers are shown as dots.
Source data

Extended Data Fig. 6 Comparison between PORE-cupine and footprinting signals.

a, Bioanalyzer traces of unmodified and in vivo NAI-N3 modified (100 mM, 5 min) total B. subtilis RNA. b, Secondary structure model of B. subtilis 16 S rRNA. The structure probed regions are boxed in pink, green and blue. c-e, Comparisons between PORE-cupine and footprinting. Two replicates of gels are shown for each of the three regions along B. subtilis RNA. The gels show G ladder (lane 1), unmodified RNA (lane 2) and NAI-N3 modified RNA (lane 3) and a correlation of R (Pearson)= 0.91 (c), 0.74 (d) and 0.24 (e) between the gels. Quantification of the bands on the gels were done using SAFA. Comparison between SAFA quantification and PORE-cupine for each of the regions is shown as a line plot to the right of the gels. R = 0.52 (c), 0.76 (d), 0.62 (e) by Pearson correlation. f,g, Bioanalyzer traces of unmodified and in vitro NAI-N3 modified RPS29 (100 mM, 5 min) (f) and Adocbl riboswitch (g). h-i, Comparisons between PORE-cupine and footprinting. Two replicates of gels are shown for along RPS29 and Adocbl riboswitch RNA, R(Pearson)= 0.93(h) and 0.73(i). The gels show G ladder (lane 1), unmodified RNA (lane 2) and NAI-N3 modified RNA (lane 3). Quantification of the bands on the gels were done using SAFA. Comparison between SAFA quantification and PORE-cupine for each of the regions is shown to the right of the gels. R(Pearson)=0.68 (h) and 0.69 (i).
Source data

Extended Data Fig. 7 PORE-cupine reactivity signals on TPP.

Line plots showing 2 replicates of PORE-cupine reactivities along TPP riboswitch in the presence of water (R = 0.86), 250 nM TPP (R = 0.87), 750 nM TPP (R = 0.83), and 10 μM TPP (R = 0.94). Pearson correlation is used to calculate the similarities between the reactivities of the replicates.
Source data

Extended Data Fig. 8 PORE-cupine results on the hESC transcriptome.

a, b, Bioanalyzer traces of unmodified and modified total (a) and polyA(+) selected (b) hESC. c, Barplots showing the number of reads after basecalling (12007032 and 10118432) and mapping (86% and 60%) in unmodified and modified hESC samples respectively. d, Histogram showing the distribution of reads with different amounts of modification in hESC. e, Boxplots showing the performance of reads with different amounts of modification, calculated using AUC-ROC on the test set of 3 RNAs, based on 10 footprinting regions. Reads were grouped into different classes: all reads (current, 147670 reads), with only 1 modification for the strand (only 1, 15461 reads), with 0-1% modification (68025 reads), with 1-2% modification (9771 reads), with 2-3% modification (803 reads), and with 3-4% modifications (76 reads). P-value is calculated using two-sided Wilcoxon Rank Sum test. The middle, lower and upper boundary of the boxplot correspond to the median, first and third quartiles, while the upper and lower whiskers extend from the hinge to the largest and smallest value at most 1.5 × IQR of the hinge respectively. f, Top: Fraction of modified reads mapped across exon-exon junctions (position 0) in hESC (black) and across artificial junctions with 50 base insertions (red). Bottom: Difference between mapping rates across normal versus artificial exon-exon junctions at each base; p-value was calculated using two-tailed Wilcoxon Rank-Sum Test. g, Line graph showing the percentage of bimodal positions observed (for both current standard deviation and mean) for all 1024 kmers. The orange line indicates the top 1% of bimodal signals across all kmers and the identity of the corresponding kmers are labelled above. h-j, Base composition along positions 1,2,3,4,5 of unimodal (left) and bimodal kmers (right). These include kmers that show bimodal current mean (h), bimodal current standard deviation (i), and both bimodal current mean and current standard deviation (j). k, Coverage of unmodified (left) and modified (right) reads along the hESC transcriptome using direct RNA sequencing.
Source data

Extended Data Fig. 9 Structural properties of the hESC transcriptome.

Scatterplot showing the distribution of total reads per position for each hESC transcript, for N = 2 biologically independent replicates, of unmodified (left, p-value =0, using two-tailed Student T test, CI95% = [0.98,0.98], 1613 transcripts) and modified transcripts (right, 1751 transcripts, p-value =0 using two-tailed Student T test, CI95% = [0.97,0.97]). R (Pearson)=0.98 (left) and 0.97 (right). b, Barplot showing the number of transcripts left after abundance and length filter. The number of transcripts in each group is shown above the plot. c, Boxplots showing the distribution of median mapped lengths of unmodified (left) and NAI-N3 modified (middle) hESC mRNAs (1751 trancripts). Annotated refers to the distribution of expected lengths for each transcript based on ENSEMBL GRCh38 annotation (right). d, Histogram showing the distribution of transcripts having different fractions of annotated length in unmodified and modified samples. e, Distribution of Pearson correlations between full-length (>99% of known length) and partial transcripts in hESC (83 transcripts from N = 2 two biological replicates were used). The Y-axis shows the fraction of transcripts with a particular correlation. The X-axis depicts Pearson correlation coefficients. f, Boxplot showing PORE-cupine reactivity of different classes of transcripts. P-values were calculated using two-sided Wilcoxon Rank Sum test.1584 coding genes, 67 pseudogenes, 81 non-coding genes and 4 rRNAs were used. g, Top, Metagene analysis of PORE-cupine-derived mean reactivities aligned according to start (Upper) and stop (Lower) codons for all 559 transcripts. Bottom, Metagene autocorrelation function (ACF) plot for the 5’ UTR, CDS and 3’ UTR. In c and f, the middle, lower and upper boundary of the boxplot correspond to the median, first and third quartiles. The upper and lower whisker extends from the hinge to the largest and smallest value at most 1.5 × IQR of the hinge. Outliers are shown as dots.
Source data

Extended Data Fig. 10 RNA structures in gene-linked isoforms.

a, Upper: Transcript organization of different RPLP0 isoforms. Alternative exons seen in our structural data are highlighted in red. Lower, normalized reactivity profiles for the different isoforms and their aggregate signal. b, Upper, Transcript organization of different RACK1 isoforms. Alternative exon is shown in red (also in inset). Lower, Line plots for the aggregate reactivity signal between the two isoforms are shown (Top). Middle, Line plots showing the expanded view of the reactivity difference between the isoforms. Bottom, Line plots showing the individual reactivity information for each isoform along its length. c, No. of transcripts with two structure changing regions that are more than 100, 200, 300, 400 or 500 bases apart. The value for each group is shown above the bar. d, Schematic of the TrIP-seq workflow. e, Line plot showing the absorbance A260 of each fraction (2-12) after polysome fractionation. f, Pair-wise correlations (Spearman correlation) of the read-counts/transcript for each fraction between two biological replicates. Fractions and batches are denoted as F2-12 and B1-2 respectively. g, Distribution of read-counts across different polysome fractions for two biological replicates of Actin B (left) and Activating transcription factor 4 (right).
Source data

Supplementary information

About this article

Verify currency and authenticity via CrossMark

Cite this article

Aw, J.G.A., Lim, S.W., Wang, J.X. et al. Determination of isoform-specific RNA structure with nanopore long reads.
Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0712-z

Download citation

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *