The RNA Atlas expands the catalog of human non-coding RNAs

Abstract

Existing compendia of non-coding RNA (ncRNA) are incomplete, in part because they are derived almost exclusively from small and polyadenylated RNAs. Here we present a more comprehensive atlas of the human transcriptome, which includes small and polyA RNA as well as total RNA from 300 human tissues and cell lines. We report thousands of previously uncharacterized RNAs, increasing the number of documented ncRNAs by approximately 8%. To infer functional regulation by known and newly characterized ncRNAs, we exploited pre-mRNA abundance estimates from total RNA sequencing, revealing 316 microRNAs and 3,310 long non-coding RNAs with multiple lines of evidence for roles in regulating protein-coding genes and pathways. Our study both refines and expands the current catalog of human ncRNAs and their regulatory interactions. All data, analyses and results are available for download and interrogation in the R2 web portal, serving as a basis for future exploration of RNA biology and function.

Data availability

All types of RNA entities can be readily explored via the online R2: Genomics Analysis and Visualization Platform (http://r2.amc.nl) and via a dedicated accessible portal (http://r2platform.com/rna_atlas). This portal includes genome browser profiles for the total RNA as well as polyA tracks for all samples. All samples can also be used for correlations, differential signals and many more analyses. In addition, the LongHorn results, described in this manuscript, can be explored.

The raw data (FASTQ files) and processed expression measurement tables from all RNA biotypes across samples have been deposited in the National Center for Biotechnology Information’s Gene Expression Omnibus (GEO) and are accessible through GEO series accession number GSE138734.
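As a rough illustration of how the deposited data might be located programmatically, the sketch below builds the GEO landing-page URL and the series file-directory URL for the accession given above. The function name `geo_series_urls` is hypothetical, and the URL layout follows NCBI's general GEO conventions rather than anything stated in the article; it only constructs strings and performs no download.

```python
import re

# NCBI GEO URL patterns (general GEO conventions, assumed here; not taken from the article).
GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}"
GEO_FILES_URL = "https://ftp.ncbi.nlm.nih.gov/geo/series/{stub}/{acc}/"


def geo_series_urls(accession: str) -> dict:
    """Return landing-page and file-directory URLs for a GEO series accession.

    Hypothetical helper for illustration only.
    """
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = accession[3:]
    # GEO groups series into directories named by replacing the
    # last three digits of the accession number with 'nnn'.
    stub = "GSE" + digits[:-3] + "nnn"
    return {
        "landing_page": GEO_QUERY_URL.format(acc=accession),
        "files": GEO_FILES_URL.format(stub=stub, acc=accession),
    }


if __name__ == "__main__":
    for name, url in geo_series_urls("GSE138734").items():
        print(f"{name}: {url}")
```

The returned URLs can be opened in a browser or passed to any download tool; the series page itself remains the authoritative list of available files.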

Code availability

Computer code used to generate the results presented in this manuscript is available at https://github.com/llorenzi90/RNA_Atlas.

Acknowledgements

F.A.C. is supported by a Special Research Fund (BOF) scholarship of Ghent University (BOF.DOC.2017.0026.01). R.C. is supported by the Fonds Wetenschappelijk Onderzoek (11Y6218N). T.-W.C. is supported by grants from the Ministry of Science and Technology, Taiwan (MOST-109-2311-B-009-002). A.U. is supported by research funding from the National Health and Medical Research Council (Australia) and the Leukemia & Lymphoma Society, the Leukemia Foundation and the Snowdome Foundation. G.A. is supported by a postgraduate scholarship from the Translational Cancer Research Network. M.R.W. and N.P.D. acknowledge support from the National Collaborative Research Infrastructure Strategy program, administered by Bioplatforms Australia. We thank N. Yigit, A. Barr, S. Pathak, L. Way and A. Mai for their contributions in library preparation and A. Yunghans, E. Jaeger and A. Moshrefi for their assistance in library organization and sequencing/tracking/data management. This project was funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreements 668858 and 826121 to P.M., P.S. and J. Koster and the Concerted Research Action of Ghent University (BOF/GOA 01G00819) to P.M. and K.B.

Author information

Author notes

  1. These authors contributed equally: Lucia Lorenzi, Hua-Sheng Chiu.

Affiliations

  1. Center for Medical Genetics, Ghent University, Ghent, Belgium

    Lucia Lorenzi, Francisco Avila Cobos, Pieter-Jan Volders, Robrecht Cannoodt, Justine Nuytens, Katrien Vanderheyden, Jasper Anckaert, Steve Lefever, Eric J. de Bony, Wim Trypsteen, Fien Gysens, Marieke Vromman, Katleen De Preter, Jo Vandesompele & Pieter Mestdagh

  2. Cancer Research Institute Ghent (CRIG), Ghent, Belgium

    Lucia Lorenzi, Francisco Avila Cobos, Pieter-Jan Volders, Robrecht Cannoodt, Justine Nuytens, Katrien Vanderheyden, Jasper Anckaert, Steve Lefever, Eric J. de Bony, Wim Trypsteen, Fien Gysens, Marieke Vromman, Tim De Meyer, Katleen De Preter, Jo Vandesompele, Pavel Sumazin & Pieter Mestdagh

  3. Texas Children’s Cancer Center, Baylor College of Medicine, Houston TX, USA

    Hua-Sheng Chiu

  4. Illumina, Inc., San Diego CA, USA

    Stephen Gross, Scott Kuersten & Gary P. Schroth

  5. VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium

    Pieter-Jan Volders & Yvan Saeys

  6. Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium

    Robrecht Cannoodt & Yvan Saeys

  7. Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Ghent, Belgium

    Robrecht Cannoodt

  8. Data Intuitive, Lebbeke, Belgium

    Robrecht Cannoodt

  9. Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney NSW, Australia

    Aidan P. Tay

  10. Department of Biomedical Sciences, Macquarie University, New South Wales, Sydney NSW, Australia

    Aidan P. Tay

  11. Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium

    Tine Goovaerts & Tim De Meyer

  12. Interdisciplinary Nanoscience Centre (iNANO), Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark

    Thomas Birkballe Hansen & Jørgen Kjems

  13. Biogazelle, Zwijnaarde, Belgium

    Nele Nijs

  14. Department of Diagnostic Sciences, Ghent University, Ghent, Belgium

    Tom Taghon

  15. Department of Respiratory Medicine, Ghent University, Ghent, Belgium

    Karim Vermaelen & Ken R. Bracke

  16. Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney NSW, Australia

    Nandan P. Deshpande & Marc R. Wilkins

  17. Adult Cancer Program, Lowy Cancer Research Centre, UNSW Sydney, Sydney NSW, Australia

    Govardhan Anande & Ashwin Unnikrishnan

  18. Prince of Wales Clinical School, UNSW Sydney, Sydney NSW, Australia

    Govardhan Anande & Ashwin Unnikrishnan

  19. Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan

    Ting-Wen Chen

  20. Department of Oncogenomics, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands

    Jan Koster

Contributions

P.M., J.V. and P.S. conceived the idea and designed and supervised the project. L.L. and H.-S.C. contributed to the implementation and design of most bioinformatic analyses. L.L. performed most of the raw sequencing data processing, transcriptome assembly and filtering, polyadenylation classification and most of the presented analyses for quality assessment and characterization of the generated transcriptome. H.-S.C., T.-W.C. and P.S. performed the analyses related to prediction and validation of regulatory interactions mediated by ncRNAs. F.A.C. and K.D.P. performed the analyses to select the RNA Atlas genes and contributed to quality validation of the transcriptome. S.G., S.K. and G.P.S. generated and sequenced the polyA and total RNA libraries. P.-J.V. performed the evaluation of coding potential, analyses of mass spectrometry data, alignment of candidate protein sequences to other animal proteins via BLASTp and analysis of conservation with chimpanzee. R.C. and Y.S. contributed to the analyses of RNA biotype expression and sample ontology associations. J.N. performed the polyA-minus sequencing and the qPCR experiments. K. Vanderheyden and J.N. generated and sequenced the small RNA libraries. J.A. implemented the identification of miRNAs and sequence motif analysis. S.L. designed the primers for the qPCR experiments and contributed to the graphic design of schematic figures. A.P.T. performed the analysis of overlap between ONT reads in public datasets and RNA Atlas-only single-exon genes. E.J.B., W.T. and F.G. performed the experiments of CRISPRi-mediated transcriptional silencing of lncRNA MALAT1. M.V. generated the integrated circRNA reference dataset used for comparisons with RNA Atlas circRNAs. T.G. and T.D.M. performed the imprinting analyses. T.B.H. and J. Kjems implemented the circRNA identification workflow. N.N. developed the polyA-minus sequencing protocol. T.T., K. Vermaelen and K.R.B. provided immune system-related cell lines and cell types. N.P.D., G.A., M.R.W. and A.U. performed analyses and annotation of circRNAs and contributed to the analysis of ONT reads in public datasets. J. Koster developed dedicated tools to analyze RNA Atlas data and results and implemented them in a dedicated RNA Atlas datascope in the online portal R2. P.M. led the writing of the manuscript in collaboration with L.L., H.-S.C. and P.S. L.L., H.-S.C., G.P.S., J.V., P.S. and P.M. contributed to the conceptualization, interpretation and discussion of results. All authors commented on the manuscript and contributed to the presentation of the data and results. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper.

Corresponding authors

Correspondence to Pavel Sumazin or Pieter Mestdagh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks Steven Salzberg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lorenzi, L., Chiu, H.-S., Avila Cobos, F. et al. The RNA Atlas expands the catalog of human non-coding RNAs. Nat. Biotechnol. (2021). https://doi.org/10.1038/s41587-021-00936-1
