inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains

Abstract

Coexisting microbial cells of the same species often exhibit genetic variation that can affect phenotypes ranging from nutrient preference to pathogenicity. Here we present inStrain, a program that uses metagenomic paired reads to profile intra-population genetic diversity (microdiversity) across whole genomes and compares microbial populations in a microdiversity-aware manner, greatly increasing the accuracy of genomic comparisons when benchmarked against existing methods. We use inStrain to profile >1,000 fecal metagenomes from newborn premature infants and find that siblings share significantly more strains than unrelated infants, although identical twins share no more strains than fraternal siblings. Infants born by cesarean section harbor Klebsiella with significantly higher nucleotide diversity than infants delivered vaginally, potentially reflecting acquisition from hospital rather than maternal microbiomes. Genomic loci that show diversity in individual infants include variants found between other infants, possibly reflecting inoculation from diverse hospital-associated sources. inStrain can be applied to any metagenomic dataset for microdiversity analysis and rigorous strain comparison.
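The abstract does not give formulas. The sketch below illustrates the two ideas it names, per-site microdiversity and microdiversity-aware comparison, and rests on several assumptions. Per-site nucleotide diversity is taken as 1 − Σ f_i², where f_i are the allele frequencies at that position. Two samples are compared in two ways. A consensus-based identity (conANI) counts a difference whenever the majority bases disagree. A population-aware identity (popANI) counts a difference only when the samples share no detected allele. The terms conANI and popANI appear in the inStrain documentation. The functions `nucleotide_diversity`, `present_alleles`, `consensus` and `compare_sites`, and the 5% detection cutoff, are hypothetical illustrations and are not the inStrain API.

```python
from typing import Dict, List, Set, Tuple

# Allele counts at one genomic position, e.g. {"A": 90, "G": 10}.
# An empty dict means the position has no read coverage.
Counts = Dict[str, int]


def nucleotide_diversity(counts: Counts) -> float:
    """Per-site diversity, assumed here to be 1 - sum(f_i^2) over allele frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads cover this site")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def present_alleles(counts: Counts, min_freq: float = 0.05) -> Set[str]:
    """Alleles at or above a (hypothetical) detection frequency."""
    total = sum(counts.values())
    if total == 0:
        return set()
    return {base for base, c in counts.items() if c / total >= min_freq}


def consensus(counts: Counts) -> str:
    """Majority base at a site (ties resolved by dict order)."""
    return max(counts, key=counts.get)


def compare_sites(sample1: List[Counts], sample2: List[Counts],
                  min_freq: float = 0.05) -> Tuple[float, float]:
    """Return (conANI, popANI) over positions covered in both samples.

    conANI: a position differs if the consensus bases differ.
    popANI: a position differs only if no detected allele is shared.
    """
    n = con_diff = pop_diff = 0
    for a, b in zip(sample1, sample2):
        if not a or not b:  # skip positions lacking coverage in either sample
            continue
        n += 1
        if consensus(a) != consensus(b):
            con_diff += 1
        if not (present_alleles(a, min_freq) & present_alleles(b, min_freq)):
            pop_diff += 1
    if n == 0:
        raise ValueError("no positions covered in both samples")
    return 1 - con_diff / n, 1 - pop_diff / n
```

Consider a toy example with four positions. At the first position, the two samples have different consensus bases (A versus G) but both carry A and G, so the position counts against conANI but not against popANI. At the third position, the samples share no base, so it counts against both.

```python
s1 = [{"A": 90, "G": 10}, {"C": 100}, {"T": 100}, {"G": 100}]
s2 = [{"G": 60, "A": 40}, {"C": 100}, {"A": 100}, {"G": 100}]
con_ani, pop_ani = compare_sites(s1, s2)  # conANI = 0.5, popANI = 0.75
```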

Data availability

The data supporting the findings of this study are available within the paper and its supplementary information files. Reads from infant samples are available under BioProject PRJNA294605 (SRA studies SRP052967, SRP114966 and SRP012558; and SRA accessions SRR5405607 to SRR5406014), reads from Zymo samples are available under BioProject PRJNA648136 and de novo assembled genomes are available at https://doi.org/10.6084/m9.figshare.c.4740080.v1.
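The infant run accessions are given as an inclusive range. The sketch below expands that range into an explicit list, for example to pass to a download tool such as the SRA Toolkit. It assumes that the range is contiguous as written and that every accession in it belongs to this study. The function `expand_sra_range` is a hypothetical helper, not part of any SRA tool.

```python
import re
from typing import List

_ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_sra_range(first: str, last: str) -> List[str]:
    """Expand an inclusive run of accessions such as SRR5405607..SRR5406014."""
    m1, m2 = _ACCESSION.fullmatch(first), _ACCESSION.fullmatch(last)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("accessions must share a prefix such as 'SRR'")
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    # Zero-pad to the original digit width so formatting round-trips.
    return [f"{prefix}{i:0{width}d}" for i in range(start, end + 1)]
```

For the range stated above, `expand_sra_range("SRR5405607", "SRR5406014")` yields 408 accessions (5406014 − 5405607 + 1).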

Code availability

inStrain is available as an open-source Python program on GitHub (https://github.com/MrOlm/inStrain) and documentation is online at https://instrain.readthedocs.io/en/latest/.

Acknowledgements

This research was supported by the National Institutes of Health (NIH) under award no. RAI092531A to J.F.B. and M.J.M., the Alfred P. Sloan Foundation under grant no. APSF-2012-10-05 to J.F.B., a National Science Foundation Graduate Research Fellowship to M.R.O. under grant no. DGE 1106400, and the Chan Zuckerberg Biohub. The study was approved by the University of Pittsburgh Institutional Review Board (protocol no. PRO10090089).

Author information

Author notes

  1. Matthew R. Olm

    Present address: Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA

Affiliations

  1. Department of Earth and Planetary Science, University of California, Berkeley, CA, USA

    Matthew R. Olm & Jillian F. Banfield

  2. Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA

    Matthew R. Olm & Alexander Crits-Christoph

  3. Office of Information Management and Analysis, California State Water Resources Control Board, Sacramento, CA, USA

    Keith Bouma-Gregson

  4. Department of Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA

    Brian A. Firek & Michael J. Morowitz

  5. Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA

    Jillian F. Banfield

  6. Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA

    Jillian F. Banfield

  7. Chan Zuckerberg Biohub, San Francisco, CA, USA

    Jillian F. Banfield

Contributions

M.R.O., M.J.M. and J.F.B. designed the study. M.R.O. performed metagenomic analyses. M.R.O., A.C.-C. and K.B.-G. contributed to software development and population genomic analyses. B.A.F. performed all DNA extractions. M.R.O. and J.F.B. wrote the manuscript and all authors contributed to manuscript revisions.

Corresponding author

Correspondence to Jillian F. Banfield.

Ethics declarations

Competing interests

J.F.B. is a founder of Metagenomi.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table 1.

Information related to inStrain benchmarking.

Supplementary Table 2.

Ability of inStrain, MIDAS and MetaPhlAn2 to detect genomes present in the Zymo samples using public reference genomes.

Supplementary Table 3.

Raw data comparing SNV detection by metagenomic inStrain analysis with SNV detection by isolate sequencing.

Supplementary Table 4.

Strain-level comparisons within infant samples, between infant coReads and strain identities.

Supplementary Table 5.

Abundance of subspecies in all infants, individual samples and controls, and information about subspecies genomes and representatives.

Supplementary Table 6.

Detailed SNS and SNV information for Enterococcus faecalis bacteriophage subspecies 482_10.ph.

About this article

Cite this article

Olm, M.R., Crits-Christoph, A., Bouma-Gregson, K. et al. inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains. Nat Biotechnol (2021). https://doi.org/10.1038/s41587-020-00797-0