Learning on knowledge graph dynamics provides an early warning of impactful research

Abstract

The scientific ecosystem relies on citation-based metrics that provide only imperfect, inconsistent and easily manipulated measures of research quality. Here we describe DELPHI (Dynamic Early-warning by Learning to Predict High Impact), a framework that provides an early-warning signal for ‘impactful’ research by autonomously learning high-dimensional relationships among features calculated across time from the scientific literature. We prototype this framework and deduce its performance and scaling properties on time-structured publication graphs from 1980 to 2019 drawn from 42 biotechnology-related journals, including over 7.8 million individual nodes, 201 million relationships and 3.8 billion calculated metrics. We demonstrate the framework’s performance by correctly identifying 19/20 seminal biotechnologies from 1980 to 2014 via a blinded retrospective study and provide 50 research papers from 2018 that DELPHI predicts will be in the top 5% of time-rescaled node centrality in the future. We propose DELPHI as a tool to aid in the construction of diversified, impact-optimized funding portfolios.
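
To make the phrase "top 5% of time-rescaled node centrality" concrete, the following is a minimal Python sketch, not DELPHI's implementation. It assumes one common form of time rescaling: each paper's raw centrality is converted to a z-score against papers published within a few years of it, so that older papers do not rank highly simply because they have had longer to accumulate citations. The function names, the two-year window and the tuple layout are illustrative assumptions; the abstract does not specify how DELPHI computes this quantity.

```python
from statistics import mean, pstdev


def time_rescaled_scores(papers, window=2):
    """Rescale raw centrality against papers of similar age.

    papers: list of (paper_id, year, raw_centrality) tuples (hypothetical layout).
    Returns {paper_id: z-score relative to papers published within
    +/- `window` years of that paper}.
    """
    rescaled = {}
    for paper_id, year, score in papers:
        # Peer group: every paper (including this one) published close in time.
        peers = [s for _, y, s in papers if abs(y - year) <= window]
        mu = mean(peers)
        sigma = pstdev(peers)
        # A peer group with no spread carries no ranking information.
        rescaled[paper_id] = (score - mu) / sigma if sigma > 0 else 0.0
    return rescaled


def top_fraction(rescaled, fraction=0.05):
    """Return the ids whose rescaled score falls in the top `fraction`."""
    k = max(1, round(len(rescaled) * fraction))
    ranked = sorted(rescaled, key=rescaled.get, reverse=True)
    return set(ranked[:k])


if __name__ == "__main__":
    # Toy data: an older cohort with large raw scores, a newer one with small scores.
    toy = [(f"old-{i}", 2000, float(i)) for i in range(1, 21)]
    toy += [(f"new-{i}", 2010, i / 10) for i in range(1, 21)]
    print(sorted(top_fraction(time_rescaled_scores(toy))))
```

In this toy setup a ranking by raw score would select only papers from the older cohort, whereas the rescaled ranking compares each paper with its contemporaries.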

Data availability

The data analyzed are available for download from https://www.lens.org/. Exemplary datasets and retrieval code are further available from GitHub as described in the ‘Code availability’ section.

Code availability

Exemplary code, datasets, trained models, a visualization application to aid in the analysis of results and Docker-based installation instructions are all available from GitHub at https://github.com/jameswweis/delphi.

Acknowledgements

This work was supported by the consortia of sponsors of the MIT Media Lab and the MIT Center for Bits and Atoms. We thank the AWS Cloud Credits for Research program for computational infrastructure and the Lens Lab for providing publication data.

Author information

Affiliations

  1. MIT Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA

    James W. Weis & Joseph M. Jacobson

  2. Department of Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA, USA

    James W. Weis

  3. MIT Center for Bits and Atoms, Massachusetts Institute of Technology, Cambridge, MA, USA

    Joseph M. Jacobson

Contributions

J.W.W. and J.M.J. conceived the study. J.W.W. performed the data structuring, algorithm design and computational implementation. J.W.W. and J.M.J. drafted the manuscript and figures. J.M.J. supported and supervised the project.

Corresponding author

Correspondence to James W. Weis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks Lutz Bornmann and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Weis, J.W., Jacobson, J.M. Learning on knowledge graph dynamics provides an early warning of impactful research. Nat Biotechnol (2021). https://doi.org/10.1038/s41587-021-00907-6
