The scientific ecosystem relies on citation-based metrics that provide only imperfect, inconsistent and easily manipulated measures of research quality. Here we describe DELPHI (Dynamic Early-warning by Learning to Predict High Impact), a framework that provides an early-warning signal for ‘impactful’ research by autonomously learning high-dimensional relationships among features calculated across time from the scientific literature. We prototype this framework and deduce its performance and scaling properties on time-structured publication graphs from 1980 to 2019 drawn from 42 biotechnology-related journals, including over 7.8 million individual nodes, 201 million relationships and 3.8 billion calculated metrics. We demonstrate the framework’s performance by correctly identifying 19/20 seminal biotechnologies from 1980 to 2014 via a blinded retrospective study and provide 50 research papers from 2018 that DELPHI predicts will be in the top 5% of time-rescaled node centrality in the future. We propose DELPHI as a tool to aid in the construction of diversified, impact-optimized funding portfolios.
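To make the phrase "top 5% of time-rescaled node centrality" concrete, the following is a minimal sketch and not DELPHI's actual pipeline. It makes three assumptions that the abstract does not confirm: PageRank serves as the centrality measure, time rescaling is a z-score against papers published within a few years of each other, and the top 5% is taken by simple ranking. All function names, the `window` parameter and the toy graph are hypothetical.

```python
from statistics import mean, pstdev


def pagerank(nodes, edges, damping=0.85, iters=100):
    """Plain power-iteration PageRank; edges are (citing, cited) pairs."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Mass held by papers that cite nothing is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for w in out[v]:
                    new[w] += share
        rank = new
    return rank


def time_rescaled(score, year, window=2):
    """Assumed rescaling: z-score against papers within +/- window years."""
    rescaled = {}
    for v in score:
        peers = [score[u] for u in score if abs(year[u] - year[v]) <= window]
        mu, sigma = mean(peers), pstdev(peers)
        rescaled[v] = (score[v] - mu) / sigma if sigma > 0 else 0.0
    return rescaled


def top_fraction(values, fraction=0.05):
    """Return the highest-scoring fraction of nodes (at least one node)."""
    k = max(1, round(len(values) * fraction))
    return set(sorted(values, key=values.get, reverse=True)[:k])


# Toy citation graph (hypothetical data): an old paper "a" and a newer paper "c".
year = {"a": 2000, "b": 2001, "c": 2010, "d": 2011, "e": 2011}
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("e", "a"), ("d", "c"), ("e", "c")]

raw = pagerank(list(year), edges)
rescaled = time_rescaled(raw, year)
high_impact = top_fraction(rescaled, fraction=0.05)
```

In this toy graph the older paper "a" has the highest raw PageRank, but after rescaling against its contemporaries the newer paper "c" ranks first. This is the kind of age bias that a time-rescaled label is meant to remove.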
The data analyzed are available for download from https://www.lens.org/. Exemplary datasets and retrieval code are further available from GitHub as described in the ‘Code availability’ section.
Exemplary code, datasets, trained models, a visualization application to aid in the analysis of results and Docker-based installation instructions are all available from GitHub at https://github.com/jameswweis/delphi.
This work was supported by the consortia of sponsors of the MIT Media Lab and the MIT Center for Bits and Atoms. We thank the AWS Cloud Credits for Research program for computational infrastructure and the Lens Lab for providing publication data.
The authors declare no competing interests.
Peer review information: Nature Biotechnology thanks Lutz Bornmann and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.