- Inadequate planning and lack of research coordination are important sources of research waste. We need new tools to improve research planning.
- The phylomemy reconstruction process, a new text-mining method, allows exploring the temporal evolution of registered clinical trials.
- Phylomemy reconstruction synthesizes the complexity of the knowledge produced by the research community and provides a deep understanding of research planning over time.
Study Design and Setting
- The phylomemy reconstruction process, applied to data automatically extracted from registries and annotated by epidemiologists, makes it possible to trace the evolution of the main research questions.
- The phylomemy reconstruction process brings insights for global coordination between research teams, toward the creation of an observatory of the evolution of international clinical trials.
What this adds to what is known?
- We need to develop a high-quality observatory of clinical trials based on data registered within clinical trial registries. New methods and tools such as the phylomemy reconstruction process are needed to explore these data and improve the coordination of research planning.
What is the implication, what should change now?
2. Materials and methods
2.1 The COVID-NMA database
2.2 Preprocessing the COVID-NMA database
2.3 The phylomemy reconstruction process
- 1. Terms indexation. Using natural language processing algorithms and human validation, both handled by the free software Gargantext [], we first extract from an original corpus of documents (Fig. 1.0) a core vocabulary: a list of sets of equivalent expressions called roots (Fig. 1.1).
- 2. Similarity measure. Within each period of time, and on the basis of that period's co-occurrence matrix, we estimate the semantic similarity between roots using the confidence measure []. This step yields a temporal series of similarity graphs (Fig. 1.2).
- 3. Fields detection. For each period, a community detection algorithm (the frequent item set method []) is applied to detect subsets of densely connected roots within the similarity graphs. These subsets are called fields (Fig. 1.3), and their aggregated root expressions describe consistent research topics that were explored during a given period.
- 4. Intertemporal matching. A temporal matching algorithm is then applied to identify meaningful kinship connections between fields from one period of time to another, that is, fields that belong to the same research stream. Finally, we highlight the different research streams over time and call them branches of knowledge (Fig. 1.4).
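The four steps above can be illustrated with a minimal Python sketch on toy data. It is not the Gargantext implementation, and it rests on several assumptions: the confidence measure is taken in its usual association-rule form (co-occurrences divided by occurrences) and symmetrized with a maximum; fields are found by brute-force mining of maximal frequent item sets directly on the documents rather than on the similarity graph; and intertemporal matching uses a plain Jaccard index with an arbitrary threshold. All function names, parameters, and the toy roots are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy corpus: each trial description is reduced to its set of roots (step 1).
period_1 = [{"mrna", "spike", "adult"}, {"mrna", "spike"},
            {"bcg", "healthcare workers"}, {"bcg", "healthcare workers"}]
period_2 = [{"mrna", "spike", "booster"}, {"mrna", "spike", "booster"},
            {"bcg", "elderly"}]


def cooccurrences(docs):
    """Count root occurrences and pairwise co-occurrences for one period."""
    occ, cooc = Counter(), Counter()
    for roots in docs:
        occ.update(roots)
        cooc.update(frozenset(p) for p in combinations(sorted(roots), 2))
    return occ, cooc


def confidence(occ, cooc, a, b):
    """Assumed form of the confidence measure: max of P(a|b) and P(b|a)."""
    n_ab = cooc[frozenset((a, b))]
    return max(n_ab / occ[a], n_ab / occ[b]) if n_ab else 0.0


def similarity_graph(docs, threshold=0.5):
    """Step 2: keep the pairs of roots whose confidence reaches the threshold."""
    occ, cooc = cooccurrences(docs)
    graph = {}
    for pair in cooc:
        a, b = tuple(pair)
        score = confidence(occ, cooc, a, b)
        if score >= threshold:
            graph[pair] = score
    return graph


def fields(docs, min_support=2, max_size=4):
    """Step 3 (simplified): maximal item sets found in >= min_support documents."""
    support = Counter()
    for roots in docs:
        for k in range(2, min(max_size, len(roots)) + 1):
            support.update(frozenset(c) for c in combinations(sorted(roots), k))
    frequent = [s for s, n in support.items() if n >= min_support]
    # Keep only item sets that are not strictly contained in another frequent one.
    return [s for s in frequent if not any(s < t for t in frequent)]


def jaccard(a, b):
    """Share of roots common to two fields."""
    return len(a & b) / len(a | b)


def match_periods(fields_by_period, threshold=0.5):
    """Step 4 (simplified): link fields of consecutive periods that overlap enough."""
    links = []
    for t in range(1, len(fields_by_period)):
        for child in fields_by_period[t]:
            for parent in fields_by_period[t - 1]:
                if jaccard(parent, child) >= threshold:
                    links.append((t - 1, parent, t, child))
    return links
```

Calling `match_periods([fields(period_1), fields(period_2)])` on this toy data links the field {mrna, spike} of the first period to {mrna, spike, booster} in the second; chains of such links are what the text calls branches of knowledge. The brute-force enumeration only suits small examples; a dedicated frequent item set miner would be needed at the scale of a real registry.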
2.4 Visualizing phylomemies
3. Description of the resulting phylomemy
4. Following the worldwide tracks of COVID-19 vaccines
4.1 General observations
4.2 Repurposing non-COVID vaccines
4.3 Heterologous vaccination
4.5 Filters and specific research questions
5.1 Why we need new approaches such as phylomemies?
- Integrating other COVID-NMA metadata. In consultation with epidemiologists, we have enriched our visualizations with the possibility to filter on a selected set of metadata such as participant characteristics (age, pregnancy, etc.) (Section 4.5). This list can be extended to all the structured fields of the COVID-NMA database by simply changing the preprocessing script (Section 2.2). Furthermore, as phylomemies are designed to reveal structure in unstructured data, the COVID-NMA database could even be enriched with new fields such as subcategories of vaccine trials: ‘heterologous’, ‘boosters’, etc. In due time, our approach could influence the way scientists share trial information in registries by standardizing new metadata and unstructured textual content.
- Working on COVID-19 treatments instead of vaccines. Switching from visualizing COVID-19 vaccines to visualizing treatments is mainly a matter of selecting the right field in the COVID-NMA database. We would still have to create a new core vocabulary (Section 2.2), but thanks to Gargantext (the free text-mining software used to annotate the vaccine descriptions upstream of the phylomemies), it would take only a few days to achieve this task collaboratively and annotate hundreds of trial descriptions. A preliminary study can be found in [
- Visualizing trials related to another disease. Since the beginning of this study, the process designed to fill and integrate the COVID-NMA database has evolved, making it easy to construct new databases gathering data on other kinds of diseases. Indeed, the process used to extract raw data from registries has a large generic step in which we extract the description of each trial, including information on its design, the inclusion/exclusion criteria for patients, the description of arms, and the set of outcomes. As an example, we have reconstructed the phylomemy of 1,798 trials related to Alzheimer disease and extracted from the WHO international registries. The resulting visualization can be explored at http://maps.gargantext.org/phylo/alzheimer/.
- Analyzing publication data. Phylomemies were first designed to visualize the content of scientific publications. Thanks to the recent integration of phylomemies into the free software Gargantext, one can already reconstruct the temporal structure of various corpora of thousands of articles extracted from PubMed or from the Web of Science. In the context of a pandemic like the COVID-19 pandemic, it would be relevant to explore in parallel both the phylomemy of trials and the phylomemy of articles or preprints to get a comprehensive view of the scientific landscape. A first exploration of the COVID-19 literature can be found at http://maps.gargantext.org/maps/covid-19 in the form of a semantic graph.
- Monitoring various types of clinical trials on the fly. Chained end to end, the continuous integration of WHO international registries into the COVID-NMA enrichment pipeline, followed by the phylomemy reconstruction process, would enable the dynamic analysis of any kind of trial content as it arises. Such an analytical workflow would require the creation of two teams: an integration team able to deal with the possible evolution of the registries, and an annotation team dedicated to the creation and update of core vocabularies. In return, it would allow better coordination of scientific teams around the world in all medical fields and accelerate medical discoveries.
5.4 Perspectives and insights for COVID-19 research
- Phylogenetic network analysis of SARS-CoV-2 genomes. Proc Natl Acad Sci U S A. 2020; 117: 9241-9243
- Early indicators of intensive care unit bed requirement during the COVID-19 epidemic: a retrospective study in Île-de-France region, France. PLoS One. 2020; 15: 1-12
- Research response to coronavirus disease 2019 needed better coordination and collaboration: a living mapping of registered trials. J Clin Epidemiol. 2021; 130: 107-116
- The COVID-NMA project: building an evidence ecosystem for the COVID-19 pandemic. Ann Intern Med. 2020; 173: 1015-1017
- Draw me science – multi-level and multi-scale reconstruction of knowledge dynamics with phylomemies. Scientometrics. 2021; 22: 1-31
- Delanoë A., Chavalarias D. Mining the digital society - Gargantext, a macroscope for collaborative analysis and exploration of textual corpora. Forthcoming.
- Phylomemetic patterns in science evolution—the rise and fall of scientific fields. PLoS One. 2013; 8: e54847
- Mapping general-specific noun relationships to WordNet hypernym/hyponym relations. in: International conference on knowledge engineering and knowledge management. Springer, Berlin, Heidelberg, 2008: 198-212
- LCM ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. in: FIMI. 2004
- Exploring, browsing and interacting with multi-level and multi-scale dynamics of knowledge. Inf Vis. 2021; 21: 17-37
- Old vaccines for new infections: exploiting innate immunity to control COVID-19 and prevent future pandemics. Proc Natl Acad Sci U S A. 2021; 118
- Considerations in boosting COVID-19 vaccine immune responses. Lancet. 2021; 398: 1377-1380
Guarantor: David Chavalarias, Complex Systems Institute of Paris Île-de-France, 113 Rue Nationale, 75013 Paris, France
Data availability: The original COVID-NMA database can be downloaded at covid-nma.com. The preprocessing script can be downloaded at https://doi.org/10.7910/DVN/JTRI7A. The full list of root terms is available at https://doi.org/10.7910/DVN/JTRI7A. The reconstructed phylomemy is available for live explorations at http://maps.gargantext.org/phylo/vaccines_publications_10_2021/ and downloadable at https://doi.org/10.7910/DVN/JTRI7A.