Original Article | Volume 126, P116-121, October 2020

The Emtree term “diagnostic test accuracy study” retrieved less than half of the diagnostic accuracy studies in Embase

  • Pema Gurung (1)
    Amsterdam UMC, University of Amsterdam, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam Public Health, Meibergdreef 9, Amsterdam, The Netherlands
  • Sahile Makineli (1)
    Amsterdam UMC, University of Amsterdam, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam Public Health, Meibergdreef 9, Amsterdam, The Netherlands
  • René Spijker
    Amsterdam UMC, University of Amsterdam, Medical Library, Amsterdam Public Health, Meibergdreef 9, Amsterdam, The Netherlands
    Cochrane Netherlands, Julius Center for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, The Netherlands
  • Mariska M.G. Leeflang (corresponding author; Tel.: +31(0)205666934)
    Amsterdam UMC, University of Amsterdam, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam Public Health, Meibergdreef 9, Amsterdam, The Netherlands

  (1) Both authors have contributed equally.
Open Access | Published: June 29, 2020 | DOI: https://doi.org/10.1016/j.jclinepi.2020.06.030

      Abstract

      Objectives

      Embase is a biomedical and pharmacological bibliographic database of published literature, produced by Elsevier. In 2011, Embase introduced the Emtree term “diagnostic test accuracy study,” after discussion with the diagnostic test accuracy (DTA) community of Cochrane. The aim of this study is to investigate the performance of this Emtree term when used to retrieve diagnostic accuracy studies.

      Study Design and Setting

      We first piloted a random selection of 1,000 titles from Embase and then repeated the process with 1,223 studies specifically limited to humans. Two researchers independently screened those for eligibility. From titles that were indicated as being relevant or potentially relevant by at least one assessor, the full texts were retrieved and screened. A third researcher retrieved the Emtree terms for each title and checked whether “diagnostic test accuracy study” was one of the attached Emtree terms. The results of both exercises were then cross-classified, and sensitivity and specificity of the Emtree term were estimated.

      Results

Our pilot set consisted of 1,000 studies, of which 20 (2.0%) were studies from which DTA data could be extracted. Thirteen studies had the label DTA study, of which eight were indeed DTA studies. The final set consisted of 1,223 studies, of which 33 (2.7%) were DTA studies. Twenty studies were labeled as DTA study, of which fourteen indeed were DTA studies. This resulted in a sensitivity of 42.4% (95% CI: 25.5% to 60.8%) and a specificity of 99.5% (95% CI: 98.9% to 99.8%).

      Conclusion

Although we planned to include a more focused set of studies in our second attempt, the percentage of DTA studies was similar in both attempts. The DTA label failed to retrieve most of the DTA studies, and 30% of the studies labeled as a DTA study were in fact not DTA studies. The Emtree term DTA study therefore does not meet the requirements for accurately retrieving DTA studies.


      What is new?

        Key findings

      • The Emtree term “diagnostic test accuracy study” retrieved less than half of the diagnostic accuracy studies in Embase.

        What this adds to what is known?

      • The Emtree term “diagnostic test accuracy study” was introduced after discussions between Embase and Cochrane, with the aim of facilitating retrieval of diagnostic test accuracy studies.

        What is the implication and what should change now?

      • Adding specific labels alone is not the solution for retrieving diagnostic test accuracy studies.

      1. Introduction

      One of the aims of a systematic review is to retrieve all available scientific evidence about a clinical question [
      • Mulrow C.D.
      Rationale for systematic reviews.
      ]. One of the quality criteria of a systematic review is a comprehensive search strategy, using a broad range of search terms in at least two electronic databases [
      • Shea B.J.
      • Hamel C.
      • Wells G.A.
      • Bouter L.M.
      • Kristjansson E.
      • Grimshaw J.
      • et al.
      AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews.
      ,
      • Shea B.J.
      • Reeves B.C.
      • Wells G.
      • Thuku M.
      • Hamel C.
      • Moran J.
      • et al.
      Amstar 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both.
      ]. When the review is about interventions, the preferred study design to be included in the review is the randomized controlled trial. However, for most other review questions, the preferred study design is not that clear or at least not unambiguously described in the literature. This is, for example, the case for diagnostic test accuracy (DTA) studies.
      DTA is the ability of a diagnostic test to distinguish between people with and without a certain condition. The most common performance measures in studies investigating the accuracy of a test are sensitivity and specificity, predictive values, and receiver operating characteristic curves [
      • Knottnerus J.A.
      • Muris J.W.
      Assessment of the accuracy of diagnostic tests: the cross-sectional study.
      ,
      • Leeflang M.M.
      • Deeks J.J.
      • Gatsonis C.
      • Bossuyt P.M.
      Cochrane Diagnostic Test Accuracy Working Group
      Systematic reviews of diagnostic test accuracy.
      ]. However, the designs of these studies may differ widely and there is no commonly used term for them. DTA data usually come from cross-sectional study types, whereas authors of these studies often refer to them as ‘cohort’ studies or ‘case-control’ studies [
      • Leeflang M.M.
      • Deeks J.J.
      • Gatsonis C.
      • Bossuyt P.M.
      Cochrane Diagnostic Test Accuracy Working Group
      Systematic reviews of diagnostic test accuracy.
      ]. In addition, a term such as ‘sensitivity’ may have different meanings. Depending on the type of study, it may refer to the ability to distinguish between people with and without a certain condition (clinical sensitivity) or it may refer to the ability of a test to measure low levels of a certain compound (analytical sensitivity). A Medical Subject Heading (MeSH) such as ‘sensitivity and specificity’ is therefore neither specific nor sensitive enough to filter accuracy studies from, for example, Medline [
      • Petersen H.
      • Poon J.
      • Poon S.K.
      • Loy C.
      Increased workload for systematic review literature searches of diagnostic tests compared with treatments: challenges and opportunities.
      ,
      • Wilczynski N.L.
      • Haynes R.B.
      Indexing of diagnosis accuracy studies in MEDLINE and EMBASE.
      ].
      These factors complicate the search process of a DTA review. Authors doing such a review usually retrieve many more studies than authors of an intervention review, and they retrieve more irrelevant studies as well [
      • Beynon R.
      • Leeflang M.M.
      • McDonald S.
      • Eisinga A.
      • Mitchell R.L.
      • Whiting P.
      • et al.
      Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE.
      ]. To overcome these complications, active members of the Cochrane community asked Elsevier to introduce a specific label for DTA studies in their bibliographic database, Embase. In 2011, Embase introduced the term ‘diagnostic test accuracy study’ in its subject heading (Emtree) list. Embase's indexers read the full text of each article and assign index terms to it. These index terms are controlled by the Emtree thesaurus. Check tags include study types and age groups; their definitions are described by scope notes (see Text Box 1), and they are assigned using a checklist to ensure the highest possible consistency of indexing [
      Indexing Guide 2018
      A comprehensive guide to Embase’s indexing policy. Embase. Elsevier Life Sciences IP Limited.
      ].
      “Diagnostic test accuracy study” scope note
      The scope note reads as follows: diagnostic test accuracy study—Use for an original study or systematic review which assesses how accurately a test distinguishes humans (or animals) having a condition or disease from those who do not. Typically, the test under evaluation is called the index test and its results are compared to the results of the best available standard test (reference standard), which defines the condition or disease.
      Besides this check tag, there is also an Emtree term for diagnostic accuracy and one for sensitivity and specificity. However, these are not official check tags.
      The aim of this study is to investigate the performance of the check tag ‘diagnostic test accuracy study’ when used to retrieve diagnostic accuracy studies in Embase.

      2. Methods

      A search filter or search term can be regarded as analogous to a ‘diagnostic test’ that indicates whether a study is a diagnostic accuracy study or not, with the sensitivity of a search term being the percentage of DTA studies that is retrieved by the search term and the specificity being the percentage of nonaccuracy studies that is not retrieved by it. The positive predictive value (PPV), also referred to as precision in the information retrieval field, is the percentage of diagnostic accuracy studies among the studies labeled with the term ‘diagnostic accuracy study’ [
      • Haynes R.B.
      • Wilczynski N.
      • McKibbon K.A.
      • Walker C.J.
      • Sinclair J.C.
      Developing optimal search strategies for detecting clinically sound studies in MEDLINE.
      ]. These performance measures can only be estimated if one uses a gold standard for study retrieval and if a complete table of true-positive articles, false-positive articles, false-negative articles, and true-negative articles can be built.
      For the compilation of a gold standard set, we first piloted a random selection of 1,000 titles from Embase in 2015, to get an idea of the prevalence of diagnostic accuracy studies and of the prevalence of the check tag ‘diagnostic test accuracy study.’ To this end, we used the software environment R to randomly select 1,500 Embase accession numbers from records included in Embase between 1st January 2015 (number 2014984782) and 31st December 2015 (number 2015864718). This resulted in 1,245 unique records, and after removing letters and editorials, we included the first 1,000 records for screening. As this data set contained many studies about geology, chemistry, and other irrelevant fields, we decided to focus the actual study on those Embase-listed articles reporting studies performed in humans only.
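      The selection step described above was done in R; as a minimal sketch, the same draw can be expressed in Python. The seed is our own, hypothetical choice, and the subsequent matching of accession numbers to actual Embase records and the removal of letters and editorials are not shown.

```python
import random

# Accession-number range for records added to Embase in 2015,
# as reported in the Methods.
FIRST_2015 = 2014984782
LAST_2015 = 2015864718

def draw_accession_numbers(n=1500, seed=2015):
    """Draw n distinct accession numbers from the 2015 range,
    mirroring the R-based random selection described in the text."""
    rng = random.Random(seed)  # fixed seed: hypothetical, for reproducibility
    return rng.sample(range(FIRST_2015, LAST_2015 + 1), n)

numbers = draw_accession_numbers()
```

      Deduplication against the actual database records then reduced such a draw to the 1,245 unique records mentioned above.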
      For the final test set, we repeated the pilot process but randomly selected 2,000 Embase records included in 2016. Animal studies were excluded. After deduplication and removal of letters and editorials, 1,471 records remained. From these, we selected the human-only articles, resulting in a final set of 1,223 records.
      Each batch was first manually reviewed on the basis of title and abstract by two authors independently (M.M.G.L. + S.M. for the first set, P.G. + S.M. with M.M.G.L. as the arbiter for the second set). Of the included or unclear studies, and of those on which the assessors or the attached label disagreed, the full text was read. A diagnostic test accuracy (DTA) study was defined as a study from which a 2 × 2 contingency table to estimate the clinical sensitivity and specificity could be extracted. If a study only reported the sensitivity in a by-line and only included cases, it was not considered to be a DTA study; the same held for a narrative review mentioning sensitivity and specificity in a by-line. Another researcher (R.S.) retrieved the Emtree terms for each title and checked whether “diagnostic test accuracy study” was one of the attached Emtree terms. The full texts of the studies with disagreement between the authors and the Emtree label were also read. Our data are freely available at Zenodo (http://doi.org/10.5281/zenodo.3668455).
      Besides the check tag “diagnostic test accuracy study,” we also evaluated the performance of two other indexing terms, “diagnostic test accuracy” and “sensitivity and specificity,” and of a combination of the three terms using the Boolean operator OR.
      Sensitivity was calculated as the number of correctly labeled DTA studies divided by all DTA studies. Specificity was calculated as the number of non-DTA studies without a label divided by the total number of non-DTA studies. The PPV was calculated as the number of correctly labeled DTA studies divided by the total number of studies carrying that label.
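      As a worked check, a small Python sketch (our own illustration, not the authors' code) applies these three definitions to the cross-classification of the final set reported in Table 2:

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and PPV (precision) of a search term
    treated as a 'diagnostic test' for study retrieval."""
    sensitivity = tp / (tp + fn)  # correctly labeled DTA / all DTA studies
    specificity = tn / (tn + fp)  # unlabeled non-DTA / all non-DTA studies
    ppv = tp / (tp + fp)          # true DTA among all labeled studies
    return sensitivity, specificity, ppv

# Counts from Table 2: 14 correctly labeled DTA studies, 6 falsely labeled
# non-DTA studies, 19 missed DTA studies, 1,184 unlabeled non-DTA studies.
sens, spec, ppv = retrieval_metrics(tp=14, fp=6, fn=19, tn=1184)
# sens = 14/33 ≈ 42.4%, spec = 1184/1190 ≈ 99.5%, ppv = 14/20 = 70.0%
```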

      3. Results

      In our pilot experiment, we disagreed after title/abstract screening on the assessment of 26 articles and six were considered ‘unclear’ by both assessors. Another 17 were considered to be eligible by both assessors. After reading the full texts of these 49 articles, 20 (2.0%) articles were considered to be DTA studies and 980 were not (Table 1, Table 2). Of the 20 DTA studies, only eight were also labeled as such, which resulted in a sensitivity of 40.0% (95% CI: 19.1% to 64.0%). Of the 980 non-DTA studies, five were labeled as DTA study, which resulted in a specificity of 99.5% (95% CI: 98.8% to 99.8%). Of the 13 studies that were labeled as being a DTA study, only eight actually were, resulting in a PPV of 61.5% (95% CI: 36.5% to 81.7%).
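      The article does not state how the 95% confidence intervals were computed; the reported values are consistent with exact (Clopper-Pearson) intervals. As a hedged sketch, such an interval can be obtained from binomial tail probabilities with only the Python standard library, here for the pilot sensitivity of 8/20 (the bisection approach is our own illustration):

```python
from math import comb

def binom_tail_ge(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a proportion x/n,
    found by bisecting the binomial tail probabilities in p."""
    def solve(f):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection; ample precision for reporting
            mid = (lo + hi) / 2
            if f(mid):
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else solve(lambda p: binom_tail_ge(n, x, p) > alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: binom_tail_ge(n, x + 1, p) > 1 - alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(8, 20)  # pilot sensitivity: 8 of 20 DTA studies labeled
```

      This yields an interval of roughly 19% to 64%, in line with the reported 19.1% to 64.0%.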
      Table 1. In the first attempt, 20 (2.0%) articles were considered to be DTA studies, of which eight were also labeled as such
      Labeling of study | Diagnostic test accuracy | No diagnostic test accuracy | Total
      Labeled diagnostic test accuracy | 8 | 5 | 13
      Not labeled diagnostic test accuracy | 12 | 975 | 987
      Total | 20 | 980 | 1,000
      Abbreviation: DTA, diagnostic test accuracy.
      There were 980 non-DTA studies, of which five were labeled as DTA study.
      Table 2. In the second attempt, 33 (2.7%) articles were considered to be DTA studies, of which 14 were also labeled as such
      Labeling of study | Diagnostic test accuracy | No diagnostic test accuracy | Total
      Labeled diagnostic test accuracy | 14 | 6 | 20
      Not labeled diagnostic test accuracy | 19 | 1,184 | 1,203
      Total | 33 | 1,190 | 1,223
      Abbreviation: DTA, diagnostic test accuracy.
      There were 1,190 non-DTA studies, of which six were labeled as DTA study.
      In the final experiment, we had a total of 1,223 abstracts, of which 33 (2.7%) were DTA studies. The full texts of 79 studies were checked because of disagreements or because they were listed as a DTA study by either one assessor or one Emtree label. Of the 33 DTA studies, only 13 were directly indicated to be DTA studies by both assessors. Only 14 of the 33 DTA studies were labeled as such, resulting in a sensitivity of 42.4% (95% CI: 25.5% to 60.8%). Of the 1,190 non-DTA studies, six were labeled by the Embase indexers as being a DTA study. This resulted in a specificity of 99.5% (95% CI: 98.9% to 99.8%). The PPV was 70.0% (95% CI: 48.9% to 85.1%).
      In our final data set of 1,223 studies, we also assessed the performance of two other labels and combinations of the labels (Table 3). Nine studies were assigned the label “diagnostic test accuracy,” of which eight were true DTA studies. The sensitivity of this label was 24.2% (95% CI: 11.1% to 42.3%), the specificity was 99.9% (95% CI: 99.5% to 100%), and the PPV was 88.9% (95% CI: 50.7% to 98.4%). Thirty studies were assigned the label “sensitivity and specificity,” of which 21 were actual DTA studies. The sensitivity of this label was 63.6% (95% CI: 45.1% to 79.6%) and the specificity was 99.2% (95% CI: 98.6% to 99.7%). The PPV of this label was 70.0% (95% CI: 53.7% to 82.4%). When all labels were combined with OR, the sensitivity increased to 72.7% (95% CI: 54.2% to 86.1%), the specificity decreased slightly to 98.8% (95% CI: 98.0% to 99.3%), and the PPV was 63.1% (95% CI: 46.0% to 77.7%).
      Table 3. Results of assessment (1 = present, 0 = absent)
      DTA study as confirmed by assessors | DTA study label | DTA label | Sensitivity and specificity label | Number of records
      1 | 1 | 1 | 1 | 7
      1 | 1 | 1 | 0 | 1
      1 | 1 | 0 | 1 | 4
      1 | 0 | 1 | 1 | 0
      1 | 1 | 0 | 0 | 2
      1 | 0 | 1 | 0 | 0
      1 | 0 | 0 | 1 | 10
      1 | 0 | 0 | 0 | 9
      0 | 1 | 1 | 1 | 0
      0 | 1 | 1 | 0 | 1
      0 | 1 | 0 | 1 | 1
      0 | 0 | 1 | 1 | 0
      0 | 1 | 0 | 0 | 4
      0 | 0 | 1 | 0 | 0
      0 | 0 | 0 | 1 | 8
      0 | 0 | 0 | 0 | 1,176
      Abbreviation: DTA, diagnostic test accuracy.
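      The combined-label figures in the text can be reproduced from Table 3 with a short Python sketch (our own illustration; the pattern counts are transcribed from the table, with the all-zero row taken as 1,176 records so that the columns sum to the 33 DTA and 1,190 non-DTA studies of Table 2):

```python
# Each row of Table 3: (confirmed DTA, "DTA study" label, "DTA" label,
# "sensitivity and specificity" label, number of records).
rows = [
    (1, 1, 1, 1, 7),  (1, 1, 1, 0, 1), (1, 1, 0, 1, 4),  (1, 0, 1, 1, 0),
    (1, 1, 0, 0, 2),  (1, 0, 1, 0, 0), (1, 0, 0, 1, 10), (1, 0, 0, 0, 9),
    (0, 1, 1, 1, 0),  (0, 1, 1, 0, 1), (0, 1, 0, 1, 1),  (0, 0, 1, 1, 0),
    (0, 1, 0, 0, 4),  (0, 0, 1, 0, 0), (0, 0, 0, 1, 8),  (0, 0, 0, 0, 1176),
]

def combined_or_metrics(rows):
    """Performance of the three Emtree terms combined with OR:
    a record counts as retrieved if at least one label is present."""
    tp = fp = fn = tn = 0
    for is_dta, l1, l2, l3, n in rows:
        retrieved = l1 or l2 or l3
        if is_dta and retrieved:
            tp += n
        elif is_dta:
            fn += n
        elif retrieved:
            fp += n
        else:
            tn += n
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)

sens, spec, ppv = combined_or_metrics(rows)
# sens = 24/33 ≈ 72.7%, spec = 1176/1190 ≈ 98.8%, ppv = 24/38 ≈ 63%
```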

      3.1 Disagreements between assessors and labels

      We further explored the characteristics of the false-negative and false-positive studies in our final data set of 1,223 studies, to gain more insight into the reasons for wrongly labeling studies, or failing to label them, as diagnostic test accuracy studies (see Supplementary Tables 1 and 2). Of the 19 false-negative studies (that is, DTA studies that Emtree did not label as such), two were also rated by the two assessors as not being a DTA study. However, in the opinion of the third assessor, the titles suggested that DTA data might be present, which was confirmed by the full text. None of these studies used the term ‘accuracy’ in the title, but neither did most of the labeled studies. Four studies were agreement studies in which radiologists were involved and imaging techniques were used for diagnosis. However, at least one DTA outcome parameter (such as sensitivity, specificity, or a receiver operating characteristic curve) was reported in the abstract or full text. In addition, some studies used the term ‘prediction’ instead of ‘diagnosis,’ although they discriminated between persons who did or did not have the target condition rather than between persons who would or would not develop it.
      There were six false-positive studies in total: non-DTA studies labeled as DTA study. One of those was also indicated by the two assessors as being a DTA study, but reading the full text revealed that the study was not about accuracy at all. Three of these studies referred in their titles to the assessment of one or more diagnostic tests, but the results were not reported as accuracy measures, or it was not clear what the reported measures meant.

      4. Discussion

      The Emtree check tag “diagnostic test accuracy study” was developed in collaboration between Embase (Elsevier) and Cochrane, to facilitate the retrieval of diagnostic accuracy studies. Almost 10 years after its implementation, the term does not live up to its initial purpose.
      The sensitivity of the DTA study tag was 42.4% (95% CI: 25.5% to 60.8%), and the PPV was 70.0% (95% CI: 48.9% to 85.1%). This means that if one were to use the label as the main method to retrieve DTA studies, one would miss more than half of the DTA studies in the total body of records. Therefore, for systematic reviews, or for setting up a database of diagnostic test accuracy studies, this label will not be helpful. Even for a quick, practical search, it may not be good enough. Combining the three diagnostic accuracy–related labels in Embase would increase the sensitivity to 72.7%, which would still mean that over 25% of the DTA studies would be missed. For use in a systematic review, this would not be acceptable. However, for a scoping exercise, it might be helpful. In that case, one should expect to find about 63 DTA studies among every 100 studies checked.
      Cochrane's recommendation for systematic reviews of diagnostic accuracy studies is to compile the search strategy of terms for the index test and the target condition [
      • Leeflang M.M.
      • Deeks J.J.
      • Gatsonis C.
      • Bossuyt P.M.
      Cochrane Diagnostic Test Accuracy Working Group
      Systematic reviews of diagnostic test accuracy.
      ,
      • de Vet H.C.W.
      • Eisinga A.
      • Riphagen
      • Aertgeerts B.
      • Pewsner D.
      Chapter 7: searching for studies.
      ]. So-called search filters are not recommended, as they may miss relevant articles [
      • Beynon R.
      • Leeflang M.M.
      • McDonald S.
      • Eisinga A.
      • Mitchell R.L.
      • Whiting P.
      • et al.
      Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE.
      ]. Still, methodological terms, such as “diagnostic test accuracy study,” but also “predictive values” or “sensitivity and specificity,” may be added to the search strategy when the initial search yields too many titles and abstracts to be screened. Taking the results of this study into account, we would urge authors to be cautious when adding these labels in Embase.
      Most of the false-negative studies were in the development and exploration phase, where the main objective was to develop a test and explore its accuracy for a specific disease. In addition, imaging studies were often not labeled as being a DTA study. The reason could be that these studies involved observers and also presented results for the agreement between observers, next to accuracy data. This might have confused the Embase indexers.
      Most of the false-positive studies fulfilled the characteristics of a DTA study: assessment of the performance of an index test and detection or prediction of disease. For these reasons, these studies might have been classified as DTA studies. However, the outcomes were not reported in any of the DTA parameters or in a two-by-two table, or the study was clearly a prognostic study. For a systematic review, this may not be problematic, as authors may want to further check the full text of this type of study anyway. Authors may not want to run the risk of missing a study that does report the necessary data for a DTA review.
      Our study has some limitations. Although we used two gold standard annotated sets of over 2,000 studies combined, these sets may not reflect a real review situation, where a search is seldom limited to DTA studies in general. In a real review, any label or filter will always be combined with disease-specific terms. It is difficult to predict whether the evaluated labels would perform better or worse in such a situation, but the percentage of missed DTA studies is such that a considerable improvement in sensitivity would be needed to even consider using them. Another limitation may be that we did not systematically check all studies that had been excluded by both assessors in the first place. However, this could only have led to extra DTA studies without a label (as we checked all labeled studies) and so to a decrease in sensitivity, not to a change in our overall conclusion.
      We focused on Embase in this study and not on Medline. This may be seen as a limitation but is in line with the aim to investigate the introduction of a specific indexing term for DTA studies in Embase. Medline does not contain a specific indexing term for DTA studies, and therefore, a direct comparison was not possible. Medline has two MeSH terms that may relate to diagnostic test accuracy: “sensitivity and specificity” and “diagnostic errors.” When we checked the performance of these two terms in our DTA studies and a random subset of non-DTA studies, the MeSH term “diagnostic errors” was not present at all, and “sensitivity and specificity” was present in only eight of the 33 DTA studies in our final set.
      A solution for cumbersome search strategies in systematic reviews, or for collating data sets of DTA studies, may be found in artificial intelligence. Although several artificial intelligence methods seem to be capable of retrieving at least 95% of the relevant DTA studies in a systematic review and perform very well at updating systematic review databases, these methods are not yet fully available to the average review author [
      • Norman C.R.
      • Leeflang M.M.G.
      • Porcher R.
      • Névéol A.
      Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy.
      ,
      • Norman C.R.
      • Gargon E.
      • Leeflang M.M.G.
      • Névéol A.
      • Williamson P.R.
      Evaluation of an automatic article selection method for timelier updates of the comet core outcome set database.
      ]. Furthermore, their performance varies, especially for smaller DTA reviews, and the relevance of missing 5% is not clear yet.
      As long as automated methods for retrieving DTA studies from bibliographic databases in a reliable way are not easily available, review authors and anyone else searching for DTA studies will keep looking for ways to decrease the workload and the number needed to read. Using the Emtree term “diagnostic test accuracy study” seemed a very promising method when the term was implemented, but it turns out that it cannot live up to expectations. Training of Embase indexers may solve part of the problem, but poor reporting of DTA studies is perhaps more important. As the title and abstract are often used to retrieve studies from bibliographic databases and to filter studies in the first place, authors should pay attention to accurately reporting the design characteristics in the title and abstract. Reporting guidelines may support authors in doing so [
      • Bossuyt P.M.
      • Reitsma J.B.
      • Bruns D.E.
      • Gatsonis C.A.
      • Glasziou P.P.
      • Irwig L.
      • et al.
      STARD Group
      STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Version 2.
      ]. The paradox, however, is that as soon as authors report their titles, abstracts, and full texts in such a way that they can be accurately labeled by indexers, both individual review authors and artificial intelligence methods will also be better capable of retrieving these studies.
      In conclusion, the Emtree term “diagnostic test accuracy study” should not be used in isolation by authors of diagnostic test accuracy systematic reviews, as it misses more than half of the DTA studies in Embase.

      CRediT authorship contribution statement

      Pema Gurung: Investigation, Resources, Validation, Writing - review & editing. Sahile Makineli: Methodology, Investigation, Validation, Writing - review & editing. René Spijker: Methodology, Writing - review & editing, Software, Resources. Mariska M.G. Leeflang: Conceptualization, Methodology, Formal analysis, Writing - original draft, Supervision, Writing - review & editing, Data curation.

      Supplementary data

      References

        • Mulrow C.D.
        Rationale for systematic reviews.
        BMJ. 1994; 309: 597-599
        • Shea B.J.
        • Hamel C.
        • Wells G.A.
        • Bouter L.M.
        • Kristjansson E.
        • Grimshaw J.
        • et al.
        AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews.
        J Clin Epidemiol. 2009; 62: 1013-1020
        • Shea B.J.
        • Reeves B.C.
        • Wells G.
        • Thuku M.
        • Hamel C.
        • Moran J.
        • et al.
        Amstar 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both.
        BMJ. 2017; 358: j4008
        • Knottnerus J.A.
        • Muris J.W.
        Assessment of the accuracy of diagnostic tests: the cross-sectional study.
        J Clin Epidemiol. 2003; 56: 1118-1128
        • Leeflang M.M.
        • Deeks J.J.
        • Gatsonis C.
        • Bossuyt P.M.
        • Cochrane Diagnostic Test Accuracy Working Group
        Systematic reviews of diagnostic test accuracy.
        Ann Intern Med. 2008; 149: 889-897
        • Petersen H.
        • Poon J.
        • Poon S.K.
        • Loy C.
        Increased workload for systematic review literature searches of diagnostic tests compared with treatments: challenges and opportunities.
        JMIR Med Inform. 2014; 2: e11
        • Wilczynski N.L.
        • Haynes R.B.
        Indexing of diagnosis accuracy studies in MEDLINE and EMBASE.
        AMIA Annu Symp Proc. 2007; 2007: 801-805
        • Beynon R.
        • Leeflang M.M.
        • McDonald S.
        • Eisinga A.
        • Mitchell R.L.
        • Whiting P.
        • et al.
        Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE.
        Cochrane Database Syst Rev. 2013; : MR000022
        • Indexing Guide 2018
        A comprehensive guide to Embase’s indexing policy. Embase. Elsevier Life Sciences IP Limited.
        (Available at) (Accessed 30 November 2019)
        • Haynes R.B.
        • Wilczynski N.
        • McKibbon K.A.
        • Walker C.J.
        • Sinclair J.C.
        Developing optimal search strategies for detecting clinically sound studies in MEDLINE.
        J Am Med Inform Assoc. 1994; 1: 447-458
        • de Vet H.C.W.
        • Eisinga A.
        • Riphagen
        • Aertgeerts B.
        • Pewsner D.
        Chapter 7: searching for studies.
        in: Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 0.4 [updated September 2008]. The Cochrane Collaboration, London, UK2008
        • Norman C.R.
        • Leeflang M.M.G.
        • Porcher R.
        • Névéol A.
        Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy.
        Syst Rev. 2019; 8: 243
        • Norman C.R.
        • Gargon E.
        • Leeflang M.M.G.
        • Névéol A.
        • Williamson P.R.
        Evaluation of an automatic article selection method for timelier updates of the comet core outcome set database.
        Database (Oxford). 2019; 2019: baz109
        • Bossuyt P.M.
        • Reitsma J.B.
        • Bruns D.E.
        • Gatsonis C.A.
        • Glasziou P.P.
        • Irwig L.
        • et al.
        • STARD Group
        STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Version 2.
        BMJ. 2015; 351: h5527