A potential for seamless designs in diagnostic research could be identified

Open Access | Published: September 26, 2020 | DOI: https://doi.org/10.1016/j.jclinepi.2020.09.019

      Highlights

      • Seamless designs are not only fruitful in intervention research, but also in diagnostic research.
      • Studying change in management can easily be combined with studying outcomes in discordant pairs.
      • All steps from construction until determining comparative accuracy can be combined within one seamless design.
      • Detailed guidelines for seamless designs in diagnostic research should be developed.

      Abstract

      Background and Objective

New diagnostic tests to identify a well-established disease state must undergo a series of scientific studies, from test construction to finally demonstrating a societal impact. Traditionally, these studies are performed with substantial time gaps in between, resulting in a long period from the initial idea to roll-out in clinical practice, including reimbursement. Seamless designs, which allow a sequence of studies to be combined in one protocol, may hence accelerate this process. Currently, a systematic investigation of the potential of seamless designs in diagnostic research is lacking.

      Methods

      We identify major study types in diagnostic research and their basic characteristics with respect to the application of seamless designs. This information is used to identify major hurdles and opportunities for seamless designs.

      Results

      The following major study types were identified: Variable construction studies, cut point finding studies, variable value studies, single-arm accuracy studies, comparative accuracy studies, change-in-management studies, observational discordant pair studies, randomized discordant pair studies, and randomized diagnostic studies. The following characteristics were identified: Type of recruitment (case-control vs. population-based), application of a reference standard, inclusion of a comparator, paired or unpaired application of a comparator, assessment of patient-relevant outcomes, and possibility for blinding of test results.
Two basic hurdles could be identified: 1) Accuracy studies are hard to combine with postaccuracy studies in a seamless design, because the former are required to justify the latter and because the application of a reference test in outcome studies may threaten the integrity of the study. 2) Randomized diagnostic studies are probably best placed as singular studies at the end of the process, as all other questions should be clarified before performing such a study.
      However, otherwise there is a substantial potential for seamless designs. All steps from the construction to the comparison with the comparator can be combined in one protocol. This may include a switch from case-control to population-based recruitment as well as a switch from a single-arm study to a comparative accuracy study. In addition, change-in-management studies can be combined with an outcome study in discordant pairs.

      Conclusion

There is a potential for seamless designs in diagnostic research. It is wise to have the whole sequence of necessary studies in mind and to plan a full programme rather than individual studies one by one.


      What is new?

         Key findings

      • There is a potential for seamless designs in diagnostic research.
      • This holds in particular for the phase from construction to assessing comparative accuracy, which can be addressed in one study with a seamless design.
      • Studying change in management can easily be combined with studying outcomes in discordant pairs.
      • Some statistical issues arise in seamless designs in diagnostic research, but they seem to be manageable.

         What this adds to what was known?

      • Seamless designs are already used in intervention research, but they also offer some potential in diagnostic research.

         What is the implication and what should change now?

      • In planning studies to establish a new diagnostic test, seamless designs should be taken into consideration.
      • Detailed guidelines for addressing the statistical issues arising in seamless designs in diagnostic research should be developed.

      1. Introduction

      Acceleration of the research process is desirable for many reasons, and different attempts have been made to approach this [
      • Woodcock J.
      • Woosley R.
      The critical path initiative and its influence on new drug development.
      ,
      • Reed J.C.
      • White E.L.
      • Aubé J.
      • Lindsley C.
      • Li M.
      • Sklar L.
      • et al.
      The NIH's role in accelerating translational sciences.
      ,
      • Abugessaisa I.
      • Saevarsdottir S.
      • Tsipras G.
      • Lindblad S.
      • Sandin C.
      • Nikamo P.
      • et al.
      Accelerating translational research by clinically driven development of an informatics platform – a case study.
]. Seamless designs are one such approach; they aim to combine in a single study different study types that are traditionally performed sequentially, often with substantial time gaps in between [
      • Bauer P.
      • Bretz F.
      • Dragalin V.
      • König F.
      • Wassmer G.
      Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls.
      ,
      • Bretz F.
      • Schmidli H.
      • König F.
      • Racine A.
      • Maurer W.
      Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: general concepts.
      ,
      • Schmidli H.
      • Bretz F.
      • Racine A.
      • Maurer W.
      Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: applications and practical considerations.
      ]. Seamless designs have not been widely discussed in diagnostic research. In this article, we try to identify the potential for seamless designs in diagnostic research.
The focus of this article is research on new diagnostic tests that try to improve the distinction between two well-established (disease) states in a well-defined clinical situation relative to the current possibilities. This situation is different from the development of a new (bio)marker that simultaneously defines a previously unknown (disease) state. Consequently, we do not consider enrichment and interaction designs, which play a prominent role in biomarker research.
      In the following we will distinguish three types of seamless designs:
      • Type A: The second study starts immediately after the first study, but no data are shared between the two studies in the analysis. Only the logistics to perform the two studies are shared.
      • Type B: The second study starts immediately after the first study, and some data already collected in the first study are (re)used in the analysis of the second study.
      • Type C: The two studies are running in parallel (or with some overlap), and the data collected are used in both study-specific analyses.
We start with an overview of the most frequently used study types in diagnostic research, focusing on the basic setup, the data to be collected, and the role in the research process. We then systematically discuss the conceptual possibility of combining two or more study types in a seamless design. This covers possibilities already used today as well as possibilities that have largely been neglected. We illustrate some of the possibilities using worked hypothetical examples and present two real examples from the literature. In the supplement, we also address some statistical issues.

      2. Primary considerations

      2.1 Study types in diagnostic research

Table 1 summarizes the major existing study types in diagnostic research. Accuracy studies are typically found in the middle of the research process. Whether they are preceded by test construction studies depends on the nature of the test. If the test is based on a continuous marker, we may wish to evaluate the value of the marker, and there is typically a need to find a cut point. If it is based on several variables, there is typically a need to find the optimal combination. There may also be several potential variants or other modifications requiring some type of test construction study.
      Table 1. Major existing study types in diagnostic research

      I. Construction studies

      Variable construction study
      Aim: To find the optimal way to combine various sources of information into one diagnostic variable
      Basic setup:
      • All necessary information is collected for each participant, and the reference test is applied
      • The best combination into one variable is determined by maximizing a diagnostic value measure (e.g., AUC)
      Rec: CC/P; Ref: +; C: -; O: -

      Cut point finding study
      Aim: To determine an adequate cut point for a diagnostic variable
      Basic setup:
      • The diagnostic variable and the reference test are applied in all participants
      • The optimal cut point is chosen in accordance with a prespecified criterion
      Rec: CC/P; Ref: +; C: -; O: -

      II. Accuracy studies

      Variable value study
      Aim: To determine the diagnostic value of a variable
      Basic setup:
      • The diagnostic variable and the reference test are applied in all participants
      • Diagnostic value measures are computed
      Rec: CC/P; Ref: +; C: -; O: -

      Single-arm accuracy study
      Aim: To determine the accuracy of a single test
      Basic setup:
      • The index test and reference test are applied in all participants
      • Accuracy measures are computed for the index test
      Rec: CC/P; Ref: +; C: -; O: -

      Comparative accuracy studies
      Aim: To compare the accuracy between two tests
      Basic setup:
      • The index test and/or comparator are applied in all participants
      • Accuracy measures are computed for the index test and the comparator
      Rec: CC/P; Ref: +; C: u/p; O: -

      III. Change-in-management studies

      Change-in-management studies
      Aim: To investigate the change in management when replacing the comparator test with the index test
      Basic setup:
      • In all participants, first the comparator is applied
      • A management decision is made based on the result
      • Then the index test is applied
      • A management decision is made based on the (new) result
      • The frequencies of changes/the different types of changes are computed
      Rec: P; Ref: -; C: p; O: -

      IV. Outcome studies

      Observational discordant pair studies
      Aim: To investigate the outcome in patients with discordant results when following the result of the index test
      Basic setup:
      • In all participants, the comparator and the index test are applied
      • In one or both types of discordant results, the management is arranged in accordance with the index test
      • The outcome in this group/these groups is compared with an expectation
      Rec: P; Ref: -; C: p; O: +

      Randomized discordant pair studies
      Aim: To investigate whether it is an advantage to follow the index test in discordant pairs with respect to the outcome
      Basic setup:
      • In all participants, the comparator and the index test are applied
      • In discordant pairs, the patients are randomized to disclose/follow the results of either the index or the comparator test
      • The outcome is compared between the two arms
      Rec: P; Ref: -; C: p; O: +

      Randomized diagnostic studies
      Aim: To investigate whether the index test is superior to the comparator with respect to the outcome
      Basic setup:
      • Patients are randomized with respect to applying the index test or the comparator
      • The outcome is compared between the two arms
      Rec: P; Ref: -; C: u; O: +
      Rec, recruitment; CC, case-control recruitment; P, population-based recruitment; Ref, application of reference standard; C, application of the comparator; p, paired application; u, unpaired application; O, assessment of (patient-relevant) outcomes; +, yes; -, no.
      Accuracy studies can be performed as single-arm studies or comparative studies, using a paired or (randomized) unpaired design. Comparative studies using the current diagnostic standard as the comparator are of particular interest.
The study types we apply after establishing a sufficient accuracy in comparison with the current standard also vary greatly from test to test. If the patients' benefit from changing false test decisions to true test decisions (e.g., based on a linked evidence approach [
      • Merlin T.
      • Lehman S.
      • Hiller J.E.
      • Ryan P.
      The "linked evidence approach" to assess medical tests: a critical analysis.
]) is unequivocal, there is not much need for further research. However, there may be doubts about this for different reasons, in particular if the clinical pathways implied by test results are poorly defined or are not always adopted. If it is unclear whether or how clinicians will make use of a differing test result, change-in-management studies can clarify this question (or such studies are required by regulators, e.g., in the United States, [
      • Tunis S.
      • Whicher D.
      The National Oncologic PET Registry: lessons learned for coverage with evidence development.
]). It may also be unclear whether patients benefit from a change in treatment as suggested by a move from a false negative test result to a true positive test result. For example, the new test may also identify patients with few symptoms or at an early disease stage who may not benefit from the management established for patients with distinct disease. In case the latter receive a palliative treatment, this could mean withholding effective curative treatment from the patient. Then studies in patients with discordant test results can clarify whether the expectations are met. The consequences of replacing an established test with a new one can be complex and unpredictable, for example, if we replace a test based on clinical symptoms with an imaging-based test. Then it might be necessary to check the true impact in a randomized design comparing the two tests and considering patient-relevant outcomes.
      We consider in Table 1 some design features of the different study types, which already give first indications about the potential for seamless designs, as seamless designs are more likely to be feasible if the study types do not differ in these features. The first feature is the recruitment of patients, which is preferably population-based with respect to the clinical population of interest [
      • Knottnerus J.A.
      • Muris J.W.
      Assessment of the accuracy of diagnostic tests: the cross-sectional study.
]. However, whenever we would like to assess diagnostic accuracy in a population with a low prevalence of one disease state, we need large patient numbers to obtain a sufficient number of cases or controls, respectively. Hence, in particular in the early phase, we may prefer a case-control approach, which implies recruiting patients from different sources according to the two disease states, even if there is a risk of overoptimistic results due to considering extreme cases and/or controls [
      • Rutjes A.S.W.
      • Reitsma J.B.
      • Vandenbroucke J.P.
      • Glas A.S.
      • Bossuyt P.M.M.
      Case–control and two-Gate designs in diagnostic accuracy studies.
]. The second is the application of a reference test. The third is the application of a comparator, which can happen in a paired or an unpaired fashion. Finally, we consider whether patient-relevant outcomes are assessed.
      Compared with the hierarchical classification presented by Fryback and Thornbury [
      • Fryback D.G.
      • Thornbury J.R.
      The efficacy of diagnostic imaging.
      ], we omit the levels “technical efficacy” and “societal efficacy” (Fig. 1). Studies on technical efficacy typically consider a premature version of the index test, and studies on societal efficacy are typically performed after having established a diagnostic test. In addition, the other levels do not coincide with our classification, as Fryback and Thornbury focus on different types of efficacy, whereas we focus on design aspects. We also differ from the classification into four phases suggested by Koebberling et al. [
      • Koebberling J.
      • Trampisch H.J.
      • Windeler J.
      Memorandum for the evaluation of diagnostic measures.
]. Phase 1 refers again to technical preinvestigations, whereas phases 2 and 3 cover accuracy studies, distinguishing between case-control and population-based sampling. Phase 4 covers all studies to be performed after establishing the accuracy of a test. Figure 1 contrasts the classifications by Fryback and Thornbury and by Koebberling et al. with the four major study types considered in this article.
      Fig. 1The hierarchical model of Fryback & Thornbury [
      • Fryback D.G.
      • Thornbury J.R.
      The efficacy of diagnostic imaging.
      ], the study types considered in this article, and the phases of evaluating diagnostic measures considered by Koebberling et al. [
      • Koebberling J.
      • Trampisch H.J.
      • Windeler J.
      Memorandum for the evaluation of diagnostic measures.
      ].

      2.2 Some further reflections on accuracy studies

Table 1 reveals that measuring outcomes is not the only difference between accuracy studies and outcome studies. An essential property of outcome studies is that treatment is arranged in accordance with the results of a single test. Therefore, the general question of how to combine accuracy and outcome studies requires a short reflection on the choice of treatment in accuracy studies. In general, there is little need to consider the choice of treatment in accuracy studies, as diagnosis precedes treatment. (The question of whether treatment may affect the validity of the reference test is a different one. This may have an impact on the interpretation of study results, but it should not impact treatment itself.) In general, patients included in an accuracy study should receive treatment in accordance with the current standard, including the current standard diagnostic setup. Typically, the current diagnostic standard is neither the index test nor the reference test nor both together. Consequently, treatment is typically not arranged in accordance with the test results in an accuracy study. Exceptions to this rule may open the possibility for seamless designs, which we will illustrate later by an artificial example.
      If we wish to enforce treatment to follow the results of a single test, one possibility is to blind clinicians and patients for the results of other tests, in particular for the results of the index test and/or the reference test. There is a wide range of blinding practices in accuracy studies. In some accuracy studies, the index test is blinded to avoid an undue impact of a test with unknown accuracy on patient management (see e.g., [
      • Zapf A.
      • Gwinner W.
      • Karch A.
      • Metzger J.
      • Haller H.
      • Koch A.
      Non-invasive diagnosis of acute rejection in renal transplant patients using mass spectrometry of urine samples - a multicentre phase 3 diagnostic accuracy study.
]). In other studies, the potential additional information from the index test is used as an ethical justification to expose patients to this test. A typical example is given by whole-body positron emission tomography/computed tomography (PET/CT) scans, which expose patients to radiation but provide additional information beyond the specific use as the index test. The situation is different with respect to the reference test. As this test should reflect the best diagnosis possible, it is hard to justify blinding. However, the timing of the test may provide another type of argument in favor of a limited influence on the management, with postmortem tests as the extreme case.

      3. Possibilities for seamless designs in diagnostic research

Table 1 already points to some possibilities for seamless designs: in some instances, studies that are traditionally conducted sequentially with some gap coincide in several design aspects and hence are candidates for seamless designs. We now consider these possibilities more systematically by pointing to conceptual or statistical challenges.

      3.1 Combining construction studies with accuracy studies

This is a straightforward idea, as both tasks require the same data and hence suggest applying a type C seamless design. In addition, this is actually carried out in many construction studies, that is, the value of the constructed diagnostic variable or the accuracy of the constructed test is reported, too. The only, but fundamental, challenge is that we must take into account that the data are used twice, both for construction and for the determination of the value/accuracy. It is well known that this is a source of overoptimistic results [
      • Rutjes A.W.
      • Reitsma J.B.
      • Di Nisio M.
      • Smidt N.
      • van Rijn J.C.
      • Bossuyt P.M.
      Evidence of bias and variation in diagnostic accuracy studies.
      ]. To overcome this, the standard approach in variable construction studies is the use of cross-validation or other resampling techniques to obtain a more realistic estimate [
      • Steyerberg E.W.
      • Harrell Jr., F.E.
      • Borsboom G.J.
      Internal validation of predictive models: efficiency of some procedures for logistic regression analysis.
      ,
      • Harrell Jr., F.E.
      • Lee K.L.
      • Mark D.B.
      Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors.
      ]. This is further discussed in Sections 1.1 and 1.2 in the Supplement.
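To illustrate the double use of data and the resampling remedy mentioned above, the following minimal sketch (our illustration, not taken from the Supplement) contrasts a naive resubstitution estimate of the AUC with a cross-validated estimate; the simulated data, the logistic model used for construction, and all parameter values are illustrative assumptions.

# Minimal sketch (not the authors' method): contrasting the naive "resubstitution"
# AUC with a cross-validated AUC when the same data are used both to construct the
# diagnostic variable and to estimate its value. All names and numbers are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import roc_auc_score

# Simulated construction-study data: several candidate markers, binary reference result
X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=1)

model = LogisticRegression(max_iter=1000)

# Naive estimate: construct the combined variable and evaluate it on the same data
naive_auc = roc_auc_score(y, model.fit(X, y).decision_function(X))

# Cross-validated estimate: the combined variable is always evaluated on held-out folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
cv_scores = cross_val_predict(model, X, y, cv=cv, method="decision_function")
cv_auc = roc_auc_score(y, cv_scores)

print(f"resubstitution AUC: {naive_auc:.3f}, cross-validated AUC: {cv_auc:.3f}")

In such simulated data, the resubstitution AUC typically exceeds the cross-validated AUC, mirroring the overoptimism discussed above.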

      3.2 Combining variable construction with the choice of a cut point

This is again a straightforward idea and is performed in many studies. Once we have constructed a diagnostic variable, we can apply the usual procedures for cut point construction to this variable, that is, we have a type C design. However, the fact that the variable has been constructed by maximizing the diagnostic value will most likely lead to a positively biased assessment of the resulting test. To the best of our knowledge, this impact has not been investigated in the literature. We can again evaluate the combined construction with an assessment of the accuracy. This is also covered by our considerations in Section 1.1 in the Supplement.
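As a concrete illustration of such a "usual procedure," the following minimal sketch (a hypothetical example, not the authors' algorithm) chooses the cut point for a constructed score so that a prespecified sensitivity is reached and reads off the corresponding specificity; the score distributions, the target sensitivity, and all names are assumptions.

# Minimal sketch (illustrative assumptions only): choosing a cut point for a
# constructed diagnostic variable so that a prespecified sensitivity (here 95%)
# is reached, and reading off the resulting specificity.
import numpy as np

def cut_point_for_sensitivity(scores, diseased, target_sens=0.95):
    """Return the highest cut point whose sensitivity is at least target_sens."""
    scores, diseased = np.asarray(scores, float), np.asarray(diseased, bool)
    candidates = np.sort(np.unique(scores))[::-1]   # high scores indicate disease
    for c in candidates:
        sens = np.mean(scores[diseased] >= c)
        if sens >= target_sens:
            spec = np.mean(scores[~diseased] < c)
            return c, sens, spec
    return None

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 1.0, 100), rng.normal(0.0, 1.0, 200)])
diseased = np.concatenate([np.ones(100, bool), np.zeros(200, bool)])
print(cut_point_for_sensitivity(scores, diseased))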

      3.3 Combining single-arm accuracy studies with comparative accuracy studies

      The limited value of single-arm accuracy studies and the high value of comparative accuracy studies have been emphasized in the literature, as only the latter build a basis for sound comparisons [
      • Leeflang M.M.
      • Moons K.G.
      • Reitsma J.B.
      • Zwinderman A.H.
      Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions.
      ,
      • Takwoingi Y.
      • Leeflang M.M.
      • Deeks J.J.
      Empirical evidence of the importance of comparative studies of diagnostic test accuracy.
      ]. Hence, we should in general aim at comparative accuracy studies and avoid single-arm studies. However, if the comparator is an invasive or expensive test or if it is not well defined, we may hesitate to apply it until we have some evidence for a promising accuracy of the index test.
Consequently, in a seamless design, we start with a single-arm study and continue until we have such evidence. Then we start the comparative accuracy study. Section 2.1 in the Supplement presents a worked example. It would be desirable to use a type B design, that is, to reuse the information on the accuracy of the index test from the single-arm stage, as this can substantially reduce the necessary sample size compared with using a type A design. However, in a naïve analysis, the information from the single-arm stage will be biased because of the application of a stopping rule. This and further statistical issues are discussed in Section 1.3 in the Supplement.
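The effect of the stopping rule can be made tangible with a small simulation. The following sketch (our illustration with arbitrarily chosen numbers, not the analysis from Section 1.3 of the Supplement) shows how naively pooling the single-arm stage with the comparative stage overestimates sensitivity when the design only continues after promising interim results.

# Minimal simulation sketch (assumed numbers): the single-arm stage continues to
# the comparative stage only if the observed sensitivity reaches a "go" threshold.
# Naively pooling the stage-1 data in the final analysis then overestimates sensitivity.
import numpy as np

rng = np.random.default_rng(42)
true_sens, n_stage1, n_stage2, go_threshold = 0.80, 50, 150, 0.80

pooled_estimates = []
for _ in range(20000):
    stage1 = rng.binomial(n_stage1, true_sens)
    if stage1 / n_stage1 < go_threshold:
        continue                      # stopped for futility: no comparative stage
    stage2 = rng.binomial(n_stage2, true_sens)
    pooled_estimates.append((stage1 + stage2) / (n_stage1 + n_stage2))

print(f"true sensitivity: {true_sens:.3f}")
print(f"mean naive pooled estimate in 'go' trials: {np.mean(pooled_estimates):.3f}")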

      3.4 Combining case-control recruitment with population-based recruitment

      Case-control recruitment is often chosen to obtain a first proof of principle for a new diagnostic test. In the case of success, the true value should then be established using a study with population-based recruitment. Except for the type of recruitment, everything else is the same; hence, the question about a type B or C seamless design arises in a natural way. Indeed, this may be a realistic idea, as the main motivation for case-control recruitment is often the low prevalence in the population of interest. This suggests that we can use a population-based recruitment from the beginning and combine it with an additional recruitment of cases from other sources. As soon as we have enough cases, we can analyze the data available in the spirit of a case-control accuracy study. In the case of promising results, we can continue with population-based recruitment only. The final analysis is then based on all patients recruited via population-based sampling but excluding the additional cases. Therefore, we use some of the patients already included in the analysis of the first stage. The impact of this reuse on the statistical analysis is discussed in Section 1.4 in the Supplement. This approach is promising if the case-control study can be analyzed quickly (i.e., we do not have to wait a long time for the reference standard to be performed and evaluated). Then in the case of a failure of the proof of principle, the population-based recruitment can be stopped rather early compared with performing a full fixed-sample study.
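The bookkeeping behind this recruitment scheme can be summarized as follows. The sketch below is a hypothetical illustration of the two analysis sets (interim analysis in the case-control spirit vs. final population-based analysis); all data structures and names are assumptions, not a prescribed implementation.

# Minimal sketch (illustrative assumptions only) of the two analysis sets in a
# seamless case-control -> population-based design: the interim "case-control"
# analysis uses the population-based recruits plus the additionally recruited
# cases, while the final analysis keeps only the population-based recruits.
from dataclasses import dataclass

@dataclass
class Participant:
    pid: int
    source: str          # "population" or "extra_case"
    diseased: bool
    index_positive: bool

def interim_set(participants):
    # case-control spirit: everyone recruited up to the interim analysis
    return list(participants)

def final_set(participants):
    # population-based accuracy analysis: drop the additionally sampled cases
    return [p for p in participants if p.source == "population"]

recruited = [
    Participant(1, "population", False, False),
    Participant(2, "population", True, True),
    Participant(3, "extra_case", True, True),
]
print(len(interim_set(recruited)), "participants in the interim analysis")
print(len(final_set(recruited)), "participants in the final analysis")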

      3.5 Combining accuracy studies with postaccuracy studies

We are confronted here with one fundamental conceptual issue: postaccuracy studies focus on the use of the test in clinical practice under the condition that its diagnostic value is already accepted in the clinical community. Usually, we can only expect such broad acceptance some time after an accuracy study, when the results are published and preferably confirmed by several accuracy studies. Hence, it will often be unlikely that we can move directly from an accuracy study to a postaccuracy study. With respect to a type A seamless design, an exception might be possible if all stakeholders agree, already before starting the accuracy study, on the questions to be addressed in subsequent studies, and the improvement in accuracy to be demonstrated is the only obstacle to starting them.

      3.6 Combining postaccuracy studies with accuracy studies

A different perspective would be to consider a planned postaccuracy study and to ask whether we can use it to simultaneously redo an accuracy study. This can make sense, as postaccuracy studies usually have a larger sample size than accuracy studies, and accuracy studies are usually powered only to demonstrate a sufficient accuracy in the whole population. Therefore, we may get a chance to obtain a more detailed picture of the accuracy by studying patient subpopulations or the dependence of accuracy on external factors such as the experience of raters or subtle differences in the application of the test.
However, it is a common feature of postaccuracy studies that they do not require the application of a reference test, whereas accuracy studies do. Therefore, one basic question is whether the design of a postaccuracy study allows us to also apply the reference standard in order to perform a type C seamless design. In change-in-management studies, this is no problem, as the reference test can be applied and communicated after the management decisions are made, and the management decisions may be altered again in light of the result of the reference test without affecting the study results. Section 2.2 in the Supplement presents a worked example illustrating this possibility for a seamless design. However, the situation is more challenging in outcome studies. Here, it is necessary to ensure that the results of the reference test do not have any influence on the patient management and consequently on the outcome. This will typically require blinding patients and the treating clinicians for the results of the reference test.

      3.7 Measuring outcomes in accuracy studies

It is a rather straightforward idea to also measure outcomes in accuracy studies, as the only additional requirement is to follow up the patients for some time. These data may then partly be used in a postaccuracy study or, at least, inform the planning of such a study. However, as we can see in Table 1, outcome studies intend to measure the outcome after arranging the management in dependence on a single test, either the index test or the comparator, and this holds both when considering all patients and when considering only discordant pairs. This is hard to achieve in accuracy studies, as already discussed in Section 2, because management should follow the current diagnostic standard, and this will never be the index test. In comparative studies, this may (or should) be the comparator. To ensure that patient management is only informed by the comparator, we still have to blind the results of the reference test and, except when using an unpaired design, the results of the index test.

      3.8 Combining change-in-management studies with studies in discordant pairs

It is rather natural to think about such combinations, as change-in-management studies allow us to identify discordant pairs. Moreover, as they also focus on these pairs, they are typically powered to identify a substantial number of such pairs. In addition, change-in-management studies do not require that the management is performed in accordance with the clinician's choice. Hence, there is also some freedom to arrange management in accordance with the study guidelines, which may include the application of a reference test. Consequently, both observational and randomized studies in discordant pairs can easily be added to a change-in-management study. Section 2.3 in the Supplement presents a worked example illustrating the combination with a randomized study in discordant pairs. The potential for combination with an observational study is illustrated by a real example in Section 4.2.
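The following sketch (a hypothetical flow, not a study protocol and not the worked example from the Supplement) illustrates how an embedded randomized discordant-pair study can piggyback on the data already collected in a change-in-management study; all field and function names are assumptions.

# Minimal sketch (hypothetical flow): within a change-in-management study, both
# tests are available for each participant, so discordant pairs can be identified
# and, if planned, randomized to follow either test.
import random

def plan_management(test_positive):
    return "treat" if test_positive else "watchful waiting"

def run_combined_study(participants):
    changes, discordant = 0, []
    for p in participants:
        plan_before = plan_management(p["comparator_result"])  # documented first
        plan_after = plan_management(p["index_result"])        # documented after index test
        if plan_before != plan_after:
            changes += 1
        if p["comparator_result"] != p["index_result"]:
            # embedded randomized discordant-pair study
            p["followed_test"] = random.choice(["index", "comparator"])
            discordant.append(p)
    return changes / len(participants), discordant

participants = [{"comparator_result": False, "index_result": True},
                {"comparator_result": True, "index_result": True}]
rate, discordant = run_combined_study(participants)
print(f"change-in-management rate: {rate:.2f}, discordant pairs: {len(discordant)}")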

      3.9 Combining randomized diagnostic studies with other types of postaccuracy studies

A look at Table 1 reveals that randomized diagnostic studies have one unique feature among all postaccuracy studies: they use an unpaired design with respect to applying the index test or the comparator. Adding the application of the alternative test or a reference test would be a threat to the integrity of the study if the test results are not blinded.
      In addition, the questions we try to answer with the other postaccuracy studies should be clarified before we start a randomized diagnostic study. Randomized diagnostic studies should give the final answer to the question of the relevance of the test regarding patient outcome.
      These arguments suggest that randomized diagnostic studies are preferably conducted as separate studies at the end of the development process and are not embedded in a seamless design.

      4. Examples of seamless designs in diagnostic research

      4.1 Developing a diagnostic screening tool for lumbar spinal stenosis (LSS)

      In the study of Jensen et al. [
      • Jensen R.K.
      • Lauridsen H.H.
      Development of a diagnostic screening questionnaire for lumbar spinal stenosis (LSS-Screen).
], 13 items to be included in a questionnaire-based screening tool were identified from a literature search. In this study, patients who approach the Spine Center of Southern Denmark with low back and/or leg pain are recruited. Because the expected prevalence of LSS is below 10% and it is unclear whether the 13 items provide a sufficient basis to construct a screening tool, it was decided to sample additional cases from surgical and medical departments at neighboring hospitals. As soon as 100 cases and 100 controls are recruited, an interim analysis will be performed. This is feasible, as the reference standard is the clinical diagnosis, which will be available at the end of each patient's visit to the Spine Center. In the interim analysis, a diagnostic score will be constructed based on the 13 items, and the specificity at a cut point corresponding to a sensitivity of 95% will be estimated using cross-validation. If the estimated specificity is above 68%, the study will continue as a population-based study at the Spine Center, including also patients recruited before the interim analysis. The threshold of 68% was chosen to guarantee a positive predictive value of 25% in the case of a prevalence of 10% [
      • Jensen R.K.
      • Lauridsen H.H.
      Development of a diagnostic screening questionnaire for lumbar spinal stenosis (LSS-Screen).
      ].
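The stated threshold can be verified with the standard formula for the positive predictive value, using the reported sensitivity Se = 0.95, prevalence π = 0.10, and specificity Sp = 0.68:

\[
\mathrm{PPV} = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)}
= \frac{0.95 \times 0.10}{0.95 \times 0.10 + (1-0.68) \times 0.90}
= \frac{0.095}{0.383} \approx 0.25 .
\]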

      4.2 PET/CT in patients with melanoma: combining change in management with observing outcomes in discordant pairs

      The University Hospital Tübingen runs a local oncologic PET/CT registry [
      • Pfannenberg C.
      • Gueckel B.
      • Wang L.
      • Gatidis S.
      • Olthof S.C.
      • Vach W.
      • et al.
      Practice-based evidence for the clinical benefit of PET/CT-results of the first oncologic PET/CT registry in Germany.
]. This registry covers all PET/CTs performed in oncologic patients, and clinicians have to document the intended management both before and after the PET/CT is performed. Forschner et al. [
      • Forschner A.
      • Olthof S.C.
      • Gückel B.
      • Martus P.
      • Vach W.
      • la Fougère C.
      • et al.
      Impact of 18F-FDG-PET/CT on surgical management in patients with advanced melanoma: an outcome based analysis.
] used this source to perform a change-in-management study in 333 patients with advanced melanoma. The study also included an investigation of the survival outcome stratified by the change in management, that is, in several subgroups of patients characterized by discordant results between the conventional diagnostic setup and the setup including PET/CT. Of particular interest is the group of patients for whom PET/CT suggested a move from surgical treatment to a watchful waiting strategy, as PET/CT could not identify any tumor. The study confirmed the expectation of excellent survival in this group of 20 patients, with a 2-year survival rate of 100%.

      4.3 Establishing superior accuracy and outcome of a new test relative to the reference standard (coinciding with current diagnostic standard)

      This is a hypothetical example to illustrate that in specific situations, a combination of an accuracy study and an outcome study in a seamless design of type B may be feasible.
The initial accuracy study is carried out in a single-arm design (the index test compared with the reference test). If the reference test can be applied in a timely manner, the treatment decision can be based on the reference test. Therefore, outcome data under management guided by the reference test can already be obtained in the accuracy study. An outcome study then follows the accuracy study. In this study, the treatment decision is based on the result of the index test or of the reference test (randomized allocation). Using the outcome data of the accuracy study, the sample size in the reference test arm can be reduced, leading to unbalanced sample sizes. In the extreme case, the reference test arm can even be dropped. This would result in a design in which the two cohorts (index vs. reference) would no longer be studied in parallel, but in series.
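As a rough numerical illustration of how the pre-existing outcome data could reduce the newly required reference arm, the following sketch uses a standard sample size calculation for comparing two proportions; all proportions and the number of available patients are assumptions, and the simple subtraction ignores the statistical subtleties (e.g., the nonrandomized origin of the stage-1 data) that would have to be addressed in practice.

# Minimal sketch (assumed numbers, not a prescribed analysis): outcome data for
# n_existing patients managed according to the reference test are already available
# from the accuracy study, so only the remainder of the reference arm would have to
# be newly randomized in the subsequent outcome study.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_index, p_reference = 0.75, 0.65   # assumed outcome proportions under each test
n_existing = 120                    # assumed reference-test outcome data from the accuracy study

effect = proportion_effectsize(p_index, p_reference)
n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                         power=0.80, ratio=1.0)

new_reference = max(0.0, n_per_arm - n_existing)
print(f"required per arm: {n_per_arm:.0f}; newly randomized to reference: {new_reference:.0f}")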
A possible application of the design described here would have been the studies of the diagnostic tool "fiberoptic endoscopic evaluation of swallowing" for the detection of dysphagia [
      • Langmore S.E.
      • Schatz K.
      • Olsen N.
      Endoscopic and videofluoroscopic evaluations of swallowing and aspiration.
      ,
      • Aviv J.E.
      Prospective, randomized outcome study of endoscopy versus modified barium swallow in patients with dysphagia.
]. Conducted about 10 years apart, the two studies first tested the superior accuracy and then the patient-relevant outcome.
      If a seamless design had been used here, time and financial resources could have been saved by combining the two studies.

      5. Discussion

      We could identify two basic hurdles with respect to the application of seamless designs:
      • 1)
        Accuracy studies are hard to combine with postaccuracy studies, as
        • the results of accuracy studies are typically required to justify postaccuracy studies
        • the application of a reference test in outcome studies may be a threat to the integrity of the study.
      • 2)
        Randomized diagnostic studies are probably best placed as singular studies at the end of the process, as all other questions should be clarified before performing such a study.
However, beyond these hurdles, the perspective of seamless designs suggests that we can reduce the number of studies to be performed in developing a new diagnostic test to at most three and, under favorable circumstances, to one:
      • We can start with one study combining all steps from construction to the comparison of accuracy with the comparator. This may include a switch from case-control to population-based recruitment as well as a switch from a single-arm study to a comparative accuracy study. Such switches pose some challenges for the statistical analyses, but they seem to be manageable. However, detailed recommendations for the statistical analysis still have to be developed.
      • If there is a need for further research after successfully establishing an improved accuracy compared with the current diagnostic standard, a combination of a change-in-management study with an outcome study in discordant pairs can provide further insights.
      • If the impact on patient outcomes of a switch to the index test is still unclear, a randomized diagnostic study may be necessary.
      Of course, it will be desirable to replicate some findings in independent studies; hence, we may be forced to double the number of necessary studies. In any case, the considerations suggest that seamless designs should not be confined to construction studies (Sections 3.1, 3.2), but seen as a fruitful opportunity in all phases of diagnostic research.
It should be emphasized that the focus of this article is on basic aspects of the different study types with respect to combining them in a seamless design. “Typical” features of the different study types can be debated, which, in turn, implies that there is always room for exceptions to the “rules” we have tried to establish. Whenever we start to think about a seamless design in a specific context, we should carefully check whether the arguments we have presented in favor of and against a seamless design also apply in this context.
Moreover, the focus of the article was the conceptual feasibility of seamless designs, motivated by the wish to accelerate diagnostic research. However, many additional considerations are necessary when considering the option of a seamless design. On the one hand, there may be more advantages than just avoiding gaps between studies. Type B and type C seamless designs imply a reduction in the overall sample size compared with conducting two single studies. In any case, using the same infrastructure for patient recruitment and study conduct may reduce the fixed costs of each study. On the other hand, seamless designs require a joint study protocol reflecting the added complexity, and a joint study proposal may hit the budget limit set by funding agencies for single applications. Finally, it can be seen as a disadvantage to use the same patient population twice during the research process, as this diminishes the generalizability of the results.

      6. Conclusions

      There is a potential for seamless designs in diagnostic research, mainly in the phase from construction until establishing comparative accuracy and in connection with change-in-management studies.

CRediT authorship contribution statement

      Werner Vach: Conceptualization, Methodology, Writing - original draft, Writing - review & editing, Visualization. Eric Bibiza: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing, Visualization. Oke Gerke: Conceptualization, Methodology, Writing - review & editing. Patrick M. Bossuyt: Conceptualization, Methodology, Writing - review & editing. Tim Friede: Conceptualization, Methodology, Writing - review & editing. Antonia Zapf: Conceptualization, Methodology, Writing - original draft, Writing - review & editing, Supervision.

      Appendix A. Supplementary data

      References

        • Woodcock J.
        • Woosley R.
        The critical path initiative and its influence on new drug development.
        Annu Rev Med. 2008; 59: 1-12
        • Reed J.C.
        • White E.L.
        • Aubé J.
        • Lindsley C.
        • Li M.
        • Sklar L.
        • et al.
        The NIH's role in accelerating translational sciences.
        Nat Biotechnol. 2012; 30: 16-19
        • Abugessaisa I.
        • Saevarsdottir S.
        • Tsipras G.
        • Lindblad S.
        • Sandin C.
        • Nikamo P.
        • et al.
        Accelerating translational research by clinically driven development of an informatics platform – a case study.
        PLoS One. 2014; 9: e104382
        • Bauer P.
        • Bretz F.
        • Dragalin V.
        • König F.
        • Wassmer G.
        Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls.
        Stat Med. 2016; 35: 325-347
        • Bretz F.
        • Schmidli H.
        • König F.
        • Racine A.
        • Maurer W.
        Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: general concepts.
        Biom J. 2006; 48: 623-634
        • Schmidli H.
        • Bretz F.
        • Racine A.
        • Maurer W.
        Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: applications and practical considerations.
        Biom J. 2006; 48: 635-643
        • Merlin T.
        • Lehman S.
        • Hiller J.E.
        • Ryan P.
        The "linked evidence approach" to assess medical tests: a critical analysis.
        Int J Technol Assess Health Care. 2013; 29: 343-350
        • Tunis S.
        • Whicher D.
        The National Oncologic PET Registry: lessons learned for coverage with evidence development.
        J Am Coll Radiol. 2009; 6: 360-365
        • Knottnerus J.A.
        • Muris J.W.
        Assessment of the accuracy of diagnostic tests: the cross-sectional study.
        J Clin Epidemiol. 2003; 56: 1118-1128
        • Rutjes A.S.W.
        • Reitsma J.B.
        • Vandenbroucke J.P.
        • Glas A.S.
        • Bossuyt P.M.M.
        Case–control and two-Gate designs in diagnostic accuracy studies.
        Clin Chem. 2005; 51: 1335-1341
        • Fryback D.G.
        • Thornbury J.R.
        The efficacy of diagnostic imaging.
        Med Decis Making. 1991; 11: 88-94
        • Koebberling J.
        • Trampisch H.J.
        • Windeler J.
        Memorandum for the evaluation of diagnostic measures.
        J Clin Chem Clin Biochem. 1990; 28: 873-879
        • Zapf A.
        • Gwinner W.
        • Karch A.
        • Metzger J.
        • Haller H.
        • Koch A.
        Non-invasive diagnosis of acute rejection in renal transplant patients using mass spectrometry of urine samples - a multicentre phase 3 diagnostic accuracy study.
        BMC Nephrol. 2015; 16: 153
        • Rutjes A.W.
        • Reitsma J.B.
        • Di Nisio M.
        • Smidt N.
        • van Rijn J.C.
        • Bossuyt P.M.
        Evidence of bias and variation in diagnostic accuracy studies.
        CMAJ. 2006; 174: 469-476
        • Steyerberg E.W.
        • Harrell Jr., F.E.
        • Borsboom G.J.
        Internal validation of predictive models: efficiency of some procedures for logistic regression analysis.
        J Clin Epidemiol. 2001; 54: 774-781
        • Harrell Jr., F.E.
        • Lee K.L.
        • Mark D.B.
        Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors.
        Stat Med. 1996; 15: 361-387
        • Leeflang M.M.
        • Moons K.G.
        • Reitsma J.B.
        • Zwinderman A.H.
        Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions.
        Clin Chem. 2008; 54: 729-737
        • Takwoingi Y.
        • Leeflang M.M.
        • Deeks J.J.
        Empirical evidence of the importance of comparative studies of diagnostic test accuracy.
        Ann Intern Med. 2013; 158: 544-554
        • Jensen R.K.
        • Lauridsen H.H.
        Development of a diagnostic screening questionnaire for lumbar spinal stenosis (LSS-Screen).
        ClinicalTrials.gov identifier: NCT03910335
        • Pfannenberg C.
        • Gueckel B.
        • Wang L.
        • Gatidis S.
        • Olthof S.C.
        • Vach W.
        • et al.
        Practice-based evidence for the clinical benefit of PET/CT-results of the first oncologic PET/CT registry in Germany.
        Eur J Nucl Med Mol Imaging. 2019; 46: 54-64
        • Forschner A.
        • Olthof S.C.
        • Gückel B.
        • Martus P.
        • Vach W.
        • la Fougère C.
        • et al.
        Impact of 18F-FDG-PET/CT on surgical management in patients with advanced melanoma: an outcome based analysis.
        Eur J Nucl Med Mol Imaging. 2017; 44: 1312-1318
        • Langmore S.E.
        • Schatz K.
        • Olsen N.
        Endoscopic and videofluoroscopic evaluations of swallowing and aspiration.
        Ann Otol Rhinol Laryngol. 1991; 100: 678-681
        • Aviv J.E.
        Prospective, randomized outcome study of endoscopy versus modified barium swallow in patients with dysphagia.
        Laryngoscope. 2000; 110: 563-574