
The PRECIS-2 tool has good interrater reliability and modest discriminant validity

  • Kirsty Loudon
    Corresponding author. Tel.: 01786466341.
    Affiliations
    Nursing Midwifery and Allied Health Professions Research Unit, Unit 13 Scion House, University of Stirling Innovation Park, Stirling FK9 4NF, UK
  • Merrick Zwarenstein
    Affiliations
    Centre for Studies in Family Medicine, Department of Family Medicine, Schulich School of Medicine & Dentistry, Western University, Western Centre for Public Health and Family Medicine, 1151 Richmond St, London, Ontario N6A 4K7, Canada
  • Frank M. Sullivan
    Affiliations
    School of Medicine, Medical & Biological Sciences, North Haugh, St Andrews KY16 9ST, UK

    North York General Hospital, 4001 Leslie Street, Toronto, Ontario M2K 1E1, Canada
  • Peter T. Donnan
    Affiliations
    Division of Population Health Sciences, School of Medicine, University of Dundee, The Mackenzie Building, Kirsty Semple Way, Dundee DD2 4BF, UK
  • Ildikó Gágyor
    Affiliations
    Department of General Practice, University Medical Center Göttingen, Humboldtallee 38, D-37073 Göttingen, Germany
  • Hans J.S.M. Hobbelen
    Affiliations
    Ageing and Healthcare Research Group, Healthy Ageing, Allied Health Care and Nursing, Centre of Expertise Healthy Ageing, Hanze University of Applied Sciences Groningen, Eyssoniusplein 18, Kamer A0.17, Postbus 3109 9701 DC Groningen, The Netherlands
  • Fernando Althabe
    Affiliations
    Departamento de Investigación en Salud de la Madre y el Niño, Instituto de Efectividad Clínica y Sanitaria (IECS), Dr Emilio Ravignani 2024 (C1414CPV), Buenos Aires, Argentina
  • Jerry A. Krishnan
    Affiliations
    Population Health Sciences, Office of the Vice Chancellor for Health Affairs, Medicine and Public Health, University of Illinois at Chicago, 1200 West Harrison St., Chicago, IL 60607, USA
  • Shaun Treweek
    Affiliations
    Health Services Research Unit, University of Aberdeen, 3rd Floor, Health Sciences Building, Foresterhill, Aberdeen AB25 2ZD, UK

      Abstract

      Objectives

      PRagmatic Explanatory Continuum Indicator Summary (PRECIS)-2 is a tool that could improve design insight for trialists. Our aim was to validate the PRECIS-2 tool, something not done for its predecessor, by testing its discriminant validity and interrater reliability.

      Study Design and Setting

      Over 80 international trialists, methodologists, clinicians, and policymakers created PRECIS-2, helping to ensure its face validity and content validity. The interrater reliability of PRECIS-2 was measured using 19 experienced trialists who used PRECIS-2 to score a diverse sample of 15 randomized controlled trial protocols. To test discriminant validity, two raters independently determined whether each trial protocol was more pragmatic or more explanatory, and the scores from the 19 raters for the 15 trials were used as predictors of pragmatism.

      Results

      Interrater reliability was generally good, with seven of nine domains having an intraclass correlation coefficient over 0.65. Flexibility (adherence) and recruitment had wide confidence intervals; raters found these domains difficult to rate and wanted more information. Each of the nine PRECIS-2 domains could be used to differentiate between trials taking more pragmatic or more explanatory approaches, with better than chance discrimination for all domains.

      Conclusion

      We have assessed the validity and reliability of PRECIS-2. An elaboration study and web site provide guidance for future users of the tool, which continues to be tested by trial teams, systematic reviewers, and funders.


      What is new?

        Key findings

      • The original PRagmatic Explanatory Continuum Indicator Summary (PRECIS) tool did not have its validity and reliability formally measured.
      • The interrater reliability of PRECIS-2 was measured using 19 raters (trialists from seven countries) to score a varied sample of 15 randomized controlled trial protocols.
      • Interrater reliability was generally good, with seven of nine domains having an intraclass correlation coefficient over 0.65.
      • Each of the nine PRECIS-2 domains could be used to differentiate between trials taking more pragmatic or more explanatory approaches with better than chance discrimination for all domains.
      • The validity and reliability of PRECIS-2 have been assessed.

        What is the implication?

      • PRECIS-2 has good interrater reliability and reasonable discriminant validity even when used retrospectively, and it can be used to assess the design approach taken (or being proposed) for a trial.

        What should change now?

      • By using PRECIS-2, trialists can enhance the transparency and reporting of their design approach, as well as help to reduce research waste by providing an opportunity to consider the consequences of design decisions on the usefulness of the trial results to their intended users.

      1. Introduction

      The aim of the original PRagmatic Explanatory Continuum Indicator Summary (PRECIS) tool [Thorpe et al., A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers (two publications)], and of the current PRECIS-2 tool [Loudon et al., The PRECIS-2 tool: designing trials that are fit for purpose], is to enable trialists to match their design decisions to the intended purpose of the trial. Some trials are conducted to understand how an intervention works (explanatory or efficacy trials), whereas others are intended to inform clinical and service delivery decisions in usual health care settings in the real world (pragmatic or effectiveness trials) [Thorpe et al. (two publications); Loudon et al., The PRECIS-2 tool]. Although the original PRECIS tool (2009) [Thorpe et al. (two publications)] was increasingly cited, it was criticized for the lack of interrater variability assessments and the absence of a rating scale for each domain [Koppenaal et al., Pragmatic vs. explanatory: an adaptation of the PRECIS tool helps to judge the applicability of systematic reviews for daily practice; Riddle et al., The Pragmatic-Explanatory Continuum Indicator Summary (PRECIS) instrument was useful for refining a randomized trial design: experiences from an investigative team; Witt et al., How well do randomized trials inform decision making: systematic review using comparative effectiveness research measures on acupuncture for back pain; Glasgow et al., Applying the PRECIS criteria to describe three effectiveness trials of weight loss in obese patients with comorbid conditions; Sanchez et al., A systematic review of eHealth cancer prevention and control interventions: new technology, same methods and designs?]. Users also wanted further explanation of the PRECIS domains so that they could use the tool effectively. The PRECIS-2 tool [Loudon et al., The PRECIS-2 tool], published in 2015, aimed to address these demands. It was the result of collaboration with over 80 international trialists, clinicians, and policymakers from 2011 to 2014, involving a two-round electronic Delphi, brainstorming meetings in Dundee, UK, and Toronto, Canada, and user testing of the PRECIS-2 tool with 19 individual trialists ranging from early career to experienced researchers [Loudon et al., The PRECIS-2 tool; Loudon et al., Making clinical trials more relevant: improving and validating the PRECIS tool for matching trial design decisions to trial purpose; Loudon, Making trials matter: providing an empirical basis for the selection of pragmatic design choices in clinical trials]. Like the original, PRECIS-2 was intended to be used prospectively, at the trial design stage, by a multidisciplinary team to prompt discussion of each design choice, and thus to ensure that the resulting trial design matches the intended question, whether pragmatic or explanatory.
      The nine PRECIS-2 domains are eligibility, recruitment, setting, organization, flexibility (delivery), flexibility (adherence), follow-up, primary outcome, and primary analysis. Each is scored, repeatedly as the trial protocol is developed, on a scale from “1” (very explanatory) to “5” (very pragmatic). A score of “1” indicates a highly explanatory design choice for that domain, suggesting that the trial is intended to test the intervention under idealized, tightly controlled conditions, whereas a “5” indicates a very pragmatic choice, testing the intervention under conditions close to routine clinical care. Once all nine domains are scored, trialists get a visual representation of their trial's design on a wheel (Fig. 1) and can see at a glance, across all domains, whether the trial is more pragmatic (scores mostly near the rim of the wheel) or more explanatory (scores close to the hub, or center, of the wheel).
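To make the scoring scheme concrete, here is a minimal sketch that stores one score per domain, checks that each is one of 1 to 5, and converts the scores to points on a wheel whose radius equals the score, so explanatory choices sit near the hub and pragmatic ones near the rim. The helper name `wheel_points` and the equal angular spacing of domains are illustrative assumptions, not part of the published tool or its official wheel layout.

```python
import math
from typing import List, Mapping, Tuple

# The nine PRECIS-2 domains, in the order listed in the text.
PRECIS2_DOMAINS = (
    "eligibility",
    "recruitment",
    "setting",
    "organization",
    "flexibility (delivery)",
    "flexibility (adherence)",
    "follow-up",
    "primary outcome",
    "primary analysis",
)


def wheel_points(scores: Mapping[str, int]) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: map domain scores (1-5) to (x, y) points on a wheel.

    Radius equals the score, so 1 (very explanatory) lies near the hub and
    5 (very pragmatic) at the rim. Equal angular spacing is an assumed layout.
    """
    missing = set(PRECIS2_DOMAINS) - set(scores)
    if missing:
        raise ValueError(f"unscored domains: {sorted(missing)}")
    points = []
    for i, domain in enumerate(PRECIS2_DOMAINS):
        score = scores[domain]
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"{domain}: score must be one of 1-5, got {score!r}")
        angle = 2 * math.pi * i / len(PRECIS2_DOMAINS)
        points.append((domain, score * math.cos(angle), score * math.sin(angle)))
    return points
```

Passing these points to any polar or scatter plot reproduces the hub-versus-rim reading described above; rejecting missing or out-of-range scores mirrors the requirement that all nine domains be scored on the 1 to 5 scale.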
      Fig. 1. The PRECIS-2 wheel. PRECIS, PRagmatic Explanatory Continuum Indicator Summary.
      Reprinted with permission from [Loudon et al., The PRECIS-2 tool: designing trials that are fit for purpose].
      The validity and reliability of the original PRECIS tool (2009) were not formally measured [Thorpe et al., A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers (two publications)]. Rather, it was presented as a concept, and readers were encouraged to try it out and develop PRECIS further. Some users did just this, applying the original PRECIS tool [Thorpe et al. (two publications)] prospectively at the trial design stage, as intended, but also retrospectively in systematic reviews and to assess trials that were past the design stage and already underway. Many had problems with interrater variability when multiple raters scored PRECIS domains for a trial. For example, in a systematic review, Koppenaal et al. tested a modified PRECIS tool on 20 trials using two raters and recommended using two raters to reduce subjectivity across all domains [Koppenaal et al., Pragmatic vs. explanatory: an adaptation of the PRECIS tool helps to judge the applicability of systematic reviews for daily practice]; Riddle et al. found that discussion helped their team of seven raters when they used the PRECIS tool as intended to design a trial: across all domains, the average standard deviation of individual PRECIS scores was 1.16 before discussion and 0.61 after discussion [Riddle et al., The Pragmatic-Explanatory Continuum Indicator Summary (PRECIS) instrument was useful for refining a randomized trial design: experiences from an investigative team]. In a systematic review of 10 trials, Witt et al. found low interrater reliability among five raters using PRECIS after the first round of scoring, but after a consensus meeting this improved, and there was usually only a one-point difference (on a five-point scale) for all domains [Witt et al., How well do randomized trials inform decision making: systematic review using comparative effectiveness research measures on acupuncture for back pain]. Glasgow et al. [Glasgow et al., Applying the PRECIS criteria to describe three effectiveness trials of weight loss in obese patients with comorbid conditions] assessed three studies with PRECIS using nine raters and found an intraclass correlation coefficient for domains of 0.72, whereas Sanchez et al. assessed 113 trials using two raters and reported weighted agreement scores for PRECIS ranging from 63.9% to 78.5%, with a median of 73.9% [Sanchez et al., A systematic review of eHealth cancer prevention and control interventions: new technology, same methods and designs?]. All of these studies have been small, with few raters and/or trials being rated; the most raters used were seven, but Riddle et al. scored only one trial (their own) [Riddle et al.]. This work highlights issues around interrater variability that we were keen to address by formally testing the validity and reliability of the PRECIS-2 tool.
      For PRECIS-2, we achieved face validity by consulting a large number of participants and potential users of the tool while creating and developing it. The modified tool, PRECIS-2, kept the simple format of the original but addressed its weaknesses through a scoring system, changes to the domains, and additional guidance. We felt, however, that it would be useful to assess the reliability and other aspects of validity of the new tool at the point it was developed, so that its strengths and limitations would be known early in its life. Moreover, having a validated tool might encourage more trialists to consider using it with their own trials. Within the time frame of K.L.'s PhD, it was not practical to assess quantitatively and prospectively the use of the PRECIS-2 tool by trial teams at the design stage of their trials (although some largely qualitative work was done [Loudon, Making trials matter: providing an empirical basis for the selection of pragmatic design choices in clinical trials]). To give an indication of validity and reliability, we therefore decided to use the tool retrospectively on trials that had already been published, recognizing that this would be a harder test for PRECIS-2: it would be used by raters unfamiliar with the trials they were assessing and without discussion between raters. In other words, it would provide a conservative estimate of validity and reliability.
      The aim of the work described here was to validate the PRECIS-2 tool. To ensure that PRECIS-2 could be used by different raters to design different trials across the spectrum of pragmatism, from very explanatory to very pragmatic, we tested its face validity, interrater reliability, and discriminant validity (the ability of the domains to determine pragmatism). We believed it was important that the participants be experienced trialists who could be future users of the PRECIS-2 tool. It was also important that the sample of trial protocols they assessed was varied, so that the tool was tested across a range of trial designs.

      2. Methods

      First, (1) we undertook a sample size calculation using the intraclass correlation coefficient; then (2) we selected the trials that would be used to test PRECIS-2. This was followed by (3) pilot-testing the materials and methods to make it as easy as possible for individual participants to take part in validity and reliability testing. (4) A purposive sample of trialists was then invited to participate in this project. (5) The interrater reliability of the nine PRECIS-2 domains was then analyzed. Finally, (6) we analyzed the discriminant validity of PRECIS-2, that is, its ability to determine pragmatism.

      2.1 Sample size

      The key requirement for assessing the reliability of PRECIS-2 was to ensure that enough raters were involved in testing the tool. The intraclass correlation coefficient (ICC) serves as a measure of interrater reliability (see Section 2.5). We expected an ICC near 0.7; Landis and Koch regard the range 0.61 to 0.80 as “substantial agreement.” Assuming an ICC in the region of 0.7, 15 raters rating 10, 15, or 20 trials would give precisions of ±0.20, ±0.17, and ±0.14, respectively. We aimed to give each of our 15 raters between 10 and 15 trials to rate.
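The precision figures above can be checked against one published approximation for the width of an ICC confidence interval, Bonett's (2002) sample-size formula, rearranged to give the interval width for a fixed number of trials. The text does not say which method was used, so this is an assumed reconstruction: with an ICC of 0.7, 15 raters, and 95% confidence, it gives half-widths of about 0.21, 0.17, and 0.14 for 10, 15, and 20 trials, within about 0.01 of the quoted values. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def icc_ci_half_width(rho: float, raters: int, subjects: int, conf: float = 0.95) -> float:
    """Approximate half-width of the confidence interval for an ICC.

    Rearranges Bonett's (2002) approximation
        n = 8 z^2 (1 - rho)^2 (1 + (k - 1) rho)^2 / (k (k - 1) w^2) + 1
    to solve for the full width w, given n subjects (here: trials)
    rated by k raters. Assumed method; the source does not name one.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    k, n = raters, subjects
    w_squared = (8 * z**2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2) / (
        k * (k - 1) * (n - 1)
    )
    return math.sqrt(w_squared) / 2


# Precision for 15 raters and an assumed ICC of 0.7, as in the sample size section.
for trials in (10, 15, 20):
    print(trials, round(icc_ci_half_width(0.7, raters=15, subjects=trials), 3))
```

Because the width shrinks roughly with the square root of the number of trials, going from 10 to 20 trials buys less precision than the doubling of rater workload might suggest, which is consistent with the decision to aim for 10 to 15 trials per rater.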

      2.2 Selection of the trials for assessment

      We needed a broad spectrum of trials for raters to rate independently using the PRECIS-2 tool. We decided to use trial protocols because they give more detailed information on trial design than final trial publications. We were given permission to access a database of example trial protocols, assembled from public web sites, journals, trial investigators, and industry sponsors by An-Wen Chan and Jennifer M Tetzlaff for SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) [Chan et al., SPIRIT 2013: new guidance for content of clinical trial protocols]. The SPIRIT guidance for protocol reporting was published in 2013 (http://www.equator-network.org/reporting-guidelines/spirit-2013-statement-defining-standard-protocol-items-for-clinical-trials/) in response to poor reporting of trial protocols.
      S.T. and K.L. independently screened the 150 SPIRIT protocols, excluding all protocols longer than 60 pages (approximately 10%) to reduce the burden on raters, who would have to read them. K.L. and S.T. each selected 20% of the SPIRIT database of trial protocols to cover different types of trials in terms of interventions (drug trials, therapy, or educational programs), settings (primary care, hospital, and community), countries, and journals of publication, including cluster randomized and factorial designs. Through discussion, they agreed on a final selection of 15 trial protocols (Supplementary 1 at www.jclinepi.com), published between 2008 and 2011.

      2.3 Internal testing of the materials and procedure to guide future use

      K.L. developed the training materials and procedures for reviewers on how to use PRECIS-2, covering the PRECIS-2 domains, their descriptions, and the scoring system, with examples of trials rated with PRECIS-2. These were pilot-tested for clarity and ease of use with four members of the PRECIS-2 development steering group and one independent primary care trialist (S.T., F.S., M.Z., I.G., and K.L.).
      The pilot testers used these materials to assess three contrasting trials, purposively selected to test the PRECIS-2 tool: a single-center factorial trial in the United States, the physiotherapy vs. corticosteroid trial [Rhon et al., A manual physical therapy approach versus subacromial corticosteroid injection for treatment of shoulder impingement syndrome: a protocol for a randomised clinical trial]; a behavior change cluster-randomized trial in India using Accredited Social Health Activists to improve maternal and neonatal health [Tripathy et al., Community mobilisation with women's groups facilitated by Accredited Social Health Activists (ASHAs) to improve maternal and newborn health in underserved areas of Jharkhand and Orissa: study protocol for a cluster-randomised controlled trial]; and a multicenter trial in the North of England and Scotland comparing surgical intervention with conventional medical treatment in children with recurrent sore throat [Bond et al., Protocol for north of England and Scotland study of tonsillectomy and adeno-tonsillectomy in children (NESSTAC). A pragmatic randomised controlled trial comparing surgical intervention with conventional medical treatment in children with recurrent sore throats] (Supplementary 1 at www.jclinepi.com).
      Our testing led to two changes. First, to speed up the PRECIS-2 learning process for raters, we produced a much shorter three-page information sheet. Second, we reduced the number of PRECIS-2 domains from 10 to 9 by removing “organization–comparison.” As the tool is primarily intended to help design trials that are useful for decision making in usual care, we simplified PRECIS-2 by always drawing the comparison with existing patterns of usual care or standard of care.

      2.4 Selection and invitation of participants

      Thirty-five personalized invitations (Fig. 2) were sent on September 24, 2013, to potential raters from six different groups: researchers who had been involved in an early stage of PRECIS-2 development (a Delphi); early user testers who had given feedback on initial versions of PRECIS-2; individuals who had participated in brainstorming meetings; methodologists in the Cochrane Methodology Review Group, the CONSORT group, the Scottish Clinical Trials Units, and the EU-funded DECIDE (Developing and Evaluating Communication Strategies to Support Informed Decisions and Practice based on Evidence) group; researchers who had worked with the original PRECIS tool; and editors of medical/trial journals. Sampling was purposive: for this retrospective assessment of trial protocols using PRECIS-2, we wanted experienced trialists and methodologists who would be able to commit the not inconsiderable time required to do the assessments.
      Fig. 2. Flow diagram of participants in validity and reliability testing.
      Nineteen researchers agreed to take part and were sent a concise PRECIS-2 training package comprising a three-page explanation sheet, a PRECIS-2 wheel, and a table that could be used for scoring trial protocols. Raters were sent 5 to 15 protocols each, in batches of five, with a new batch sent on receipt of the previous completed batch; the number of protocols depended on rater preference. As this was a significant burden on time, raters were initially offered £100 as a notional payment for about 4 hours of work; because many raters waived payment, we were able to increase the incentive to £200 for completing the assessment of 15 trial protocols using PRECIS-2.

      2.5 Statistical analysis of the interrater reliability of the nine PRECIS-2 domains

      The ICC is a relatively simple statistical measure for assessing variation and determining whether raters reached similar decisions about domain scores. As two factors could affect the rating, the raters and the trial protocols, we chose the two-way random-effects model, in which both rater effects (PRECIS-2 raters) and measure effects (trial protocols for scoring) are treated as random, that is, as random samples of all potential raters and trial protocols. To determine the effect of missing data, we undertook sensitivity analyses, imputing missing data in two ways: (1) using a score of “3” (equally pragmatic/explanatory) and (2) multiple imputation (10 imputations) in which randomly generated values of “1” to “5” were inserted for missing values.
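A minimal sketch of the two-way random-effects ICC and the two imputation schemes follows. It assumes the single-rater, absolute-agreement form ICC(2,1) of Shrout and Fleiss, computed from ANOVA mean squares; the text does not state which form SPSS reported, so that choice is an assumption. Rows are trial protocols, columns are raters, and missing scores are NaN. Pooling the 10 imputations by averaging the ICC point estimates is also an assumption, and all function names are hypothetical.

```python
import numpy as np


def icc2_1(scores) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is an (n trials x k raters) array with no missing values.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between trials
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return float(
        (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    )


def impute_fixed(scores, value: float = 3.0) -> np.ndarray:
    """Sensitivity analysis (1): replace every missing score with `value` ("3")."""
    x = np.array(scores, dtype=float)  # copy, so the input is not modified
    x[np.isnan(x)] = value
    return x


def icc_random_imputation(scores, n_imputations: int = 10, seed: int = 0) -> float:
    """Sensitivity analysis (2): fill missing scores with random integers 1-5,
    compute ICC(2,1) for each completed matrix, and average the estimates."""
    rng = np.random.default_rng(seed)
    x = np.array(scores, dtype=float)
    missing = np.isnan(x)
    estimates = []
    for _ in range(n_imputations):
        filled = x.copy()
        # integers(1, 6) draws from 1..5; the upper bound is exclusive.
        filled[missing] = rng.integers(1, 6, size=int(missing.sum()))
        estimates.append(icc2_1(filled))
    return float(np.mean(estimates))
```

As a sanity check, on the 6 × 4 example data of Shrout and Fleiss (1979) `icc2_1` gives about 0.29, their published ICC(2,1). The random scheme deliberately injects noise, so it tends to pull the ICC toward zero as the proportion of missing data grows.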

      2.6 Statistical analysis of the discriminant validity of PRECIS-2 to determine pragmatism

      Whereas interrater reliability describes the reliability of the tool in different hands, discriminant validity describes the ability of the tool itself to discriminate between pragmatic and explanatory trials. We wanted to evaluate discriminant validity by testing whether PRECIS-2 could accurately discriminate between trials of varying degrees of pragmatism (an ordinal variable), but discriminant validity is a binary concept and requires a gold standard against which the performance of the instrument is compared.
      Ideally, to assess discriminant validity, we would have asked participants to give a subjective global rating of the pragmatism of each trial after reading its protocol and then to use PRECIS-2 to rate its nine domains. However, because of the already significant burden on raters, this was not possible. We therefore assessed discriminant validity against our own (S.T. and K.L.) subjective global ratings (more pragmatic vs. more explanatory) of each of the 15 trial protocols. Two raters (K.L. and S.T.) independently rated the overall pragmatism of the 15 trials using binary scores: more pragmatic = “1,” more explanatory = “0.” This was done by judging the degree of pragmatism from the trial publication, with K.L. and S.T. then reaching consensus through discussion. We used this as an implicit gold standard against which to compare the median score of each PRECIS-2 domain, determined by as many as 18 raters, analyzed using binary logistic regression with the area under the receiver operating characteristic (ROC) curve (AUROC) as the measure of discriminant validity [Loudon, Making trials matter: providing an empirical basis for the selection of pragmatic design choices in clinical trials]. We used the Hosmer-Lemeshow goodness-of-fit statistic to assess calibration of the model. We saved the predicted probabilities for each domain and then used the ROC curve function to calculate the AUROC, which showed the sensitivity and specificity of the different PRECIS-2 domain variables at different cutoffs. Thus, using the predicted probability (PRE_1, PRE_2, etc.) as the test variable and pragmatism (more pragmatic, more explanatory) as the state variable, we calculated how well each PRECIS-2 domain predicts whether a trial was more or less pragmatic (using our subjective, consensus-based gold standard), displaying an ROC curve with a diagonal reference line, standard error, and confidence interval. SPSS (version 15.5; IBM Corporation, Armonk, NY, USA) was used for this analysis.
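The ROC step can be sketched outside SPSS. With a single predictor, the fitted logistic probability is a strictly increasing function of the domain's median score whenever the fitted slope is positive, so (under that assumption) the AUROC of the saved predicted probabilities equals the AUROC of the median scores themselves. The sketch computes the AUROC as the Mann-Whitney probability that a randomly chosen “more pragmatic” trial scores higher than a randomly chosen “more explanatory” one, counting ties as one half; 0.5 corresponds to chance discrimination. The scores, labels, and logistic coefficients are invented for illustration and are not data or estimates from this study.

```python
import math
from itertools import product
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUROC: P(score of a label-1 case > score of a label-0 case),
    with ties counted as 0.5. Labels: 1 = more pragmatic, 0 = more explanatory."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one trial in each group")
    wins = 0.0
    for p, q in product(pos, neg):
        if p > q:
            wins += 1.0
        elif p == q:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical median domain scores for eight trials and a consensus gold standard.
median_scores = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.5, 5.0]
gold_standard = [0, 0, 0, 1, 0, 1, 1, 1]

# A logistic model with any positive slope (coefficients here are arbitrary)
# is a monotone transform of the score, so it leaves the AUROC unchanged.
probabilities = [1 / (1 + math.exp(-(-3.0 + 1.2 * s))) for s in median_scores]

print(auroc(median_scores, gold_standard))  # 0.90625
print(auroc(probabilities, gold_standard))  # 0.90625
```

The logistic model still matters for calibration (the Hosmer-Lemeshow check), but for ranking trials, and hence for the AUROC, only the ordering of the median scores counts.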

      3. Results

      3.1 Participants

      After 10 weeks, 91% (32/35) of those invited had responded, and 54% (19/35) had returned ratings. The 19 raters came from seven countries: USA (8), UK (3), Canada (3), The Netherlands (2), Argentina (1), Australia (1), and Germany (1). Of these, seven raters scored 15 trials and 12 scored 10 trials (Supplementary 2 at www.jclinepi.com). Six of the 19 raters had assisted with the Delphi round, brainstorming, or user testing, and 4 of the 19 had assisted with methodological testing of the original PRECIS tool. The remaining nine raters had not previously been involved in the development of PRECIS or PRECIS-2.

      3.2 Results of the statistical analysis of the interrater reliability of the nine PRECIS-2 domains

      For seven of nine domains, interrater reliability was good or modest, with ICCs over 0.65 and tight confidence intervals. Two domains, flexibility (adherence) (ICC 0.57) and recruitment (ICC 0.60), had lower interrater reliability with wide confidence intervals. These results are based on the ICC estimates and confidence intervals for 10 trials and 12 raters, as this was closest to our sample size calculation (Table 1, Table 2). To assess the effect of missing data, we imputed values of “3” (equally pragmatic/explanatory) to indicate uncertainty in scoring (Table 1). We also randomly imputed values of “1” to “5” (Table 2). In Table 1, we looked at the ICC for five trials scored by 18 raters, 10 trials scored by 12 raters, and 15 trials scored by seven raters; generally, the ICC differed little between these batches of trials. The ICCs for the domains in Table 1, with imputed values of “3,” compared well with the ICCs for the PRECIS-2 domains in Table 2 for the complete set of 15 trials scored by 19 raters, in which random values of “1” to “5” were imputed for missing data (up to 38%).
      Table 1. Overall results for interrater reliability for the nine PRECIS-2 domains, including sensitivity analysis

      | Domain | Number of trials, raters | No. of imputed values = 3ᵃ (%) | Intraclass correlation | 95% CI lower bound | 95% CI upper bound |
      |---|---|---|---|---|---|
      | Eligibility | 15, 7 | 1 (0.74) | 0.88*** | 0.75 | 0.95 |
      | | 10, 12 | 2 (1.67) | 0.89*** | 0.76 | 0.97 |
      | | 5, 18 | 4 (4.4) | 0.94*** | 0.81 | 0.99 |
      | Recruitment | 15, 7 | 1 (0.74) | 0.59** | 0.18 | 0.84 |
      | | 10, 12 | 2 (1.67) | 0.60* | 0.10 | 0.88 |
      | | 5, 18 | 4 (4.4) | 0.83*** | 0.50 | 0.98 |
      | Setting | 15, 7 | 1 (0.95) | 0.80*** | 0.60 | 0.92 |
      | | 10, 12 | 2 (1.67) | 0.80*** | 0.56 | 0.94 |
      | | 5, 18 | 5 (4.4) | 0.92*** | 0.76 | 0.99 |
      | Organization | 15, 7 | 2 (1.99) | 0.72*** | 0.44 | 0.89 |
      | | 10, 12 | 9 (7.5) | 0.83*** | 0.61 | 0.95 |
      | | 5, 18 | 6 (5.55) | 0.75** | 0.25 | 0.97 |
      | Flexibility (delivery) | 15, 7 | 3 (3.33) | 0.74*** | 0.47 | 0.90 |
      | | 10, 12 | 6 (6.67) | 0.85*** | 0.67 | 0.96 |
      | | 5, 18 | 6 (6.67) | 0.92*** | 0.75 | 0.99 |
      | Flexibility (adherence) | 15, 5 | 0 | 0.50* | −0.06 | 0.81 |
      | | 15, 7 | 8 (7.62) | 0.24 ns | −0.54 | 0.70 |
      | | 10, 12 | 17 (15.74) | 0.57* | 0.04 | 0.88 |
      | | 5, 18 | 18 (15.56) | 0.72** | 0.16 | 0.97 |
      | Follow-up | 15, 7 | 1 (1.11) | 0.60** | 0.18 | 0.84 |
      | | 10, 12 | 8 (8.89) | 0.80*** | 0.55 | 0.94 |
      | | 5, 18 | 6 (6.67) | 0.85*** | 0.55 | 0.98 |
      | Primary outcome | 15, 7 | 0 | 0.44 ns | −0.13 | 0.78 |
      | | 10, 12 | 1 (1.11) | 0.66** | 0.24 | 0.90 |
      | | 5, 18 | 3 (3.33) | 0.84*** | 0.54 | 0.98 |
      | Primary analysis | 15, 7 | 0 | 0.67*** | 0.32 | 0.87 |
      | | 10, 12 | 3 (3.33) | 0.73*** | 0.39 | 0.92 |
      | | 5, 18 | 5 (5.55) | 0.83*** | 0.50 | 0.98 |
      Abbreviations: PRECIS, PRagmatic Explanatory Continuum Indicator Summary; ns, not significant.
      Trials were scored in batches of 5; there were overall 19 raters but one rater asked for 10 trials but only scored the second batch of trials so did not score the first five trials that 18 other raters scored.
      *P value 0.01 to 0.05 significant; **P value 0.001 to 0.01 very significant; ***P value 0.0001 to 0.001 extremely significant; P value ≥ 0.05 not significant.
      a PRECIS-2 score = 3 equally pragmatic/explanatory chosen as score to indicate uncertainty in scoring.
      Table 2. Rater scoring using randomly generated values (1–5) to impute missing data for PRECIS-2 domains, using responses for 15 trials by 19 raters
      Domain | Missing dataᵃ (%) | Intraclass correlation | 95% CI, lower bound | 95% CI, upper bound | Significance
      Eligibility | 33 | 0.84 | 0.69 | 0.94 | ***
      Recruitment | 33 | 0.58 | 0.20 | 0.84 | **
      Setting | 34 | 0.79 | 0.60 | 0.92 | ***
      Organization | 36 | 0.72 | 0.46 | 0.89 | ***
      Flexibility delivery | 35 | 0.80 | 0.62 | 0.92 | ***
      Flexibility adherence | 38 | 0.54 | 0.12 | 0.82 | *
      Follow-up | 34 | 0.71 | 0.44 | 0.88 | ***
      Primary outcome | 34 | 0.68 | 0.38 | 0.87 | ***
      Primary analysis | 34 | 0.67 | 0.37 | 0.84 | ***
      Abbreviations: PRECIS, PRagmatic Explanatory Continuum Indicator Summary; CI, confidence interval; ns, not significant.
      *P = 0.01 to 0.05, significant; **P = 0.001 to 0.01, very significant; ***P = 0.0001 to 0.001, extremely significant; P ≥ 0.05, not significant.
      ᵃ Approximately 93% of missing data are due to trials not being scored at all by raters.
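      The reliability analysis and the two imputation strategies described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the ICC form used in the study is not specified in this section, so the sketch assumes the two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1), computed on a trials × raters matrix of PRECIS-2 scores from 1 to 5. The helper names `icc_2_1`, `impute_midpoint`, and `impute_random` and the example ratings are hypothetical; missing ratings are represented as NaN, and confidence intervals and P values are omitted.

```python
import math
import random


def icc_2_1(scores: list[list[float]]) -> float:
    """Assumed ICC form: two-way random effects, absolute agreement, single rater.

    `scores` is a complete trials x raters matrix (no missing values).
    """
    n = len(scores)     # number of trials (rows)
    k = len(scores[0])  # number of raters (columns)
    grand = sum(x for row in scores for x in row) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: trials, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def impute_midpoint(scores: list[list[float]]) -> list[list[float]]:
    """Replace missing ratings (NaN) with 3, 'equally pragmatic/explanatory'."""
    return [[3.0 if math.isnan(x) else x for x in row] for row in scores]


def impute_random(scores: list[list[float]], seed: int = 0) -> list[list[float]]:
    """Replace missing ratings (NaN) with integers drawn uniformly from 1 to 5."""
    rng = random.Random(seed)
    return [
        [float(rng.randint(1, 5)) if math.isnan(x) else x for x in row]
        for row in scores
    ]


# Hypothetical ratings for one domain: 5 trials x 4 raters, NaN = not scored.
nan = math.nan
ratings = [
    [5.0, 5.0, 4.0, nan],
    [2.0, 1.0, 2.0, 2.0],
    [4.0, 4.0, nan, 5.0],
    [1.0, 2.0, 1.0, 1.0],
    [3.0, 3.0, 4.0, 3.0],
]
print("ICC, missing set to 3:  ", round(icc_2_1(impute_midpoint(ratings)), 2))
print("ICC, missing set to 1-5:", round(icc_2_1(impute_random(ratings)), 2))
```

      Because this ICC needs a complete matrix, an imputation step runs first; computing the ICC under both imputations and comparing the results mirrors, in miniature, the sensitivity analysis reported in Tables 1 and 2.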

      3.3 Discriminant validity results

      Before discussion, S.T. and K.L. agreed on their implicit gold standard for 80% (12/15) of trials (Cohen's kappa 0.59, indicating moderate agreement). The three trials on which they disagreed, that is, where one assessor classified a trial as more pragmatic (“1”) and the other as more explanatory (“0”), were resolved by discussion.
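      The agreement statistics reported above can be checked with a short calculation. The sketch below computes observed agreement and Cohen's kappa for two assessors' dichotomous labels (1 = more pragmatic, 0 = more explanatory). The labels are hypothetical: only the 12/15 agreement and the kappa of 0.59 are reported, not the full 2 × 2 table, and these labels were chosen solely because they reproduce both figures (7 trials rated more pragmatic by both assessors, 5 more explanatory by both, and 3 discordant). Other splits of the three disagreements give slightly different kappas.

```python
def cohens_kappa(a: list[int], b: list[int]) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two raters' 0/1 labels."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each rater's marginal proportions.
    p_a1 = sum(a) / n
    p_b1 = sum(b) / n
    p_exp = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical labels for 15 trials (1 = more pragmatic, 0 = more explanatory).
# Of the 3 discordant trials, assessor 1 says "pragmatic" once and assessor 2 twice.
assessor_1 = [1] * 7 + [1] + [0, 0] + [0] * 5
assessor_2 = [1] * 7 + [0] + [1, 1] + [0] * 5
p_obs, kappa = cohens_kappa(assessor_1, assessor_2)
print(round(p_obs, 2), round(kappa, 2))  # 0.8 0.59
```

      Kappa corrects the raw 80% agreement for the roughly 51% agreement these two assessors would be expected to reach by chance given how often each chose “1”, which is why it is noticeably lower than the raw figure.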
      The AUROC values for determining whether a trial is more pragmatic or more explanatory for the nine PRECIS-2 domains are shown in Table 3; these are a numerical summary of the ROC curves (Supplementary 3 at www.jclinepi.com). Table 3 lists the domains in order of discriminative ability. An AUROC of 1 would be ideal, indicating that a PRECIS-2 domain discriminated perfectly between more pragmatic and more explanatory trials, whereas 0.5 corresponds to random performance with no discriminant ability beyond chance. Ideally, the whole ROC curve would lie above the diagonal line. The AUROC for every PRECIS-2 domain was greater than 0.5, although some were not significantly different from chance. On these data, primary outcome was the single domain most likely to discriminate how pragmatic a trial is (AUROC 0.75), followed by follow-up (0.73), primary analysis (0.72), flexibility (delivery) (0.71), eligibility (0.62), recruitment (0.62), flexibility (adherence) (0.60), setting (0.59), and organization (0.57).
      Table 3. Discriminant validity measured using the area under the ROC curve (AUROC)
      Domain | AUROC | 95% confidence interval
      Primary outcome | 0.75 | 0.49–1.00
      Follow-up | 0.73 | 0.48–0.99
      Primary analysis | 0.72 | 0.45–1.00
      Flexibility delivery | 0.71 | 0.44–0.99
      Eligibility | 0.62 | 0.33–0.92
      Recruitment | 0.62 | 0.32–0.92
      Flexibility adherence | 0.60 | 0.30–0.89
      Setting | 0.59 | 0.26–0.92
      Organization | 0.57 | 0.27–0.87
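      The AUROC values in Table 3 summarize how well a domain score separates the two groups of trials. One standard way to compute the empirical AUROC, sketched below, uses its equivalence with the Mann-Whitney statistic: the proportion of (more pragmatic, more explanatory) trial pairs in which the more pragmatic trial has the higher score, with ties counted as one half. The function name and example scores are hypothetical. For illustration only, we assume each trial's domain score is summarized as a single number (for example, the mean across raters) and that higher scores mean more pragmatic, as on the PRECIS-2 1 to 5 scale; the summary actually used in the study is not described in this section, and confidence intervals are not computed.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Empirical AUROC via the Mann-Whitney statistic.

    Probability that a randomly chosen 'more pragmatic' trial (label 1) scores
    higher than a randomly chosen 'more explanatory' trial (label 0); ties
    count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:          # every more pragmatic trial...
        for q in neg:      # ...compared with every more explanatory trial
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical mean domain scores (1 = very explanatory, 5 = very pragmatic)
# for six trials, with the gold-standard label for each trial.
scores = [4.2, 3.8, 2.1, 3.0, 1.5, 3.8]
labels = [1, 1, 0, 1, 0, 0]
print(round(auroc(scores, labels), 2))  # 0.83
```

      A domain whose scores were unrelated to the gold standard would give a value near 0.5, and one that ranked every more pragmatic trial above every more explanatory trial would give 1.0, matching the interpretation given above.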

      4. Discussion

      Our reliability and validity work found that PRECIS-2 has generally good interrater reliability across the nine domains, with 7/9 ICCs over 0.65, and modest discriminant validity, with better-than-chance discrimination for 7/9 domains in comparison with our subjective global ratings of pragmatism. The two domains that were not statistically better than chance at discriminating were flexibility (adherence) and recruitment, likely because both were poorly described in the trial protocols.
      It is important to note that PRECIS-2 was developed to help trial designers match their design choices to their intended degree of pragmatism, not for the retrospective assessment of trials designed by others; the poor description of certain domains in the protocols is therefore not relevant to the main use of the tool. Trial design teams would be much more familiar with the details of each domain for a trial they were currently designing than were our assessors, who were rating trials they did not design. Because it was not logistically possible to evaluate PRECIS-2 with sufficiently large trial design teams during their design process, we constructed an artificial situation that would be expected to underestimate both interrater reliability and discriminant validity. It is encouraging that interrater reliability was still good, and discriminant validity modest, even when PRECIS-2 was used by researchers unconnected with the trials being scored.
      Sensitivity analysis indicated no obvious differences in scores by country, research area, or profession between raters who completed 10 trials and those who completed 15. The main reason raters did not assess all 15 trial protocols using PRECIS-2 was lack of time.

      4.1 Strengths and limitations

      This is the first assessment of the validity and reliability of the PRECIS-2 tool; neither was assessed for the original PRECIS tool [Thorpe et al. 2009, CMAJ; Thorpe et al. 2009, J Clin Epidemiol]. We involved raters who reflect the target group of experienced trialists who could be future users of PRECIS-2. The trial protocols they assessed were varied, indicating that the tool can be used for diverse trial designs. Our assumption that raters have limited time turned out to be correct, and we were unable to obtain complete ratings of all 15 trials from 15 raters. In addition, some raters did not score particular domains, for various reasons. For instance, eligibility, organization, and flexibility (delivery) were not scored for medical and surgical trials by raters lacking expertise in the area, such as a physiotherapist who reported: “No entry, obviously no content knowledge on this one. Too far afield of my content to judge.” Other explanations for missing domain ratings included “inadequate information” for recruitment and organization; “unclear to judge” for setting; and, for flexibility (adherence), “although mentioned in most study protocols in protocol publications often not enough information is given to judge on this.” Many of the imputed values were needed because of “lack of time” or because whole trials were not scored by individual raters. The ICC values differed little between the different batches of trials and the complete set, so the impact of missing data on our assessment of interrater reliability was not serious.
      We are confident in the ability of PRECIS-2 to pick out trials taking different design approaches, and that different raters looking at the same trials come to similar conclusions. We have used feedback from participants in the validity and reliability testing of PRECIS-2 to add information to the PRECIS-2 user guidance [Loudon et al. 2015] and to the PRECIS-2 web site (http://www.precis-2.org/).
      Asking raters to score trial protocols retrospectively is perhaps a rather artificial way of using PRECIS-2, which we recommend be used at the design stage by the team designing the trial. Although a prospective study is conceivable, the time it would require was prohibitive. The results presented here could be considered a worst-case test of the tool, given that information relevant to assessing the PRECIS-2 domains was sometimes inadequately reported. This was also one of the reasons for the high percentage of missing data (in addition to our being unable to obtain a complete assessment of 15 trial protocols by 15 raters). It highlights the need to adhere to the SPIRIT statement [Chan et al. 2013] to improve the reporting of design and methods, together with the CONSORT extension for pragmatic trials [Zwarenstein et al. 2008] and the full CONSORT statement for randomized trials (http://www.consort-statement.org/), because this information is needed to understand a trial's design and assess its applicability.
      Discriminant validity could only be tested using a global, dichotomous “more pragmatic” or “more explanatory” assessment based on the independent judgments of S.T. and K.L., rather than on the judgments of more raters. This was done to reduce workload but may have affected the results. Judgments based on the opinions of more raters would clearly have been preferable, but we were very wary of the burden we were already placing on raters.

      5. Conclusion

      PRECIS-2 showed good interrater reliability and modest discriminant validity, even when tested retrospectively by individuals unconnected with the trials being scored. PRECIS-2 is a relatively simple, visual tool that can be used to focus a trial team's discussion on the match between their design decisions and the needs of those for whom the results are intended; this perhaps helps to explain why it is already proving useful in pragmatic trial design [Ford and Norrie 2016]. We believe it could also help to reduce research waste [Moher et al. 2016] by helping trialists to consider how their design decisions affect the usefulness of the trial results to their intended users.

      Acknowledgments

      The authors are grateful to all the participants who assisted in this study: F. Althabe, A.-W. Chan, D. Altman, D. Bratton, E. Brass, M. Campbell, G. Forbes, B. Gaglio, R. Glasgow, H.J.S.M. Hobbelen, S. Hopewell, J.A. Krishnan, D. Riddle, J. Segal, D. Steinfort, P. Tugwell, S.N. Van der Veer, V.A. Welch, C. Witt.

      References

        Thorpe K.E., Zwarenstein M., Oxman A.D., Treweek S., Furberg C.D., Altman D.G., et al. A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. CMAJ. 2009; 180: E47-E57.
        Thorpe K.E., Zwarenstein M., Oxman A.D., Treweek S., Furberg C.D., Altman D.G., et al. A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. J Clin Epidemiol. 2009; 62: 464-475.
        Loudon K., Treweek S., Sullivan F., Donnan P., Thorpe K., Zwarenstein M. The PRECIS-2 tool: designing trials that are fit for purpose. BMJ. 2015; 350: h2147.
        Koppenaal T., Linmans J., Knottnerus J.A., Spigt M. Pragmatic vs. explanatory: an adaptation of the PRECIS tool helps to judge the applicability of systematic reviews for daily practice. J Clin Epidemiol. 2011; 64: 1095-1101.
        Riddle D.L., Johnson R.E., Jensen M.P., Keefe F.J., Kroenke K., Bair M.J., et al. The Pragmatic-Explanatory Continuum Indicator Summary (PRECIS) instrument was useful for refining a randomized trial design: experiences from an investigative team. J Clin Epidemiol. 2010; 63: 1271-1275.
        Witt C.M., Manheimer E., Hammerschlag R., Ludtke R., Lao L.X., Tunis S.R., et al. How well do randomized trials inform decision making: systematic review using comparative effectiveness research measures on acupuncture for back pain. PLoS One. 2012; 7: e32399.
        Glasgow R.E., Gaglio B., Bennett G., Jerome G.J., Yeh H.C., Sarwer D.B., et al. Applying the PRECIS criteria to describe three effectiveness trials of weight loss in obese patients with comorbid conditions. Health Serv Res. 2011; 47: 1051-1067.
        Sanchez M.A., Rabin B.A., Gaglio B., Henton M., Elzarrad M.K., Purcell P., et al. A systematic review of eHealth cancer prevention and control interventions: new technology, same methods and designs? Transl Behav Med. 2013; 3: 392-401.
        Loudon K., Zwarenstein M., Sullivan F., Donnan P., Treweek S. Making clinical trials more relevant: improving and validating the PRECIS tool for matching trial design decisions to trial purpose. Trials. 2013; 14: 115 (published online first 2013/06/21).
        Loudon K. Making trials matter: providing an empirical basis for the selection of pragmatic design choices in clinical trials. PhD thesis. University of Dundee, Dundee, 2015.
        Chan A.W., Tetzlaff J.M., Altman D.G., Dickersin K., Moher D. SPIRIT 2013: new guidance for content of clinical trial protocols. Lancet. 2013; 381: 91-92.
        Rhon D.I., Boyles R.E., Cleland J.A., Brown D.L. A manual physical therapy approach versus subacromial corticosteroid injection for treatment of shoulder impingement syndrome: a protocol for a randomised clinical trial. BMJ Open. 2011; 1: e000137.
        Tripathy P., Nair N., Mahapatra R., Rath S., Gope R.K., Bajpai A., et al. Community mobilisation with women's groups facilitated by Accredited Social Health Activists (ASHAs) to improve maternal and newborn health in underserved areas of Jharkhand and Orissa: study protocol for a cluster-randomised controlled trial. Trials. 2011; 12: 182.
        Bond J., Wilson J., Eccles M., Vanoli A., Steen N., Clarke R., et al. Protocol for north of England and Scotland study of tonsillectomy and adeno-tonsillectomy in children (NESSTAC). A pragmatic randomised controlled trial comparing surgical intervention with conventional medical treatment in children with recurrent sore throats. BMC Ear Nose Throat Disord. 2006; 6: 13.
        Zwarenstein M., Treweek S., Gagnier J.J., Altman D.G., Tunis S., Haynes B., et al. Improving the reporting of pragmatic trials: an extension of the CONSORT statement. BMJ. 2008; 337: a2390.
        Ford I., Norrie J. Pragmatic trials. N Engl J Med. 2016; 375: 454-463.
        Moher D., Glasziou P., Chalmers I., Nasser M., Bossuyt P.M., Korevaar D.A., et al. Increasing value and reducing waste in biomedical research: who's listening? Lancet. 2016; 387: 1573-1586.