Original Article | Volume 153, P55-65, January 2023

A standardized measurement instrument was recommended for evaluating operator experience in complex healthcare interventions

  • Angus G.K. McNair
    Correspondence
    Corresponding author. Consultant Senior Lecturer in Colorectal Surgery, Bristol Medical School: Population Health Sciences, University of Bristol, 39 Whatley Road, Bristol, BS8 2PS, UK. Tel.: +44 117 331 3932.
    Affiliations
    National Institute for Health and Care Research Bristol Biomedical Research Centre, Bristol Centre for Surgical Research, Population Health Sciences, Bristol Medical School, University of Bristol, 39 Whatley Road, Bristol, BS8 2PS, UK

    Department of Gastrointestinal Surgery, North Bristol NHS Trust, Southmead Road, Bristol, BS10 5NB, UK
  • Christin Hoffmann
    Affiliations
    National Institute for Health and Care Research Bristol Biomedical Research Centre, Bristol Centre for Surgical Research, Population Health Sciences, Bristol Medical School, University of Bristol, 39 Whatley Road, Bristol, BS8 2PS, UK
  • Rhiannon C. Macefield
    Affiliations
    National Institute for Health and Care Research Bristol Biomedical Research Centre, Bristol Centre for Surgical Research, Population Health Sciences, Bristol Medical School, University of Bristol, 39 Whatley Road, Bristol, BS8 2PS, UK
  • Daisy Elliott
    Affiliations
    National Institute for Health and Care Research Bristol Biomedical Research Centre, Bristol Centre for Surgical Research, Population Health Sciences, Bristol Medical School, University of Bristol, 39 Whatley Road, Bristol, BS8 2PS, UK
  • Jane M. Blazeby
    Affiliations
    National Institute for Health and Care Research Bristol Biomedical Research Centre, Bristol Centre for Surgical Research, Population Health Sciences, Bristol Medical School, University of Bristol, 39 Whatley Road, Bristol, BS8 2PS, UK
  • Kerry L.N. Avery
    Footnotes
    1 K.A. and S.P. are joint senior authors.
    Affiliations
    National Institute for Health and Care Research Bristol Biomedical Research Centre, Bristol Centre for Surgical Research, Population Health Sciences, Bristol Medical School, University of Bristol, 39 Whatley Road, Bristol, BS8 2PS, UK
  • Shelley Potter
    Footnotes
    1 K.A. and S.P. are joint senior authors.
    Affiliations
    National Institute for Health and Care Research Bristol Biomedical Research Centre, Bristol Centre for Surgical Research, Population Health Sciences, Bristol Medical School, University of Bristol, 39 Whatley Road, Bristol, BS8 2PS, UK

    Bristol Breast Care Centre, North Bristol NHS Trust, Southmead Road, Bristol, BS10 5NB, UK
Open Access | Published: October 10, 2022 | DOI: https://doi.org/10.1016/j.jclinepi.2022.10.006

      Highlights

      • This study identifies, appraises, and recommends a standard measure to assess operators' experience in studies of surgical innovation.
      • Robust methodology was applied.
      • Supplemental validation used semistructured interviews with multinational and multidisciplinary professionals.
      • The SURG-TLX is preliminarily recommended because it was found to be most relevant, comprehensive, and comprehensible.
      • Routine use of a validated, standard measure to assess operators' experience supports efficient and transparent evaluation of complex interventions involving surgical innovation.

      Abstract

      Objective

      During development of complex surgical innovations, modifications occur to optimize safety and efficacy. Operators' experiences (how professionals feel undertaking the innovation) drive this process but comprehensive overviews of measures of this concept are lacking. This study identified and appraised measures to assess operators’ experience of surgical innovation.

      Study Design and Setting

      There were three phases: (1) Literature reviews identified measures of operators’ experience and concepts measured were extracted and grouped into domains. (2) Quality appraisal was conducted to assess content validity of identified instruments and was supported by COnsensus-based Standards for the selection of health Measurement Instruments methodology. Self-reported measurement instruments that had undergone formal development were eligible. Content validity was assessed using COnsensus-based Standards for the selection of health Measurement Instruments criteria for good content validity (rated sufficient/insufficient/indeterminate/inconsistent), informed by standards for measurement development and domains identified in phase 1. (3) Instruments determined suitable and of sufficient quality underwent supplemental appraisal in interviews with international multidisciplinary professionals and a focus group.

      Results

      Literature reviews identified 16 measurement instruments from 243 studies. Most assessed ‘psychological’ experiences and ‘usability’. No instrument was specifically validated for innovative surgery. Three instruments were rated ‘sufficient’ (Surgery Task Load Index [SURG-TLX]) or ‘indeterminate’ (Spielberger State-Trait Anxiety Inventory, Imperial Stress Assessment Tool). Twenty professionals were interviewed (seven female; 15 specialties; six countries) and the focus group included 10 participants (four professionals, six researchers). The SURG-TLX was considered the most relevant, comprehensive, and comprehensible instrument.

      Conclusion

      The SURG-TLX is preliminarily recommended to measure operators’ experiences of innovation. Further work exploring its role and impact on surgical innovation is required.

      What is new?

        Key findings

      • This study established the SURG-TLX as the most relevant, comprehensive, and comprehensible instrument to assess operators' experience (self-reported physical, psychological, and emotional aspects) of performing innovative surgery.

        What this adds to what is known?

      • Standardized measurement of operators' experience of performing/using an innovation is lacking, hindering effective and transparent evaluation of new procedures and devices. This study identified and appraised existing measures to assess operators' experience of surgical innovation using robust methodology.

        What is the implication, what should change now?

      • The SURG-TLX is preliminarily recommended for use in studies evaluating surgical innovations. Further evaluation of other measurement properties is now needed. Routine, standardized measurement may facilitate optimization of novel procedures and devices to enable efficient innovation.

      1. Introduction

      Surgical innovations are complex and characterized by a development phase where new procedures and devices are iteratively modified and improved [McCulloch et al., "No surgical innovation without evaluation: the IDEAL recommendations"; Ergina et al. (IDEAL Group), "IDEAL framework for surgical innovation 2: observational studies in the exploration and assessment stages"]. This refines processes and outcomes so that innovations are optimized until no further modifications are required. Theoretically, innovations progressing through the translational pathway subsequently undergo randomized evaluation to establish the effectiveness and cost-effectiveness, as illustrated in the Idea, Development, Exploration, Assessment, and Long-term follow-up (IDEAL) framework [McCulloch et al., "No surgical innovation without evaluation: the IDEAL recommendations"]. In practice, few innovations follow this incremental pathway and full evaluation in a main trial does not always occur before the procedure is widely adopted [Austin et al., "Mapping the diffusion of technology in orthopaedic surgery: understanding the spread of arthroscopic rotator cuff repair in the United States"; Mirheydar and Parsons, "Diffusion of robotics into clinical practice in the United States: process, patient safety, learning curves, and the public health"; Currie et al., "Systematic review of surgical innovation reporting in laparoendoscopic colonic polyp resection"; Hoffmann et al., "A systematic review and in-depth analysis of outcome reporting in early phase studies of colorectal cancer surgical innovation"; Macefield et al., "Outcome selection, measurement and reporting for new surgical procedures and devices: a systematic review of IDEAL/IDEAL-D studies to inform development of a core outcome set"; Khachane et al., "Appraising the uptake and use of the IDEAL Framework and Recommendations: a review of the literature"]. A key factor influencing the development and uptake of a new procedure is the experience of delivering the innovation (operator experience). Positive and negative experiences shape the development process. For example, physical hardship caused by poor ergonomics may inspire a device to be redesigned [Choi et al., "A comparison of surgical and functional outcomes of robot-assisted versus pure laparoscopic partial nephrectomy"; Alleblas et al., "Prevalence of musculoskeletal disorders among surgeons performing minimally invasive surgery"; Dalager et al., "Musculoskeletal pain among surgeons performing minimally invasive surgery: a systematic review"]. Similarly, psychological stress created by a highly complex procedure may prompt improvement by simplifying tasks [Alam et al., "Utility of recorded guided imagery and relaxing music in reducing patient pain and anxiety, and surgeon anxiety, during cutaneous surgical procedures: a single-blinded randomized controlled trial"; Arora et al., "The impact of stress on surgical performance: a systematic review of the literature"]. It is expected that the introduction of new procedures will involve additional effort, risks, and uncertain benefits compared to routine care [Ergina et al., "Challenges in evaluating surgical innovation"]. Operators’ perceptions of these risks and potential benefits are viewed through the lens of their experiences and consequently influence their willingness to pursue the development and adoption of innovation.
      Measuring operators' experience is therefore integral to understanding how and why surgical innovations are developed and to exploring their subsequent uptake. Efforts have been made to capture operators' experience of surgery. Typically, these include observer or self-reported measures of physical, psychological, and emotional experiences in routine care. Evaluation of operators' experiences of innovative surgery, however, is inconsistent and lacks a standard measurement instrument [Hoffmann et al., "A systematic review and in-depth analysis of outcome reporting in early phase studies of colorectal cancer surgical innovation"; Macefield et al., "Outcome selection, measurement and reporting for new surgical procedures and devices: a systematic review of IDEAL/IDEAL-D studies to inform development of a core outcome set"]. This might hinder evidence synthesis, prevent shared learning between investigators, and slow development cycles [McCulloch et al., "No surgical innovation without evaluation: the IDEAL recommendations"; McCulloch et al., "Progress in clinical research in surgery and IDEAL"; Yu et al., "Identifying research waste from surgical research: a protocol for assessing compliance with the IDEAL framework and recommendations"]. This study aims to identify, critically appraise, and recommend a measure of operators’ experience of performing innovative surgery to inform efficient and systematic evaluation of innovation.

      2. Methods

      Methods were informed by COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) guidelines for systematic reviews of outcome measurement instruments that were modified to design this study [Prinsen et al., "COSMIN guideline for systematic reviews of patient-reported outcome measures"]. These guidelines provide a framework to generate a comprehensive overview of the quality of measurement instruments to support evidence-based recommendations for the selection of the most suitable instrument for a given purpose. There were three phases: (1) identification of measurement instruments and development of a conceptual framework, (2) appraisal of instrument quality, and (3) supplemental appraisal of content validity in the context of surgical innovation. A flow chart illustrating the study design is presented in Figure 1.

      2.1 Definitions

      Operators' experience is defined as the self-reported perception of performing an invasive procedure. It may be unidimensional (measuring only one concept) or multidimensional (measuring multiple concepts) and includes, but is not limited to, physical (e.g., comfort), psychological (e.g., mental complexity), and emotional (e.g., anxiety) experiences. Self-reported perceived competence is included; observer-assessed measures of competence (e.g., analysis of learning curve) are excluded. Published definitions for an ‘invasive procedure,’ ‘innovative procedure,’ ‘operator,’ and ‘outcome’ are used and are all provided in Supplemental File 1.

      2.2 Phase 1: identification of measurement instruments and development of a conceptual framework

      This study used multiple data sources to identify measures of operator experience in studies of early phase innovations and to develop a framework of concepts being measured in the context of surgical innovation. This approach was used because scoping work revealed that traditional systematic literature search strategies would not identify relevant articles, as suitable key words for the subject of interest are not available. Three literature reviews were therefore undertaken that were designed to identify (1) author-reported IDEAL innovation studies, (2) studies of known innovative devices from a broad range of medical disciplines, and (3) a sample of studies of colorectal cancer surgical innovation. Detailed methods and results for each review are described elsewhere [Hoffmann et al., "A systematic review and in-depth analysis of outcome reporting in early phase studies of colorectal cancer surgical innovation"; Macefield et al., "Outcome selection, measurement and reporting for new surgical procedures and devices: a systematic review of IDEAL/IDEAL-D studies to inform development of a core outcome set"; Avery et al., "Development of reporting guidance and core outcome sets for seamless, standardised evaluation of innovative surgical procedures and devices: a study protocol for content generation and a Delphi consensus process (COHESIVE study)"].
      Data sources were supplemented by targeted searches for innovation studies and a scoping search for existing systematic reviews of measures of surgeon experience used in routine surgery and snowball searches of reference lists. Search terms for ‘invasive procedures’ and ‘measurement instruments’ were combined with a systematic review filter and applied to the Ovid version of MEDLINE with no restrictions. Included were systematic reviews of studies measuring operator experience. Excluded were nonhuman and non-English language articles.
      Self-reported measurement instruments were selected based on the presence of a development paper, defined by COSMIN as any “qualitative or quantitative study that were performed to develop a measurement instrument, including pilot testing of a draft measurement instrument, concept elicitation, and/or testing of a new measurement instrument” [Terwee et al., "COSMIN methodology for assessing the content validity of PROMs, User manual version 1.0"]. Development papers were obtained and snowball reference list searching was used to further identify articles of relevance.

      2.2.1 Data extraction and analysis

      Outcomes and measurement instruments relevant to operators’ experience were extracted from data sources verbatim through line-by-line coding, including details of measurement items and scales. Verbatim outcomes, measurement items, and scales were categorized into conceptual domains by two researchers independently. Conceptual domains were summarized to create a framework of concepts being measured in the context of surgical innovation to inform the appraisal of instrument quality in phase 2.
      Characteristics of measurement instruments that underwent formal development were obtained from development papers and summarized using descriptive statistics including number of items (single or multi-item), number and description of dimensions, and scope. The scope of instruments is described as generic (designed to apply in healthcare and nonhealthcare contexts), healthcare-specific (designed to apply in any healthcare context), surgery-specific (designed to apply to any invasive procedure), and technique-specific (designed to apply in specific surgical techniques). All self-reported measurement instruments which underwent formal development were eligible to be brought forward to phase 2.

      2.3 Phase 2: appraisal of instrument quality

      Quality of identified measurement instruments was appraised to determine which instruments are suitable and of sufficient quality to be taken forward into phase 3. Content validity was evaluated because guidelines consider it the most important measurement property to ensure the instrument is relevant, comprehensive, and comprehensible with respect to the construct of interest and target population [Mokkink et al., "The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content"]. COSMIN methodology was used to support quality appraisal [Terwee et al., "COSMIN methodology for assessing the content validity of PROMs, User manual version 1.0"]. COSMIN methodology was developed to review patient-reported outcome measures and it was adapted to this setting.
      Each measurement instrument underwent two assessments that were summarized to inform a single overall quality rating. Steps and deviations from COSMIN methodology are detailed below.

      2.3.1 Assessment of the quality of the development paper

      The first assessment rated the quality of the measurement instrument development paper. Two reviewers (A.M. and C.H.) independently evaluated the quality of instrument development using 35 COSMIN standards that were rated 'very good,' 'adequate,' 'doubtful,' or 'inadequate'.

      2.3.2 Evaluation of the content validity of the measurement instrument

      Results from the first assessment informed ratings of measurement instrument development against the 10 criteria for good content validity described by COSMIN [Prinsen et al., "COSMIN guideline for systematic reviews of patient-reported outcome measures"; Terwee et al., "COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study"]. Ratings considered the context, construct, and population of interest as described in the relevant development paper. Reviewers independently assigned ‘sufficient’ (+), ‘insufficient’ (−), ‘indeterminate’ (?), or ‘inconsistent’ (±) ratings for each measurement instrument (Step 3a).
      A second assessment rated the measurement instrument. Two reviewers independently evaluated the content of each instrument against the construct (conceptual framework of surgical innovation developed in phase 1), population (surgeon innovators), and context (surgical innovation) of interest. Ratings for each instrument against COSMIN's 10 criteria were provided as above.
      Individual reviewer ratings were then reconciled in discussions between authors to produce combined ratings for assessments 1 (rating of development paper) and 2 (rating of the measurement instrument in the context of surgical innovation).
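      As an illustration only, and not part of the study's methods, the comparison of two reviewers' independent ratings before reconciliation might be sketched as follows. The `Rating` enum, the `discrepancies` helper, and the example ratings are hypothetical; only the four rating categories and their symbols come from the text above (the ASCII hyphen stands in for the minus sign).

```python
from enum import Enum


class Rating(Enum):
    """The four rating categories described in the text."""
    SUFFICIENT = "+"
    INSUFFICIENT = "-"
    INDETERMINATE = "?"
    INCONSISTENT = "±"


def discrepancies(reviewer_a, reviewer_b):
    """Return the criteria on which two reviewers' ratings differ.

    Both arguments map a criterion name to a Rating. The returned list
    is what the reviewers would need to discuss; it does not resolve anything.
    """
    if reviewer_a.keys() != reviewer_b.keys():
        raise ValueError("Both reviewers must rate the same criteria")
    return sorted(c for c in reviewer_a if reviewer_a[c] is not reviewer_b[c])


# Hypothetical ratings for one instrument (not data from the study).
reviewer_1 = {
    "relevance": Rating.SUFFICIENT,
    "comprehensiveness": Rating.INDETERMINATE,
    "comprehensibility": Rating.SUFFICIENT,
}
reviewer_2 = {
    "relevance": Rating.SUFFICIENT,
    "comprehensiveness": Rating.INSUFFICIENT,
    "comprehensibility": Rating.SUFFICIENT,
}

to_discuss = discrepancies(reviewer_1, reviewer_2)
print(to_discuss)
```

      The sketch only flags disagreements. In the study, the combined ratings were reached through discussion between authors rather than by any automatic rule.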

      2.3.3 Selection of measurement instruments for supplemental validation

      In a final step, all ratings for each measurement instrument were reviewed jointly by the two reviewers, who qualitatively summarized the data and subjectively ranked instruments by how suitable and sufficiently high-quality each was considered as a measure of operators' experience of surgical innovation. Discrepancies between reviewers' ratings were resolved in discussions with the wider study team. A collective review of the reviewers’ ranking of instruments by the multidisciplinary study team alongside further discussions informed decisions on which instruments to bring forward for supplemental appraisal of content validity in phase 3.

      2.4 Phase 3: supplemental appraisal of content validity in the context of surgical innovation

      Measures brought forward from phase 2 underwent further appraisal to explore any deficiencies in content validity identified during quality appraisal. Semistructured interviews with multinational operators with experience of surgical innovation considered whether the instruments' content adequately reflects operators' experience of surgical innovation (as defined by the conceptual framework and stakeholders’ own experience) by exploring their relevance, comprehensiveness, and comprehensibility and views on the most suitable measure for clinical use. Interviews were conducted over video conferencing software (Zoom, MS Teams) by two researchers (A.M., colorectal surgeon and C.H., social scientist) trained and experienced in qualitative research and with diverse backgrounds to enable triangulation. A topic guide was created and piloted to ensure discussions covered predefined areas of interest while being applied flexibly to allow participants freedom to explore new topics. Any arising deviant views were actively explored. A purposive sampling strategy was implemented to ensure participants represented the target population (i.e., operators with experience of surgical innovation) and to maximize variation in participant characteristics by gender, geographic location, experience with surgical innovation, and professional self-described clinical specialty. Interview participants were identified through the authors’ personal network and approached by personal invitation. Sample size was flexible and iterative rounds of data collection and analysis continued until no new opinions emerged. Interviews were supplemented by one interdisciplinary focus group comprising U.K.-based professionals and researchers identified through convenience sampling. Two facilitators (A.M. and C.H.) led the group discussion based on the interview topic guide.
      Interviews and the focus group were audio-recorded and transcribed. Principles of thematic analysis were applied using a framework approach [Srivastava and Thomson, "Framework Analysis: A Qualitative Methodology for Applied Policy Research"], whereby transcripts were read and reread for familiarization, line-by-line coding was undertaken to assign meaning to relevant text, and themes were identified by collating similar codes and revised through a process of constant comparison with new data and discussion with the study team. The analysis primarily focused on the framework of a priori topics described above; however, an inductive approach was also undertaken to allow any new themes to emerge from the data. Results are presented by theme.

      2.4.1 Ethical approval

      Ethical approval was granted by the University of Bristol Faculty of Health Sciences Research Ethics Committee (ref: 56,522). Written informed consent was obtained from all professional participants involved in qualitative interviews.

      3. Results

      3.1 Phase 1: identification of measurement instruments

      A total of 243 studies were included from multiple data sources including 48 author-reported IDEAL studies [Macefield et al., "Outcome selection, measurement and reporting for new surgical procedures and devices: a systematic review of IDEAL/IDEAL-D studies to inform development of a core outcome set"], 128 studies of innovative devices, 51 randomly sampled studies of colorectal cancer surgical innovation [Hoffmann et al., "A systematic review and in-depth analysis of outcome reporting in early phase studies of colorectal cancer surgical innovation"], and 16 from supplemental searches, including one systematic review [Arora et al., "The impact of stress on surgical performance: a systematic review of the literature"] (Fig. 2).
      Line-by-line coding identified 304 verbatim outcomes related to operators’ experience of innovation. For most outcomes (281, 92% of total outcomes extracted), no further detail of how they were measured was provided. For example, verbatim outcomes described that the innovation was “easy to learn” [Koedam et al., "Short-term outcomes of transanal completion total mesorectal excision (cTaTME) for rectal cancer: a case-matched analysis"] or that “difficulty during surgery was evaluated” [Vidya and Cawthorn, "Muscle-sparing ADM-assisted breast reconstruction technique using complete breast implant coverage: a dual-institute UK-based experience"] but without any details of how this was assessed or whether a measurement instrument was used.
      There were 21 measurement instruments identified which underwent formal development. No measurement instrument was used more than once. Instruments contained a total of 146 items (median 9, range 1 to 40). The most frequently measured conceptual domains were ‘Psychology’ (112 items across 13 instruments), followed by ‘Usability’ (34 items across three instruments). ‘Physical comfort’ was represented in three measurement instruments. The conceptual framework, derived from 304 verbatim outcomes and 146 measurement items, is summarized in Table 1.
      Table 1. Conceptual framework of operator experience (number of measurement items per concept) with examples of verbatim outcomes and measurement items

      Psychology (n = 119)  | Physical comfort (n = 18)           | Usability (n = 41)
      Examples              | Examples                            | Examples
      Coping with pressure  | Shoulder stiffness                  | Easy to harvest
      Surgeon's anxiety     | Subjective ergonomic stress factors | Technically very challenging
      Perceived exertion    | Impact on ergonomics                | Simple to perform
      Mental strain         | Hand pain                           | Excellent vision
      Surgeon's wellbeing   | Physical demands                    | Problematic points

      Of 21 identified measurement instruments, five were excluded from further review in phase 2 because they were not self-reported measures (n = 2) or their development papers could not be obtained (n = 3). Of the remaining 16 measurement instruments, most were multidimensional (e.g., physical and psychological; n = 9) and had a generic (n = 7) or surgery-specific (n = 5) scope (Supplemental File 1, Table S.2). No single measurement instrument was developed or had been evaluated in the context of surgical innovation or for surgeon innovators, nor did any instrument specifically measure the construct ‘operators’ experience.’

      3.2 Phase 2: appraisal of instrument quality

      Self-reported measurement instruments (n = 16) underwent assessment of content validity. Summary results are presented in Table 2, with detailed ratings of all 35 COSMIN standards and 10 COSMIN criteria displayed in Supplemental File 1. Most instruments were rated ‘insufficient’ (N = 11, 66.7%). The Surgery Task Load Index (SURG-TLX) [Wilson et al., "Development and validation of a surgical workload measure: The surgery task load index (SURG-TLX)"] was the only measurement instrument for which sufficient content validity could be established. This instrument measures six concepts (physical, mental and temporal demands, distractions, situational stress, and task complexity). There was, however, insufficient evidence to determine comprehensiveness in the context of surgical innovation.
      Table 2 Summary of content validity ratings
      Measurement instrument (a) | Relevance (b): development paper | Relevance (b): reviewer | Comprehensiveness (b): development paper | Comprehensiveness (b): reviewer | Comprehensibility (b): development paper | Comprehensibility (b): reviewer
      SURG-TLX (c) | + | + | + | ? | ? | +
      STAI (c) | ? | ? | ? | - | ? | +
      ISAT (c) | ? | ? | ? | - | ? | +
      NASA-TLX | + | ? | + | ? | ? | +
      HFEQ-CASS | ± | ? | ? | ? | ? | ?
      SUS | + | ± | - | - | ? | ±
      GEARS | ? | - | - | - | ? | ?
      GOALS | + | - | ? | - | ? | ?
      STEEM/OREEM | ? | - | ? | - | ? | +
      UMUX | + | ± | ? | - | ? | -
      SMEQ | ? | ? | ? | - | ? | ?
      MRQ | ? | ? | ? | ? | ? | -
      NOTSS | + | - | ± | - | ? | +
      Borg Scale | ? | ? | ? | - | ? | -
      SWAT | ? | ? | ? | - | ? | ?
      BPD/LED | - | - | - | - | ? | -
      SURG-TLX, The Surgery Task Load Index; STAI, State-Trait Anxiety Inventory; ISAT, The Imperial Stress Assessment Tool; HFEQ-CASS, Human Factors Evaluation Questionnaire for Computer Assisted Surgery Systems; NASA-TLX, National Aeronautics and Space Administration task load index; GEARS, Global Evaluative Assessment of Robotic Skills; MRQ, Multiple Resource Questionnaire; NOTSS, Nontechnical skills for surgeons; SMEQ, Subjective Mental Effort Questionnaire; SWAT, Subjective Workload Assessment Technique; STEEM, Surgical Theater Educational Environment Measure; OREEM, Operating Room Educational Environment Measure; SUS, System Usability Scale; GOALS, The Global Operative Assessment of Laparoscopic Skills; UMUX, The Usability Metric for User Experience; BPD, Body Part Discomfort scale; LED, The Local Experienced Discomfort [Vidya R., Cawthorn S.J. Muscle-sparing ADM-assisted breast reconstruction technique using complete breast implant coverage: a dual-institute UK-based experience; Wilson M.R., Poolton J.M., Malhotra N., Ngo K., Bright E., Masters R.S.W. Development and validation of a surgical workload measure: The surgery task load index (SURG-TLX); Arora S., Tierney T., Sevdalis N., Aggarwal R., Nestel D., Woloshynowych M., et al. The imperial stress assessment tool (ISAT): a feasible, reliable and valid approach to measuring stress in the operating room; Manzey D., Röttger S., Bahner-Heyne J.E., Schulze-Kissing D., Dietz A., Meixensberger J., et al. Image-guided navigation: the surgeon’s perspective on performance consequences and human factors issues; Goh A.C., Goldfarb D.W., Sander J.C., Miles B.J., Dunkin B.J. Global evaluative assessment of robotic skills: validation of a clinical assessment tool to measure robotic surgical skills; Lemon J., Cooper J., Defres S., Easton A., Sadarangani M., Griffiths M.J., et al. Understanding parental perspectives on outcomes following paediatric encephalitis: a qualitative study; Raison N., Wood T., Brunckhorst O., Abe T., Ross T., Challacombe B., et al. Development and validation of a tool for non-technical skills evaluation in robotic surgery—the ICARS system; Zijlstra F.R.H. Efficiency in Work Behaviour; Reid G.B., Nygren T.E. The subjective Workload assessment technique: a scaling procedure for measuring mental Workload; Cassar K. Development of an instrument to measure the surgical operating theatre learning environment as perceived by basic surgical trainees; Finstad K. The usability metric for user experience; Brooke J. SUS: a “quick and dirty” usability scale; Corlett E.N., Bishop R.P. A technique for assessing postural discomfort; Hart S.G., Staveland L.E. Development of NASA-TLX (task load index): results of empirical and theoretical research; Marteau T.M., Bekker H. The development of a six-item short-form of the state scale of the Spielberger State—trait Anxiety Inventory (STAI); Vassiliou M.C., Feldman L.S., Andrew C.G., Bergman S., Leffondré K., Stanbridge D., et al. A global assessment tool for evaluation of intraoperative laparoscopic skills; Boles D.B., Adair L.P. The multiple resources Questionnaire (MRQ); Yule S., Flin R., Paterson-Brown S., Maran N., Rowley D. Development of a rating system for surgeons’ non-technical skills; Borg G. Perceived exertion as an indicator of somatic stress].
      a Order of instruments represent subjective ranking as per which instrument was considered a suitable and sufficiently high-quality measure to assess operators' experience for surgical innovation.
      b Rating: +, sufficient; ±, inconsistent; -, insufficient; ?, indeterminate; based on qualitative summary of results from ratings of the quality of the development paper and reviewers' own rating.
      c Measurement instruments taken forward to phase 3.
      Two measurement instruments were considered of inconsistent quality: the Spielberger State-Trait Anxiety Inventory (STAI) [Marteau T.M., Bekker H. The development of a six-item short-form of the state scale of the Spielberger State—trait Anxiety Inventory (STAI)] and the Imperial Stress Assessment Tool (ISAT) [Arora S., Tierney T., Sevdalis N., Aggarwal R., Nestel D., Woloshynowych M., et al. The imperial stress assessment tool (ISAT): a feasible, reliable and valid approach to measuring stress in the operating room]. Comprehensiveness of both instruments was rated as insufficient, with indeterminate and inconsistent relevance and sufficient and indeterminate comprehensibility, respectively. Similarities in the ratings of these instruments were expected because the STAI forms one dimension of the ISAT, with the addition of two physiological measures (cortisol and heart rate). The SURG-TLX, STAI, and ISAT progressed to phase 3; however, the STAI was presented within the ISAT to avoid duplication.
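
      The ratings in Table 2 can also be tallied programmatically, which may help readers check how often each rating occurs per criterion. The following is a minimal Python sketch, not part of the original analysis: the six-character strings are transcribed from the rows of Table 2, while the column names and the helper `tally` are hypothetical labels introduced here for illustration.

```python
from collections import Counter

# Column order of Table 2 (labels are illustrative, not from the paper).
COLUMNS = (
    "relevance_development", "relevance_reviewer",
    "comprehensiveness_development", "comprehensiveness_reviewer",
    "comprehensibility_development", "comprehensibility_reviewer",
)

# One character per column: + sufficient, ± inconsistent,
# - insufficient, ? indeterminate (transcribed from Table 2).
TABLE_2 = {
    "SURG-TLX": "+++??+",
    "STAI": "???-?+",
    "ISAT": "???-?+",
    "NASA-TLX": "+?+??+",
    "HFEQ-CASS": "±?????",
    "SUS": "+±--?±",
    "GEARS": "?---??",
    "GOALS": "+-?-??",
    "STEEM/OREEM": "?-?-?+",
    "UMUX": "+±?-?-",
    "SMEQ": "???-??",
    "MRQ": "?????-",
    "NOTSS": "+-±-?+",
    "Borg Scale": "???-?-",
    "SWAT": "???-??",
    "BPD/LED": "----?-",
}


def tally(column: str) -> Counter:
    """Count how many instruments received each rating in one column."""
    idx = COLUMNS.index(column)
    return Counter(codes[idx] for codes in TABLE_2.values())


if __name__ == "__main__":
    for column in COLUMNS:
        print(column, dict(tally(column)))
```

      This sketch only counts the individual ratings shown in Table 2; it does not reproduce the overall content validity judgment, which the authors derived from a qualitative summary of all ratings.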

      3.3 Phase 3: supplemental appraisal of content validity in the context of surgical innovation

      Interviews were conducted between July and November 2020 and lasted between 30 and 45 minutes. A total of 20 professionals (7, 35% female) participated from a range of surgical specialties internationally (Supplemental File 1, Table S.5). The focus group was conducted in August 2020 and included a total of 10 multidisciplinary professionals. Surgeons (N = 4), methodologists (N = 1), and academics (N = 5) with a background in surgical innovation and design of complex surgical interventions research discussed relevance, comprehensiveness, and comprehensibility of the instruments during a 60-minute virtual meeting. Tables 3 and 4 provide illustrative quotes supporting the a priori and emergent themes. A comprehensive report of primary data is presented in Supplemental File 2.
      Table 3 A priori themes and supporting quotations describing views on operators’ experience measurement instruments in the context of surgical innovation, by instrument. Participant identification numbers in square brackets. Abridged text is indicated by ellipsis
      Theme | Instrument | Supporting quotations
      Relevance | Surg-TLX | “So these seem to be perfectly reasonable categories if you are trying to judge the impact on a surgeon” [P35].
       |  | “I have undergone many innovations over my 30 years in practice in surgery, and I can tell you every new procedure was more demanding … I would say these are the relevant aspects you are focused when you perform a new technique, a new procedure” [P28].
       | ISAT | “I think there are so many confounders, and I am not sure whether that would be specific enough to the innovation…I am not sure cortisol and heart rate would add more than just asking with a questionnaire how stressed you are. Because your heart rate is going to vary” [P37].
      Comprehensiveness | Surg-TLX | “I mean, those are certainly the things that I would think about when I am thinking about doing something different or new” [P14].
       |  | “I am trying to think of every scenario and I think it works” [P24].
       | ISAT | “So based on that I would be very cautious of having a tool that only focusses on anxiety and stress because [compared to the SURG-TLX] you are sort of saying it is more than that and then ignoring the rest” [WP9].
      Comprehensibility | Surg-TLX | “I think that it is all pretty clear actually [Lemon J., Cooper J., Defres S., Easton A., Sadarangani M., Griffiths M.J., et al. Understanding parental perspectives on outcomes following paediatric encephalitis: a qualitative study]. Easy. I do understand. I get it” [P15].
       |  | “….I think between the temporal demands and distractions, these are two domains that are difficult and it may not be capturing what you want to capture” [P24].
       | ISAT | “I would have problems to differentiate between calm, tense, upset, relaxed, content, and worried…between all these fine…nuances” [P18].
       |  | “Well, I think in an experimental setting this may make some sense to, kind of, correlate physiology with qualitative data and maybe to understand what it is happening to someone in real time. I guess, again, practicality, pertinence, I just… it is hard to I think, kind of, make it all fit” [P21].
      Instrument suitability | Surg-TLX | “The SURG-TLX would be my preferred metric or evaluation tool as compared to the other one” [P20].
       |  | “The Surgical Task Load Index seems to be much more comprehensive in nature and much more pertinent to the topic at hand, so I would say that by a long shot” [P21].
       | ISAT | “Just from the pragmatic perspective, are we really going to be recommending a tool which suggests that you are going to have to capture cortisol and heart rate? I think realistically … I mean yes in the perfect world but this seems to be much more of a research tool to be honest” [WP6].
      Table 4 Emergent themes and supporting quotations describing views on operator experience measurement instruments in the context of surgical innovation. Participant identification numbers in square brackets. Abridged text is indicated by ellipsis
      Emergent themes | Supporting quotations
      Procedures occur in stages | “So it is no longer just the global procedure and getting a score for everything or getting feedback for everything, it is start to think about how can we break down those steps of the procedure or device procedure into phases and steps that you can then really finesse which parts and which phases of the procedure we[re] particularly difficult and complex” [P24].
      Patient complexity | “I think somehow you have got to be able to know that within that procedure, the general question about was it an average procedure? Or was it more difficult? Or more not? Nothing to do with the instrument but about the patient themself” [P19].
      Impact of wider operating team | “I think it is important to ask different persons or people from the team” [P18].
      Baseline proficiency | “If I am thinking about in the context of a new or an innovative procedure, I am always going to compare how difficult the innovative procedure is compared to whatever the standard is that I've been doing” [P14].
       | “But also this will be influenced by surgeon's baseline skill and competency. So it is not a standard baseline for everyone” [P15].
      Baseline attitudes toward innovation | “I might start the operation going I do not really want to use this, I am anxious about it, I am stressed about it, new stapler and I only like my stapler… vs. I am very excited about using this piece of kit because I think it is better than the last one and I cannot wait to use it… So there are two completely different mindsets which would affect my subconsciously affect the scoring of all of it” [P19].
      Baseline emotional factors | “One of the things that is the sort of unspoken no-nos, that all of those things that are scored there are affected by my own mental health and what is going on in my own life… because a, when I am feeling bad and miserable and upset with other stuff an operation will feel more difficult and will annoy me more when it goes wrong or when there are issues in it” [P19].
      Changes over time | “It is not going to be the same for the same person at any point in time. Because that person's skill will change with time as well” [P15].
      Trustworthiness of assessments | “We need better understanding of how surgeons are actually impacted, because I do not think that all surgeons or their subjective assessments are necessarily trustworthy” [P16].

      3.3.1 Relevance

      Overall, participants explicitly noted the high relevance of the six concepts measured by the SURG-TLX. Nine (45%) provided unprompted support for the relevance of all six concepts measured by the SURG-TLX to measure operators’ experience with innovation. Half used examples to illustrate the relevance of mental demands, physical demands, task complexity, and situational stress to innovation without prompting. Task complexity was often referred to as the most relevant concept. Temporal demands and distractions were described as least relevant by five (25%) participants. The SURG-TLX was considered equally relevant to new procedures and devices.
      The relevance of the ISAT's cortisol and heart rate measurement to surgical innovation was questioned by the majority (11, 55%). Participants highlighted practical difficulties in measurement and interpretation. Similar doubts were expressed about the relevance of self-reported anxiety to surgical innovation.

      3.3.2 Comprehensiveness

      All participants agreed that the SURG-TLX was comprehensive and that concepts generally “capture most of the themes associated with a new procedure” [P29]. Conversely, participants viewed ISAT as insufficiently comprehensive because of the focus on stress and anxiety. Two additional concepts emerged that were not addressed by either measure. Five (25%) professionals described a need to capture overall satisfaction with the innovation. Similarly, six (30%) participants described the value of measuring “usability” of devices.

      3.3.3 Comprehensibility

      Few concerns were raised about the comprehensibility of either instrument. Two participants described difficulties understanding the SURG-TLX item ‘temporal demands’ and how it related to innovation rather than surgery in general.

      3.3.4 Subjective instrument suitability for practical use

      All participants described the SURG-TLX as the more suitable measurement instrument because it was perceived as more relevant and comprehensive, provided richer information about operator experience, and data collection was thought to be easier or more practical. Many (10, 50%) professionals felt the physiological components of the ISAT were “more objective” and two (10%) thought it may be a useful research tool but still favored the SURG-TLX for the routine evaluation of surgical innovation.

      3.3.5 Emergent themes

      There were nine themes evident from the data that did not fit within the a priori framework (Table 4). Professionals discussed how several contextual factors in a real-world setting may influence the subjective experience with surgical innovation. For example, procedures occur in stages, only some of which may be innovative, and operator experience may be influenced by patient complexity and the wider operating team/environment. Professionals also felt that baseline attitudes toward the innovation, emotional factors, and proficiency were important to consider and that these are likely to evolve over time. Finally, two participants questioned the trustworthiness of professionals completing subjective self-assessments when the results could be reported to the wider surgical community.

      4. Discussion

      This study comprehensively identified measures of operator experience from 243 source documents across diverse surgical disciplines and innovations. A total of 16 self-reported measurement instruments underwent detailed quality appraisal, of which three met criteria for supplemental appraisal in the context of surgical innovation. Interviews and a focus group with professionals from a range of disciplines in North America, Europe, and Australia demonstrated that the SURG-TLX was relevant, comprehensive, comprehensible, and perceived as a suitable instrument to measure operators’ experience of surgical innovation in clinical practice. It is therefore recommended that the SURG-TLX is used in studies of surgical innovation to enable their systematic and transparent evaluation, although further understanding of its performance and value is still needed.
      Measuring and understanding operators’ experience of innovation is consistent with recommended methods for the development and evaluation of novel invasive procedures and devices [Macefield R.C., Wilson N., Hoffmann C., Blazeby J.M., McNair A.G.K., Avery K.N.L., et al. Outcome selection, measurement and reporting for new surgical procedures and devices: a systematic review of IDEAL/IDEAL-D studies to inform development of a core outcome set; Avery K., Blazeby J., Wilson N., Macefield R., Cousins S., Main B., et al. Development of reporting guidance and core outcome sets for seamless, standardised evaluation of innovative surgical procedures and devices: a study protocol for content generation and a Delphi consensus process (COHESIVE study)]. The IDEAL framework describes the process by which interventions move from first-in-human studies, through development phases, to definitive randomized evaluation and long-term monitoring. Key to the early phases of this process is a detailed understanding of the innovation to identify when and how modifications are necessary to drive optimization. Feedback from professionals about the physical and psychological experience of using novel procedures and devices may help identify beneficial modifications or better characterize the root cause of complications or failings. In later phases, where innovations have stabilized and no further modifications are necessary, measuring operators’ experience may provide some indication of when novice surgeons have become comfortable and achieved some level of proficiency [Ruiz-Rabelo J.F., Navarro-Rodriguez E., Di-Stasi L.L., Diaz-Jimenez N., Cabrera-Bermon J., Diaz-Iglesias C., et al. Validation of the NASA-TLX score in ongoing assessment of mental Workload during a laparoscopic learning curve in bariatric surgery]. Selecting a suitable measurement instrument will ensure data collection is standardized and easily comparable. It may also benefit the translational pathway because innovative procedures that lead to ‘good operator experience’ are more likely to be subsequently used or undergo full evaluation.
      This study used robust methodology in accordance with international guidelines [Terwee C.B., Prinsen C.A., Chiarotto A., Cw De Vet H., Bouter L.M., Marjan J.A., et al. COSMIN methodology for assessing the content validity of PROMs User manual version 1.0; Mokkink L.B., Terwee C.B., Knol D.L., Stratford P.W., Alonso J., Patrick D.L., et al. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content], but there are some weaknesses. Identification of studies of surgical innovation is challenging, and multiple targeted reviews were used to overcome limitations of a traditional search methodology in this context. It is possible that measurement instruments were missed, but it is unlikely that any additional instruments would significantly alter the conceptual framework that underpinned the appraisal process.
      We modified the COSMIN methodology for assessing content validity of measurement instruments, and this may have affected the results. Step 3b, for example, was modified to include a subjective judgment summarizing all ratings for the evaluation of content validity to select suitable and sufficiently high-quality instruments to bring forward to interviews. It is possible that application of unmodified COSMIN methods would have brought forward more instruments for interview, but it was anticipated that these would have been less relevant, comprehensive, and comprehensible in the context of surgical innovation. Interviews in phase 3 also identified themes relevant to operators’ experience not represented in the conceptual framework developed in phase 1. Refining the framework to include these themes may have caused some minor alteration to ratings in phase 2 but is not likely to have significantly changed the results. Supplemental validation of the highest quality instruments was completed using interviews with operators from a range of locations and specialties, but it was limited to professionals from high-income anglophone countries. Examining cross-cultural validity in non-English speaking and low-income and middle-income countries will be required to ensure generalizability in those settings. Our work did not review the total body of evidence for each instrument because the included instruments were rarely used in the context of surgical innovation. Instead, multiple data sources were used to identify measures of operator experience used in studies of surgical innovation. This means that the overall quality of available evidence (as determined by the GRADE approach through Step 3c of the COSMIN methodology) was not assessed and was therefore not considered during the selection of a suitable instrument in phase 2. Synthesizing the total body of evidence for 16 instruments may have been valuable to explore instrument validity in different target populations but was considered unlikely to significantly change the conclusions of this review and exceeded the resources available to complete the work. A full, formal COSMIN review may be valuable as the subject of future research once instruments have been more widely validated in the context of a refined conceptual framework of surgical innovation, and it could take the findings of the present work into account.
      Content validity of the SURG-TLX has been established in the context of surgical innovation, with other instruments performing poorly. More research is now required to define other robust measurement properties. Further studies validating the use of the SURG-TLX for evaluating surgical innovation can enable calculation of test-retest, inter-rater, and intrarater reliability. Responsiveness to change can be evaluated by measuring operator experience before and after known improvements at different times through the innovation lifecycle, and interpretation of the SURG-TLX may be improved through studies that define the minimally important difference. Finally, interviews with professionals highlighted some deficiencies of the SURG-TLX with regard to assessing satisfaction and usability of innovative devices. Complementary use of relevant measures identified in this work (e.g., System Usability Scale) can be considered in specific contexts.
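
      To illustrate what a future test-retest analysis might involve, the sketch below computes a two-way random-effects, absolute-agreement, single-measure intraclass correlation coefficient, ICC(2,1), from repeated scores. This is an assumption made for illustration only: the present study does not specify which reliability statistic should be used, the function name `icc_2_1` is hypothetical, and the example scores are invented rather than study data.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of scores.

    scores: list of rows, one per operator; each row holds that operator's
    score on each measurement occasion (same number of occasions per row).
    """
    n = len(scores)       # number of operators
    k = len(scores[0])    # number of occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Invented total scores for five operators on two occasions.
    hypothetical_scores = [[42, 45], [60, 58], [35, 40], [71, 69], [50, 52]]
    print(round(icc_2_1(hypothetical_scores), 3))
```

      Values closer to 1 would indicate that repeated scores from the same operator agree closely. Any real analysis would need to justify the choice of ICC model and the interval between measurements.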
      The recent development of the Core Outcomes for early pHasE Surgical Innovation and deVicEs outcome set for all studies of surgical innovation [Avery K., Blazeby J., Wilson N., Macefield R., Cousins S., Main B., et al. Development of reporting guidance and core outcome sets for seamless, standardised evaluation of innovative surgical procedures and devices: a study protocol for content generation and a Delphi consensus process (COHESIVE study)] is an important step to enable systematic evaluation of complex, novel procedures and devices. International stakeholders agreed that operator experience is one of eight domains that are essential to be measured and reported in early phase studies. The present study completes a necessary step to operationalize the core outcome set; however, there is an ongoing need to establish the measurement of the other seven domains.
      In conclusion, the SURG-TLX has sufficient validity to be preliminarily recommended for use in studies evaluating surgical innovation. Routine measurement of operators’ experiences may facilitate optimization of novel procedures and devices to enable safe and efficient innovation.

      Acknowledgments

      We would like to thank Neil Smart for his help in the recruitment of interview participants. We would also like to thank all interview participants for their time and contributing their views to this study.

      Supplementary data

      References

        • McCulloch P.
        • Altman D.G.
        • Campbell W.B.
        • Flum D.R.
        • Glasziou P.
        • Marshall J.C.
        • et al.
        No surgical innovation without evaluation: the IDEAL recommendations.
        Lancet. 2009; 374: 1105-1112
        • Ergina P.L.
        • Barkun J.S.
        • McCulloch P.
        • Cook J.A.
        • Altman D.G.
        • IDEAL Group
        IDEAL framework for surgical innovation 2: observational studies in the exploration and assessment stages.
        BMJ. 2013; 346
        • Austin D.C.
        • Torchia M.T.
        • Lurie J.D.
        • Jevsevar D.S.
        • Bell J.E.
        Mapping the diffusion of technology in orthopaedic surgery: understanding the spread of arthroscopic rotator cuff repair in the United States.
        Clin Orthop Relat Res. 2019; 477: 2399-2410
        • Mirheydar H.S.
        • Parsons J.K.
        Diffusion of robotics into clinical practice in the United States: process, patient safety, learning curves, and the public health.
        World J Urol. 2013; 31: 455-461
        • Currie A.
        • Brigic A.
        • Blencowe N.S.
        • Potter S.
        • Faiz O.D.
        • Kennedy R.H.
        • et al.
        Systematic review of surgical innovation reporting in laparoendoscopic colonic polyp resection.
        Br J Surg. 2015; 102: e108-e116
        • Hoffmann C.
        • Macefield R.C.
        • Wilson N.
        • Blazeby J.M.
        • Avery K.N.L.
        • Potter S.
        • et al.
        A systematic review and in-depth analysis of outcome reporting in early phase studies of colorectal cancer surgical innovation.
        Color Dis. 2020; 22: 1862-1873
        • Macefield R.C.
        • Wilson N.
        • Hoffmann C.
        • Blazeby J.M.
        • McNair A.G.K.
        • Avery K.N.L.
        • et al.
        Outcome selection, measurement and reporting for new surgical procedures and devices: a systematic review of IDEAL/IDEAL-D studies to inform development of a core outcome set.
        BJS Open. 2020; 4: 1072-1083
        • Khachane A.
        • Philippou Y.
        • Hirst A.
        • McCulloch P.
        Appraising the uptake and use of the IDEAL Framework and Recommendations: a review of the literature.
        Int J Surg. 2018; 57: 84-90
        • Choi J.D.
        • Park J.W.
        • Lee H.W.
        • Lee D.G.
        • Jeong B.C.
        • Jeon S.S.
        • et al.
        A comparison of surgical and functional outcomes of robot-assisted versus pure laparoscopic partial nephrectomy.
        J Soc Laparoendosc Surg. 2013; 17: 292-299
        • Alleblas C.C.J.
        • de Man A.M.
        • van den Haak L.
        • Vierhout M.E.
        • Jansen F.W.
        • Nieboer T.E.
        Prevalence of musculoskeletal disorders among surgeons performing minimally invasive surgery.
        Ann Surg. 2017; 266: 905-920
        • Dalager T.
        • Søgaard K.
        • Bech K.T.
        • Mogensen O.
        • Jensen P.T.
        Musculoskeletal pain among surgeons performing minimally invasive surgery: a systematic review.
        Surg Endosc. 2017; 31: 516-526
        • Alam M.
        • Roongpisuthipong W.
        • Kim N.A.
        • Goyal A.
        • Swary J.H.
        • Brindise R.T.
        • et al.
        Utility of recorded guided imagery and relaxing music in reducing patient pain and anxiety, and surgeon anxiety, during cutaneous surgical procedures: a single-blinded randomized controlled trial.
        J Am Acad Dermatol. 2016; 75: 585-589
        • Arora S.
        • Sevdalis N.
        • Nestel D.
        • Woloshynowych M.
        • Darzi A.
        • Kneebone R.
        The impact of stress on surgical performance: a systematic review of the literature.
        Surgery. 2010; 147: 318-330.e6
        • Ergina P.L.
        • Cook J.A.
        • Blazeby J.M.
        • Boutron I.
        • Clavien P.A.
        • Reeves B.C.
        • et al.
        Challenges in evaluating surgical innovation.
        Lancet. 2009; 374: 1097-1104
        • McCulloch P.
        • Feinberg J.
        • Philippou Y.
        • Kolias A.
        • Kehoe S.
        • Lancaster G.
        • et al.
        Progress in clinical research in surgery and IDEAL.
        Lancet. 2018; 392: 88-94
        • Yu J.
        • Shan F.
        • Hirst A.
        • McCulloch P.
        • Li Y.
        • Sun X.
        Identifying research waste from surgical research: a protocol for assessing compliance with the IDEAL framework and recommendations.
        BMJ Surgery, Interv Heal Technol. 2021; 3: e000050
        • Prinsen C.A.C.
        • Mokkink L.B.
        • Bouter L.M.
        • Alonso J.
        • Patrick D.L.
        • de Vet H.C.W.
        • et al.
        COSMIN guideline for systematic reviews of patient-reported outcome measures.
        Qual Life Res. 2018; 27: 1147-1157
        • Avery K.
        • Blazeby J.
        • Wilson N.
        • Macefield R.
        • Cousins S.
        • Main B.
        • et al.
        Development of reporting guidance and core outcome sets for seamless, standardised evaluation of innovative surgical procedures and devices: a study protocol for content generation and a Delphi consensus process (COHESIVE study).
        BMJ Open. 2019; 9: 9
        • Terwee C.B.
        • Prinsen C.A.
        • Chiarotto A.
        • Cw De Vet H.
        • Bouter L.M.
        • Marjan J.A.
        • et al.
        COSMIN methodology for assessing the content validity of PROMs User manual version 1.0;.
        2018 (Available at)
        • Mokkink L.B.
        • Terwee C.B.
        • Knol D.L.
        • Stratford P.W.
        • Alonso J.
        • Patrick D.L.
        • et al.
        The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content.
        BMC Med Res Methodol. 2010; 10: 22
        • Terwee C.B.
        • Prinsen C.A.C.
        • Chiarotto A.
        • Westerman M.J.
        • Patrick D.L.
        • Alonso J.
        • et al.
        COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study.
        Qual Life Res. 2018; 27: 1159-1170
        • Srivastava A.
        • Thomson S.B.
        Framework analysis: a qualitative methodology for applied policy research.
        Journal of Administration and Governance. 2009; 4: 72 (Available at SSRN:)
        https://ssrn.com/abstract=2760705
        Date accessed: July 12, 2022
        • Koedam T.W.A.
        • Veltcamp Helbach M.
        • Penna M.
        • Wijsmuller A.
        • Doornebosch P.
        • van Westreenen H.L.
        • et al.
        Short-term outcomes of transanal completion total mesorectal excision (cTaTME) for rectal cancer: a case-matched analysis.
        Surg Endosc. 2019; 33: 103-109
        • Vidya R.
        • Cawthorn S.J.
        Muscle-sparing ADM-assisted breast reconstruction technique using complete breast implant coverage: a dual-institute UK-based experience.
        Breast Care. 2017; 12: 251-254
        • Wilson M.R.
        • Poolton J.M.
        • Malhotra N.
        • Ngo K.
        • Bright E.
        • Masters R.S.W.
        Development and validation of a surgical workload measure: The surgery task load index (SURG-TLX).
        World J Surg. 2011; 35: 1961-1969
        • Arora S.
        • Tierney T.
        • Sevdalis N.
        • Aggarwal R.
        • Nestel D.
        • Woloshynowych M.
        • et al.
        The Imperial stress assessment tool (ISAT): a feasible, reliable and valid approach to measuring stress in the operating room.
        World J Surg. 2010; 34: 1756-1763
        • Manzey D.
        • Röttger S.
        • Bahner-Heyne J.E.
        • Schulze-Kissing D.
        • Dietz A.
        • Meixensberger J.
        • et al.
        Image-guided navigation: the surgeon’s perspective on performance consequences and human factors issues.
        Int J Med Robot Comput Assist Surg. 2009; 5: 297-308
        • Goh A.C.
        • Goldfarb D.W.
        • Sander J.C.
        • Miles B.J.
        • Dunkin B.J.
        Global evaluative assessment of robotic skills: validation of a clinical assessment tool to measure robotic surgical skills.
        J Urol. 2012; 187: 247-252
        • Lemon J.
        • Cooper J.
        • Defres S.
        • Easton A.
        • Sadarangani M.
        • Griffiths M.J.
        • et al.
        Understanding parental perspectives on outcomes following paediatric encephalitis: a qualitative study.
        PLoS One. 2019; 14: 1-15
        • Raison N.
        • Wood T.
        • Brunckhorst O.
        • Abe T.
        • Ross T.
        • Challacombe B.
        • et al.
        Development and validation of a tool for non-technical skills evaluation in robotic surgery—the ICARS system.
        Surg Endosc. 2017; 31: 5403-5410
        • Zijlstra F.R.H.
        Efficiency in Work Behaviour.
        Delft University Press, Delft 1995
        • Reid G.B.
        • Nygren T.E.
        The subjective workload assessment technique: a scaling procedure for measuring mental workload.
        Adv Psychol. 1988; 52: 185-218
        • Cassar K.
        Development of an instrument to measure the surgical operating theatre learning environment as perceived by basic surgical trainees.
        Med Teach. 2004; 26: 260-264
        • Finstad K.
        The usability metric for user experience.
        Interact Comput. 2010; 22: 323-327
        • Brooke J.
        SUS: a “quick and dirty” usability scale.
        in: Jordan P.W., Thomas B., McClelland I.L., Weerdmeester B. Usability Evaluation In Industry. CRC Press, London 1996: 189-195
        • Corlett E.N.
        • Bishop R.P.
        A technique for assessing postural discomfort.
        Ergonomics. 1976; 19: 175-182
        • Hart S.G.
        • Staveland L.E.
        Development of NASA-TLX (task load index): results of empirical and theoretical research.
        Adv Psychol. 1988; 52: 139-183
        • Marteau T.M.
        • Bekker H.
        The development of a six-item short-form of the state scale of the Spielberger State-Trait Anxiety Inventory (STAI).
        Br J Clin Psychol. 1992; 31: 301-306
        • Vassiliou M.C.
        • Feldman L.S.
        • Andrew C.G.
        • Bergman S.
        • Leffondré K.
        • Stanbridge D.
        • et al.
        A global assessment tool for evaluation of intraoperative laparoscopic skills.
        Am J Surg. 2005; 190: 107-113
        • Boles D.B.
        • Adair L.P.
        The multiple resources questionnaire (MRQ).
        Proc Hum Factors Ergon Soc Annu Meet. 2001; 45: 1790-1794
        • Yule S.
        • Flin R.
        • Paterson-Brown S.
        • Maran N.
        • Rowley D.
        Development of a rating system for surgeons’ non-technical skills.
        Med Educ. 2006; 40: 1098-1104
        • Borg G.
        Perceived exertion as an indicator of somatic stress.
        Scand J Rehabil Med. 1970; 2: 92-98
        • Ruiz-Rabelo J.F.
        • Navarro-Rodriguez E.
        • Di-Stasi L.L.
        • Diaz-Jimenez N.
        • Cabrera-Bermon J.
        • Diaz-Iglesias C.
        • et al.
        Validation of the NASA-TLX score in ongoing assessment of mental workload during a laparoscopic learning curve in bariatric surgery.
        Obes Surg. 2015; 25: 2451-2456