Highlights
- This study identifies, appraises, and recommends a standard measure to assess operators' experience in studies of surgical innovation.
- Robust methodology was applied.
- Supplemental validation used semistructured interviews with multinational and multidisciplinary professionals.
- The SURG-TLX is preliminarily recommended because it was found to be most relevant, comprehensive, and comprehensible.
- Routine use of a validated, standard measure to assess operators' experience supports efficient and transparent evaluation of complex interventions involving surgical innovation.
Abstract
Objective
Study Design and Setting
Results
Conclusion
Keywords
Key findings
- This study established the SURG-TLX as the most relevant, comprehensive, and comprehensible instrument to assess operators' experience (self-reported physical, psychological, and emotional aspects) of performing innovative surgery.
What this adds to what is known?
- Standardized measurement of operators' experience of performing/using an innovation is lacking, hindering effective and transparent evaluation of new procedures and devices. This study identified and appraised existing measures to assess operators' experience of surgical innovation using robust methodology.
What is the implication, what should change now?
- The SURG-TLX is preliminarily recommended for use in studies evaluating surgical innovations. Further evaluation of other measurement properties is now needed. Routine, standardized measurement may facilitate optimization of novel procedures and devices to enable efficient innovation.
1. Introduction
2. Methods

2.1 Definitions
2.2 Phase 1: identification of measurement instruments and development of a conceptual framework
2.2.1 Data extraction and analysis
2.3 Phase 2: appraisal of instrument quality
2.3.1 Assessment of the quality of the development paper
2.3.2 Evaluation of the content validity of the measurement instrument
2.3.3 Selection of measurement instruments for supplemental validation
2.4 Phase 3: supplemental appraisal of content validity in the context of surgical innovation
2.4.1 Ethical approval
3. Results
3.1 Phase 1: identification of measurement instruments

Psychology (n = 119) | Physical comfort (n = 18) | Usability (n = 41) |
---|---|---|
Examples | Examples | Examples |
Coping with pressure | Shoulder stiffness | Easy to harvest |
Surgeon's anxiety | Subjective ergonomic stress factors | Technically very challenging |
Perceived exertion | Impact on ergonomics | Simple to perform |
Mental strain | Hand pain | Excellent vision |
Surgeon's wellbeing | Physical demands | Problematic points |
3.2 Phase 2: appraisal of instrument quality
Measurement instrument | Relevance (development paper) | Relevance (reviewer rating) | Comprehensiveness (development paper) | Comprehensiveness (reviewer rating) | Comprehensibility (development paper) | Comprehensibility (reviewer rating) |
---|---|---|---|---|---|---|
SURG-TLX | + | + | + | ? | ? | + |
STAI | ? | ? | ? | - | ? | + |
ISAT | ? | ? | ? | - | ? | + |
NASA-TLX | + | ? | + | ? | ? | + |
HFEQ-CASS | ± | ? | ? | ? | ? | ? |
SUS | + | ± | - | - | ? | ± |
GEARS | ? | - | - | - | ? | ? |
GOALS | + | - | ? | - | ? | ? |
STEEM/OREEM | ? | - | ? | - | ? | + |
UMUX | + | ± | ? | - | ? | - |
SMEQ | ? | ? | ? | - | ? | ? |
MRQ | ? | ? | ? | ? | ? | - |
NOTSS | + | - | ± | - | ? | + |
Borg Scale | ? | ? | ? | - | ? | - |
SWAT | ? | ? | ? | - | ? | ? |
BPD/LED | - | - | - | - | ? | - |
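
The ratings in the table above can be tallied per instrument. The following is a minimal sketch, not the authors' appraisal procedure: it assumes the usual COSMIN reading of the symbols (`+` sufficient, `-` insufficient, `?` indeterminate, `±` inconsistent), and the function names `tally` and `rank_by_sufficient`, as well as the ordering rule itself, are hypothetical and used only for illustration.

```python
from collections import Counter

# Ratings transcribed from the table above, in column order:
# relevance (development paper, reviewer), comprehensiveness (development
# paper, reviewer), comprehensibility (development paper, reviewer).
RATINGS = {
    "SURG-TLX":    ["+", "+", "+", "?", "?", "+"],
    "STAI":        ["?", "?", "?", "-", "?", "+"],
    "ISAT":        ["?", "?", "?", "-", "?", "+"],
    "NASA-TLX":    ["+", "?", "+", "?", "?", "+"],
    "HFEQ-CASS":   ["±", "?", "?", "?", "?", "?"],
    "SUS":         ["+", "±", "-", "-", "?", "±"],
    "GEARS":       ["?", "-", "-", "-", "?", "?"],
    "GOALS":       ["+", "-", "?", "-", "?", "?"],
    "STEEM/OREEM": ["?", "-", "?", "-", "?", "+"],
    "UMUX":        ["+", "±", "?", "-", "?", "-"],
    "SMEQ":        ["?", "?", "?", "-", "?", "?"],
    "MRQ":         ["?", "?", "?", "?", "?", "-"],
    "NOTSS":       ["+", "-", "±", "-", "?", "+"],
    "Borg Scale":  ["?", "?", "?", "-", "?", "-"],
    "SWAT":        ["?", "?", "?", "-", "?", "?"],
    "BPD/LED":     ["-", "-", "-", "-", "?", "-"],
}


def tally(ratings):
    """Count how often each rating symbol occurs for one instrument."""
    counts = Counter(ratings)
    return {symbol: counts.get(symbol, 0) for symbol in ("+", "±", "?", "-")}


def rank_by_sufficient(table):
    """Hypothetical ordering (an assumption, not the study's selection rule):
    most '+' ratings first, then fewest '-' ratings, then alphabetical."""
    return sorted(
        table,
        key=lambda name: (-tally(table[name])["+"], tally(table[name])["-"], name),
    )


if __name__ == "__main__":
    for name in rank_by_sufficient(RATINGS):
        print(name, tally(RATINGS[name]))
```

A simple count like this ignores which criterion each rating belongs to and treats development-paper and reviewer ratings alike, so it should be read only as a quick summary of the table, not as a substitute for the qualitative appraisal reported in this section.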
3.3 Phase 3: supplemental appraisal of content validity in the context of surgical innovation
Theme | Instrument | Supporting quotations |
---|---|---|
Relevance | SURG-TLX | “So these seem to be perfectly reasonable categories if you are trying to judge the impact on a surgeon” [P35]. “I have undergone many innovations over my 30 years in practice in surgery, and I can tell you every new procedure was more demanding … I would say these are the relevant aspects you are focused when you perform a new technique, a new procedure” [P28]. |
Relevance | ISAT | “I think there are so many confounders, and I am not sure whether that would be specific enough to the innovation…I am not sure cortisol and heart rate would add more than just asking with a questionnaire how stressed you are. Because your heart rate is going to vary” [P37]. |
Comprehensiveness | SURG-TLX | “I mean, those are certainly the things that I would think about when I am thinking about doing something different or new” [P14]. “I am trying to think of every scenario and I think it works” [P24]. |
Comprehensiveness | ISAT | “So based on that I would be very cautious of having a tool that only focusses on anxiety and stress because [compared to the SURG-TLX] you are sort of saying it is more than that and then ignoring the rest” [WP9]. |
Comprehensibility | SURG-TLX | “I think that it is all pretty clear actually [ [29] ]. Easy. I do understand. I get it” [P15]. “… I think between the temporal demands and distractions, these are two domains that are difficult and it may not be capturing what you want to capture” [P24]. |
Comprehensibility | ISAT | “I would have problems to differentiate between calm, tense, upset, relaxed, content, and worried…between all these fine…nuances” [P18]. “Well, I think in an experimental setting this may make some sense to, kind of, correlate physiology with qualitative data and maybe to understand what it is happening to someone in real time. I guess, again, practicality, pertinence, I just… it is hard to I think, kind of, make it all fit” [P21]. |
Instrument suitability | SURG-TLX | “The SURG-TLX would be my preferred metric or evaluation tool as compared to the other one” [P20]. “The Surgical Task Load Index seems to be much more comprehensive in nature and much more pertinent to the topic at hand, so I would say that by a long shot” [P21]. |
Instrument suitability | ISAT | “Just from the pragmatic perspective, are we really going to be recommending a tool which suggests that you are going to have to capture cortisol and heart rate? I think realistically … I mean yes in the perfect world but this seems to be much more of a research tool to be honest” [WP6]. |
Emergent themes | Supporting quotations |
---|---|
Procedures occur in stages | “So it is no longer just the global procedure and getting a score for everything or getting feedback for everything, it is start to think about how can we break down those steps of the procedure or device procedure into phases and steps that you can then really finesse which parts and which phases of the procedure we[re] particularly difficult and complex” [P24]. |
Patient complexity | “I think somehow you have got to be able to know that within that procedure, the general question about was it an average procedure? Or was it more difficult? Or more not? Nothing to do with the instrument but about the patient themself” [P19]. |
Impact of wider operating team | “I think it is important to ask different persons or people from the team” [P18]. |
Baseline proficiency | “If I am thinking about in the context of a new or an innovative procedure, I am always going to compare how difficult the innovative procedure is compared to whatever the standard is that I've been doing” [P14]. “But also this will be influenced by surgeon's baseline skill and competency. So it is not a standard baseline for everyone” [P15]. |
Baseline attitudes toward innovation | “I might start the operation going I do not really want to use this, I am anxious about it, I am stressed about it, new stapler and I only like my stapler… vs. I am very excited about using this piece of kit because I think it is better than the last one and I cannot wait to use it… So there are two completely different mindsets which would affect my subconsciously affect the scoring of all of it” [P19]. |
Baseline emotional factors | “One of the things that is the sort of unspoken no-nos, that all of those things that are scored there are affected by my own mental health and what is going on in my own life… because a, when I am feeling bad and miserable and upset with other stuff an operation will feel more difficult and will annoy me more when it goes wrong or when there are issues in it” [P19]. |
Changes over time | “It is not going to be the same for the same person at any point in time. Because that person's skill will change with time as well” [P15]. |
Trustworthiness of assessments | “We need better understanding of how surgeons are actually impacted, because I do not think that all surgeons or their subjective assessments are necessarily trustworthy” [P16]. |
3.3.1 Relevance
3.3.2 Comprehensiveness
3.3.3 Comprehensibility
3.3.4 Subjective instrument suitability for practical use
3.3.5 Emergent themes
4. Discussion
Acknowledgments
Supplementary data
- Supplemental File 1
- Supplemental File 2
References
- No surgical innovation without evaluation: the IDEAL recommendations. Lancet. 2009; 374: 1105-1112
- IDEAL framework for surgical innovation 2: observational studies in the exploration and assessment stages. BMJ. 2013; 346
- Mapping the diffusion of technology in orthopaedic surgery: understanding the spread of arthroscopic rotator cuff repair in the United States. Clin Orthop Relat Res. 2019; 477: 2399-2410
- Diffusion of robotics into clinical practice in the United States: process, patient safety, learning curves, and the public health. World J Urol. 2013; 31: 455-461
- Systematic review of surgical innovation reporting in laparoendoscopic colonic polyp resection. Br J Surg. 2015; 102: e108-e116
- A systematic review and in-depth analysis of outcome reporting in early phase studies of colorectal cancer surgical innovation. Color Dis. 2020; 22: 1862-1873
- Outcome selection, measurement and reporting for new surgical procedures and devices: a systematic review of IDEAL/IDEAL-D studies to inform development of a core outcome set. BJS Open. 2020; 4: 1072-1083
- Appraising the uptake and use of the IDEAL Framework and Recommendations: a review of the literature. Int J Surg. 2018; 57: 84-90
- A comparison of surgical and functional outcomes of robot-assisted versus pure laparoscopic partial nephrectomy. J Soc Laparoendosc Surg. 2013; 17: 292-299
- Prevalence of musculoskeletal disorders among surgeons performing minimally invasive surgery. Ann Surg. 2017; 266: 905-920
- Musculoskeletal pain among surgeons performing minimally invasive surgery: a systematic review. Surg Endosc. 2017; 31: 516-526
- Utility of recorded guided imagery and relaxing music in reducing patient pain and anxiety, and surgeon anxiety, during cutaneous surgical procedures: a single-blinded randomized controlled trial. J Am Acad Dermatol. 2016; 75: 585-589
- The impact of stress on surgical performance: a systematic review of the literature. Surgery. 2010; 147: 318-330.e6
- Challenges in evaluating surgical innovation. Lancet. 2009; 374: 1097-1104
- Progress in clinical research in surgery and IDEAL. Lancet. 2018; 392: 88-94
- Identifying research waste from surgical research: a protocol for assessing compliance with the IDEAL framework and recommendations. BMJ Surgery, Interv Heal Technol. 2021; 3: e000050
- COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018; 27: 1147-1157
- Development of reporting guidance and core outcome sets for seamless, standardised evaluation of innovative surgical procedures and devices: a study protocol for content generation and a Delphi consensus process (COHESIVE study). BMJ Open. 2019; 9: 9
- COSMIN methodology for assessing the content validity of PROMs: User manual version 1.0. 2018. Available at: https://www.cosmin.nl/wp-content/uploads/user-manual-COSMIN-Risk-of-Bias-tool_v4_JAN_final.pdf (Date accessed: July 12, 2022)
- The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content. BMC Med Res Methodol. 2010; 10: 22
- COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Qual Life Res. 2018; 27: 1159-1170
- Framework Analysis: A Qualitative Methodology for Applied Policy Research. 4 Journal of Administration and Governance 72, 2009. Available at SSRN: https://ssrn.com/abstract=2760705 (Date accessed: July 12, 2022)
- Short-term outcomes of transanal completion total mesorectal excision (cTaTME) for rectal cancer: a case-matched analysis. Surg Endosc. 2019; 33: 103-109
- Muscle-sparing ADM-assisted breast reconstruction technique using complete breast implant coverage: a dual-institute UK-based experience. Breast Care. 2017; 12: 251-254
- Development and validation of a surgical workload measure: the surgery task load index (SURG-TLX). World J Surg. 2011; 35: 1961-1969
- The imperial stress assessment tool (ISAT): a feasible, reliable and valid approach to measuring stress in the operating room. World J Surg. 2010; 34: 1756-1763
- Image-guided navigation: the surgeon’s perspective on performance consequences and human factors issues. Int J Med Robot Comput Assist Surg. 2009; 5: 297-308
- Global evaluative assessment of robotic skills: validation of a clinical assessment tool to measure robotic surgical skills. J Urol. 2012; 187: 247-252
- Understanding parental perspectives on outcomes following paediatric encephalitis: a qualitative study. PLoS One. 2019; 14: 1-15
- Development and validation of a tool for non-technical skills evaluation in robotic surgery—the ICARS system. Surg Endosc. 2017; 31: 5403-5410
- Efficiency in Work Behaviour. Delft University Press, Delft, 1995
- The subjective workload assessment technique: a scaling procedure for measuring mental workload. Adv Psychol. 1988; 52: 185-218
- Development of an instrument to measure the surgical operating theatre learning environment as perceived by basic surgical trainees. Med Teach. 2004; 26: 260-264
- The usability metric for user experience. Interact Comput. 2010; 22: 323-327
- SUS: a “quick and dirty” usability scale. In: Jordan P.W., Thomas B., McClelland I.L., Weerdmeester B. Usability Evaluation In Industry. CRC Press, London, 1996: 189-195
- A technique for assessing postural discomfort. Ergonomics. 1976; 19: 175-182
- Development of NASA-TLX (task load index): results of empirical and theoretical research. Adv Psychol. 1988; 52: 139-183
- The development of a six-item short-form of the state scale of the Spielberger State-Trait Anxiety Inventory (STAI). Br J Clin Psychol. 1992; 31: 301-306
- A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg. 2005; 190: 107-113
- The Multiple Resources Questionnaire (MRQ). Proc Hum Factors Ergon Soc Annu Meet. 2001; 45: 1790-1794
- Development of a rating system for surgeons’ non-technical skills. Med Educ. 2006; 40: 1098-1104
- Perceived exertion as an indicator of somatic stress. Scand J Rehabil Med. 1970; 2: 92-98
- Validation of the NASA-TLX score in ongoing assessment of mental workload during a laparoscopic learning curve in bariatric surgery. Obes Surg. 2015; 25: 2451-2456
Article info
Publication history
Footnotes
Declarations of interest: None. This study was conducted independently of the authors involved in the development of any instruments reviewed in this work. J.B. is a member of the Core Outcome Measures for Effectiveness Trials (COMET) Initiative Management Group. All other authors declare no conflicts of interest.
Sources of funding: This study was funded by a National Institute for Health and Care Research (NIHR) Clinician Scientists Fellowship award to A.M. (NIHR CS-2017-17-010). This work was further supported by the NIHR Biomedical Research Center (BRC) at University Hospitals Bristol and Weston NHS Foundation Trust and the University of Bristol (BRC-1215-20011). The views expressed in this publication are those of the authors and not necessarily those of the NIHR. S.P. is an NIHR Clinician Scientist (NIHR CS-2016-16-019). J.B. is an NIHR Senior Investigator.
Author contributions: Angus McNair: Conceptualization, Methodology, Investigation, Formal analysis, Resources, Writing–Original draft, Writing–Review and Editing, and Funding acquisition. Christin Hoffmann: Methodology, Project administration, Investigation, Formal analysis, Resources, Writing–Original draft, and Writing–Review and Editing. Rhiannon Macefield: Conceptualization, Methodology, Formal analysis, Writing–Original draft, and Writing–Review and Editing. Daisy Elliott: Conceptualization, Methodology, Formal analysis, and Writing–Original draft. Jane Blazeby: Conceptualization, Supervision, Writing–Original draft, Writing–Review and Editing, and Funding acquisition. Kerry Avery: Conceptualization, Methodology, Investigation, Formal analysis, Resources, Writing–Original draft, and Writing–Review and Editing. Shelley Potter: Conceptualization, Methodology, Investigation, Formal analysis, Resources, Writing–Original draft, and Writing–Review and Editing.
Identification
Copyright
User license
Creative Commons Attribution (CC BY 4.0)