A systematic review highlighting poor quality of evidence for content validity of quality of life instruments in female chronic pelvic pain

  • Vishalli Ghai
    Correspondence
    Corresponding author. Department of Obstetrics and Gynaecology, Epsom & St Helier's University Hospitals NHS Trust, Dorking Road, London, UK KT18 7EG. Tel.: +44-1372-735-735.
    Affiliations
    Department of Obstetrics and Gynaecology, Epsom & St Helier University Hospitals NHS Trust, London KT18 7EG, UK

    St George's University of London, Cranmer Terrace, London SW17 0RE, UK
  • Venkatesh Subramanian
    Affiliations
    Department of Obstetrics and Gynaecology, Epsom & St Helier University Hospitals NHS Trust, London KT18 7EG, UK
  • Haider Jan
    Affiliations
    Department of Obstetrics and Gynaecology, Epsom & St Helier University Hospitals NHS Trust, London KT18 7EG, UK
  • Stergios K. Doumouchtsis
    Affiliations
    Department of Obstetrics and Gynaecology, Epsom & St Helier University Hospitals NHS Trust, London KT18 7EG, UK

    St George's University of London, Cranmer Terrace, London SW17 0RE, UK

    Laboratory of Experimental Surgery and Surgical Research N.S. Christeas, Medical School, National and Kapodistrian University of Athens, Athens, Greece

    American University of the Caribbean, Pembroke Pines, FL, USA

    Ross University, School of Medicine, Miramar, FL, USA
  • On behalf of CHORUS: An International Collaboration for Harmonising Outcomes, Research, and Standards in Urogynaecology and Women's Health
Open Access. Published: April 19, 2022. DOI: https://doi.org/10.1016/j.jclinepi.2022.04.016

      Abstract

      Objectives

      To evaluate the content validity of 19 patient-reported outcome measures (PROMs) used to measure quality of life (QoL) in women with chronic pelvic pain (CPP).

      Study Design and Setting

      We searched Embase, MEDLINE, PsycINFO databases and Google Scholar from inception to August 2020. We included records describing the development or studies assessing content validity of PROMs. Two reviewers independently assessed the methodological quality of PROMs using the Consensus-based Standards for the Selection of Health Measurement Instruments checklist. Evidence was synthesized for relevance, comprehensiveness, and comprehensibility. Quality of evidence was rated using a modified Grading of Recommendations, Assessment, Development, and Evaluations approach.

      Results

      PROM development was inadequate for all instruments included in this review. No high-quality evidence ratings were found for relevance, comprehensiveness, and comprehensibility. QoL was measured using generic instruments (68.42%, 13/19) rather than those specific to chronic pain (21.05%, 4/19) or pelvic pain (10.53%, 2/19). Quality of concept elicitation was inadequate for 90% of PROMs. Half of PROMs did not include patients in their development and only 40% were devised using a sample representative of the target population for which the PROM was developed. Cognitive interviews were conducted in one-fifth of PROMs and were mostly of inadequate/doubtful quality.

      Conclusion

      There is poor quality of evidence for content validity of PROMs used to measure QoL in women with CPP.

      What is new?

        Key findings

      • We identified, summarized, and graded quality of evidence supporting content validity of 19 patient-reported outcome measures (PROMs) reporting quality of life (QoL) outcomes in women with CPP.
      • This systematic review has shown poor quality evidence for content validity of PROMs measuring QoL in women with CPP including inadequate PROM development in areas such as concept elicitation and the use of cognitive interviews.

        What this adds to what is known?

      • This is the first systematic review to implement the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) criteria to assess the content validity of PROMs reporting QoL outcomes in women with CPP.

        What is the implication and what should change now?

      • Findings of this review are concerning for clinicians. Instruments with high content validity are essential for generating relevant and meaningful data, because these data may influence decisions made by health professionals and patients.
      • Our evaluation of content validity of PROMs assessing QoL in women with CPP may be used subsequently to recommend instruments to measure core outcomes in core outcome sets in female CPP.

      1. Introduction

      Chronic pelvic pain (CPP) is a debilitating condition associated with significant long-term morbidity and socioeconomic burden [
      • Mathias S.D.
      • Kuppermann M.
      • Liberman R.F.
      • Lipschutz R.C.
      • Steege J.F.
      Chronic pelvic pain: prevalence, health-related quality of life, and economic correlates.
      ,
      • Chen I.
      • Thavorn K.
      • Shen M.
      • Goddard Y.
      • Yong P.
      • MacRae G.S.
      • et al.
      Hospital-associated costs of chronic pelvic pain in Canada: a population-based descriptive study.
      ]. Because of the complexities of pain perception and sensation, treatment is seldom curative. Therefore, clinical efforts are focused on reducing pain intensity and improving health-related quality of life (QoL). QoL is defined as physical, psychological, and social domains of health, seen as distinct areas that are influenced by a person's experiences, beliefs, expectations, and perceptions [
      • Chiarotto A.
      • Terwee C.B.
      • Kamper S.J.
      • Boers M.
      • Ostelo R.W.
      Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain: a systematic review.
      ].
      The measurement of QoL is considered an important outcome domain among researchers and clinicians in clinical trials and a priority among women with CPP [
      • Ghai V.
      • Subramanian V.
      • Jan H.
      • Pergialiotis V.
      • Thakar R.
      • Doumouchtsis S.K.
      A systematic review on reported outcomes and outcome measures in female idiopathic chronic pelvic pain for the development of a core outcome set.
      ,
      • Ghai V.
      • Subramanian V.
      • Jan H.
      • Thakar R.
      • Doumouchtsis S.K.
      CHORUS: an International Collaboration for Harmonising Outcomes, Research, and Standards in Urogynaecology and Women's Health. A meta-synthesis of qualitative literature on female chronic pelvic pain for the development of a core outcome set: a systematic review.
      ]. Several measurement instruments are available to measure QoL. The selection of adequate instruments is determined by validity, which is the extent to which an instrument accurately measures what it is supposed to measure. The Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) taxonomy divides validity into five subdomains [
      • Mokkink L.B.
      • Terwee C.B.
      • Patrick D.L.
      • Alonso J.
      • Stratford P.W.
      • Knol D.L.
      • et al.
      The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes.
      ]. Content validity is the first measurement property considered when choosing a patient-reported outcome measure (PROM) and is described as the degree to which the content of an instrument is an adequate reflection of the construct to be measured [
      • Prinsen C.A.C.
      • Vohra S.
      • Rose M.R.
      • Boers M.
      • Tugwell P.
      • Clarke M.
      • et al.
      How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” - a practical guideline.
      ]. It refers to the relevance, comprehensiveness, and comprehensibility of a PROM with respect to the construct, target population, and context of use. Content validity influences other measurement properties. For example, poor content validity can impact the responsiveness of a PROM.

      2. Objectives

      There are no systematic reviews available on content validity of PROMs measuring QoL in women with CPP using a standardized COSMIN methodology. Previous reviews published on PROMs in benign gynaecology including CPP and endometriosis have been limited [
      • Jones G.L.
      • Kennedy S.H.
      • Jenkinson C.
      Health-related quality of life measurement in women with common benign gynecologic conditions: a systematic review.
      ,
      • Bourdel N.
      • Chauvet P.
      • Billone V.
      • Douridas G.
      • Fauconnier A.
      • Gerbaud L.
      • et al.
      Systematic review of quality of life measures in patients with endometriosis.
      ,
      • Traylor J.
      • Chaudhari A.
      • Tsai S.
      • Milad M.P.
      Patient-reported outcome measures in benign gynecologic surgery: updates and selected tools.
      ,
      • Neelakantan D.
      • Omojole F.
      • Clark T.J.
      • Gupta J.K.
      • Khan K.S.
      Quality of life instruments in studies of chronic pelvic pain: a systematic review.
      ]. These reviews have been descriptive, have presented only basic psychometric properties, and were not performed using a standardized COSMIN methodology.
      A thorough assessment of content validity should not only include studies evaluating content validity in the population of interest but also original development studies and the content of the instrument itself. The COSMIN initiative has developed methodological guidance describing what constitutes sufficient content validity including a method to integrate methodological quality and results into an evidence synthesis rating system [
      • Terwee C.B.
      • Prinsen C.A.C.
      • Chiarotto A.
      • Westerman M.J.
      • Patrick D.L.
      • Alonso J.
      • et al.
      COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study.
      ].
      This systematic review has applied a COSMIN methodology to evaluate PROMs used to measure QoL in women with CPP. As an important element of this evaluation process, we considered content validity to be a parameter of particular relevance for the reasons described above.
      This review was performed by a working group of CHORUS, an International Collaboration for Harmonising Outcomes, Research, and Standards in Urogynaecology and Women's Health (https://i-chorus.org), and is part of a wider initiative to develop core outcome sets in CPP. In-depth assessment of outcome measures for suitability before they are considered for inclusion in a core outcome measure set has been recommended, and this study aims to contribute to this process.

      3. Materials and methods

      This systematic review was registered with the Core Outcome Measures in Effectiveness Trials (COMET) initiative register, number 981, and with the International Prospective Register of Systematic Reviews (PROSPERO), registration number CRD42019134858. We have performed a secondary analysis of our previous findings on the variation of outcomes and applied outcome measures in CPP trials [
      • Ghai V.
      • Subramanian V.
      • Jan H.
      • Pergialiotis V.
      • Thakar R.
      • Doumouchtsis S.K.
      A systematic review on reported outcomes and outcome measures in female idiopathic chronic pelvic pain for the development of a core outcome set.
      ]. Consequently, we have employed additional methodology for the purpose of this specific review which was not stated in the initial protocol registered on PROSPERO. These additions relate to our search strategy, evidence synthesis, and risk of bias. For example, our search strategy included additional databases such as PsycINFO and COSMIN and data sources such as Google Scholar. We performed an evidence synthesis and a risk of bias assessment in accordance with the standardized approach recommended by the COSMIN guidelines for systematic reviews evaluating PROMs [
      • Terwee C.B.
      • Prinsen C.A.C.
      • Chiarotto A.
      • Westerman M.J.
      • Patrick D.L.
      • Alonso J.
      • et al.
      COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study.
      ,
      • Prinsen C.A.C.
      • Mokkink L.B.
      • Bouter L.M.
      • Alonso J.
      • Patrick D.L.
      • de Vet H.C.W.
      • et al.
      COSMIN guideline for systematic reviews of patient-reported outcome measures.
      ,
      • Mokkink L.B.
      • de Vet H.C.W.
      • Prinsen C.A.C.
      • Patrick D.L.
      • Alonso J.
      • Bouter L.M.
      • et al.
      COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures.
      ].

      3.1 Study design

      The design of the present systematic review was based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [
      • Liberati A.
      • Altman D.G.
      • Tetzlaff J.
      • Mulrow C.
      • Gøtzsche P.C.
      • Ioannidis J.P.A.
      • et al.
      The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration.
      ] (Appendix A).
      Our inventory of PROMs measuring QoL in women with CPP was informed by a previous systematic review reporting all outcomes and outcome measures in effectiveness trials assessing interventions in idiopathic CPP [
      • Ghai V.
      • Subramanian V.
      • Jan H.
      • Pergialiotis V.
      • Thakar R.
      • Doumouchtsis S.K.
      A systematic review on reported outcomes and outcome measures in female idiopathic chronic pelvic pain for the development of a core outcome set.
      ]. A comprehensive literature search was undertaken using Cochrane Central Register of Controlled Trials (CENTRAL), Embase, and MEDLINE databases. Searches were performed from database inception to September 2019 using the following Medical Subject Heading search terms: “CPP,” “pelvic pain,” and “idiopathic CPP”. We only included randomized controlled trials assessing the effectiveness of interventions for CPP. The population of interest included women aged more than 18 years with CPP. We included all studies investigating psychological therapies, medical and surgical interventions with existing treatments, or placebo regimes. We excluded studies in languages other than English, pilot studies, nonrandomized studies, retrospective studies, case series, and case reports. We identified 48 measurement instruments, of which 15 PROMs assessed QoL: the Beck Depression Inventory (BDI), Brief Pain Inventory (BPI) interference subscale, Endometriosis Health Profile 30 (EHP 30), EuroQoL 5D (EQ-5D), Fear-Avoidance Beliefs Questionnaire (FABQ), General Health Questionnaire (GHQ), Hospital Anxiety and Depression Scale (HADS), Inventory of Interpersonal Problems (IIP), Multidimensional Pain Inventory (MPI), Oswestry Disability Index (ODI), Pain Beliefs and Perception Inventory (PBPI), Sexual Activity Questionnaire (SAQ), Short Form Health Survey (SF 36), Short Form Health Survey 12 (SF 12), and the World Health Organization Quality of Life assessment (WHOQoL).
      For the purpose of this review, we included all versions of a PROM, which resulted in a total of 19 PROMs: Beck Depression Inventory II, Brief Pain Inventory, EHP 30, EHP 5, EQ-5D 3L, EQ-5D 5L, FABQ, HADS, IIP-64, IIP-32, MPI, ODI 1.0, ODI 2.1a, PBPI, SAQ, SF 36, SF 12, WHOQoL 100, and WHOQoL Bref. The GHQ was not included as we could not obtain the user manual or questionnaire from the publisher.

      3.2 Search strategy

      A comprehensive literature search was conducted using Embase, MEDLINE, and PsycINFO databases from inception to August 2020. To identify and evaluate evidence for the current systematic review, the search strategy consisted of three groups of search terms combined with the Boolean operator “AND”: (1) instrument names, (2) CPP, and (3) measurement properties. A previously developed search filter used to retrieve studies on measurement properties in PubMed was adapted for all other databases [
      • Terwee C.B.
      • Jansma E.P.
      • Riphagen I.I.
      • de Vet H.C.W.
      Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments.
      ] (Appendix B for search strategy). Google Scholar was also searched (search date November 17, 2020) using the names of the PROMs; the first 100 webpages for each PROM were screened for inclusion. Citation tracking of eligible records was also conducted. Results of searches were combined and duplicates removed in EndNote 20.
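      As a minimal sketch of how three groups of search terms can be combined with “AND”, the snippet below assembles a query string. The helper names and the example terms are illustrative assumptions, as is the use of “OR” within each group; the actual strategy is the one given in Appendix B.

```python
def or_group(terms):
    """Join the terms of one group with OR and wrap them in parentheses (assumed convention)."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(instrument_names, condition_terms, property_terms):
    """Combine the three groups with AND: instruments, condition, measurement properties."""
    groups = [instrument_names, condition_terms, property_terms]
    return " AND ".join(or_group(g) for g in groups)


# Placeholder terms for illustration only; they are not the review's search terms.
query = build_query(
    ["SF-36", "EQ-5D"],
    ["chronic pelvic pain"],
    ["content validity", "comprehensibility"],
)
print(query)
```

      A record is retrieved by such a query only if it matches at least one term from every group, which is what narrows the search to studies of a named instrument, in the condition of interest, that report measurement properties.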

      3.3 Selection of studies

      Any report (i.e., book, online article) presenting the development of the 19 PROMs was included for the assessment of content validity [
      • Terwee C.B.
      • Prinsen C.A.C.
      • Chiarotto A.
      • Westerman M.J.
      • Patrick D.L.
      • Alonso J.
      • et al.
      COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study.
      ]. Content validity studies were eligible for inclusion if they were full-text original articles in which women with nonspecific CPP or professionals were asked to assess the relevance, comprehensiveness, and comprehensibility of the content of at least one of the PROMs [
      • Terwee C.B.
      • Prinsen C.A.C.
      • Chiarotto A.
      • Westerman M.J.
      • Patrick D.L.
      • Alonso J.
      • et al.
      COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study.
      ]. Studies that included women with mixed pathologies were included if at least 75% of the total sample had nonspecific CPP. Studies on cross-cultural adaptation were included as content validity studies if they performed a pilot study of the adapted questionnaire, in which its comprehensibility was assessed in patients with nonspecific CPP [
      • Beaton D.E.
      • Bombardier C.
      • Guillemin F.
      • Ferraz M.B.
      Guidelines for the process of cross-cultural adaptation of self-report measures.
      ].
      Two independent researchers (V.G. and V.S.) screened for potentially eligible studies by first examining the titles and then the abstracts of the identified studies. Full-text articles were retrieved for abstracts meeting the inclusion criteria or in cases when information in the abstract was incomplete or unclear. Full-text articles were reviewed and discrepancies regarding suitability for inclusion were resolved by discussion with the senior author (S.K.D.). Results of the study selection process were summarized in a flow chart including reasons for excluding the full text (Fig. S1 for the flow chart and Appendix D for the list of excluded records). References were managed in EndNote 20.

      3.4 Data extraction

      A standardized data extraction form was used [
      • Mokkink L.B.
      • de Vet H.C.W.
      • Prinsen C.A.C.
      • Patrick D.L.
      • Alonso J.
      • Bouter L.M.
      • et al.
      COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures.
      ]. The following information was extracted: characteristics of the PROM (i.e., construct, target population, intended context of use, mode/time of administration, number of scales/items/response options, recall period, range of scores, available translations, and access fee) and characteristics of the development study (conceptual framework, language, and patient involvement). Data extraction was performed by two researchers independently (V.G. and V.S.) and in the case of disagreement a consensus was reached by discussion with a third senior reviewer (S.K.D.).

      3.5 Quality assessment

      The methodological quality of PROM development and content validity was assessed using the COSMIN risk of bias checklist.
      PROM development was evaluated using 35 standards divided across two areas: (1) quality of PROM design, including the concept elicitation study for item generation, and (2) quality of the cognitive interview study to assess relevance, comprehensiveness, and comprehensibility of PROM items. Each standard was rated using a four-point scale: “very good,” “adequate,” “doubtful,” or “inadequate”.
      A second set of COSMIN standards was used to assess the methodological quality of content validity studies. A total of 31 standards evaluated studies which reported relevance, comprehensiveness, or comprehensibility by patients or professionals. Each standard was rated using a four-point scale: “very good,” “adequate,” “doubtful,” or “inadequate”.
      Total scores were calculated for both parts of the PROM development study (quality of PROM design and quality of cognitive interview study) and for each aspect of the methodological quality of the content validity studies (relevance, comprehensiveness, and comprehensibility). Total scores given to each box/part of box were determined using the lowest grade of any standard in that box/part of box (i.e., “the worst score counts” principle).
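      The “worst score counts” principle can be illustrated with a short sketch. The function name and the example ratings below are hypothetical and serve only to show the rule; they are not data from this review.

```python
# Four-point scale, ordered from best to worst as listed in the text.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score(ratings):
    """Return the lowest rating among the standards in a box or part of a box."""
    if not ratings:
        raise ValueError("a box needs at least one rated standard")
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(ratings, key=RATING_ORDER.index)


# Hypothetical ratings for the standards in one box (illustrative only).
example_box = ["very good", "adequate", "doubtful", "adequate"]
print(worst_score(example_box))  # doubtful
```

      Under this rule a single “inadequate” standard is enough to make the total score for that box “inadequate”, regardless of how the other standards were rated.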
      We also searched the COSMIN database to identify previous studies which evaluated the quality of PROM development of instruments included in this review. We used these findings of PROM development to support our evaluation.
      A quality assessment was performed by two researchers independently (V.G. and V.S.) and in the case of disagreement a consensus was reached by discussion with a third senior reviewer (S.K.D.).

      3.6 Evidence synthesis

      First, the PROM development study, content validity studies, and the content of the PROM were rated against the 10 criteria of good content validity [
      • Terwee C.B.
      • Prinsen C.A.C.
      • Chiarotto A.
      • Westerman M.J.
      • Patrick D.L.
      • Alonso J.
      • et al.
      COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study.
      ]. There are five criteria for relevance, one criterion for comprehensiveness, and four criteria for comprehensibility. Each criterion was scored sufficient (+), insufficient (−), or indeterminate (?).
      Second, results of the PROM development study, content validity studies, and reviewer ratings of PROM content were qualitatively summarized and compared against the criteria of good content validity. Overall ratings for relevance, comprehensiveness, comprehensibility, and overall content validity were determined for each PROM. Ratings were sufficient (+), insufficient (−), or inconsistent (±).
      Finally, a modified Grading of Recommendations, Assessment, Development, and Evaluations approach was applied to assess the quality of evidence [
      • Guyatt G.H.
      • Oxman A.D.
      • Vist G.E.
      • Kunz R.
      • Falck-Ytter Y.
      • Alonso-Coello P.
      • et al.
      GRADE: an emerging consensus on rating quality of evidence and strength of recommendations.
      ]. Factors considered include study quality, consistency of results across studies, and indirectness. The quality of evidence was graded as high, moderate, low, and very low.
      Data synthesis was performed by two researchers independently (V.G. and V.S.) and in the case of disagreement a consensus was reached by discussion with a third senior reviewer (S.K.D.).

      3.7 Data analysis

      We used a descriptive method of analysis to produce overall ratings for relevance, comprehensiveness, comprehensibility, and content validity per PROM including a quality assessment of evidence.

      3.8 Patient and public involvement

      There has been no patient involvement as this study is a systematic review of existing research.

      4. Results

      The literature search was conducted on August 30, 2020 and identified 475 titles and abstracts. We screened 307 titles and abstracts following the exclusion of 168 duplicate records. Five records were identified from our literature search (Fig. S1). A further 27 records were identified from our search of the COSMIN database, Google Scholar, and citation tracking. In total, we included 32 records focussing on PROM development of 19 PROMs and one study assessing content validity involving patients of a single PROM (Appendix C for a list of included records and Appendix D for a list of excluded full-text records).
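      The record counts reported above can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts as reported in the Results text.
identified = 475
duplicates = 168
screened = identified - duplicates

from_database_search = 5
from_other_sources = 27  # COSMIN database, Google Scholar, citation tracking
included = from_database_search + from_other_sources

print(screened, included)  # 307 32
```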

      4.1 Quality of patient-reported outcome measure development study

      Table 1 presents a summary of the development studies describing the construct definition, target population, context of use, and patient involvement of 19 PROMs (further details of PROM characteristics are available online in Table S1).
      Table 1. Characteristics of included PROMs
      PROM (reference to first article; each version of a PROM is considered a separate PROM) | Construct(s) | Target population PROM developed for | Context PROM developed for | Mode of administration (e.g., self-report, interview-based, proxy report, etc.) | Language(s) (country) of development | Available translations | Access fee | Patient involvement in concept elicitation
      BDI II [
      • Beck A.T.
      • Ward A.T.
      • Mendelson M.
      • Mock J.
      • Erbaugh J.
      An inventory for measuring depression.
      ,
      • Beck A.T.
      • Steer R.A.
      • Brown G.K.
      Manual for the Beck Depression Inventory-II.
      ]
      Indicator of the presence and degree of depressive symptoms | Patients diagnosed with depression | Clinical practice and research | Self-administered | English (US) | Yes | Yes | Yes
      BPI- Pain Interference subscale [
      • Cleeland C.S.
      The brief pain inventory user guide.
      ]
      Measure of the severity and impact of cancer-related pain on functioning | Patients with cancer related pain | Clinical practice, clinical trials, epidemiological research | Interview-based | English (US) | Yes | Yes | Yes
      EHP 30 [
      • Jones G.
      • Kennedy S.
      • Barnard A.
      • Wong J.
      • Jenkinson C.
      Development of an endometriosis quality-of-life instrument: the endometriosis health profile-30.
      ]
      Assessment of health related quality of life, “encompassing physical, psychologic, and social aspects”, of women with endometriosis | Women with endometriosis | Clinical practice and research | Self-administered | English (UK) | Yes | Yes | Yes
      EHP 5 [
      • Jones G.
      • Kennedy S.
      • Barnard A.
      • Wong J.
      • Jenkinson C.
      Development of an endometriosis quality-of-life instrument: the endometriosis health profile-30.
      ,
      • Jones G.
      • Jenkinson C.
      • Kennedy S.
      Development of the short form endometriosis health profile questionnaire: the EHP-5.
      ]
      Assumed same as EHP-30 | Assumed same as EHP-30 | Assumed same as EHP-30 | Assumed same as EHP-30 | English (UK) | Yes | Yes | Yes
      EQ-5D-3L [
      EuroQol Group
      EuroQol--a new facility for the measurement of health-related quality of life.
      ,
      EuroQol Research Foundation
      EQ-5D-3L user guide, Version 6.
      ]
      Generic measure of health related quality of life, no definition given. | Nondisease specific. | Large-scale surveys of community | Self-administered, interview-based | Dutch, English (UK), Finnish, Norwegian, Swedish | Yes | Yes | None
      EQ-5D-5L [
      EuroQol Research Foundation
      EQ-5D-5L user guide, Version 3.
      ,
      • Herdman M.
      • Gudex C.
      • Lloyd A.
      • Janssen M.
      • Kind P.
      • Parkin D.
      • et al.
      Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L).
      ] (2009)
      Assumed same as EQ-5D-3L | Assumed same as EQ-5D-3L | Assumed same as EQ-5D-3L | Assumed same as EQ-5D-3L | English (UK), Spanish (Spain) | Yes | Yes | Yes
      FABQ [
      • Waddell G.
      • Newton M.
      • Henderson I.
      • Somerville D.
      • Main C.
      A Fear-Avoidance Beliefs Questionnaire (FABQ) and the role of fear-avoidance beliefs in chronic low back pain and disability.
      ]
      Measure of patients' fear of pain and consequent avoidance of physical activity and long-term disability. | Patients with chronic lower back pain | Clinical practice and research | Self-administered | English (UK) | Yes | Not stated | Yes
      HADS [
      • Zigmond A.S.
      • Snaith R.P.
      The hospital anxiety and depression scale.
      ] (1983)
      Measure to detect depression and anxiety | Patients in hospital clinics | Clinical practice and research | Self-administered | English (UK) | Yes | Yes | None
      IIP 64 [
      • Barkham M.
      • Hardy G.E.
      • Startup M.
      The structure, validity and clinical relevance of the inventory of interpersonal problems.
      ,
      • Horowitz L.M.
      • Alden L.E.
      • Wiggins J.S.
      • Pincus A.L.
      Inventory of interpersonal problems manual.
      ]
      Measure of distress and determining source of interpersonal difficulties, by assessing eight domains: domineering/controlling, vindictive/self-centred, cold/distant, socially inhibited, nonassertive, overly accommodating, self-sacrificing, and intrusive/needy (these domains/concepts are defined in the cited reference) | Patients attending psychotherapy reporting interpersonal difficulties | Clinical practice, research | Self-administered, interview-based | English (US) | Yes | Yes | Yes
      IIP 32 [
      • Horowitz L.M.
      • Alden L.E.
      • Wiggins J.S.
      • Pincus A.L.
      Inventory of interpersonal problems manual.
      ,
      • Barkham M.
      • Hardy G.E.
      • Startup M.
      The IIP-32: a short version of the inventory of interpersonal problems.
      ]
      Assumed same as IIP 64 | Assumed same as IIP 64 | Assumed same as IIP 64 | Self-administered, interview-based | English (US) | Yes | Yes | Yes
      MPI [
      • Kerns R.D.
      • Turk D.C.
      • Rudy T.E.
      The west haven-yale multidimensional pain inventory (WHYMPI).
      ]
      Measure of the subjective distress caused by pain and impact of pain on patients' lives | Patients with chronic pain (men and women) | Clinical practice and research | Self-administered, interview-based | English (US) | Yes | None | None
      ODI 1.0 [
      • Fairbank J.C.
      • Couper J.
      • Davies J.B.
      • O'Brien J.P.
      The Oswestry low back pain disability questionnaire.
      ]
      Disability defined as the limitations of a patient's performance compared with that of a fit person | Patients with acute or chronic lower back pain | Clinical response to treatment | Self-administered | English (UK) | Yes | Yes | None
      ODI 2.1a [
      • Fairbank J.C.
      • Pynsent P.B.
      The oswestry disability index.
      ]
      Assumed same as Oswestry Disability Index 1.0 | Assumed same as Oswestry Disability Index 1.0 | Assumed same as Oswestry Disability Index 1.0 | Assumed same as Oswestry Disability Index 1.0 | English (UK) | Yes | Yes | None
      PBPI [
      • Williams D.A.
      • Thorn B.E.
      An empirical assessment of pain beliefs.
      ,
      • Williams D.A.
      • Robinson M.E.
      • Geisser M.E.
      Pain beliefs: assessment and utility.
      ]
      Measure of pain beliefs, assessing four domains: mystery, self-blame, constancy, and permanence (these domains/concepts are defined in the cited reference) | Injured workers (men and women) receiving compensation with chronic pain as a result of injury at work, not defined. | Clinical practice, research | Self-administered | English (US) | No | Not stated | Yes
      SAQ [
      • Thirlaway K.
      • Fallowfield L.
      • Cuzick J.
      The Sexual Activity Questionnaire: a measure of women's sexual functioning.
      ]
      Measure of sexual function, no definition given | Women on long-term Tamoxifen with a high risk of developing breast cancer. | Unclear context, implied for clinical trials | Self-administered | English (UK) | No | Not stated | None
      SF-36 [
      • Ware J.
      • Snow K.K.
      • Kosinski M.A.
      • Gandek B.
      SF-36 health survey manual and interpretation guide.
      ,
      • Ware Jr., J.E.
      • Sherbourne C.D.
      The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection.
      ]
      Generic health, eight concepts, assessing physical functioning, social and role functioning, mental health, general health perceptions, bodily pain, and vitality (these domains/concepts have been defined; please refer to the reference) | General and patient population | Clinical practice, research, health policy evaluations and general population health survey | Self-administered, interview administration | English (US) | Yes | None | None
      SF-12 [
      • Ware J.E.
      • Kosinski M.
      • Keller S.D.
      SF-12: how to score the SF-12 physical and mental health summary scales.
      ]
      Assumed same as SF-36 | Assumed same as SF-36 | Assumed same as SF-36 | Assumed same as SF-36 | English (US) | Yes | None | None
      WHOQoL-100 [
      World Health Organization
      Programme on mental health: WHOQoL user manual.
      ,
      The World Health Organization Quality of Life Assessment (WHOQOL)
      Development and general psychometric properties.
      ]
      Generic measure of quality of life cross-culturally (definition given), six domains identified as core aspects of quality of life cross-culturally: physical, psychological, level of independence, social relationships, environment, and personal beliefs/spirituality (these domains/concepts have been defined; please refer to the reference) | Patient groups in both developing and developed countries | Clinical practice, clinical trials, epidemiological research, health policy, and service evaluation | Self-administered, interview-based | Various languages (more than 30) | Yes | None | Yes
      WHOQoL-Bref [
      World Health Organization
      Programme on mental health: WHOQoL user manual.
      ,
      Development of the world health organization WHOQOL-BREF quality of life assessment. The WHOQOL group.
      ]
      Generic measure of quality of life cross-culturally (definition given), four domains identified as core aspects of quality of life cross-culturally: physical, psychological, social relationships, and environment (these domains/concepts have been defined; please refer to the reference) | Assumed same as WHOQoL-100 | Assumed same as WHOQoL-100 | Assumed same as WHOQoL-100 | Various languages (more than 30) | Yes | Yes | Yes
      Abbreviations: BDI, Beck Depression Inventory; BPI, Brief Pain Inventory; EHP 30, Endometriosis Health Profile 30; EHP 5, Endometriosis Health Profile 5; EQ-5D 3L, EuroQoL 5D 3L; EQ-5D 5L, EuroQoL 5D 5L; FABQ, Fear-Avoidance Beliefs Questionnaire; HADS, Hospital Anxiety and Depression Scale; IIP 64, Inventory of Interpersonal Problems 64; IIP 32, Inventory of Interpersonal Problems 32; MPI, Multidimensional Pain Inventory; ODI 1.0, Oswestry Disability Index 1.0; ODI 2.1a, Oswestry Disability Index 2.1a; PBPQ, Pain Beliefs and Perception Questionnaire; PROM, patient-reported outcome measure; QoL, quality of life; SAQ, Sexual Activity Questionnaire; SF 36, Short Form Survey 36; SF 12, Short Form Survey 12; WHOQoL, World Health Organization Quality of Life Questionnaire.
      a Each version of a PROM is considered a separate PROM.
      b These domains/concept have been defined, please refer to the reference.
      Four PROMs (BPI, FABQ, MPI, and PBPI) were developed to assess QoL in patients with chronic pain. Only two PROMs (EHP 30 and EHP 5) were designed specifically to assess QoL in women with CPP; however, this was secondary to endometriosis. The remaining 13 PROMs were developed generically to assess the QoL among patients with various health conditions.
      Overall, PROM development was considered inadequate for all instruments included in this review. Almost half of the PROMs (8/19, 42.11%) did not involve patients in their development (EQ-5D 3L, HADS, MPI, ODI 1.0, ODI 2.1a, SAQ, SF 12, and SF 36) (Table 1). Concept elicitation was deemed inadequate for 17 PROMs (Table 2). For the other two PROMs, BPI and EQ-5D 5L, concept elicitation was considered doubtful because it was unclear whether the patients included were representative of the target population; the number and characteristics of patients involved in development were not reported. Eight PROMs (8/19, 42.11%) did not conduct development studies in a sample representing the target population for which the PROM was developed (EQ-5D 3L, HADS, MPI, ODI 1.0, ODI 2.1a, SAQ, SF 12, and SF 36) (Table 2).
      Table 2. Content validity assessment for QoL instruments for female CPP
      PROM | Reference | Concept elicitation study; quality (a) | Concept elicitation study; patient involvement | Cognitive study performed | Cognitive study; quality (a) | Overall quality of PROM development study (a)
      BDI 2.0[
      • Beck A.T.
      • Ward A.T.
      • Mendelson M.
      • Mock J.
      • Erbaugh J.
      An inventory for measuring depression.
      ,
      • Beck A.T.
      • Steer R.A.
      • Brown G.K.
      Manual for the Beck Depression Inventory-II.
      ]
      Inadequate | Yes | No | n/a | Inadequate
      BPI[
      • Chiarotto A.
      • Ostelo R.W.
      • Boers M.
      • Terwee C.B.
      A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
      ,
      • Cleeland C.S.
      The brief pain inventory user guide.
      ]
      Doubtful | Yes | No | n/a | Inadequate
      EHP 30[
      • Jones G.
      • Kennedy S.
      • Barnard A.
      • Wong J.
      • Jenkinson C.
      Development of an endometriosis quality-of-life instrument: the endometriosis health profile-30.
      ]
      Inadequate | Yes | Yes | Doubtful | Inadequate
      EHP 5[
      • Jones G.
      • Kennedy S.
      • Barnard A.
      • Wong J.
      • Jenkinson C.
      Development of an endometriosis quality-of-life instrument: the endometriosis health profile-30.
      ,
      • Jones G.
      • Jenkinson C.
      • Kennedy S.
      Development of the short form endometriosis health profile questionnaire: the EHP-5.
      ]
      Inadequate | Yes | No | n/a | Inadequate
      EQ-5D 3L[
      • Chiarotto A.
      • Terwee C.B.
      • Kamper S.J.
      • Boers M.
      • Ostelo R.W.
      Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain: a systematic review.
      ]
      Inadequate | None | No | n/a | Inadequate
      EQ-5D 5L[
      • Herdman M.
      • Gudex C.
      • Lloyd A.
      • Janssen M.
      • Kind P.
      • Parkin D.
      • et al.
      Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L).
      ,
      • Craxford S.
      • Deacon C.
      • Myint Y.
      • Ollivere B.
      Assessing outcome measures used after rib fracture: a COSMIN systematic review.
      ]
      Doubtful | Yes | Yes | Inadequate | Inadequate
      FABQ[
      • Waddell G.
      • Newton M.
      • Henderson I.
      • Somerville D.
      • Main C.
      A Fear-Avoidance Beliefs Questionnaire (FABQ) and the role of fear-avoidance beliefs in chronic low back pain and disability.
      ]
      Inadequate | Yes | No | n/a | Inadequate
      HADS[
      • Zigmond A.S.
      • Snaith R.P.
      The hospital anxiety and depression scale.
      ,
      • Craxford S.
      • Deacon C.
      • Myint Y.
      • Ollivere B.
      Assessing outcome measures used after rib fracture: a COSMIN systematic review.
      ]
      Inadequate | None | No | n/a | Inadequate
      IIP-64[
      • Barkham M.
      • Hardy G.E.
      • Startup M.
      The structure, validity and clinical relevance of the inventory of interpersonal problems.
      ,
      • Horowitz L.M.
      • Alden L.E.
      • Wiggins J.S.
      • Pincus A.L.
      Inventory of interpersonal problems manual.
      ,
      • Alden L.E.
      • Wiggins J.S.
      • Pincus A.L.
      Construction of circumplex scales for the inventory of interpersonal problems.
      ,
      • Horowitz L.M.
      • Rosenberg S.E.
      • Baer B.A.
      • Ureño G.
      • Villaseñor V.S.
      Inventory of interpersonal problems: psychometric properties and clinical applications.
      ]
      Inadequate | Yes | No | n/a | Inadequate
      IIP-32[
      • Barkham M.
      • Hardy G.E.
      • Startup M.
      The structure, validity and clinical relevance of the inventory of interpersonal problems.
      ,
      • Horowitz L.M.
      • Alden L.E.
      • Wiggins J.S.
      • Pincus A.L.
      Inventory of interpersonal problems manual.
      ,
      • Barkham M.
      • Hardy G.E.
      • Startup M.
      The IIP-32: a short version of the inventory of interpersonal problems.
      ,
      • Horowitz L.M.
      • Rosenberg S.E.
      • Baer B.A.
      • Ureño G.
      • Villaseñor V.S.
      Inventory of interpersonal problems: psychometric properties and clinical applications.
      ]
      Inadequate | Yes | No | n/a | Inadequate
      MPI[
      • Chiarotto A.
      • Ostelo R.W.
      • Boers M.
      • Terwee C.B.
      A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
      ,
      • Kerns R.D.
      • Turk D.C.
      • Rudy T.E.
      The west haven-yale multidimensional pain inventory (WHYMPI).
      ]
      Inadequate | None | No | n/a | Inadequate
      ODI 1.0[
      • Chiarotto A.
      • Ostelo R.W.
      • Boers M.
      • Terwee C.B.
      A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
      ]
      Inadequate | None | No | n/a | Inadequate
      ODI 2.1a[
      • Chiarotto A.
      • Ostelo R.W.
      • Boers M.
      • Terwee C.B.
      A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
      ]
      Inadequate | None | No | n/a | Inadequate
      PBPQ[
      • Williams D.A.
      • Thorn B.E.
      An empirical assessment of pain beliefs.
      ]
      Inadequate | Yes | No | n/a | Inadequate
      SAQ[
      • Thirlaway K.
      • Fallowfield L.
      • Cuzick J.
      The Sexual Activity Questionnaire: a measure of women's sexual functioning.
      ]
      Inadequate | None | No | n/a | Inadequate
      SF 36[
      • Chiarotto A.
      • Ostelo R.W.
      • Boers M.
      • Terwee C.B.
      A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
      ,
      • Ware J.
      • Snow K.K.
      • Kosinski M.A.
      • Gandek B.
      SF-36 health survey manual and interpretation guide.
      ]
      Inadequate | None | No | n/a | Inadequate
      SF 12[
      • Chiarotto A.
      • Ostelo R.W.
      • Boers M.
      • Terwee C.B.
      A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
      ,
      • Ware J.E.
      • Kosinski M.
      • Keller S.D.
      SF-12: how to score the SF-12 physical and mental health summary scales.
      ]
      Inadequate | None | No | n/a | Inadequate
      WHOQoL[
      World Health Organization
      Programme on mental health: WHOQoL user manual.
      ,
      The World Health Organization Quality of Life Assessment (WHOQOL)
      Development and general psychometric properties.
      ]
      Inadequate | Yes | Yes | Doubtful | Inadequate
      WHOQoL-Bref[
      World Health Organization
      Programme on mental health: WHOQoL user manual.
      ,
      Development of the world health organization WHOQOL-BREF quality of life assessment. The WHOQOL group.
      ]
      Inadequate | Yes | Yes | Doubtful | Inadequate
      Abbreviations: BDI, Beck Depression Inventory; BPI, Brief Pain Inventory; CPP, chronic pelvic pain; EHP 30, Endometriosis Health Profile 30; EHP 5, Endometriosis Health Profile 5; EQ-5D 3L, EuroQoL 5D 3L; EQ-5D 5L, EuroQoL 5D 5L; FABQ, Fear-Avoidance Beliefs Questionnaire; HADS, Hospital Anxiety and Depression Scale; IIP 64, Inventory of Interpersonal Problems 64; IIP 32, Inventory of Interpersonal Problems 32; MPI, Multidimensional Pain Inventory; ODI 1.0, Oswestry Disability Index 1.0; ODI 2.1a, Oswestry Disability Index 2.1a; PBPQ, Pain Beliefs and Perception Questionnaire; QoL, quality of life; SAQ, Sexual Activity Questionnaire; SF 36, Short Form Survey 36; SF 12, Short Form Survey 12; WHOQoL, World Health Organization Quality of Life Questionnaire.
      a Quality rated as very good, adequate, doubtful, inadequate, or not applicable.
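      As an illustrative check (not part of the original analysis), the counts reported for Table 2 can be re-derived from its rows. The sketch below assumes a simple tuple layout of our own choosing; the variable names and the `share` helper are hypothetical, and the values are transcribed from Table 2.

```python
from collections import Counter

# Rows transcribed from Table 2 (tuple layout is an assumption for illustration):
# (PROM, concept elicitation quality, patients involved?, cognitive study performed?)
TABLE_2 = [
    ("BDI 2.0", "inadequate", True, False),
    ("BPI", "doubtful", True, False),
    ("EHP 30", "inadequate", True, True),
    ("EHP 5", "inadequate", True, False),
    ("EQ-5D 3L", "inadequate", False, False),
    ("EQ-5D 5L", "doubtful", True, True),
    ("FABQ", "inadequate", True, False),
    ("HADS", "inadequate", False, False),
    ("IIP-64", "inadequate", True, False),
    ("IIP-32", "inadequate", True, False),
    ("MPI", "inadequate", False, False),
    ("ODI 1.0", "inadequate", False, False),
    ("ODI 2.1a", "inadequate", False, False),
    ("PBPQ", "inadequate", True, False),
    ("SAQ", "inadequate", False, False),
    ("SF 36", "inadequate", False, False),
    ("SF 12", "inadequate", False, False),
    ("WHOQoL", "inadequate", True, True),
    ("WHOQoL-Bref", "inadequate", True, True),
]


def share(count, total):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


total = len(TABLE_2)
elicitation = Counter(quality for _, quality, _, _ in TABLE_2)
no_patients = [name for name, _, involved, _ in TABLE_2 if not involved]
cognitive = [name for name, _, _, tested in TABLE_2 if tested]

print(f"PROMs assessed: {total}")
print(f"Concept elicitation inadequate: {elicitation['inadequate']} "
      f"({share(elicitation['inadequate'], total)}%)")
print(f"Concept elicitation doubtful: {elicitation['doubtful']}")
print(f"No patient involvement: {len(no_patients)} "
      f"({share(len(no_patients), total)}%)")
print(f"Cognitive study performed: {', '.join(cognitive)}")
```

      Keeping the table as data makes it straightforward to re-check the summary counts if a rating is revised.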
      Only four PROMs featured cognitive interviews with patients in their development process (EHP 30, EQ-5D 5L, WHOQoL, and WHOQoL-Bref) (Table 2). Cognitive interviews were inadequate for the EQ-5D 5L because the final form of the questionnaire was not tested. Cognitive interviews were of doubtful quality for the EHP 30, WHOQoL, and WHOQoL-Bref. For the EHP 30, no details were provided regarding the methods used to assess comprehensibility and comprehensiveness. Cognitive interviews for the WHOQoL and WHOQoL-Bref did examine comprehensibility; however, limited details regarding the assessment were provided. It was unclear whether comprehensiveness was evaluated for the WHOQoL and WHOQoL-Bref.

      4.2 Content validity studies

      We identified only one study evaluating the content validity of a single PROM, the SF 36 [
      • Stones R.W.
      • Selfe S.A.
      • Fransman S.
      • Horn S.A.
      Psychosocial and economic impact of chronic pelvic pain.
      ]. This study involved 105 female patients presenting with CPP and assessed relevance, comprehensibility, and comprehensiveness. It was rated as being of doubtful quality because it was unclear which aspect was assessed. In addition, the study did not report the use of an interview/topic guide, the use of trained moderators/interviewers, or whether two independent researchers conducted the analysis. No content validity studies involving professionals were found.

      4.3 Evidence synthesis

      No high-quality evidence was available for the 19 PROMs included in this review (Table 3). All PROMs had low or very low quality of evidence ratings for relevance, comprehensiveness, and comprehensibility. Four PROMs achieved a low quality of evidence rating for relevance (SF 36, EHP 30, EHP 5, and MPI). Quality of evidence on comprehensiveness was rated as low for two instruments (SF 36 and EHP 30). Only the SF 36 attained a low quality of evidence rating for comprehensibility. Based on these results, it is not possible to establish which PROM has the best content validity for QoL in women with CPP.
      Table 3. Evidence synthesis on content validity for QoL instruments for female CPP
      PROM | Relevance rating (a); quality (b) | Comprehensiveness rating (a); quality (b) | Comprehensibility rating (a); quality (b)
      BDI 2.0 | ?; very low | −; very low | −; very low
      BPI | ±; very low | ?; very low | ?; very low
      EHP 30 | +; low | +; low | ?; very low
      EHP 5 | +; low | ?; very low | ?; very low
      EQ-5D 3L | ?; very low | −; very low | ?; very low
      EQ-5D 5L | ?; very low | ?; very low | ?; very low
      FABQ | ±; very low | −; very low | ?; very low
      HADS | ?; very low | ?; very low | ?; very low
      IIP-64 | ±; very low | ?; very low | ?; very low
      IIP-32 | ±; very low | ?; very low | ?; very low
      MPI | +; low | ?; very low | ?; very low
      ODI 1.0 | ?; very low | −; very low | −; very low
      ODI 2.1a | ?; very low | −; very low | −; very low
      PBPQ | ?; very low | −; very low | −; very low
      SAQ | ?; very low | ?; very low | −; very low
      SF 36 [Stones R.W., Selfe S.A., Fransman S., Horn S.A. Psychosocial and economic impact of chronic pelvic pain.] | ?; low | ?; low | ?; low
      SF 12 | ±; very low | ?; very low | ?; very low
      WHOQoL | ?; very low | ?; very low | ?; very low
      WHOQoL-Bref | ±; very low | ?; very low | ?; very low
      Abbreviations: BDI, Beck Depression Inventory; PBPQ, Pain Beliefs and Perception Questionnaire; FABQ, Fear-Avoidance Beliefs Questionnaire; SAQ, Sexual Activity Questionnaire; BPI, Brief Pain Inventory; HADS, Hospital Anxiety and Depression Scale; SF 36, Short Form Survey 36; SF 12, Short Form Survey 12; EHP 30, Endometriosis Health Profile 30; EHP 5, Endometriosis Health Profile 5; IIP 64, Inventory of Interpersonal Problems 64; IIP 32, Inventory of Interpersonal Problems 32; ODI 1.0, Oswestry Disability Index 1.0; ODI 2.1a, Oswestry Disability Index 2.1a; MPI, Multidimensional Pain Inventory.
      a Ratings can be (+) sufficient, (−) insufficient, (±) inconsistent, and (?) indeterminate.
      b Quality can be high, moderate, low, and very low.
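      The evidence synthesis in Table 3 can also be summarised programmatically. The following is a minimal sketch, assuming the first rating column corresponds to relevance as discussed in the text; the data structure and the `proms_with_quality` helper are hypothetical names introduced for illustration, and ASCII symbols stand in for the rating signs.

```python
VL, LOW = "very low", "low"
ASPECTS = ("relevance", "comprehensiveness", "comprehensibility")

# Each PROM maps to three (rating, quality) pairs in ASPECTS order,
# transcribed from Table 3. "+/-" = inconsistent, "?" = indeterminate.
TABLE_3 = {
    "BDI 2.0": [("?", VL), ("-", VL), ("-", VL)],
    "BPI": [("+/-", VL), ("?", VL), ("?", VL)],
    "EHP 30": [("+", LOW), ("+", LOW), ("?", VL)],
    "EHP 5": [("+", LOW), ("?", VL), ("?", VL)],
    "EQ-5D 3L": [("?", VL), ("-", VL), ("?", VL)],
    "EQ-5D 5L": [("?", VL), ("?", VL), ("?", VL)],
    "FABQ": [("+/-", VL), ("-", VL), ("?", VL)],
    "HADS": [("?", VL), ("?", VL), ("?", VL)],
    "IIP-64": [("+/-", VL), ("?", VL), ("?", VL)],
    "IIP-32": [("+/-", VL), ("?", VL), ("?", VL)],
    "MPI": [("+", LOW), ("?", VL), ("?", VL)],
    "ODI 1.0": [("?", VL), ("-", VL), ("-", VL)],
    "ODI 2.1a": [("?", VL), ("-", VL), ("-", VL)],
    "PBPQ": [("?", VL), ("-", VL), ("-", VL)],
    "SAQ": [("?", VL), ("?", VL), ("-", VL)],
    "SF 36": [("?", LOW), ("?", LOW), ("?", LOW)],
    "SF 12": [("+/-", VL), ("?", VL), ("?", VL)],
    "WHOQoL": [("?", VL), ("?", VL), ("?", VL)],
    "WHOQoL-Bref": [("+/-", VL), ("?", VL), ("?", VL)],
}


def proms_with_quality(aspect, quality):
    """List PROMs whose evidence for `aspect` has the given quality level."""
    idx = ASPECTS.index(aspect)
    return sorted(
        name for name, cells in TABLE_3.items() if cells[idx][1] == quality
    )


for aspect in ASPECTS:
    print(f"{aspect}: low quality evidence for {proms_with_quality(aspect, LOW)}")

# No cell in the table reaches moderate or high quality.
best = {quality for cells in TABLE_3.values() for _, quality in cells}
print(f"Quality levels present: {sorted(best)}")
```

      Under this encoding, no instrument has better than low-quality evidence for any aspect, consistent with the conclusion that a best-performing PROM cannot be identified.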

      5. Discussion

      5.1 Main findings

      Overall, PROM development was inadequate for all instruments used to assess QoL in women with CPP. No high-quality evidence ratings were found for relevance, comprehensiveness, or comprehensibility. QoL was mostly measured using generic instruments (68.42%, 13/19) rather than those specific to chronic pain (21.05%, 4/19) or pelvic pain (10.53%, 2/19). Quality of concept elicitation was inadequate for approximately 90% of PROMs (17/19), and eight PROMs (8/19) did not involve patients, or a sample representative of the target population for which the PROM was developed, in their development.
      Only a fifth of PROMs were developed using cognitive interviews assessing comprehensiveness and comprehensibility. We identified one content validity study assessing the relevance, comprehensiveness, and comprehensibility of the SF 36 within a CPP population.
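      The proportions by instrument type follow directly from the counts given in the Results. A short sketch of the arithmetic is shown below; the grouping is taken from the text, and the variable names are hypothetical.

```python
# Grouping taken from the Results text; all remaining instruments are generic.
TOTAL_PROMS = 19
chronic_pain = ["BPI", "FABQ", "MPI", "PBPI"]
pelvic_pain = ["EHP 30", "EHP 5"]  # developed for endometriosis-related CPP

counts = {
    "generic": TOTAL_PROMS - len(chronic_pain) - len(pelvic_pain),
    "chronic pain": len(chronic_pain),
    "pelvic pain": len(pelvic_pain),
}

# Percentage of all PROMs in each group, rounded to two decimals.
shares = {group: round(100 * n / TOTAL_PROMS, 2) for group, n in counts.items()}

for group, n in counts.items():
    print(f"{group}: {n}/{TOTAL_PROMS} = {shares[group]}%")
```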

      5.2 Strengths and limitations

      This is the first systematic review to implement COSMIN criteria to assess the content validity of PROMs used to evaluate QoL in women with CPP. In our evaluation, we considered the methodological quality, the development process, findings of the content validity study, and content of the instrument itself. We used robust and reproducible methods, which have been successfully implemented in studies evaluating content validity of PROMs in various medical specialities [
      • Chiarotto A.
      • Ostelo R.W.
      • Boers M.
      • Terwee C.B.
      A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
      ,
      • Davies C.F.
      • Macefield R.
      • Avery K.
      • Blazeby J.M.
      • Potter S.
      Patient-reported outcome measures for post-mastectomy breast reconstruction: a systematic review of development and measurement properties.
      ,
      • Comins J.
      • Siersma V.
      • Couppe C.
      • Svensson R.B.
      • Johansen F.
      • Malmgaard-Clausen N.M.
      • et al.
      Assessment of content validity and psychometric properties of VISA-A for Achilles tendinopathy.
      ].
      This study has limitations. The validity of an instrument is dependent on the interpretation of instrument scores in a given application [
      • Chiarotto A.
      • Ostelo R.W.
      • Boers M.
      • Terwee C.B.
      A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
      ]. Therefore, our findings may not be generalizable to every context. In addition, the current perspective on validity differs and focuses only on the inferences, claims, or decisions made based on instrument scores rather than on the instrument itself [
      • Chiarotto A.
      • Ostelo R.W.
      • Boers M.
      • Terwee C.B.
      A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
      ]. We limited the inclusion of PROMs to those from effectiveness trials only, so this may not be an exhaustive list of all PROMs reporting QoL in CPP. However, this systematic review is part of a wider initiative to produce a COS which will stipulate a minimum set of outcomes for effectiveness trials to report. Therefore, our study findings will support and contribute to the evaluation, selection, and recommendation of appropriate instruments to measure these core outcomes. Although we aimed to include studies describing cross-cultural adaptations of PROMs, we found no studies specific to women with idiopathic CPP. Consequently, our results were limited to studies written in the English language. Content validity is one of many parameters used to assess the quality of measurement instruments. The focus of our study may appear limited; however, based on our results, further analyses may be conducted to assess additional parameters.

      5.3 Interpretation

      The findings of this review are concerning for clinicians who routinely use PROMs in their clinical practice. We demonstrated that frequently used PROMs reporting QoL outcomes lack content validity. It is essential that instruments with high content validity are used to generate data that are relevant and meaningful, as these data may influence decisions made by health professionals and patients regarding current or future treatment options. For example, the European Medicines Agency has acknowledged the role of the patient's perspective, including the impact of anticancer medication on wellbeing and daily life. The collection of PROMs in this instance is an important aspect of evaluating the clinical benefits and efficacy of new drugs, which cannot be gained from objective or clinical assessments [
      European Medicines Agency Committee for Medicinal Products for Human Use (CHMP)
      Appendix 2 to the guideline on the evaluation of anticancer medicinal products in man. The use of patient-reported outcome (pro) measures in oncology studies.
      ].
      High content validity of PROMs is ensured by items generated during concept elicitation reflecting what is important to patients. This can be accomplished by undertaking interviews/focus groups thereby producing items using the language of the subjects interviewed and incorporating the content of qualitative statements made by patients [
      • Brod M.
      • Tesler L.E.
      • Christensen T.L.
      Qualitative research and content validity: developing best practices based on science and experience.
      ]. Developers of future instruments should focus on involving patients from the population of interest, thereby creating PROMs that represent patient priorities and contain items that are acceptable, comprehensive, and relevant to their condition.
      Clinicians should be encouraged to use PROMs, as evidence suggests that the reporting of PROMs increases patient satisfaction with treatment and improves adherence to regimens [
      • Chen J.
      • Ou L.
      • Hollis S.J.
      A systematic review of the impact of routine collection of patient reported outcome measures on patients, providers and health organisations in an oncologic setting.
      ,
      • Wartolowska K.
      The nocebo effect as a source of bias in the assessment of treatment effects.
      ]. However, using PROMs beyond their intended use may result in data lacking responsiveness and clinical meaning. Instruments in their current format may be inappropriate and contain items irrelevant to the population being studied. This review demonstrated that the majority of instruments lacked a content validity assessment supporting their use in a CPP population. We identified a single content validity study assessing the SF 36 in a CPP population [
      • Stones R.W.
      • Selfe S.A.
      • Fransman S.
      • Horn S.A.
      Psychosocial and economic impact of chronic pelvic pain.
      ]. Stones et al. identified that questions such as those describing pain did not reflect the episodic/intermittent nature of CPP but rather implied that pain is constant. In addition, participants with CPP found the timeframe of the questionnaire problematic, along with questions regarding avoidance behaviour with respect to activity and the use of analgesia, which affected their current pain experience.
      It is essential that instruments used by clinicians and researchers are “fit for purpose”. This requires adaptation or modification of existing PROMs where the development and subsequent validation of the instrument occurred in a population different from the study population. Cognitive interviews can be used to adapt existing PROMs by modifying instructions or items in response to patient feedback, thereby minimizing missing or inaccurate data. Cognitive interviewing serves two purposes: to establish (1) whether the instrument content represents the most important aspects of the construct of interest and (2) whether respondents understand how to complete the instrument, including a clear understanding of instructions, interpretation of items, appropriate recall periods, how to use scales, and any other factors that may influence participant responses. Without prior testing of the questionnaire, it is unknown whether patients will encounter difficulties when completing it.
      Our findings confirm previous reports of poor reporting of qualitative methods with respect to establishing content validity [
      • Ricci L.
      • Lanfranchi J.-B.
      • Lemetayer F.
      • Rotonda C.
      • Guillemin F.
      • Coste J.
      • et al.
      Qualitative methods used to generate questionnaire items: a systematic review.
      ]. It is essential that processes for evaluating content validity are transparent and well documented for scientific and regulatory purposes. Multiple guidelines have provided recommendations for demonstrating content validity during PROM development, including a literature review, concept elicitation interviews or focus groups, data analysis, item generation, and cognitive interviews [
      • Brod M.
      • Tesler L.E.
      • Christensen T.L.
      Qualitative research and content validity: developing best practices based on science and experience.
      ,
      • Lasch K.E.
      • Marquis P.
      • Vigneux M.
      • Abetz L.
      • Arnould B.
      • Bayliss M.
      • et al.
      PRO development: rigorous qualitative research as the crucial foundation.
      ,
      • Patrick D.L.
      • Burke L.B.
      • Gwaltney C.J.
      • Leidy N.K.
      • Martin M.L.
      • Molsen E.
      • et al.
      Content validity--establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: part 1--eliciting concepts for a new PRO instrument.
      ,
      • Patrick D.L.
      • Burke L.B.
      • Gwaltney C.J.
      • Leidy N.K.
      • Martin M.L.
      • Molsen E.
      • et al.
      Content validity--establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO Good Research Practices Task Force report: part 2--assessing respondent understanding.
      ]. Variation in the reporting of qualitative methodology may be attributed to inadequate dissemination of guidance and knowledge among researchers or a lack of demand from editorial boards of journals [
      • Ricci L.
      • Lanfranchi J.-B.
      • Lemetayer F.
      • Rotonda C.
      • Guillemin F.
      • Coste J.
      • et al.
      Qualitative methods used to generate questionnaire items: a systematic review.
      ].
      This review was conducted as part of a wider project to establish COS in CPP. COS have the potential to reduce the inconsistencies in outcome reporting. However, the adoption and usefulness of COS may be limited by a lack of recommendations on how to measure core outcomes [
      • Ramsey I.
      • Eckert M.
      • Hutchinson A.D.
      • Marker J.
      • Corsini N.
      Core outcome sets in cancer and their approaches to identifying and selecting patient-reported outcome measures: a systematic review.
      ]. There is substantial variation in the methods used to identify, appraise, and select PROMs for COS across a range of medical specialities [
      • Ramsey I.
      • Eckert M.
      • Hutchinson A.D.
      • Marker J.
      • Corsini N.
      Core outcome sets in cancer and their approaches to identifying and selecting patient-reported outcome measures: a systematic review.
      ]. Our previous systematic review demonstrated an inconsistent use of PROMs and variation of outcomes reporting in therapeutic trials of women with CPP [
      • Ghai V.
      • Subramanian V.
      • Jan H.
      • Pergialiotis V.
      • Thakar R.
      • Doumouchtsis S.K.
      A systematic review on reported outcomes and outcome measures in female idiopathic chronic pelvic pain for the development of a core outcome set.
      ]. Resulting differences in outcome domains, terminology, subscales, and scoring prevent comparison and synthesis of data. The COSMIN/COMET guideline provides a practical four-step method to guide COS developers undertaking this process, including recommendations concerning the selection of measures for a COS: (1) select one instrument per outcome; (2) ensure there is high-quality evidence for content validity, internal consistency, and feasibility of the instrument; and (3) obtain consensus on the instrument [
      • Prinsen C.A.C.
      • Vohra S.
      • Rose M.R.
      • Boers M.
      • Tugwell P.
      • Clarke M.
      • et al.
      How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” - a practical guideline.
      ]. Uptake of the COSMIN guidance will ensure core outcomes are operationalized and consistently measured.
      We evaluated the content validity of PROMs reporting QoL outcomes in women with CPP because content validity is the first measurement property to consider when selecting a PROM. Our research group is in the process of evaluating further psychometric properties. These findings will inform future discussions and facilitate consensus, as valid and reliable instruments are recommended for the assessment of core outcomes such as QoL.

      6. Conclusion

      This systematic review has shown poor quality evidence for content validity of PROMs measuring QoL in women with CPP. Developers of future instruments should pay attention to the judicious documentation of qualitative research methods and consider the COSMIN criteria when developing PROMs.

      CRediT authorship contribution statement

      Vishalli Ghai: Study conception, design, data collection, analysis, and drafting the manuscript. Venkatesh Subramanian: Data collection, analysis, and review of draft manuscript. Haider Jan: Review of study design and review of draft manuscript. Stergios K. Doumouchtsis: Study conception, review of study design, data collection, and draft manuscript.

      References

        • Mathias S.D.
        • Kuppermann M.
        • Liberman R.F.
        • Lipschutz R.C.
        • Steege J.F.
        Chronic pelvic pain: prevalence, health-related quality of life, and economic correlates.
        Obstet Gynecol. 1996; 87: 321-327
        • Chen I.
        • Thavorn K.
        • Shen M.
        • Goddard Y.
        • Yong P.
        • MacRae G.S.
        • et al.
        Hospital-associated costs of chronic pelvic pain in Canada: a population-based descriptive study.
        J Obstet Gynaecol Can. 2017; 39: 174-180
        • Chiarotto A.
        • Terwee C.B.
        • Kamper S.J.
        • Boers M.
        • Ostelo R.W.
        Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain: a systematic review.
        J Clin Epidemiol. 2018; 102: 23-37
        • Ghai V.
        • Subramanian V.
        • Jan H.
        • Pergialiotis V.
        • Thakar R.
        • Doumouchtsis S.K.
        A systematic review on reported outcomes and outcome measures in female idiopathic chronic pelvic pain for the development of a core outcome set.
        BJOG. 2020; 128: 1471-6412
        • Ghai V.
        • Subramanian V.
        • Jan H.
        • Thakar R.
        • Doumouchtsis S.K.
        CHORUS: an International Collaboration for Harmonising Outcomes, Research, and Standards in Urogynaecology and Women's Health. A meta-synthesis of qualitative literature on female chronic pelvic pain for the development of a core outcome set: a systematic review.
        Int Urogynecol J. 2021; 32: 1187-1194
        • Mokkink L.B.
        • Terwee C.B.
        • Patrick D.L.
        • Alonso J.
        • Stratford P.W.
        • Knol D.L.
        • et al.
        The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes.
        J Clin Epidemiol. 2010; 63: 737-745
        • Prinsen C.A.C.
        • Vohra S.
        • Rose M.R.
        • Boers M.
        • Tugwell P.
        • Clarke M.
        • et al.
        How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” - a practical guideline.
        Trials. 2016; 17 (10): 449
        • Jones G.L.
        • Kennedy S.H.
        • Jenkinson C.
        Health-related quality of life measurement in women with common benign gynecologic conditions: a systematic review.
        Am J Obstet Gynecol. 2002; 187: 501-511
        • Bourdel N.
        • Chauvet P.
        • Billone V.
        • Douridas G.
        • Fauconnier A.
        • Gerbaud L.
        • et al.
        Systematic review of quality of life measures in patients with endometriosis.
        PLoS One. 2019; 14: e0208464
        • Traylor J.
        • Chaudhari A.
        • Tsai S.
        • Milad M.P.
        Patient-reported outcome measures in benign gynecologic surgery: updates and selected tools.
        Curr Opin Obstet Gynecol. 2019; 31: 259-266
        • Neelakantan D.
        • Omojole F.
        • Clark T.J.
        • Gupta J.K.
        • Khan K.S.
        Quality of life instruments in studies of chronic pelvic pain: a systematic review.
        J Obstet Gynaecol. 2004; 24: 851-858
        • Terwee C.B.
        • Prinsen C.A.C.
        • Chiarotto A.
        • Westerman M.J.
        • Patrick D.L.
        • Alonso J.
        • et al.
        COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study.
        Qual Life Res. 2018; 27: 1159-1170
        • Prinsen C.A.C.
        • Mokkink L.B.
        • Bouter L.M.
        • Alonso J.
        • Patrick D.L.
        • de Vet H.C.W.
        • et al.
        COSMIN guideline for systematic reviews of patient-reported outcome measures.
        Qual Life Res. 2018; 27: 1147-1157
        • Mokkink L.B.
        • de Vet H.C.W.
        • Prinsen C.A.C.
        • Patrick D.L.
        • Alonso J.
        • Bouter L.M.
        • et al.
        COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures.
        Qual Life Res. 2018; 27: 1171-1179
        • Liberati A.
        • Altman D.G.
        • Tetzlaff J.
        • Mulrow C.
        • Gøtzsche P.C.
        • Ioannidis J.P.A.
        • et al.
        The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration.
        BMJ. 2009; 339: b2700
        • Terwee C.B.
        • Jansma E.P.
        • Riphagen I.I.
        • de Vet H.C.W.
        Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments.
        Qual Life Res. 2009; 18: 1115-1123
        • Beaton D.E.
        • Bombardier C.
        • Guillemin F.
        • Ferraz M.B.
        Guidelines for the process of cross-cultural adaptation of self-report measures.
        Spine. 2000; 25: 3186-3191
        • Guyatt G.H.
        • Oxman A.D.
        • Vist G.E.
        • Kunz R.
        • Falck-Ytter Y.
        • Alonso-Coello P.
        • et al.
        GRADE: an emerging consensus on rating quality of evidence and strength of recommendations.
        BMJ. 2008; 336: 924-926
        • Beck A.T.
        • Ward C.H.
        • Mendelson M.
        • Mock J.
        • Erbaugh J.
        An inventory for measuring depression.
        Arch Gen Psychiatry. 1961; 4: 561-571
        • Beck A.T.
        • Steer R.A.
        • Brown G.K.
        Manual for the Beck Depression Inventory-II.
        Psychological Corporation, San Antonio, TX, 1996
        • Cleeland C.S.
        The brief pain inventory user guide.
        (Available at)
        • Jones G.
        • Kennedy S.
        • Barnard A.
        • Wong J.
        • Jenkinson C.
        Development of an endometriosis quality-of-life instrument: the endometriosis health profile-30.
        Obstet Gynecol. 2001; 98: 258-264
        • Jones G.
        • Jenkinson C.
        • Kennedy S.
        Development of the short form endometriosis health profile questionnaire: the EHP-5.
        Qual Life Res. 2004; 13: 695-704
        • EuroQol Group
        EuroQol--a new facility for the measurement of health-related quality of life.
        Health Policy. 1990; 16: 199-208
        • EuroQol Research Foundation
        EQ-5D-3L user guide, Version 6.
        EuroQol Research Foundation, 2018 (Available at)
        https://euroqol.org/publications/user-guides
        Date accessed: November 16, 2021
        • EuroQol Research Foundation
        EQ-5D-5L user guide, Version 3.
        (Available at)
        https://euroqol.org/publications/user-guides
        Date: 2019
        Date accessed: November 16, 2021
        • Herdman M.
        • Gudex C.
        • Lloyd A.
        • Janssen M.
        • Kind P.
        • Parkin D.
        • et al.
        Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L).
        Qual Life Res. 2011; 20: 1727-1736
        • Waddell G.
        • Newton M.
        • Henderson I.
        • Somerville D.
        • Main C.
        A Fear-Avoidance Beliefs Questionnaire (FABQ) and the role of fear-avoidance beliefs in chronic low back pain and disability.
        Pain. 1993; 52: 157-168
        • Zigmond A.S.
        • Snaith R.P.
        The hospital anxiety and depression scale.
        Acta Psychiatr Scand. 1983; 67: 361-370
        • Barkham M.
        • Hardy G.E.
        • Startup M.
        The structure, validity and clinical relevance of the inventory of interpersonal problems.
        Br J Med Psychol. 1994; 67: 171-185
        • Horowitz L.M.
        • Alden L.E.
        • Wiggins J.S.
        • Pincus A.L.
        Inventory of interpersonal problems manual.
        Mind Garden Inc., Menlo Park, CA, 2003
        • Barkham M.
        • Hardy G.E.
        • Startup M.
        The IIP-32: a short version of the inventory of interpersonal problems.
        Br J Clin Psychol. 1996; 35: 21-35
        • Kerns R.D.
        • Turk D.C.
        • Rudy T.E.
        The West Haven-Yale multidimensional pain inventory (WHYMPI).
        Pain. 1985; 23: 345-356
        • Fairbank J.C.
        • Couper J.
        • Davies J.B.
        • O'Brien J.P.
        The Oswestry low back pain disability questionnaire.
        Physiotherapy. 1980; 66: 271-273
        • Fairbank J.C.
        • Pynsent P.B.
        The Oswestry disability index.
        Spine (Phila Pa 1976). 2000; 25: 2940-2952 (discussion 2952)
        • Williams D.A.
        • Thorn B.E.
        An empirical assessment of pain beliefs.
        Pain. 1989; 36: 351-358
        • Williams D.A.
        • Robinson M.E.
        • Geisser M.E.
        Pain beliefs: assessment and utility.
        Pain. 1994; 59: 71-78
        • Thirlaway K.
        • Fallowfield L.
        • Cuzick J.
        The Sexual Activity Questionnaire: a measure of women's sexual functioning.
        Qual Life Res. 1996; 5: 81-90
        • Ware J.
        • Snow K.K.
        • Kosinski M.A.
        • Gandek B.
        SF-36 health survey manual and interpretation guide.
        The Health Institute, New England Medical Centre, Boston, MA, 1993
        • Ware Jr., J.E.
        • Sherbourne C.D.
        The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection.
        Med Care. 1992; 30: 473-483
        • Ware J.E.
        • Kosinski M.
        • Keller S.D.
        SF-12: how to score the SF-12 physical and mental health summary scales.
        2nd ed. The Health Institute, New England Medical Centre, Boston, MA, 1995
        • World Health Organization
        Programme on mental health: WHOQoL user manual.
        World Health Organization, Geneva, 1998
        • The World Health Organization Quality of Life Assessment (WHOQOL)
        Development and general psychometric properties.
        Soc Sci Med. 1998; 46: 1569-1585
        • The WHOQOL Group
        Development of the World Health Organization WHOQOL-BREF quality of life assessment.
        Psychol Med. 1998; 28: 551-558
        • Chiarotto A.
        • Ostelo R.W.
        • Boers M.
        • Terwee C.B.
        A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain.
        J Clin Epidemiol. 2018; 95: 73-93
        • Craxford S.
        • Deacon C.
        • Myint Y.
        • Ollivere B.
        Assessing outcome measures used after rib fracture: a COSMIN systematic review.
        Injury. 2019; 50: 1816-1825
        • Alden L.E.
        • Wiggins J.S.
        • Pincus A.L.
        Construction of circumplex scales for the inventory of interpersonal problems.
        J Pers Assess. 1990; 55: 521-536
        • Horowitz L.M.
        • Rosenberg S.E.
        • Baer B.A.
        • Ureño G.
        • Villaseñor V.S.
        Inventory of interpersonal problems: psychometric properties and clinical applications.
        J Consult Clin Psychol. 1988; 56: 885-892
        • Stones R.W.
        • Selfe S.A.
        • Fransman S.
        • Horn S.A.
        Psychosocial and economic impact of chronic pelvic pain.
        Baillieres Best Pract Res Clin Obstet Gynaecol. 2000; 14: 415-431
        • Davies C.F.
        • Macefield R.
        • Avery K.
        • Blazeby J.M.
        • Potter S.
        Patient-reported outcome measures for post-mastectomy breast reconstruction: a systematic review of development and measurement properties.
        Ann Surg Oncol. 2021; 28: 386-404
        • Comins J.
        • Siersma V.
        • Couppe C.
        • Svensson R.B.
        • Johansen F.
        • Malmgaard-Clausen N.M.
        • et al.
        Assessment of content validity and psychometric properties of VISA-A for Achilles tendinopathy.
        PLoS One. 2021; 16: e0247152
        • European Medicines Agency Committee for Medicinal Products for Human Use (CHMP)
        Appendix 2 to the guideline on the evaluation of anticancer medicinal products in man. The use of patient-reported outcome (PRO) measures in oncology studies.
        (Available at)
        • Brod M.
        • Tesler L.E.
        • Christensen T.L.
        Qualitative research and content validity: developing best practices based on science and experience.
        Qual Life Res. 2009; 18: 1263-1278
        • Chen J.
        • Ou L.
        • Hollis S.J.
        A systematic review of the impact of routine collection of patient reported outcome measures on patients, providers and health organisations in an oncologic setting.
        BMC Health Serv Res. 2013; 13: 211-224
        • Wartolowska K.
        The nocebo effect as a source of bias in the assessment of treatment effects.
        F1000Res. 2019; 8: 5
      2. Patient-Reported Outcomes and Medical Devices - Guidance.
        (Available at)
        • Ricci L.
        • Lanfranchi J.-B.
        • Lemetayer F.
        • Rotonda C.
        • Guillemin F.
        • Coste J.
        • et al.
        Qualitative methods used to generate questionnaire items: a systematic review.
        Qual Health Res. 2019; 29: 149-156
        • Lasch K.E.
        • Marquis P.
        • Vigneux M.
        • Abetz L.
        • Arnould B.
        • Bayliss M.
        • et al.
        PRO development: rigorous qualitative research as the crucial foundation.
        Qual Life Res. 2010; 19: 1087-1096
        • Patrick D.L.
        • Burke L.B.
        • Gwaltney C.J.
        • Leidy N.K.
        • Martin M.L.
        • Molsen E.
        • et al.
        Content validity--establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: part 1--eliciting concepts for a new PRO instrument.
        Value Health. 2011; 14: 967-977
        • Patrick D.L.
        • Burke L.B.
        • Gwaltney C.J.
        • Leidy N.K.
        • Martin M.L.
        • Molsen E.
        • et al.
        Content validity--establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO Good Research Practices Task Force report: part 2--assessing respondent understanding.
        Value Health. 2011; 14: 978-988
        • Ramsey I.
        • Eckert M.
        • Hutchinson A.D.
        • Marker J.
        • Corsini N.
        Core outcome sets in cancer and their approaches to identifying and selecting patient-reported outcome measures: a systematic review.
        J Patient Rep Outcomes. 2020; 4 (12): 77