Strengthening the Reporting of Observational Studies in Epidemiology for respondent-driven sampling studies: "STROBE-RDS" statement

Objectives: Respondent-driven sampling (RDS) is a new data collection methodology used to estimate characteristics of hard-to-reach groups, such as the HIV prevalence in drug users. Many national public health systems and international organizations rely on RDS data. However, RDS reporting quality and available reporting guidelines are inadequate. We carried out a systematic review of RDS studies and present Strengthening the Reporting of Observational Studies in Epidemiology for RDS Studies (STROBE-RDS), a checklist of essential items to present in RDS publications, justified by an explanation and elaboration document.
Study Design and Setting: We searched the MEDLINE (1970–2013), EMBASE (1974–2013), and Global Health (1910–2013) databases to assess the number and geographical distribution of published RDS studies. STROBE-RDS was developed based on STROBE guidelines, following Guidance for Developers of Health Research Reporting Guidelines.
Results: RDS has been used in over 460 studies from 69 countries, including the USA (151 studies), China (70), and India (32). STROBE-RDS includes modifications to 12 of the 22 items on the STROBE checklist. The two key areas that required modification concerned the selection of participants and statistical analysis of the sample.
Conclusion: STROBE-RDS seeks to enhance the transparency and utility of research using RDS. If STROBE-RDS

Conflict of interest: All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/coi_disclosure.pdf and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous 3 years; and no other relationships or activities that could appear to have influenced the submitted work. M.E. is a member of the STROBE group.
R.G.W. and A.J.H. contributed equally to this work.

Introduction
Hidden or hard-to-reach population subgroups are often key to the maintenance of infectious diseases in human populations [1]. However, it is often difficult to investigate the factors that drive transmission in these groups with standard epidemiologic methods of data collection because of the lack of an adequate sampling frame [2]. Researchers have therefore typically resorted to various types of convenience sampling to gather data on hidden populations [3]. Although convenience sampling has its advantages, this approach is unable to generate unbiased population-based estimates of infection prevalence and risk factors. In an attempt to address these limitations, respondent-driven sampling (RDS), a variant of a link-tracing design, was proposed in 1997 [4].
RDS studies are characterized by both a specific data collection method and specific statistical analysis methods. Key features of the data collection include: (1) a small proportion of the sample is recruited by the researcher (i.e., the "seeds") and a large proportion of the sample is recruited (in recruitment "waves") by other members of the target population with whom they have a social relationship; (2) recruitment connections between respondents are recorded (e.g., who recruited whom); (3) the maximum number of people that each participant can recruit is determined by the researcher by giving out a limited number of recruitment "coupons"; (4) respondents are compensated for participating in the study and for recruiting others into the study. Collectively, these features often make RDS an efficient data collection method [5]. Although often efficient, the RDS data collection method produces multiple challenges for the analysis. First, because most of the sampling is conducted by respondents, assumptions about the sampling process are needed. Second, under a variety of assumptions, not all members of the target population will have the same probability of selection, so this probability is typically estimated using a combination of modeling assumptions and study data. Finally, because sampling happens through pre-existing relationships, the observations are not independent. These challenges make point and variance estimation from RDS data more complex than from other forms of sampling. Numerous approaches have been developed, and more are currently under development [6–12].
Since its introduction in 1997 [4], there has been a rapid increase in the number of surveys of hidden or hard-to-reach populations using the RDS methodology, primarily of individuals at risk of sexually or parenterally transmitted infections including HIV [5,13], but also on topics as diverse as interpersonal violence [13] and strategies for improving cancer screening recruitment [14]. Many countries including the United States, Ukraine, Vietnam, Mauritius, Morocco, and Brazil use RDS as part of their national public health systems, and data from RDS studies are used by major public health organizations including the US Centers for Disease Control and Prevention, the Joint United Nations Programme on HIV/AIDS, and the Global Fund to Fight AIDS, Tuberculosis and Malaria. The National Institutes of Health alone has awarded around $100 million in funding to projects using RDS and its variants [15].
Making sense of the rapidly increasing amount of data collected using RDS [5,16] is crucial to the integration of this information into the practice of medicine and public health. However, the assessment of the strengths and weaknesses of RDS data and methods has been limited by the inadequate reporting of RDS studies. An assessment of 22 randomly selected RDS studies has recently been carried out [17]. The assessment found that overall only around one-third of items sought were reported. Key details of the sampling and statistical methods were particularly poorly reported, including the methods of seed selection (reported in 45% of studies), the number of recruits from each seed and number of recruits in each recruitment wave (33%), the details of the recruitment venues (33%), eligibility criteria for seeds (<20%), wording of network size questions (<20%), whether seeds were included in the analysis (<20%), how participants were trained to recruit others (0%), and an explanation for differences between unadjusted and adjusted estimates (0%) [17].
To improve the quality of reporting of observational epidemiologic studies, the Strengthening the Reporting of Observational studies in Epidemiology (STROBE) statement [18–21] and extensions [22,23] were developed. However, the STROBE statement is inadequate for reporting RDS studies because of the major differences in the RDS sampling and estimation procedures. Our aims are to present a systematic review of the number of RDS studies and to present the "STROBE-RDS" statement, a checklist of essential items to present in RDS publications, justified and supported by a stand-alone explanation and elaboration document.

Systematic literature review of RDS studies
A systematic literature review was carried out purely to assess the number of published RDS studies and summarize their geographical distribution. Briefly, we searched the MEDLINE (1970–2013), EMBASE (1974–2013), and Global Health (1910–2013) databases and asked experts for their collections of relevant articles. Studies conducted in any country, in any language, among any study population were included; reviews, editorials, commentaries, and methodological articles were excluded. A previous assessment had identified the inadequacy of previous RDS reporting [17], and therefore, further details on RDS reporting were not collected. Full details are shown in the Supplementary Material (Document 1 at www.jclinepi.com).

What is the implication and what should change now?
The STROBE-RDS statement seeks to enhance the transparency and utility of research using RDS. If widely adopted, the STROBE-RDS checklist should improve public health decision making in infectious diseases.

Statement development
The STROBE-RDS statement was developed following the guidance for developers of health research reporting guidelines of Moher et al. [24]. The initial need for the RDS reporting guidelines was identified in RDS expert and stakeholder discussions at an RDS symposium in 2011 [25]. This was followed by a systematic evaluation of the reporting of RDS studies, which concluded that reporting was inadequate [17]. Existing guidelines were then reviewed, and the most suitable was the STROBE statement [18,19], but this was assessed to be inadequate for reporting RDS studies because of the major differences in the RDS sampling and estimation procedures. The vast majority of existing RDS studies use a cross-sectional study design. Therefore, this statement is an extension of the STROBE guidelines [18], the STROBE explanation and elaboration document [19], and the STROBE checklist for cross-sectional studies [20].
Version 1 of the STROBE-RDS checklist was distributed for consultation by posting on the Equator Network Web site [26] and the RDS listserv [27] and by sending to known experts. Themes emerging from the feedback included strong support for the initiative and a request to restrict the scope of the guidelines to cross-sectional epidemiologic studies that seek to generate representative estimates for the target population. The checklist was revised based on this feedback, and version 2 was published in early October 2012 [28]. Version 2 of the checklist was piloted during October 2012 by using it to guide manuscript drafting and was sent to the STROBE group for feedback [18]. Researchers (n = 5) piloting the checklist provided much useful feedback and requested a stand-alone checklist and supporting document. This and other feedback was used to develop version 3 of the checklist, which was discussed at a 2-day face-to-face meeting in New Orleans in October 2012 [29].
A list of potential meeting invitees was drawn up by a subset of coauthors (R.G.W., A.J.H., and W.H.) after consultation with STROBE initiative members, statisticians, epidemiologists, and empirical RDS researchers. Potential invitees were categorized into three groups: statisticians/survey methodologists, epidemiologists/empirical RDS researchers, and journal editors. Fifty percent (2 of 4), 46% (6 of 13), and 0% (0 of 2), respectively, of meeting invitees in these three groups participated (11 meeting attendees in total). Participants were sent the draft checklist, a summary of the previous RDS reporting [17], and the guidelines by Moher et al. [24]. At the meeting, each draft checklist item was presented and discussed in turn and edited on screen in real time until agreement on the final version of the checklist, presented in this manuscript, was reached by consensus. Consensus was established by asking all participants verbally whether they agreed to the written text for each item. There were no items on which consensus was not reached during the meeting. After the face-to-face meeting, this summary manuscript was drafted by the authors. To accompany the checklist, an "Explanation and Elaboration" document was developed and revised based on feedback from experts and the STROBE group; it is presented in the Supplementary Material (Document 2 at www.jclinepi.com).

Statement scope
The scope of the "STROBE-RDS" statement is limited to (1) epidemiologic studies (the scope of the original STROBE guidelines), (2) cross-sectional studies (the most common RDS study design to date), and (3) RDS studies that seek to generate representative estimates for the target population (currently the most contentious and potentially most policy-relevant use of RDS). Furthermore, as RDS is both a sampling and a data analysis method, guidelines for reporting on both aspects of RDS are provided. Finally, in response to feedback from researchers piloting the STROBE-RDS checklist, we aim to provide a self-contained statement that minimizes the need to refer to other documents when reporting on an RDS study.

Results
The systematic literature review and input from experts identified that globally over 460 peer-reviewed publications have reported using RDS since the mid-1990s, with most published since 2006 (Fig. 1A). Fig. 1B shows the global distribution of study locations. RDS studies have been conducted in 69 countries, including the United States of America (151 studies), China (70), India (32), Mexico (22), and South Africa (16). The articles came from 141 different journals. Most journals (91) had published either one or two articles included in the review. Nine journals had published 10 or more articles (or conference abstracts). Supplementary Material (Document 1 at www.jclinepi.com) provides more details of review methods, results, and included and excluded articles.
In Table 1, we present the proposed STROBE-RDS statement checklist, a modification of the STROBE statement checklist for cross-sectional studies [20]. The left column lists the original STROBE checklist for cross-sectional studies. The right column summarizes the reporting recommendation. A three-column version of the checklist, highlighting the changes from the original STROBE checklist, is shown in the supplementary information, along with an explanation and elaboration document [Supplementary Material (Document 2 at www.jclinepi.com)]. As there is considerable variation in the use of terms and definitions across the disciplines in which RDS is used, we also present a list of suggested RDS terms and definitions to be used when reporting RDS studies (Box 1).
The STROBE-RDS checklist provides modifications to 12 of the 22 items on the STROBE checklist. The two key areas requiring modification concerned the selection of participants and the statistical analysis of the sample. These modifications are summarized below.

Selection of participants
As members of the target population, not researchers, recruit most study participants in RDS studies, details of the formative research conducted before the study (5b) and of how participants were trained to recruit others (6a) should be reported. Key details of the recruitment process should also be reported, including the number of coupons issued per person, any time limits for referral (6a), procedures of seed selection (6b), the exact wording of personal network size question(s) (6d), the incentives for participation and recruitment (6e), and how the recruiter–recruit relationship was tracked (7b). Variation in study procedures during data collection should also be reported (6c). Methods to assess eligibility and reduce repeat enrollment should be described (8b) so that other researchers can understand the data collection process and any biases it might introduce. Authors should report reasons for nonparticipation at each stage (e.g., not eligible, does not consent, decline to recruit others) (13b), the number of coupons issued and returned (13d), the number of recruits by seed and number of recruitment waves for each seed (13e), and any recruitment challenges (e.g., commercial exchange of coupons, imposters, duplicate recruits) and how they were addressed (13f) (Table 1, items 5b, 6a,b,c,d,e, 7b, 8b, 13b,d,e,f).
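As an illustration of item 13e, the following is a minimal sketch of how the number of recruits and recruitment waves per seed might be derived from recruiter–recruit tracking records. The data layout (a mapping from participant ID to recruiter ID, with None marking seeds) and the function name are assumptions made for this example; they are not prescribed by the checklist.

```python
from collections import defaultdict


def summarize_recruitment(recruiter_of):
    """Summarize recruits and waves per seed.

    recruiter_of: dict mapping participant id -> recruiter id, with None for
    seeds (hypothetical layout). Assumes every recruiter is also a participant
    and that the records contain no cycles.
    """
    seed_of, wave_of = {}, {}

    def resolve(pid):
        # Walk up the recruitment chain until a seed or an already
        # resolved participant is reached.
        path = []
        cur = pid
        while cur not in wave_of:
            rec = recruiter_of[cur]
            if rec is None:  # cur is a seed: wave 0 of its own tree
                seed_of[cur], wave_of[cur] = cur, 0
                break
            path.append(cur)
            cur = rec
        # Fill in seed and wave for the walked path, from the top down.
        for node in reversed(path):
            rec = recruiter_of[node]
            seed_of[node] = seed_of[rec]
            wave_of[node] = wave_of[rec] + 1

    for pid in recruiter_of:
        resolve(pid)

    summary = defaultdict(lambda: {"recruits": 0, "waves": 0})
    for pid, seed in seed_of.items():
        entry = summary[seed]
        if pid != seed:  # the seed itself is not counted as a recruit
            entry["recruits"] += 1
        entry["waves"] = max(entry["waves"], wave_of[pid])
    return dict(summary)


# Illustrative records only: two seeds, one of which recruited nobody.
records = {"s1": None, "s2": None, "a": "s1", "b": "s1", "c": "a", "d": "c"}
result = summarize_recruitment(records)
print(result)
```

In this toy example, seed s1 has four recruits over three waves and seed s2 has none. Unproductive seeds such as s2 are kept in the summary because they are part of what item 13e asks authors to report.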

Table 1 (excerpt). Results items 13 and 14: STROBE checklist for cross-sectional studies and STROBE-RDS checklist
Results
Participants (item 13)
13(a) STROBE: Report the numbers of individuals at each stage of the study: for example, numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analyzed
13(a) STROBE-RDS: Report the numbers of individuals at each stage of the study: for example, numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, and analyzed
13(b) STROBE: Give reasons for nonparticipation at each stage
13(b) STROBE-RDS: Give reasons for nonparticipation at each stage (e.g., not eligible, does not consent, decline to recruit others)
13(c) STROBE: Consider use of a flow diagram
13(c) STROBE-RDS: Consider use of a flow diagram
13(d) STROBE-RDS: Report number of coupons issued and returned
13(e) STROBE-RDS: Report number of recruits by seed and number of RDS recruitment waves for each seed. Consider showing graph of entire recruitment network
13(f) STROBE-RDS: Report recruitment challenges (e.g., commercial exchange of coupons, imposters, duplicate recruits) and how addressed
13(g) STROBE-RDS: Consider reporting estimated design effect for outcomes of interest
Descriptive data (item 14)
14(a) STROBE: Give characteristics of study participants (e.g., demographic, clinical, social) and information on exposures and potential confounders
14(a) STROBE-RDS: Give characteristics of study participants (e.g., demographic, clinical, social) and, if applicable, information on correlates and potential confounders. Report unweighted sample size and percentages, estimated population proportions or means with estimated precision (e.g., 95% confidence interval)
14(b) Indicate the number of participants with missing data for each variable of interest
Italics highlight changes from STROBE statement checklist for cross-sectional studies [20]. Full details of modifications from [20] are shown in Table S1 in Supplementary Material at www.jclinepi.com.

Statistical analysis
Several different estimators exist for estimating the prevalence of a specific trait (e.g., HIV prevalence) from RDS data [6–12]. There are also a number of different methods for producing confidence intervals around these estimates [8,30–32]. Evaluations of these methods have been equivocal [33–35], and the best estimator may depend on specific features of a study [36]. At this time, there is no consensus that one estimator should be universally used. As such, we recommend that authors clearly describe the statistical methods used, including those to adjust for sample design, both when making estimates (Table 1, 12a,b) and when quantifying the uncertainty in those estimates (16a,c).
As the utility of the various RDS estimators is unknown, we recommend reporting unadjusted and study design–adjusted estimates and, if applicable, confounder-adjusted estimates and their precision (16a). If adjustment of the primary outcome leads to marked changes, information on factors causing the changes (e.g., personal network sizes, recruitment patterns by group, key confounders) should be reported (16c) (Table 1, items 12a,b, 16a,c).
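To make the distinction between unadjusted and design-adjusted estimates concrete, the sketch below compares a raw sample proportion with an inverse-degree weighted proportion, one estimator often referred to as RDS-II or the Volz-Heckathorn estimator. The choice of this estimator, the function names, and the toy data are assumptions for illustration only; this statement does not endorse a particular estimator, and the sketch omits variance estimation entirely.

```python
def unadjusted_proportion(has_trait):
    """Raw sample proportion, ignoring the RDS design."""
    return sum(has_trait) / len(has_trait)


def inverse_degree_proportion(has_trait, degrees):
    """Inverse-degree weighted proportion (RDS-II style).

    Each participant is weighted by 1 / personal network size, on the
    assumption that inclusion probability is proportional to degree.
    """
    if len(has_trait) != len(degrees):
        raise ValueError("has_trait and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("personal network sizes must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_with_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return weighted_with_trait / sum(weights)


# Toy data (not from any study): participants with the trait report
# larger personal networks than those without it.
has_trait = [True, True, False, False]
degrees = [10, 10, 2, 5]

raw = unadjusted_proportion(has_trait)
adjusted = inverse_degree_proportion(has_trait, degrees)
print(f"unadjusted: {raw:.3f}, design-adjusted: {adjusted:.3f}")
```

Here the adjusted estimate is markedly lower than the unadjusted one because participants with the trait have larger reported network sizes. This is the kind of difference that item 16c asks authors to explain.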

Discussion
The STROBE-RDS statement is a checklist of essential items that should be reported in RDS studies. The statement has several strengths. It is based on existing guidance for reporting observational studies [18–20], was developed by an interdisciplinary group that included epidemiologists, statisticians, and empirical RDS researchers, and explicitly justifies changes from the original STROBE statement.
The decisions made for the conduct and data analysis of RDS studies will influence the representativeness of a study's results. The empirical evidence on how representative the study results are is limited, and improving the methodology is an active research area. Transparent reporting is essential for developing a better evidence base and improving RDS methods.
Based on feedback we received from researchers writing up RDS studies during the piloting of earlier versions of this checklist [28], we decided to provide a standalone STROBE-RDS checklist with full supporting documentation, rather than only providing a modified STROBE checklist. This should mean that researchers writing up RDS studies will be more likely to use these reporting guidelines. We encourage readers to read the accompanying explanation and elaboration document in full (supporting material) before embarking on writing up an RDS study.
The statement can be used by authors, peer reviewers, and editors to improve the reporting of RDS studies. We invite journals to endorse STROBE-RDS, and although STROBE-RDS will be published in English only once, we invite others to translate STROBE-RDS and to submit commentaries, editorials, or use other means to raise the awareness of the STROBE-RDS publication. The ability to provide information in Web supplements should alleviate concerns about the increased length of manuscripts resulting from following the guidelines. We welcome comments directed to the corresponding author or via the journal or Equator Network Web sites where the guidelines are also deposited [26]. These will be used to update this STROBE-RDS statement as RDS methods develop.
The STROBE-RDS statement does not prescribe or dictate how an RDS study should be designed or analyzed. Rather, it seeks to enhance the transparency of research using RDS to increase the understanding of individual studies and enable comparisons between studies. If widely adopted, the STROBE-RDS checklist should improve global public health decision making for infectious diseases. Further studies could assess the impact of STROBE-RDS on the transparency of RDS research and on global public health decision making.

Box 1 Respondent-driven sampling key terms and definitions
Candidate participant: a coupon recipient who attempts to enroll in the study.
Coupon: an invitation to enroll in the RDS study that a participant can give to other people.
Coupon recipient: a person who receives a coupon.
Equilibrium: this term has inconsistent usage in the RDS community. The most common usage is that the observed sample composition matches the expected long-run sample composition assuming a specific model of the sampling process.
Follow-up interview: an interview where additional information is collected from the subset of participants who return to the study site a second time to collect recruitment incentives and/or biological test results.
Homophily: this term has inconsistent usage in the RDS community. Sometimes it is used to refer to the tendency for sample recruitments to occur between participants in the same social category and sometimes to refer to the tendency for relationships in the target population to occur between participants in the same social category.
Main interview: the interview conducted with all participants, in which the main study information is collected.
Participants: members of the target population who have provided consent and completed the main interview.
Participation incentive: the money, goods, and/or services provided to participants for completing the main interview.
Peer-recruited participant: a participant recruited by a member of the target population.
Personal network size (also called "degree"): the number of relationships a person has to members of the target population.
Population estimate: an estimate of a characteristic of the study population that takes into account the RDS sampling design.
Recruitment incentive: the money, goods, and/or services provided to participants for each new participant they are able to recruit.
Recruitment tree (also chain): the set of all participants linked to a specific seed.
Sample description: a summary statistic of participants that does not take into account the RDS sampling design.
Screening interview: a short initial interview with people hoping to enroll in the study that seeks to verify membership in the target population and request consent.
Seed: a participant who is recruited by a researcher.
Target population: the set of people about whom the researchers wish to make estimates.
Wave: the set of participants a given number of recruitments from a seed.
Abbreviation: RDS, respondent-driven sampling.

Search strategy and selection criteria
We searched published (physically published or online), peer-reviewed literature accessible through July 2013 that reported using RDS. Studies from all countries were included. We conducted searches using MEDLINE (1970–2013), EMBASE (1974–2013), and Global Health (1910–2013). Search terms used included "respondent driven" or "respondent-driven" or "RDS."