Sharing Individual Participant Data (IPD) within the Context of the Trial Reporting System (TRS)

Deborah Zarin and Tony Tse of ClinicalTrials.gov consider how sharing individual participant data can and cannot help improve the reporting of clinical trials.

What Is the Nature of IPD?
As attention shifts to IPD sharing, it is instructive to consider the mechanism by which initial "raw" data collected from each trial participant are analyzed, transformed, and aggregated into the summary data reported in the results sections of journal articles, conference abstracts, press releases, and package inserts and as entries in results databases (Fig 1).
Each arrow in Fig 1 indicates a transformation of trial data. While some transformations are based on procedures prespecified in study documents (e.g., detailed criteria or algorithms in the protocol or statistical analysis plan), others likely rely on ad hoc expert judgments. For example, analyzing IPD collected for the primary outcome measure of "change in tumor size from baseline at 3 months" might involve the following decisions:
• choosing a specific imaging approach (e.g., fluorodeoxyglucose (FDG)-positron emission tomography (PET) using a specific device);
• determining a particular method for transforming 2- or 3-D images into tumor size measurements (e.g., Digital Imaging and Communications in Medicine [DICOM] standard using autocontouring to calculate the volume for the region of interest);
• applying these methods to measure tumor size for each individual at baseline and at 3 months; and
• calculating and recording the changes in size per participant.
Additional decisions must be made by the researchers about the handling of missing data, unreadable images, and other data deficiencies; determining the analysis population (e.g., all who started the study [including those who discontinued] or only those who received the full course of treatment); and aggregating the IPD for purposes of reporting and analysis (e.g., mean change in size versus proportion with a change over a certain size). The most granular data (far left in Fig 1) would provide insight into these decisions and allow independent researchers to examine the implications of alternative analytic decisions [12]. On the other hand, the least granular IPD (far right) would obscure some of these decisions and would not allow for testing the impact of different analytic methods.
Most discussions of IPD sharing policies sidestep the issue of matching IPD types with anticipated benefits and burdens. For example, third-party researchers interested in independently recoding the IPD would need access to uncoded data (i.e., data types to the left of "Coded" on the x-axis in Fig 1). In contrast, users who intend to replicate and confirm the reproducibility of aggregate data published in a journal article may only require access to the analyzable IPD (i.e., final type of IPD before undergoing transformation into aggregated data in Fig 1). Although this mismatch is not an insurmountable barrier for IPD sharing policies, we believe that consideration of various data types and their uses is a timely issue for discussion within the research community, including questions such as the following:
• What standard terminology or classification should be used to describe the different data types?
• Which types of IPD should be made available systematically?
• When more than one type is available for sharing, how should they be uniquely identified and tracked (e.g., cited) within the research community?
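The effect of the analytic decisions described above (analysis population, handling of missing data, and choice of aggregate measure) can be illustrated with a minimal sketch. Everything in it is hypothetical: the participant records, the field names, the "baseline carried forward" rule for missing values, and the 2-unit shrinkage threshold are assumptions chosen for illustration, not methods taken from any actual trial.

```python
from statistics import mean

# Hypothetical analyzable IPD: tumor size at baseline and at 3 months.
# None marks a missing follow-up value (e.g., an unreadable image).
ipd = [
    {"id": "P1", "baseline": 10.0, "month3": 6.0, "completed": True},
    {"id": "P2", "baseline": 8.0, "month3": 7.0, "completed": True},
    {"id": "P3", "baseline": 12.0, "month3": None, "completed": False},
    {"id": "P4", "baseline": 9.0, "month3": 9.5, "completed": False},
]


def changes(records, population="all", missing="drop"):
    """Per-participant change in size under a chosen set of analytic decisions.

    population: "all" (everyone who started) or "completers" (full course only).
    missing:    "drop" (exclude) or "bcf" (baseline carried forward, i.e.,
                assume no change when the follow-up value is missing).
    """
    result = []
    for record in records:
        if population == "completers" and not record["completed"]:
            continue
        follow_up = record["month3"]
        if follow_up is None:
            if missing == "drop":
                continue
            follow_up = record["baseline"]
        result.append(follow_up - record["baseline"])
    return result


def summarize(per_participant, threshold=-2.0):
    """Two alternative aggregate measures computed from the same changes."""
    n = len(per_participant)
    return {
        "n": n,
        "mean_change": mean(per_participant),
        # Proportion whose tumor shrank by at least |threshold| units.
        "prop_shrunk": sum(c <= threshold for c in per_participant) / n,
    }


for population, missing in [("all", "drop"), ("completers", "drop"), ("all", "bcf")]:
    print(population, missing, summarize(changes(ipd, population, missing)))
```

With these made-up values, the mean change is -1.5 when all participants are included and missing values are dropped, -2.5 for completers only, and -1.125 when the baseline is carried forward. A reader given only one of these aggregates could not tell which decisions produced it, whereas access to the participant-level records allows each alternative to be tested.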
Where Does IPD Fit in the TRS?
The TRS framework encompasses key existing and proposed efforts and is designed to increase trial transparency systematically. Fig 2 depicts the TRS as a pyramid with prospective registration at its base, summary or aggregate trial results reporting in the middle, and the sharing of trial IPD and relevant documents at its apex. At its base, prospective registration provides a public listing of all ongoing and completed trials, along with key protocol and administrative details to allow people to identify the full set of trials conducted within a research area (e.g., antidepressant trials in children). Trial registration, if done and used appropriately, also allows for the assessment of fidelity to key protocol details, such as definition of the prespecified primary outcome measure [13]. Summary results reporting in trial registries, currently implemented at ClinicalTrials.gov and the European Union Clinical Trials Registry [14], is the next level of the TRS. Results databases, which are designed to ensure that aggregate trial results are reported systematically in a timely, structured, and complete manner based in part on expert trial-reporting guidelines such as the Consolidated Standards of Reporting Trials (CONSORT) statement [15] and its extensions, call attention to unacknowledged deviations from the registered protocol details [13]. Current policies are generally intended to address these two foundational levels of the TRS.
Registration information and summary results displayed as a single trial record provide the minimal, essential information needed to understand a trial and its findings. Each record also uses a format that is highly structured and searchable by a range of criteria. Ideally, users could easily retrieve information about all completed or ongoing trials for a particular clinical or policy question (e.g., to identify a need for additional research or conduct a systematic review), avoiding the biases imposed by incomplete and selective publication. Trial registration and results records are also linked, via unique registry identifiers, to relevant peer-reviewed journal publications [16]. As the use of unique registry identifiers expands to other document types (e.g., systematic reviews and press releases), an extensive network of automated, explicit linkages can provide an even more useful way to identify publicly available information about a trial from the trial record itself (Fig 3).
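As a rough illustration of how such automated linkages might be built, the sketch below scans document text for ClinicalTrials.gov identifiers (the letters "NCT" followed by eight digits) and groups documents by the trial they mention. The document names, their text, and the identifiers in them are invented for the example, and the `link_documents` helper is hypothetical; a real linkage system would also need to handle identifiers from other registries.

```python
import re
from collections import defaultdict

# ClinicalTrials.gov identifiers: "NCT" followed by exactly eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")


def link_documents(documents):
    """Map each registry identifier to the sorted list of documents citing it.

    documents: dict of {document name: document text}.
    """
    links = defaultdict(set)
    for name, text in documents.items():
        for identifier in NCT_PATTERN.findall(text):
            links[identifier].add(name)
    return {identifier: sorted(names) for identifier, names in links.items()}


# Invented documents and identifiers, for illustration only.
documents = {
    "journal_article": "Results of the trial (NCT00000001) are reported here.",
    "systematic_review": "We included NCT00000001 and NCT00000002.",
    "press_release": "The sponsor announced topline results today.",
}

for identifier, names in sorted(link_documents(documents).items()):
    print(identifier, names)
```

In this sketch the press release mentions no identifier and therefore cannot be linked to any trial record, which is the kind of gap that wider use of registry identifiers would close.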
IPD and related documents reside at the apex of this pyramid because they are most useful within the context of the two lower levels, which serve as the foundation. Without careful use of trial registries and summary results databases, access to IPD might simply recreate or amplify existing reporting biases [17]. For example, analysis of trial IPD cannot mitigate biases that stem from selective release of data from only one trial among a "family" of trials for the studied population, intervention, and condition (e.g., a likely result of proposals to require the release of IPD only upon journal publication).

How Would the Three Key Components of TRS Work Together?
Case Study: Recent Reanalysis of Study 329
Study 329, sponsored by SmithKline Beecham (now GlaxoSmithKline [GSK]), was one of several studies conducted to examine the use of Paxil (paroxetine) in children with depression and the first with results to be published. The original publication of Study 329 in 2001 implied that the study results showed the safety and efficacy of Paxil in children [18]. In 2004, the New York State attorney general filed a consumer fraud lawsuit against GSK, alleging that the suppression and misreporting of trial data created the false impression that Paxil was safe and effective in depressed children [19]. A newly published reanalysis, part of the Restoring Invisible and Abandoned Trials (RIAT) initiative [20], was based on access to original case report forms (CRFs) for 34% of the 275 participants [21]. These highly granular IPD datasets enabled the researchers to recategorize certain adverse events that they determined had been miscategorized originally (e.g., "mood lability" rather than the more serious "suicidality"). The reanalysis concluded that Study 329 did not show either efficacy or safety.

How Would the Problems of Study 329 Be Addressed by the Current TRS?
It would be an oversimplification to conclude that this reanalysis demonstrates the need to make IPD for all trials available. A more nuanced look at the specific problems is useful. Many of the concerns about Study 329 and the other Paxil studies might have been addressed if current policies regarding registration and results reporting had been in existence (Table 1, [22][23][24]). The key issue that specifically required access to IPD was the detection of miscategorization of some adverse events in the original report. It is important to note that this illuminating reanalysis required access to the highly detailed IPD available in the original CRFs, represented by the far-left side of the x-axis in Fig 1. However, recent high-profile proposals for the sharing of IPD might not have added any clarity in the case of the Paxil studies in children beyond what could have been achieved with the optimal use of a registry and results database (i.e., two foundational levels of the pyramid in Fig 2). The reason is that journal publication serves as the "trigger" for IPD release in many of these proposals [1], which could not possibly mitigate biases resulting from selective publication in the first place (i.e., IPD from unpublished trials would be exempt from sharing requirements). In addition, such proposed IPD policies call for the release of only the "coded" or "analyzable" dataset, which would not have allowed for the detection of miscategorization or the recategorization of the adverse events. Finally, such proposals would only require the sharing of a subset of IPD and documents for those aggregate data reported in the publication and not the full dataset, precluding secondary analyses intended to go beyond validation and reproducibility of the original publication.

Conclusion
The evolving TRS can be thought of as a pyramid, with each successive layer being dependent on the layer(s) below it. We should not allow the prospects for providing access to IPD and relevant documents to divert attention from the continuing need to ensure complete, accurate, and timely trial registration and summary results reporting, as well as attentive and consistent use of these tools by key stakeholders. In addition, IPD sharing policies and systems must consider the different benefits and burdens that would be expected from third-party access to data types of varying levels of granularity.

Table 1 (excerpt)
Invalid and unacknowledged categorization of certain adverse events, resulting in the underreporting of suicidality [24] | Sharing Highly Granular IPD and Documents (e.g., CRFs) | Access to high-granularity IPD enabled the elucidation of data analytic decisions that had not been publicly disclosed; reanalysis was possible with different methods of categorizing adverse events
doi:10.1371/journal.pmed.1001946.t001