The State of Infectious Diseases Clinical Trials: A Systematic Review of ClinicalTrials.gov

Background: There is a paucity of clinical trials informing specific questions faced by infectious diseases (ID) specialists. The ClinicalTrials.gov registry offers an opportunity to evaluate the ID clinical trials portfolio.
Methods: We examined 40,970 interventional trials registered with ClinicalTrials.gov from 2007–2010, focusing on study conditions and interventions to identify ID-related trials. Relevance to ID was manually confirmed for each programmatically identified trial, yielding 3570 ID trials and 37,400 non-ID trials for analysis.
Results: The number of ID trials was similar to the number of trials identified as belonging to cardiovascular medicine (n = 3437) or mental health (n = 3695) specialties. Slightly over half of ID trials were treatment-oriented (53%, vs. 77% for non-ID trials), followed by prevention (38%, vs. 8% for non-ID trials). ID trials tended to be larger than those of other specialties, with a median enrollment of 125 subjects (interquartile range [IQR], 45–400) vs. 60 (IQR, 30–160) for non-ID trials. Most ID trials were randomized (73%) but non-blinded (56%). Industry was the funding source in 51% of ID trials vs. 10% that were primarily NIH-funded. HIV-AIDS trials constituted the largest subset of ID trials (n = 815 [23%]), followed by influenza vaccine (n = 375 [11%]) and hepatitis C (n = 339 [9%]) trials. Relative to U.S. and global mortality rates, HIV-AIDS and hepatitis C virus trials are over-represented, whereas lower respiratory tract infection trials are under-represented in this large sample of ID clinical trials.
Conclusions: This work is the first to characterize ID clinical trials registered in ClinicalTrials.gov, providing a framework to discuss prioritization, methodology, and policy.


Introduction
In the context of modern, evidence-based prevention and treatment of infectious diseases, clinical trials provide information about the accuracy of molecular or microbiological diagnostics, the prognosis of infectious syndromes, and the potential risks and benefits of anti-infective regimens. However, there is a paucity of clinical trials informing the specific questions faced by infectious disease (ID) specialists. More than 50% of the recommendations contained in the practice guidelines published by the Infectious Diseases Society of America (IDSA) are based solely upon expert opinion; less than a quarter are based on evidence from randomized controlled trials (RCTs) [1,2]. There is also increasing concern about the role of the medical products industry in the design and conduct of research, and in formulating practice guidelines [3]. But despite these concerns, the current spectrum of ID clinical trials has largely gone without systematic scrutiny regarding patterns of topical focus, geographical distribution, and levels of industry involvement.
ClinicalTrials.gov, a U.S. Food and Drug Administration (FDA)-initiated registry of more than 100,000 trials from 174 countries, provides a unique opportunity to take a "snapshot" of ID trials in terms of content and sponsorship. In September 2007, registration of trials of drugs, biologics, and devices with ClinicalTrials.gov became a legal requirement for a large segment of clinical research conducted under U.S. jurisdiction; in addition, many peer-reviewed journals require registration with ClinicalTrials.gov or a comparable registry as a condition of publication [4]. Although clinical trials as a whole have been summarized from this registry [5,6], ID trials have not been described, in part because they had not been prospectively designated as infection-related trials.
Recently, the Clinical Trials Transformation Initiative (CTTI) transformed the ClinicalTrials.gov database into a searchable relational dataset of registry content and regrouped studies into clinical specialties, thereby allowing targeted analysis. In this cross-sectional study, we aimed to characterize the scope and nature of ID clinical trials in this registry through a systematic analysis of characteristics of the registered trials, including trial methodology, geographic distribution, and funding source. We also evaluated the alignment between current clinical research priorities and the infections that cause the highest morbidity and mortality in the United States and around the world.

Materials and Methods
We performed a systematic analysis of characteristics of infectious disease trials registered with ClinicalTrials.gov from October 1, 2007 to September 27, 2010. We chose ClinicalTrials.gov over other registries because of its large size, international scope, publicly available data element definitions and design details, inclusion of a full range of clinical conditions, broad group of trial sponsors [7], and regulatory mandate [8], and also because ClinicalTrials.gov is the largest registry to allow bulk download of its entire dataset. The methods used by ClinicalTrials.gov to register clinical trials have been described previously [7,9,10]. Briefly, trial sponsors and investigators from around the world can enter trial data through a Web-based data entry system. The sample we examined in the present study includes trials that were registered to meet legal obligations, such as those arising from the FDA Amendments Act of 2007 (FDAAA), as well as trials registered to comply with the requirements of the International Committee of Medical Journal Editors for peer-reviewed publication [4].

Creation of the ID Study Dataset
We downloaded a dataset comprising all 96,346 clinical trials registered with ClinicalTrials.gov from its inception through September 27, 2010. We focused on trials registered on or after October 1, 2007, because this date corresponded with the beginning of the U.S. legal requirement [8] for registering certain trials of drugs, biologics, and devices. We next designed and implemented a relational database (Oracle RDBMS, version 11.1g [Oracle Corporation, Redwood Shores, California, USA]) to analyze the aggregate data. The resulting Aggregate Analysis of ClinicalTrials.gov (AACT) database is publicly available, along with data definitions and comprehensive data dictionaries, at the CTTI website [11]. We focused on interventional trials, which were identified using the study type field of the ClinicalTrials.gov registry. This field offers 4 options: Interventional, Observational, Expanded Access, and Not Applicable [11]. Using the study type field as a filter yielded 40,970 interventional trials registered during the 3-year period from October 2007 to September 2010 (Figure 1).
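The selection step above amounts to two filters: study type equal to Interventional, and a registration date inside the analysis window. The following is a minimal sketch that assumes a simplified, hypothetical `Trial` record with `study_type` and `registered` fields; the actual AACT tables use different table and column names, so this shows only the filtering logic, not the authors' Oracle queries.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical, simplified registry record (not the real AACT schema).
@dataclass
class Trial:
    nct_id: str
    study_type: str   # "Interventional", "Observational", "Expanded Access" or "Not Applicable"
    registered: date  # date the trial was first registered


# Analysis window described in the text, inclusive on both ends.
WINDOW_START = date(2007, 10, 1)
WINDOW_END = date(2010, 9, 27)


def in_analysis_window(trial: Trial) -> bool:
    """True for interventional trials registered inside the study window."""
    return (
        trial.study_type == "Interventional"
        and WINDOW_START <= trial.registered <= WINDOW_END
    )


def select_interventional(trials: list[Trial]) -> list[Trial]:
    """Apply the study-type and registration-date filters to a list of trials."""
    return [t for t in trials if in_analysis_window(t)]


if __name__ == "__main__":
    # Made-up NCT numbers, for illustration only.
    sample = [
        Trial("NCT00000001", "Interventional", date(2008, 3, 14)),
        Trial("NCT00000002", "Observational", date(2009, 1, 5)),
        Trial("NCT00000003", "Interventional", date(2006, 12, 1)),
    ]
    print([t.nct_id for t in select_interventional(sample)])  # ['NCT00000001']
```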
Next, we regrouped studies by clinical specialties within this derivative database [12]. In order to identify trials that were potentially relevant to ID, we focused on 2 trial characteristics: condition and intervention, which were defined either by data submitters or linked to Medical Subject Heading (MeSH) terms generated by a National Library of Medicine (NLM) algorithm based on the 2010 MeSH thesaurus [12]. A manual review of 9031 unique condition-related MeSH terms identified 911 (10%) terms relevant to ID. Not all conditions could be linked to MeSH terms; therefore, free-text condition terms appearing in 5 or more studies were also annotated for relevance to ID. Using this approach, out of 1220 unique, frequently occurring free-text condition terms, we identified 100 (8%) ID-related terms.
The second trial characteristic used to identify ID-relevant studies was the intervention term. We focused only on intervention terms that appeared in 4 or more studies and reviewed these for relevance to ID. This process identified 136 ID-related intervention MeSH terms out of 933 reviewed (15%). The results of algorithmic classifications were validated by comparison with classifications based on manual review [12]. All annotations were performed by 3 ID physicians (N.D.G., C.D.P., and E.L.T.). The identified terms relevant to ID are listed in Appendix S1.
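Before the manual relevance review, the free-text condition terms and the intervention terms were first restricted to those that occur often enough (at least 5 studies for free-text conditions and at least 4 for interventions). The frequency screen can be sketched as follows. Counting each term once per study and normalizing case and whitespace are assumptions made for this illustration; the relevance annotation itself was done by ID physicians, not by code.

```python
from collections import Counter


def frequent_terms(study_terms: dict[str, list[str]], min_studies: int) -> dict[str, int]:
    """Return terms that occur in at least `min_studies` distinct studies.

    Each term is counted at most once per study. Lower-casing and stripping
    whitespace is an assumption of this sketch.
    """
    counts = Counter()
    for terms in study_terms.values():
        counts.update({t.strip().lower() for t in terms})
    return {term: n for term, n in counts.items() if n >= min_studies}


# Thresholds described in the text.
MIN_STUDIES_FREE_TEXT_CONDITION = 5
MIN_STUDIES_INTERVENTION = 4

if __name__ == "__main__":
    # Toy example with a lower threshold so the output is visible.
    conditions = {
        "S1": ["HIV Infections", "Hepatitis C"],
        "S2": ["hiv infections"],
        "S3": ["Asthma"],
    }
    print(frequent_terms(conditions, min_studies=2))  # {'hiv infections': 2}
```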
Using a computer-based search, we identified 4149 studies with at least 1 ID-relevant term in the NLM-generated MeSH "condition" field, the submitted free-text "condition" field, or the submitted intervention name field. The authors (N.D.G., C.D.P., and E.L.T.) then manually reviewed each study to exclude non-ID studies and assign each study to an ID subcategory. Trials were included if they focused on a communicable disease. Trials evaluating non-communicable diseases were included only if the study hypothesis was that microbes played a role in the disease under study (e.g., probiotics to prevent hepatic encephalopathy). Studies that examined complications of infection were included only if the intervention focused on the infection itself (e.g., trials in post-herpetic neuralgia that included antiviral therapy).
Additional details are available in supplemental material (Checklist S1 and Diagram S1).
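The computer-based screen described above flags a study as a candidate when at least 1 annotated ID term appears in any of the three searched fields. The sketch below shows that rule under stated assumptions. The `Registration` record and the tiny term sets are hypothetical stand-ins, not the Appendix S1 lists, and case-insensitive matching is assumed. Flagged studies would still go to manual review.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    """Hypothetical, simplified view of the three fields searched in the text."""
    nct_id: str
    mesh_conditions: list[str] = field(default_factory=list)  # NLM-generated MeSH condition terms
    conditions: list[str] = field(default_factory=list)       # submitter's free-text conditions
    interventions: list[str] = field(default_factory=list)    # submitter's intervention names


def _norm(term: str) -> str:
    # Assumption: matching ignores case and surrounding whitespace.
    return term.strip().lower()


def is_candidate_id_trial(reg: Registration, id_mesh: set[str],
                          id_free_text: set[str], id_interventions: set[str]) -> bool:
    """True if any of the three fields contains at least one annotated ID term."""
    return (
        any(_norm(t) in id_mesh for t in reg.mesh_conditions)
        or any(_norm(t) in id_free_text for t in reg.conditions)
        or any(_norm(t) in id_interventions for t in reg.interventions)
    )


if __name__ == "__main__":
    # Tiny illustrative term sets (not the actual annotated lists).
    id_mesh = {"hiv infections", "influenza, human"}
    id_free_text = {"hcv"}
    id_interventions = {"oseltamivir"}
    flu = Registration("NCT00000010", interventions=["Oseltamivir"])
    asthma = Registration("NCT00000011", conditions=["Asthma"])
    print(is_candidate_id_trial(flu, id_mesh, id_free_text, id_interventions))     # True
    print(is_candidate_id_trial(asthma, id_mesh, id_free_text, id_interventions))  # False
```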

Subcategorization of the ID Study Dataset
After defining the ID trials dataset, we subcategorized trials based on study title and description. When possible, subcategories were defined based on World Health Organization (WHO) cause-of-death groupings [13]. "Maternal Conditions" and "Perinatal Conditions" were excluded because these categories include both communicable and non-communicable conditions. In addition to these 18 WHO-defined categories, 40 additional subcategories (for a total of 58) were defined such that each trial was assigned to at least 1 subcategory (Figure 2). Trials that fit equally well into multiple subcategories were assigned to more than 1 subcategory. The percentage of all ID-related mortality and ID-related disability-adjusted life years (DALYs) attributable to the following conditions was calculated from the WHO Global Burden of Disease [13][14][15]: HIV-AIDS, hepatitis C, lower respiratory tract infection (LRTI), hepatitis B, malaria, diarrheal diseases, sexually transmitted diseases (STD) excluding HIV, tuberculosis, childhood cluster diseases (pertussis, poliomyelitis, diphtheria, measles, and tetanus), and meningitis.
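The burden shares described above are simple proportions. The denominator is the total for all ID-related causes, so the shares of the listed conditions need not sum to 100%. The following is a minimal sketch that works for either deaths or DALYs. The input numbers are placeholders only and are not WHO Global Burden of Disease estimates.

```python
def share_of_id_burden(burden_by_condition: dict[str, float],
                       total_id_burden: float) -> dict[str, float]:
    """Percentage of all ID-related deaths (or DALYs) attributable to each condition.

    The denominator covers all ID-related causes, not only the listed
    conditions, so the returned shares need not sum to 100.
    """
    if total_id_burden <= 0:
        raise ValueError("total_id_burden must be positive")
    return {c: 100.0 * v / total_id_burden for c, v in burden_by_condition.items()}


if __name__ == "__main__":
    # Placeholder numbers only, NOT WHO estimates.
    deaths = {"Condition A": 300, "Condition B": 150}
    print(share_of_id_burden(deaths, total_id_burden=1000))
    # {'Condition A': 30.0, 'Condition B': 15.0}
```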

Analytical Methods
SAS software, version 9.2 (SAS Institute, Cary, NC, USA) was used to calculate frequencies and percentages for categorical trial characteristics and medians and interquartile ranges (IQRs) for continuous characteristics. A chi-squared test was used to evaluate whether trials focused on prevention or vaccines were more likely to restrict enrollment to children than non-prevention or non-vaccine trials, respectively. Disease prevalence and disease-specific DALYs were derived from the WHO 2004 Global Burden of Disease report, whereas disease-specific mortality was derived from its 2008 update [13][14][15]. Enrollment (number of trial participants) was reported for each trial. For active trials that had not completed recruitment, the anticipated enrollment was recorded; for trials that had completed or terminated recruitment, the value was updated to record actual enrollment. Summary statistics were calculated by pooling across active and completed trials. Because ClinicalTrials.gov does not require the funding source of a study to be specified, we derived the probable funding source from the lead sponsor and collaborator fields [6]. The lead sponsor is the primary organization that oversees study implementation and is responsible for conducting data analysis [11]; available categories include "Industry", "NIH", "U.S. federal (excluding NIH)", and "Other". Collaborators are other organizations that provide additional support, including funding, design, implementation, data analysis, or reporting. A trial was considered "industry-funded" if the lead sponsor was from industry, or if the NIH was neither a lead sponsor nor a collaborator and at least 1 collaborator was from industry. An "NIH-funded" study required the NIH to be either a lead sponsor or a collaborator, with industry not serving as lead sponsor.
"Other" was used to describe studies for which the lead sponsor and collaborator fields were non-missing and did not meet criteria for either "industry-funded" or "NIH-funded". To further define this lead sponsor group, we undertook a manual review to assign trial sponsors to "Non-U.S. Government", "Academic/Hospital", or "Consortium". Location of study facilities (sites) was defined by the study sponsor and could include more than 1 location for multisite studies. This information was used to define the regional distribution within the ID trials dataset.
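The funding-source rules above can be written as a small decision function. The order of the checks matters. An industry lead sponsor always means industry-funded. Otherwise, any NIH role means NIH-funded. Otherwise, an industry collaborator means industry-funded. Everything else is Other. This is a sketch, not the authors' SAS code. Returning `None` for a missing lead sponsor and treating an empty collaborator list as non-missing are assumptions, and the sponsor-class strings follow the categories listed in the text.

```python
from typing import Optional, Sequence

INDUSTRY = "Industry"
NIH = "NIH"


def funding_source(lead_sponsor_class: Optional[str],
                   collaborator_classes: Sequence[str]) -> Optional[str]:
    """Derive the probable funding source from sponsor classes.

    lead_sponsor_class: "Industry", "NIH", "U.S. federal (excluding NIH)" or
        "Other"; None if the field is missing.
    collaborator_classes: classes of all listed collaborators (may be empty).
    """
    if lead_sponsor_class is None:
        return None  # assumption: missing lead sponsor -> left unclassified
    if lead_sponsor_class == INDUSTRY:
        return "Industry-funded"
    if lead_sponsor_class == NIH or NIH in collaborator_classes:
        return "NIH-funded"  # NIH involved and industry is not the lead sponsor
    if INDUSTRY in collaborator_classes:
        return "Industry-funded"  # no NIH role, but at least 1 industry collaborator
    return "Other"


if __name__ == "__main__":
    print(funding_source("Industry", ["NIH"]))       # Industry-funded
    print(funding_source("Other", ["NIH", INDUSTRY]))  # NIH-funded
    print(funding_source("Other", [INDUSTRY]))        # Industry-funded
    print(funding_source("U.S. federal (excluding NIH)", []))  # Other
```

Because an industry lead sponsor is checked first and the NIH check only runs afterwards, the two funded categories cannot overlap, matching the requirement that NIH-funded trials have no industry lead sponsor.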
Countries were grouped into 11 global regions (http://www.clinicaltrials.gov/ct2/search/browse?brwse=locn_cat). For each of 7 disease conditions (LRTI, diarrheal diseases, HIV/AIDS, tuberculosis, malaria, hepatitis B, and hepatitis C), regions were ranked by the number of disease-specific studies with sites in a region and by the total number of disease-specific deaths in a region [15]. These 2 rankings were compared graphically to examine the geographic distribution of trials in a specific disease relative to the geographic variation in mortality for that disease.
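The rank comparison above can be sketched in a few steps. Rank regions by trial count and by deaths, with rank 1 as the largest. Then classify each region relative to the diagonal of the rank-versus-rank plot. A region whose trial rank is better than its death rank sits "above" the line, following the Figure 4 description. The text does not say how ties were handled, so the shared-best-rank rule is an assumption. The region names and counts in the example are hypothetical.

```python
def rank_desc(values: dict[str, float]) -> dict[str, int]:
    """Rank keys so the largest value gets rank 1.

    Assumption: tied values share the best rank ("competition" ranking).
    """
    ordered = sorted(values.values(), reverse=True)
    return {k: ordered.index(v) + 1 for k, v in values.items()}


def position_vs_diagonal(trials_by_region: dict[str, float],
                         deaths_by_region: dict[str, float]) -> dict[str, str]:
    """Classify each region relative to the diagonal of the trial-rank vs. death-rank plot."""
    trial_rank = rank_desc(trials_by_region)
    death_rank = rank_desc(deaths_by_region)
    result = {}
    for region in trials_by_region:
        t, d = trial_rank[region], death_rank[region]
        if t < d:
            result[region] = "above"  # ranks higher for trials than for deaths
        elif t > d:
            result[region] = "below"  # ranks lower for trials than for deaths
        else:
            result[region] = "on"
    return result


if __name__ == "__main__":
    # Hypothetical regions and counts, for illustration only.
    trials = {"Region A": 50, "Region B": 10, "Region C": 5}
    deaths = {"Region A": 100, "Region B": 900, "Region C": 400}
    print(position_vs_diagonal(trials, deaths))
    # {'Region A': 'above', 'Region B': 'below', 'Region C': 'below'}
```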

Funding
This work was supported by a grant from the FDA awarded to Duke University. The study sponsor had no role in the design or performance of this study, or in the writing of the manuscript. All authors had full access to all study data.

Results
The initial dataset downloaded on September 27, 2010 included 96,346 clinical trials registered with ClinicalTrials.gov. A total of 40,970 interventional trials were registered from October 1, 2007, following the enactment of mandatory registration on September 27, 2007; of these, 3570 (9%) were defined as the ID trials dataset (Figure 1). This was similar to the number of cardiovascular medicine (n = 3437 [8%]) and mental health trials (n = 3695 [9%]) but less than the number of oncology trials (n = 8992 [22%]) [6].
Of 3570 ID trials, 3207 (90%) were assigned to 1 subcategory, 329 (9%) were assigned to 2 subcategories, and less than 1% were assigned to more than 2 subcategories. The distribution of trials across 58 ID subcategories is presented in Figure 2. The representation of certain disease subcategories within the ClinicalTrials.gov registry is compared with ID-related mortality and disability (as defined by the WHO Global Burden of Disease) in Figure 3. Based on their frequency in the database and their global impact, 5 subcategories were chosen for more detailed characterization: HIV-AIDS, hepatitis C virus, LRTIs, malaria, and tuberculosis. "Treatment" was the primary purpose in the majority of both ID and non-ID trials (53% and 77%, respectively; Table 1). The difference between these proportions reflects the much larger share of prevention-oriented trials in ID (38% vs. 8%). Further, biological interventions were more common among ID trials (29% vs. 5% in non-ID trials), and nearly all of these were vaccine trials. Procedural and device interventions were under-represented among ID trials (procedures, 4% vs. 11% for non-ID trials; devices, 3% vs. 10%).
ID trials were more likely than non-ID trials to be restricted to children (≤18 years; 17% vs. 6%) yet also more likely to exclude elderly persons (>65 years; 48% vs. 30%). Among ID trials, those with a primary purpose of prevention were more likely to restrict enrollment to children than those with a treatment or other purpose (27% vs. 11%; χ² p-value <0.0001) (Table S1). Similarly, ID vaccine trials were more likely to restrict enrollment to children than non-vaccine trials (30% vs. 12%; χ² p-value <0.0001). However, even among trials whose primary purpose was not prevention, restriction of enrollment to children was more common among ID trials than non-ID trials (10.9% vs. 5.6%; χ² p-value <0.0001). The majority of ID trials were randomized (73%, vs. 68% of non-ID trials), and ID and non-ID trials had the same proportion (56%) of open (i.e., non-blinded) trials. Reporting the presence or absence of a data monitoring committee (DMC) was not required for study registration; consequently, 22% of studies in the ID trials dataset lacked this information. Among the 2763 studies that reported DMC status, the majority (55%) did not have one, a rate similar to that seen in non-ID trials (60%). Among studies reporting DMC status, non-U.S.-based studies were more likely to use DMCs than U.S.-based studies (48% vs. 39%, respectively). However, a large number of trials did not provide DMC data in their registration information (21% and 17% for non-U.S.- and U.S.-based studies, respectively).
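The χ² comparisons reported above are based on 2×2 tables of counts. For example, prevention vs. other ID trials would be crossed with restricted vs. not restricted to children. A pure-Python sketch of a Pearson chi-squared test with 1 degree of freedom follows. It applies no continuity correction, and SAS PROC FREQ also reports a continuity-adjusted statistic, so values may differ slightly from the published ones. Only percentages are given in the text, so the counts in the example are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for the table

        [[a, b],   e.g. prevention trials: restricted / not restricted to children
         [c, d]]        other trials:      restricted / not restricted to children

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df the statistic is a squared standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    stat, p = chi_square_2x2(10, 20, 20, 10)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```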
North America was the most frequently identified study location for ID studies (46%; Table 1 and Table S2). Compared with non-ID studies, other world regions were more frequently identified in the ID trials dataset, and their representation was associated with local disease prevalence. For example, studies focusing on malaria and tuberculosis were concentrated in Africa (56% and 35%, respectively) but infrequently located in North America (13% and 17%, respectively). In contrast, 42% of LRTI trials were located in North America and 36% in Europe, but only 6% were located in Africa. We also plotted the geographic distribution of trials relative to the burden of disease across eleven world regions based on rank (Figure 4). For the 7 disease conditions analyzed (LRTI, diarrheal diseases, HIV/AIDS, tuberculosis, malaria, hepatitis B, and hepatitis C) […] Table S3.
Industry-funded studies were more likely to focus on treatment (54%) and to have drug-based interventions (57%) than NIH-funded studies (38% and 42%, respectively). In contrast, a greater proportion of NIH-funded studies focused on behavioral interventions than did industry-funded studies (26% vs. <1%, respectively). Trials of phase 3 or later accounted for 24% of NIH-funded interventional studies, compared with 44% of industry-funded trials. Trial status also differed by source of funding: more industry-funded trials were identified as completed (45%, vs. 18% for NIH-funded), while more NIH-funded trials were identified as recruiting (45%, vs. 26% for industry-funded). The percentage of trials with a DMC also differed by funding source: 40% of industry-funded trials included a DMC, compared with 61% of NIH-funded trials. Study size and other characteristics were otherwise comparable between industry- and NIH-funded trials.
Several trial characteristics differed between U.S. and non-U.S. trial locations (Table S4).

Discussion
An overview of all ClinicalTrials.gov trials was recently published [6], but this study is the first to specifically characterize the subset of ID trials within the ClinicalTrials.gov registry. Although there are more comprehensive registries, such as the International Clinical Trials Registry Platform (ICTRP) managed by the WHO, we chose to focus on ClinicalTrials.gov because of the tools available for its characterization, notably the AACT database facilitated by the CTTI. A manual review of several ID-specific disease states revealed that about 80% of trials in the ICTRP were also registered in ClinicalTrials.gov (data not shown). Therefore, although our study does not characterize all ID trials, it provides a representative contemporary sample from which to describe the overall state of ID clinical trials.
We found that ID trials were heterogeneous but also shared some interesting specialty-specific themes. The emphasis of ID clinical trials on prevention strategies, primarily vaccine interventions, highlights the public health direction of the field and suggests that further investment in ID trials may facilitate population-level impact. We also noted that prevention trials more frequently included sites outside of North America. We could not identify a clear reason for this, although one likely factor is the heavy emphasis on vaccines in prevention trials. Vaccination programs delivered as part of routine clinical care are more robust in the U.S. than in many other countries and regions. As such, non-U.S.-based sites investigating vaccines represent opportunities for mutual benefit, both to the trial and to the population. Furthermore, we hypothesized that this focus on prevention through vaccination is one reason why ID trials restrict enrollment to pediatric subjects more frequently than non-ID trials. We found that among ID trials, vaccine and prevention trials were significantly more likely to restrict enrollment to children than non-vaccine or non-prevention trials. However, among all trials where prevention was not the primary purpose, ID trials were still more likely to restrict enrollment to children than non-ID trials. Prevention-focused trials may still contribute to the higher rates of pediatric enrollment, although there are likely to be other explanatory factors, such as the greater burden of infectious diseases carried by children as compared with chronic or non-communicable diseases. At the same time, ID trials tended to exclude elderly subjects more often than non-ID trials. This may again relate to the disproportionate burden of communicable diseases affecting children. However, elderly persons were also excluded in ID subcategories that impose a disproportionately high burden of morbidity and mortality on them. Careful consideration should be given before excluding elderly subjects in future study designs. We note that this analysis of the ClinicalTrials.gov registry likely underestimates the prevalence of preventive strategies, because the legal requirement for registration with ClinicalTrials.gov exempts phase 1 trials, trials not involving a drug, biologic, or device, and trials not under U.S. jurisdiction. This limitation likely extends to other trial categories as well.
Figure 4. Rank of disease-specific trial frequency vs. disease-related mortality across eleven world regions. Data are presented for 7 categories: LRTI, diarrheal diseases, HIV/AIDS, tuberculosis, malaria, hepatitis B, and hepatitis C. Panels are arrayed in descending order of global disease mortality from left to right and top to bottom. Trial rank was calculated by first identifying all trials for the specified disease and then ranking world regions by the number of trials including that region. Death rank was calculated using the total number of deaths per region due to the specified disease. The diagonal line is provided for reference. Regions above the line rank higher in number of trials than in disease-specific mortality; regions below the line rank lower in number of trials than in disease-specific mortality; regions on or close to the line have similar trial and mortality ranks. doi:10.1371/journal.pone.0077086.g004
We also found that ID trials tended to be larger than trials in other specialties, both in median actual or anticipated subject enrollment and in the representation of studies enrolling (actual or anticipated) >10,000 subjects. Despite the larger median size of ID studies, we observed considerable heterogeneity in this regard. For example, the median size of 4 trachoma trials was 8438 subjects; 40 trials focusing on Haemophilus had a median enrollment of 600 subjects; and 111 trials focusing on STDs (excluding HIV) had a median enrollment of 400 subjects. In contrast, several other subcategories had a concentration of small trials: the median size was 66 subjects for HIV-AIDS trials and 60 subjects for HCV trials. Clinical trialists face ongoing challenges in enrolling adequate numbers of research participants [16], but this issue does not apply uniformly across the field of ID. We cannot, however, determine whether differences in study size reflect differences in prioritization, funding, or other resources. Although these factors may play a role, some diseases and trials also require fewer participants to meet their objectives, whether scientific, epidemiological, statistical, or otherwise. This consideration applies to different infectious diseases as well as to noninfectious conditions. The results and methodology presented here suggest an opportunity to consolidate clinical research efforts for maximal impact.
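The per-subcategory enrollment figures above combine two steps from the Methods. First, each trial contributes its actual enrollment once recruitment has completed or terminated, and its anticipated enrollment otherwise. Second, the values are pooled and the median and IQR are computed. A minimal sketch follows. The tuple layout is hypothetical. Counting a trial toward every subcategory it was assigned to is an assumption, since the text does not say how multiply assigned trials were handled. Quartiles use Python's `statistics.quantiles` with the "inclusive" method, which can differ slightly from SAS's default percentile definition.

```python
import statistics
from collections import defaultdict
from typing import Optional


def enrollment_value(recruitment_finished: bool,
                     anticipated: Optional[int],
                     actual: Optional[int]) -> Optional[int]:
    """Actual enrollment once recruitment has completed or terminated, else the anticipated target."""
    if recruitment_finished and actual is not None:
        return actual
    return anticipated


def median_iqr(values: list[float]) -> tuple[float, tuple[float, float]]:
    """Median and (Q1, Q3), using linear interpolation between order statistics."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, (q1, q3)


def summarize_by_subcategory(trials):
    """trials: iterable of (subcategories, recruitment_finished, anticipated, actual).

    A trial assigned to several subcategories counts toward each of them
    (an assumption). Groups with fewer than 2 usable values are skipped
    because quartiles are not meaningful for them.
    """
    groups = defaultdict(list)
    for subcats, finished, anticipated, actual in trials:
        n = enrollment_value(finished, anticipated, actual)
        if n is None:
            continue  # no usable enrollment figure
        for s in subcats:
            groups[s].append(n)
    return {s: median_iqr(v) for s, v in groups.items() if len(v) >= 2}


if __name__ == "__main__":
    # Hypothetical trials and subcategories, for illustration only.
    trials = [
        (["A"], False, 60, None),
        (["A"], True, 50, 45),
        (["A", "B"], True, 400, 30),
        (["A"], False, 125, None),
        (["A", "B"], True, 380, 400),
    ]
    print(summarize_by_subcategory(trials)["A"])  # (60.0, (45.0, 125.0))
```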
This snapshot of ID clinical trials may also be useful for guiding future funding and policy decisions. In our review, we found that the frequency of particular ID trial subcategories did not correlate directly with their global or U.S. health impact. For example, although HIV-AIDS trials represented 23% of the ID trials dataset, HIV-AIDS accounts for 15% of global communicable disease-related mortality and 9% of U.S. communicable disease-related mortality (Figure 3). Trials focusing on hepatitis C also constituted a larger percentage of the ID trials dataset than hepatitis C's estimated share of global mortality. In contrast, LRTI studies made up only 6% of ID studies, but LRTI ranks as the most common cause of global and U.S. communicable disease-related deaths (28% and 45%, respectively) [14,15]. Diarrheal diseases also contribute a significant burden of global communicable disease-related mortality and disability [13][14][15] but were not substantially represented in the ClinicalTrials.gov ID portfolio. This discrepancy is cited as one reason for the failure to achieve the Millennium Development Goals for reducing childhood deaths due to diarrheal disease [17]. The greatest discrepancy between representation in the ID trials dataset and communicable disease-related mortality rates was observed for the category "High-Intensity Intestinal Nematode Infections". This most prevalent of ID-related conditions affects more than 150 million people worldwide annually [13]; however, only 12 such trials (<1%) were identified in the ID trials dataset. Another common clinical problem, particularly in economically developed nations, is prosthetic joint infection (PJI) and osteomyelitis [18][19][20]. Although optimal treatment of PJI and osteomyelitis is a common challenge for many ID practitioners, we identified only 10 trials focusing on these infections.
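One compact way to express the mismatches described above is a representation ratio. It divides a condition's share of ID trials by its share of communicable disease-related mortality, so values above 1 suggest over-representation and values below 1 suggest under-representation. This ratio is an illustrative summary only and is not a metric used in the analysis. The inputs are the global percentages quoted in this paragraph.

```python
def representation_ratio(trial_share_pct: float, mortality_share_pct: float) -> float:
    """Share of ID trials divided by share of ID-related deaths (>1: over-represented)."""
    if mortality_share_pct <= 0:
        raise ValueError("mortality share must be positive")
    return trial_share_pct / mortality_share_pct


def label(ratio: float) -> str:
    if ratio > 1:
        return "over-represented"
    if ratio < 1:
        return "under-represented"
    return "proportional"


# Percentages quoted in the text: (share of ID trials, share of global ID-related deaths).
GLOBAL_SHARES = {
    "HIV-AIDS": (23.0, 15.0),
    "LRTI": (6.0, 28.0),
}

if __name__ == "__main__":
    for condition, (trials_pct, deaths_pct) in GLOBAL_SHARES.items():
        r = representation_ratio(trials_pct, deaths_pct)
        print(f"{condition}: {r:.2f} ({label(r)})")
    # HIV-AIDS: 1.53 (over-represented)
    # LRTI: 0.21 (under-represented)
```

The same function applies to the U.S. figures quoted above (for example, HIV-AIDS at 23% of trials vs. 9% of U.S. deaths).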
The lack of correlation between diseases with high global mortality and the number of clinical trials targeting these infections was striking and suggests a need for greater emphasis on research in high-mortality clinical syndromes. Even for infections that cause high morbidity in the United States, such as PJI and osteomyelitis, we found few trials in the ClinicalTrials.gov registry. The absence of promising therapies may explain why trials for a particular disease are under-represented, in which case future studies should emphasize basic science and drug discovery. Conversely, the availability of new therapies may drive trial activity: the relatively high prevalence of hepatitis C trials may be related to the recent discovery of new protease inhibitor therapies.
Despite the significant value provided by the ID trials dataset in this analysis, this resource has limitations. First, ClinicalTrials.gov was primarily designed as a public repository for research trials and was not intended to support aggregate analysis. Second, because the methodology developed to annotate the ClinicalTrials.gov database by clinical specialty relied on a group of experts drawn exclusively from a single institution (Duke University), further validation would be appropriate. Third, this analysis includes interventional trials for which the FDA requires registration, which may introduce a bias toward trials involving therapeutics rather than other intervention strategies. However, publishers who have adopted the International Committee of Medical Journal Editors' Uniform Requirements mandate registration of all interventional studies, regardless of intervention type, phase, or location [4]. Despite the FDAAA requirement to register all applicable interventional studies within 21 days of a study's start, many fail to do so in a timely manner [21]. As a result, there are likely to be some trials begun within our October 2007-September 2010 window that were not registered and therefore are not included here.
Our analysis also excludes observational and other noninterventional epidemiologic studies; thus, we present a cross-section of ID clinical trials, not a comprehensive review of all ID-related clinical research. When study sponsors or investigators register a clinical trial in ClinicalTrials.gov, not all fields are mandatory. For example, registrants were not required to specify whether a DMC was involved in the trial. This limits our interpretation of the available data, given how frequently this particular parameter was missing. It also suggests that ClinicalTrials.gov may need to revise the list of mandatory fields so as to provide a more robust assessment of a given clinical trial.
Another caveat to the interpretation of our findings is that the number of trials or participants within a given disease area is not a surrogate measure for resource investment. Some small trials may indeed be very expensive and some large trials may be relatively inexpensive. Unfortunately, ClinicalTrials.gov does not record data on financial or resource expenditures. Furthermore, research into some disease states may not require ClinicalTrials.gov registration and would therefore appear to be underrepresented. This includes health systems research such as implementation research and quality improvement research.
Our analysis of ClinicalTrials.gov revealed that ID trials are well-represented as a proportion of the overall clinical trials enterprise, tend to be larger than non-ID trials, and have a greater representation outside of the United States than do trials in other specialties. However, there is considerable variation across ID trials, much of which reflects the heterogeneity of infectious diseases themselves. Our analysis highlights discrepancies between the number and quality of trials in some disease states relative to the global burden of those diseases, and may prompt examination of how best to prioritize and coordinate research funding both within and across national boundaries. Data generated from high-quality clinical trials not only improve disease management but also inform clinical practice guidelines. The results presented here should inform improvements in clinical research methodology, focus resources more deliberately, and serve as a launchpad for establishing future clinical research priorities.
Appendix S1. List of ID Terms Used to Identify Potential Infectious Disease Trials from the ClinicalTrials.gov Registry.