Genome and bioinformatic analysis of a HAdV-B14p1 virus isolated from a baby with pneumonia in Beijing, China.

The genome of HAdV-B14p1 strain BJ430, isolated from a six-month-old baby diagnosed with bronchial pneumonia at the Beijing Children’s Hospital in December 2010, was sequenced, analyzed, and compared with reference adenovirus genome sequences archived in GenBank. This genome is 34,762 bp in length and shares 99.9% identity with the genome of HAdV-B14p1 strain 303600, which was isolated in the USA (2006). Even more remarkably, it is 99.7% identical to the HAdV-B14p (prototype “de Wit” strain) genome, isolated in The Netherlands in 1955. The patient and the patient's parents presumably had no or limited contact with persons from the USA and Ireland, both of which recently reported outbreaks of the re-emergent virus HAdV-B14p1. These genome data, their analysis, and this report provide a reference for any additional HAdV-B14 outbreak in China and provide the basis for the development of adenovirus vaccines and molecular pathogen surveillance protocols in high-risk areas.


Introduction
Human adenoviruses (HAdVs) are typed and ordered into seven species (A to G), with more than 64 genome types reported, for which associations with specific human diseases have been characterized [1,2]. For example, HAdVs classified in species B, C, and E are known to cause respiratory infections [3]. Specifically, within species B, HAdV-B3, -B7, -B16, and -B21, belonging to subspecies B1, are respiratory tract pathogens, as are HAdV-B14, -B35 and -B55, which are members of subspecies B2.
HAdV-B14p was first isolated in The Netherlands and linked to a respiratory tract disease outbreak in military recruits between April and May 1955 [4]. This particular virus was associated with sporadic cases of acute respiratory disease (ARD) in Europe and Asia through the 1960s [4,5] and then was not reported for a long period. In the approximately 50-year interval from the original identification to the recent outbreaks, reports of respiratory disease associated with HAdV-B14p were rare and limited to small numbers of patients [6]. Recently, from 2005 through 2009, HAdV-B14p1 has apparently re-emerged and has been linked to several large ARD outbreaks across the USA, involving nine military and twenty-four civilian communities in contrast to the past, as well as in Europe (Ireland) [6-8]. Several HAdV-B14-like infections have also been reported recently in China, beginning in 2010 [9,10]. The application of genomics and bioinformatics to adenoviral genomes provides high resolution insights into their epidemiology and evolution, specifically revealing the molecular basis for the genesis of several emergent and re-emergent adenovirus pathogens [2,11-20]. This report presents the isolation and identification of a type 14 adenovirus, isolated from a six-month-old baby diagnosed with bronchial pneumonia. It was confirmed as HAdV-B14p1 by the analysis of its genome and, in particular, the hexon, fiber, and E1A genes. The genome was sequenced, analyzed, and compared with other reference adenovirus genome sequences archived in GenBank. This analysis shows that the Beijing HAdV-B14p1 has a close phylogenetic relationship with HAdV-B14p, isolated in 1955 in The Netherlands. Its sequence is remarkably conserved, given the time and geographic distances. It is also nearly identical to a HAdV-B14p1 strain (303600) recently characterized from an outbreak in the USA (2006).
Given the recent outbreaks of this particular virus in the USA, Europe and China, these genome data and this report provide a reference for recognizing any future HAdV-B14 outbreak in China and serve as a foundation for the development of adenovirus vaccines and surveillance protocols in high-risk areas. Further, these observations will aid the continuing research and development of adenovirus-based vectors.

Virus recovery and DNA extraction
The specimen was isolated from a nasopharyngeal aspirate of a six-month-old infant diagnosed with bronchial pneumonia at the Beijing Children's Hospital in December 2010. The sample collection and detection protocols were approved by the Ethics Review Committee of the National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention. The parents of the patient gave written informed consent to publish these case details. The sample tested positive for adenovirus using a polymerase chain reaction (PCR) protocol. Subsequently, specific primers were designed for sequencing the open reading frames of the E1A, hexon, and fiber genes. Viruses from this and other clinical samples were grown in Hep-2 cells; this sample presented a characteristic cytopathic effect (CPE), which was observed after 10 days of culturing. Viral genomic DNA was extracted from 140 µl of infected cell lysate using a QIAamp Viral RNA Mini kit (Qiagen, Ltd.; Germany), applied according to the manufacturer's instructions.

PCR amplification and sequencing strategy
Appropriate primers were designed using reference HAdV sequences that are available from GenBank. For genome DNA sequencing, a PCR strategy with 64 primer pairs was employed to amplify the whole genome as overlapping fragments. The linear end-terminal sequences were determined using a procedure described previously. All of the reported sequences are the result of at least two sequencing reactions, including both directions.
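The tiling idea behind the 64 overlapping fragments can be sketched as follows. This is only an illustration: the function name `tile_amplicons` and the 100-bp overlap are assumptions made here, not values given in the text, and real primer placement depends on sequence-specific primer design rather than evenly spaced coordinates.

```python
import math

def tile_amplicons(genome_len, n_amplicons, overlap):
    """Return 1-based inclusive (start, end) coordinates of evenly tiled,
    overlapping amplicons that together span the whole genome.

    The overlap value is a hypothetical design parameter."""
    # Smallest amplicon length for which n_amplicons tiles cover the genome.
    length = math.ceil((genome_len + (n_amplicons - 1) * overlap) / n_amplicons)
    step = length - overlap
    tiles = []
    for i in range(n_amplicons):
        start = i * step + 1
        # The final amplicon is clipped at the genome end.
        end = min(start + length - 1, genome_len)
        tiles.append((start, end))
    return tiles

# Genome length from the text; 64 fragments; assumed 100-bp overlap.
tiles = tile_amplicons(34762, 64, 100)
```

With these assumed settings each fragment is a few hundred bases long, and every neighboring pair shares the requested overlap, which is what allows adjacent reads to be joined during assembly.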

DNA sequencing
Amplified genomic fragments were purified using agarose gel electrophoresis with a QIAquick gel extraction kit, as per the manufacturer's instructions (Qiagen, Ltd.; Germany). Sanger-based DNA sequencing reactions were performed bi-directionally using an ABI Prism Big Dye Terminator v3.1 Cycle Sequencing Ready Reaction kit and AmpliTaq DNA polymerase (Applied Biosystems, Inc.; Foster City, CA). DNA sequencing ladders were resolved using an ABI 3730 DNA sequencer (Applied Biosystems, Inc.; Foster City, CA). Unresolved and ambiguous sequences were re-sequenced with additional custom primers located adjacent to the regions in question.
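A minimal sketch of how bi-directional reads can be cross-checked is given below. It assumes that the forward and reverse reads have already been trimmed to the same region; the function names are illustrative and are not part of any software used in this study. Positions where the two strands disagree, or where a base is called as N, are the candidates for re-sequencing.

```python
# Translation table for complementing DNA bases (N is kept as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def discordant_positions(forward_read, reverse_read):
    """Return 0-based positions (in forward orientation) where the forward
    read and the reverse-complemented reverse read disagree or contain N."""
    rc = reverse_complement(reverse_read).upper()
    fwd = forward_read.upper()
    if len(fwd) != len(rc):
        raise ValueError("reads must cover the same region")
    return [i for i, (a, b) in enumerate(zip(fwd, rc))
            if a != b or a == "N"]

# Toy example: the reverse read disagrees with the forward read at index 3.
flagged = discordant_positions("ATGCCA", "TGTCAT")
```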

Bioinformatic Analyses
The genome was assembled using Sequencher (v4.0.5; Gene Codes, Inc.; Ann Arbor, MI). For genome sequence annotation, the assembled genome was partitioned into 1-kb non-overlapping segments, similar to the method of Lauer et al., and each segment was systematically queried against the GenBank non-redundant sequences database using the BLASTX program. DNA sequence alignments were performed using the BioEdit sequence alignment editor software (v5.0.9; T. Hall, North Carolina State University) and an online Gene-wise program (http://www.ebi.ac.uk/Wise2/advanced.html). All sequences were submitted for BLAST analysis (http://blast.ncbi.nlm.nih.gov/Blast.cgi/) to identify their closely related counterparts. As a quality control check, the annotated coding sequences were confirmed to correspond to their counterparts from other HAdV genomes. Annotation of the splice positions and correlation with functional properties were completed using an online splice prediction software (http://www.fruitfly.org/seq_tools/splice.html) and the GENSCAN software (http://genes.mit.edu/GENSCAN.html).
For genome and gene phylogenetic analysis, the forward and reverse sequence data were re-aligned and re-assembled into a single consensus sequence using the software MEGA v4.0 (Molecular Evolutionary Genetics Analysis software; http://www.megasoftware.net). This confirmed the Sequencher-based assembly. Phylogenetic analysis was subsequently performed using MEGA as well, via the maximum-composite-likelihood method that generated neighbor-joining and bootstrapped trees of phylogeny with 1,000 replicates; all other parameters were set by default. Sequence percent identities were calculated using software from the EMBOSS package (http://www.ebi.ac.uk/Tools/emboss/). Comparison of mutations across the genomes was done using unpublished software developed recently (D.S.).
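The partitioning step described above can be illustrated with a short sketch. The function name `partition_genome` and the placeholder sequence are assumptions for illustration; submitting each segment to BLASTX is not shown.

```python
def partition_genome(sequence, segment_len=1000):
    """Split a sequence into consecutive non-overlapping segments.

    Returns a list of (start, end, subsequence) tuples with 1-based
    inclusive coordinates; the last segment may be shorter."""
    segments = []
    for offset in range(0, len(sequence), segment_len):
        chunk = sequence[offset:offset + segment_len]
        segments.append((offset + 1, offset + len(chunk), chunk))
    return segments

# Placeholder sequence with the genome length reported in the text.
genome = "ACGT" * 8690 + "AC"
segments = partition_genome(genome)
```

For a 34,762-bp genome this yields 35 query segments, the last of which is 762 bases long.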
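As a rough illustration of what a percent identity value represents, the sketch below scores a pairwise alignment that has already been computed. It uses one common convention, identical columns divided by alignment length with gap columns counted as mismatches; this is an assumption made here, and the exact definition applied by the software used in this study may differ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the
    same base. Columns that are gaps in both sequences are ignored;
    columns with a gap in one sequence count as mismatches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

# Toy alignment: 4 identical columns out of 6.
toy_identity = percent_identity("ACGT-A", "ACCTTA")
```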
GenBank archived sequences and accession numbers. The following genomes were used for these analyses, with details given for the type 14 strains (“p” denotes prototype; if no designation is noted, the strain is the prototype): HAdV-B14p1 USA/303600/
examined using PCR protocols. In these assays, viral pathogens known to cause acute respiratory disease, such as adenovirus, influenza, parainfluenza, SARS coronavirus, rhinovirus, and RSV, were screened using appropriate PCR primer sets.

Results
Comparative genomics of HAdV-B14p and two HAdV-B14p1 strains
The complete genome of HAdV-B14p1 strain BJ430 contains 34,762 base pairs (bp). This is very similar to the genomes of the prototype and another HAdV-B14p1 virus (strain 303600; Lackland Air Force Base, USA), with genome sizes of 34,764 bp. Forty-two putative coding regions (Table 1) were identified in this genome along with conserved non-coding sequence motifs that are in common with HAdV-B14p1 (strain 303600) and the prototype (Table 2). A map of the genome organization of coding sequences from strain BJ430 is shown in Fig. 1. Note that the colors of the arrows are used for contrast only and to group the coding regions to the gene transcripts, e.g., E1A, L1, E2B, etc. The colors do not reflect any relationship other than the grouping of genes to their transcript; for example, the two red genes of “L1” have no relationship to the eight red genes of “E3”.
When the genome of the HAdV-B14p1 strain BJ430 was compared to those from the HAdV-B14p1 strain 303600 and the prototype HAdV-B14p “de Wit” viruses, the percent nucleotide identities were 99.9% and 99.7%, respectively (GenBank acc. no. FJ822614 and AY803294). However, there were sequence differences amongst them, indicating that although the HAdV-B14p1 viruses may have a common ancestry, they have diverged from one another recently.
Between the two genomes of HAdV-B14p1, and using BJ430 as the reference, there are eleven base substitutions, four single base insertions and two deletions, with one involving a single base (A) and the other involving TT. On the other hand, strain BJ430 differs from HAdV-B14p by 94 base substitutions and, relative to strain BJ430, there are five insertions (T, AAA, A, T, AGAAAA) and six deletions (GTG, T, TT, A, A, AA). Of these, the six-nucleotide deletion results in the deletion of amino acids 251 and 252 (Lys and Glu) that are located in the fiber knob near the recognized putative receptor-binding site [21]. With regard to the fiber gene, the sequence from strain BJ430 was identical to that of strain 303600. For the hexon gene, there was one single base substitution (G to A), which resulted in a synonymous mutation.
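The kind of tally reported above (substitutions, insertions, deletions relative to a reference) can be sketched for a pairwise alignment as follows. This is not the unpublished comparison software mentioned in the methods; the function name and the convention used here (an insertion is a run of bases present in the compared genome but absent from the reference, a deletion is the opposite) are assumptions for illustration.

```python
from itertools import groupby

def tally_differences(ref_aln, other_aln, gap="-"):
    """Count substitutions and indel events in a pairwise alignment,
    relative to the reference. Consecutive gap columns of the same kind
    are merged into one event and reported with their bases."""
    def state(pair):
        r, o = pair
        if r == gap and o == gap:
            return "skip"
        if r == gap:
            return "insertion"   # bases present only in the other genome
        if o == gap:
            return "deletion"    # bases present only in the reference
        return "match" if r == o else "substitution"

    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    indels = {"insertion": [], "deletion": []}
    pairs = zip(ref_aln.upper(), other_aln.upper())
    for kind, run in groupby(pairs, key=state):
        run = list(run)
        if kind == "substitution":
            counts[kind] += len(run)          # counted per base
        elif kind in indels:
            counts[kind] += 1                 # counted per event
            bases = (o if kind == "insertion" else r for r, o in run)
            indels[kind].append("".join(bases))
    return counts, indels

# Toy alignment with one substitution, one 2-base insertion, one deletion.
counts, indels = tally_differences("ACGT--ACGTAC", "ACCTGGACG-AC")
```

Under this convention, an in-frame event such as the six-base difference in the fiber gene would appear as a single entry of length six, corresponding to two codons.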

Computational analysis of proteins
Presented in Table 3 are the percent identities of select proteins spanning the genome. Proteins from all prototype genomes from all HAdV species are represented, with one from each but including all subspecies B1 and B2 prototypes. One chimpanzee adenovirus (SAdV-B21) is included, as it segregates into the subspecies B1 subclade.

Phylogenetic analyses
A whole genome phylogenetic analysis of strain BJ430 was carried out in the context of all of the species B genomes along with a representative genome from each of the other six HAdV species (Fig. 2). This provides an understanding of the sequence relationships of this respiratory pathogen to other pathogens. HAdV-B14p1 strain BJ430 forms a subclade with other members of subspecies B2. This subclade branches with a clade of all subspecies B1 genomes, and both form a clade that is distinct from the clades representing the other species. With bootstrap values greater than 80, there is high confidence in these phylogenetic relationships.
Three genes were also analyzed using phylogenetics (Fig. 2). These three genes (E1A, hexon, and fiber) were selected based on the GenBank availability of their sequences from viruses isolated in the past and associated with respiratory disease. Unfortunately, whole genome data were not available for these viruses. Given the expense of DNA sequencing in the past, this was expected. Note: “HAdV-B11a” is the old name given to a respiratory pathogen that has since been re-named “HAdV-B55” based on its genome data and pathogenicity profile [22]. It is noted here as the serotype-based name “HAdV-B11a” to be consistent with the GenBank entries. E1A, hexon and fiber genes from both strains of HAdV-B14p1, and also from another recent HAdV-B14p1 isolate (Ireland), form subclades with their counterparts from HAdV-B14p. No whole genome data for the Ireland HAdV-B14p1 have been published.
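The bootstrap values mentioned above come from repeatedly resampling alignment columns and rebuilding the tree; the support for a clade is the percentage of replicates in which it reappears. The sketch below shows only the resampling step on a toy alignment, with a simple proportion-of-differences distance standing in for the maximum-composite-likelihood distance. The sequence names and toy data are placeholders, and tree construction (done with MEGA in this study) is not reproduced.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap pseudo-replicate by sampling alignment
    columns with replacement. `alignment` maps names to equal-length
    aligned sequences."""
    n_cols = len(next(iter(alignment.values())))
    if any(len(seq) != n_cols for seq in alignment.values()):
        raise ValueError("all aligned sequences must have equal length")
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}

def p_distance(a, b):
    """Proportion of sites that differ (a simple stand-in distance)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Toy alignment; names are placeholders, sequences are invented.
toy = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAC",
    "strain_C": "ACGAACGTTC",
}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
```

Each replicate would then be passed to a distance and neighbor-joining step, and the fraction of the 1,000 resulting trees containing a given clade gives its bootstrap value.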

Discussion
In China, adenovirus is not recognized presently as a statutory infectious disease agent; therefore, reports of these viruses have not been incorporated into the National Infectious Disease Agent Monitoring System. Recently, adenovirus-linked respiratory infections are increasingly being reported in China, with multiple HAdV types identified as causative agents: HAdV-C1, -C2, -E4, -C6, -B3, -B7, -B11, -B14, and -B55 [9,14,22-25]. Presumably the “HAdV-B11” is a mischaracterization due to an incomplete assessment of the genome, that is, the fiber gene that may identify it as a HAdV-B55 virus was not assessed [22]. It should be noted that 'true' adenovirus type 11 viruses are renal tract pathogens, i.e., HAdV-B11p, -B11b, and -B11c. Whether this increase in adenovirus-linked respiratory diseases is due to better clinical detection and reporting or due to social events, e.g., global travel and interactions, is a matter of great concern to the public health of China. Improved detection, identification, and high resolution analysis of the genomes of these highly contagious emergent and re-emergent pathogens will enable a better understanding of these viruses, and will provide a basis for enacting measures to prevent or limit their effect on the population. Here we report the details of the analysis of a HAdV-B14p1 genome recovered from a respiratory infection in a six-month-old baby from Beijing. It is the first bioinformatics report of a HAdV-B14 virus in China. This pathogenic agent was confirmed using virus isolation and with both molecular detection and serology.
HAdV-B14p1 has re-emerged after an approximately 50-year absence globally; it was linked to several ARD outbreaks in the Americas (USA), Europe (Ireland), and Asia (China), with reports of virus isolation and identification from 2007-2010. This virus is of great concern as it is highly contagious, with morbidity and mortality reported in the USA and Ireland outbreaks. In China, to date, the only reported cases have been sporadic and limited to single or a few patients. Further, all of the reported cases in China presented with mild to moderate clinical symptoms, without fatalities. Access to the high resolution data embedded in the genomes provides the tools for understanding the evolution of the pathogen as well as insights into origins of new strains, dissemination, and epidemiology. The nucleic acid alignment of the HAdV-B14p1 genomes, along with three genes (hexon, fiber, and E1A) from these three widely separated global regions, showed high levels of identity, suggesting a common ancestry. Sequence analysis shows a small number of genome changes between the recent China and USA strains of HAdV-B14p1, with nearly identical differences in both that are divergent from the genome sequenced from the prototype virus isolated in The Netherlands in 1955. These differences include a two amino acid deletion in the fiber gene, which suggests a shared lineage of these two HAdV-B14p1 viruses. Phylogenetic analysis of the three genes supports the partition of HAdVs into species, originally based on biological, epidemiological, and structural attributes. Of particular note are the subclades of subspecies B1 and B2, which are closely related viruses. Of these, HAdV-B14 and -B55 are recent re-emergent respiratory pathogens; HAdV-B3, -B7, and -B16 are long-standing respiratory pathogens.
As HAdV-B55 is a recombinant, involving a renal pathogen and a respiratory pathogen, and as species D HAdVs comprise prototypes of recombinogenic origins [26], clinicians and epidemiologists should factor this into surveillance and diagnostic protocols.
Phylogenetic analysis with available counterpart genes from the recent Europe (Ireland) isolate suggests it too is of the same lineage. This is particularly interesting, given no evidence of an epidemiologic connection for the strains from these different and widely separated countries. Regarding strain BJ430, the patient's parents are temporary workers in Beijing, originally from Nanchang (Jiangxi Province); they have no history of travel abroad and have limited if any interactions with foreigners. An epidemiological investigation was not able to determine the origins of the child's HAdV-B14p1 viral infection.
Presumably HAdV-B14p1 is transmitted from person to person, with the virus evolving and “adapting” in each round of replication [27]. New strains of HAdVs are problematic if there is no herd immunity present to counter that specific virus. Of additional concern is whether it may recombine with another HAdV to evolve into a more deadly or more contagious pathogen. An example of this is HAdV-B55, which has emerged and has been reported in China recently. The isolate QS-DLL was originally reported as “HAdV-B11a”, based on its hexon and fiber genes, but is recognized as a new genome type HAdV-B55 due to its whole genome identity to HAdV-B14 as well as its pathogenicity profile as a respiratory rather than a renal tract pathogen. HAdV-B55 is a recombinant adenovirus comprising a large portion of the HAdV-B14 genome and a much smaller portion of the HAdV-B11 genome. It is also an emergent pathogen in China as HAdV-B55 was identified as the virus responsible for a large respiratory disease outbreak in 2006, which included one fatality, and has been found subsequently to cause respiratory disease outbreaks in several provinces. It is a serious public health issue in both military and civilian populations nationally in China (unpublished). HAdV-B14p1 is not a recombinant genome as its genome is nearly identical to that of the prototype, which was examined for recombination events [28]. This recombination analysis was repeated and confirmed for this study, given the influx of additional genomes into GenBank. The phylogenetic survey and protein percent identity analysis support this view as well. These type 14 genome data will enable the identification and further understanding of any future emergent recombinant HAdV resulting from adenovirus type 14.
To determine the source of these infections and to assess if there is nosocomial transmission, archived samples were examined under the auspices of an annual surveillance program for respiratory tract diseases at the Beijing Children's Hospital. The data set was collected from in-house patients during 2010. This retrospective study identified seven samples positive for HAdV (two were HAdV-C2; one was HAdV-C5; two were HAdV-B3; one was HAdV-B7; and one was HAdV-B14p1, described here). In addition, 20 samples were positive for RSV; five for parainfluenza viruses (four were type 3 and one was type 1); and eight were positive for rhinovirus. These included co-infections noted by PCR assays (unpublished data). The six-month-old infant presented in this report showed no co-infection with other respiratory viruses, including other types of adenovirus; therefore, there is no evidence of nosocomial transmission of this re-emergent adenovirus type 14. Both epidemiological and virus surveillance of this previously unknown respiratory disease pathogen should be enhanced to provide more data on its impact on the population and a foundation for strategies to limit that impact.
GenBank deposition. The genome and gene nucleotide sequences, along with the annotations of “Human adenovirus B human/CHN/BJ430/2010/14[P14H14F14]”, in the naming format preferred by the National Center for Biotechnology Information (NCBI) [29], were deposited in GenBank. These are available under the following accession numbers: JN032132 (genome); JF420883 (hexon); JF420882 (fiber); and JF438997 (E1A).
Figure 2. Phylogenetic analysis. E1A, fiber and hexon genes, as well as whole genome sequences of HAdV, are analyzed with respect to their phylogenetic relationships. Genes from the three recent HAdV-B14p1 strains are closely related to each other and to the prototype HAdV-B14p genome. It is remarkable that HAdV-B14p1 has a high level of sequence similarity to the prototype genome after approximately 50 years. Phylogenetic analysis was performed using the software MEGA v4.0 (Molecular Evolutionary Genetics Analysis software; http://www.megasoftware.net), specifically applying a maximum-composite-likelihood method that generated neighbor-joining and bootstrapped trees of phylogeny with 1,000 replicates; all other parameters were set by default. doi:10.1371/journal.pone.0060345.g002