Species Based Synonymous Codon Usage in Fusion Protein Gene of Newcastle Disease Virus

Newcastle disease is highly pathogenic to poultry and many other avian species. However, the Newcastle disease virus (NDV) has also been reported from many non-avian species. The NDV fusion protein (F) is a major determinant of its pathogenicity and virulence. The functions of the F gene have been explored for the development of vaccines and diagnostics against NDV. Although the F protein is well studied, the codon usage and nucleotide composition of the F gene in NDV isolated from different species have not yet been explored. In the present study, we analyzed the factors that determine codon usage in NDV isolated from four major avian host species. The F gene of NDV was analyzed for its base composition and its correlation with codon usage bias. Our results showed that random mutational pressure is responsible for codon usage bias in the F protein of NDV isolates. Aromaticity, GC3s, and aliphatic index were not found to be responsible for species-based synonymous codon usage bias in the F gene of NDV. Moreover, the low degree of codon usage bias and the low expression level were further confirmed by a low CAI value. The phylogenetic analysis of the isolates corroborated the relatedness of species based on codon usage bias. The relationship between the host species and the NDV isolates from the host did not show a significant correlation in our study. The present study provides a basic understanding of the mechanism involved in codon usage among species.


Introduction
Newcastle disease virus (NDV) has been isolated from various avian species around the world. Newcastle disease can result in severe economic losses to the poultry industry worldwide. NDV belongs to the genus Avulavirus in the family Paramyxoviridae [1]. NDV has also been isolated from several non-avian species [2]. Complete and partial genome sequences of NDV isolated from different species are regularly reported from different parts of the world. The NDV genome encodes six proteins: a nucleoprotein (N), a phosphoprotein (P), a matrix protein (M), a fusion protein (F), an attachment protein called the hemagglutinin-neuraminidase (HN), and a large polymerase protein (L), arranged in the order 3'-N-P-M-F-HN-L-5'. The envelope of NDV contains two surface glycoproteins, the HN and the F protein. Various studies have shown that the amino acid sequence at the F protein cleavage site is a key determinant of NDV virulence [3,4,5,6]. Moreover, cleavage of the F protein in a wide range of host tissues is responsible for the systemic spread of NDV and also for its virulence [5]. The F protein, being a surface glycoprotein, is present on the NDV envelope and mediates its fusion with the host cell membrane. Furthermore, the F protein is assisted in its function by the HN protein, and productive infection by NDV requires cleavage of the F protein precursor F0 (553 amino acids) into two subunits, F1 and F2 [7]. The amino acid sequence of the cleavage site determines cleavage specificity and varies with the type of strain [8,9]. The F protein cleavage site of less virulent strains of NDV consists of monobasic or dibasic amino acid residues [9]. Because only one or two basic amino acids are present, the F protein of these strains is not cleaved by intracellular proteases; extracellular proteases are therefore required to cleave it, which limits the tropism of NDV to the respiratory and enteric tracts.
In most cells, the polybasic amino acids of the F protein in velogenic strains act as the cleavage recognition site for furin-like proteases [8,9].
It has been observed that F proteins of virulent NDV strains contain lysine (K) and arginine (R) at their cleavage site (residues 112 to 116, R-R-Q-R/K-R) and a phenylalanine at position 117 of F1. This site is recognized by the intracellular protease furin, which cleaves the polybasic cleavage site to form the F1 subunit, which has been suggested to contribute to neurological effects [9,10,11]. It was postulated that proper binding of the furin protease, assisted by the presence of basic amino acids at the F protein cleavage site, leads to cleavage and thus alters host-cell enzyme activity [11]. The variation in intracellular cleavage of the virulent NDV F protein is observed to be dictated by the presence of arginine at positions 113, 115 and 116 [12]. Substitution of the neutral amino acid glutamine at position 114 with an acidic or basic amino acid would attenuate NDV [4]. Furthermore, attenuation of NDV by a valine-to-isoleucine substitution at position 118, near the fusion cleavage site, was also reported [4]. In another study, it was shown that mutation at a glycosylation site of the F protein may enhance the virulence and pathogenicity of NDV [13]. It is also evident that mutation in the cytoplasmic domain of the F protein can lead to the production of a hyperfusogenic virus that could ensure increased viral replication and pathogenesis in chickens [14]. Subtilisin-like mammalian proteases, e.g., PC6 and PACE4, are reported as candidates for the cleavage of the F protein [15]. The F protein mediates virus penetration by inducing fusion between the viral envelope and the host cell plasma membrane [3,5,12,16,17]. Various other factors also account for the virulence of NDV [12]. In chickens and in infected macrophages, the F protein is a determinant of NDV virulence [6,18].
We have shown that the gradients of NDV virulence are multigenic, that the F protein is a major player in NDV virulence and pathogenicity, and that F is superior to HN as an antigen for better and sterile immunity against NDV infection [19]. Based on our current understanding, the F glycoprotein is the most suitable protein for investigating the infectious capability of NDV strains. Based on pathogenicity studies, NDV is categorized into three major pathotypes: lentogenic (low virulence), mesogenic (moderate virulence) and velogenic (high virulence) [20].
Synonymous codon usage bias is the non-random selection among codons that encode the same amino acid, and its extent differs between genes [21,22,23,24]. Synonymous codons are not used randomly, as some codons are used more frequently than others [25]. Factors that may dictate synonymous codon usage bias include natural selection, mutational pressure, translational efficiency and compositional constraints of the mammalian genome [26,27,28]. Many studies have shown that codon usage bias patterns contribute to understanding virus evolution [27,29]. Although the factors responsible for the pathogenicity of NDV due to the F glycoprotein have been studied, the non-random synonymous codon usage variation in NDV isolates from different species has not been reported. A comprehensive analysis of the codon usage bias patterns of NDV isolates from different species may be necessary to understand the role of codon usage in virus evolution while crossing the species barrier. This analysis may pave the way for a future understanding of selection pressure due to host metabolome interaction and help decipher the evolutionary trend of the virus among different species.

Gene sequences
Two hundred and one complete F gene sequences of NDV isolates from four major host species were obtained from GenBank (Table 1). The four major avian species (chicken, duck, pigeon, and goose) were selected based on the availability in GenBank of more than ten complete open reading frames (ORFs) of the NDV F gene from each species. The NDV strains collected for the analysis came from all three major pathotypes, namely lentogenic, mesogenic, and velogenic. The codon usage pattern analysis was performed on the coding sequence of the F gene.

Codon usage analysis
The patterns of codon usage were analyzed for the two hundred and one F gene sequences of NDV isolated from four major host species. The relative synonymous codon usage (RSCU) value of each codon of the F gene was calculated using Codon W 1.4.4 software. The RSCU index characterizes the synonymous usage of codons and is expressed as the ratio of the observed usage of a codon to the usage expected if all synonymous codons for that amino acid were used equally. An RSCU value of 1 indicates that the codon is chosen randomly and evenly, RSCU > 1 indicates that the codon is used more frequently than expected, and RSCU < 1 indicates that the codon is used less frequently than expected [30].
RSCU calculation formula:
$$\mathrm{RSCU} = \frac{g_{ij}}{\sum_{i}^{n_j} g_{ij}} \, n_j$$
where $g_{ij}$ is the observed number of the $i$th codon for the $j$th amino acid, which has $n_j$ kinds of synonymous codons.
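As an illustrative sketch only (the study itself used Codon W 1.4.4), the RSCU definition above can be computed for a single coding sequence as follows. The function name `rscu`, the compact encoding of the standard genetic code, and the toy sequence are assumptions made for this example and are not taken from the original analysis.

```python
from collections import Counter, defaultdict

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def rscu(seq):
    """Return {codon: RSCU} for every amino acid that occurs in seq
    and has more than one synonymous codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if len(codons) == 1 or total == 0:
            continue  # Met, Trp, or amino acid absent from the sequence
        for c in codons:
            # observed count * n_j / total count for the amino acid
            values[c] = counts[c] * len(codons) / total
    return values

# Hypothetical toy ORF: alanine encoded three times by GCT and once by GCC.
example = rscu("GCTGCTGCTGCC")
```

In this toy case GCT is used three times as often as expected under equal usage, GCC exactly as expected, and GCA and GCG not at all.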

Effective number of codons
The codon usage bias of the ORF of a gene is quantified by the effective number of codons (ENC or Nc). The Nc best estimates the absolute synonymous codon usage bias in a gene. The expected ENC for a gene whose codon usage is constrained only by its GC3 composition is calculated as
$$\mathrm{ENC} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$
where $s$ represents the value of G+C at the third codon position (GC3) [31]. Nc was calculated using the Codon W 1.4.4 program. The Nc value was correlated with the percentage of GC3s. In the case of extremely biased codon usage, only one codon is used for each amino acid and the Nc equals 20. If the Nc value is 61, there is no bias in codon usage and all synonymous codons are used equally.
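A minimal sketch of the expected-ENC curve, assuming the formula above is the standard GC3s-based expectation, ENC = 2 + s + 29 / (s² + (1 − s)²). The function name `expected_enc` is hypothetical and is not part of Codon W.

```python
def expected_enc(gc3s):
    """Expected effective number of codons when codon choice is
    constrained only by GC content at the third codon position.

    gc3s is a fraction between 0 and 1, not a percentage.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)

# The curve peaks at s = 0.5 and falls off towards extreme GC3s values.
curve = [(s / 10, expected_enc(s / 10)) for s in range(11)]
```

Plotting an observed Nc value against this curve at the same GC3s shows whether a gene lies on or below the expectation.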

Codon adaptation index (CAI)
The CAI requires a reference set of known highly expressed genes and enables estimation of the amount of bias; it was calculated using the Codon W 1.4.4 program. A high CAI value indicates higher codon usage bias and a higher expression level [32,33,34]. The CAI is defined as the geometric mean of the relative adaptiveness values of the codons in a gene. Non-synonymous codons and termination codons (dependent on the genetic code) are excluded from the calculation. CAI values range from 0 to 1, with higher values indicating a higher proportion of the most abundant codons [35].
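The geometric-mean definition above can be sketched as follows. This is an illustration under stated assumptions, not the Codon W implementation: the function name `cai` is hypothetical, the relative adaptiveness table `toy_weights` is invented for the example rather than derived from a real reference set, and all supplied weights are assumed to be positive.

```python
import math

def cai(seq, weights):
    """Geometric mean of relative adaptiveness values over the codons
    of seq. Codons missing from `weights` (for example Met, Trp and
    stop codons) are skipped."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    logs = []
    for i in range(0, usable, 3):
        w = weights.get(seq[i:i + 3])
        if w is not None:
            logs.append(math.log(w))
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Invented weights: each value is codon frequency / frequency of the
# most used synonymous codon in a (hypothetical) reference gene set.
toy_weights = {"GCT": 1.0, "GCC": 0.5, "AAA": 1.0, "AAG": 0.25}

# ATG is not in the table and is ignored; four codons are scored.
toy_cai = cai("ATGGCTGCCAAAAAG", toy_weights)
```

Working in log space avoids numerical underflow when many small weights are multiplied for a long gene.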

Chemical property analysis of amino acids using various indices
Aliphatic Index
The aliphatic index (AI) refers to the relative volume of a protein that is occupied by aliphatic side chains (alanine, isoleucine, leucine and valine) and contributes to the increased thermostability observed for globular proteins. The AI of a protein is calculated according to the following formula [36]:
$$\mathrm{AI} = X(A) + a \cdot X(V) + b \cdot \left[ X(I) + X(L) \right]$$
Here, X(A), X(V), X(I), and X(L) are the mole percent (100 × mole fraction) of alanine, valine, isoleucine, and leucine, respectively. The coefficients a and b are the relative volumes of valine side chains (a = 2.9) and of Leu/Ile side chains (b = 3.9) relative to that of alanine side chains.
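A short sketch of the aliphatic index calculation, using the coefficients a = 2.9 and b = 3.9 given above and assuming the usual form AI = X(A) + a·X(V) + b·[X(I) + X(L)]. The function name `aliphatic_index` and the toy peptide are hypothetical.

```python
def aliphatic_index(protein, a=2.9, b=3.9):
    """Aliphatic index of a protein given as one-letter amino acid codes."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    n = len(protein)

    def mole_percent(residue):
        # 100 x mole fraction of the residue in the sequence
        return 100.0 * protein.count(residue) / n

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))

# Toy peptide with one residue each of Ala, Val, Ile and Leu (25% each).
toy_ai = aliphatic_index("AVIL")
```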

Grand average of hydropathy (GRAVY)
GRAVY is calculated as the sum of the hydropathy values of all amino acids in the sequence divided by the number of residues [37].
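The GRAVY calculation can be sketched as below. The Kyte-Doolittle hydropathy scale is assumed here because it is the scale conventionally used for GRAVY; the source does not state which scale was applied, and the function name `gravy` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(protein):
    """Mean hydropathy over the standard residues of a protein."""
    scores = [KYTE_DOOLITTLE[r] for r in protein.upper() if r in KYTE_DOOLITTLE]
    if not scores:
        raise ValueError("no standard amino acids in sequence")
    return sum(scores) / len(scores)

toy_gravy = gravy("AI")  # (1.8 + 4.5) / 2
```

A positive value indicates an overall hydrophobic protein and a negative value an overall hydrophilic one.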

Correspondence analysis
Principal component analysis (PCA) was performed using the software XLSTAT version 2013.5.02. The PCA provides information regarding the major trends in the codon usage patterns measured from RSCU values, which are calculated for 59 codons (excluding those for methionine and tryptophan and all termination codons) [38]. Correlation analysis was performed for the first two axes of the PCA (PC1 and PC2). Pearson correlation analysis was performed to infer the relationships between the two axes of the PCA and variables such as GRAVY, aromaticity index, aliphatic index and GC3s.

Major avian-host species of NDV
The phylogenetic relationship of the four major avian host species of NDV was studied using the MEGA6 software. Mitochondrial DNA is an important data source for building phylogenies [39]. An advantage of the mitochondrial genome over nuclear genes is that it is unlikely to have undergone many intraspecific recombination events [40]. In order to infer the phylogenetic relationship of the major avian host species the mitochondrial genome reference sequence for the four major avian host species

NDV strains
The phylogenetic relationship between the NDV strains was studied using MEGA6 software. The 201 complete F gene sequences were obtained from GenBank (Table 1). For ease of understanding, the sequences were labeled as accession no./virulence/host species. In the labels, L, M, V, CH, DK, GE and PN stand for lentogenic, mesogenic, velogenic, chicken, duck, goose and pigeon, respectively. The neighbor-joining method was used with parameters including pairwise deletion, 1000 replicates for bootstrap analysis and the Jukes-Cantor substitution model.
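To make the distance step concrete, the sketch below computes a Jukes-Cantor distance, d = −(3/4) ln(1 − 4p/3), between two aligned sequences with pairwise deletion of gapped or ambiguous sites. It is only an illustration of the model named above; the tree in this study was built in MEGA6, and the function name `jukes_cantor_distance` and the example sequences are hypothetical.

```python
import math

def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are
    dropped for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy alignment: one difference in four comparable sites (p = 0.25).
toy_distance = jukes_cantor_distance("AAAA", "AAAT")
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm uses to build the tree.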

Codon usage bias of fusion (F) gene in four major host species
The Nc value indicates the degree of bias in codon usage. The four major host species showed a range of Nc values. For chicken, the maximum and minimum Nc values were 60 and 52.3, respectively. For duck, the maximum and minimum Nc values were 59.3 and 54.4, respectively. For pigeon, the maximum and minimum Nc values were 60.5 and 53.8, respectively. For goose, the maximum and minimum Nc values were 56.8 and 53.1, respectively (Table 1). The mean Nc was highest for duck and lowest for goose. GC3 is the amount of G + C at the third codon position, whereas GC is the total amount of G + C. Pigeon showed the maximum value of mean GC3s, while the minimum value of mean GC3s was calculated for chicken. Similarly, the maximum value of mean GC was calculated for duck, while chicken showed the minimum value (Figure 1). Only slight variation in the values of mean GC3s, GC and Nc was observed between duck and pigeon. The mean GC was greater than the mean GC3s for all four species.
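The two composition measures defined above can be sketched as follows. GC3s is taken here to be the G + C fraction at the third position of synonymous codons only, so the codons for Met and Trp and the stop codons are excluded; that reading, the function names, and the toy sequence are assumptions for illustration.

```python
# Codons with no synonymous alternative, plus the three stop codons.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def gc_content(seq):
    """Overall G + C fraction of a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

def gc3s(seq):
    """G + C fraction at the third position of synonymous codons."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    thirds = [
        seq[i + 2]
        for i in range(0, usable, 3)
        if seq[i:i + 3] not in EXCLUDED
    ]
    if not thirds:
        raise ValueError("no synonymous codons in sequence")
    return sum(base in "GC" for base in thirds) / len(thirds)

# Toy ORF: ATG is skipped; GCC, GCT and AAA contribute C, T and A.
toy_gc = gc_content("ATGGCCGCTAAA")
toy_gc3s = gc3s("ATGGCCGCTAAA")
```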

Species specific identification of optimal codons
Analysis of codons based on RSCU values showed 21 optimal codons for 19 different amino acids that are preferentially used in the F gene of NDV (Table 2). The species-specific codon usage was compared against this set of optimal codons, and the frequency of optimal codon usage (FOP) was calculated for all species [41]. Analysis within species showed the greatest similarity to the optimal codon usage for chicken, followed by goose and duck, and the least for pigeon isolates (Table 3).

Nc plot
The Nc value was plotted against the corresponding GC3s. Genes whose codon selection is constrained only by GC composition are expected to lie on the continuous curve, which represents random codon usage. A gene lying below the curve indicates the influence of mutational bias and translational selection. All the points were found to lie just below the curve (Figure 2). Furthermore, a significant positive correlation between the values of GC3s and Nc was observed.
There was almost no difference between the CAI values of the isolates. The average CAI was 0.174 (p < 0.05), which is low given that CAI ranges from 0 to 1.
Principal component analysis using Pearson correlation was performed to evaluate the relationship between the first two axes of the PCA and GRAVY, aromaticity, GC3s and aliphatic index (Table 4). GRAVY, aromaticity, and GC3s were found to have no correlation with either axis. To represent the variation in the position of each codon, a scatter plot of the optimal codon usage was drawn for PC1 against PC2 (Figure 3). PC1 accounts for 89.10% and PC2 for 7.45% of the total variation. Thus, the first axis accounts for most of the total variation in synonymous codon usage, with a smaller but appreciable contribution from the second axis. The correlation between the isolates from the four species is represented in the plot of isolates against PC1 and PC2 (Figure 4).

Phylogenetic analysis
The phylogenetic tree of the major avian host species of NDV represents the relationship between them. It was observed that Anas platyrhynchos (duck) and Gallus gallus (chicken) are ancestrally more closely related, whereas Anser anser (goose) and Columba livia (pigeon) bear a closer ancestral relationship (Figure 5). The phylogenetic tree of the 201 NDV isolates represents relationships that can be categorized on two bases (Figure 6). The first is the species from which the strain was isolated, and the second is the virulence shown in that species based on the F protein cleavage site. On the basis of species, all the pigeon isolates were seen to lie in region 2, except one pigeon isolate (FJ986192) lying in region 3 (Figure 6). Regions 1 and 5 consisted of isolates from chicken, duck and goose, whereas region 4 consisted only of isolates from duck. It is evident from the phylogenetic tree that the isolates from pigeon clearly lie apart and are not mixed with others, as is seen for chicken, duck and goose. A group of isolates from duck in region 4 also lies apart from the rest. On the basis of virulence, the phylogenetic tree shows a similar trend: all the isolates from duck lying in region 4 are lentogenic. Most of the mesogenic isolates from chicken lie in region 3. Very few reported strains from pigeon are lentogenic or mesogenic, whereas most are velogenic and group together in region 2. Region 1 comprises most of the velogenic isolates of chicken, goose and duck.

Discussion
Newcastle disease is one of the most important diseases of poultry and is endemic in many parts of the world. Occasionally the virus has also been reported from many different animal species. The F protein determines the extent of infectivity of NDV [3,5]. Although the major players in NDV virulence and pathogenicity are F and HN, its gradients are largely multigenic.
F being the major antigenic determinant in NDV changed our view of HN as the major protective antigen [19]. The interaction between the host cell membrane and the fusion protein may depend on the species that is infected with NDV. It has been shown that mutational pressure plays an important role in codon usage bias in NDV [42]. The codon bias in the F protein may vary among the species that are infected with NDV; thus it is important to address codon bias in the F protein of NDV. Although NDV has been isolated from many other avian as well as non-avian hosts, their meager number in GenBank makes the data insufficient to consider for the present study. Four major avian species, namely chicken, duck, pigeon and goose, were chosen for the present study because most NDV strains are isolated from these species. Although only a few NDV sequences are reported from goose, the 201 GenBank entries included in the study cover the major isolates from the four avian species. There is an obvious difference in Nc values among the isolates, with a lower Nc representing greater bias in codon usage. Amongst the four major host species, the bias is greatest for isolates from goose and least for isolates from duck.
Figure 6. Phylogenetic tree illustrating the relationship among the 201 Newcastle disease virus (NDV) strains (labelled as accession no./pathotype/species), built with the neighbour-joining method using MEGA6 software. Parameters include pairwise deletion, 1000 replicates for bootstrap analysis and the Jukes-Cantor substitution model; the rate variation among sites was modelled with a gamma distribution (shape parameter = 5). L, M and V stand for lentogenic, mesogenic and velogenic strains. CH, PN, DK and GE stand for chicken, pigeon, duck and goose, respectively. doi:10.1371/journal.pone.0114754.g006
Although there is clear variation in codon usage among the isolates from the four species, similarities among isolates can still be observed. A plot of Nc values against GC3s effectively demonstrates this heterogeneity [31].
The results suggest that the GC mutational bias is greatest for isolates from goose and least for isolates from duck. The low CAI value obtained for all the isolates from the four species suggests low codon usage bias and a low expression level. The frequencies of aromatic and aliphatic amino acids were found to have no association with the variation in codon usage in the F gene. Our results showed that the isolates from the four species can be grouped according to their minimum positional variation. The isolates from chicken, goose and duck have the least variation and can be considered closely related in terms of codon usage. In contrast, the marked variation among isolates from pigeon suggests that they are distant in terms of codon usage. This is consistent with the observation that the greater the distance between isolates, the greater the variation in codon usage. It is also evident from the phylogenetic tree of NDV that the pigeon isolates clearly lie separately. Moreover, some of the isolates from duck also lie separately. Thus, the phylogenetic tree (Figure 6) and the relatedness between the species based on codon usage bias (Figure 4) clearly complement each other. The relationship between the host species and the NDV isolates from the host did not show a significant correlation in our study. To the best of our understanding, the present work is the most comprehensive codon bias analysis of a viral protein from the species point of view. It would be interesting to statistically investigate the NPL complex of NDV in terms of its codon usage and its role, if any, in virulence.