Epitope-based universal vaccine for Human T-lymphotropic virus-1 (HTLV-1)

Human T-cell leukemia virus type 1 (HTLV-1) was the first oncogenic human retrovirus identified in humans which infects at least 10–15 million people worldwide. Large HTLV-1 endemic areas exist in Southern Japan, the Caribbean, Central and South America, the Middle East, Melanesia, and equatorial regions of Africa. HTLV-1 TAX viral protein is thought to play a critical role in HTLV-1 associated diseases. We have used numerous bio-informatics and immuno-informatics implements comprising sequence and construction tools for the construction of a 3D model and epitope prediction for HTLV-1 Tax viral protein. The conformational linear B-cell and T-cell epitopes for HTLV-1 TAX viral protein have been predicted for their possible collective use as vaccine candidates. Based on in silico investigation two B cell epitopes, KEADDNDHEPQISPGGLEPPSEKHFR and DGTPMISGPCPKDGQPS spanning from 324–349 and 252–268 respectively; and T cell epitopes, LLFGYPVYV, ITWPLLPHV and GLLPFHSTL ranging from 11–19, 163–171 and 233–241 were found most antigenic and immunogenic epitopes. Among different vaccine constructs generated by different combinations of these epitopes our predicted vaccine construct was found to be most antigenic with a score of 0.57. T cell epitopes interacted strongly with HLA-A*0201 suggesting a significant immune response evoked by these epitopes. Molecular docking study also showed a high binding affinity of the vaccine construct for TLR4. The study was carried out to predict antigenic determinants of the Tax protein along with the 3D protein modeling. The study revealed a potential multi epitope vaccine that can raise the desired immune response against HTLV-1 and be useful in developing effective vaccines against Human T-lymphotropic virus.


Introduction
Human T Lymphocyte Virus 1 or HTLV-1 is a type C retrovirus that possesses proteins capable of oncogenesis [1]. Belonging to the Retroviridae family (sub-family: Orthoretrovirinae; genus: Deltaretrovirus), HTLV-1 has been identified to be the first human virus with a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 replication mechanism. A protein known as the HBZ protein (HTLV-1 basic leucine zipper factor) is encoded by the antisense transcript of the proviral genome [27] which upon interaction with the Jun family of AP-1 transcription factors JunB, c-Jun, JunD [45][46][47], cAMP response element binding (CREB) and CREB binding protein (CBP)/p300 regulate both cellular and viral gene transcription [48][49][50] and also plays a vital role in the proliferation of infected T cells [51][52][53].
Although the HTLV1 mediated pathogenesis is the cumulative result of multiple types of proteins, Tax protein has been found to be a key trigger element of a nuance of cellular events like resistance to apoptosis, cell signaling, cell cycle regulation and interference with checkpoint control and inhibition of DNA repair through its transactivational properties [26]. In patients with HTLV-1-Associated Myelopathy/Tropical Spastic Paraparesis (HAM/TSP), Tax protein is expressed at an elevated level regardless of the proviral load and therefore can be considered as an overt marker for HAM/TSP prognosis and a target for the development of therapeutics [1]. Tax protein has also been found to play significant role as a leukemogen and has successfully managed to arrest apoptosis in T-lymphocytes in vitro and cause cancer in transgenic animals [54,55]. Findings have established the fact that in 60% of the leukemic cases, Tax expression is undetectable [56]. This could be due to the fact the high levels of Tax protein in an infected cell make them susceptible to be attacked by Cytotoxic T Cells (CTL) which could be a reason for depleted levels of Tax expression [57].
For more than two and a half decades, multiple studies have been conducted on understanding the pathogenesis of HTLV-1 in order to develop therapeutics and vaccines against the virus. A favourable target for previous approaches to developing HTLV-1 vaccines and therapeutics has been the envelope glycoprotein (Env) since it is important for creating the baseline for viral gene transmission to a non-infected cell through cell mediated fusion [58]. As for a non-structural protein, Tax can also be a promising target for its importance as a determining factor of viral persistence and pathogenesis [1,39,40,55] for playing role as a key transcriptional activator [39,40]. Asymptomatic carriers, if not at an elevated level, have Tax protein expressions in infected cells (Median probability Tax protein expression of an infected cell in an Asymptomatic Carrier is 28% [1]). The goal of the current study is to map small continuous antigenic amino acid sequences in the Tax protein (epitopes) and selection of the most tangible Tax protein epitope candidate for in silico vaccine development against HTLV-1 following the principles of comparative modeling, stereochemical and epitope conservation analysis. In vivo, epitopes are presented to T cells through Human Leukocyte Antigen (HLA) molecules and are also detected by B cell receptors. The identification of epitopes by both T cells and B cells is a forerunner for the development of adaptive immunity against a specific antigen which the current study aims to develop against HTLV-1.
It is natural to consider the envelope glycoprotein of HTLV to be a more favourable candidate for vaccines; firstly because the envelope glycoprotein plays an important role in initiating viral pathogenesis by acting as the anchoring protein between the virus and the receptor and secondly, because previous studies have found highly immunodominant epitopes in the glycoprotein that are likely to evoke the immune cells [59][60][61]. However, factors like conservancy of the glycoprotein as well as nature of the epitopes should also be taken into account. In a previous study conducted on the molecular characterization of HTLV-1 gp46 glycoprotein, minor sequence variations were observed across different geographical regions as well as clinical characterization of the samples [62]. Additionally, the envelope glycoprotein is sensitive to mutations and will render non-expressive and dysfunctional if it undergoes any kind of mutation [63]. In case of epitope mapping, often in protein sequences immunodominant epitopes coincide with neutralizing epitopes, which can lead to faulty and diminished levels of epitope identification by immune cells [64]. A similar scenario was observed for envelope glycoprotein gp46, when Palker et al.'s study found that the central neutralizing region of gp46 (190)(191)(192)(193)(194)(195)(196)(197)(198)(199)(200)(201)(202)(203)(204)(205)(206)(207)(208)(209) and the C-terminal neutralizing epitope (296-312) as Desgranges et al. defined had immunodominant properties [65]. Moreover, a study conducted by Horal et al., highlighted overlapping immunodominant and neutralizing epitopes in the regions including 176-199, 190-212, 224-244, 240-262, and 292-314 [66]. These facts imply the chances of improper identification of epitopes by immune cells and subsequent failure of the vaccine itself to provoke proper immune response. The Tax protein on the other hand is predominantly expressed in infected cells as transcription activators as well as exosomes, even in the early stages, therefore identification of the epitopes by immune cells and building the adaptive immune system against HTLV will be a more agile process [67][68][69].
The approach of this study to designing a Tax protein epitope based vaccine against HTLV-1 in silico might prove to be effective in activating the acquired immune system and stimulating specific humoral and cell mediated immune response against HTLV-1 in healthy individuals; treating both symptomatic and asymptomatic individuals with HTLV-1 infection and in preventing the development of further neurological, pulmonary, ophthalmological, rheumatological and urological co-infections in infected individuals.

Materials and methods
A flow chart describing the overall procedures of construction of a multi-epitope based peptide vaccine for HTLV-1 Tax protein has been illustrated in Fig 1.

Protein sequence retrieval
The amino acid sequence of Trans-activating transcriptional regulatory protein of HTLV-1 (Tax protein) was retrieved from the NCBI Protein database in FASTA format. The NCBI Protein database consists of a collection of sequences retrieved from SwissProt, PIR, PRF, PDB. It also includes translations from elucidated coding regions in GenBank, RefSeq and TPA. Numerous bioinformatics and immunoinformatics tools and databases were used to identify and analyse physicochemical, structural and functional properties of the selected Tax protein sequence and prediction of B-cell and T-cell epitopes.

Primary and secondary structure analysis
The primary structure of the protein was evaluated using ProtParam tool of the Expasy server [70]. ProtParam computes various physical and chemical characteristics of given protein sequence, such as number of amino acid, isoelectric point (pI), molecular weight, instability index, aliphatic index, number of total atoms, grand average of hydropathicity (GRAVY). The secondary structure was analyzed using self-optimized prediction method with alignment (SOPMA) [71] and PSIPRED [72] to access properties like transmembrane helices, globular regions, bend region, random coil and coiled-coil region and to obtain a graphical presentation of the protein sequence of interest (HTLV-1 Tax protein).

Tertiary structure analysis
The three dimensional structure from the amino acid sequence of the selected HTLV-1 Tax protein was predicted and analyzed using Phyre2 and (PS) 2 Servers [73][74][75]. Phyre2 server allows 3D modelling of a provided amino acid sequence using the alignment of hidden Markov models by the help of HHsearch [76], improving accuracy of alignment and detection rate in a pronounced way. Phyre2 also models 3D structures for regions in a query sequence that does not have any detectable homology with recognized structures by incorporating an ab initio folding simulation called Poing [77]. The (PS) 2 homology modelling server operates through a substitution matrix for detecting homologous proteins by combining both sequence and secondary structure information [74,75].

Model refinement
The generated 3D model of HTLV-1 Tax protein was refined at an atomic level based on a reference 3D protein model using high resolution protein structure refinement tool ModRefiner [78] and 3DRefine [79].

Validation of protein structure
Additional evaluation of the 3D structure was done by the help of multiple web tools. Ramachandran plot was generated using PROCHECK [80] to further assess the Phi/Psi angles for better understanding of the protein backbone confirmation. PROCHECK [80] was also used for stereochemical analysis of the predicted protein using a residue-by-residue approach. An additional evaluation of the generated 3D structure was carried out using ERRAT [81] and Verify 3D [82].

B-cell epitope prediction
Linear B cell epitopes were predicted using sequential B-cell epitope predictor BepiPred-2.0 under IEDB (Immune Epitope Database) server [83]. Selection of the predicted B cell epitopes was done based on transmembrane topology (to discriminate between soluble and membrane proteins) and antigenicity for which TMHMM server 2.0 [84] and Vaxijen 2.0 web server [85] were used respectively. Allergenicity and toxigenicity of the selected epitopes were checked using AllerTOP v. 2.0 [86] and ToxinPred [87] respectively.

T-cell epitope prediction
NetCTL 1.2 server was used to predict CTL epitopes for the selected HTLV-1 Tax protein sequence [88]. Binding affinity towards multiple HLA class molecules was assessed using MHC I binding prediction tools of IEDB server [89]. In case of predicting MHC class I binding epitopes, the Stabilized Matrix Base Method (SMM) was used in order to calculate the IC50 values [90]. Epitopes having an IC50 value less than 200nm were selected for the steps following.

Population coverage calculation
Population coverage for the selected epitopes was observed using the IEDB resource for population coverage [91] considering denominated MHC restriction of T cell responses and polymorphic HLA combinations for different regions of the world. The selected epitopes were then scrutinized for their antigenic and immunogenic potentials using Vaxijen 2.0 [83] and immunogenicity analysis resources of the IEDB server [92]. The IEDB analysis resource was also used for assessing epitope conservancy. AllerTOP v. 2.0 [86] and ToxinPred [87] were used for assessing allergenic and toxigenic aptitude of the selected epitopes.

3D epitope structure prediction and molecular docking analysis
To substantiate the interaction between the selected T cell epitopes and HLA molecules, in silico docking was carried out between the molecules using AutoDock Vina [93]. For the process, HLA-A � 0201 was selected due to having a high genetic frequency. In order to stimulate the docking, a crystal structure of HLA-A � 0201 was retrieved from RCSB protein database (Research Collaboratory for Structural Bioinformatics) [94] having the PDB ID 1B0R. PEP-FOLD [95] server was used to predict peptide structures of the selected T cell epitopes based on antigenicity. To carry out docking simulation first of all the peptide ligand (Influenza Matrix Peptide) bound to HLA-A � 0201 was removed by PyMol software package. This removed peptide ligand was then docked with the binding groove of HLA-A � 0201. The binding groove had center box coordinates X: 29.900, Y: 0.614, Z: 49.512 and the dimensions of the grid box were X: 40, Y: 40 and Z: 40 (unit of the dimensions, Å) targeting the active site of the protein.

Construction of multi epitope vaccine, modeling, and validation
The selected B cell and T cell epitopes based on antigenicity and allergenicity were linked in order to create a fusion peptide using GPGPG and AAY peptide linker [96] in multiple combinations. VaxinPad [97] was used to predict a peptide based adjuvant for the amino acid sequences generated using the selected epitopes. Selection of the adjuvant was based on immunomodulatory potential accompanied with properties like hydrophobicity, hydrophilicity, steric hindrance, solvation, hydropathicity and isoelectric point (pI). EAAK rigid peptide linker was used to link the selected adjuvant in the N-terminal end of the multi epitope combinations. The multi epitopes were then assessed for their antigenic and allergenic potencies using Vaxijen 2.0 web server [85] and AllerTOP v. 2.0 [86]. Multi epitope sequences exhibiting the maximum antigenic potential and non allergenicity were selected for tertiary structure evaluation. I-TASSER (Iterative Threading ASSEmbly Refinement) homology modelling tool was used to predict 3D structures [98][99][100] for the selected multi epitope sequences. The final 3D structures were further validated using PROCHECK [80], ERRAT [81] and Verify3D [82]. Upon completion of evaluation, the 3D structures were visualized using Pymol [101].

Disulfide engineering of the 3D multi epitope vaccine constructs
To check if the structurally validated 3D vaccine constructs were accessible to the addition of novel disulfide bonds to provide the protein with increased stability and decreased conformational entropy [102], Disulfide by Design 2 (DbD2) was used for disulfide engineering of the 3D multi epitope vaccine constructs [102].

Codon adaptation and in silico cloning
In order to express multi epitope vaccine construct selected based on antigenicity, peptide validation parameters and binding affinity towards HLA molecules in an Escherichia coli K12 strain codon optimization was done using Java Codon Adaptation Tool (JCat) [103]. The optimized codon sequence was further screened for expression parameters, codon adaptation index (CAI) and percentage of GC-content. For in silico cloning simulation, a bacterial, kanamycin resistant expression vector pETSUMO which consists of a 6x polyhistidine tag was selected [104]. The in silico restriction cloning simulation between the adapted codon sequence and pETSUMO expression vector was carried out using Snapgene software. The use of a pET-SUMO expression vector will facilitate TA cloning as well as provide an easier screening modus operandi for transformed cells. The hexa histidine tag present in pETSUMO expression vector will facilitate swift detection of the recombinant vaccine construct in immunochromatographic assays [105].

Molecular docking of vaccine constructs with TLR4
In order to predict the binding affinity between the Multi Epitope Vaccine Construct and Toll like receptor 4 [TLR-4 (PDB: 4G8A)], a member of the toll like receptor family of protein involved in triggering the innate immunity system [106], molecular docking approaches were implemented computationally. PatchDock (Beta 1.3 Version) docking server [107,108] was adopted to receptor-ligand docking and the generated Protein Data Bank (PDB) file of the protein-peptide docking complex was visualized in BIOVIA Discovery Studio Visualizer v12.1.0.15350.

Sequence retrieval, transmembrane topology and antigenicity features analysis of the HTLV-1 Tax protein sequences
The complete amino acid sequences of HTLV-1 Tax protein were retrieved from NCBI database. Seven amino acid sequences of the protein were retrieved in FASTA format with accession number AYN25353.1, BBA30574.1, BBD74587.1, AYN25375.1, AYN25364.1, AYN25342.1, AYN25331.1 and BAX77785.1. Antigenicity of the protein was confirmed through assessment in VaxiJen v2.0 server and the Tax protein amino acid sequence with accession number BAX77785.1 was found to be most antigenic with a score of 0.4610 at threshold 0.4. Antigenicity is a prerequisite for a protein or amino acid sequence to be a vaccine candidate. The transmembrane topology of the protein was determined by TMHMM Server v. 2.0 and it was found that the protein fulfills the criteria of exo-membrane protein.

Primary and secondary structure analysis
A physico-chemical analysis of the protein primary structure was performed by the Expasy server's ProtParam tool. From the result generated by ProtParam it was found that the 353 amino acids long HTLV-1 tax protein has a molecular weight of 39470.53Da with a high aliphatic index of 87.56%. Having pI of 6.45, less than 7, the protein belongs to negatively charged proteins. Instability index of 48.9 indicates that the protein is unstable in vitro. Interestingly it has a negative grand average hydropathy score accompanied with a high extinction coefficient as summarized in Table 1. Secondary structural features analysed by SOPMA and PSIPRED reveals the abundance of random coil (54.96%) followed by extended strand (21.53%), alpha helix (17.85%) and 5.67% of beta turn. in HTLV-1 Tax

Tertiary structure prediction, refinement and validation
The tertiary structure of a protein is critical to its function and stability. 3D structure of the protein was predicted using the Phyre2 server and PS 2 server. PDB ID 2I46 was selected as a template. The structure was predicted with 75% reliability. A pymol generated 3D structure of

PLOS ONE
Epitope-based universal vaccine for HTLV-1 HTLV-1 Tax protein is displayed in Fig 3a. Major local distortions contained by homology based modeling include irregular H-hydrogen bonding networks, steric clashes and unphysical phi/psi angles which reduce the structure models usefulness for high-resolution functional analysis. Protein model refinement is required to overcome these distortions. Refinement of the predicted 3D structure is required for bringing it closer to its native structure. 3Drefine was used for the structure refinement. The 3Drefine tool utilizes repetitive optimization of hydrogen bonding networks along with energy minimization at atomic-level on the optimized model by utilizing knowledge-based force fields and composite physics for efficient protein structure refinement. It generated 5 refined models with RMSD ranging from 0186 to 0.357. The best refined structure had the highest 3D score of 34961.1 accompanied by least RMSD 0.186.
After refinement the refined model was validated using PROCHECK, ERRAT and VERIFY 3D. Ramachandran plot generated by PROCHECK showed that 77.7 7% of the residues fall within the most favoured region (Fig 3b). Results generated from ERRAT showed an overall quality factor of 39.86 for the best refined structure (Fig 3c) and Verify3D depicts that only 27.20% of the residues in the protein had 3D-1D score equal or above 0.2.

PLOS ONE
Epitope-based universal vaccine for HTLV-1

B cell epitope prediction
Overall antigenicity assessment of HTLV-1 tax protein by VaxiJen server regarded protein as a promising antigen at threshold 0.4. Transmembrane topology analysed by TMHMM predicted that all the residues of the protein are exposed outside the cell membrane. Epitope was predicted by the IEDB server using Kolaskar and Tongaonkar antigenicity and Bepipred linear epitope prediction method. Based on antigenicity test performed by VaxiJen server 7 epitopes ( 324 KEADDNDHEPQISPGGLEPPSEKHFR 349 , 252 DGTPMISGPCPKDGQPS 268 , 114 77 SFPTQRTS 84 ) were found to be antigenic ( Table 2). TMHMM is used to evaluate the transmembrane topology of the antigenic B cell epitopes. EHQITWDPIDGR and SFPTQRTS with antigenicity score 1.3783 and 0.7813 respectively were exposed inside while epitope LSFPDPGLRPQ, PITHTTPNIPPS, PFRNGYMEPTLGQ, DGTPMISGPCPKDGQPS and KEADDNDHEPQISPGGLEPPSEKHFR having antigenicity score of 1.3581, 0.9582, 0.9083, 0.7207 and 0.7170 were exposed outside.
Toxicity of the antigenic epitopes was accomplished by the ToxinPred tool. All the examined epitopes reported to be non-toxic. Hydrophilic epitopes were subjected to assessment for electricity using AllerTopv2.0. KEADDNDHEPQISPGGLEPPSEKHFR and DGTPMISGPCPKDGQPS demonstrated no allergenicity while PFRNGYMEPTLGQ, PITHTTPNIPPS and LSFPDPGLRPQ were predicted as potential allergens as summarized in Table 4. Based on the antigenicity, surface accessibility and allergenicity 2 epitopes KEADDNDHEPQISPGGLEPPSEKHFR and DGTPMISGPCPKDGQPS spaning region 324-349 and 252-268 respectively were finally selected for vaccine model construction.

T cell epitope prediction
NetCTL server and IEDB T cell epitope prediction tools were used to predict T cell epitopes of HTLV-1Tax protein. Among the 94 MHC class I epitopes predicted based on combined score 64 epitopes wear selected with IC50 value less than 200nm. 15 epitopes among them we are found to interact with multiple which were selected for antigenicity assessment at threshold level of 0.4. Epitopes with position 11-19, 151-159, 163-171, 178-186, 233-241, 297-305 and 307-315 were found antigenic (Table 3). LLFEEYTNI, QLGAFLTNV, LLFGYPVYV, ITWPLLPHV, GLLPFHSTL peptide sequence were found immunogenic when analysed with IEDB class I immunogenicity prediction tool. Immunogenic epitopes are listed in Table 4. From the transmembrane topology determination of these immunogenic epitopes by TMHMM server v. 2.0 it was found that all of these epitopes fulfills criteria of exomembrane protein. Toxicity was analysed by ToxinPred which indicated all 5 immunogenic epitopes as non toxic. The Allergenicity test carried out by AllerTop v2.0 depicted LLFGYPVYV, ITWPLLPHV and GLLPFHSTL as non allergen. These epitopes were also reported non toxic by ToxinPred (Table 4).

Population coverage
As the different HLA alleles have different rates of occurrence in different ethnicities in the world, the analysis of individuals coverage by the respective HLA alleles of the predicted Tcell epitopes is an important part of an effective vaccine design. The population coverage analysis showed maximum coverage in Mexico (90.21%) followed by England (89.88%), South Africa (81.56%), North America (76.82%), South America (74.95%) and Japan (70.92%). Very low population coverage of 1.20% was observed in the United Arab of Emirates while it was more than 70% in three major parts of Asia, 70.48%, 71.78% and 74.53% respectively in Southeast Asia, Northeast Asia and East Asia. Cumulative population distribution of HLA alleles for the selected T cell epitopes is pictured by Tableau Public online public software in

Molecular docking simulation of the HLA allele-peptide interaction
Binding energy achieved for the peptide ligand (Influenza Matrix Peptide) was -5.7 Kcal/mol. A higher binding energy was acquired for our predicted epitopes ITWPLLPHV, LLFGYPVYV and GLLPFHSTL as follows -8.8, -8.

Multi-epitope vaccine construction
Three CTL epitopes (LLFGYPVYV, ITWPLLPHV and GLLPFHSTL) and two B cell epitopes (KEADDNDHEPQISPGGLEPPSEKHFR, DGTPMISGPCPKDGQPS). B cell epitopes were linked using GPGPG linkers while CTL epitopes by AAY linkers. A peptide adjuvant PMISWPCPKD was selected based on immunogenicity and high molecular weight using Vax-inPred server. The Adjuvant was added to N termini of the vaccine construct using EAAAK linker. EAAAK linkers were used on both the terminals of the vaccine construct. Antigenicity score of the Multi-epitope vaccine construct was 0.57 as predicted by VaxiJen v2.0 server. A schematic diagram of 109 aa long multi-epitope vaccine is shown in Fig 6.

Structure prediction of multi-epitope vaccine
Primary and secondary structure analysis provides insight about stability of a peptide. Various physicochemical parameters were determined by ExPASY's ProtParam server. A high molecular weight of 11.523kDa is an indication of the antigenic nature of the vaccine construct. Theoretical pH is valuable in determining the buffer system for the vaccine purification process which was computed 5.18 representing the slightly acidic nature of the vaccine. Number of positively and negatively charged residues were found to be 8 and 13 respectively. Assuming all cysteine residues are reduced, the extinction coefficient was 18450M-1cm−1 at 280 nm measured in water. The calculated half life was found 1 h in mammalian reticulocytes (in vitro), while >10 h in E. coli (in vivo) and 30 min in yeast (in vivo). The instability index was 38.16. Having an instability index below 40 classified the vaccine as stable. The aliphatic index was high and assessed to be 65.60 whereas the grand average of hydropathicity (GRAVY) was -0.367. A high aliphatic index supports the thermostability of the designed

PLOS ONE
Epitope-based universal vaccine for HTLV-1 vaccine while a negative GRAVY score represents its hydrophilic nature. Secondary structure of the vaccine was assessed by SOPMA server using 109 aa long sequence. It predicted that 25.69%, 4.59%, 4.59% and 65.14% amino acids are involved in α-helix, extended strand, βturn, and random coil, respectively. A probability score graph of frequency of helix, strand, turn, and coil at each amino acid position in the secondary structure of the final vaccine construct. Tertiary structure of the multi-epitope vaccine prediction by I-tasser which is an ordered approach for the prediction of protein structure and function. It identifies structural templates from the PDB by multiple threading approach LOMETS. Structure generated by the I-tasser is shown in Fig 7.

Disulfide engineering, codon adaptation, and in silico cloning of the vaccine construct
Disulfide engineering was done through Disulfide by Design 2 (DbD2) which predicted a total of 20 pairs of residues for the probable disulfide bonds formation (Table 6) But, after considering the Chi3 value between −87 and +97 and the energy score less than 4 kcal/mol only 4 pairs of residue pairs were selected for making disulfide bonds. JCat server was used for the adaptation of codon usage of the designed vaccine constructs for E. coli K12. An optimized codon sequence of 329 nucleotides was provided by JCat. The percentage of the GC-content of 50.78% and CAI of 1.0 of the optimized codon sequence ensured that the vaccine construct was highly expressed in E. coli K12. The improved DNA sequence was translated into our vaccine protein appropriately and GC-content increased to 58.10% for E. coli K12.
Later on, the adapted codon sequence was cloned into the E. coli pET SUMO vector and E. coli strain DH5 alpha was selected as host. A 5970 bp recombinant vector was obtained after TA cloning of vaccine DNA sequence (Fig 8).

Molecular docking of the multi-epitope vaccine with immune receptor TLR4
The docking of vaccine-receptor was performed using Patchdock server for evaluating the complex formation of the our vaccine construct with an immune receptor such as TLR4 and their binding affinity. The PatchDock server provided 20 docking complexes. Among the complexes, we selected only the docking complex with the highest negative Atomic Contact Energy (ACE) value for analysis. The ACE value of the selected docking complex was -247.59 which

Discussion
HTLV-I is the prime causative agent leading to a disabling inflammatory disease HAM/TSP and an aggressive malignancy named Adult T cell Leukemia. To date no effective vaccine and treatment is available for HTLV-1 infection. Our study is focused on development of an effective vaccine against HTLV-1 Tax protein which plays a major role in viral pathogenesis. While the CTL response unique to Tax is a common occurrence in many HTLV-1-carriers, Tax protein is a major antigenic target for HTLV-1-specific CTLs [57]. Tax protein is the first HTLV-1 protein to be expressed in an infected cell [1]. It was reported as a potential immunotherapeutic target against HAM/TSP and ATL [109]. HBZ is another HTLV-1 protein which also plays an important role in viral infectivity surged with survival and growth of leukemic cells [110].

PLOS ONE
Despite the fact HBZ performs an essential role in the proliferation of HTLV-1-infected cells, it could additionally provide a unique mechanism that lets them evade immune recognition [111]. Antigenicity prediction of Tax and HBZ using VaxiJen server at threshold 0.4 regarded Tax protein as probable antigen with score of 0.461 while HBZ as non antigen with score of 0.3063. These contrasting facts show that Tax is a potential candidate for vaccine design albeit HBZ is a potential drug target. A physico-chemical analysis of the protein sequence was done by the Expasy server's Prot-Param tool. It revealed an instability index of 48.9, which denotes, this protein will be unstable in vitro because a value over 40 is considered unstable [70]. Interestingly this protein was also predicted to have a high aliphatic index; it is the total volume occupied by aliphatic side chains

PLOS ONE
Epitope-based universal vaccine for HTLV-1 and higher value is considered a positive factor for increased thermostability. Along with high extinction coefficient and negative GRAVY, the extents of other parameters imply the stability of the protein.
Results generated by secondary structure analysis tool PSIPRED and SOPMA showed the HTLV-1 Tax protein is adorned with 17.85% alpha helix and 54.96% random coils along with 5.26% extended strands and 21.53% beta turns. The abundance of coiled regions is an indication of higher conservation and stability of the model.
Tertiary structure of a protein determines its function and stability. To date no tertiary structure for HTLV-1 Tax is available in Protein Data Bank. We have generated a 3D structure of Tax protein using Phyre2 server and PS 2 server with 75% reliability. Often 3D structures generated by bioinformatics tools contain significant local distortions, including unphysical phi/psi angles, and steric clashes irregular H-hydrogen bonding networks, which make the structure models unfit for high-resolution analysis of functions. Refinement of the modeled structures can bring up a solution of this problem [78,112]. Best model predicted by 3Drefine was selected after refinement based on having the highest 3D score and least RMSD. After refinement the protein model was validated by ERRAT, Verify3D, PROCHECK. ERRAT interpreted the overall quality of the model with the quality factor 78.313; this score represents the proportion of the protein that falls below the rejection limit of 95%. Verify 3D shows 63.49% of the residues have 3D-1D score above 0.2 whereas a good quality model required 80% of the residues to have 3D-1D> = 0.2. Ramachandran plot, a plot of the dihedral anglesphi (φ) and psi (ψ)-of the amino acids contained in a peptide, generated by PROCHECK illustrates that 75.7% of the amino acid fall within the most favored region and 17.4% in additional allowed region. Analysis based on 118 structures of resolution of minimum 2.0 Angstroms and R-factor less or equal to 20%, a good quality model should have over 90% of its residues in the most favoured regions.
Most B cell epitopes are discontinuous epitopes, epitopes composed of amino acid residues residing in different regions of the protein, which are arranged together by the folding of the protein chain [113,114]. These residue groups cannot be isolated in the same conformation from the antigen [114]. As per concern linear B cell epitopes for HTLV-1 tax are curated for IEDB. Bepipred linear epitope prediction method was used to predict 13 B cell epitopes. Among the predicted epitopes 7 of them exhibited antigenicity. From our data, we see that only 2 epitopes were present inside while other 5 epitopes were found outside. These 5 epitopes, KEADDNDHEPQISPGGLEPPSEKHFR, DGTPMISGPCPKDGQPS, PFRNGY-MEPTLGQ, EHQITWDPIDGR, PITHTTPNIPPS, LSFPDPGLRPQ were assessed for toxicity and allergenicity which led as to select 2 non-allergenic and non-toxic B cell epitopes as potential vaccine candidate.
Once the proteins enter the host antigen-presenting cells (APC) of the, they are processed and then the T cell epitopes are proteolytically cleaved from the protein, and represented by the MHC molecules on the surface of APCs, exposing them to the receptors of T cells [114]. MHC class I molecules represent endogenous antigens such as intracellular bacterial, viral and tumor inducing proteins while MHC class II represent epitopes from the exogenous proteins. HTLV-1 Tax is preferentially produced endogenously in the cytoplasm of a host cell and localized in the cytoplasm with fewer speckle-like dots in the nucleus [115]. Induction of virus-specific CD8+ cytotoxic T lymphocytes (CTL) by MHC class I presented peptide is required for effective viral clearance [116]. MHC classI binding T cell epitopes were by NetCTL server and IEDB server. 94 MHC class I T cell epitopes were selected based on combined score from which 48 epitopes were extracted with IC50 less than 200nm. The lower the IC50 value the higher their binding affinity with the HLA molecules. About 16 epitopes interacted with more than 5 HLA alleles. Epitopes those which interacted with �5 MHC HLA-alleles are most likely to be potential vaccine candidates [117,118]. Antigenicity assessment with VaxiJen server at threshold 0.4 mined 7 epitopes to be antigenic and among them 5 were immunogenic. Epitope conservancy was found 100% for all 5 epitopes as predicted by IEDB server. Toxicity and allergenicity evaluation of the epitopes rendered all selected epitopes to be non-toxic but only 3 of them are non allergenic. Efficiency of a multiepitope vaccine greatly relies on precise interaction between epitopes and HLA alleles. MHC class I alleles having interaction with LLFGYP-VYV, ITWPLLPHV and GLLPFHSTL were searched for population coverage. From our study the highest population coverage was recorded for Mexico (90.21%) followed by England (89.88%) and South Africa which is located in Sub Saharan Africa. South America and North America showed population coverage of 74.95% and 76.82% respectively while in Japan it was slightly lower (70.92%). East Asia, North East Asia and South Asia had population coverage just above 70%. Cameroon, a country of central Africa exhibited 71.39% population coverage while UNited Arab of Emirates was found with the lowest percentage (1.2%) of coverage. The major HTLV-1 highly endemic regions are the sub-Saharan Africa, South America Southwestern part of Japan, the Caribbean area, Australo-Melanesia and foci in the Middle East [119]. Age, ethnicity, mode of transmission are important factors influencing variation in population coverage [119].
Previous study of vaccine designing on HTLV-1 partially aligns with our study in that they have selected epitopes LLFGYPVYV, QLGAFLTNV and GLLPFHSTL as potential candidates for multivalent vaccine [l] but differ in that QLGAFLTNV was excluded from our vaccine as it was found allergenic. Instead, a non-allergenic T cell epitope ITWPLLPHV was assigned to be a part of our designed vaccine.
Docking simulation was carried out on AutoDock to assess the binding efficiency of the selected T cell epitopes to HLA molecules. The epitopes were docked with HLA-A � 0201 complexed with a peptide with the carboxyl-terminal group substituted by a methyl group. The peptide had binding energy of -5.0Kcal/mol with the HLA molecules whereas our selected T cell epitopes ITWPLLPHV, LLFGYPVYV and GLLPFHSTL showed higher binding energy, -8.8 Kcal/mol, -8.6 Kcal/mol, -8.5 Kcal/mol respectively suggesting satisfactory binding accuracy of the predicted epitopes. The ITWPLLPHV is stabilized by twelve hydrogen bonds, eleven hydrophobic bonds and one electrostatic bond while interacting with the receptor protein. It also forms hydrogen bonds with catalytic residue A:Tyr27 and hydrophobic interaction with A:Pro235 and B:Tyr463. ITWPLLPHV interacts through ten hydrogen bonds with the catalytic residue A:Tyr27 and B:Tyr463 and fifteen hydrophobic interactions with the catalytic residue A:Pro235 and B:Tyr463. GLLPFHSTL forms thirteen hydrogen bonding interactions with catalytic residue A:Tyr27, one electrostatic and four hydrophobic interactions observed with the catalytic residue A:Pro235 and B:Tyr463. Two potential B cell epitopes and three T cell epitopes were selected for multi-epitope vaccine construction. A suitable adjuvant was selected using the VaxinPad tool based on antigenicity. B cell epitopes were linked using GPGPG linker while MHC class I epitopes were joined by AAY linker. EAAK linkers were added with N -termini and C -termini of vaccine construct. These linkers are widely used in multi-epitope vaccine construction [120][121][122]. Vaccine construct with highest antigenicity among multiple combinations of these epitopes and linkers was selected for further investigation. The vaccine construct was tested for allergenicity by AllerTop 2.0 and AllergenFP which declared the vaccine construct as non-allergen. An important insight was gained from the physicochemical properties and secondary structure analysis of the protein performed by Prot-Param and SOPMA tools. It was found stable with instability index 30.57, negative GRAVY score (-0.367) and high aliphatic index are indications of the stability of the vaccine construct. Results generated by secondary structure prediction tool SOPMA showed the vaccine structure is dominated by 25.69% alpha helix and 65.14% random coils. The abundance of coiled regions indicates higher conservation and stability of the model [70,71,123]. As no suitable homologous template for homology modelling was not available for the vaccine construct peptide sequence ab initio modelling was executed using the I-Tasser tool to predict its 3D structure. Disulfide engineering was done at several positions on the vaccine tertiary structure. Disulfide bonds increase the stability of the protein [124].
Codon optimization was carried out in order to achieve high-level expression of our recombinant vaccine protein in E. coli (strain K12). Both the GC content (58.10%) and the codon adaptability index (1.0) were favourable for increased expression of the protein in bacteria. After codon adaptation the protein nucleic acid sequence was cloned in pET SUMO by TA cloning and expressed in E. coli strain DH5 alpha. DH5 alpha is a E. coli strain which allows easy selection of transformed colonies by blue-white screening. Again pET SUMO contains His Tag sequence at the 5' end of the vaccine sequence which allows easy purification of the desired protein.
Moreover, it is necessary to know the immune response of TLR4 against the vaccine protein. TLR4 is known to activate innate immunity against HTLV-1 as reported in several studies [125,126]. It has been widely demonstrated that TLR4 plays an important role in the recognition of endogenous molecules which are released by necrotic cells and injured tissues [127]. These molecules activate an intense proinflammatory response through interaction with TLR4 [128]. To assess the binding affinity between the vaccine and TLR4 a molecular docking was performed and analysis of the result showed good binding affinity between them. A high binding affinity to TLR4 supports the acceptability of our predicted multi-epitope vaccine.

Conclusion
Although the approaches in our study to predicting HTLV-1 TAX protein based epitopes and construction of a multi epitope vaccine were all in all in silico, the insights and outcome gained from the study can provide an elementary ground for expediting investigations related to designing and constructing epitope based vaccine against HTLV-1 in a wet lab and take it to in vitro studies from there. Further studies involving extensive laboratory assays and techniques might also render a higher a higher frequency and array of HLA molecules in terms of detecting the epitopes within the Tax protein.
Supporting information S1