iBRAB: In silico based-designed broad-spectrum Fab against H1N1 influenza A virus

Influenza virus A is a significant agent involved in the outbreak of worldwide epidemics, causing millions of fatalities around the world by respiratory diseases and seasonal illness. Many projects had been conducting to investigate recovered infected patients for therapeutic vaccines that have broad-spectrum activity. With the aid of the computational approach in biology, the designation for a vaccine model is more accessible. We developed an in silico protocol called iBRAB to design a broad-reactive Fab on a wide range of influenza A virus. The Fab model was constructed based on sequences and structures of available broad-spectrum Abs or Fabs against a wide range of H1N1 influenza A virus. As a result, the proposed Fab model followed iBRAB has good binding affinity over 27 selected HA of different strains of H1 influenza A virus, including wild-type and mutated ones. The examination also took by computational tools to fasten the procedure. This protocol could be applied for a fast-designed therapeutic vaccine against different types of threats.


Introduction
Influenza A virus is known as an agent causing seasonal illness of respiratory diseases. This agent is also responsible for the outbreak of worldwide epidemics, causing millions of fatalities worldwide. The first well-documented pandemic appeared to start in Russia and reached St. Petersburg in October 1889 [1]. However, the H1N1 Spanish flu in 1918 is the nightmare pandemic, starting with 550,000 deaths and increasing to at least 50 million, possibly up to 100 million, of deaths worldwide [2]. From then, many pandemics occurred caused by different types of influenza A virus, such as H7N9, H3N2 or H5N1, were recorded [3][4][5][6]. Despite different types of threats, the H1N1 from the 1918 pandemic is still the potential threat nowadays by its highly frequent shift of genetic components [7].
To fight back the influenza virus, doctors and scientists around the world are working on vaccines and drugs specifically designed for this type of disease. Most of the prophylactic vaccine developed in the past and currently used focus on a specific strain of influenza A virus [8].
In parallel with research to improve the prophylactic vaccine's effectiveness against influenza A viruses, many research groups worldwide work on antibodies collected from recovered infected people for therapeutic vaccines. The therapeutic vaccine encourages the infected body to fight against the disease harder [9]. Antibodies are widely used as therapeutic vaccines due to their a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 Collaboratory for Structural Bioinformatics-Protein Data Bank (RCSB-PDB) database. Between 2010 and 2019, 63 published Abs and Fabs can bind and neutralize influenza hemagglutinin (HA). Ten of them have binding at the RBS and neutralizing activities on wide-range strains of the same H1 subtype (Table 1). Both sequence and 3D structure of the heavy or light chain of each Ab/Fab were extracted into separated files for later analysis. Similarly, in the Table 1. List of broad-spectrum antibodies (Abs)/antigen-binding fragments (Fab) and Hemagglutinin (HA) proteins used in the experiment.

Multiple sequence alignment (MSA)
The alignment of multiple heavy/light-chain sequences was done by the MEGA-X program to propose a sequence of broad-spectrum Fab model [81]. Heavy chains and light chains were aligned separately using MUltiple Sequence Comparison by Log-Expectation (MUSCLE) algorithm. The parameters were set with Gap open -2.9, Gap extends 0.0, and Hydrophobicity multiplier 1.2.

Complementarity-determining region (CDR) determination
CDRs of chains were examined on IMGT/3Dstructure-DB and IMGT/2Dstructure-DB database [82][83][84]. For several newly published Abs/Fabs, the data have not been available on the IMGT database yet, Paratome tool was used to identify the CDRs [52]. The positions of CDRs in heavy and light chains were recorded and marked on the multiple sequence alignment results.

Superimposition
All heavy chains and light chains of Abs/Fabs were run with UCSF Chimera 1.13.1 for the superimposition approach [85]. Smith-Waterman alignment algorithms, BLOSUM-62 matrix with Gap extension penalty 1 were used for alignment. The secondary structure was included and accounted for 30% of the total score. The matching process was iterated by pruning long atom pairs until no pair exceeds 2.0Å. Then, all chains were chosen for alignment with residue-residue distance cut-off 5.0Å; the aligned residues were put in a column if they are within the cut-off value of at least one other structure. Circular permutation and iterate superimposition/alignment were turned on with 3 times iterated alignment and 3 consecutive columns in stretches of superimposing full columns.

3D-structure modeling
Based on the superimposition from UCSF Chimera, sequences of heavy and light chains for the Fab model were proposed for the Fab model against the H1 subtype. 3D structures of proposed models were constructed by MODELLER version 9.21 [86]. All ten Abs/Fabs from previous experiments were used as templates for the modeling. Some parameters were kept as a previous superimposed experiment, while the gap extension penalty was increased to 3 to reduce the gap introduction. A hundred new 3D models for each query were built and evaluated by Discrete optimized protein energy (DOPE) score [87]. The output model, which generated a minimum molecular probability density function (molpdf) value and the DOPE score, was continued with I loop refinement at CDR3 regions, and regions with DOPE peak higher than -0.02. The refinement was done for a hundred outputs.

Flexible docking
Molecular docking is an effective method for virtual screening. High Ambiguity Driven protein-protein DOCKing (HADDOCK) 2.4 server was used as the flexible docking simulation between the Fab model and HA proteins to estimate the binding affinity [88,89]. All amino acids at CDR regions of the Fab model and RBS region of HA were chosen as active residues without penalty. All other parameters were set at default. RMSD-based and contact-based clustering was done with cut-off values 7.5Å and 0.75, respectively. The algorithm of Daura et al.

Protein-protein interaction (PPI) calculation
The PPI analysis was performed by BIOVIA Discovery Studio 2020 (Dassault Systèmes, San Diego, California, USA). A web-based server Protein Interaction Calculator of the Molecular Biophysics Unit, Indian Institute of Science (Bangalore), was used together to maximize the analysis [90]. The complex systems were examined all types of hydrogen bonds, hydrophobic interaction, electrostatic interaction, and steric bumps intermolecularly between the Fab model and HA proteins.

Statistical analysis
All the graphs, together with mean, standard deviation, and other parameters shown in figures, tables were drawn and calculated by package 'tidyverse' in R tool (with interface RStudio software). P-value was calculated using the one-way ANOVA method-Tukey pairwise test, and statistical significance was set at p-value < 0.05.

Results and discussion
The constant regions of heavy chains are the same through ten Fab templates, while the constant regions of light chains are different between λ and κ groups To examine the Fab templates in amino acid sequence and 3D structure, we did MSA by the MUSCLE algorithm of the MEGA-X program and superimposition using the UCSF Chimera 1.13.1 tool. The results show that amino acid sequences at the framework and constant regions have higher conservative property, while variable regions vary between Fab templates (Fig 1).
In the constant regions of Fab templates, both MSA and superimposition give consistent results with each other. The sequences and 3D structures are preserved through ten heavy chains and two groups of light chains. There is only one difference at 130-loop forming between beta-sheets, although the amino acid sequences of heavy chains are the same (Fig 1B). This incident is due to the biophysical activity in the physiological environment that loop structure can be flexible cross-linking with surrounding chemical molecules [91,92]. Besides, despite the difference in amino acid sequences of λ and κ light chains, the scaffold of 3D structures is similar with beta-sheets and alpha-helices, and the formed loops in κ are longer than in λ light chains ( Fig 1E). Many works have shown that as loop length increases, the loop's stability is decreased by thermal and chemical denature [93,94]. With shorter loops, the structure of a protein is more stable in the physiological state; thus, the λ light chain is preferable in our proposed broad-spectrum Fab model. Besides, this λ light chain can help avoid antigen-specific selection by the pre-B cell [95]. MSA results show a very divergence between Fab templates in both heavy and light chains in the variable regions. However, the 3D structure of framework segments at these variable regions are superimposed well. These segments are essential to form the scaffold for Ab and generate three CDR loops. As expected, 3D structures of CDR loops between frameworks are profoundly different between Fab templates. The length of loops, presence of helix-turn in CDR1 and CDR3, or the extension of beta-sheet near CDR3 could make the difference. The HCDR1 loop forms small alpha-helix in 2D1, 5J8, 1F1, CH67, and H2897 Fabs, while others have big-angle HCDR1 loops. Moreover, HCDR2 loops, which are relatively short and made by two beta-sheets near the HCDR1, give loops a semi-flexible

PLOS ONE
iBRAB: In silico based-designed broad-spectrum Fab structure. These criteria explained the minor interaction of HCDR1 and HCDR2 in Fab binding on the surface of HA. Notably, the HCDR3 loop, which is the primary interaction with HA and reported as a key to insert into the RBS to prevent the virus from accessing sialic acid on the host cell membrane, has transitioned with different influenza A strains differently. While CH67, H1244 and H2227 Fab templates have alpha-helix turn in HCDR3, 5J8, 6639, and 641 I-9 Fabs have longer beta-sheet, and others have a bigger loop with a long sequence. The three CDR loops in light chains have preserved 3D structure within their types. The λ light chains have alpha-helix at CDR2 and long CDR3 loop, while κ light chains have shorter CDR3 loop and no alpha-helix at CDR2, which can be explained by the MSA results between them.

Proposed sequence of broad-spectrum Fab model against H1 influenza A virus includes different importance interacted amino acids of templates
Framework and constant regions have a minor role in binding between antibody and antigen [53], thus, the previous MSA and superimposition results confirm the amino acid sequence of these regions for the Fab model. Essential amino acids of each Ab/Fab templates, which have crucial interaction with HAs, were considered when the model's sequence was proposed. All the critical interactions were collected and distributed to the proposed sequence to increase the possible interactions with HA (Table 2). Many works had proven that mutations at RBS could abolish the binding between Fab and HA by deleting the interaction. One example, the mutation of Lys166 to Asp166 in HA can cause the deletion of a salt bridge between Lys166 and Asp93 L in 2D1 Fab, thus, the addition of other amino acids next to that position which can form interaction with Asp166 could increase the binding activity with mutated strains [17]. Moreover, the long HCDR3 loop with a narrow-angle is preferable to form a distinct pitch to penetrate the RBS region, reaching the deepest position [32]. In previous studies, there was no report on detailed interactions formed between HCDR1 and HCDR2, even they were mentioned as binding factors with HAs. Most of the HA interactions were reported and gathered in the HCDR3 region. These interactions are formed with polar and hydrophobic amino acids at HCDR3, which intruded into RBS. In detail, Arg and Asp are dominant in the collection, whereas the other polar and hydrophobic amino acids vary on different Fab templates. Many essential amino acids were combined in the model to increase the numbers of formed interactions. Aspartate amino acids were noted on the model's heavy chain sequence at position 100, 101, 108, and 111. This amino acid interacts with different amino acids of HA. In 5J8, . These Cys are also put in the model at 103 and 109-positions. Although the LCDR loops are not the primary interaction with HA, they are essential in making complement binding with other regions around the binding site. It is reported that aspartic acid is the crucial interactor, which is mentioned and identified in protein complexes of HA-32D6, HA-2D1, and HA-5J8 [12,17,97]. This amino acid was added to position 52 and 94. The interaction between 32D6 and HA also emphasized the interaction between Tyr31 L , Ala32 L , Thr94 L , Trp95 L , and Ala97 L with Gly172, Asn173, Gln206, and Gln210. These amino acids were recruited to the model to maximize the interaction [97]. Tyr32 L and Tyr33 L of H2897 make similar linkage as Tyr31 L of 32D6; thus, one Tyr was added for position 32 [31]. Gly29 L in 5J8 was also noted to pair with Asp54 L in the Fab model at position 30 to strengthen the interaction with Lys145 [12].
3D structure modeling of the Fab model gives an average -0.03 DOPE score, while CDR3 regions generated high energy The sequences were then modeled and evaluated for the 3D structure using the MODELLER tool based on known multiple templates and loop refinement processes. As previous results showed the agreement between the 3D structure in different regions, the CDR loops are fundamental in this process, making direct interaction with HA proteins. A hundred structural models were generated for the Fab model; the top ten structural models' energy profiles were generated and plotted to analyze the stability of generated 3D structures (Fig 2). The residual DOPE scores of the top ten models of the proposed Fab were around -0.03, most of them gave lower DOPE score as low as -0.045. The structural model 28, which had a more stable energy profile than other models, gave the lowest global DOPE score further evaluated. However, residues at H/LCDR3 loops, the important interacted regions with RBS of HA protein, possess quite a high energy profile that would suggest nasty modeled loops for a later experiment [98]. The comparative modeling by MODELLER does not reflect the dynamic of protein structure; thus, these DOPE scores only present the comparable property with other models [86]. Those high DOPE regions were then going under ab-initio loop refinement for another hundred outputs in each refined step. Each peak was done separately until the global DOPE score stable. Six steps of loop refinement were done for six peaks having a score equal to or higher than -0.02. As a result, from step 4, the DOPE scores are not different within later steps. Thus, the structure at loop refinement step 03 was chosen for the subsequent analysis.

Broad-spectrum Fab models give good binding affinity on a wide range of H1N1 strains
Due to the 3D structure of the model constructed by a computational program, we used flexible docking between CDRs of Fab and RBS of HA for maximum results. As the protein in the biological environment can be flexible, especially the CDR loops, which have a high energy profile, could be fluctuated depending on the interaction, rigid docking is not suitable for this

PLOS ONE
iBRAB: In silico based-designed broad-spectrum Fab binding examination. Flexible docking of Fab templates and respective HA give results which is consistent with an original pose from experimental methods (Fig 3A). All the best cluster in each case has the lowest binding energy when comparing with other clusters. Most of them have a binding position matching an original pose from X-ray crystallography, except complex systems 641 I-9 + SI-3/06 and H2227 + SI-3/06. While the original pose of H2227 with SI-3/06

PLOS ONE
iBRAB: In silico based-designed broad-spectrum Fab can be found in cluster 1, 641 I-9 + SI-3/06 system does not show any sophisticated matching with original pose in flexible docking. In experimental results, 641 I-9 and SI-03/06 made contacts only through the heavy chain of Fab [35], while in our docking experiment, the residues in both heavy and light chains were examined. Also, the 3D structure fetched from RCSB had missed some residues (only from residue 55 to 260 of HA head), making the docking inconsistently. In the interaction between H2227 and same influenza strain, when investing through X-ray, although the HCDR3 had contact with RBS of HA, this effect is little comparing with interactions forming between HA between HCDR1 and HCDR2 [32]. Following the number of clusters generated by HADDOCK, 5J8 Fab, and Nag-09N083/06 system has the most stable binding with the lowest number of clusters, and H1244 + Bei-262 Similar to template analysis, the best cluster in each case was chosen for further analysis. The mean and standard deviation of binding energy was calculated and plotted in the same Fig 3. As a result, these complexes' binding energy is agreed with a normal distribution (μ -308.563, δ 64.642) followed Fab templates when mean and standard deviation lay within 3 standard deviations of the mean. This result can be confirmed the broad-spectrum activity and binding effect of the model Fab on all 20 different H1N1 strains of influenza A virus. Among those 20 strains, the binding affinity of the Fab model and BM-1/18, PR-8/34, Mel-1/46, Net-002P1/51, Den/57, SI-3/06, and Jul-FLU3969/06 is better than templated complexes. The research on these strains was conducted through pandemic 1918, and there is a long time for people developing immune system against these strains, the information gathered from these strain-against Fabs provides proper designation. While all strains from pandemic 2009 show a low binding affinity between the Fab model and HA.
The Fab model has the best binding affinity or lowest binding energy with Bei-262/95 strain, while it generates low binding affinity with Was-05/11 and Jia-ALSI/11, which are both from year-2011 strains. Although the Was-05/11 and Tot-YK012/11 have the same amino acid sequence at the head, their 3D structures at RBS have little difference (0.555 RMSD), which causes the difference in docking result between the Fab model and these strains. Meanwhile, Jia-ALSI/11 is an entirely new strain which is not related to other 2011 strains. The MSA of amino acid sequence at the head of these strains showed a distinguished Jiangsu strain in 2011, which explained the low binding affinity of Fab with this strain. There is no report on this strain, the work related to this published 3D structure on RCSB is still not published yet; thus, the Fab model did not have any residues information interacting with this strain. When observing the Minimum evolution phylogenetic tree constructed by MEGA-X, the Jia-ALSI/ 11 has the same origin as the 1918-pandemic strain, especially BM-1/18. It means this strain's genetic material is still wandered around and had mutations to invade the immune system. A complement to this conclusion is the low binding affinity of the Fab model to this Jia-ALSI/11 strain. Meanwhile, the big difference in binding energy between the Fab model and NY-1/18 can be explained by high mutated of this strain compared to BM-1/18.
In our flexible docking experiment, mutated BM-1/18, SI-3/06, and Cal-04/09 had been collected and run together with the wild-type strain (Fig 4). When comparing the HADDOCK score of the best cluster between wild-type and mutated strains, there is no significant difference between groups in two previous strains, while it is significantly different between wild-type and mutated types in Cal-04/09 strains. The HADDOCK score is recorded poorly in wild-type; thus, it is concluded that the interaction made by the Fab model and wild-type RBS is fewer than in mutated RBS. This result is consistent with the analysis of binding energy between them and confirms our conclusion. The Fab model had better affinity on mutated strains than wild type strain, which is essential in drug development against these viruses.

PLOS ONE
iBRAB: In silico based-designed broad-spectrum Fab Meanwhile, the binding affinity between the Fab model and other mutated BM-1/18 and SI-3/ 06 is significantly different from wild type strain. The RBS in wild type made better interaction with the Fab model than the RBS in mutated strains. In BM-1/18, adding Lysine at 130 in 130-loop of RBS made the loop longer; thus, it alternates some interaction, although it makes other types of interaction. This add130K deleted two salt bridges made between the Fab model with Lys219 and Asp222. In the SI-3/06 strain, the mutations are identified with three changes; only two are involved in the RBS region. However, when examining protein interaction, two T194K and R223Q mutations are not involved in the interaction between the Fab model and HA. The docking position could explain the significant difference in binding energy between wild type and mutation. As the scatter plot of binding energy and HADDOCK scores of complexes in the best cluster are shown, the Fab model's pose on RBS of HA might be varied, and between wild type and mutation, there will be complexes with the same binding poses.

Lysine in 130-, 150-, and 220-loops play a vital role in forming electrostatic interaction between the Fab model and HA. Primary hydrophobic interaction formed between Fab model and HA made by Alanine, Leucine, and Valine in 130-loop and 190-helix
By examining all ten Fab templates with their corresponding HA proteins, nine out of ten make interaction with RBS of HA mainly via CDR3 loop of the heavy chain (Fig 5A, 5C and 5D). Except for the 2D1 Fab, this Fab template makes a complex by both heavy and light chains and indirect contacts with the RBS region ( Fig 5B). Meanwhile, with selected docked complexes made between the Fab model and HA protein, both the heavy chain and light chain make direct contact with RBS of HAs, which strengthens the interaction between protein complexes (Fig 5E).
The best binding complex between the Fab model and each HA was investigated in protein-protein interaction using BIOVIA Discovery studio and Protein Interaction Calculator. This analysis will give the preliminary results to explain the difference in binding affinity between strains or mutations. All types of interaction were collected and put on the map to gather the information. The interaction between the Fab model and RBS of HA is mainly formed by hydrophobic interaction, hydrogen bond, electronic interaction, and pi-sulfur bond. There is only one pi-sulfur bond formed between Cys103 H of Fab and Try150 of RBS on HA. Most strains have this interaction, except influenza A virus strains in the year 2011. However, the binding affinity with these strains is not high; thus, this bond might not be a significant interaction. Hydrophobic interaction is mainly formed between the Fab model with 130loop and 190-helix turn of RBS region. Especially, Lys130, Val132, Ala134, and Leu191 are found in most strains and made interaction with the Fab model. Ala141 of 130-loop also generate hydrophobic interaction with the Fab model. This amino acid is found in 1918-, 2009and 2011 strains, while in another influenza A strains, this Ala141 is alternated by Lysine or Glutamic acid, which does not form hydrophobic bonding. Similarly, Ala186 in 2009-and 2011 strains are recorded for forming this type of interaction with the Fab model. Talking about hydrogen bonds, which is considered as the main interaction between protein and protein, both conventional and carbon-hydrogen bonds were observed through all loops and helix forming RBS of HA. The interaction between amino acids of the Fab model is throughout the length of RBS, mainly formed by Threonine, Valine, Lysine, Alanine, Glycine, Serine, Aspartic acid, and Glutamine. The number of hydrogen bonds is not significantly different between wild type and mutated strains; the mutated amino acids could also form the hydrogen bonding between the Fab model and HA. Observing the salt bridge made between the Fab model and RBS of HA, it is seen that most of the interaction is formed at 220-loop. While in 1918 strains,  [99][100][101]. The correct binding position should be investigated deeply with simulation, and the critical interaction could be obtained by steered molecular dynamics to explain the different binding affinity.

Conclusions
This study lays the groundwork for developing a universal therapeutic vaccine, especially for the influenza A virus, by bioinformatic approach. The proposed Fab model of broad-spectrum antibody targeting HA of different H1N1 influenza A strains had preliminarily shown the broad-reactive property against RBS of HA. It shows a positive view in applying bioinformatics in dealing with emergent pandemics. The interaction between the Fab model and HA can be sketchy when protein-protein interaction is analyzed by software. However, more information could be obtained with the aid of the molecular simulation method. The information will refine the Fab model, which will eliminate or replace the weak interaction with other amino acids. Furthermore, the Fab model should be examined the physical properties to make sure that it is suitable for the therapeutic vaccine and can pass phase 1 of a clinical test. In the next paper, these results will be reported, and further information will be provided for a better rational therapeutic vaccine design.