Abstract
Hepatitis B virus (HBV) contributes substantially to liver cancer, related mortality, and liver transplantation worldwide. The small hepatitis B surface antigen (HBsAg), particularly its major hydrophilic region (MHR) and the “a” determinant, is the primary target of serological diagnostics. However, escape mutant amino acid variants (EMAVs) within this region may reduce diagnostic specificity and sensitivity. In this study, publicly available HBsAg sequences were analyzed to determine the prevalence of EMAVs circulating in Ethiopia. We computationally designed three region-specific recombinant antigens (MeRPYS1, MeRPYS2, and MeRPYS3) by incorporating both wild-type and prevalent EMAV sequences. Linear and conformational B-cell epitopes, as well as T helper cell epitopes, were predicted for each antigen. Homology analyses were also performed to assess similarity to host proteins. Secondary and tertiary structures of the antigens were predicted to generate theoretical molecular models. Molecular docking analyses were performed to explore putative interaction patterns between each designed antigen and an anti-HBsAg-specific antibody. The predicted antigen–antibody complexes were further examined using molecular dynamics (MD) simulations to assess their theoretical stability and behavior over time. The resulting simulations provide predictive computational insights into possible antigenic features and interaction tendencies of the designed constructs. These findings are intended to generate testable hypotheses and should be interpreted cautiously, as the study is limited to in silico analyses and requires experimental validation.
Citation: Workneh YA, Sisay DM, Fekadu A, Bika AT, Mogus AT, Sisay Tessema T (2026) In silico design of novel recombinant antigens containing immunologically relevant regions of wild-type and escape mutant variants of HBsAg. PLoS One 21(3): e0344362. https://doi.org/10.1371/journal.pone.0344362
Editor: Anoop Kumar, National Institute of Biologicals (NIB), Ministry of Health & Family Welfare, Government of India, INDIA
Received: July 18, 2025; Accepted: February 19, 2026; Published: March 9, 2026
Copyright: © 2026 Workneh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and supporting information.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Viral hepatitis remains a significant global health burden, causing deaths primarily due to cirrhosis and liver cancer [1]. In 2015, viral hepatitis accounted for 1.34 million deaths, while in 2022 this figure was estimated at 1.3 million. These mortality estimates are comparable to those attributed to tuberculosis and exceed deaths caused by HIV. A majority (over 82%) of these deaths were attributed to chronic HBV infection [2,3].
In 2022, 254 million people were living with HBV, with 1.2 million new infections reported that year. Despite this substantial burden, only 13% of individuals with chronic HBV were diagnosed, and merely 3% of them received treatment [3]. These statistics highlight major gaps in HBV diagnosis and treatment.
HBV is a compact virus with a partially double-stranded DNA genome and is a member of the Hepadnaviridae family. First identified in 1965, HBV continues to be a major cause of both acute and chronic hepatitis. It spreads primarily through percutaneous, sexual, and parenteral transmission routes, targeting hepatocytes where it replicates and induces liver damage [4].
The genome of HBV is approximately 3.2 kb in length, consisting of relaxed-circular DNA with four overlapping open reading frames (ORFs) coding for essential viral proteins, C, P, S, and X. The S ORF produces the envelope proteins L-HBsAg, M-HBsAg, and S-HBsAg, with S-HBsAg being the most abundantly expressed protein. S-HBsAg forms both the viral envelope and sub-viral particles, which are present in much greater quantities than the infectious virus itself. Due to its abundance, S-HBsAg is a key focus in diagnostic, vaccine and therapeutic studies [5,6].
The S-HBsAg protein comprises 226 amino acids and features critical regions such as the MHR and the conserved “a” determinant [7]. These regions play a vital role in immune recognition of the virus and are targets for the development of diagnostic assays. However, mutations within these regions can reduce the accuracy of diagnostic tools by allowing the virus to evade detection. Mutations such as I110L, A128V, and Y134F are common examples of diagnostic EMAVs that have been associated with reduced sensitivity and specificity of immunological assays [8,9].
HBV is divided into nine genotypes, labeled A through I. HBV genotypes are defined by amino acid variation within the MHR and show distinct geographic distributions [10,11]. In Ethiopia, genotypes A and D are the most prevalent, alongside occasional reports of genotypes E, C, and F [12–17].
The prevalence of HBV in Ethiopia is estimated to range from 5% to 8%. This prevalence classifies Ethiopia as a high-burden setting and represents a major public health concern [18].
In Ethiopia, commonly used HBV diagnostic tools [19] are imported from abroad. Despite their widespread use, these tools often show variable performance. For example, a WHO-prequalified diagnostic kit tested in Ethiopia showed a sensitivity of around 80%, which was lower than the sensitivity values reported under controlled validation settings [20,21].
To address diagnostic challenges arising from the genetic diversity and mutation profile of HBV in Ethiopia, this study applied a region-specific computational strategy for HBsAg-based antigen design. Publicly available HBsAg amino acid sequences were analyzed to identify predominant circulating genotypes and prevalent EMAVs within the MHR and the ‘a’ determinant. Based on these analyses, recombinant antigen constructs incorporating immunologically relevant wild-type and EMAV-derived regions were designed in silico. The resulting constructs were further characterized using computational tools to generate predictive insights into their sequence features, structural properties, and theoretical interaction patterns, including putative compatibility with bacterial expression systems. The novelty of this work lies in the systematic prioritization of Ethiopia-specific HBV sequence variation to inform hypothesis-driven antigen design. These findings are intended to guide subsequent experimental validation rather than to demonstrate confirmed antigenicity, diagnostic performance, or suitability for antibody production.
Methods and materials
Identification of HBsAg EMAVs
The amino acid sequences of HBsAg from HBV genotypes circulating in Ethiopia were systematically analyzed using pooled sequence data. All publicly available HBsAg amino acid sequences deposited in the National Center for Biotechnology Information (NCBI) database up to August 2025 were retrieved. A total of 682 HBsAg sequences, each derived from a unique individual, were collected using their corresponding accession numbers. These sequences were distributed across five accession ranges: KP310929–KP311299 (371 sequences), KT367571–KT367731 (161 sequences), MF169791–MF169875 (85 sequences), OP432487–OP432532 (46 sequences), and OL630698–OL630716 (19 sequences).
Accession numbers KT367574 and OP432510 were excluded prior to analysis. Accession KT367574 represented the only genotype E sequence in the dataset and was excluded due to insufficient representation for genotype-level comparative analysis. Accession OP432510 was excluded because the corresponding HBsAg protein sequence was not deposited in the database and was annotated as nonfunctional due to mutation. In addition, ten isolates (KT367572, KT367612, KT367614, KT367620, KT367631, KT367643, KT367684, KT367687, OP432515, and OL630711) containing at least one unidentified amino acid (X) within the “a” determinant region were excluded to avoid ambiguity in mutation analysis. Following these exclusions, 670 HBsAg sequences were retained for further analysis.
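The bookkeeping described above can be checked with a short sketch. It expands the five accession ranges, removes the twelve excluded accessions, and counts what remains. The ranges and exclusions are taken from the text; the helper name `expand` and the variable names are illustrative assumptions, not part of the original workflow.

```python
# Accession ranges and exclusions as listed in the text; names are illustrative.
RANGES = [
    ("KP", 310929, 311299),
    ("KT", 367571, 367731),
    ("MF", 169791, 169875),
    ("OP", 432487, 432532),
    ("OL", 630698, 630716),
]

EXCLUDED = {
    "KT367574",  # only genotype E sequence in the dataset
    "OP432510",  # no deposited HBsAg protein sequence
    # isolates with an unidentified residue (X) in the "a" determinant
    "KT367572", "KT367612", "KT367614", "KT367620", "KT367631",
    "KT367643", "KT367684", "KT367687", "OP432515", "OL630711",
}


def expand(prefix, first, last):
    """Return every accession from first to last (inclusive) for one prefix."""
    return [f"{prefix}{number}" for number in range(first, last + 1)]


all_accessions = [acc for block in RANGES for acc in expand(*block)]
retained = [acc for acc in all_accessions if acc not in EXCLUDED]

print(len(all_accessions), len(retained))  # 682 670
```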
The retained sequences were categorized by genotype (A or D) and independently aligned against genotype-specific reference sequences using BioEdit Sequence Alignment Editor (version 7.2.5). HBV isolates AY934770 and AB188243 served as reference sequences for genotypes A and D, respectively. EMAVs were identified by aligning the MHR of each sequence against its corresponding reference sequence and recording amino acid substitutions within the defined region.
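As an illustration of the substitution-recording step, the sketch below compares a query sequence with its reference over a residue window and reports differences in the usual notation (for example, P127T). It is a simplified stand-in for the BioEdit-based procedure, not a reproduction of it. It assumes gap-free, equal-length aligned sequences, it skips unidentified residues (X) by choice, and the toy sequences are hypothetical rather than real HBsAg.

```python
def call_substitutions(reference, query, start, end):
    """List substitutions between 1-based positions start..end (inclusive).

    Assumes the two sequences are already aligned without gaps, so that
    position i in the reference corresponds to position i in the query.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for pos in range(start, end + 1):
        ref_aa = reference[pos - 1]
        qry_aa = query[pos - 1]
        # Assumption: unidentified residues (X) are not counted as substitutions.
        if qry_aa != ref_aa and qry_aa != "X":
            calls.append(f"{ref_aa}{pos}{qry_aa}")
    return calls


# Toy 10-residue sequences (hypothetical, for illustration only).
toy_reference = "MENITSGFLG"
toy_query = "MENLTSGFLA"
print(call_substitutions(toy_reference, toy_query, 1, 10))  # ['I4L', 'G10A']
```

In practice the window would be set to the MHR or “a” determinant coordinates used in the study.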
Statistical analysis
Genotype distribution and mutation frequencies were calculated as proportions based on the total number of HBsAg sequences with complete MHR data. Ninety-five percent confidence intervals (95% CIs) for proportions were estimated using the Wilson score method without continuity correction.
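For readers unfamiliar with the Wilson score method, a minimal sketch of the interval without continuity correction follows. The function name and the example counts (45 of 100) are hypothetical and chosen only to illustrate the calculation.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion, without continuity correction."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# Hypothetical counts, for illustration only.
low, high = wilson_interval(45, 100)
print(f"{low:.3f} to {high:.3f}")
```

Unlike the simple normal-approximation interval, the Wilson interval is not centred on the observed proportion and always stays within 0 and 1.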
To assess whether the prevalence of EMAVs differed significantly between HBV genotypes A and D, a two-proportion z-test was performed. The proportion of sequences harboring at least one EMAV in the MHR was calculated for genotype A as:
$$p_A = \frac{x_A}{n_A}$$
It was calculated for genotype D as:
$$p_D = \frac{x_D}{n_D}$$
where $x_A$ and $x_D$ represent the number of sequences with ≥1 MHR mutation and $n_A$ and $n_D$ denote the total number of sequences analyzed for each genotype, respectively.
The pooled proportion ($p_{\text{pooled}}$) was calculated as:
$$p_{\text{pooled}} = \frac{x_A + x_D}{n_A + n_D}$$
The standard error (SE) of the difference in proportions was computed using:
$$SE = \sqrt{p_{\text{pooled}}\,(1 - p_{\text{pooled}})\left(\frac{1}{n_A} + \frac{1}{n_D}\right)}$$
The z-score (z) was then calculated as:
$$z = \frac{p_D - p_A}{SE}$$
P-values < 0.05 were considered indicative of statistical significance. Effect sizes were also reported as the absolute difference in proportions ($\Delta p = |p_D - p_A|$).
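The two-proportion z-test described in this section can be sketched in a few lines. The example uses the counts reported in the Results (93 of 488 genotype A sequences and 127 of 182 genotype D sequences with at least one MHR mutation). The function name and the sign convention (genotype D minus genotype A) are assumptions, and the two-tailed p-value is taken from the standard normal distribution.

```python
from math import erfc, sqrt


def two_proportion_z(x_a, n_a, x_d, n_d):
    """Pooled two-proportion z-test; returns the intermediate quantities too."""
    p_a = x_a / n_a
    p_d = x_d / n_d
    p_pooled = (x_a + x_d) / (n_a + n_d)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_d))
    z = (p_d - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-tailed p-value under N(0, 1)
    return {"p_a": p_a, "p_d": p_d, "p_pooled": p_pooled,
            "se": se, "z": z, "p_value": p_value, "delta_p": abs(p_d - p_a)}


result = two_proportion_z(93, 488, 127, 182)
print(f"SE = {result['se']:.4f}, z = {result['z']:.2f}, "
      f"delta p = {result['delta_p']:.4f}")
```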
All eligible sequences available up to the date of data retrieval were included in the analysis, except those explicitly excluded based on predefined criteria, to minimize selection bias.
Design of recombinant proteins
Three recombinant protein constructs (MeRPYS1, MeRPYS2, and MeRPYS3) were designed using a computational workflow. MeRPYS1 and MeRPYS2 were composed of ten tandem repeats of the HBsAg “a” determinant derived from HBV genotypes A and D. The relative proportions of genotype A– and genotype D–derived sequences differed between the two constructs and were informed by genotype prevalence and mutation frequency observed in the Ethiopian sequence dataset. Selected diagnostic EMAVs were incorporated into specific “a” determinant segments, while remaining segments consisted of wild-type sequences. Individual “a” determinant segments were separated by a flexible glycine–serine (GSGSG) linker. A fusion-tag was included at the N-terminus of each construct, and a hexa-his-tag was appended at the C-terminus.
MeRPYS3 was designed by incorporating two copies of the wild-type MHR from genotype A and two copies from genotype D. The MHR segments were separated by the GSGSG linker. A combination of fusion-tags, including a Strep-tag II, a hexa-his-tag, and a Twin-Strep-tag, was incorporated at the N-terminus of the MeRPYS3 construct.
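The construct architecture described above (segments joined by a GSGSG linker, with terminal tags) can be expressed as a small assembly function. This is a sketch only: the segment strings and the N-terminal tag are hypothetical placeholders, because the actual segment and fusion-tag sequences are not listed in this section, and joining the tags directly to the first and last segments without an additional linker is an assumption.

```python
LINKER = "GSGSG"    # flexible glycine-serine linker named in the text
HIS_TAG = "HHHHHH"  # hexa-His tag (six histidines)


def assemble(segments, n_tag="", c_tag="", linker=LINKER):
    """Join segments with the linker and attach optional terminal tags."""
    return n_tag + linker.join(segments) + c_tag


# Placeholder segments standing in for ten "a" determinant sequences.
segments = [f"[seg{i}]" for i in range(1, 11)]
construct = assemble(segments, n_tag="[n-tag]", c_tag=HIS_TAG)

print(construct.count(LINKER))  # 9 linkers separate 10 segments
```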
Protein homology test
Potential similarity between the designed recombinant proteins (MeRPYS1, MeRPYS2, and MeRPYS3) and host proteins was evaluated using BLASTp (NCBI). Each protein sequence was queried independently against human (Homo sapiens; taxonomic identifier 9606) and mouse (Mus musculus, BALB/c strain; taxonomic identifier 10090) protein datasets using the Reference Protein (RefSeq_protein) database. Searches were performed with an expectation value (E-value) threshold of 0.05. Sequence matches exhibiting less than 35% amino acid identity with query coverage exceeding 80% were flagged as potentially non-homologous to host proteins and retained for further computational analysis. This computational analysis was performed to screen for theoretical sequence similarity rather than to infer functional cross-reactivity.
Prediction of B-cell epitopes
Linear and conformational B-cell epitopes were predicted from the recombinant proteins using BCPRED (http://ailab-projects2.ist.psu.edu/bcpred/predict.html) and ElliPro (http://tools.iedb.org/ellipro/), respectively. For linear epitope prediction, the amino acid sequence of each protein was submitted independently to the BCPRED server. Predictions were performed using a fixed epitope length of 20 amino acids with non-overlapping constraints. Predicted epitopes with antigenicity scores meeting the predefined high-threshold criterion (score range: 0.9–1.0) were considered antigenic [22].
For conformational epitope prediction, the selected tertiary structural model of each protein was submitted to the ElliPro tool. Parameters were set to a minimum protrusion index score of 0.5 and a maximum distance of 6 Å. Predicted conformational epitopes and their corresponding scores were compiled in tabular form, and their spatial distribution was visualized on the modeled tertiary structures. Conformational epitopes with scores ≥ 0.5 were considered antigenic [23]. These predictions were intended to provide theoretical insights into potential epitope regions rather than to infer immunogenic performance.
Epitope conservancy analysis
The sequence conservancy of predicted B-cell epitopes was evaluated using the Epitope Conservancy Analysis Tool provided by the Immune Epitope Database (IEDB; 2024 update). Each predicted epitope sequence was analyzed against a curated dataset of 670 HBsAg amino acid sequences retrieved from the NCBI database. An identity threshold of 80% was applied, and the percentage of HBsAg sequences containing each epitope at or above this threshold was calculated. This computational analysis was performed to assess theoretical epitope conservation across circulating HBV variants rather than to infer population-level immunogenicity.
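The idea behind the conservancy calculation can be illustrated with a simplified sketch. For each sequence, the epitope is slid along it without gaps and the best identity is recorded; the conservancy is the percentage of sequences reaching the 80% threshold. This is an approximation written for illustration, so the IEDB tool's exact procedure may differ, and the toy sequences are hypothetical.

```python
def best_identity(epitope, sequence):
    """Highest ungapped identity of the epitope against any window of the sequence."""
    k = len(epitope)
    best = 0.0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / k)
    return best


def conservancy(epitope, sequences, threshold=0.80):
    """Percentage of sequences containing the epitope at or above the threshold."""
    hits = sum(best_identity(epitope, seq) >= threshold for seq in sequences)
    return 100.0 * hits / len(sequences)


# Toy data (hypothetical): exact match, one mismatch (7/8), and no match.
toy_sequences = ["AAACDEFGHIKAAA", "AAACDEFGHLKAAA", "AAAWWWWWWWWAAA"]
print(round(conservancy("CDEFGHIK", toy_sequences), 1))  # 66.7
```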
T helper cell epitope prediction
Putative T helper (CD4⁺) cell epitopes were predicted from the recombinant proteins using the IEDB Analysis Resource (http://tools.iedb.org/main/). Predictions were performed with the NetMHCIIpan 4.1 algorithm using a 15–amino acid sliding window across each protein sequence. A panel of commonly used murine MHC class II alleles (H2-IA^q, H2-IA^s, H2-IA^u, H2-IA^k, H2-IA^d, and H2-IA^b) was selected. Predicted peptides were ranked based on percentile rank and predicted binding scores. Peptides with percentile ranks ≤ 0.5 or predicted binding scores exceeding 0.5 were classified as high-affinity binders. These predictions were intended to provide theoretical insights into potential T helper epitope content rather than to infer in vivo immunogenicity [24].
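The windowing and classification rules stated above can be sketched as follows. The code only generates 15-residue windows and applies the stated cut-offs; it does not call NetMHCIIpan or the IEDB service. The function names, the toy sequence, and the example rank and score values are hypothetical.

```python
def sliding_peptides(sequence, length=15):
    """Return (1-based start, peptide) for every window of the given length."""
    return [(i + 1, sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]


def is_high_affinity(percentile_rank, score, rank_cutoff=0.5, score_cutoff=0.5):
    """Apply the stated rule: rank <= 0.5 or predicted binding score > 0.5."""
    return percentile_rank <= rank_cutoff or score > score_cutoff


toy_sequence = "ACDEFGHIKLMNPQRSTVWY"  # 20 residues, hypothetical
peptides = sliding_peptides(toy_sequence)
print(len(peptides))  # 6 windows for a 20-residue sequence

print(is_high_affinity(percentile_rank=0.3, score=0.2))  # True (rank criterion)
print(is_high_affinity(percentile_rank=5.0, score=0.2))  # False
```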
Evaluation of expression feasibility
The theoretical feasibility of heterologous expression of the constructs in Escherichia coli (E. coli) was evaluated using multiple computational predictors. Protein solubility was assessed using the SOLpro server (https://scratch.proteomics.ics.uci.edu/), with solubility scores greater than 0.5 interpreted as indicative of a higher likelihood of soluble expression. Aggregation propensity was analyzed using the AGGRESCAN web server (http://bioinf.uab.es/aggrescan/) to identify short aggregation-prone regions (hot spots) and their associated metrics. The presence of signal peptides was evaluated using the SignalP (v6.0) server (https://services.healthtech.dtu.dk/services/SignalP-6.0/) to assess the likelihood of signal peptide–mediated secretion. In addition, each construct was submitted to the JCat server (https://www.jcat.de/Result.jsp) for codon optimization tailored to E. coli expression. These in silico analyses were performed to generate predictive insights into expression-related features rather than to confirm experimental expression outcomes.
Prediction of secondary and tertiary structures
The secondary structures of the designed proteins were predicted using PSIPRED (v4.0). Tertiary structures were modeled with the I-TASSER web server (https://zhanggroup.org/I-TASSER/), generating five candidate models per protein. Each model was evaluated based on Confidence Score (C-score), Template Modeling Score (TM-score), and Root Mean Square Deviation (RMSD). For each protein, the model with the highest C-score and TM-score and the lowest RMSD was selected for further structural refinement. Refinement was performed using the GALAXYRefine server (https://galaxy.seoklab.org/), producing five additional candidate structures per protein. The refined structures were assessed for stereochemical quality and overall structural integrity using multiple metrics, including GDT-HA, RMSD, MolProbity score, Clash score, poor rotamer percentage, and Ramachandran favored regions. A refined model for each protein was selected and evaluated for geometric accuracy and global quality using the PROCHECK (via the PDBsum server; https://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html) and ProSA (https://prosa.services.came.sbg.ac.at/prosa.php) web servers. Each validated model was independently visualized using PyMOL™ (v3.1.0). All computational analyses were performed to generate predictive structural insights rather than to confirm experimentally validated conformations.
Molecular docking
A broadly neutralizing anti-HBsAg antibody structure (PDB ID: 6VJT) was used for molecular docking analysis with the designed proteins. The refined and validated tertiary structure of each antigen and the antibody structure were independently submitted to the ClusPro 2.0 server (https://cluspro.org/login.php) for antigen–antibody docking. Docking poses (30 per antigen–antibody pair) were generated, each with a unit-less relative weighted energy score that integrates van der Waals interactions, electrostatic forces, and desolvation energy. Docked complexes were clustered according to structural similarity. Representative poses from the most populated clusters were selected for further analysis. Interacting residues at the antigen–antibody interfaces of the selected docking poses were identified and analyzed using the PDBsum server (https://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html). The resulting interaction profiles were used to evaluate the theoretical binding orientation and interface characteristics of each antigen–antibody complex.
Molecular dynamics simulations
Atomistic MD simulations were performed for the selected MeRPYS1–antibody, MeRPYS2–antibody, and MeRPYS3–antibody complexes using GROMACS 2024 with the CHARMM36m force field. Each complex was solvated in a cubic simulation box containing TIP4P water molecules, with a minimum solute–box distance of 1.0 nm, and neutralized by adding NaCl to a final concentration of 0.15 M. Energy minimization was carried out using the steepest descent algorithm until the maximum force fell below 1000 kJ·mol⁻¹·nm⁻¹.
System equilibration was performed in two stages: a 100 ps NVT equilibration at 300 K using the velocity-rescaling (V-rescale) thermostat, followed by a 100 ps NPT equilibration at 1 bar using the Parrinello–Rahman barostat. MD simulations were conducted for 110 ns with a 2 fs integration time step under periodic boundary conditions, and trajectory coordinates were saved every 100 ps.
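The run length and output frequency stated above translate into integration-step counts as follows. This is a small arithmetic sketch using integer femtoseconds; the variable names are illustrative. The mapping to the GROMACS `.mdp` options `nsteps` and `nstxout-compressed` is mentioned only as an assumption about how such values are typically specified, and the frame count assumes that the starting frame at t = 0 is also written.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000

time_step_fs = 2        # 2 fs integration time step
production_ns = 110     # production run length
save_interval_ps = 100  # coordinates saved every 100 ps

# Number of integration steps in the production run (e.g. nsteps).
production_steps = production_ns * FS_PER_NS // time_step_fs

# Steps between saved frames (e.g. nstxout-compressed).
steps_between_frames = save_interval_ps * FS_PER_PS // time_step_fs

# Saved frames, assuming the frame at t = 0 is also written.
saved_frames = production_steps // steps_between_frames + 1

print(production_steps, steps_between_frames, saved_frames)
# 55000000 50000 1101
```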
Trajectory analyses were performed using built-in GROMACS utilities and custom Python scripts to characterize global and local structural dynamics. Structural stability was assessed by calculating the RMSD of Cα backbone atoms for each antigen–antibody complex following least-squares fitting to the initial structure. Residue-level flexibility was evaluated using root-mean-square fluctuation (RMSF) analysis of Cα atoms. Structural compactness was quantified by calculating the radius of gyration (Rg), and solvent exposure was evaluated using solvent-accessible surface area (SASA). Free-energy landscapes (FELs) were constructed based on principal component analysis of the first two eigenvectors derived from RMSD and Rg distributions, with relative free energies reported in arbitrary units.
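Two of the metrics named above have compact definitions that can be sketched in plain Python. The functions below compute the radius of gyration of one frame and the per-atom RMSF over a set of frames. They are illustrative stand-ins for the GROMACS utilities and custom scripts used in the study, which are not shown here. Equal masses are assumed unless masses are supplied, and the frames are assumed to be already superimposed on a common reference.

```python
from math import sqrt


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration for one frame of (x, y, z) tuples."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    centre = [sum(m * r[k] for m, r in zip(masses, coords)) / total
              for k in range(3)]
    weighted_sq = sum(m * sum((r[k] - centre[k]) ** 2 for k in range(3))
                      for m, r in zip(masses, coords))
    return sqrt(weighted_sq / total)


def rmsf(frames):
    """Per-atom RMSF, where frames[t][i] is the (x, y, z) of atom i at frame t."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        values.append(sqrt(msd))
    return values


# Toy data (hypothetical): atom 0 moves along x, atom 1 stays fixed.
toy_frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
]
print(rmsf(toy_frames))                                        # [1.0, 0.0]
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))  # 1.0
```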
Antigen–antibody interface stability was further examined through hydrogen-bond analysis by monitoring donor–acceptor pairs throughout the trajectories. Hydrogen-bond persistence, average donor–acceptor distances, and average and maximum continuous lifetimes were calculated. All MD simulations were conducted using fixed initial structures and predefined parameters, with stochastic elements generated using default random seeds to ensure repeatability of the simulation workflow.
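Hydrogen-bond persistence and continuous lifetimes can be derived from a per-frame presence series, as in the sketch below. The function name and the toy series are hypothetical. As a stated assumption, a run of k consecutive frames in which the bond is present is counted as a lifetime of k times the frame interval (100 ps here); other conventions exist.

```python
def hbond_stats(present, frame_interval_ps=100):
    """Persistence and continuous lifetimes from a per-frame presence series."""
    runs = []
    current = 0
    for flag in present:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)

    persistence = sum(1 for flag in present if flag) / len(present)
    mean_lifetime = frame_interval_ps * sum(runs) / len(runs) if runs else 0.0
    max_lifetime = frame_interval_ps * max(runs) if runs else 0
    return persistence, mean_lifetime, max_lifetime


# Toy series (hypothetical): 1 = hydrogen bond present in that frame.
toy_series = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]
print(hbond_stats(toy_series))  # (0.6, 200.0, 300)
```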
Results and discussion
We report a computational analysis of recombinant HBsAg antigens incorporating both wild-type sequences and EMAVs. Analysis of publicly available HBsAg sequence data from Ethiopia identified regionally prevalent EMAVs within the MHR, including the “a” determinant. These observations informed the design of three recombinant constructs intended to represent the observed sequence diversity of circulating HBV strains. The constructs were evaluated in silico to examine predicted structural and immunological features. This workflow (Fig 1) was applied to generate testable hypotheses regarding protein design and characterization rather than to demonstrate diagnostic performance.
Key diagnostic EMAVs are defined as amino acid substitutions within the MHR of HBsAg, particularly the “a” determinant. Such variants have been associated with false-negative serological findings in previous studies. In this analysis, EMAVs classified as key include A128V, P127T, Y134F, Y134N, S143L, and S143T, based on published reports suggesting potential effects on HBV detectability rather than direct functional confirmation [8,9].
Molecular characteristics of HBV in Ethiopia
Analysis of 670 individual patient samples with complete MHR data from Ethiopia suggested that genotype A was the predominant one in the dataset. Genotype A accounted for 488 sequences (72.84%, 95% CI: 69.5–76.2%), while genotype D represented 182 sequences (27.16%, 95% CI: 23.8–30.5%).
Among genotype A sequences, 93 contained at least one mutation within the MHR, yielding a mutation prevalence ($p_A$) of 19.06% (95% CI: 15.6% – 22.5%). In contrast, 127 genotype D sequences harbored one or more MHR mutations ($p_D$ = 69.78%; 95% CI: 63.1% – 76.5%).
The pooled proportion of sequences with MHR mutations across both genotypes was calculated as:
$$p_{\text{pooled}} = \frac{93 + 127}{488 + 182} = \frac{220}{670} = 0.3284$$
Using this pooled estimate, the standard error (SE) of the difference in proportions was:
$$SE = \sqrt{0.3284 \times (1 - 0.3284) \times \left(\frac{1}{488} + \frac{1}{182}\right)} = 0.0408$$
The z-score was calculated as:
$$z = \frac{p_D - p_A}{SE} = 12.44$$
This z-value corresponds to a two-tailed p-value < 0.0001, indicating a statistically significant difference in the prevalence of EMAVs between genotypes A and D.
The effect size, expressed as the absolute difference in proportions, was:
$$\Delta p = |p_D - p_A| = 0.6978 - 0.1906 = 0.5072 \; (50.7\%)$$
This study demonstrates a pronounced and statistically robust difference in the prevalence of EMAVs between HBV genotypes A and D circulating in Ethiopia. Genotype D showed a substantially higher proportion of sequences carrying MHR mutations compared with genotype A, with a highly significant z-score (z = 12.44, p < 0.0001).
Importantly, the observed difference was not only statistically significant but also biologically meaningful, as reflected by the large effect size (Δp = 50.7%). Reporting the effect size alongside the p-value provides essential context, indicating that the disparity in mutation prevalence is substantial and unlikely to be attributable to sampling variability alone.
Several common diagnostic escape mutations were identified, including A128V, Y134F, Y134N, S143L, and S143T. The P127T mutation was particularly frequent in genotype D, occurring in 92 cases (50.55%, 95% CI: 43.3% – 57.8%). Notably, some of these substitutions were located within the “a” determinant of both genotypes A and D (Fig 2).
Our findings are consistent with previous studies. A study on pregnant women in the Amhara National Regional State of Ethiopia found that genotype A was predominant, with samples showing surface gene mutations in the MHR [25]. Another study identified a novel HBV subgenotype, D10, circulating in Ethiopia, highlighting the genetic diversity of the virus in the country [16].
The emergence of HBsAg sequence variants could be attributed to the lack of proofreading during the reverse transcription stage of viral replication [11]. This error-prone step can introduce mutations, some of which give rise to key diagnostic EMAVs. The phenomenon has been observed globally. For example, a study on Tunisian patients identified naturally occurring EMAVs of HBV, emphasizing the worldwide relevance of EMAVs [9].
Recent reports from different geographic regions further support the rationale of the present study. A study from the United States identified multiple diagnostic EMAVs within the HBsAg, particularly in “a” determinant, highlighting their effects on the performance of existing diagnostic assays [26]. A study from Nigeria documented numerous EMAVs across the MHR of HBsAg, indicating their potential impact on antigenicity, diagnostic detection, and vaccine-induced immunity [27]. Liu et al. demonstrated that EMAVs can lead to false-negative results in HBV detection [28].
Collectively, these observations motivate further exploration of antigen design strategies that consider both wild-type sequences and EMAVs of HBsAg. The computational designs presented here provide a conceptual framework for examining region-specific recombinant antigens rather than evidence of diagnostic performance. These in silico analyses are intended to generate testable hypotheses regarding sequence prioritization and construct architecture, which may inform subsequent experimental studies without presuming practical diagnostic utility.
Computationally designed recombinant proteins
Fig 3 shows the positions of the MHR and the “a” determinant within HBsAg, along with three computationally designed recombinant proteins: MeRPYS1, MeRPYS2, and MeRPYS3. The MHR is situated near the midpoint of HBsAg and the “a” determinant is located centrally within the MHR (Fig 3a). This ensures that MeRPYS3, which is derived from the MHR, inherently includes the “a” determinant (Fig 3d).
Fig 3. (a) MHR and “a” determinant within HBsAg; (b) MeRPYS1; (c) MeRPYS2; (d) MeRPYS3. L represents linker (GSGSG).
As illustrated in Fig 3, three recombinant proteins (MeRPYS1–3) were computationally designed to explore how genotype prevalence and EMAV distribution in Ethiopia could inform region-specific antigen architectures.
MeRPYS1 (42.68 kDa; 415 amino acids) was designed as a multimeric construct composed of ten HBsAg “a” determinant segments. The sequence begins with an N-terminal B-1-tag and ends with a C-terminal hexa-His-tag. Both wild-type and EMAV sequences from genotypes A and D were incorporated in the design. Each segment is separated from the next by a flexible GSGSG linker to minimize steric constraints. Among the ten incorporated segments, six EMAVs derived from genotype A were distributed across four segments, while eight EMAVs from genotype D were incorporated within two segments. The remaining four segments consisted of wild-type sequences, evenly drawn from both genotypes. This composition reflects an attempt to balance the higher circulation frequency of genotype A with the relatively higher EMAV burden observed in genotype D.
MeRPYS2 (42.66 kDa; 415 amino acids) followed a more conservative design strategy. The construct contains ten “a” determinant segments linked by the GSGSG spacers, preceded by a B-1-tag and terminated with a hexa-His-tag. Most of the segments were wild-type, with six derived from genotype A and two from genotype D. Only a single EMAV, P127T from genotype D, was included and represented twice in the construct. The selective inclusion of this variant was informed by its relatively high prevalence within genotype D. The predominance of genotype A wild-type segments reflects its higher overall representation among circulating Ethiopian HBV sequences.
The contrasting designs of MeRPYS1 and MeRPYS2 were informed by statistical analysis. Genotype A accounts for more than twice the proportion of circulating HBV sequences compared with genotype D (72.84% versus 27.16%), whereas genotype D harbors a substantially higher proportion of EMAV-containing sequences (69.78% versus 19.06%). Rather than favoring a single genotype or mutation profile, the two constructs were designed to explore different weighting strategies between genotype prevalence and mutation density. This generates comparative hypotheses for subsequent experimental evaluation.
MeRPYS3 (35.67 kDa; 338 amino acids) was designed as a structurally distinct construct focusing on the MHR rather than repeated “a” determinants. It includes four wild-type MHR segments, two from genotype A and two from genotype D, separated by the GSGSG linkers. The construct contains a combination of affinity-tags (Strep-tag II, hexa-His-tag and Twin-Strep-tag) at the N-terminus. Unlike MeRPYS1 and MeRPYS2, MeRPYS3 does not incorporate EMAVs and serves as a comparative design to explore genotype-balanced, mutation-free MHR representations.
Collectively, these designs reflect a structured, data-driven exploration of how region-specific genotype distribution and EMAV prevalence may be incorporated into recombinant antigen design using in silico approaches. The resulting constructs are intended to support hypothesis generation regarding antigen composition and representation, rather than to assert diagnostic performance, and warrant further experimental investigation.
The inclusion of a B-1-tag at the N-terminus of the MeRPYS1 and MeRPYS2 designs was intended to support exploratory expression of the recombinant proteins in E. coli. This tag may also enable protein detection by Western blotting using labeled anti-protein G secondary antibodies without the requirement for a primary antibody [29]. In the MeRPYS3 design, multiple fusion-tags were incorporated to facilitate comparative assessment of expression and purification behavior in a bacterial system [30]. The GSGSG linker sequences placed between segments may allow conformational freedom and reduce potential steric constraints during folding [31]. In addition, hexa-His-tags were included in all three constructs to facilitate affinity-based purification strategies, providing a theoretical framework for subsequent experimental evaluation [32].
In silico characterization
Protein homology tests.
We conducted homology tests on each computationally designed recombinant protein. None of the three proteins was predicted to show significant similarity to either human or mouse proteins. This lack of detectable sequence homology suggests that the constructs are distinct from host proteins at the sequence level, which may be relevant for reducing theoretical cross-reactivity. However, this observation is based solely on computational comparison and is intended to generate hypotheses regarding specificity, which require experimental evaluation to determine their practical significance.
The significance of homology testing in developing diagnostic assays is underscored in a previous study, highlighting the challenges of highly homologous genes in molecular diagnostics. The study also emphasizes that recombinant antigens with high homology to host proteins could lead to false-positive results during diagnosis [33].
Predicted B-cell epitopes.
The BCPRED tool predicted ten linear B-cell epitopes from each of MeRPYS1 and MeRPYS2 and eight from MeRPYS3. Each predicted epitope (20 amino acids in length) has an antigenicity score of 1.0.
MeRPYS1 contains five unique predicted linear B-cell epitopes, each occurring twice at different positions within the sequence. The predicted epitopes align with the design of MeRPYS1, which incorporates five distinct “a” determinant regions, each present in two copies at the corresponding positions.
MeRPYS2 contains three unique predicted epitopes. The first, present in six copies at different positions in the sequence, maps to the wild-type “a” determinant regions of genotype A. The second, present in two copies, maps to the wild-type “a” determinant regions of genotype D. The third, also present in two copies, lies within a genotype D “a” determinant region containing the P127T EMAV. These predictions are consistent with the design of MeRPYS2, which incorporates three different “a” determinant regions.
MeRPYS3 has four unique linear B-cell epitopes: two located in the MHRs of genotype A and two in the MHRs of genotype D. These predictions align with the design of MeRPYS3, which includes MHR sequences from both genotypes A and D.
Overall, the predicted distribution and characteristics of linear B-cell epitopes (Table 1) are consistent with the intended designs of the respective recombinant proteins.
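The occurrence counts described above can be checked by scanning a construct for every position at which each predicted epitope appears. The following is a minimal sketch, assuming the construct and epitope sequences are available as plain strings; the function name `epitope_positions` and the toy sequences are illustrative placeholders, not the MeRPYS sequences or BCPRED output.

```python
def epitope_positions(construct: str, epitopes: list[str]) -> dict[str, list[int]]:
    """Return 1-based start positions of every occurrence of each epitope."""
    positions = {}
    for epitope in set(epitopes):
        hits = []
        start = construct.find(epitope)
        while start != -1:
            hits.append(start + 1)
            # Resume one residue later so overlapping occurrences are found too.
            start = construct.find(epitope, start + 1)
        positions[epitope] = hits
    return positions


# Toy example (hypothetical sequences, not the MeRPYS constructs).
toy_construct = "GGSAAAKKKGGSCCCDDDGGSAAAKKK"
toy_hits = epitope_positions(toy_construct, ["AAAKKK", "CCCDDD"])
```

In this toy construct the first peptide is found at two positions and the second at one, which is the kind of tally summarized for each antigen in the text.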
Each predicted linear B-cell epitope in MeRPYS1 and MeRPYS2 corresponds to the first 19 or 20 amino acids of the respective “a” determinant. This suggests that each segment of the “a” determinant region potentially serves as a linear B-cell epitope within the antigens. This observation aligns with other studies that identified the “a” determinant as a key immunogenic and diagnostic target for HBV [34].
Each of the five distinct B-cell epitopes in MeRPYS1 may be recognized by at least one distinct antibody. Consequently, the entire antigen may support polyclonal antibody responses during animal inoculation [7]. The same reasoning may apply to the MeRPYS2 and MeRPYS3 antigens.
However, it is important to note that the predicted epitopes have not yet been validated experimentally. Antigenicity predictions do not always reflect actual antibody recognition because epitope exposure can be influenced by protein folding, conformational dynamics, and post-translational modifications in the eukaryotic expression system. Therefore, experimental validation is required before concluding that the predicted regions are truly antigenic. This limitation is consistent with the findings in a previous study, reporting that the sequence-based epitope prediction offers only modest discrimination between epitope and non-epitope residues. Accordingly, these predictions should be used primarily to prioritize or filter candidate antigens for subsequent experimental validation [35].
In silico prediction of conformational epitopes using ElliPro identified three to five discontinuous epitopes in each recombinant protein (Table 2). In MeRPYS1, the highest scoring conformational epitope (aa363–413, score 0.844) coincided with the C-terminal cluster of linear epitopes predicted by BCPRED. Similarly, in MeRPYS2, the largest conformational patch (aa320–413, score 0.764) overlapped multiple linear epitopes in the mid-to-C-terminal region. For MeRPYS3, ElliPro predicted multiple conformational epitopes (aa269–271, aa274; aa141–151, aa158–170, aa187–194, aa229–245, aa324, aa334) that corresponded closely with BCPRED-predicted linear epitopes.
A strong concordance was observed between linear and conformational epitope predictions across all three proteins, with most linear epitopes either fully encompassed by or overlapping ElliPro-predicted conformational patches (Fig 4). This alignment provides predictive insight into surface-exposed antigenic regions.
The areas colored yellow represent the conformational B-cell epitopes of each antigen.
The incorporation of repeated similar epitopes reflects a speculative computational design approach and does not constitute confirmatory evidence of enhanced epitope presentation or immune recognition [36]. Epitope duplication does not inherently confer immuno-dominance, which is a multifactorial phenomenon influenced by MHC binding affinity, antigen processing efficiency, epitope competition, T-cell repertoire, and host-specific immune regulation [37].
Epitope conservancy analysis result
The IEDB Epitope Conservancy Tool evaluated the conservancy of the predicted B-cell epitopes across 670 HBsAg sequences at an 80% identity threshold. The results showed that the majority of epitopes were conserved, with most achieving >95% sequence coverage at or above the threshold. Among the five epitopes predicted from MeRPYS1, four exhibited >95% coverage, while the remaining one displayed 94.4% coverage. All predicted epitopes from MeRPYS2 and MeRPYS3 demonstrated near-universal conservation (>98% coverage). The partial variability observed in MeRPYS1 epitopes may be attributable to the inclusion of EMAVs in the protein design. Overall, these findings indicate that the predicted epitopes are conserved across diverse HBsAg amino acid sequences.
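The notion of coverage at an identity threshold can be illustrated with a simplified calculation: for each sequence, the epitope is compared with every ungapped window of the same length, and the sequence counts as conserved if the best window reaches the threshold. This sketch is an assumption-laden approximation written for illustration and is not the IEDB implementation; the function names and toy peptides are hypothetical.

```python
def best_identity(epitope: str, sequence: str) -> float:
    """Highest fraction of identical residues over all ungapped windows."""
    k = len(epitope)
    if len(sequence) < k:
        return 0.0
    best = 0
    for i in range(len(sequence) - k + 1):
        matches = sum(a == b for a, b in zip(epitope, sequence[i:i + k]))
        best = max(best, matches)
    return best / k


def conservancy(epitope: str, sequences: list[str], threshold: float = 0.8) -> float:
    """Fraction of sequences with a window at or above the identity threshold."""
    conserved = sum(best_identity(epitope, s) >= threshold for s in sequences)
    return conserved / len(sequences)


# Toy data (hypothetical peptides, not the 670 HBsAg sequences).
toy_epitope = "CTKPSDGNCT"
toy_sequences = [
    "MMCTKPSDGNCTMM",  # exact match
    "MMCTKPSDGNAAMM",  # 8 of 10 residues identical
    "MMCTKAAAAAAAMM",  # 3 of 10 residues identical
    "LLLLLLLLLLLLLL",  # no match
]
toy_coverage = conservancy(toy_epitope, toy_sequences)
```

With an 80% threshold, the first two toy sequences count as conserved and the last two do not.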
Predicted T-helper cell epitopes
IEDB predicted potential T-helper epitopes from recombinant proteins. MHC class II binding predictions using common murine alleles resulted in a dominant set of epitopes with stable interaction patterns. Notably, the peptide SPSLNAAKSELAEAK (core: LNAAKSELA) was a potential binder across multiple alleles, exhibiting percentile ranks as low as 0.01. Overlapping epitopes, including KSPSLNAAKSELAEA and PSLNAAKSELAEAKK, were predicted to be potential binders to several alleles.
The conservation of these epitopes among the three recombinant proteins suggests that they may function as CD4⁺ T-cell targets, potentially facilitating B-cell activation and promoting the production of cross-reactive antibodies. However, the predicted T-cell epitopes have not yet been experimentally validated. These predictions are probabilistic rather than definitive, and therefore require experimental confirmation, as emphasized in a previous study by Fleri et al. [38].
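The three overlapping 15-mers reported above share the same 9-residue binding core, which is what a sliding-window scan over a contiguous stretch would produce. A minimal sketch, assuming the three peptides derive from one contiguous 17-residue fragment (assembled here from their overlaps); the function name is illustrative.

```python
def windows_with_core(parent: str, core: str, length: int = 15) -> list[str]:
    """All peptides of the given length from `parent` that fully contain `core`."""
    return [
        parent[i:i + length]
        for i in range(len(parent) - length + 1)
        if core in parent[i:i + length]
    ]


# Fragment assembled from the three overlapping peptides reported in the text.
fragment = "KSPSLNAAKSELAEAKK"
peptides = windows_with_core(fragment, "LNAAKSELA")
```

Because neighbouring windows differ only in their flanking residues, similar predicted binding across them is expected and does not represent three independent epitopes.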
Expression feasibility
The SOLpro solubility prediction indicated that all the proteins are likely to be soluble upon expression in E. coli, with predicted solubility probabilities of 0.813 (MeRPYS1), 0.839 (MeRPYS2), and 0.980 (MeRPYS3). Scores above the 0.5 threshold suggest a high likelihood of soluble protein expression in E. coli.
The Aggrescan aggregation prediction revealed that there is low aggregation propensity for all proteins. MeRPYS1 contains six hot spots (HS) with a normalized aggregation score (Na4vSS) of –20.1. MeRPYS2 has four HS and a Na4vSS of –20.2. MeRPYS3 also contains four HS and displayed the highest area above threshold (AAT = 34.128), but the Na4vSS (–4.1) remained negative. Overall, the outputs from these analyses suggest that the proteins are predicted to be largely non-aggregation-prone, with only MeRPYS3 showing a slightly higher aggregation tendency.
The SignalP (v6.0) server predicted no signal peptides in any of the antigens, suggesting that none of the proteins is directed through the secretory pathway.
Each protein sequence was further subjected to codon optimization using the JCat server for expression in E. coli. The codon optimized sequences demonstrated a CAI value of one for all the proteins, reflecting potential compatibility with the E. coli codon usage. The GC contents (54.6% for MeRPYS1, 54.8% for MeRPYS2 and 57.9% for MeRPYS3) were within the optimal range for efficient gene transcription and translation in E. coli [39].
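For orientation, GC content and the codon adaptation index (CAI) can be computed directly from a coding sequence. CAI is the geometric mean of the relative adaptiveness of each codon, so a sequence built only from the most preferred codons has a CAI of one. The sketch below assumes a user-supplied weight table; the weights and sequences shown are hypothetical and do not represent the E. coli usage table applied by JCat.

```python
import math


def gc_content(dna: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


def cai(dna: str, weights: dict[str, float]) -> float:
    """Geometric mean of the relative adaptiveness w of each codon."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    codons = [dna[i:i + 3].upper() for i in range(0, usable, 3)]
    log_sum = sum(math.log(weights[c]) for c in codons)
    return math.exp(log_sum / len(codons))


# Hypothetical weights for illustration only (not a real E. coli usage table).
toy_weights = {"CTG": 1.0, "CTA": 0.1, "AAA": 1.0, "AAG": 0.3}
optimized = "CTGAAACTGAAA"     # only preferred codons, so CAI equals 1
unoptimized = "CTAAAGCTGAAA"   # two non-preferred codons lower the CAI
```

A CAI of one therefore reflects the optimization procedure itself and does not, on its own, predict expression yield.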
Predicted secondary structures
The PSIPRED tool was used to predict the secondary structure of each designed protein, revealing proportions of alpha helices, extended strands, and random coils (Fig 5a–5c). These predictions highlight computational trends in the distribution of secondary structural elements and suggest regions of potential rigidity and flexibility within the modeled sequences. While these observations provide insights into structural patterns that may influence protein behavior, they are predictive in nature and do not confirm functional conformation in a biological environment [40].
(a) MeRPYS1 (b) MeRPYS2 (c) MeRPYS3. The areas shaded pink, yellow and gray represent helix, strand and coil, respectively.
The proportions of helix, strand, and coil in the secondary structure of MeRPYS1 were 15%, 7%, and 78%, respectively. In MeRPYS2, these values were 15%, 5%, and 80%; while in MeRPYS3, they were 15%, 15%, and 70%, respectively.
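Proportions of this kind follow from counting the per-residue states in the predicted secondary-structure string, where PSIPRED labels helix, strand and coil as H, E and C. A minimal sketch with a hypothetical 20-residue state string (not a prediction for MeRPYS1–3):

```python
from collections import Counter


def ss_proportions(ss_string: str) -> dict[str, float]:
    """Percentage of helix (H), strand (E) and coil (C) states."""
    counts = Counter(ss_string)
    total = len(ss_string)
    return {state: 100.0 * counts.get(state, 0) / total for state in "HEC"}


# Toy state string (hypothetical, for illustration only).
toy_ss = "CCCHHHCCCCEECCCCCCCC"
toy_props = ss_proportions(toy_ss)
```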
Our findings align with a study published on designing a multi-epitope-based vaccine targeting the M-protein of SARS-CoV-2, where PSIPRED predicted the secondary structure of the construct. The study reported a proportion of helices (39%), strands (16%) and coils (44%) [41]. Similarly, PSIPRED was employed to predict secondary structure in the design of a multi-epitope vaccine against SARS-CoV-2, emphasizing the significance of structural analysis [42].
Predicted tertiary structures
The I-TASSER web server generated five PDB-format tertiary structures for each computationally designed protein. For MeRPYS1, the model with the highest C-score (−1.26) and TM-score (0.56 ± 0.15), along with the lowest RMSD (9.8 ± 4.6 Å), was selected for further analysis. The C-score and TM-score of the selected model suggest a globally plausible topology, while the high RMSD value indicates potential deviations in local regions. Overall, these metrics indicate that I-TASSER identified structurally related but distant templates. This provides computational support for the overall fold while highlighting regions that may have been modeled with lower confidence. These observations are intended to guide hypotheses for subsequent structural and experimental analyses, rather than confirming functional or structural accuracy.
I-TASSER threading analysis identified PDB entry ID: 8wxbY as the closest structural analog for MeRPYS1, with a TM-score of 0.832 and RMSD of 2.93 Å, covering approximately 90% of the sequence. Although the overall sequence identity was low (~10%), the relatively high TM-score suggests that the global fold was reasonably captured by the modeling approach. The remaining ~10% of the sequence, mainly loop and terminal regions, lacked strong template coverage and was therefore modeled ab initio. These regions likely contributed to the higher global RMSD (9.8 ± 4.6 Å) and should be considered lower-confidence areas within the predicted model. Similarly, for MeRPYS2, the model with the highest C-score (−1.46) was selected, providing a computationally guided representation of the possible global fold, which can be used to generate hypotheses for further structural and experimental validation.
For MeRPYS3, the model with the highest C-score (−2.56) and TM-score (0.42 ± 0.14), along with the lowest RMSD (12.5 ± 4.3 Å), was selected for further analysis. Template analysis indicated structural similarity with several experimentally validated proteins. The top five ranked analogs (PDB: 6mk2A, 4ke4A, 4g0bA, 6lpvA, and 6dd2A) displayed TM-scores greater than 0.87, RMSD values between 1.0–3.0 Å, and template coverage exceeding 95%. These metrics provide computational support for the plausibility of the overall fold and suggest potential alignment with known structural patterns. Core secondary-structure elements showed higher confidence and alignment with the template structures, while certain surface-exposed loops and terminal regions were modeled with lower accuracy, reflecting areas of lower confidence. Overall, these observations highlight predictive structural trends that can guide hypotheses for further computational and experimental validation.
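The coexistence of an acceptable TM-score with a large RMSD follows from how the two metrics weight residue deviations: RMSD averages squared distances, so a few badly placed residues dominate, whereas the TM-score down-weights large distances through a length-dependent scale d0. The sketch below evaluates both for one fixed superposition using the standard TM-score definition; the per-residue distances are hypothetical, and the full TM-score additionally maximizes over superpositions, which is omitted here.

```python
import numpy as np


def tm_score(distances: np.ndarray, target_length: int) -> float:
    """TM-score for one fixed superposition, given per-residue distances in Å."""
    # Length-dependent distance scale; intended for chains well above 15 residues.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return float(np.sum(1.0 / (1.0 + (distances / d0) ** 2)) / target_length)


def rmsd(distances: np.ndarray) -> float:
    """Root-mean-square of the per-residue distances."""
    return float(np.sqrt(np.mean(distances ** 2)))


# Hypothetical 100-residue model: 90 well-placed residues, 10 badly placed ones.
toy_distances = np.array([1.0] * 90 + [30.0] * 10)
toy_tm = tm_score(toy_distances, 100)
toy_rmsd = rmsd(toy_distances)
```

In this toy case the RMSD is about 9.5 Å while the TM-score remains about 0.84, mirroring the situation in which poorly modeled loops or termini inflate the RMSD without changing the global fold.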
All of the selected models were visualized by PyMOL software (Fig 6a-6c), showing the fusion tag, the linker peptides and protein domains within the tertiary structures.
Tertiary structures: (a) MeRPYS1 (b) MeRPYS2 (c) MeRPYS3; Refined structures: (d) MeRPYS1 (e) MeRPYS2 (f) MeRPYS3; protein domains: light blue, deep blue, green; B-1-tags: magenta; flexible linkers: orange; His-tags: red; and Strep-tag II: yellow.
The GALAXY Refine server generated five refined structures per protein from the selected I-TASSER predicted model. The refined structures varied in GDT-HA, RMSD, MolProbity score, clash score, poor rotamers and Ramachandran favored regions. The model with the highest GDT-HA value was selected for each protein (Table 3).
The refined models were visualized by PyMOL software (Fig 6d–6f). Evaluation of the refined models using ProSA provided Z-scores (Table 3), and the generated images illustrate the overall structural quality of each model (Fig 7a–7c).
(a) MeRPYS1, (b) MeRPYS2, and (c) MeRPYS3.
PROCHECK via the PDBsum web server evaluated the stereo-chemical accuracy of each selected structure and then generated corresponding Ramachandran plots (Fig 8a–8c). The Ramachandran plots allowed us to determine the predicted distribution of residues within the favored, allowed and disallowed regions, providing further validation of the structural quality of each protein.
(a) MeRPYS1, (b) MeRPYS2, and (c) MeRPYS3. Red areas (most favored regions); brown areas (additionally allowed regions); yellow areas (generously allowed regions); white areas (disallowed regions); and blue dots (amino acid residues).
As shown in Table 4, the majority of residues (>95%) were located within the favored and allowed regions of the respective Ramachandran plots, with fewer than 5% in disallowed regions. These results suggest that the modeled tertiary structures of the proteins exhibit acceptable stereo-chemical quality under the applied computational assessment, providing predictive information for further in silico analyses and experimental validations. This observation is consistent with trends reported in previous studies on multi-epitope vaccine design and immune-molecular analyses of divergent human papillomavirus types. This could serve to generate hypotheses regarding the structural plausibility of the designed proteins rather than confirming their functional suitability [40].
Other studies have also highlighted the effective use of I-TASSER for structure prediction combined with Ramachandran plot analysis for structural validation. For example, a study on the LuxI protein utilized I-TASSER for predicting tertiary structures, followed by validation using Ramachandran plots. The analysis revealed that over 90% of the residues were located in favored regions [43].
Molecular Docking
Among the 30 molecular docking results generated by ClusPro for each protein, the model with the highest population coverage and lowest relative docking score was selected for further analysis. These selected models illustrate predicted antigen–antibody interaction patterns and provide predictive computational insights into possible binding orientations. Details of the selected models, including cluster members, cluster centers, and unitless relative docking scores, are presented in Table 5. It is important to note that these docking scores do not represent absolute binding energies, but rather serve as relative metrics for comparing predicted binding orientations across models. The results are intended to generate hypotheses about potential interactions, which require experimental validation to determine biological relevance.
A previous study demonstrated that lower relative free energy values are indicative of more stable interaction patterns compared with binding poses exhibiting higher relative free energy values [44], supporting our findings.
The three selected PDB files (one for each protein) from ClusPro generated models were visualized using PyMOL, highlighting key interactions as shown in Fig 9 below.
(a) MeRPYS1-anti-HBs (b) MeRPYS2-anti-HBs (c) MeRPYS3-anti-HBs. Magenta regions (recombinant proteins); green regions (heavy chains of anti-HBs antibody); and blue regions (light chains of anti-HBs antibody).
Molecular dynamics simulations
MD simulations performed using GROMACS were used to examine the time-dependent behavior of atoms and molecules within the MeRPYS1–3–anti-HBs modeled complexes. These simulations provided computational insights into the theoretical structural dynamics and interaction patterns of the models under simulated conditions. Analysis of the resulting trajectories (Fig 10), including RMSD, RMSF, Rg, SASA, and FEL metrics, suggests potential trends in structural behavior and interactions, which may guide further experimental studies rather than demonstrating functional binding or stability.
As shown in Fig 10, RMSD values for the modeled MeRPYS1–3–anti-HBs complexes remained below 1.75 Å over the 110 ns production trajectories. The values reached a plateau shortly after the equilibration phase. The relatively low RMSD values indicate limited deviation of the backbone-defined global conformations within the simulation framework. As RMSD was calculated using backbone atoms only, these values do not reflect the behavior of flexible regions nor imply structural rigidity. Instead, the observations suggest preservation of the docked conformations under the applied simulation conditions, providing computationally derived trends that may inform hypotheses regarding possible interaction behaviors. These results do not constitute evidence of functional stability or binding and require experimental validation to assess their biological relevance.
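Backbone RMSD of this kind is computed after optimally superimposing each frame onto a reference structure. The following NumPy sketch uses the Kabsch algorithm to illustrate the calculation; it is not the GROMACS implementation, the coordinates are random placeholders, and GROMACS itself reports lengths in nm (1 nm = 10 Å).

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p = mobile - mobile.mean(axis=0)        # remove translation
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)       # SVD of the covariance matrix
    d = np.sign(np.linalg.det(u @ vt))
    u[:, -1] *= d                           # avoid an improper rotation (reflection)
    rotation = u @ vt
    diff = p @ rotation - q
    return float(np.sqrt((diff ** 2).sum() / len(q)))


# Placeholder coordinates: a rigidly rotated and shifted copy, then a noisy copy.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 3))
angle = 0.7
rotation_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle), np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
moved = reference @ rotation_z + np.array([1.0, -2.0, 3.0])
rmsd_rigid = kabsch_rmsd(moved, reference)
rmsd_noisy = kabsch_rmsd(moved + rng.normal(scale=0.5, size=moved.shape), reference)
```

A rigid-body motion alone gives an RMSD of essentially zero, so the reported values reflect internal backbone rearrangements rather than tumbling or drift of the complex.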
Per-residue RMSF analysis revealed an average predicted fluctuation of approximately 0.75 Å, reflecting modest thermal motion consistent with folded globular proteins. Observed RMSF peaks were primarily located in terminal regions and solvent-exposed loops, which are inherently flexible. In contrast, core secondary-structure elements and residues involved in predicted epitope contacts showed lower fluctuations, suggesting relative structural stability in these regions under the simulated conditions. These observations highlight computational trends in the modeled complexes and may inform hypotheses regarding potential packing and interaction patterns at the antigen–antibody interface. This requires experimental validation to assess their biological significance.
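Per-residue RMSF is the root-mean-square deviation of each atom from its own time-averaged position. A minimal sketch, assuming the trajectory has already been aligned and is stored as an array of shape (frames, atoms, 3); the two-atom trajectory is a hypothetical example.

```python
import numpy as np


def rmsf(trajectory: np.ndarray) -> np.ndarray:
    """Per-atom RMSF from an aligned trajectory of shape (frames, atoms, 3)."""
    mean_positions = trajectory.mean(axis=0)
    squared_dev = ((trajectory - mean_positions) ** 2).sum(axis=2)
    return np.sqrt(squared_dev.mean(axis=0))


# Toy trajectory: atom 0 is fixed, atom 1 oscillates by ±1 along x.
toy_traj = np.zeros((4, 2, 3))
toy_traj[:, 1, 0] = [1.0, -1.0, 1.0, -1.0]
toy_rmsf = rmsf(toy_traj)
```

The fixed atom has an RMSF of zero and the oscillating atom an RMSF of one, which is the pattern expected for rigid core residues versus flexible loops or termini.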
The Rg values oscillated between approximately 20 and 22 Å throughout the simulation, without sustained upward drift, reflecting typical “breathing” motions of the complexes while maintaining overall compactness. These Rg patterns indicate that the global fold of the antigens was largely preserved under simulated conditions, providing computational insights into their structural behavior.
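The radius of gyration is the mass-weighted root-mean-square distance of the atoms from the centre of mass, so it rises when a structure expands and falls when it compacts. A minimal sketch for a single frame, with hypothetical coordinates and masses:

```python
import numpy as np


def radius_of_gyration(coords: np.ndarray, masses: np.ndarray) -> float:
    """Mass-weighted radius of gyration for one frame of (N, 3) coordinates."""
    center_of_mass = np.average(coords, axis=0, weights=masses)
    squared_dist = ((coords - center_of_mass) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(squared_dist, weights=masses)))


# Toy system: two equal masses placed one unit either side of the origin.
toy_coords = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
toy_masses = np.array([12.0, 12.0])
toy_rg = radius_of_gyration(toy_coords, toy_masses)
```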
Consistently, SASA values fluctuated between 120 and 140 nm², paralleling the Rg dynamics and suggesting minor, reversible surface rearrangements. These fluctuations may correspond to transient motions in flexible loops. Together with Rg observations, these results highlight trends in simulated structural dynamics and can be used to generate hypotheses regarding the potential folding behavior and surface properties of the modeled complexes.
Finally, the FELs, projected onto two coordinates (RMSD and Rg), displayed a single dominant basin with relatively shallow barriers (<1.0 a.u.) for all modeled complexes. Relative free-energy values ranged from 0 to 0.96 a.u. and were derived from probability density distributions of conformational states, intended for comparative visualization of conformational trends rather than absolute energetic quantification. These observations suggest that the sampled conformations are closely related within the simulation framework and separated by low energetic barriers, highlighting computational trends in macromolecular behavior. The smooth landscape may inform hypotheses regarding possible transitions between energetically similar sub-states, without implying confirmed structural stability or functional patterns in biological systems.
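A relative free-energy surface of this type can be obtained by Boltzmann inversion of the joint probability density of two coordinates, setting the most populated bin to zero. The sketch below expresses the result in units of kT; this scaling is an assumption, since the normalization behind the reported arbitrary units is not specified, and the RMSD and Rg samples are random placeholders.

```python
import numpy as np


def free_energy_landscape(x, y, bins=30):
    """Relative free energy (units of kT) on a 2D grid; empty bins become NaN."""
    hist, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        energy = -np.log(prob / prob.max())   # most populated bin is set to 0
    energy[~np.isfinite(energy)] = np.nan     # unsampled bins have no estimate
    return energy, x_edges, y_edges


# Placeholder samples standing in for per-frame RMSD and Rg values.
rng = np.random.default_rng(1)
toy_rmsd = rng.normal(1.5, 0.1, size=5000)
toy_rg = rng.normal(21.0, 0.3, size=5000)
energy, rmsd_edges, rg_edges = free_energy_landscape(toy_rmsd, toy_rg)
```

Because the surface is built from sampled populations, regions that the single trajectory never visited carry no information, and barrier heights depend on bin size and sampling length.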
In the MeRPYS1–anti-HBs complex, Glu45:NH and Arg102:NH were observed to form hydrogen bonds with occupancies of approximately 87.9% and 76.4%, respectively, and average lifetimes of ~11–14 ns under the simulation conditions. In MeRPYS2, hydrogen bonds exhibited lower persistence, with occupancies ranging from 40–60% and lifetimes shorter than 10 ns. In MeRPYS3, Glu45:NH and Arg102:NH showed occupancies greater than 87%, with average lifetimes exceeding 16 ns and a maximum continuous lifetime of over 40 ns. Additionally, Tyr33:OH displayed a hydrogen-bond occupancy of approximately 75%. These observations highlight computational trends in hydrogen-bond formation and occupancy within the modeled complexes. They may be used to generate hypotheses regarding potential interactions and structural behavior at the antigen–antibody interface, while experimental validation is required to assess their biological relevance.
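Occupancy and continuous lifetime can both be derived from a per-frame record of whether a given hydrogen bond is present. The sketch below uses a simple run-length definition of lifetime, which is an assumption made for illustration; autocorrelation-based lifetimes, as some analysis tools report, can differ. The boolean series and the 1 ns frame spacing are hypothetical.

```python
import numpy as np


def hbond_stats(present, dt_ns: float) -> tuple[float, float, float]:
    """Occupancy (%), mean and maximum continuous lifetime (ns) of one H-bond."""
    present = np.asarray(present, dtype=bool)
    occupancy = float(100.0 * present.mean())
    # Pad with False so every run of True frames has a start and an end.
    padded = np.concatenate(([False], present, [False])).astype(int)
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    runs = (ends - starts) * dt_ns
    if runs.size == 0:
        return occupancy, 0.0, 0.0
    return occupancy, float(runs.mean()), float(runs.max())


# Toy series: bond present in 7 of 10 frames, in runs of 3, 2 and 2 frames.
toy_present = [1, 1, 1, 0, 1, 1, 0, 0, 1, 1]
toy_occupancy, toy_mean_life, toy_max_life = hbond_stats(toy_present, dt_ns=1.0)
```

High occupancy with short lifetimes indicates a bond that breaks and reforms frequently, whereas high occupancy with long lifetimes indicates a persistent contact, which is why both quantities are reported.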
The convergence of all evaluated parameters (RMSD, RMSF, Rg, SASA, and FEL) over the 110 ns trajectories suggests that the simulation length was sufficient to observe consistent trends in the modeled complexes. Collectively, these analyses highlight computationally derived patterns in structural behavior and dynamics. Observed features, including relatively low RMSD and RMSF values, stable Rg and SASA profiles, and shallow, single-basin FELs, indicate small-amplitude motions consistent with thermal fluctuations under the simulated conditions. These trends are further complemented by hydrogen-bond analysis, which identifies potential interfacial contacts that may guide hypotheses regarding interactions at the antigen–antibody interface.
Recent studies have employed similar analyses to evaluate the stability and rigidity of vaccine candidates. For instance, a study on a novel dual-pathogen multi-epitope mRNA vaccine carried out MD simulations to assess the stability of the vaccine-TLR4 complex. The findings indicated minimal distortion in the residues of the complex showing less deviation, which supports the stability of the interaction between the mRNA vaccine and TLR4 [45]. Another study on the SARS-CoV-2 main protease and inhibitors complexes utilized MD simulations, providing insights into the stiffness and stability of the protein-ligand complexes [46].
A study on the polymerase protein for multi-epitope vaccine prediction against HBV also emphasized the importance of protein stability in vaccine design. It underscores that the stability of the DNA polymerase protein is crucial for inducing immune responses and developing effective vaccines [47]. Another study that employed MD simulations on azidolysozyme analyzed cross-correlated motions to understand the dynamic behavior of the protein. The study highlighted the significance of correlated and anti-correlated motions in protein function and stability [48].
The findings of these studies are consistent with the computational trends observed in our study. Antigen–antibody complexes analyzed in MD simulations, including our designs, exhibited modeled patterns of structural behavior and interaction tendencies. These observations may inform hypotheses regarding potential structural organization and interaction dynamics, but they do not provide evidence of confirmed rigidity, stability, or functional activity, which would require experimental validation.
Conclusion
This study aimed to identify EMAVs circulating in Ethiopia and incorporate them into the design of region-specific recombinant proteins to guide future diagnostic development. Using in silico approaches, prevalent EMAVs were identified, and three recombinant proteins were designed incorporating both wild-type and EMAV sequences. Predicted B-cell and T helper cell epitopes were mapped within the designed proteins, and computational assessments indicated that the proteins could potentially be expressed in E. coli and purified in soluble forms. Structural modeling suggested that the antigens may adopt coherent secondary and tertiary conformations, while molecular docking and subsequent MD simulations highlighted predicted trends in potential interactions between the designed proteins and the anti-HBs antibody. Collectively, these findings generate testable hypotheses for further experimental investigation, including protein expression and purification, antibody production in animal models, and evaluation in diagnostic assays, without implying confirmed functional performance.
Limitation of the study
We analyzed publicly available HBsAg amino acid sequence data from Ethiopian HBV isolates retrieved from the NCBI database; no new sampling or sequencing was performed. Since the dataset depends on the scope of local surveillance and reporting, it may carry geographic sampling bias and potentially over-represent some regions of the country.
Some limitations exist within the molecular characterization of the computationally designed recombinant proteins. T-cell epitope modeling was restricted to murine MHC class II alleles, and extrapolation to human immune responses will require validation using human HLA molecules.
Another key limitation of this study is that molecular docking and MD simulations were performed using a single anti-HBs antibody structure with only one simulation per antigen–antibody complex and without replica runs. Binding free energy calculations were not performed. Consequently, the analyses provide qualitative, predictive insights into potential antigen–antibody interactions rather than quantitative measures of binding affinity or immune-reactivity.
This study employed an integrated computational workflow to design and evaluate recombinant protein constructs using multiple in silico tools. The findings are entirely prediction-based and should be interpreted as hypothesis-generating. The proposed antigens have not yet undergone experimental validation, including protein expression, purification, structural characterization, or immunological assessment. Therefore, further experimental studies are required to evaluate these constructs. Such studies could include testing in specific diagnostic formats, such as ELISA or lateral flow assays, to assess their practical utility and performance relative to established reagents.
Future directions
In our future work, the designed constructs will be expressed in E. coli and purified by Ni-NTA affinity chromatography. Their antigenicity will be evaluated by ELISA using panels of Ethiopian HBV-positive sera and HBV-negative controls. In addition, the purified antigens will be used to immunize Swiss albino mice for the production of polyclonal antibodies. Immunological diagnostic assays will be developed and optimized and their clinical performance will be compared with marketed HBV tests.
Supporting information
S1 File. Frequencies of EMAVs in the MHR of HBV genotypes A and D circulating in Ethiopia.
https://doi.org/10.1371/journal.pone.0344362.s001
(PDF)
S1 Fig. Molecular Docking Plots of MeRPYS1–3-anti-HBs complexes, showing predicted binding residues of the antigen and antibody.
https://doi.org/10.1371/journal.pone.0344362.s002
(TIF)
References
- 1. Centers for Disease Control and Prevention CDC. Viral Hepatitis Surveillance and Case Management - Hepatitis B. CDC. 2021. https://www.cdc.gov/hepatitis/statistics/surveillanceguidance/HepatitisB.htm#:~:text=Approximately
- 2. WHO. Global hepatitis report, 2017. World Health Organization. 2017. http://apps.who.int/iris/bitstream/handle/10665/255016/9789241565455%20eng.pdf;jsessionid=0B7781CE8BA27156EA78C5978103E24A?sequence=1
- 3. WHO. Global hepatitis report 2024. 2024. https://www.who.int/publications/i/item/9789240091672
- 4. Magnius L, Mason WS, Taylor J, Kann M, Glebe D, Dény P, et al. ICTV virus taxonomy profile: Hepadnaviridae. J Gen Virol. 2020;101(6):571–2.
- 5. Lamontagne RJ, Bagga S, Bouchard MJ. Hepatitis B virus molecular biology and pathogenesis. Hepatoma Res. 2016;2:163–86. pmid:28042609
- 6. Tsukuda S, Watashi K. Hepatitis B virus biology and life cycle. Antiviral Res. 2020;182:104925. pmid:32866519
- 7. He C, Liu Y, Jiang X, Xu Z, Xiang Z, Lu Z. Frequency of HBsAg variants in occult hepatitis B virus infected patients and detection by ARCHITECT HBsAg quantitative. Front Cell Infect Microbiol. 2024;14:1368473. pmid:38766475
- 8. Khedive A, Norouzi M, Ramezani F, Karimzadeh H, Alavian SM, Malekzadeh R, et al. Hepatitis B virus surface protein mutations clustered mainly in CTL immune epitopes in chronic carriers: results of an Iranian nationwide study. J Viral Hepat. 2013;20(7):494–501. pmid:23730843
- 9. Chaouch H, Taffon S, Villano U, Equestre M, Bruni R, Belhadj M, et al. Naturally occurring surface antigen variants of hepatitis b virus in tunisian patients. Intervirology. 2016;59(1):36–47.
- 10. Velkov S, Ott JJ, Protzer U, Michler T. The Global Hepatitis B virus genotype distribution approximated from available genotyping data. Genes (Basel). 2018;9(10):495. pmid:30326600
- 11. Fernandes da Silva C, Keeshan A, Cooper C. Hepatitis B virus genotypes influence clinical outcomes: A review. Can Liver J. 2023;6(3):347–52. pmid:38020195
- 12. Hundie GB, Raj VS, Michael DG, Pas SD, Osterhaus ADME, Koopmans MP. Molecular epidemiology and genetic diversity of hepatitis B virus in Ethiopia. Journal Title Abbreviation. 2016;1043(12):1035–43.
- 13. Ambachew H, Zheng M, Pappoe F, Shen J, Xu Y. Genotyping and sero-virological characterization of hepatitis B virus (HBV) in blood donors, Southern Ethiopia. PLoS One. 2018;13(2):e0193177. pmid:29462187
- 14. Belyhun Y, Maier M, Liebert UG. HIV therapy with unknown HBV status is responsible for higher rate of HBV genome variability in Ethiopia. Antivir Ther. 2017;22(2):97–111. pmid:27354181
- 15. Deressa T, Damtie D, Fonseca K, Gao S, Abate E, Alemu S, et al. The burden of hepatitis B virus (HBV) infection, genotypes and drug resistance mutations in human immunodeficiency virus-positive patients in Northwest Ethiopia. PLoS One. 2017;12(12):e0190149. pmid:29281718
- 16. Hundie GB, Stalin Raj V, Gebre Michael D, Pas SD, Koopmans MP, Osterhaus ADME, et al. A novel hepatitis B virus subgenotype D10 circulating in Ethiopia. J Viral Hepat. 2017;24(2):163–73. pmid:27808472
- 17. Meier-Stephenson V, Deressa T, Genetu M, Damtie D, Braun S, Fonseca K, et al. Prevalence and molecular characterization of occult hepatitis B virus in pregnant women from Gondar, Ethiopia. Can Liver J. 2020;3(4):323–33. pmid:35990510
- 18. Memirie ST, Desalegn H, Naizgi M, Nigus M, Taddesse L, Tadesse Y, et al. Introduction of birth dose of hepatitis B virus vaccine to the immunization program in Ethiopia: an economic evaluation. Cost Eff Resour Alloc. 2020;18:23. pmid:32704237
- 19. WHO. WHO guidelines on hepatitis B and C testing. Geneva: World Health Organization; 2017. Licence: CC BY-NC-SA 3.0 IGO.
- 20. WHO. WHO Prequalification of In Vitro Diagnostics Public Report Product: Determine HBsAg 2. 2019. https://extranet.who.int/prequal/WHOPR/public-report-determine-hbsag-2-pqdx-0451-013-00
- 21. Orlien SMS, Ahmed TA, Ismael NY, Berhe Belay N, Kran A-MB, Gundersen SG, et al. Field performance of HBsAg rapid diagnostic tests in rural Ethiopia. J Virol Methods. 2021;289:114061. pmid:33388369
- 22. Olotu FA, Soliman MES. Immunoinformatics prediction of potential B-cell and T-cell epitopes as effective vaccine candidates for eliciting immunogenic responses against Epstein-Barr virus. Biomed J. 2021;44(3):317–37. pmid:34154948
- 23. Ponomarenko J, Bui H-H, Li W, Fusseder N, Bourne PE, Sette A, et al. ElliPro: a new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics. 2008;9:514. pmid:19055730
- 24. An Y, Ali SL, Liu Y, Abduldayeva A, Ni R, Li Y, et al. CP91110P: A computationally designed multi-epitope vaccine candidate for tuberculosis via TLR-2/4 Synergistic Immunomodulation. Biology (Basel). 2025;14(9):1196. pmid:41007341
- 25. Dagnew M, Moges F, Tiruneh M, Million Y, Gelaw A, Adefris M, et al. Molecular diversity of hepatitis B virus among pregnant women in Amhara National Regional State, Ethiopia. PLoS One. 2022;17(11):e0276687. pmid:36378635
- 26. Ali MJ, Shah PA, Rehman KU, Kaur S, Holzmayer V, Cloherty GA, et al. Immune-escape mutations are prevalent among patients with a coexistence of HBsAg and Anti-HBs in a tertiary liver center in the United States. Viruses. 2024;16(5):713. pmid:38793596
- 27. Osasona OG, Oguntoye OO, Arowosaye AO, Abdulkareem LO, Adewumi MO, Happi C, et al. Patterns of hepatitis b virus immune escape and pol/rt mutations across clinical cohorts of patients with genotypes a, e and occult hepatitis b infection in Nigeria: A multi-centre study. Virulence. 2023;14(1):2218076. pmid:37262110
- 28. Liu H, Chen S, Liu X, Lou J. Effect of S-region mutations on HBsAg in HBsAg-negative HBV-infected patients. Virol J. 2024;21(1):92. pmid:38654327
- 29. Song S-J, Diao H-P, Moon B, Yun A, Hwang I. The B1 domain of streptococcal protein G serves as a multi-functional tag for recombinant protein production in plants. Front Plant Sci. 2022;13:878677. pmid:35548280
- 30. Köppl C, Lingg N, Fischer A, Kröß C, Loibl J, Buchinger W. Fusion Tag Design Influences Soluble Recombinant Protein Production in Escherichia coli. Int J Mol Sci. 2022;23(14):1–12.
- 31. de Souza MQ, Galdino AS, dos Santos JC, Soares MV, de Nóbrega YC, Alvares A da CM, et al. A recombinant multiepitope protein for hepatitis B diagnosis. Biomed Res Int. 2013;2013:148317. pmid:24294596
- 32. Jayakrishnan A, Rosalina W, Rosli W, Rashidi A, Tahir M, Syafiq F. Evolving paradigms of recombinant protein production in pharmaceutical industry: A rigorous review. Sci. 2024;6(9):1–24.
- 33. Mandelker D, Schmidt RJ, Ankala A, McDonald Gibson K, Bowser M, Sharma H, et al. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing. Genet Med. 2016;18(12):1282–9. pmid:27228465
- 34. Chaouch H, Taffon S, Villano U, Equestre M, Bruni R, Belhadj M, et al. Naturally occurring surface antigen variants of hepatitis B virus in Tunisian patients. Intervirology. 2016;59(1):36–47. pmid:27544241
- 35. Jespersen MC, Peters B, Nielsen M, Marcatili P. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res. 2017;45(W1):W24–9. pmid:28472356
- 36. Baneshi M, Sadeghi M, Hashemi S, Alcedo C, Muro A, Manzano-Román R, et al. Design of a novel multi epitope antigen for diagnosis of visceral leishmaniasis using an immunoinformatics approach. Sci Rep. 2025;15(1):39308. pmid:41214019
- 37. Yang G, Wang J, Sun P, Qin J, Yang X, Chen D, et al. SARS-CoV-2 epitope-specific T cells: Immunity response feature, TCR repertoire characteristics and cross-reactivity. Front Immunol. 2023;14:1146196. pmid:36969254
- 38. Fleri W, Paul S, Dhanda SK, Mahajan S, Xu X, Peters B, et al. The Immune Epitope Database and Analysis Resource in Epitope Discovery and Synthetic Vaccine Design. Front Immunol. 2017;8:278. pmid:28352270
- 39. Demissie EA, Park S-Y, Moon JH, Lee D-Y. Comparative Analysis of Codon Optimization Tools: Advancing toward a Multi-Criteria Framework for Synthetic Gene Design. J Microbiol Biotechnol. 2025;35:e2411066. pmid:40223268
- 40. Ehsasatvatan M, Baghban Kohnehrouz B. Designing and immunomolecular analysis of a new broad-spectrum multiepitope vaccine against divergent human papillomavirus types. PLoS One. 2024;19(12):e0311351. pmid:39621646
- 41. Ayyagari VS, T C V, K AP, Srirama K. Design of a multi-epitope-based vaccine targeting M-protein of SARS-CoV2: an immunoinformatics approach. J Biomol Struct Dyn. 2022;40(7):2963–77. pmid:33252008
- 42. Sarvmeili J, Baghban Kohnehrouz B, Gholizadeh A, Shanehbandi D, Ofoghi H. Immunoinformatics design of a structural proteins driven multi-epitope candidate vaccine against different SARS-CoV-2 variants based on fynomer. Sci Rep. 2024;14(1):10297. pmid:38704475
- 43. Al-Khayyat MZS, Al-Dabbagh AGA. In silico prediction and docking of tertiary structure of LuxI, an inducer synthase of Vibrio fischeri. Rep Biochem Mol Biol. 2016;4(2):66–75. pmid:27536699
- 44. Du X, Li Y, Xia Y-L, Ai S-M, Liang J, Sang P, et al. Insights into Protein-Ligand Interactions: Mechanisms, Models, and Methods. Int J Mol Sci. 2016;17(2):144. pmid:26821017
- 45. Zhu Y, Shi J, Wang Q, Zhu Y, Li M, Tian T, et al. Novel dual-pathogen multi-epitope mRNA vaccine development for Brucella melitensis and Mycobacterium tuberculosis in silico approach. PLoS One. 2024;19(10):e0309560. pmid:39466745
- 46. Adeoye AO, Oso BJ, Olaoye IF, Tijjani H, Adebayo AI. Repurposing of chloroquine and some clinically approved antiviral drugs as effective therapeutics to prevent cellular entry and replication of coronavirus. J Biomol Struct Dyn. 2021;39(10):3469–79. pmid:32375574
- 47. Ahmed RA, Almofti YA, Abd-elrahman KA. Structural analysis of the polymerase protein for multiepitopes vaccine prediction against hepatitis B virus. Biosci Biotechnol Res Asia. 2021;18(1):125–46.
- 48. Salehi SM, Meuwly M. Cross-correlated motions in azidolysozyme. Molecules. 2022;27:1–10.