AncesTree: An interactive immunoglobulin lineage tree visualizer

High-throughput sequencing of human immunoglobulin genes allows the analysis of antibody repertoires and the reconstruction of clonal lineage evolution. The study of antibody (Ab) affinity maturation is of particular interest for understanding how Abs with high affinity or broadly neutralizing activity are generated. Moreover, phylogenetic analysis enables the identification of the key somatic mutations required to achieve optimal antigen binding. The Immcantation framework provides a start-to-finish set of analytical methods for high-throughput adaptive immune receptor repertoire sequencing (AIRR-Seq; Rep-Seq) data. Furthermore, Immcantation's Change-O package provides IgPhyML, an algorithm designed specifically to build immunoglobulin (Ig) phylogenetic trees. Meanwhile, Phylip, a software package originally developed for applications in ecology and macroevolution, can also be used for the phylogenetic reconstruction of antibody maturation pathways. To complement Ig lineages built with IgPhyML or Dnaml (Phylip), we developed AncesTree, a graphical user interface (GUI) that gives researchers the opportunity to explore antibody clonal evolution interactively. AncesTree displays interactive immunoglobulin phylogenetic trees, Ig-related mutations and sequence alignments, using additional information from specialized antibody tools. The GUI is a standalone Java application that runs under Windows, Linux and Mac OS.


The development of next-generation sequencing (NGS) methodology and its use for high-throughput sequencing of the adaptive immune receptor repertoire (AIRR-seq) have provided unprecedented molecular insight into the complexity of the humoral adaptive immune response by generating Ig data sets of 100 million to billions of reads. Different computational methods have been developed to exploit and analyze these data (1). Retracing the antigen-driven evolution of Ig repertoires by inferring antibody lineages is a powerful way to understand how vaccines or pathogens shape the humoral immune response (2-5). Indeed, Ab maturation is the result of clonal selection during B cell expansion. A clonal lineage is defined as the set of immunoglobulin sequences originating from the same recombination event between the V, D and J segments (6). Upon B cell receptor (BCR) engagement by a given antigen, somatic hypermutation (SHM) events generate a large BCR diversity, leading to antibodies with mutated Ig variable regions and thus forming a specific B cell lineage that extends from the naive unmutated B cells to somatically hypermutated and class-switched memory B cells or plasma cells (7). Lineage tree building requires a common preprocessing step, in which all sequences with identical V and J genes and the same CDR3 length (with high CDR3 similarity) are grouped together (8-12). However, there is no consensus as to which phylogenetic method is optimal for inferring the ancestral evolutionary relationships among Ig sequences (13,14 Dnaml via the GUI, which is a standalone, platform-independent application that requires only Java JRE 12 or higher to be installed.

The required input for AncesTree is the output text file generated by Dnaml. Optionally, a FASTA file with data obtained from IMGT® can also be provided to enable the full set of AncesTree features.
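
The preprocessing step described in the introduction above (grouping sequences that share V and J genes and CDR3 length, then requiring high CDR3 similarity) can be sketched in a few lines. This is only an illustration under stated assumptions, not the method used by AncesTree, Change-O or any specific tool: the record keys (`id`, `v_gene`, `j_gene`, `cdr3`), the single-linkage merging rule, the 0.85 default threshold and the toy CDR3 strings are all hypothetical.

```python
from collections import defaultdict


def hamming_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_clones(records, min_identity=0.85):
    """Group records into candidate clonal families.

    records: iterable of dicts with the hypothetical keys
             'id', 'v_gene', 'j_gene' and 'cdr3'.
    Returns a list of sorted lists of record ids.
    """
    # Step 1: bucket by V gene, J gene and CDR3 length.
    buckets = defaultdict(list)
    for rec in records:
        key = (rec["v_gene"], rec["j_gene"], len(rec["cdr3"]))
        buckets[key].append(rec)

    # Step 2: single-linkage clustering on CDR3 identity inside each bucket.
    clones = []
    for members in buckets.values():
        clusters = []
        for rec in members:
            keep, merged = [], [rec]
            for cluster in clusters:
                similar = any(
                    hamming_identity(rec["cdr3"], m["cdr3"]) >= min_identity
                    for m in cluster
                )
                if similar:
                    merged.extend(cluster)  # join every cluster the record links to
                else:
                    keep.append(cluster)
            clusters = keep + [merged]
        clones.extend(sorted(r["id"] for r in c) for c in clusters)
    return clones


# Toy input (made-up sequences, for illustration only).
records = [
    {"id": "a", "v_gene": "V1", "j_gene": "J1", "cdr3": "ARDYYGSGSY"},
    {"id": "b", "v_gene": "V1", "j_gene": "J1", "cdr3": "ARDYYGSGSF"},
    {"id": "c", "v_gene": "V1", "j_gene": "J1", "cdr3": "ARGWWPTLAY"},
    {"id": "d", "v_gene": "V2", "j_gene": "J1", "cdr3": "ARDYYGSGSY"},
]
clones = group_clones(records)
```

With these toy records, `a` and `b` end up in the same group (their CDR3s differ at one of ten positions), while `c` (dissimilar CDR3) and `d` (different V gene) each form their own group.
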
After AncesTree has run, a sub-folder named after the Dnaml output file is automatically created in the 'output' folder. This folder contains all the files produced, including an XML file that can be loaded directly into the GUI.

AncesTree displays the processed tree in the main panel of the GUI (Fig 2A). The numbers of nucleotide and amino acid mutations are written on the edge between each pair of nodes/sequences (amino acid mutations in parentheses); clicking them opens a new window that displays the detailed location of each mutation (Fig 2B). Of note, the color of the box around each mutated codon indicates whether the mutation is a replacement (R, red) or silent (S, green).
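
The edge annotation described above, with nucleotide and amino acid mutation counts and a replacement (R) or silent (S) call per mutated codon, can be illustrated with a short sketch. It is not AncesTree's implementation: the function name `edge_mutations`, the label format and the toy sequences are assumptions, and the sketch assumes two gap-free, in-frame, uppercase sequences of equal length translated with the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def edge_mutations(parent: str, child: str):
    """Compare a parent and a child sequence codon by codon.

    Returns (nucleotide mutation count, amino acid mutation count,
    list of (codon index, parent codon, child codon,
             parent residue, child residue, 'R' or 'S')).
    Codon indices are 1-based absolute positions (not Kabat numbering).
    """
    if len(parent) != len(child) or len(parent) % 3 != 0:
        raise ValueError("expected in-frame sequences of equal length")
    nt_count = 0
    mutated = []
    for i in range(0, len(parent), 3):
        p_codon, c_codon = parent[i:i + 3], child[i:i + 3]
        diffs = sum(x != y for x, y in zip(p_codon, c_codon))
        if diffs == 0:
            continue
        nt_count += diffs
        p_aa, c_aa = CODON_TABLE[p_codon], CODON_TABLE[c_codon]
        kind = "S" if p_aa == c_aa else "R"  # silent vs replacement
        mutated.append((i // 3 + 1, p_codon, c_codon, p_aa, c_aa, kind))
    aa_count = sum(1 for m in mutated if m[-1] == "R")
    return nt_count, aa_count, mutated


# Toy sequences (hypothetical): Gly-Ser-Tyr in the parent.
parent = "GGTAGCTAC"
child = "GGCAACTAC"
nt_count, aa_count, details = edge_mutations(parent, child)
label = f"{nt_count} ({aa_count})"  # assumed edge label format
```

In this toy case the first codon carries a silent change (GGT to GGC, both glycine) and the second a replacement (AGC to AAC, serine to asparagine), so the edge would carry two nucleotide mutations and one amino acid mutation. A codon with several nucleotide changes is counted once for the R/S call.
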

This information is also available as R/S counts under each region. The user can view the amino acid mutations and has access by default to the Kabat numbering of the corresponding amino acid position (without internet access, AncesTree uses the absolute position). To obtain the nucleotide or protein sequence of a node, the user can click on it (Fig 2C). The user can also enter the EC50 for the specified Ig. The sequence alignments (DNA or protein) are accessible in a new frame via the 'Menu' button at the top (Fig 2D). The alignment view is customizable: sequences, as well as individual positions or regions, can be selected or deselected, and different color modes can be chosen.

If the user is interested in a BASELINe analysis of a clonal family of interest, and if the optional input FASTA file (with the UCA VDJ sequence including gaps) was provided, AncesTree automatically generates the FASTA input file needed by this software (http://selection.med.yale.edu/baseline/).

Once BASELINe has been run, its output can be loaded into AncesTree to obtain a graphical view of the antigen-driven selection occurring in this particular clonal family. All generated graphs can be exported in PNG or EPS format; the alignment can also be exported in a Tab

The Ig sequences were clustered by grouping antibodies sharing the same VH and VL gene usage, HCDR3 length and HCDR3 identity (at least 85%). Among the clusters generated, we chose Igs targeting antigenic site V of RSV F, located near amino acid 447 between the α3 helix and the β3/β4 hairpin of prefusion RSV F (Fig 3A). About 70% of the mAbs targeting this site use the same VH and VL germline pair (VH1-18 and VK2-30) (37-39). We identified an Ig family of interest containing potent neutralizers targeting site V, with one outlier, the mAb ADI-14576, being less potent and showing a 10-fold decrease in binding affinity (Fig 3B). We used Dnaml to generate a phylogenetic tree of the VH sequences and launched AncesTree to analyze and interact with it (Fig 4A). The EC50 values (ng/ml) from the neutralization assay against RSV subtype A are reported in each node (of note, the EC50 values against subtype B are in the same range for each Ig). Surprisingly, a common mutation, 92:G->A (Kabat position 31: S->N), is shared by all the Igs except ADI-14576. The alignment of the Ig protein sequences clearly highlights this shared mutation (Fig 4B). This result suggests that ADI-14576 underwent less affinity maturation and therefore diverges from all the other family members. Interestingly, the 31:S->N mutation is located in HCDR1, and asparagine residues are often involved in protein binding sites. It is tempting to speculate that the serine-to-asparagine substitution is in part responsible for the higher potency and binding titer of the antibodies.
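
The correspondence between the nucleotide change 92:G->A and the residue change at position 31 can be checked with simple arithmetic. The sketch below assumes that the reading frame starts at nucleotide 1 of the VH sequence and uses the standard genetic code; the helper name `locate` is hypothetical. It yields the absolute codon index, which here coincides with the reported Kabat position, although the two numbering schemes can differ in general. The actual serine codon in these antibodies is not given in the text, so the sketch enumerates every serine codon compatible with a G at that position.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def locate(nt_pos: int):
    """Map a 1-based nucleotide position to
    (1-based codon index, 1-based position within the codon)."""
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


codon_index, offset = locate(92)

# For every serine codon with a G at that offset, apply G->A and translate.
outcomes = {}
for codon, aa in CODON_TABLE.items():
    if aa == "S" and codon[offset - 1] == "G":
        mutated = codon[:offset - 1] + "A" + codon[offset:]
        outcomes[codon] = (mutated, CODON_TABLE[mutated])
```

Nucleotide 92 is the second base of codon 31. Only two serine codons, AGT and AGC, carry a G at that position, and a G->A change turns both into asparagine codons (AAT and AAC), consistent with the S->N substitution reported above.
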