Fig 1.
Schematic Diagram of IgHuAb and SynAbLib.
The curated data contains paired human antibody sequences from the Paired OAS and the PLAbDab databases. For domain adaptation fine-tuning, the ProGen2-OAS model was used. After fine-tuning, the IgHuAb model was used to generate the paired human antibody sequences. The synthetic paired antibody library SynAbLib consists of IgHuAb-generated sequences that passed quality control: human germline gene assignment, CDR assignment, mutational load and humanness score estimation.
Fig 2.
OASis Identity Score and Number of Mutations of the IgHuAb generated HC (in red) and LC (in green) sequences for top P=0.85, 0.9, 0.95 and T=0.5-1.5.
(A) OASis Identity Score: top P=0.85 and T=0.5-1.5, (B) Number of Mutations: top P=0.85 and T=0.5-1.5, (C) OASis Identity Score: top P=0.9 and T=0.5-1.5, (D) Number of Mutations: top P=0.9 and T=0.5-1.5, (E) OASis Identity Score: top P=0.95 and T=0.5-1.5, (F) Number of Mutations: top P=0.95 and T=0.5-1.5.
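For readers unfamiliar with the two sampling parameters varied in this figure, the following is a minimal sketch of temperature scaling followed by top P (nucleus) filtering over a single next-token distribution. It illustrates the standard technique only; the function name `top_p_probs` and the toy logits are assumptions made for illustration and are not taken from the IgHuAb code.

```python
import math

def top_p_probs(logits, top_p=0.9, temperature=1.0):
    """Return the renormalised sampling distribution after temperature
    scaling and top P (nucleus) filtering. Illustrative sketch only."""
    # Temperature scaling: T < 1 sharpens, T > 1 flattens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p, taken in order of decreasing probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Zero out everything else and renormalise over the kept tokens.
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0
            for i in range(len(probs))]

# Toy logits for four hypothetical tokens.
toy_logits = [2.0, 1.0, 0.0, -1.0]
at_t1 = top_p_probs(toy_logits, top_p=0.85, temperature=1.0)
at_t05 = top_p_probs(toy_logits, top_p=0.85, temperature=0.5)
```

With these toy logits, lowering T from 1.0 to 0.5 shrinks the kept set from two tokens to one. This is the general mechanism by which lower temperatures and smaller top P values tend to produce more conservative sequences.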
Fig 3.
Number of Mutations (up to CDR3) of the IgHuAb generated HC and LC for top P=0.9 and T=1, 1.1 and 1.2 vs Number of Mutations of OAS HC and LC (in blue).
(A) Number of Mutations of HC: top P=0.9 and T=1, (B) Number of Mutations of LC: top P=0.9 and T=1, (C) Number of Mutations of HC: top P=0.9 and T=1.1, (D) Number of Mutations of LC: top P=0.9 and T=1.1, (E) Number of Mutations of HC: top P=0.9 and T=1.2: OAS (mean 10.24, std 6.36), IgHuAb (mean 9.86, std 5.35), (F) Number of Mutations of LC: top P=0.9 and T=1.2: OAS (mean 5.72, std 4.69), IgHuAb (mean 4.42, std 3.86).
Fig 4.
Distribution of OASis BioPhi Identity Scores.
OAS (in blue) (mean 87.42, std 8.30), [HC] prompt (in red) (mean 89.00, std 5.92), [LC] prompt (in green) (mean 88.86, std 6.53).
Fig 5.
Comparison of CDR3 Length Distributions in IgHuAb and OAS.
(A) Heavy Chain CDR3 Length Distribution: Paired OAS in blue (mean 14.99, std 3.76), generated with the [HC] prompt in red (mean 15.74, std 3.82), generated with the [LC] prompt in green (mean 15.76, std 3.87), (B) Light Chain CDR3 Length Distribution: Paired OAS in blue (mean 9.63, std 1.11), generated with the [HC] prompt in red (mean 9.68, std 1.12), generated with the [LC] prompt in green (mean 9.66, std 1.11).
Fig 6.
Distribution of Levenshtein Distances to OAS.
(A) Minimum Levenshtein Distance to OAS, [HC] prompt: distance of generated heavy chains to heavy chains in OAS in red (mean 16.48, std 5.94), distance of generated light chains to light chains in OAS in green (mean 5.09, std 4.04), (B) Minimum Levenshtein Distance to OAS, [LC] prompt: distance of generated heavy chains to heavy chains in OAS in red (mean 16.12, std 6.36), distance of generated light chains to light chains in OAS in green (mean 5.43, std 4.11).
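The quantity plotted here, the minimum Levenshtein distance from a generated sequence to a reference set, can be sketched as follows. This is a brute-force illustration, not the authors' pipeline: the helper names `levenshtein` and `min_distance_to_reference` and the toy sequences are hypothetical. A search against a database the size of OAS would likely need an optimised library or an indexed search.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two
    strings, using the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]

def min_distance_to_reference(query, reference_sequences):
    """Smallest edit distance from `query` to any reference sequence."""
    return min(levenshtein(query, ref) for ref in reference_sequences)

# Toy example with short, made-up peptide fragments (not real data).
toy_reference = ["ARDYW", "GGGGS", "ARDLW"]
toy_distance = min_distance_to_reference("ARDY", toy_reference)
```

A distance of 0 means the generated chain is identical to a reference chain. Larger values indicate sequences further from anything in the reference set.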
Fig 7.
Number of Mutations of the IgHuAb generated HC and LC (in red) vs Number of Mutations of OAS HC and LC (in blue) vs Number of Mutations of the p-IgGen generated HC and LC (in green).
(A) Number of Mutations (up to CDR3) for IgHuAb (mean 9.86, std 5.35), p-IgGen (mean 3.53, std 5.93) and OAS (mean 10.24, std 6.38), HC, (B) Number of Mutations (up to CDR3) for IgHuAb (mean 4.42, std 3.86), p-IgGen (mean 2.28, std 4.25) and OAS (mean 5.72, std 4.69), LC, (C) Mean Number of Mutations per antibody region for IgHuAb, p-IgGen and OAS, HC, (D) Mean Number of Mutations per antibody region for IgHuAb, p-IgGen and OAS, LC, (E) Minimum Levenshtein Distance to OAS, HCDR3: IgHuAb (mean 4.77, std 2.13), p-IgGen (mean 4.58, std 2.04), (F) Minimum Levenshtein Distance to OAS, LCDR3: IgHuAb (mean 0.50, std 0.75), p-IgGen (mean 0.33, std 0.70).
Fig 8.
Percent germline sequences (up to CDR3) for the HCs and LCs of IgHuAb (in red) and p-IgGen (in green), and OASis Identity Score comparison of OAS (in blue), IgHuAb (in red) and p-IgGen (in green).
(A) Percent germline VH sequences for IgHuAb (3.45) and p-IgGen (68.30), (B) Percent germline VL sequences for IgHuAb (13.44) and p-IgGen (67.6), (C) OASis BioPhi Identity Scores for OAS (mean 87.42, std 8.30), IgHuAb (mean 89.00, std 5.92) and p-IgGen (mean 92.3, std 5.77).
Fig 9.
FLAb Benchmarking: OAS (in blue), IgHuAb (in red) and p-IgGen (in green); IgLM Perplexity, AntiBERTy Perplexity and Rosetta Energy.
(A) IgLM Perplexity: OAS (mean 1.92, std 0.42), IgHuAb (mean 1.80, std 0.27) and p-IgGen (mean 1.59, std 0.35), (B) AntiBERTy Perplexity: OAS (mean 1.62, std 0.31), IgHuAb (mean 1.54, std 0.21) and p-IgGen (mean 1.33, std 0.28), (C) Rosetta Energy: OAS (mean -449.42, std 150.80), IgHuAb (mean -464.38, std 167.10) and p-IgGen (mean -455.00, std 154.78).
Fig 10.
Heavy V/Light V Gene Pairing: OAS vs IgHuAb [HC] prompt generated, IgHuAb [LC] prompt generated, and p-IgGen.
(A) Heavy V/Light V Gene Pairing: OAS vs IgHuAb [HC] prompt generated (Pearson correlation 0.913, p<0.0001), (B) Heavy V/Light V Gene Pairing: OAS vs IgHuAb [LC] prompt generated (Pearson correlation 0.920, p<0.0001), (C) Heavy V/Light V Gene Pairing: OAS vs p-IgGen generated (Pearson correlation 0.904, p<0.0001).
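One plausible way to obtain a correlation of this kind is to tabulate the relative frequency of each heavy V/light V gene pair in the two sets and compute the Pearson coefficient across all pairs seen in either set. The sketch below assumes that procedure; the caption does not specify how the authors binned or normalised the pairings, and the function names and toy counts are illustrative only.

```python
from collections import Counter
import math

def pairing_frequencies(pairs):
    """Relative frequency of each (heavy V gene, light V gene) pair."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

def pairing_correlation(reference_pairs, generated_pairs):
    """Correlate pairing frequencies over the union of observed pairs;
    a pair absent from one set counts as frequency 0 there (assumption)."""
    ref = pairing_frequencies(reference_pairs)
    gen = pairing_frequencies(generated_pairs)
    keys = sorted(set(ref) | set(gen))
    return pearson([ref.get(k, 0.0) for k in keys],
                   [gen.get(k, 0.0) for k in keys])

# Toy counts, chosen only to exercise the functions (not real data).
toy_reference = ([("IGHV3-23", "IGKV1-39")] * 3
                 + [("IGHV1-69", "IGKV3-20")] * 1
                 + [("IGHV3-23", "IGKV3-20")] * 2)
toy_generated = ([("IGHV3-23", "IGKV1-39")] * 1
                 + [("IGHV1-69", "IGKV3-20")] * 3
                 + [("IGHV3-23", "IGKV3-20")] * 2)
same_r = pairing_correlation(toy_reference, toy_reference)
swapped_r = pairing_correlation(toy_reference, toy_generated)
```

Identical pairing distributions give a coefficient of 1, while the toy generated set, which inverts the two extreme frequencies, gives a negative value. A p-value such as the one reported in the caption would require an additional significance test, which is not shown here.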
Fig 11.
Antibody Expression Yield for Selected Antibodies from SynAbLib.
(A) Yield from generated antibodies with diverse VH/VL combinations, (B) Yield from generated antibodies with the same VH/VL combination and high CDRH3/L3 identity, (C) Yield from generated antibodies with the same heavy chain (y-axis) vs sequence identity percent to the native LC sequence (x-axis). The original native antibody heavy-light chain pairing is shown in red.