Clinical impact of a targeted next-generation sequencing gene panel for autoinflammation and vasculitis

Background Monogenic autoinflammatory diseases (AID) are a rapidly expanding group of genetically diverse but phenotypically overlapping systemic inflammatory disorders associated with dysregulated innate immunity. They cause significant morbidity, mortality and economic burden. Here, we aimed to develop and evaluate the clinical impact of an NGS targeted gene panel, the “Vasculitis and Inflammation Panel” (VIP), for AID and vasculitis. Methods The Agilent SureDesign tool was used to design 2 versions of VIP: VIP1, targeting 113 genes, and a later version, VIP2, targeting 166 genes. Captured and indexed libraries (QXT Target Enrichment System) prepared for 72 patients were sequenced as a multiplex of 16 samples on an Illumina MiSeq sequencer in 150bp paired-end mode. The cohort comprised 22 positive control DNA samples from patients with previously validated mutations in a variety of the genes, and 50 prospective samples from patients with suspected AID in whom previous Sanger-based genetic screening had been non-diagnostic. Results VIP was sensitive and specific at detecting all the different types of known mutations in 22 positive controls, including gene deletions, small indels, and somatic mosaicism with allele fraction as low as 3%. Six/50 patients (12%) with unclassified AID had at least one class 5 (clearly pathogenic) variant; and 11/50 (22%) had at least one likely pathogenic variant (class 4). Overall, testing with VIP resulted in a firm or strongly suspected molecular diagnosis in 16/50 patients (32%). Conclusions The high diagnostic yield and accuracy of this comprehensive targeted gene panel validate the use of broad NGS-based testing for patients with suspected AID.


Methods
Informed consent to participate was obtained from all patients, and parental consent was obtained for all children involved in the study. A total of 72 patients divided into 2 cohorts were recruited. The first cohort included 22 patients with known molecular diagnoses, and served as positive controls for testing sensitivity and specificity of the gene panel. The second cohort consisted of 50 patients with undiagnosed inflammatory diseases in whom we had failed to demonstrate a genetic cause using standard routine genetic tests (https://www.ucl.ac.uk/amyloidosis/nac/moleculargenetic-testing). The inclusion criteria were: 1. Clinician suspicion of a genetic cause for the observed inflammatory phenotype and 2. Signed informed consent form to participate. Whole blood DNA samples from the patients were derived from different sources: (i) the National Amyloidosis Centre (NAC) based at the Royal Free Hospital; (ii) Great Ormond Street Hospital NHS Foundation Trust (GOSH); and (iii) the NE Thames Regional Genetics Laboratory.

Targeted VIP gene panel and capture design
The genes for this panel were chosen following consideration of phenotypes referred to our clinical service, which specializes in autoinflammation and vasculitis in children (at GOSH) and autoinflammation and amyloidosis in adults (at the NAC). Important mimics of AID and vasculitides, and three novel genes discovered by our group, WDR1 [13], TRAP1 and DNASE2 (manuscripts in preparation), were also included. To facilitate data analysis, the genes are listed in 11 broad disease subgroups (Table 1). The Agilent online SureDesign tool (https://earray.chem.agilent.com/suredesign/) was used to initially design an NGS panel targeting 113 genes in the first iteration of the panel, known as VIP1 (Table 1; see S1 Table for detailed gene list). Version 2 of the panel (VIP2) evolved after ongoing discussion and scrutiny of the rapidly evolving literature in this field, resulting in the addition of 53 genes, inclusive of a relevant regulatory intronic region of UNC13D, to give a final list of 166 genes (Table 1; see S2 Table for detailed gene list). The captured sequences included all coding and untranslated exons with at least 10 bp of the flanking intronic sequence to cover canonical splicing donor and acceptor sites. Agilent provides a synthesis service for the capture probes. Information regarding the designed probes for VIP1 and VIP2 is presented in S3 Table.

Bioinformatics analysis
Read alignment, variant calling, and annotation were performed for the first run using three different bioinformatics pipelines: 1. the web-based Galaxy project workflow, as previously used for our whole exome analysis [6]; 2. an in-house pipeline, Genesis, developed at our NE Thames Regional Genetics laboratory; and 3. the Agilent SureCall v3.5.1.46 software. For all 3 pipelines, paired-end reads from the Illumina MiSeq instrument were mapped to the human genome (GRCh37) using Burrows-Wheeler Aligner (BWA)-MEM [14]. The alignment step in Genesis and SureCall is limited to the regions of the targeted genes. The supporting information document (S1 File) provides details of the parameters used for both the Genesis and SureCall pipelines. The output variant call format (VCF) file from SureCall was annotated through wANNOVAR, the web-based interface to the ANNOVAR tool from Wang Genomic Labs (http://wannovar.usc.edu/index.php), which provided allele frequencies from public databases and in silico predictions of pathogenicity [15]. Identified variants were evaluated for coverage and visually inspected using the Integrative Genomics Viewer (Broad Institute).

Pathogenicity assessment of identified variants
The workflow for detecting pathogenic mutations was a multistep process. In the first step, synonymous variants were filtered out. As most pathogenic variants for rare monogenic disorders are relatively uncommon, we excluded common polymorphic variants found in public databases with a minor allele frequency (MAF) of more than 1%. Exceptions to this were 3 relatively common pathogenic variants that are relevant to our cohort of patients: the PRF1 monoallelic p.A91V variant with MAF of 2% in 1000G (but as high as 9% in other populations), since this variant is known to impair cytotoxic function of natural killer (NK) cells [16]; the p.R92Q substitution in TNFRSF1A, present at 2-10% depending on ethnic background [17,18], but considered disease-causing in some patients [17]; and the low-penetrant p.V198M in NLRP3 [19]. [20]. The level of evidence was assigned using the 2015 American College of Medical Genetics guidance [21]. Clinically actionable class 5 variants resulting in a molecular diagnosis were confirmed by Sanger sequencing where indicated, and referred to our accredited genetic testing laboratory for validation. Primer sequences and reaction conditions used for Sanger sequencing are available on request. Familial segregation analysis for potentially pathogenic mutations was performed, with consent, when DNA from family members was available.
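The first two filtering steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the `Variant` record, its field names, and the function names are hypothetical, and only the 1% MAF cut-off and the three retained common variants are taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAF_CUTOFF = 0.01  # 1% minor allele frequency threshold stated in the text

# Relatively common variants that the text retains despite MAF > 1%.
RETAINED_COMMON = {
    ("PRF1", "p.A91V"),
    ("TNFRSF1A", "p.R92Q"),
    ("NLRP3", "p.V198M"),
}


@dataclass(frozen=True)
class Variant:
    # Hypothetical record layout, assumed for illustration only.
    gene: str
    protein_change: str
    consequence: str       # e.g. "synonymous", "missense", "nonsense"
    maf: Optional[float]   # None when the variant is absent from public databases


def keep_variant(v: Variant) -> bool:
    """Return True if the variant survives the first two filtering steps."""
    # Step 1: synonymous variants are filtered out.
    if v.consequence == "synonymous":
        return False
    # Exceptions: known pathogenic variants kept regardless of frequency.
    if (v.gene, v.protein_change) in RETAINED_COMMON:
        return True
    # Step 2: exclude common polymorphisms (MAF of more than 1%).
    return v.maf is None or v.maf <= MAF_CUTOFF


def filter_variants(variants: List[Variant]) -> List[Variant]:
    """Apply keep_variant to a list, preserving input order."""
    return [v for v in variants if keep_variant(v)]
```

Keeping the exception list as explicit (gene, protein change) pairs makes the departure from the MAF rule visible and easy to audit when the list changes.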

Results
Gene coverage and the performance of VIP target enrichment
The Genesis pipeline was used to assess the read depths for all the captured regions per sample.
The mean depth-of-coverage (DoC) plot over the whole targeted regions for the 5 runs of 16 multiplexed samples showed that >97% of the captured regions had mean read depth greater than 30x, a commonly accepted cut-off for diagnostic purposes (Fig 1) [22,23]. An exon or a region was referred to as being a "low-coverage exon" if any single nucleotide in the exon had a coverage <30x. Using that definition, 2.2% of the targeted regions, corresponding to 15 genes (ADAR, AP3B1, C4B, C5, CFI, COL5A2, CORO1A, IFNGR2, IKBKG, NCF1, NOTCH3, POMP, PTEN, TNFRSF11A, VPS13B), had mean DoC <30x (mean 5, range 0-25) (see S4 Table for details). Of these regions, C4B, CORO1A, IKBKG and NCF1 had reads that could not be confidently mapped to the genome (mapping quality score of 0) because of the pseudogene phenomenon [23]. Although intra-sample coverage showed some variation, coverage per region was highly reproducible between the different multiplexed runs (16 patient DNA samples/run). By examining DoC per individual patient sample, other targeted regions in 6 genes (AP3B1, C4B, SH2D1A, STX11, TGFBR1 and TRNT1) with mean read depth >30x (mean 173, range 33-432) were found to have "0" reads in any one sample (indicated with an asterisk in S4 Table). Interestingly, the absence of reads in SH2D1A and STX11 corresponded to known pathogenic deletions in 3 samples (patients 5, 10 and 13, Table 2). These deletions were also detected by Genesis CNV analysis. Additional baits were added to 6 regions in 5 genes (ADAR exon 1, DCLRE1C exon 3, GSN exon 1 and 3, NCF2 exon 1 and TGFBR1 exon 1) to improve coverage (S5 Table).

Validation of VIP capture design using DNA from patients with known pathogenic mutations
To evaluate the sensitivity and specificity of the newly designed VIP1 gene panel, the first run consisted solely of 16 anonymised positive control samples with 21 known pathogenic mutations in 11 different genes previously identified using Sanger sequencing (patients 1 to 16, Table 2).
An additional 6 positive controls were subsequently studied (patients 17-22, Table 2), but the initial validation of VIP1 was performed using samples from patients 1 to 16. The scientist who undertook the VIP1 assay (EO) was blinded to any clinical information about patient samples 1 to 16. VIP1 was able to blindly identify 15 of the 21 known pathogenic mutations in the 16 patient samples, including an NLRP3 p.E567K somatic mosaic mutation with allelic fraction of 3%. The 6 mutations that were not detected in this initial blinded analysis were the SH2D1A and STX11 deletions in 3 cases (patients 5, 10 and 13) and 3 pathogenic variants with MAF >1% in the 1000G database in 3 other cases: patient 3 with intronic UNC13D c.118-308C>T (MAF 0.28) for FHL3 [24], patient 8 with the common monoallelic PRF1 p.A91V (MAF 0.02) for FHL2 [16], and patient 11 with CECR1 -12233delC in the 5'UTR region (MAF 0.07) for deficiency of adenosine deaminase 2 (DADA) [25]. A subsequent unblinded review of the list of variants for each of these 6 cases revealed the presence of the originally missed variants, apart from a deep intronic variant in UNC13D (c.118-308C>T), which was outside the +/-10 exon-flanking boundaries of the capture design in VIP1. Manual inspection of the sequence alignment file showed the presence of this UNC13D variant (c.118-308C>T) in 5 of 11 reads mapping to the region. Failure to initially detect this intronic variant was therefore attributed to low coverage. Upon excluding regions beyond the +10 and -10 exon-flanking positions, which were not within the captured regions and thus not reliably detected, we could confirm both the expected mutations and the allele state, resulting in a detection rate of 100%. Since UNC13D c.118-308C>T is a significant pathogenic variant that is associated with familial haemophagocytic lymphohistiocytosis type 3 (FHL3) [24], we subsequently modified the capture design in VIP2 to include intron 1 of UNC13D.
Assessing the calling of the 21 positive variants between the 3 bioinformatic pipelines used in this study demonstrated that the NLRP3 p.E569K mosaic mutation with low allelic fraction of 3% (patient 16, Table 2) was only identified by SureCall. This pipeline was therefore chosen for subsequent analysis, as it demonstrated optimal sensitivity for the detection of somatic mosaicism. From our experience of these initial 16 positive controls, we were able to ascertain the following four practical criteria for subsequent analyses:
1. Coverage data for genes and exons should be examined for the detection of deletions.
2. For recessive disorders where a single heterozygote rare variant is found, it is important to examine the full list of variants without applying the MAF <0.01 cut-off filter, since the combination of a rare pathogenic variant and a more common variant of reduced penetrance may cause disease in some instances.
3. Examining the consistency of the inheritance model of disease and zygosity of the mutation is another important step to identify causative variants. Of the 166 VIP2 genes, approximately 51% are inherited as autosomal recessive, 37% as autosomal dominant and 7% as X-linked disorders.
4. Although SureCall is a sensitive pipeline, the sequence alignment (BAM) file of relevant genes should be manually inspected if somatic mosaicism is suspected (e.g. CAPS, TRAPS, and Blau syndrome).
Applying these criteria to a subsequent 6 disease controls (patients 17-22, Table 2), the known mutations in these additional positive control samples were all detected.
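The low-coverage definition used above, and criterion 1 (examining coverage for deletions), can be sketched in Python as follows. This is an illustrative sketch only, not the Genesis implementation: the function names and the nested-dictionary input layout are assumptions, and a zero-read region is treated here merely as a candidate deletion that still needs confirmation (for example by CNV analysis).

```python
MIN_DEPTH = 30  # the 30x cut-off used in the text


def is_low_coverage(per_base_depth):
    """True if any single nucleotide in the region has depth below 30x."""
    return any(depth < MIN_DEPTH for depth in per_base_depth)


def candidate_deletions(mean_depth_by_region):
    """Flag (region, sample) pairs with zero reads in a region that is
    otherwise well covered across the run (mean depth above 30x).

    mean_depth_by_region: {region: {sample: mean_depth}}  (assumed layout)
    """
    flagged = []
    for region, by_sample in mean_depth_by_region.items():
        if not by_sample:
            continue  # nothing sequenced for this region
        run_mean = sum(by_sample.values()) / len(by_sample)
        if run_mean > MIN_DEPTH:
            flagged.extend(
                (region, sample)
                for sample, depth in by_sample.items()
                if depth == 0
            )
    return flagged
```

Requiring the run-wide mean to exceed 30x separates sample-specific dropouts, which may reflect a deletion, from regions that are poorly captured or poorly mapped in every sample.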

Performance of VIP gene panel in patients with unknown diagnoses
The sequencing procedure and bioinformatic analyses established from run 1 were tested on 50 subjects with undefined AID. Detailed descriptions of these 50 patients are provided in Tables 3, 4 and S6. The 50 patients (23 males; 27 females) were of median age 9 years (range 7 months to 75 years), and had various clinical diagnoses prior to VIP sequencing that included vasculitis, haemophagocytic lymphohistiocytosis, amyloidosis of unknown cause, or "unclassified autoinflammatory disease". We identified a total of 325 rare variants in 48/50 of these patients (median 6.5, with a range of 1 to 16 rare variants per patient; Tables 3 and S6). Two/50 patients (patients 40 and 41) carried no rare variants (Table 4). Manual inspection of the alignment files of the class 5 and 4 variants showed good quality mapped reads, with Sanger sequencing confirmation performed for 3 class 5 variants: PTEN p.V217D, TNFAIP3 p.R271X and RNF213 p.D4013N (S2 Fig). Confirmatory analyses by Sanger sequencing of class 4 or 5 variants were not performed in all instances for this study, since this has now been shown to be redundant for capture-based methods with good coverage [26]; however, all patients with potentially clinically actionable results were referred on to regional genetics services for confirmation of any relevant genetic findings as part of routine clinical care.

Clearly pathogenic variants (class 5).
Six/50 patients (12%) with unknown diagnoses had at least one class 5 (clearly pathogenic) variant (Table 3; patients 23-28). These patients fulfilled the pathogenicity criteria from literature evidence and pertinent functional laboratory immunological data supporting disease-genotype concordance, as discussed below.

One child (patient 23), referred with cutaneous vasculitis and recurrent upper respiratory tract infection, was found to have the deleterious p.V217D mutation in the Phosphatase and Tensin homolog (PTEN) gene. This mutation has been previously described in a Korean patient with Cowden syndrome [27]. Clinical examination of our patient showed that he had features compatible with Cowden syndrome, including autism and macrocephaly. This mutation was confirmed to be de novo by Sanger sequencing of the index case and parents.
A diagnosis of haploinsufficiency of A20 (HA20) was made in patient 24 who presented with uveitis, mouth ulcers, and vasculitic skin lesions. She was heterozygous for the highly penetrant loss-of-function nonsense mutation in TNFAIP3 (p.R271X), recently reported by Zhou et al [28] as the cause of HA20. Testing of the parents revealed that this heterozygous mutation was inherited from the mother, who had previously been investigated for a milder, uncharacterised inflammatory phenotype.
We found a genetic cause of familial moyamoya disease in patient 25, who was heterozygous for the RNF213 p.D4013N mutation previously reported in familial moyamoya disease [29,32]. This mutation was confirmed by Sanger sequencing and found to segregate with the phenotype in the affected father and sister. A firm molecular diagnosis could not be made in two patients (patients 26 and 27) who were monoallelic for the UNC13D p.R966W variant, previously reported in association with digenic familial haemophagocytic lymphohistiocytosis [30]. Patient 28, with unclassified AID responsive to colchicine, was found to have the highly penetrant p.E131K mutation in the WAS gene. This is a well characterised mutation in males with the X-linked Wiskott-Aldrich syndrome (WAS) [31], associated with early onset microthrombocytopenia, eczema, and immunodeficiency. The family history of this patient was notable because there were no males in eight generations. Although our female patient was a carrier and had a normal platelet count, this mutation could contribute to her autoinflammation, since it is increasingly recognised that autoinflammation and autoimmunity are important features of WAS [33], and symptomatic female carriers have been reported. At the time of writing, studies examining WASP levels and X-inactivation are ongoing.
Low penetrance AID mutations were found in 2 patients: TNFRSF1A p.R92Q in patient 37 with clinical features of TRAPS; and NLRP3 p.V198M in patient 38 with clinical features of CAPS.
A diagnosis of Majeed syndrome was confirmed for patient 29, who had a very typical phenotype (Table 3) and compound heterozygous mutations in LPIN2 (p.P626S and p.S203F). Although only the p.S203F LPIN2 variant was predicted damaging by MutationTaster, the frequency of the p.P626S variant is reported to be significantly higher in patients with AID [10].
We also observed the LPIN2 p.P626S variant in patient 30, who initially presented with an unclassified AID. This patient's clinical features were not compatible with Majeed syndrome, but were compatible with the autoimmune lymphoproliferative syndrome (ALPS). This patient was also found to have a class 4 variant in CASP10 (p.K99E); this prompted further immunological investigations, which revealed abnormalities consistent with a diagnosis of ALPS type 2A (Table 3) [34].
Patient 31, with a strongly suspected diagnosis of APLAID, a recently described dominant AID, carried two mutations in PLCG2. Co-segregation analyses and further investigations are ongoing in available family members. Patient 39 had 3 predicted deleterious variants in TRAP1, a novel gene found by our group to be associated with a new autosomal recessive AID [35].
Two patients with suspected DADA (patients 33 and 35) both carried the 5'UTR CECR1 variant previously described to be associated with DADA [25], in combination with different exonic CECR1 variants.
Patient 34 had two DOCK8 variants (p.V1027I and p.D1347E), and a very convincing clinical phenotype for hyper IgE syndrome.
Anecdotal evidence from De Jesus et al [36] supports the importance of 2 novel heterozygous mutations found in the tyrosine-protein kinase (LYN) gene in 2 unrelated patients (patients 32 and 36). Interestingly, the LYN p.Y508F variant identified in patient 34 leads to a loss of the phosphorylation site, as was also found in the case reported by De Jesus et al, who had a nonsense mutation at the same residue. This tyrosine residue at position 508 has been shown to be an important regulatory site, as mice with the p.Y508F mutation have enhanced enzymatic activity and present with haemolytic anaemia [37], lethal autoimmune glomerulonephritis and positive autoreactive antibodies [38].

Variants of unknown significance (class 3) or carrier status for incidental mutations.
A total of 147 unique variants of unknown significance (VUS) in 78 genes were found in 31 patients who had no class 5 or 4 variants. Details of each patient and the various class 3 variants are presented in S6 Table. Some of these variants were seen in multiple individuals; in particular, 5/31 (patients 43, 49, 54, 62, 71) are carriers of the mild pyridoxine-responsive CBS p.I278T variant associated with homocystinuria [39]. Other class 3 variants recurrently observed in 5 or more individuals were NCF1 p.R90H (found in 23 of the 30 patients), which may be caused by pseudogene interference, TGFBR1 non-frameshift deletion p.17-20del (found in 5 of the 31 patients), and TGFBR2 p.E150fs (found in 15 of the 31 patients). During this study, a diagnosis of myelodysplasia emerged for patient 51, and Schnitzler syndrome for patient 71, both probably non-monogenic diseases that accounted for the phenotypes observed, and thus compatible with the absence of any class 4 or 5 variants in these patients.

Validation of VIP2
For the validation of VIP2 with 166 genes in run 4, we chose 7 samples from previous runs analysed by VIP1 (patients 3, 5, 16, 27, 30, 58 and 65) to act as an internal control for VIP2. Overall, there was good concordance for all variants detected between the 2 runs for each of the 7 patient samples, with discrepancies found in only 2 samples (patients 58 and 65; S7 Table). Three extra variants (class 3) were called for both patients 58 and 65 in the VIP2 run due to improved coverage of certain regions in the VIP2 run (S7 Table).
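A concordance check of this kind amounts to a set comparison of the variant calls made for the same sample in two runs. The Python sketch below is a hypothetical illustration rather than the procedure used in the study; the function name and the representation of a call as a (gene, change) tuple are assumptions. For a like-for-like comparison, both call sets would first need to be restricted to regions targeted by both panel versions.

```python
def compare_runs(vip1_calls, vip2_calls):
    """Compare variant calls for one sample across two runs.

    Each call is any hashable identifier, e.g. a (gene, change) tuple.
    Returns the shared calls and those unique to each run.
    """
    vip1, vip2 = set(vip1_calls), set(vip2_calls)
    return {
        "shared": vip1 & vip2,
        "vip1_only": vip1 - vip2,
        "vip2_only": vip2 - vip1,
    }
```

Calls appearing only in the later run would then be reviewed against coverage data to decide whether improved coverage explains the difference.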

Discussion
Gene-by-gene sequencing is an increasingly outdated, expensive, and often futile diagnostic approach for patients with AID because there is an ever-increasing number of monogenic diseases now known to cause autoinflammation, with increasingly overlapping phenotypes that now also include vasculitis and immunodeficiency [2-4]. Furthermore, the phenomenon of somatic mosaicism is particularly clinically relevant for autosomal dominant AID, and currently not confidently detected by conventional Sanger sequencing methodologies [5,6,40,41]. NGS now provides the potential for sufficient breadth and depth of genetic sequencing to overcome the inherent limitations of conventional sequencing in this clinical context [10].
We designed a targeted next-generation sequencing gene panel (VIP) to screen patients referred to a specialist clinical service for autoinflammation and vasculitis. The inclusion criteria for access to this screening test were deliberately liberal since this most reliably reflects the nature of the referrals and clinical need of our specialist service. VIP was sensitive and specific for the detection of known mutations in 22 controls, although unblinded analyses of the first 16 of these controls resulted in a higher yield. This emphasises the importance of communication between clinicians and clinical scientists for maximising clinical impact.
Application of VIP to a cohort of 50 patients with unknown diagnoses resulted in a class 5 mutation detection rate of 12%, and a class 4 variant detection rate of 22%. Overall, the clinical impact of VIP was a firm or strongly suspected molecular diagnosis in 16/50 (32%) previously undiagnosed patients (Table 3). VIP reliably detected different types of mutations, including rare and common SNVs, insertions/deletions, splice-junction variants, variants in upstream promoter regions, and somatic mosaicism. Regarding this latter point, the first version of the panel (VIP1; targeting 113 genes) reliably detected NLRP3 somatic mosaicism of 3%; VIP2 provided broader coverage since it targeted 166 genes, and also detected the aforementioned 3% somatic mosaicism for NLRP3, emphasising the superior breadth and depth of next-generation sequencing. Since 3% mosaicism is arguably a very low level and probably uncommon in this setting, we suggest that this sensitivity will capture most (if not all) mosaic CAPS patients, since most reported mosaic NLRP3 mutation cases are 4.2-35.8% [6,9,40-43].
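The dependence of mosaic detection on read depth can be illustrated with a simple binomial model. The Python sketch below is an idealised illustration and not part of the study's analysis: it assumes reads are sampled independently, ignores sequencing error, and uses a hypothetical minimum number of supporting reads `k` as the calling threshold.

```python
from math import comb


def prob_at_least_k_alt_reads(depth, allele_fraction, k):
    """P(X >= k) for X ~ Binomial(depth, allele_fraction).

    depth: number of reads covering the position
    allele_fraction: true fraction of reads carrying the variant (e.g. 0.03)
    k: minimum variant-supporting reads required to call (hypothetical)
    """
    p_fewer_than_k = sum(
        comb(depth, i)
        * allele_fraction ** i
        * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer_than_k
```

Under these assumptions, the chance of sampling several variant-supporting reads for a 3% mosaic variant rises steeply with depth, which is consistent with deep targeted sequencing being better suited to low-level mosaicism than shallower approaches.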
The best choice of NGS methodology to use (massively parallel sequencing of selected genes; WES; whole genome sequencing [WGS]; or targeted gene panel sequencing) is highly dependent on factors that include the intended clinical setting and indication for the test, cost, and availability of sufficient computing capacity and bioinformatics expertise to handle the different size and type of datasets appropriately [44]. The main argument for utilising a targeted approach in routine clinical care is that it minimises the ethical issue of incidental findings of mutations in genes that bear no relation to the clinical phenotype under scrutiny, as emphasised by the European Society of Human Genetics [45]. Clinicians can also design panels targeting genes of interest to suit their own clinical practice: as well as AID genes, we included a range of immunodeficiency genes in VIP since we were increasingly aware that autoinflammation could be a feature of primary immunodeficiency [13]; and important genetic mimics of vasculitis (congenital vasculopathies). Moreover, targeted approaches provide superior sensitivity for the detection of variants with low level allele frequency compared with WES; and are more amenable to report and return of clinically actionable results in a timely fashion [46].
A notable limitation of gene panels targeting known genes is that this approach cannot be used to discover novel genetic diseases; that said, unexpected phenotypes can still be detected, as exemplified by patient 23, who presented with cutaneous vasculitis attributable to immune dysregulation associated with Cowden syndrome due to a PTEN mutation [47,48]; and patient 28, a female with unclassified autoinflammation and the unexpected finding of the highly penetrant c.391G>A, p.E131K mutation in WAS (Table 3) [31]. Targeted panels also require intermittent updating and refinement as new disease genes are discovered. Clinical WES with targeted gene analysis could offer the opportunity to combine targeted genetic screening with future research for gene discovery. In our experience, however, technical issues in relation to depth of coverage, bioinformatics, and manpower required to interpret results currently limit this approach for routine clinical genetic screening.
In terms of time and cost, it is difficult to formally quantify the exact savings when directly comparing our VIP panel to Sanger sequencing. The direct sequencing cost of mutation screening for the 166 genes listed in VIP2 (S2 Table) was £397, and thus comparable to the cost of screening one single gene using Sanger methodology (£400). Thus, whilst the direct cost per gene sequenced is substantially lower than with conventional sequencing, there are other costs associated with targeted gene panels that require consideration, particularly in relation to time spent on interpretation of results and report generation.
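The per-gene comparison implied by these figures can be worked through in a few lines of Python. This sketch uses only the direct sequencing costs quoted above and deliberately excludes interpretation and reporting time, which the text notes are not captured by these figures; the function and variable names are illustrative.

```python
def cost_per_gene(total_cost_gbp, n_genes):
    """Direct sequencing cost per gene, in pounds."""
    return total_cost_gbp / n_genes


VIP2_COST_GBP = 397             # direct cost for the VIP2 panel (from the text)
VIP2_GENES = 166                # genes targeted by VIP2 (from the text)
SANGER_COST_PER_GENE_GBP = 400  # single-gene Sanger screen (from the text)

vip2_per_gene = cost_per_gene(VIP2_COST_GBP, VIP2_GENES)
fold_difference = SANGER_COST_PER_GENE_GBP / vip2_per_gene

print(f"VIP2 direct cost per gene: £{vip2_per_gene:.2f}")
print(f"Sanger cost per gene is about {fold_difference:.0f}-fold higher")
```

On direct sequencing costs alone, this gives roughly £2.39 per gene for VIP2, about 167-fold lower than the quoted single-gene Sanger cost.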

Conclusions
In conclusion, we have described the development of an NGS targeted gene panel, the "Vasculitis and Inflammation Panel" (VIP). We then evaluated its clinical impact for paediatric and adult patients referred to a highly specialised service for autoinflammation and vasculitis. A significant diagnostic contribution was observed in 32% of patients with previously unclassified phenotypes. The level of diagnostic yield obtainable in a timely manner can have a profound impact on patient management, with improved use of targeted therapies, prognostication, and genetic counselling. We emphasise that the success of this approach relies upon its use in the context of a highly specialist clinical service for patients with AID.