A genomic variant of ALPK2 is associated with increased liver fibrosis risk in HIV/HCV coinfected women

HIV coinfection is associated with more rapid liver fibrosis progression in hepatitis C (HCV) infection. Recently, much work has been done to improve outcomes of liver disease and to identify targets for pharmacological intervention in coinfected patients. In this study, we analyzed clinical data of 1,858 participants from the Women’s Interagency HIV Study (WIHS) to characterize risk factors associated with changes in the APRI and FIB-4 surrogate measurements for advanced fibrosis. We assessed 887 non-synonymous single nucleotide variants (nsSNV) in a subset of 661 coinfected participants for genetic associations with changes in liver fibrosis risk. The variants utilized produced amino acid substitutions that either altered an N-linked glycosylation (NxS/T) sequon or mapped to a gene related to glycosylation processes. Seven variants were associated with an increased likelihood of liver fibrosis. The most common variant, ALPK2 rs3809973, was associated with liver fibrosis in HIV/HCV coinfected patients; individuals homozygous for the rare C allele displayed elevated APRI (0.61, 95% CI, 0.334 to 0.875) and FIB-4 (0.74, 95% CI, 0.336 to 1.144) relative to those coinfected women without the variant. Although warranting replication, ALPK2 rs3809973 may show utility to detect individuals at increased risk for liver disease progression.


Introduction
The emergence of highly active antiretroviral therapy has transitioned the once acutely fatal human immunodeficiency virus (HIV) infection to a chronic disease. However, the longer survival of persons living with HIV infection presents a new set of morbidities and increased risk for mortality [1][2][3]. Due to common modes of transmission, those infected with HIV are at higher risk of contracting hepatitis C virus (HCV). Accelerated progression of liver disease in HIV/HCV coinfected patients compared to those with HCV monoinfection is well documented and liver disease has become a leading cause of non-AIDS related death in coinfected individuals [4,5]. The pathogenesis of liver disease in coinfected individuals is multifactorial [6]. Despite substantial progress in identifying risk factors for liver disease progression in coinfected persons, our understanding of risk factors remains incomplete, with genomic factors among those that remain ill-defined. Even with the emergence of direct-acting HCV antivirals, the ability of these agents to regress fibrosis upon HCV clearance remains unclear [7] and the cost of treatment is often inaccessible to at-risk populations [8]. Taken together, these considerations document that liver fibrosis remains a challenge in coinfected and HCV monoinfected patients. In this study, we examined the impact of non-synonymous single nucleotide variants (nsSNV) affecting N-glycosylation both directly at the NxS/T sequons of proteins and indirectly through the enzymes and lectins of glycosylation-related pathways.
Glycosylation is one of the most common and structurally diverse protein modifications and affects protein synthesis, structure, and function [9]. Glycosylation enzymes or the glycoproteins they produce are involved in immune surveillance and host-pathogen interactions [10,11] as well as in the progression of viral liver disease [12]. To elucidate the impact of glycosylation on the pathogenesis of fibrosis in HIV/HCV coinfected patients, we focused on nsSNV affecting the NxS/T sequons required for the attachment of N-glycans to proteins, which have the potential to change the number of glycans on the protein surface [13,14]. The Women's Interagency HIV Study (WIHS) is a longitudinal natural history study of HIV infection that features a sufficient number of HIV/HCV coinfected participants and biomarkers of liver disease to evaluate the impact of risk factors for liver disease progression [15], including genetic risk factors [16][17][18]. Given its longitudinal cohort design, the Fibrosis-4 Index (FIB-4) [19] and the AST to Platelet Ratio Index (APRI) [20] were collected in the WIHS cohort as noninvasive surrogate measures of hepatic fibrosis. These measures were used in lieu of serial tissue biopsy to evaluate genetic risk factors in HIV/HCV coinfection [21]. A set of 887 nsSNV was extracted from a genome-wide association study performed in the WIHS cohort. Of these, 278 nsSNV produce an amino acid substitution that alters the potential glycosylation of target proteins by changing the number of N-glycans decorating a protein; we analyzed these in relation to surrogate biomarkers of liver fibrosis. Because previous work demonstrated that genetic variation in glycosylation-related enzymes and lectins can alter kinetics and binding affinities, respectively, the remaining 609 of the 887 nsSNV, which map to genes encoding glycosylation-related proteins, were also evaluated in relation to surrogate biomarkers of liver fibrosis in coinfected participants [22,23].

Ethics statement
The parent WIHS study and this sub-study conformed to the procedures for informed written consent approved by institutional review boards (IRB) at all sponsoring organizations and to human-experimentation guidelines set forth by the United States Department of Health and Human Services, and finally reviewed and approved by the local IRBs in the Baltimore/DC

Study population
The WIHS is an active, multicenter prospective study of the natural history of HIV infection among women with or at risk for HIV infection in the United States. Since its establishment in 1994, a total of 4,982 women (3,677 HIV-seropositive) have been enrolled. At semi-annual visits, participants completed socio-demographic and medical questionnaires, laboratory testing, and a limited physical examination. Data used in this analysis include age, race and ethnicity, continuous clinical measures of liver fibrosis (APRI and FIB-4), plasma HIV RNA viral titer, HCV infection status (at all visits), and HCV viral titer (at baseline only) [24]. HCV status was defined as "positive" when participants had both positive HCV antibody and detectable HCV RNA at their baseline visit. Participants defined as "negative" had no detectable HCV antibody upon testing. Self-reported race and ethnicity were used to define four groups: "White" (non-Hispanic), "African American" (non-Hispanic), "Hispanic", or "Other". In addition to self-reported race and ethnicity, genomic estimates of ancestry were derived using principal component analysis of ancestry informative genetic markers [25].

Genotyping
Genotype data in WIHS were generated from genomic DNA from peripheral blood mononuclear cells using the Infinium Omni2.5 BeadChip (Illumina, San Diego, CA, USA) [24]. Of the 10,141 genes (602 glycogenes + 9,539 NxS/T-containing genes) known to exist in the human genome, data on 1,029 nsSNVs (698 glycogene + 331 NxS/T) spanning 660 genes (349 glycogene + 311 NxS/T) from 2,120 WIHS participants were available for analysis [13]. Of these 2,120 women, 262 were excluded because they were either HCV antibody positive in the absence of detectable HCV RNA (N = 156) or never had HCV serostatus assessed at baseline (N = 106) (S1 Fig). Of the 331 NxS/T nsSNVs, 44 were mono-allelic (non-polymorphic) and therefore excluded. Among the 287 nsSNVs that remained, missing genotypes across the 1,858 participants were imputed with the most common genotype for each nsSNV (mean participants imputed per nsSNV±SD = 5±6.04). Of the 698 glycogene nsSNVs, 75 were monomorphic and excluded. Among the 623 nsSNVs that remained, missing genotypes across the 1,858 participants were imputed with the most common genotype for each nsSNV (mean participants imputed per nsSNV±SD = 5±6.21). A minor allele frequency (MAF) threshold of ≥0.001 (≥0.1%) was applied using the MAFs of the coinfected population (N = 661) to further isolate SNVs sufficiently frequent to allow for statistical analysis. Using this method, an additional 9 NxS/T and 14 glycogene nsSNVs were eliminated from analysis. After applying all exclusion criteria, clinical and genotype data (887 nsSNVs [609 glycogene, 278 NxS/T] spanning 564 genes) were available for 1,858 women (S1 Fig). The nsSNV reference sequence identifiers (rsID), major (or common) and minor (or rare) alleles, and MAF across serotypes are provided as supplementary tables for the NxS/T (S1 Table) and glycogene (S2 Table) nsSNVs utilized in our analysis.
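The filtering steps described above (dropping monomorphic variants, imputing missing calls with the most common genotype, and applying the MAF threshold) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0/1/2 allele-count coding, and the toy variant IDs are assumptions made for illustration, and for simplicity the MAF is computed on the same participants that are imputed, whereas the study computed MAF in the coinfected subgroup only.

```python
from collections import Counter

def is_monomorphic(genotypes):
    """True when every observed call carries the same allele (no variation)."""
    observed = [g for g in genotypes if g is not None]
    total = sum(observed)
    return total == 0 or total == 2 * len(observed)

def impute_most_common(genotypes):
    """Replace missing calls (None) with the most common observed genotype."""
    observed = [g for g in genotypes if g is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if g is None else g for g in genotypes]

def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele, given per-participant allele counts (0, 1, 2)."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1.0 - freq)

def filter_variants(calls, maf_threshold=0.001):
    """Drop monomorphic variants, impute missing calls, then apply the MAF cut."""
    kept = {}
    for variant_id, genotypes in calls.items():
        if is_monomorphic(genotypes):
            continue
        imputed = impute_most_common(genotypes)
        if minor_allele_frequency(imputed) >= maf_threshold:
            kept[variant_id] = imputed
    return kept

# Hypothetical toy data: variant IDs and calls are made up for illustration.
calls = {
    "variant_A": [0, 0, None, 0, 0],   # monomorphic, dropped
    "variant_B": [0, 1, None, 2, 0],   # kept after imputation
    "variant_C": [1] + [0] * 999,      # MAF 0.0005, below the threshold
}
kept = filter_variants(calls)
```

In this toy run only `variant_B` survives, with its missing call filled by the most common genotype.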

Serologic markers of fibrosis
Fibrosis-4 Index (FIB-4) and the AST to Platelet Ratio Index (APRI) were used as measures of hepatic fibrosis as described in previous literature [21]. For the FIB-4 index, scores <1.45 and >3.25 indicate a high negative and a high positive predictive value for advanced fibrosis respectively [19]. For APRI, scores <0.5 have a high negative predictive value for liver disease while scores >0.7 and >2 indicate a high positive predictive value for moderate and severe hepatic fibrosis respectively [20]. Genetic polymorphisms were independently analyzed against each continuous surrogate index (i.e., APRI, FIB-4) at baseline to identify variants with stronger associations that would manifest across both current clinical diagnostic resources.
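For readers unfamiliar with the two indices, the sketch below computes them from routine laboratory values using the standard published formulas. The function and parameter names are illustrative, and the default AST upper limit of normal (40 U/L) is an assumption, since that value is laboratory-specific and is not stated in this paper. The banding helper simply restates the FIB-4 cut-offs quoted above; the study itself analyzed the scores as continuous outcomes.

```python
import math

def apri(ast_u_per_l, platelets_e9_per_l, ast_uln_u_per_l=40.0):
    """AST to Platelet Ratio Index: (AST / upper limit of normal) * 100 / platelets."""
    return (ast_u_per_l / ast_uln_u_per_l) * 100.0 / platelets_e9_per_l

def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_e9_per_l):
    """Fibrosis-4 Index: (age * AST) / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (platelets_e9_per_l * math.sqrt(alt_u_per_l))

def fib4_band(score):
    """Map a FIB-4 score onto the cut-offs quoted in the text (<1.45, >3.25)."""
    if score < 1.45:
        return "advanced fibrosis unlikely"
    if score > 3.25:
        return "advanced fibrosis likely"
    return "indeterminate"

# Hypothetical participant: age 40, AST 80 U/L, ALT 64 U/L, platelets 200 x 10^9/L.
example_apri = apri(80.0, 200.0)
example_fib4 = fib4(40.0, 80.0, 64.0, 200.0)
example_band = fib4_band(example_fib4)
```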

Statistical analysis
For descriptive summaries, continuous variables were summarized using means and standard deviations, while categorical variables were summarized using frequency counts and percentages. HIV and HCV RNA status were categorized into four serostatus groups: HIV/HCV noninfected, HIV monoinfected, HCV monoinfected, and HIV/HCV coinfected. Distributions of APRI, FIB-4, and HIV RNA viral load among the four serostatus groups were summarized using descriptive statistics and compared using chi-square analyses or one-way analysis of variance (ANOVA). APRI and FIB-4 scores used for analysis were obtained at baseline visits (pre-2003), prior to the broad use of HCV therapy. For each of the 278 NxS/T and 609 glycogene nsSNV that met inclusion criteria, a separate multivariable linear regression model was constructed in the HIV/HCV coinfected population for each of the continuous APRI and FIB-4 outcomes. Explanatory variables included the variant, age, race and ethnicity, HIV viral load, and HCV viral load, where HIV and HCV viral loads were normalized using a log2(x+1) transformation. Model results were nearly indistinguishable when adjusted for either HCV viral load or HCV status. Genomic estimates of race and ethnicity were derived using principal component analysis of 185 ancestry informative markers (SNV) selected to differentiate the major racial and ethnic groups in the cohort (i.e., European, African, Hispanic) [25]. The first three principal components (PC1, PC2, PC3), which explain >90% of the variation, were used to adjust for genomic estimates of race and ethnicity in the aforementioned linear regression models. To ensure that our findings were not sensitive to confounding factors of liver fibrosis, we performed a sensitivity analysis on the linear regression models with adjustment for additional factors including Hepatitis B (
Utilizing the larger cohort for race and age adjustment helped to ensure that we had the population size to account for the impact of these variables on liver fibrosis. Although the first principal component of the PCA of genomic markers of ancestry (PC1) appeared to adequately differentiate African-Americans (non-Hispanic), Hispanics, and Whites (non-Hispanic) (Fig 2A), we took a conservative approach and included the first three principal components (PC1, PC2, PC3). Although adjusting for race/ethnicity using self-report identified the same genetic associations, we opted to employ the three PCs to better account for the confounding that can occur with self-reported race and ethnicity. For example, many participants self-reporting as "Hispanic" or "Other" in the coinfected subgroup present with a genetic background more consistent with the African-American (non-Hispanic) and White (non-Hispanic) clusters (Fig 2B).
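The per-variant model described in the statistical analysis (fibrosis score regressed on the variant, age, three ancestry principal components, and log2(x+1)-transformed viral loads) can be sketched as follows. This is a hedged illustration only: the text does not state how genotypes were coded or which software was used, so the minor-allele count coding, the helper names, and the plain least-squares solver below are all assumptions. The sketch returns coefficients only and does not compute standard errors or P values.

```python
import math

def log2p1(x):
    """The log2(x+1) normalization applied to HIV and HCV viral loads."""
    return math.log2(x + 1.0)

def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations; intercept is returned first."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def covariate_row(minor_allele_count, age, pcs, hiv_load, hcv_load):
    """One participant's covariates: genotype, age, PC1-PC3, transformed viral loads."""
    return [minor_allele_count, age, *pcs, log2p1(hiv_load), log2p1(hcv_load)]
```

One such model would be fitted per nsSNV and per outcome (APRI or FIB-4), with `fit_ols` receiving one `covariate_row` per coinfected participant.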
After adjustment for multiple testing, seven nsSNV mapping to separate genes met the a priori criterion of P FDR <0.05 for at least one of the biomarkers of liver fibrosis (Table 2). Of these, only rs52828316 (MAN2A2) was significant by P FDR for both indices. For all of these nsSNV, carrying two copies of the minor allele (i.e., minor allele homozygosity) was associated with increases in APRI and/or FIB-4 (Table 2). In the sensitivity analysis adjusting for the additional factors of HBV status, alcohol usage, and BMI (N = 616), all statistically significant nsSNVs remained significant, and APRI and FIB-4 estimates for all but rs1800472 (TGFB1) were within 10% of those of the initial model (S3 Table). Two additional nsSNV, rs3745925 (MADCAM1) and rs2307145 (IL12RB2), were significant when additionally adjusting for HBV status, alcohol, and BMI (S3 Table). ALPK2 rs3809973 was evaluated further in relation to hepatic fibrosis

PLOS ONE
among the four viral serogroups because it was the only variant sufficiently frequent (n>100) to yield all three genotypic groups (i.e., major allele homozygotes, heterozygotes, minor allele homozygotes), and because the ALPK2 variant was not sensitive to the additional adjustment factors in the linear regression model. To determine whether the ALPK2 variant's tentative association with increased liver fibrosis in coinfected women was preferentially influenced by the effects of either virus, we analyzed each genotype of the ALPK2 variant across all viral serogroups in the cohort. For ALPK2 rs3809973, HIV/HCV coinfected participants who were homozygous for the minor allele had significantly higher mean APRI and FIB-4 scores than coinfected participants homozygous for the major allele (APRI, 0.61, 95% CI, 0.334 to 0.875; FIB-4, 0.74, 95% CI, 0.336 to 1.144) and than HCV monoinfected participants homozygous for the minor allele (APRI, 0.79, 95% CI, 0.370 to 1.200; FIB-4, 1.24, 95% CI, 0.820 to 1.650) (Fig 3A and 3B). Evaluation of the association of ALPK2 rs3809973 with APRI and FIB-4 by racial and ethnic group (i.e., non-Hispanic African-American, non-Hispanic Caucasian, Hispanic) revealed similar patterns of association, with one exception: among non-Hispanic African-Americans, no difference in FIB-4 by genotypic group was observed (data not shown). Coinfected heterozygotes displayed no significant increases in APRI or FIB-4 relative to the coinfected major allele homozygotes (APRI, 0.10, 95% CI, -0.128 to 0.332; FIB-4, 0.15, 95% CI, -0.195 to 0.492). These findings for the ALPK2 nsSNV (rs3809973) in the coinfected population were observed against a background of similar CD4+ T-cell percentage, measured relative to other leukocyte types, and detectable HIV viral loads at baseline (Fig 3C and 3D). HIV monoinfected participants homozygous for the minor allele displayed significantly increased FIB-4 scores (0.45, 95% CI, 0.028 to 0.879) relative to noninfected participants homozygous for the minor allele, but the same comparison was not significant for APRI (0.00, 95% CI, -0.422 to 0.429) (Fig 3A and 3B). In the HIV monoinfected serogroup, a small but significant difference in CD4+ T-cell percentage (P = 0.007) accompanied this FIB-4 finding. The impact of ALPK2 variant allele burden across genotypes in the noninfected and HCV monoinfected serogroups was unremarkable (Fig 3A and 3B). A parallel set of analyses was conducted with adjustment for the additional factors of HBV status, alcohol use, and BMI. Overall, few differences from the present analysis were observed (S2 Fig). However, with adjustment for these additional factors, FIB-4 was significantly increased in coinfected heterozygotes relative to coinfected major allele homozygotes (0.19, 95% CI, 0.006 to 0.382), while the same comparison for APRI fell just short of significance (0.17, 95% CI, -0.022 to 0.353) (S2 Fig).
Fig 1. [...] Table 1. Pairwise comparisons of liver outcomes were performed for all viral-serotype group comparisons within each liver metric. Apart from comparisons labeled not significant (ns), all comparisons within each fibrosis index, including those not explicitly labeled as significant, were below the 0.05 p-value threshold. https://doi.org/10.1371/journal.pone.0247277.g001
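The P FDR < 0.05 criterion used to select nsSNV from the per-variant models relies on a false discovery rate adjustment across all tested variants. The text does not name the procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up method; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for four variants (not taken from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

In the study, one raw P value per nsSNV and per outcome would be passed in, and variants whose adjusted value falls below 0.05 would be retained.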

Discussion
This study identified several novel genetic associations between glycogenes and biomarkers of liver fibrosis. Average liver fibrosis scores in the coinfected serogroup analyzed at baseline were significantly higher than those of either HIV or HCV monoinfected participants, as shown in Fig 1, recapitulating the fibrotic trends described in the literature for coinfected populations [6,26]. Of the 887 nsSNV assessed that either directly (nsSNV affecting NxS/T sequons) or indirectly (nsSNV in glycosylation or lectin genes) alter glycosylation pathways or products, we found seven nsSNV that were associated with an increased risk of hepatic fibrosis among the HIV/HCV coinfected population. We observed higher APRI and FIB-4 values (indicative of greater fibrosis) in coinfected participants homozygous for the ALPK2 rs3809973 minor allele than in participants homozygous for the ALPK2 rs3809973 major allele, irrespective of HIV and HCV viral load, BMI, HBV status, or alcohol usage (Fig 3 and S2 Fig). As elevations in HIV viral titers have been correlated with increased liver injury in coinfected individuals [27], our findings suggest that the genetic risk of hepatic fibrosis among coinfected individuals carrying the ALPK2 rs3809973 risk allele is not confounded by these titers. The lack of association between ALPK2 variant heterozygosity and increased liver fibrosis in coinfected participants suggests that homozygosity is needed for the detrimental impact of the variant to manifest in coinfected individuals, consistent with a recessive mode of inheritance. As shown in Fig 3, among HIV monoinfected participants, the relative CD4+ T-cell percentage among leukocytes was inversely associated with FIB-4. Furthermore, coinfected participants who were homozygous for the minor allele had significantly increased fibrosis risk compared with HCV monoinfected participants of the same genotype, suggesting a need for general HIV-associated immunosuppression to produce the detrimental increase in fibrosis risk. Our comparisons of ALPK2 rs3809973 among serogroups at baseline therefore lead us to speculate that the risk allele's impact on liver fibrosis relies on both HCV-mediated liver damage and its perpetuation by HIV-mediated immunosuppression.
Fig 3. [...] N = 1,858). The number of samples representing each genotype for the respective serogroups is displayed as follows: homozygous for the major allele (green), heterozygous (orange), and homozygous for the minor allele (blue). Comparisons were shown, for reference, relative to the major allele homozygote of the noninfected serogroup. ANOVA was used to compare relative CD4+ T-cell percentages (C) or HIV/HCV viral loads (D) between the genotypes of each serogroup. Asterisks indicate two-sided P-values below 0.05 (*) and 0.01 (**) respectively. https://doi.org/10.1371/journal.pone.0247277.g003
The ALPK2 gene and rs3809973 have not been linked to a pathological liver phenotype to date. Although the association of ALPK2 rs3809973 with fibrosis in the setting of HIV/HCV coinfection warrants replication, some circumstantial evidence suggests a potential role for ALPK2 in risk for viral hepatic fibrosis. Hepatic stellate cells (HSCs) lining the perisinusoidal space of the liver are the fibrogenic cell type responsible for extracellular matrix deposition and subsequent fibrosis in the setting of hepatic inflammation. Although this fibrogenic response is designed to protect against liver injury and typically reverses after the hepatic insult subsides, progressive inflammation and the chronic activation of HSCs can lead to cirrhosis and an increased risk of developing hepatocellular carcinoma [28,29]. Signaling molecules and pathways involved in HSC activation have been described; most notably, the expression of toll-like receptors on HSCs enables their activation upon exposure to structurally conserved microbial-derived products such as lipopolysaccharide [30]. Although incompletely characterized, the ALPK2 protein was first identified in cardiomyocytes as an integral regulator of heart development [31]. However, most studies of ALPK2 have focused on the pathological processes of the gastrointestinal tract. ALPK2 is associated with luminal apoptosis in colorectal cancer cell lines [32].
Luminal shedding is a phenomenon of the enteric innate immune system designed to maintain the function of the gut barrier by preventing the invasion of virulent microbes into systemic circulation. The increased apoptosis of enterocytes in the absence of their concomitant replacement can yield an increasingly permeable intestinal membrane that leaves the host vulnerable to augmented translocation of microbial products [33]. Since increased microbial translocation is a hallmark of HIV infection and lipopolysaccharide activates HSCs [34], any augmentation of this translocation process would exacerbate liver fibrosis in the context of existing hepatitis. In addition, a genome-wide association study of inflammatory bowel disease, a condition in which pathological epithelial shedding is observed, found the mRNA of ALPK2 to be up-regulated in inflamed mucosa compared to control samples [35]. Therefore, we speculate that ALPK2 rs3809973 could enhance liver fibrosis by interfering with enteric immunity. Even though this variant generates a novel NxS/T sequon, augmentation of glycan attachment onto the amino acid substitution encoded by the minor allele has not been demonstrated. Structural studies indicate that the variant does not interrupt the kinase domain or any residues known to be post-translationally modified, by glycosylation or otherwise [31,36]. Therefore, the impact of the K829N substitution encoded by rs3809973 on ALPK2 structure and function warrants investigation.
Although the low minor allele frequency of the remaining 6 nsSNV associated with increased liver fibrosis (i.e., CCR7 rs2228015, GLT8D2 rs17035120, MAN2A2 rs52828316, MBL2 rs1800450, MCOLN2 rs77452813, and TGFB1 rs1800472) precluded detailed modeling, their previous associations with liver disease, metabolism, and viral immunity in other literature provide impetus for further investigation. TGFB1 has long been linked to enhanced hepatocyte destruction and HSC activation [37]. GLT8D2 and MAN2A2 have been implicated in non-alcoholic fatty liver disease [38] and coinfected liver disease [39], respectively. Finally, CCR7, MBL2, and MCOLN2 have all been connected to the modulation of host cellular and immune responses in viral infection [40][41][42]. Taken together, all seven nsSNV and their cognate genes have plausible roles in liver disease pathology, and their associations warrant replication in independent samples.
We note that there are limitations to this study. As this is an exclusively female cohort, future studies should aim to validate these findings in men. Although we identified that there are potentially more than 42,000 nsSNV that either directly altered N-linked glycosylation sites (NxS/T) or that may indirectly alter the function of enzymes or lectins interacting with glycoproteins, only a subset (2.5%) were available for analysis in the cohort due to a combination of low nsSNV allele frequency in the population and/or lack of inclusion of the majority of nsSNV on the commercial array used. A number of factors hindered our attempts at a longitudinal assessment of the genetic risk factors on the coinfected population. For one, as the standard of care for HIV treatment changed for coinfected WIHS participants longitudinally (S3A and S3B Fig), adjusting for shifting antiviral treatment regimens at different time points to meet this standard of care was a challenge. Secondly, as this cohort was initially designed to investigate HIV progression, the HCV virologic status was only determined at baseline and HCV clearance was not assessed further, thereby making longitudinal analyses of HCV-infection status susceptible to misclassification bias. In addition, we had no way of determining the duration of HCV infection at the time of serologic assessment. Along similar lines, survival bias and the availability of more effective and less toxic HIV treatments for new enrollees over time in WIHS may have attenuated the relationship between susceptibility alleles and fibrosis [27]. The rate of loss to follow-up was also problematic in that nearly one third of the coinfected participants at baseline were no longer in the cohort after four visits (2 years) (S3C and S3D Fig). 
With regard to our sensitivity analysis, only 32 patients (2%) were chronically infected with HBV, which may not have provided sufficient power to account for the influence of HBV on liver fibrosis in the linear regression model. Although BMI is a practical means of estimating obesity, it was used in the sensitivity analysis as an imperfect measure of NAFLD and may not fully account for the impact of that condition on liver fibrosis. Along similar lines, alcohol usage was self-reported and may be subject to underreporting in the sensitivity analysis, as the largest fraction of participants in each of the virally infected serogroups in Table 1 reported abstaining from all alcohol consumption. Finally, APRI and FIB-4 scores are correlated with liver fibrosis risk, but both are imperfect surrogate measures. The utility of these surrogate fibrosis metrics has been validated against biopsy and other measures (e.g., transient elastography) in previous studies [43], but the scores were not validated by such methods in the present study.
Given the goal of improving the prognosis of liver disease in the HIV/HCV coinfected population, we have identified candidate genes that may participate in hepatic fibrosis. Although these genetic associations require replication, the impact of each of the seven nsSNVs on N-linked glycosylation of the site and its cognate protein warrants investigation. Longitudinal HCV testing and assessment of viremia, along with additional studies in other HIV/HCV coinfected cohorts, may permit better stratification of serogroups and longitudinal analysis of these candidate gene nsSNV.
S2 Fig. rs3809973 (ALPK2) impact on APRI and FIB-4 outcomes with additional HBV, alcohol, and BMI adjustment. Mean baseline APRI (A) and FIB-4 (B) score with 95% confidence interval shown for the allele pairs of each serotype (Total N = 1,790). The number of samples representing each genotype for the respective serogroups is displayed as follows: homozygous for the major allele (green), heterozygous (orange), and homozygous for the minor allele (blue). Comparisons were shown, for reference, relative to the major allele homozygote of the noninfected serogroup. Different from