Chromosome 9p21 SNPs Associated with Multiple Disease Phenotypes Correlate with ANRIL Expression

Single nucleotide polymorphisms (SNPs) on chromosome 9p21 are associated with coronary artery disease, diabetes, and multiple cancers. Risk SNPs are mainly non-coding, suggesting that they influence expression and may act in cis. We examined the association between 56 SNPs in this region and peripheral blood expression of the three nearest genes CDKN2A, CDKN2B, and ANRIL using total and allelic expression in two populations of healthy volunteers: 177 British Caucasians and 310 mixed-ancestry South Africans. Total expression of the three genes was correlated (P<0.05), suggesting that they are co-regulated. SNP associations mapped by allelic and total expression were similar (r = 0.97, P = 4.8×10⁻⁹⁹), but the power to detect effects was greater for allelic expression. The proportion of expression variance attributable to cis-acting effects was 8% for CDKN2A, 5% for CDKN2B, and 20% for ANRIL. SNP associations were similar in the two populations (r = 0.94, P = 10⁻⁷²). Multiple SNPs were independently associated with expression of each gene (P<0.05 after correction for multiple testing), suggesting that several sites may modulate disease susceptibility. Individual SNPs correlated with changes in expression up to 1.4-fold for CDKN2A, 1.3-fold for CDKN2B, and 2-fold for ANRIL. Risk SNPs for coronary disease, stroke, diabetes, melanoma, and glioma were all associated with allelic expression of ANRIL (all P<0.05 after correction for multiple testing), while association with the other two genes was only detectable for some risk SNPs. SNPs had an inverse effect on ANRIL and CDKN2B expression, supporting a role of antisense transcription in CDKN2B regulation. Our study suggests that modulation of ANRIL expression mediates susceptibility to several important human diseases.


Introduction
The chromosome 9p21.3 region adjacent to the loci encoding the cyclin-dependent kinase inhibitors CDKN2A (ENSG00000147889) and CDKN2B (ENSG00000147883) is an important susceptibility locus for several diseases with a complex genetic background. Recent genome-wide association (GWA) studies have shown that single nucleotide polymorphisms (SNPs) in this region are associated with coronary artery disease (CAD) [1][2][3][4], ischaemic stroke [5,6], aortic aneurysm [7], type II diabetes [8,9], glioma [10,11], and malignant melanoma [12]. Candidate gene approaches have also reported SNPs in this region to be associated with breast [13,14], ovarian [15], and pancreatic carcinoma [16], melanoma [17], and acute lymphoblastic leukaemia [18], as well as with poor physical function in the elderly [19]. Variants associated with these diseases are represented in Figure 1. Most of the risk variants in the chromosome 9p21 region identified by GWA studies are in noncoding regions, suggesting that their effects are likely to be mediated by influences on gene expression. Sequence variation can influence expression by cis or trans mechanisms. Trans-acting elements influence transcript levels of both alleles via diffusible factors and are usually located distant from the genes they regulate, whereas cis-acting elements act on genes on the same chromosome in an allele-specific manner and are usually located close to the genes they regulate. Since most reported risk variants in the 9p21 region do not appear in mature transcripts, and there are no known or predicted microRNAs mapping to this region [20][21][22][23], these variants are unlikely to produce diffusible trans-acting factors and are therefore likely to influence expression of nearby genes in cis.
Genes in the region include the cyclin-dependent kinase inhibitors CDKN2A (p16INK4a) including its alternative reading frame (ARF) transcript variant (p19ARF), CDKN2B (p15INK4b), and a recently discovered noncoding RNA, designated ANRIL (CDKN2BAS, ENSG00000240498), that undergoes splicing and is transcribed from the opposite strand to CDKN2A/B. The ARF/CDKN2A/B proteins are established tumour suppressors deleted in a range of cancers including familial cutaneous malignant melanoma [24]; they block cell cycle progression and influence key physiological processes such as replicative senescence, apoptosis, and stem-cell self-renewal [25]. Cis-acting regulatory elements for these genes have been identified in vitro using reporter assays [26][27][28][29][30], but expression levels are also influenced by factors such as age, chemotherapeutic agents, DNA damage by ultraviolet or ionizing radiation, and levels of transcriptional regulators [31], all of which are likely to act in trans.
The function of ANRIL is unknown, but other processed non-coding RNAs are involved in the regulation of gene expression through transcriptional and translational control mechanisms [32].
Genetic effects on expression can be assessed by comparing total expression levels in individuals with different genotypes at a putative regulatory locus. This is termed expression quantitative trait locus (eQTL) mapping [33]. This approach utilises information from all members of the population, but reflects the net effect of both cis and trans-acting influences; the sensitivity to detect cis-acting effects is therefore reduced in the presence of significant variation in trans-acting influences such as the environmental factors outlined above. An alternative approach that is specific for mapping cis-acting influences is to measure allelic expression (aeQTL mapping). An unequal amount of transcript arising from each allele in an individual heterozygous for a transcribed polymorphism indicates the presence of cis-acting effects on expression. While traditional eQTL analysis assesses the influence of polymorphisms by comparing expression between samples, allelic expression analysis compares the expression levels of alleles within individual samples, making it much more robust to trans-acting influences that affect both alleles such as age, gender, population stratification, or experimental variability. This maximises the sensitivity for detecting cis-acting effects [34].
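The within-individual comparison described above can be illustrated with a minimal sketch. It assumes that allele-specific signal intensities are available for both cDNA and genomic DNA from the same heterozygous individual, and that the cDNA ratio is normalised to the genomic DNA ratio (where the two alleles are present 1:1). The function name, argument names and numbers are hypothetical and are not taken from the study's Methods.

```python
import math

def allelic_expression_ratio(cdna_a, cdna_b, gdna_a, gdna_b):
    """Return the allele A / allele B expression ratio in one heterozygote.

    The cDNA signal ratio is divided by the genomic DNA signal ratio, which
    should reflect a true 1:1 allele ratio and so corrects for assay bias.
    All inputs are hypothetical allele-specific signal intensities.
    """
    if min(cdna_a, cdna_b, gdna_a, gdna_b) <= 0:
        raise ValueError("signal intensities must be positive")
    return (cdna_a / cdna_b) / (gdna_a / gdna_b)

# Hypothetical heterozygote: allele A gives more cDNA signal than allele B.
aer = allelic_expression_ratio(cdna_a=1800.0, cdna_b=1000.0,
                               gdna_a=1200.0, gdna_b=1000.0)
log2_aer = math.log2(aer)  # symmetric scale: 0 means balanced expression
```

Because both alleles are measured in the same sample, any factor that scales the two transcripts equally cancels out of the ratio, which is the property the paragraph above relies on.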
Variants associated with CAD span a region greater than 100kb, but the association is accounted for by SNPs within a 53kb interval that define a core risk haplotype [35]. Lead SNPs for CAD and diabetes are in separate LD blocks in Caucasians and are independently associated with the two separate diseases [35]. To date, CAD risk SNPs have shown inconsistent association with CDKN2A, CDKN2B and ANRIL by eQTL mapping. One CAD risk SNP was associated with altered ANRIL expression in blood, but not with CDKN2A or CDKN2B expression [36], whilst a different CAD risk SNP has been associated with reduced expression of all three genes in peripheral blood T-cells [37]. However, the latter study found no association with expression for other CAD risk SNPs [37], and another report also found no association of a lead CAD risk SNP with these genes or with global gene expression in primary vascular tissue and lymphoblastoid cells [38]. Based on evolutionary conservation and effects on expression, individual SNPs (rs10757278 and rs1333045) have been highlighted as potential causal variants for the association with CAD [36,37]. However, if multiple cis-acting effects are present at a locus, resolving a disease association by fine-mapping may not be possible. Examining gene expression rather than disease phenotype increases the power to map cis-acting effects, and we used this approach to determine whether multiple sites independently influence expression. Caucasian populations have strong linkage disequilibrium (LD) in the chromosome 9p21 region which limits the ability to separate the effects of individual SNPs on expression [35]. Populations of African ancestry have less LD [39,40], which can be exploited to improve the fine-mapping of functional polymorphisms associated with quantitative traits [41,42].
We therefore used eQTL and aeQTL mapping to perform detailed fine-mapping of the association of SNPs at the 9p21.3 locus with expression of CDKN2A, CDKN2B and ANRIL using a mixed-ancestry South African (SA) population, as well as a British Caucasian cohort. We identified multiple SNPs independently associated with expression of each gene, suggesting that several sites may modulate disease susceptibility. The markers identified in GWA studies were all associated with allelic expression of ANRIL, but association with the other two genes was only detectable for some of them. Our study suggests that modulation of ANRIL expression mediates susceptibility to a range of important human diseases.

Figure 1. SNPs associated with disease in the chromosome 9p21.3 region. Genes are illustrated in blue at the top, with arrows representing the direction of transcription. SNPs typed in our study and SNPs associated with various diseases are represented by black bars. Diseases in bold are those with association data from genome-wide association studies. The hatched box represents the core risk haplotype for CAD defined by Broadbent et al [35]. Promoter regions for each gene are shown as pale blue boxes. DM = diabetes mellitus type II; BCC = basal cell carcinoma. doi:10.1371/journal.pgen.1000899.g001

Author Summary
Genetic variants on chromosome 9p21 have been associated with several important diseases including coronary artery disease, diabetes, and multiple cancers. Most of the risk variants in this region do not alter any protein sequence and are therefore likely to act by influencing the expression of nearby genes. We investigated whether chromosome 9p21 variants are correlated with expression of the three nearest genes (CDKN2A, CDKN2B, and ANRIL) which might mediate the association with disease. Using two different techniques to study effects on expression in blood from two separate populations of healthy volunteers, we show that variants associated with disease are all correlated with ANRIL expression, but associations with the other two genes are weaker and less consistent. Multiple genetic variants are independently associated with expression of all three genes. Although total expression levels of CDKN2A, CDKN2B, and ANRIL are positively correlated, individual genetic variants influence ANRIL and CDKN2B expression in opposite directions, suggesting a possible role of ANRIL in CDKN2B regulation. Our study suggests that modulation of ANRIL expression mediates susceptibility to several important human diseases.

Results
We measured expression of CDKN2A, CDKN2B and ANRIL in peripheral blood from 310 healthy SA individuals (demographic details provided in the Methods section). Allelic expression was assessed for each gene using two transcribed SNPs located within the same exon. We selected 56 SNPs that tag the common variation in the region, specifically including SNPs with previously reported phenotypic associations. The results of allelic expression mapping in this population were compared with conventional mapping using total expression in the same samples; and with allelic expression mapping in a separate population of 177 healthy British Caucasians. Information on the selected SNPs and genotyping data are summarised in Table S1.

Inter-individual variation in expression
Total expression levels showed substantial inter-individual variation for each of the three genes, up to 13.9-fold for CDKN2A, 36.1-fold for CDKN2B, and 25.5-fold for ANRIL. Allelic expression ratios at individual transcribed markers also showed considerable inter-individual variation, up to 5.6-fold for CDKN2A, 2.4-fold for CDKN2B, and 6.8-fold for ANRIL. Plots of the allelic expression ratios at each transcribed SNP in the SA and Caucasian cohorts are shown in Figure S1 and Figure S2 and plots of the normalised total expression Ct values are shown in Figure S3. Standard errors for ANRIL were higher than for the other two genes in both the allelic and total expression assays, which is likely to be due to the fact that peripheral blood expression of ANRIL was lower than for CDKN2A and CDKN2B.
We estimated the proportion of the variance in total expression that can be attributed to cis-acting effects for each transcribed SNP in the three genes, as described in the Methods section. For CDKN2A this proportion was 8% when rs3088440 was used to estimate the variance in cis-acting effects, and 4% when rs11515 was used. For CDKN2B the corresponding values were 5% (using rs3217992) and 5% (using rs1063192), and for ANRIL 20% (using rs10965215) and 19% (using rs564398).

Correlation of CDKN2A, CDKN2B, and ANRIL expression
Total expression levels of CDKN2A, CDKN2B and ANRIL were positively correlated (r = 0.24 to 0.30, all P < 4×10⁻⁵) as shown in Figure S4, suggesting that expression of these genes is co-regulated.

Allelic expression versus total expression for mapping cis-acting effects
Allelic expression ratios (AER) measured at the two transcribed SNPs in each gene were highly correlated (CDKN2A r = 0.68, P = 1.7×10⁻³; CDKN2B r = 0.80, P = 1.7×10⁻¹²; ANRIL r = 0.90, P = 1.0×10⁻²⁶; all genes combined r = 0.96, P = 3×10⁻⁶¹) as shown in Figure S5. This was expected since the two transcribed SNPs selected to assess AER in each gene are located in the same exon and the same transcripts. We therefore used the AERs from both transcribed markers in each gene (as described in the Methods section) for the aeQTL analysis. This increased the number of informative heterozygotes at which allelic expression could be assessed for each gene and increased the power to detect significant effects, as shown in Table 1.
Unlike allelic expression ratios, total expression data may be influenced by covariates that influence expression in trans. We therefore corrected total expression values for covariates (age, sex, and ethnicity) and excluded outlying individuals as described in the Methods section. These corrections did not significantly alter the results of the eQTL analysis, as shown in Figure S6. All subsequent analyses are presented using the covariate-corrected eQTL data. We compared cis-acting effects assessed by eQTL and aeQTL mapping, as shown in Figure 2. There was a strong correlation both for the effect size (r = 0.87, P = 4.7×10⁻⁵¹) and significance of association (r = 0.97, P = 4.8×10⁻⁹⁹) at each mapping SNP between the two techniques. However, the associations were more significant for allelic expression than for total expression analysis, indicating that allelic expression had greater power for detecting cis-acting effects. This suggests that trans-acting effects make a substantial contribution to the overall variance of expression in these genes, which is consistent with our estimates that cis-acting effects account for only between 4 and 20% of the overall variance in expression of these genes.

Comparison of cis-acting effects between populations and combined analysis
We compared aeQTL analysis between the SA and British Caucasian samples. Results of aeQTL mapping were highly correlated between the two populations, both for the significance of the detected association (r = 0.94, P = 10⁻⁷²) and the estimated magnitude of the effect on expression for each SNP (r = 0.82, P = 2×10⁻³⁸), as shown in Figure 3. Patterns of LD in the two populations are shown in Figure S7. Minor allele frequency in the SA population was higher (which increases the proportion of informative heterozygotes for allelic expression analysis) for 33 of the 53 SNPs typed in both populations.
In view of the similarity of the effects in the two cohorts, we combined the data in subsequent analyses, increasing the power to detect cis-acting effects of smaller magnitude and enabling us to adjust for the effects of individual SNPs. The significance of associations for individual SNPs in the combined cohort is shown in Figure 4. Subsequent results refer to the combined dataset, with specific discussion of differences between the populations where relevant.
As described in the Methods section, we defined significance thresholds for all SNP associations using the family-wise error rate (FWER), taking multiple testing into account with a Bonferroni correction for the 56 SNPs tested. Associations passing a FWER threshold of 0.05 (corresponding to a nominal P-value of 8.9×10⁻⁴, −log₁₀ P of 3.05, and −log₁₀ FWER of 1.3) were regarded as significant. Table S2 shows the −log₁₀ of the nominal P-values and FWER for all SNP associations, and nominal P-values are reported in the text.
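The threshold values quoted above follow directly from the Bonferroni correction, as the short calculation below shows. The number of tests and the FWER threshold come from the text; the variable and function names are illustrative only.

```python
import math

n_tests = 56           # SNPs tested (from the text)
fwer_threshold = 0.05  # family-wise error rate threshold (from the text)

# Bonferroni: a nominal P-value must fall below FWER / number of tests.
nominal_threshold = fwer_threshold / n_tests
neg_log10_nominal = -math.log10(nominal_threshold)
neg_log10_fwer = -math.log10(fwer_threshold)

def bonferroni_fwer(nominal_p, n=n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, nominal_p * n)
```

Here `nominal_threshold` is approximately 8.9×10⁻⁴, `neg_log10_nominal` approximately 3.05, and `neg_log10_fwer` approximately 1.3, matching the values given in the text.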
The effect of each SNP on AER is also shown in Table S2. The maximum change in allelic expression associated with any SNP was 1.4-fold for CDKN2A, 1.33-fold for CDKN2B, and 1.97-fold for ANRIL. Due to the power of our combined dataset we were able to detect SNP effects on allelic expression as small as 1.05-fold that were significant.

Multiple sites influence CDKN2A, CDKN2B, and ANRIL expression
As shown in Figure 4, multiple SNPs were associated with cis-acting influences on expression of CDKN2A, CDKN2B and ANRIL. This could be the result of multiple independent loci influencing expression of each gene, but could also be a reflection of strong LD in the region since associations might be observed for 'non-functional' SNPs (that do not directly influence expression) which are in LD with other 'functional' polymorphisms. We adjusted for the effect of individual SNPs to assess whether multiple SNPs were independently correlated with expression of the three genes, as shown in Figure 5. For each gene, stepwise adjustments were made for the effect of the SNP which showed the most significant association with expression, until independent effects could no longer be detected. Associations remained significant after adjusting for the top SNP for CDKN2A and CDKN2B, and the top two SNPs for ANRIL.
Our results indicate that even after adjusting for the effects of the most significant marker, some of the remaining SNPs still showed significant association with ANRIL expression. This could be explained by the presence of more than one functional polymorphism affecting expression, but could also reflect the presence of a functional polymorphism that is in disequilibrium with both markers. However, examination of the allelic expression patterns provides additional support for the presence of multiple sites affecting expression. For example, Figure 6 shows the allelic expression ratios observed at the transcribed SNP rs564398 in ANRIL, grouped according to the genotype at rs10965215. These two SNPs are in strong LD (D′ = 0.98), hence the absence of individuals homozygous for the A allele at rs10965215 that are heterozygous at rs564398. We observe that the G allele of the transcribed SNP (rs564398) is overexpressed (G/A AER values greater than 1); however, overexpression is stronger (P = 10⁻¹⁵ using the Mann-Whitney test) for individuals that are also heterozygous at the second polymorphism (rs10965215). This pattern is not consistent with allelic expression being determined by a single biallelic polymorphism acting in cis and suggests that there is more than one functional polymorphism or that this polymorphism is multiallelic. Such patterns were common in our data.
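The group comparison described above can be sketched as follows. This is only an illustration of the Mann-Whitney U statistic on made-up allelic expression ratios; the values, variable names and function are hypothetical and do not reproduce the study data. The P-value, which would be derived from U by an exact or approximate null distribution, is not computed here.

```python
def mann_whitney_u(x, y):
    """Return the U statistic for sample x versus sample y.

    U counts the (x, y) pairs in which the x value exceeds the y value,
    with ties counted as one half.
    """
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u

# Hypothetical G/A allelic expression ratios at the transcribed SNP,
# grouped by genotype at a second SNP (values are illustrative only).
aer_second_snp_homozygous = [1.4, 1.5, 1.6, 1.7, 1.55]
aer_second_snp_heterozygous = [1.9, 2.0, 2.1, 1.95, 2.05]

u = mann_whitney_u(aer_second_snp_heterozygous, aer_second_snp_homozygous)
# Fraction of pairs in which the heterozygous-group ratio is the larger one.
separation = u / (len(aer_second_snp_heterozygous) * len(aer_second_snp_homozygous))
```

If a single biallelic cis-acting polymorphism determined the ratio, the two groups would be expected to show similar ratios, so a `separation` far from 0.5 is the kind of pattern the paragraph above interprets as evidence for an additional site.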
The direction of cis-acting effects on expression was compared between genes for SNPs showing significant associations with expression of each gene, as shown in Table 2. SNP effects for CDKN2A and ANRIL were in the same direction for all 10 SNPs, meaning that alleles associated with overexpression of CDKN2A were also associated with overexpression of ANRIL. By contrast, for all 8 SNPs that were significantly associated with allelic expression of both CDKN2A and CDKN2B, the alleles associated with CDKN2A overexpression were associated with CDKN2B underexpression. Similarly for all 3 SNPs significantly associated with allelic expression of both CDKN2B and ANRIL, alleles associated with overexpression of CDKN2B were associated with ANRIL underexpression. The total expression analysis had insufficient power for similar analyses to be performed.
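The comparison of effect directions between genes amounts to a simple sign check over the SNPs that are significant for both genes. The sketch below assumes effects are expressed as log2 allelic expression changes for the same reference allele; the SNP identifiers and effect values are placeholders, not results from Table 2.

```python
def direction_concordance(effects_gene1, effects_gene2):
    """Count SNPs whose allele effects share or oppose direction in two genes.

    Each argument maps SNP id -> log2 allelic expression effect for the same
    reference allele, restricted to SNPs significant for that gene.
    """
    shared = sorted(set(effects_gene1) & set(effects_gene2))
    same = [s for s in shared if effects_gene1[s] * effects_gene2[s] > 0]
    opposite = [s for s in shared if effects_gene1[s] * effects_gene2[s] < 0]
    return same, opposite

# Hypothetical effects (placeholder SNP ids, illustrative values only).
gene_x = {"snp1": 0.30, "snp2": -0.20, "snp3": 0.10}
gene_y = {"snp1": -0.15, "snp2": 0.25, "snp4": 0.40}

same, opposite = direction_concordance(gene_x, gene_y)
```

In this toy input only `snp1` and `snp2` are significant for both genes, and both land in `opposite`, mirroring the kind of inverse relationship reported between gene pairs in the text.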

In vivo effects of putative regulatory elements identified in vitro
We investigated whether SNPs within regulatory regions previously identified by in vitro reporter assays were associated with cis-acting effects on expression in vivo. The effect on gene expression and significance of the association for each SNP is summarised in Table S2.
CDKN2A expression was significantly correlated with SNPs in its promoter and the ARF transcript promoter [26][27][28][29], and with SNPs close to the RD INK4/ARF domain that has been shown to regulate expression of CDKN2A, ARF and CDKN2B in vitro [30].
CDKN2B expression was also significantly correlated with SNPs in the CDKN2A and ARF promoter regions, suggesting that these elements influence expression of both genes. CDKN2B expression was not significantly correlated with the single SNP typed in its promoter (rs2069418) prior to adjustment, but this became significant after adjustment for the most significant SNP in the ARF promoter (rs3218018).
ANRIL expression was strongly associated with SNPs in the CDKN2B promoter (P = 10⁻⁷²), ARF promoter (P up to 10⁻⁵³) and RD INK4/ARF domain (P = 10⁻¹²), as well as with SNPs adjacent to the CDKN2A promoter (rs3731239, P = 10⁻²⁵). These data validate in vivo the function of the regulatory elements identified by in vitro transfection studies, and confirm that shared cis-acting elements influence expression of CDKN2A, CDKN2B and ANRIL.

CAD, diabetes, and cancer risk variants are associated with cis-acting effects on expression
We examined the correlation of allelic expression of CDKN2A, CDKN2B and ANRIL with SNPs reported to confer disease susceptibility. The effect on gene expression and significance of the association for each SNP is summarised in Table 3.

Figure 4. Significance of association with expression for SNPs in the combined population. The Y-axis represents the −log P value for individual SNPs (shown in chromosomal order along the X-axis) for: CDKN2A (A); CDKN2B (B); ANRIL (C). The horizontal black line on each graph represents the significance threshold after adjustment for multiple testing (family-wise error rate of 0.05 corresponding to −log₁₀ P = 3.05). The relative location of genes and promoter elements is represented at the top (CDKN2A and CDKN2A/ARF promoters yellow; ANRIL promoter blue; CDKN2B promoter orange; CDKN2A/ARF regulatory domain red). Letters along the bottom represent associations from GWA studies (C = CAD, D = diabetes, M = melanoma, G = glioma) and the black bar at the bottom represents the core risk haplotype for CAD defined by Broadbent et al [35]. doi:10.1371/journal.pgen.1000899.g004

CAD and stroke. SNPs within the core risk haplotype region for CAD [35] were associated with ANRIL expression (P up to 10⁻²¹), but none were associated with CDKN2A or CDKN2B expression. CAD risk alleles were all associated with reduced ANRIL expression, up to 1.9-fold, suggesting that expression of ANRIL, rather than CDKN2A or CDKN2B, might mediate atherosclerosis susceptibility. However, other CAD risk variants located telomeric to the core risk haplotype region such as rs7044859 and rs496892 showed substantially larger effects and stronger associations with ANRIL expression (P < 10⁻⁶⁰ for each SNP), and were also significantly associated with CDKN2A and CDKN2B expression (P < 10⁻⁴ for each SNP). The CAD risk alleles at these SNPs correlated with reduced expression of ANRIL and CDKN2A, but increased CDKN2B expression.
Associations for these SNPs remained significant after adjusting for the effect of the lead CAD SNPs within the core risk haplotype region (rs10757274, rs2383206, rs10757278 and rs1333049) [35], but SNPs within the core risk haplotype were no longer significantly associated with ANRIL expression after adjusting for the effect of SNPs at the distal locus (rs10965215 and rs564398). This suggests that the core CAD risk haplotype does not account for all of the observed association with ANRIL expression in peripheral blood.
Based on evolutionary conservation and effects on ANRIL transcription, rs1333045 within the core risk haplotype has been previously highlighted as a potential functional variant responsible for conferring susceptibility to CAD at the 9p21 locus [36]. In our analysis rs1333045 was associated with ANRIL expression (P = 10⁻¹²), but not with CDKN2A or CDKN2B expression. Its effects were similar to those of other SNPs in the core risk haplotype for CAD. After adjusting for the effect of rs1333045, 32 SNPs remained significantly associated with ANRIL expression, suggesting that the effect attributed to such variants was not due to LD with rs1333045.
Diabetes. The lead chromosome 9p21 SNPs associated with diabetes in GWA studies are located in a separate LD block to the CAD risk variants [7,9], and the phenotypic effects of CAD and diabetes variants have been shown to be independent [35]. Diabetes risk alleles in this region (rs10811661-T and rs2383208-A) were associated with under-expression of ANRIL, but were not associated with CDKN2A or CDKN2B expression in our Caucasian population. However, these SNPs showed no association with expression of ANRIL in the SA population, despite greater power to detect effects in this cohort.
A separate locus for diabetes susceptibility in the chromosome 9p21 region in Caucasians is located within the region associated with CAD risk. The rs564398-T risk allele at this locus is associated with diabetes [8], CAD [35] and stroke [5]. This SNP had the strongest association with ANRIL expression of all the SNPs we tested (P = 10⁻⁸¹), but was not significantly associated with CDKN2A or CDKN2B expression. The rs564398-T risk allele was associated with ANRIL underexpression, and the association remained significant after adjusting for the effect of rs10811661, the lead diabetes SNP. However, the association with rs10811661 was no longer significant after adjustment for rs564398.
Cancers and frailty. GWA studies have recently identified chromosome 9p21 SNPs correlated with susceptibility for glioma [10,11] and malignant melanoma [12]. The glioma risk allele rs1063192-C was highly correlated with increased ANRIL expression (P = 10⁻⁶¹), while the melanoma risk variant rs1011970-T correlated with reduced expression of ANRIL. Neither was associated with CDKN2A or CDKN2B expression.
Multiple candidate gene association studies have reported associations between SNPs in this region and susceptibility to a variety of diseases. These have mostly involved cancer phenotypes because the cell-cycle regulators CDKN2A and CDKN2B are recognised to be involved in predisposition to certain cancers. Such association studies have implicated 9p21 SNPs as being potentially involved in the development or therapeutic response to pancreatic [16,43], breast [13,14,44], ovarian [15,45], and bladder [46] carcinoma, as well as acute lymphoblastic leukaemia [18], and melanoma [17,47,48]. The SNPs associated with these phenotypes showed a significant correlation with allelic expression of one or more of the genes we examined, as summarised in Table 3. A SNP (rs2811712) that is associated with severely limited physical function in older people [19] was significantly associated with CDKN2B expression, but not with ANRIL expression.

Discussion
This is the most detailed study to date of cis-acting influences on expression at the chromosome 9p21 locus. We have shown that multiple sites in the 9p21 region independently influence CDKN2A, CDKN2B and ANRIL expression, and demonstrated that SNPs associated with diseases including CAD, diabetes, and cancers are all highly associated with ANRIL expression, suggesting that modulation of ANRIL expression may mediate disease susceptibility. We also report novel methodology for allelic expression analysis that allowed us to combine data from multiple transcribed polymorphisms and to adjust for the effects of particular SNPs. We have demonstrated that this approach has greater power than total expression analysis for mapping cis-acting effects.
Total expression levels of CDKN2A, CDKN2B and ANRIL, which reflect the combined influence of cis and trans-acting factors, were positively correlated. This corroborates other recent data [37], and suggests that expression of these genes is co-regulated. We have shown that trans-acting influences account for the majority of the observed variance in expression of these genes (80-96%), and the correlation in total expression levels is likely to reflect co-regulation of the genes through trans-acting factors. In addition, our allelic expression analysis demonstrated that expression is also influenced by shared cis-acting elements in the region. Despite the positive correlation in total expression levels, cis-acting effects associated with individual SNP alleles may act in opposite directions; the effects of individual SNPs on CDKN2B expression were opposite to effects on CDKN2A and ANRIL expression (which were concordant) in our study. Because cis-acting effects represent only a small proportion of the overall variance in expression of these genes, the effects acting in trans are likely to account for the positive correlation seen in total expression, but this does not diminish the potential biological significance of the cis-acting effects.

Figure 6. Effect of genotype at rs10965215 on allelic expression ratio of transcribed ANRIL SNP rs564398. Diamonds represent the allelic expression ratio for each individual, all of whom are heterozygous for the transcribed SNP rs564398. The first column shows individuals who are homozygous for rs10965215 (mean ratio 1.57), and the second column shows individuals who are heterozygous for rs10965215 (mean ratio 2.00). The third column shows the expression ratio obtained from genomic DNA in individuals who are heterozygous for the transcribed SNP rs564398, where the two alleles are present in a 1:1 ratio (mean ratio 1.00). doi:10.1371/journal.pgen.1000899.g006

Table 2. Correlation of SNP effects between genes by aeQTL mapping.

ANRIL overlaps and is transcribed in antisense with respect to CDKN2B [49]. It is modestly conserved across species [36] and its function is not known, but recent work has demonstrated that antisense transcription from CDKN2B downregulates CDKN2B expression in cis through heterochromatin formation [50]. This is consistent with our observation of an inverse effect of SNPs on ANRIL and CDKN2B expression. By contrast, CDKN2A and ANRIL showed positive correlations for both allelic and total expression in our study. CDKN2A and ANRIL do not overlap, but are transcribed divergently from transcription start sites separated by just 300 base pairs. Although the ANRIL promoter is currently not characterised, it may share promoter elements with CDKN2A and the resulting co-regulation could account for the positive correlation in expression we observed for these genes, similar to that described at other sites [51]. In this context, inhibition of CDKN2B expression by ANRIL would enable a level of crosstalk between CDKN2A and CDKN2B expression, which would be consistent with the inverse cis-acting effect of SNPs on CDKN2A and CDKN2B that we observed. The observation that cis-acting genetic effects played a greater role in expression of ANRIL compared to CDKN2A and CDKN2B (20% compared to less than 8% and 5% respectively) makes it a good candidate for genetic causation mediated through influences on expression. We compared total expression and allelic expression for the investigation of cis-acting influences on expression. While traditional eQTL analysis assesses the influences of polymorphisms by comparing expression between samples, allelic expression analysis compares the expression levels of alleles within individual samples, making it more robust to influences that affect both alleles such as age, gender or population stratification. 
This offers an important advantage for dissecting such cis-acting influences on expression, which although of lesser magnitude than trans-acting influences, may be of biological importance and possibly account for the genetic susceptibility observed in recent GWA studies. For aeQTL mapping we used a novel adaptation of our previously reported methodology [52] to combine multiple transcribed SNPs per gene, which increased the number of informative individuals and the power for detecting cis-acting effects. We demonstrated this approach using two transcribed polymorphisms per gene, but our methodology offers the potential for the inclusion of multiple additional transcribed variants. The results obtained by eQTL and aeQTL mapping were similar, consistent with previous work suggesting that the two approaches identify the same cis-acting loci [42]. However, we demonstrated that aeQTL analysis had substantially greater power than the eQTL approach. Adjusting for trans-acting covariates including age, sex and ethnicity in our eQTL analysis did not substantially alter the results. An influence of age on CDKN2A has been reported [53], but there was little variability in the age of our SA cohort (90% of whom were between the ages of 18 and 30 years). The fact that allelic expression is a more efficient way to identify cis-acting influences on expression has implications for future studies investigating the effects of SNPs on expression at other loci, for example for the hundreds of non-coding SNPs correlated with different diseases by recent GWA studies [54].
Allelic expression quantifies the relative contributions of each allele to the mRNA pool irrespective of the absolute mRNA levels, and therefore provides information about transcriptional effects and polymorphisms within the transcript influencing RNA degradation in cis. By contrast, total expression analyses that quantify absolute mRNA levels are also sensitive to post-transcriptional regulatory effects, such as mRNA degradation by microRNAs. In extreme cases tight post-transcriptional regulation could keep total mRNA levels constant irrespective of the contributions of each allele to the total mRNA pool. The fact that the results of eQTL and aeQTL mapping were so similar in our study suggests that the effect of regulation at the post-transcriptional level is limited, although regulation of CDKN2A expression by a microRNA has been described [55]. In general, although allelic expression is a robust method for mapping sites influencing expression in cis, investigation of total expression and other intermediate phenotypes such as protein levels or protein activity will provide complementary information that contributes to fully understanding the phenotypic effects of cis-acting polymorphisms. It would be desirable to determine whether the significant associations with mRNA expression observed for CDKN2A and CDKN2B are confirmed at the protein level.
Although we had hoped to use trans-ethnic fine-mapping to refine the associations with expression, the results of aeQTL mapping were in fact very similar in the SA and Caucasian populations. This replication in a separate cohort strongly supports the validity of our findings and enabled us to perform a combined analysis of the two cohorts. This approach of pooling data from ethnically-divergent populations has been previously shown to increase the power to detect influences on expression that are shared across populations [42,56]. The principal difference we identified between the two populations was for the SNPs associated with type II diabetes. The lead diabetes SNP rs10811661 was correlated with ANRIL underexpression in the Caucasian cohort, but not in the SA population, despite greater power to detect effects in that cohort. This may reflect differences in LD between the populations, but suggests that rs10811661 may not itself be the causal variant influencing diabetes susceptibility through effects on ANRIL expression. Studies to determine whether this SNP is associated with diabetes in populations of African origin would be of interest.
The power of our analyses to detect differences in expression enabled us to adjust for the effects of individual SNPs. Using this we were able to demonstrate that expression, and therefore probably disease predisposition, is independently influenced by multiple sites and that the observed effects cannot be explained by a single polymorphic site. From our analysis we cannot exclude the existence of rare variants with large effects, but previous resequencing studies in this region did not find rare variants associated with disease phenotypes [2,3]. We are unable to say whether the individual SNPs for which we found associations are the actual 'causal' variants responsible for the effects on expression, or if the association simply reflects linkage disequilibrium between these SNPs and the causative polymorphisms. Although fine mapping studies often purport to identify causal variants, in the context of complex diseases characterising the pathways involved in disease predisposition may be more important. This is of particular interest for these genes where variation in expression is mostly due to trans effects which may be substantially influenced by non-genetic factors, raising the prospect that it may be amenable to therapeutic modulation. The putative causal variants rs10757278 and rs1333045 previously associated with altered ANRIL expression [36,37] were significantly associated with reduced ANRIL expression in vivo in our analysis, but their effects were relatively modest compared to other SNPs in the region and adjustment for the effect of these SNPs accounted for only a small proportion of the effect observed at other SNPs. The maximum changes in expression associated with individual SNPs were substantial, up to 2-fold for ANRIL, but we were also able to detect effects of much smaller magnitude; the minimum significant effect was associated with just a 1.05-fold change in expression. 
Although the associations of SNPs with expression that we observed were statistically highly significant, we cannot say what impact such effects on expression have on disease risk. However, even small differences in gene expression due to genetic factors that are present throughout an individual's lifetime could contribute to differences in common late-onset phenotypes such as CAD and diabetes, and the effects may be even greater in tissues related to disease.
We examined in vivo expression in primary cells rather than in transformed cell lines. Although cell lines have been extensively used to investigate cis-acting influences on expression [56,57], patterns of expression may be altered in immortalised cells, particularly for genes such as these that are associated with senescence and cell-cycle regulation. Furthermore, widely used cell lines are pauciclonal or monoclonal [58,59] and since a significant proportion of human genes exhibit random patterns of monoallelic expression within single clones of cell lines [60], cis-acting effects in these cells are unlikely to be representative of polyclonal cell populations in vivo. Previous studies have delineated the promoters and other elements regulating CDKN2A/ARF and CDKN2B expression using reporter assays [26][27][28][29][30]. Such studies are valuable to identify causative polymorphisms, but since they examine the effects on expression outside of the normal haplotype, chromatin and cellular context, their findings require confirmation by in vivo studies [34,61]. Our analysis confirmed that polymorphisms in upstream regulatory elements identified by in vitro assays were significantly associated with cis-acting effects on expression in vivo, but we also demonstrated that other loci located up and downstream were associated with effects on expression of similar or even larger magnitude. These data highlight the complexity and multiplicity of sites influencing expression in the region. The assays we used to investigate CDKN2A expression also included the ARF transcript variant. This made it possible to detect sites influencing expression of both transcripts, and we were able to detect effects of SNPs in both the CDKN2A and ARF promoter regions, although differential effects of loci on individual transcripts cannot be distinguished using this approach.
All of the SNPs in the region associated with disease in GWA studies were associated with influences on ANRIL expression, suggesting that modulation of ANRIL expression may mediate susceptibility to these phenotypes. SNPs in the CAD core risk haplotype region [35] that are most strongly associated with CAD in GWA studies were associated with reduced ANRIL expression, but other SNPs associated with CAD which lie outside of the core risk haplotype showed independent and stronger associations with ANRIL underexpression. This may reflect differences in the relative importance of particular sites in the tissues responsible for the association with CAD. Indeed, the patterns of association we have observed in peripheral blood in healthy individuals may differ from those in primary disease tissues. Similarly, differences in the relative contribution of each SNP to modulation of expression in the tissues crucial for the pathogenesis of the different conditions could explain why particular diseases are associated with different subsets of SNPs that influence ANRIL expression. Recent work also suggests that ANRIL has multiple transcripts, which may be differentially expressed between tissues [36,38]. Confirmation of our findings in tissues relevant to each disease and for different ANRIL transcripts would therefore be desirable, although for CAD and other complex diseases the cell populations responsible for mediating disease susceptibility are unknown and may be inaccessible. Although tissue specificity of cis-acting influences is well documented, variation in cis-acting effects is primarily explained by genetic variation, with allele-specific expression at most SNPs being the same between tissues in the same individual [62]. Analysis of expression in blood is therefore likely to give biologically relevant information despite the fact that this may not be the tissue in which influences on expression actually mediate disease susceptibility.
Previous genome-wide expression analyses using microarrays and immortalised cell lines did not identify association of CDKN2A and CDKN2B expression with markers in this region, although they did not examine ANRIL expression [56,57]. However, two recent studies specifically examining expression in the chromosome 9p21 region in primary cells reported associations between CAD risk SNPs and gene expression in blood [36,37]. Jarinova et al found a significant association of CAD risk variant rs1333045-C with ANRIL expression, but not with CDKN2A or CDKN2B expression [36]. Liu et al reported that a different CAD risk allele rs10757278-G was associated with reduced expression levels of CDKN2A, CDKN2B, and ANRIL, but in the same study found no correlation for five other SNPs tested, including two additional SNPs associated with CAD (rs518394 and rs564398). They also found no association for two SNPs associated with diabetes (rs10811661 and rs564398), the frailty risk SNP rs2811712, and a melanoma risk SNP (rs11515) [37]. We demonstrated that CAD risk SNPs rs1333045-C and rs10757278-G both correlate with ANRIL underexpression, but found no correlation of these SNPs with CDKN2A or CDKN2B expression. However, we identified highly significant influences on expression associated with other SNPs for which Liu et al found no association (rs10811661 with CDKN2A and ANRIL; rs564398 with ANRIL; rs2811712 with CDKN2B; rs11515 with CDKN2A and CDKN2B). These findings are likely to reflect the greater power of our analysis for detection of cis-acting effects due to the larger sample size and increased sensitivity of our aeQTL mapping approach.
The finding that disease associated SNPs are all associated with ANRIL expression suggests that ANRIL plays a role in influencing disease susceptibility. Although little is known about the targets of ANRIL, its effects may be mediated through antisense transcription regulation of CDKN2B in the tissues critical for the pathogenesis of the different diseases. The observation that the effects of sequence variants acting in cis were stronger for ANRIL than for CDKN2B may reflect selection pressure against variants that have substantial direct effects on the expression of critical genes. CDKN2A, ARF and CDKN2B are cell cycle regulators and are plausible candidates for involvement in the pathogenesis of the diseases for which we found SNP associations with ANRIL. Mutations involving these genes are well documented in glioma [63,64] and melanoma [49,65,66]. Overexpression of CDKN2A and CDKN2B in murine models is associated with pancreatic islet hypoplasia and diabetes [67,68], and there is also emerging evidence that vascular cell senescence involving these pathways is involved in the pathogenesis of atherosclerosis [69,70].
Our data show that multiple independent sites in the chromosome 9p21 region influence CDKN2A, CDKN2B and ANRIL expression. SNPs associated with disease in GWA studies are all associated with ANRIL expression, indicating that modulation of ANRIL expression mediates susceptibility to a variety of conditions.

Participants
Peripheral blood for DNA and RNA analysis was collected from anonymous adult volunteers in two cohorts: 310 SA blood donors and 177 British Caucasians from north-east England. The self-reported ethnicity of the SA cohort was: 200 Cape mixed-ancestry; 67 African black; 19 Indian; 10 white; 4 other/unknown. 42% were male, with median age 20 years (range 17-60, lower quartile 19, upper quartile 23). In the Caucasian cohort, 50% were male, with median age 63 years (range 25-101, lower quartile 51, upper quartile 69).

Ethics statement
The study complies with the principles of the Declaration of Helsinki. Informed consent was obtained from all participants and the study was approved by the Newcastle and North Tyneside Local Research Ethics Committee and the University of Cape Town Faculty of Health Sciences Research Ethics Committee.

DNA and RNA extraction and cDNA synthesis
For the South African samples DNA was extracted using a phenol/chloroform method from 4ml of peripheral blood in EDTA collected at the time of the RNA sample. For the British samples, DNA was obtained from the RNA solution prior to DNase treatment.
RNA was extracted from 2.5ml of peripheral blood collected using the PAXgene system (Qiagen) following the manufacturer's protocol and was DNase treated using RQ1 RNase-Free DNase (Promega). For AEI measurements, approximately 2µg of total RNA was reverse transcribed and eluted in 20µl, using SuperScript VILO cDNA Synthesis Kit (Invitrogen) for the SA samples and SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen) for the British samples. For real-time PCR measurements, 500ng of total RNA was reverse transcribed using High Capacity RNA-to-cDNA Master Mix (Applied Biosystems) and eluted in 20µl.

Selection of transcribed SNPs for allelic expression analysis
Using the NCBI Entrez Gene database (http://www.ncbi.nlm.nih.gov/, 28/01/08), transcribed SNPs with expected heterozygosity >0.2 in the HapMap CEU population were selected as suitable candidates for assessment of allelic expression. Transcribed polymorphisms in ANRIL, which was not annotated in the databases at the time of the design, were identified by comparing the reported mRNA sequence [49] with NCBI dbSNP. Transcribed SNPs selected using these criteria were: rs3088440 and rs11515 in exon 3 of CDKN2A; rs3217992 and rs1063192 in exon 2 of CDKN2B; rs10965215 and rs564398 in exon 2 of ANRIL. The two CDKN2A SNPs are also present in ARF, allowing the assessment of cis-acting influences on both of these transcripts. Another SNP rs10738605 in exon 3 of ANRIL also satisfied these criteria but was subsequently excluded due to poor performance of the assay.
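The heterozygosity criterion can be illustrated with a short sketch. It assumes that expected heterozygosity is computed as 2pq from the allele frequency of a biallelic SNP under Hardy-Weinberg proportions; the function names are hypothetical and are not taken from the study.

```python
def expected_heterozygosity(allele_freq):
    """Expected heterozygosity 2pq for a biallelic SNP (assumes Hardy-Weinberg proportions)."""
    p = allele_freq
    return 2.0 * p * (1.0 - p)


def is_candidate(allele_freq, threshold=0.2):
    """True if the SNP passes the expected-heterozygosity threshold used for selection."""
    return expected_heterozygosity(allele_freq) > threshold


if __name__ == "__main__":
    # Illustrative allele frequencies only, not values from the HapMap CEU panel.
    for freq in (0.05, 0.10, 0.20, 0.50):
        print(freq, round(expected_heterozygosity(freq), 3), is_candidate(freq))
```

Under this assumption, a threshold of 0.2 corresponds to a minor allele frequency of roughly 0.11 or higher.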

Genotyping
Multiplex SNP genotyping was performed by primer extension and MALDI-TOF mass spectrometry using iPLEX Gold technology from Sequenom (Sequenom Inc, San Diego, USA). SNP assays were designed using Sequenom's RealSNP (www.RealSNP.com) and MassARRAY Assay Design v3.0 Software (multiplex details and primer sequences available in Table S4). PCR was performed using 20ng of DNA in a 10µl reaction volume for 35 cycles using standard iPLEX methodology. Spectra were analysed using MassARRAY Typer v3.4 Software (Sequenom). Spectra and plots were manually reviewed and auto-calls were adjusted if required. Positive and negative controls were included. Individual samples with low genotype call rates (<80%) and SNP assays with poor quality spectra/cluster plots were excluded. Correspondence to Hardy-Weinberg proportions was checked for each SNP.
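The text does not state which test was used to check Hardy-Weinberg proportions. As a minimal sketch, assuming a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts (a common choice, but an assumption here), the check could look as follows; the function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions from genotype counts.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative counts only.
    print(hwe_chi_square(25, 50, 25))
```

For small samples or rare alleles an exact test would usually be preferred over the chi-square approximation.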

Measurement of allelic expression ratios
PCR primers for the selected transcribed SNPs were designed using Primer3 (v.0.4.0) software [78]. CDKN2A primers span exons 3-4 and include both transcribed SNPs (rs3088440 and rs11515) in the same amplicon. ANRIL primers span exons 1-2 and include both transcribed SNPs (rs10965215 and rs564398) in the same amplicon. For CDKN2B, separate primer pairs for transcribed SNPs rs1063192 and rs3217992 were designed entirely within exon 2 (due to the distance of transcribed SNPs from the exon boundary).
Quantification of the allelic expression ratio was performed by primer extension and MALDI-TOF mass spectrometry using iPLEX Gold with similar parameters to the genotyping assay. Spectra were analysed using MassARRAY Typer v3.4 Software (Sequenom) and allelic ratios were estimated as the ratios of the area under the peak representing allele 1 to that representing allele 2. Measurements were performed in four replicates using 50ng of cDNA template. Results from amplification of genomic DNA were used as an equimolar reference to normalise the cDNA values. Genomic normalisation reactions for CDKN2B used the same PCR primers as used for cDNA, but for CDKN2A and ANRIL (where primers were cDNA-specific) separate assays designed to be as close as possible in size and location to the cDNA primers were used. Primer sequences are shown in Table S3. For some assays the allelic ratios measured in gDNA did deviate from a 1:1 ratio, as shown in Table S4, confirming that allelic ratios in cDNA required correction for assay bias. However, as expected the gDNA ratios for each assay were relatively homogeneous with little inter-individual variability compared to cDNA ratios (Figure S1 and Figure S2). We compared the results of expression mapping using two different normalisation strategies in the SA cohort: normalising to a mean population normalisation factor versus normalising each individual's cDNA to their own gDNA ratio. There was no difference in the results obtained using these two normalisation strategies, as shown in Figure S8. The mean gDNA ratios for each assay were the same in the SA cohort and a sample of Caucasian individuals (no significant difference using a two sample t-test), and we therefore used the mean gDNA ratios for normalisation of all samples.
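The genomic normalisation step can be sketched as follows. This is an illustration under stated assumptions rather than the study's pipeline: replicate ratios are assumed to be combined by averaging their natural logarithms, and the function name and example values are hypothetical.

```python
import math
from statistics import mean


def normalised_log_aer(cdna_replicate_ratios, mean_gdna_ratio):
    """Log allelic expression ratio corrected for assay bias.

    cdna_replicate_ratios: allele 1 / allele 2 peak-area ratios measured in cDNA.
    mean_gdna_ratio: mean ratio for the same assay in genomic DNA, where the two
    alleles are present in equimolar amounts.
    """
    corrected = [ratio / mean_gdna_ratio for ratio in cdna_replicate_ratios]
    # Averaging on the log scale is an assumption; the text does not say how
    # the four replicates were combined.
    return mean(math.log(ratio) for ratio in corrected)


if __name__ == "__main__":
    # An assay that reads 1.2 in gDNA shows no imbalance when cDNA also reads 1.2.
    print(normalised_log_aer([1.2, 1.2, 1.2, 1.2], 1.2))
```

A value of zero indicates balanced expression of the two alleles; positive values indicate relative overexpression of allele 1.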
The appropriateness of genomic normalisation ratios and linearity of the AER response were checked by mixing PCR products from individuals homozygous for the minor and major alleles in varying ratios (8:1, 4:1, 1:1, 1:4, 1:8) and using these as template for the allelic expression assays. These experiments confirmed that allelic expression showed a linear response and that normalisation ratios obtained using allelic expression assays on a 1:1 mixture of alleles for each SNP correspond to normalisation ratios obtained from genomic DNA (Table S4 and Figure S9).
Allelic expression ratios for the two transcribed markers in each gene were highly correlated (CDKN2A r = 0.68, p = 1.7×10−3; CDKN2B r = 0.80, p = 1.7×10−12; ANRIL r = 0.90, p = 1.0×10−26; all genes combined r = 0.96, p = 3×10−61) as shown in Figure S5; we therefore used a novel approach of combining allelic ratios from the two transcribed markers in each gene to increase the number of informative heterozygotes.

Relative quantification of total gene expression using real-time PCR
Real-time PCR reactions were performed using TaqMan gene expression probes and reagents (Applied Biosystems) and run on a 7900HT Real-Time PCR System (Applied Biosystems). Commercially available FAM-labelled TaqMan assays were used for CDKN2A exons 2-3 (Hs00923894_m1) and ANRIL exons 1-2 (Hs01390879_m1). A custom FAM-labelled assay was used for exon 2 of CDKN2B. Commercially available VIC-labelled TaqMan assays were used for three reference genes shown to be suitable for normalisation of expression in peripheral blood [79,80]: B2M (4326319E), GAPD (4326317E), and HPRT1 (4326321E). TaqMan assays are validated by the manufacturer to have close to 100% amplification efficiency and assays were selected to quantify the same transcripts as the allelic expression assays. PCR was performed according to the manufacturer's protocol using four replicates, 25ng cDNA template per reaction, and the following multiplex combinations: CDKN2A/B2M, CDKN2B/GAPD, and ANRIL/HPRT1.
Relative total expression was analysed using the comparative cycle threshold (Ct) method. Ct values for each target gene were normalised to the mean Ct value of the three reference genes [79]. Standard errors and variances of measurements for allelic and total expression analyses in the SA population are shown in Table S5.
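The comparative Ct calculation can be sketched in a few lines. The sketch assumes amplification efficiency close to 100%, so that one cycle corresponds to a two-fold difference in template; the function names and Ct values are illustrative only.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean Ct of the reference genes."""
    return target_ct - mean(reference_cts)


def relative_expression(target_ct, reference_cts):
    """Expression relative to the reference genes, 2^(-delta Ct).

    Assumes roughly 100% amplification efficiency for all assays.
    """
    return 2.0 ** (-delta_ct(target_ct, reference_cts))


if __name__ == "__main__":
    # Hypothetical Ct values for one sample: target gene and three reference genes.
    print(relative_expression(25.0, [20.0, 22.0, 24.0]))
```

Because a higher Ct means less template, a larger delta Ct corresponds to lower relative expression.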

Statistical analyses
The association between total expression, as measured by real-time PCR, and each of the SNPs was assessed using linear regression of the log transformed normalized expression values on the genotype assuming no dominance or interactions between the effects of different SNPs. The effect of including age, sex, and ethnicity as covariates, as well as excluding outlying individuals as determined by visual inspection (highlighted in Figure S4) were investigated. Self-reported ethnicity was included as a categorical variable (categorised as "Cape mixed-ancestry", "black African", "white", "Indian", and "other"). These corrections made no significant difference to the results of eQTL mapping (Figure S6). All analyses were performed using the corrected data. Plots illustrating the associations between genotype and total expression for selected SNPs are shown in Figure S10.
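A minimal sketch of the additive regression for a single SNP is given below. It assumes genotypes are coded as the count of one allele (0, 1, or 2) and that no covariates are included; the function name and data are hypothetical, and the significance test for the slope is omitted.

```python
import math


def additive_regression(genotypes, expression):
    """Least-squares fit of log(expression) on allele count (0, 1, 2).

    Returns (slope, intercept). exp(slope) is the fold change in expression
    per copy of the counted allele under the no-dominance model.
    """
    y = [math.log(value) for value in expression]
    n = len(y)
    mean_x = sum(genotypes) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(genotypes, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Illustrative data in which each allele copy doubles expression.
    slope, intercept = additive_regression([0, 1, 2, 0, 1, 2], [1, 2, 4, 1, 2, 4])
    print(math.exp(slope))
```

Covariates such as age, sex, and ethnicity would be added as further columns in a multiple regression; they are left out here to keep the sketch short.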
We analysed allelic expression ratios using an extension of the approach we published previously [52]. We restrict ourselves to biallelic markers, and code one arbitrarily chosen allele as 0 and the other as 1. We designate with g the phase-known and with T the phase-unknown genotype of an individual. The latter can be ascertained through genotyping. We assume that the amount of mRNA originating from a single allele follows a lognormal distribution where the variance does not vary between different alleles. The log of the ratio between the expression levels of both alleles, I, can therefore be assumed to be normally distributed.
For an individual that is heterozygous for m transcribed polymorphisms, m ratios can be determined. We designate the vector of the logarithms of these ratios as $I = (I_1, \ldots, I_m)'$. Under the assumptions above, the components of $I$ are normally distributed with $I_k \sim N(\mu_k(g), \sigma_k^2)$, where the means $\mu_k(g)$ depend on the genotype $g$, whereas the variance $\sigma_k^2$ is genotype independent but may depend on the site used to measure the allelic expression ratio. We model the expected value as a linear combination of the influences of the typed polymorphisms:

$$\mu_k(g) = \sum_{i=1}^{n} h_{ik}\, a_i$$

where $a_i$ represents the effect of the $i$th cis-acting marker, and $h_{ik}$ characterizes the phase between transcribed and putative cis-acting markers:

$$h_{ik} = \begin{cases} 1 & \text{if the genotype at markers } i \text{ and } k \text{ is } 11/00 \\ -1 & \text{if the genotype at markers } i \text{ and } k \text{ is } 10/01 \\ 0 & \text{otherwise} \end{cases}$$

In order to assess the association between a specific SNP and allelic expression, let us consider a set of $L$ individuals. For an individual $l$ ($l = 1, \ldots, L$) we can measure the unphased genotype $T_l$ and a vector representing the log of the allelic expression ratios $I_l$.
Up to a multiplicative constant, the likelihood of observing a certain pattern of imbalance in this set of individuals given their genotyping results is:

$$\mathcal{L} = \prod_{l=1}^{L} \sum_{g} P(g \mid T_l)\, f(I_l \mid g)$$

where $P(g \mid T_l)$ designates the probability of the phased genotypes $g$ given the genotyping results $T_l$ and $f(I_l \mid g)$ describes the density of the distribution of $I_l$ given the genotype $g$. $P(g \mid T)$ was estimated using the hap procedure from the R-package gap (as deposited in the CRAN archive http://cran.r-project.org/) to phase the genotypes of the two populations separately.
We assume that the allelic expression ratios measured at different sites are conditionally independent given the genotype. Therefore:

$$f(I_l \mid g) = \prod_{k=1}^{m} f(I_l^k \mid g)$$

where $f(I_l^k \mid g) = f(I_l^k; \mu_k(g), \sigma_k)$ and $f(I_l^k; \mu_k(g), \sigma_k)$ denotes the density of a normal distribution with the individual expression ratio $I_l^k$ as variate, a genotype-dependent mean $\mu_k(g)$ and a variance $\sigma_k^2$. Therefore $\mathcal{L}$ depends on $a_i$ ($i = 1, \ldots, n$) and $\sigma_k$ ($k = 1, \ldots, m$), and maximisation of this likelihood allows assessment of the effects of single SNPs or groups of SNPs, and adjustment for the effects of other markers, by comparing nested models using likelihood ratio tests.
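The likelihood described above can be sketched for the simplest case of a single putative cis-acting SNP with effect a. This is an illustration under several simplifying assumptions, not the implementation used in the study: the standard deviations are treated as known rather than estimated, the maximisation is a crude grid search, phase probabilities are supplied directly rather than estimated with the hap procedure, and all function names and the data layout are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_likelihood(individuals, a, sigmas):
    """Log-likelihood of the log allelic ratios for one cis-acting SNP with effect a.

    individuals: list of (log_ratios, phases), where
      log_ratios[k] is the log ratio at transcribed marker k (None if uninformative),
      phases is a list of (probability, h) pairs, one per possible phased genotype,
      and h[k] in {-1, 0, 1} encodes the phase between the SNP and marker k.
    sigmas[k]: standard deviation of the log ratio measured at marker k.
    """
    total = 0.0
    for log_ratios, phases in individuals:
        mixture = 0.0
        for prob, h in phases:
            density = 1.0
            for k, ratio in enumerate(log_ratios):
                if ratio is None:
                    continue  # not heterozygous at this transcribed marker
                # Markers are treated as conditionally independent given the phase.
                density *= normal_pdf(ratio, h[k] * a, sigmas[k])
            mixture += prob * density
        total += math.log(mixture)
    return total


def likelihood_ratio_test(ll_null, ll_alt):
    """Likelihood ratio statistic and p-value for one extra parameter (1 df)."""
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    return stat, math.erfc(math.sqrt(stat / 2.0))


def fit_effect(individuals, sigmas, grid_step=0.01, max_abs=2.0):
    """Grid-search estimate of a (a stand-in for a proper numerical optimiser)."""
    steps = int(round(max_abs / grid_step))
    candidates = [i * grid_step for i in range(-steps, steps + 1)]
    return max(candidates, key=lambda a: log_likelihood(individuals, a, sigmas))


if __name__ == "__main__":
    # Two phase-known heterozygotes with opposite phase, one transcribed marker.
    data = [([0.5], [(1.0, [1])]), ([-0.5], [(1.0, [-1])])]
    sigmas = [0.2]
    a_hat = fit_effect(data, sigmas)
    ll_null = log_likelihood(data, 0.0, sigmas)
    ll_alt = log_likelihood(data, a_hat, sigmas)
    print(a_hat, likelihood_ratio_test(ll_null, ll_alt))
```

Adjusting for another marker would correspond to adding its effect to both the null and the alternative model, so that the likelihood ratio test compares nested models that differ only in the SNP of interest.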
For both total and allelic expression, multiple testing was taken into account by calculating the family-wise error rate (FWER) using a Bonferroni correction for the 56 SNPs tested. Associations with a family-wise error rate below a threshold of 0.05 (corresponding to a nominal P-value of 8.9×10−4, −log10 P of 3.05, and −log10 FWER of 1.3) were called significant.
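The thresholds quoted above follow directly from the Bonferroni correction, as the short sketch below shows; the function name is hypothetical.

```python
import math


def bonferroni_fwer(p_nominal, n_tests):
    """Bonferroni family-wise error rate for a nominal P-value, capped at 1."""
    return min(1.0, p_nominal * n_tests)


N_SNPS = 56   # number of SNPs tested
ALPHA = 0.05  # family-wise error rate threshold

nominal_threshold = ALPHA / N_SNPS

if __name__ == "__main__":
    print(nominal_threshold)               # nominal P-value threshold
    print(-math.log10(nominal_threshold))  # same threshold on the -log10 P scale
    print(-math.log10(ALPHA))              # threshold on the -log10 FWER scale
```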
From our allelic and total expression data we also estimated the proportion of total expression variance that is due to cis-acting effects. This assumes that cis and trans-acting factors act in an additive manner, do not interact, are independent and that there is random mating, no segregation distortion, and the locus is not subject to imprinting. Given these assumptions, we estimate the variance due to cis-acting effects, $V(c)$, as $\hat{V}(c)$ 1