Efficient Arsenic Metabolism — The AS3MT Haplotype Is Associated with DNA Methylation and Expression of Multiple Genes Around AS3MT

Arsenic is a very potent toxicant. One major susceptibility factor for arsenic-related toxicity is the efficiency of arsenic metabolism. This efficiency, in turn, is associated with non-coding single nucleotide polymorphisms (SNPs) in the arsenic methyltransferase gene AS3MT on chromosome 10q24. However, the mechanism of action of these SNPs has not yet been clarified. Here, we assessed the influence of genetic variation in AS3MT on DNA methylation and gene expression within 10q24 in people exposed to arsenic in drinking water. DNA was extracted from peripheral blood from women in the Argentinean Andes (N = 103) and from cord blood from newborns in Bangladesh (N = 127). AS3MT SNPs were analyzed with Sequenom or Taqman assays. Whole-genome epigenetic analysis with the Infinium HumanMethylation450 BeadChip was performed on bisulphite-treated DNA. Whole-genome gene expression analysis was performed with Illumina DirectHyb HumanHT-12 v4.0 on RNA from peripheral blood. Arsenic exposure was assessed by HPLC-ICPMS. In the Argentinean women, the major AS3MT haplotype, associated with more efficient arsenic metabolism, showed increased methylation of AS3MT (p = 10^−6) and also differential methylation of several other genes within about 800 kilobasepairs: CNNM2 (p < 10^−16), NT5C2 (p < 10^−16), C10orf26 (p = 10^−8), USMG5 (p = 10^−5), TRIM8 (p = 10^−4), and CALHM2 (p = 0.038) (adjusted for multiple comparisons). Similar, but weaker, associations between AS3MT haplotype and DNA methylation in 10q24 were observed in cord blood (Bangladesh). The haplotype-associated altered CpG methylation was correlated with reduced expression of AS3MT and CNNM2 (rs = −0.22 to −0.54) and with increased expression of NT5C2 and USMG5 (rs = 0.25 to 0.58). Taking other potentially influential variables into account in multivariable linear models altered the strength of the associations only to a minor extent.
In conclusion, the AS3MT haplotype status strongly predicted DNA methylation and gene expression of AS3MT as well as several genes in 10q24. This raises the possibility that several genes in this region are important for arsenic metabolism.


Introduction
Arsenic is a widespread environmental pollutant with strong adverse effects in humans, including increased incidence of cancer and effects on skin, respiratory tract, liver, and immune function [1], [2], [3], [4], [5]. Inorganic arsenic is efficiently absorbed in the gastrointestinal tract and is metabolized in the body by a series of reduction and methylation reactions, producing methylarsonic acid (MMA) and dimethylarsinic acid (DMA), both of which are excreted in the urine [6]. In humans, efficient methylation from inorganic arsenic to DMA is associated with a high rate of arsenic excretion in the urine [6]. Incomplete arsenic metabolism, with higher urinary inorganic arsenic and MMA, but lower DMA, seems to be a marker of increased susceptibility to arsenic-related diseases, including cancer [7], [8].
The main methyltransferase in arsenic metabolism is arsenic (+3 oxidation state) methyltransferase (AS3MT), which can methylate both inorganic arsenic and MMA [9], although its affinity for these two substrates may differ. The AS3MT gene is located in chromosome band 10q24, contains 11 exons, and spans 32 kilobases [10]. Real-time polymerase chain reaction (PCR) analysis of mature rat tissues detected AS3MT expression in the heart, adrenal gland, urinary bladder, brain, kidney, lung, and liver [9].
Different human populations have varying efficiencies of arsenic metabolism, and these differences may reflect underlying genetic differences. For example, compared with other populations, indigenous populations in the Andes have a uniquely efficient arsenic metabolism, with low urinary excretion of MMA and high excretion of DMA [11]. Moreover, this efficient-metabolizing phenotype was recently shown to be determined by genetic factors: eight SNPs in AS3MT, seven of them non-coding, had strong effects on arsenic metabolism in the population living in the Argentinean Andean highlands, where the efficient-metabolizing haplotype was very common (70%) [12]. Six of these SNPs were also shown to influence arsenic metabolism in people in Bangladesh [12], but the haplotype frequency there was much lower (17%) than in the Argentinean population. Two of the non-coding SNPs in the efficient-metabolizing haplotype were shown to decrease the expression of AS3MT [12]. Still, the mechanisms by which gene expression is altered are unknown; possibly the haplotype is associated with epigenetic changes in DNA methylation that in turn affect gene expression. The aim of the present study was to assess the influence of genetic variation in AS3MT on DNA methylation and gene expression within 10q24. This work was performed in two distinct human populations: a subgroup of the above-mentioned population from the Argentinean Andean highlands [12] and a mother-child cohort from Bangladesh.

Background Data, AS3MT SNPs in CpG Sites and Influence on AS3MT Methylation
The characteristics of the study groups in Argentina and Bangladesh are described in Table 1. In Argentina, only women were analyzed (N = 94 for DNA methylation; N = 90 for gene expression); their median age was 32 years, their median BMI was 24, and 42% reported chewing coca leaves. Their median total urinary arsenic was 188 μg/L (5th/95th percentiles 16/620). Their characteristics were similar to those of the full study population (N = 172) (Table 1). In Bangladesh, both newborn boys (N = 62) and girls (N = 65) were examined. Their mothers had a median age of 25 years and a median BMI of 20, and their median total urinary arsenic was 68 μg/L (5th/95th percentiles 20/460).
We next examined the frequencies of the AS3MT haplotype in the two populations and found them to be 69% in Argentina (9% with 0 copies, 44% with 1 copy, and 47% with 2 copies; estimated from eight SNPs) and 17% in Bangladesh (71% with 0 copies, 25% with 1 copy, and 4% with 2 copies; estimated from five SNPs). The AS3MT haplotype appeared to have a strong effect on the pattern of arsenic metabolites in urine. For example, in the Argentinean women, increasing copies of the AS3MT haplotype were associated with a higher percentage of DMA (medians 73.0% for 0 copies, 76.0% for 1 copy, and 83.4% for 2 copies; p < 0.001 in a general linear model adjusted for total urinary arsenic, U-As). There were no statistically significant differences between haplotype groups in total urinary arsenic (a marker of arsenic exposure), age, BMI, or coca-leaf chewing (evaluated in the Argentinean population), or in arsenic exposure at different points during gestation (measured in maternal urine around gestational weeks 6-8 or 30), mothers' age, mothers' BMI, asset score (a proxy for socioeconomic status), sex of the baby, or gestational age (evaluated in the Bangladeshi population) (Table 1).
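The haplotype frequencies above follow directly from the copy-number distributions: each person carries two haplotypes, so the frequency is the total number of haplotype copies divided by twice the number of people. The short sketch below reproduces that arithmetic from the reported percentages; the function name is illustrative only.

```python
def haplotype_frequency(pct_by_copies):
    """Haplotype frequency from the share of people carrying 0, 1 or 2 copies.

    pct_by_copies maps copy number -> percentage (or count) of individuals.
    Every individual carries two haplotypes, so the frequency is the total
    number of haplotype copies divided by twice the number of individuals.
    """
    n_people = sum(pct_by_copies.values())
    n_copies = sum(copies * share for copies, share in pct_by_copies.items())
    return n_copies / (2 * n_people)


# Copy-number distributions reported in the text (percent of individuals).
argentina = {0: 9, 1: 44, 2: 47}
bangladesh = {0: 71, 1: 25, 2: 4}

print(f"Argentina:  {haplotype_frequency(argentina):.3f}")   # 0.690
print(f"Bangladesh: {haplotype_frequency(bangladesh):.3f}")  # 0.165
```

Argentina gives (44 + 2 × 47) / 200 = 0.69. Bangladesh gives 33 / 200 = 0.165, which presumably rounds to the reported 17% given that the input percentages are themselves rounded.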
One way SNPs may affect DNA methylation is through sequence changes that introduce or remove methylation sites; therefore, we examined whether the SNPs in the AS3MT haplotype altered CpG sites and whether these sites were located in CpG islands or shores. Of the eight AS3MT SNPs that make up the major haplotype in the Argentinean women (rs7085104 G, rs3740400 C, rs3740393 C, rs3740390 A, rs11191439 T, rs11191453 C, rs10748835 A, and rs1046778 C), seven are non-coding [12]. Three of the SNPs alter CpG sites (Table S1): the non-coding rs7085104 (5′ of AS3MT) and rs3740400 (intron 1), and the non-synonymous rs11191439 (Met/Thr exchange in exon 6). The rs7085104 G and rs3740400 C alleles introduce CpG sites in the 5′ region, whereas the rs11191439 T allele removes a CpG site. Rs7085104 is situated in a CpG shore 181 basepairs (bp) 5′ of a 424-bp CpG island, in which rs3740400 is situated. Rs11191439 was not situated in a CpG shore or a CpG island. Thus, the alleles associated with rapid arsenic metabolism result in the gain of two CpG sites in CpG islands or shores and the loss of one CpG site elsewhere.
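Whether an allele creates or removes a CpG site depends only on the SNP base and its immediate neighbours, since only the two dinucleotides overlapping the SNP can change. The sketch below checks this for a pair of alleles; the flanking sequences in the usage lines are hypothetical and are not the real AS3MT context.

```python
def count_cpg(seq):
    """Number of CpG dinucleotides in a sequence (forward strand)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cpg_change(left, ref_allele, alt_allele, right):
    """Change in CpG count when ref_allele is replaced by alt_allele.

    Only the two dinucleotides that overlap the SNP can differ, so the
    comparison uses the SNP base plus one flanking base on each side.
    CpG is its own reverse complement, so the forward strand suffices.
    """
    ref_window = left[-1:] + ref_allele + right[:1]
    alt_window = left[-1:] + alt_allele + right[:1]
    return count_cpg(alt_window) - count_cpg(ref_window)


# Hypothetical flanking sequences (NOT the real AS3MT sequence context).
print(cpg_change("TTAC", "A", "G", "TTGA"))  # C[A/G]T: the G allele creates a CpG
print(cpg_change("GGAT", "C", "T", "GACC"))  # T[C/T]G: the T allele removes a CpG
```

A positive result means the alternative allele gains a CpG and a negative result means it loses one; whether the site falls in an island or shore would still have to be looked up against an annotation.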
We next examined, by pyrosequencing, the methylation status of the potential CpG site created by the rs3740400 C allele in the Argentinean women. Methylation at this site increased with the number of C alleles (CA vs. CC genotypes, p = 0.031; Fig. S1). This result was expected and is an artefact: subjects with one CpG (CA) would be expected to show half the methylation level of subjects with two CpGs (CC). Nevertheless, the analysis of rs3740400 showed that this potential CpG site was differentially methylated depending on AS3MT genotype, supporting the idea that non-coding SNPs may create DNA methylation targets. However, the degree of methylation was very low (average 3%) in the CC carriers, suggesting a small effect on AS3MT gene expression. Because of the LD within and around AS3MT [12], [23], [30], we took a chromosome-region approach and analyzed many CpG sites in 10q24 in relation to AS3MT haplotype. We analyzed the association of the whole AS3MT haplotype with DNA methylation of AS3MT by array analysis (Infinium HumanMethylation450 BeadChip). In both the Argentinean and Bangladeshi study groups, the AS3MT haplotype was strongly associated with the methylation status of AS3MT (Table 2; Fig. S2). The strongest association was found for CpG site cg18534077 in the Argentinean study group, where the AS3MT haplotype was associated with more methylation (r = 0.63, p = 10^−6, adjusted for multiple testing). The opposite pattern was found for cg15744005 and cg08772003, where the haplotype was associated with less methylation (Table 2). Only the association for cg15744005 remained significant in both populations after adjustment for multiple comparisons. Cg15744005 was situated at an enhancer element (Table S1).
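The allele-dose artefact described above can be made concrete with a small calculation. The sketch assumes, as the text implies, that the assay reports methylated signal as a fraction of all reads at the position, so an allele without the CpG (the A allele of rs3740400) contributes no methylated signal; the function name is hypothetical.

```python
def apparent_methylation(n_cpg_alleles, methylation_per_allele):
    """Apparent methylation at a SNP-created CpG, averaged over both alleles.

    Assumes the assay reports methylated signal as a fraction of all reads at
    the position, so alleles lacking the CpG (e.g. the A allele of rs3740400)
    contribute no methylated signal.
    """
    if n_cpg_alleles not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 copies of the CpG-creating allele")
    return methylation_per_allele * n_cpg_alleles / 2


# With about 3% methylation on each CpG-bearing allele (the CC average in the text):
for genotype, n in (("AA", 0), ("CA", 1), ("CC", 2)):
    print(genotype, apparent_methylation(n, 0.03))
```

Under this assumption a CA carrier would show roughly half the CC level even if each CpG-bearing allele were methylated to exactly the same degree, which is why the genotype difference alone does not indicate biologically different methylation.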
AS3MT cg18534077 contained a SNP (rs7085104) that could potentially influence base extension (Table 2), although this SNP was situated 27 bp away from the CpG site for cg18534077.
Whole-genome Epigenetic Analysis: Correlation of AS3MT Haplotype with the DNA Methylation Pattern of Chromosome 10
To determine whether adjacent genes are similarly related to the haplotype, we extended the analysis to an 800-kilobp region of chromosome 10 around AS3MT. In the Argentinean study group, the AS3MT haplotype was strongly associated with DNA methylation of genes located 5′ and 3′ of AS3MT: CNNM2, NT5C2, USMG5, TRIM8, C10orf26, and CALHM2 (in decreasing order of association; Table 2, Fig. 1, and Fig. S2; full gene names are presented in Table 1). Three of these methylation sites were situated at enhancer elements (Table S1). Three of the top twelve methylation sites (NT5C2 cg09803321, CNNM2 cg11667387, and USMG5 cg18367433) contained SNPs that could potentially influence base extension (Table 2). In two of these (CNNM2 cg11667387 and USMG5 cg18367433), the SNP was situated less than 10 bp from the CpG site.
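Screening probes for nearby SNPs amounts to grouping probes by the distance between each SNP and the interrogated CpG. The sketch below uses the 10 bp cut-off mentioned in the text as the "caution" threshold. The record type and all probe and SNP identifiers are hypothetical; the sketch does not parse the IMA SNP list used in the study, whose exact format is not described here.

```python
from dataclasses import dataclass


@dataclass
class ProbeSnp:
    probe_id: str     # hypothetical probe identifier
    snp_id: str       # hypothetical SNP identifier
    distance_bp: int  # signed distance from the SNP to the interrogated CpG


def flag_probes(records, caution_bp=10):
    """Split probes into those with a SNP closer than caution_bp to the CpG
    ('caution') and those whose SNPs all lie farther away ('note')."""
    caution = {r.probe_id for r in records if abs(r.distance_bp) < caution_bp}
    note = {r.probe_id for r in records} - caution
    return sorted(caution), sorted(note)


records = [
    ProbeSnp("probe_A", "snp_1", 3),
    ProbeSnp("probe_B", "snp_2", 27),
    ProbeSnp("probe_C", "snp_3", -8),
    ProbeSnp("probe_B", "snp_4", 15),
]
caution, note = flag_probes(records)
print("caution:", caution)  # caution: ['probe_A', 'probe_C']
print("note:", note)        # note: ['probe_B']
```

A probe lands in the caution group if any of its SNPs is close, which matches the idea that the nearest SNP dominates the risk of impaired base extension.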
The AS3MT haplotype was associated with more methylation of two sites (one of borderline significance) in CNNM2 and one site in NT5C2, and with less methylation of one site each in CNNM2, NT5C2, C10orf26, USMG5, TRIM8, and CALHM2. A similar pattern was seen in the Bangladeshi population, where the AS3MT haplotype was strongly associated with the methylation status of CNNM2 (Table 2). For the other methylation sites, the directions of association (i.e., whether the haplotype was associated with more or less methylation of a specific CpG site) were the same in Bangladesh as in Argentina, although these associations were not statistically significant in Bangladesh after adjustment for multiple testing.
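The text reports p-values "adjusted for multiple testing" without naming the procedure in this section. As an illustration only, the sketch below implements two common choices, Bonferroni and Benjamini-Hochberg (false discovery rate); it is not necessarily the method the authors used.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values are
    # monotone: each one is the minimum over k >= rank of p_(k) * m / k.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the family-wise error rate and is stricter; Benjamini-Hochberg controls the expected proportion of false discoveries and retains more sites. Either way, a weaker association in the smaller-effect Bangladeshi sample can fall below significance after adjustment even when its direction matches Argentina.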
DNA methylation of AS3MT correlated with DNA methylation of the other genes (in 5′ to 3′ order: TRIM8, C10orf26, CNNM2, NT5C2, USMG5, and CALHM2) within the 800-kb segment of chromosome 10q24, both in Argentina and in Bangladesh (Table S2). To evaluate whether this correlation was particularly strong compared with other regions of chromosome 10, we analyzed windows of 20 consecutive genes and computed the fraction of between-gene CpG pairs with an absolute Pearson correlation above 0.5 (excluding correlations between CpGs within the same gene). Compared with a permuted (randomized) version of chromosome 10 (green line in Fig. S3), some regions were more correlated than expected, whereas AS3MT (dashed red line) appeared to be in a region with average correlation values.
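The window analysis can be sketched as follows. This is one possible reading of the procedure, with two assumptions: the correlation criterion is taken to be |r| > 0.5, and the null reference is made by shuffling the gene order once. The original analysis may have permuted differently or used many permutations.

```python
import numpy as np


def window_fraction(meth, cpg_gene, window_genes, threshold=0.5):
    """Fraction of between-gene CpG pairs with |Pearson r| > threshold.

    meth: (n_samples, n_cpgs) array of methylation values.
    cpg_gene: (n_cpgs,) array with the gene label of each CpG.
    window_genes: gene labels that make up the window.
    Pairs of CpGs from the same gene are excluded.
    """
    cols = np.flatnonzero(np.isin(cpg_gene, list(window_genes)))
    if cols.size < 2:
        return float("nan")
    r = np.corrcoef(meth[:, cols], rowvar=False)
    labels = cpg_gene[cols]
    i, j = np.triu_indices(cols.size, k=1)
    between = labels[i] != labels[j]
    if not between.any():
        return float("nan")
    return float(np.mean(np.abs(r[i, j][between]) > threshold))


def sliding_windows(meth, cpg_gene, gene_order, size=20, threshold=0.5):
    """Statistic for every run of `size` consecutive genes along gene_order."""
    gene_order = list(gene_order)
    return np.array([
        window_fraction(meth, cpg_gene, gene_order[k:k + size], threshold)
        for k in range(len(gene_order) - size + 1)
    ])


def permuted_windows(meth, cpg_gene, gene_order, size=20, threshold=0.5, seed=0):
    """Same statistic after shuffling the gene order once (a null reference)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(gene_order))
    return sliding_windows(meth, cpg_gene, shuffled, size, threshold)
```

Plotting `sliding_windows` for the real gene order against `permuted_windows` gives a comparison of the kind shown in Fig. S3: windows well above the permuted curve are regions with unusually coordinated methylation.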
Because one mechanism of arsenic toxicity may be altered DNA methylation, we next tested whether arsenic exposure was associated with DNA methylation in our samples. We found no clear effect of arsenic exposure on DNA methylation in the studied chromosomal region. In the Argentinean study group, arsenic concentrations in the women's urine were positively correlated with DNA methylation of CALHM2 (rs = 0.29, p = 0.012), but not with the other methylation sites that were statistically significantly associated with AS3MT haplotype. In the Bangladeshi study group, arsenic concentration in maternal urine was inversely correlated with methylation of CNNM2 CpG sites cg11667387 (rs = −0.23, p = 0.012) and cg00894378 (rs = −0.21, p = 0.019) in cord blood.
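The rs values throughout are Spearman rank correlations. A minimal sketch is shown below: rank both variables, giving tied values the mean of their ranks, and take the Pearson correlation of the ranks. It returns only the coefficient. `scipy.stats.spearmanr` returns both the coefficient and a p-value and could be used to cross-check.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks in which tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(values.size)
    start = 0
    while start < values.size:
        end = start
        # Extend the run of tied values.
        while end + 1 < values.size and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Because it works on ranks, rs is insensitive to the skewed distribution of urinary arsenic and to monotone transformations such as the natural log.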

Influence of AS3MT Haplotype on the DNA Methylation Pattern of Chromosome 10: Multivariate Models
CpG sites that were statistically significantly correlated (after adjustment for multiple comparisons) with AS3MT haplotype in either Argentina or Bangladesh were further analyzed in univariable and multivariable regression models (Table 3). In total, twelve sites were evaluated; no further adjustment for multiple comparisons was made here. In Argentina, the models were adjusted for total urinary arsenic (natural log-transformed), age, and use of coca leaves; in Bangladesh, for total arsenic in maternal urine (gestational weeks 6-8; natural log-transformed), maternal age, and sex of the child. The results were very similar in the univariable and multivariable models: all sites that were statistically significantly associated with AS3MT haplotype in the univariable analyses remained so in the multivariable analyses, and the β-values remained the same or changed only marginally.
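The comparison of univariable and multivariable β-values can be sketched with ordinary least squares. The example follows the Argentinean covariate set described above. It assumes coca use is coded 0/1 and returns only point estimates. For standard errors and p-values, a package such as statsmodels (`statsmodels.api.OLS`) would normally be used; the Bangladeshi model would swap in its own covariates.

```python
import numpy as np


def ols_coefficients(y, X):
    """Least-squares fit of y on X plus an intercept; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def haplotype_effect(meth, copies, urinary_as, age, coca):
    """Univariable and multivariable coefficient for haplotype copies (0/1/2).

    Covariates follow the Argentinean model in the text: ln(total urinary
    arsenic), age and coca-leaf use (coded 0/1 here, an assumption).
    """
    copies = np.asarray(copies, dtype=float)
    b_uni = ols_coefficients(meth, copies[:, None])[1]
    covariates = np.column_stack([copies, np.log(urinary_as), age, coca])
    b_multi = ols_coefficients(meth, covariates)[1]
    return float(b_uni), float(b_multi)
```

If the covariates are nearly unrelated to haplotype, as Table 1 suggests, the two coefficients should be close. That is the pattern reported for all twelve sites.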

Whole Genome Expression Data: Associations between DNA Methylation Pattern of Chromosome 10 and Gene Expression (Argentina)
To explore whether the haplotype-associated differences in DNA methylation were reflected in gene expression, we also measured expression of AS3MT and adjacent genes in our samples from Argentina. The assays of the AS3MT 3′ untranslated region (UTR), NT5C2, TRIM8, and USMG5 showed the largest variation in expression between individuals (Table S3). Expression levels of all genes associated with the AS3MT haplotype were correlated with each other (Table S4), but not as strongly as the DNA methylation of these genes. For the CpG sites positively associated with the AS3MT haplotype, inverse correlations (Table 4) were found with gene expression of AS3MT (3′UTR assay; p ≤ 0.002 for all four CpG sites) and CNNM2 (p = 0.012 for one CpG site; Fig. 2), whereas positive correlations (Table 4) were found with expression of NT5C2 (p ≤ 0.014 for three CpG sites) and USMG5 (p ≤ 0.003 for all four CpG sites; Fig. S4). AS3MT cg18534077 was associated both with AS3MT expression measured with the 3′UTR assay (p = 0.002) and with the AS3MT transcript on the array (assay 1771732, p = 0.016).
The opposite pattern was observed for the CpG sites inversely associated with the AS3MT haplotype (Table 4): positive correlations were found with expression of AS3MT (p ≤ 0.003 for three CpG sites) and CNNM2 (p ≤ 0.022 for three CpG sites), whereas inverse correlations were found with expression of NT5C2 (p = 0.038 for one CpG site) and USMG5 (p ≤ 0.002 for all four CpG sites). The same pattern was observed for the CpG sites in TRIM8, C10orf26, and CALHM2 that were inversely associated with the AS3MT haplotype, although the correlations were weaker (Table S5; Table 4). Only CNNM2 1663975 remained statistically significant in the multivariable analyses (p = 0.021).
Table 3. Univariable and multivariable regression analyses comprising DNA methylation sites that were significantly correlated with AS3MT haplotype in either Argentina or Bangladesh.

Discussion
Our results show that the AS3MT haplotype, previously associated with more efficient arsenic metabolism [12], is strongly associated with the methylation status of AS3MT and multiple surrounding genes within a large, 800-kilobp region of chromosome band 10q24 in two different populations. In line with the general concept that CpG methylation reduces gene expression [13], carriers of the AS3MT efficient-metabolizing haplotype seemed to have reduced expression of AS3MT and CNNM2. However, the increased CpG methylation was also associated with higher expression of NT5C2, C10orf26, USMG5, TRIM8, and CALHM2. Overall, these associations remained stable and statistically significant after adjustment for other potentially influential variables, indicating that the AS3MT haplotype is the major determinant of the methylation and expression pattern in this chromosomal region. The correlation in DNA methylation between genes around AS3MT was not unusual compared with other regions on chromosome 10. However, the strong association of the AS3MT haplotype with the DNA methylation pattern, and to some extent with the expression of genes in this region, is striking and may reflect that these genes participate together in arsenic metabolism or in a not yet characterized pathway.
The strongest association between the AS3MT haplotype and DNA methylation of AS3MT was found for CpG site cg18534077, which is located in a CpG shore in the 5′UTR of the gene. We evaluated whether the closely located SNP rs3740400 was differentially methylated depending on genotype; however, the methylation levels were very low, and this SNP is probably not functional in determining the expression status of AS3MT. We previously reported that the SNP rs3740400 is associated with reduced gene expression of AS3MT [12], but these data indicate that, even though rs3740400 creates a CpG site, its effect is probably mediated through linkage with other SNPs in the haplotype. It was unexpected that the AS3MT haplotype was associated with reduced AS3MT expression, as this haplotype is associated with more efficient arsenic metabolism and more dimethylated arsenic (DMA) in urine [12]. AS3MT is capable of methylating inorganic arsenic to methylarsonic acid (MMA) as well as MMA to DMA [9], [14]. MMA formed from inorganic arsenic by AS3MT can be released as MMA, undergo a second round of methylation to DMA, or dissociate as MMA bound to glutathione [14]. Why, then, does low expression of AS3MT correlate with increased production of DMA, whereas high expression of AS3MT correlates with increased formation of MMA? One possibility is that polymorphisms in other genes linked to the efficient AS3MT haplotype influence the methylation capacity. Clearly, this surprising finding needs to be analyzed further.
Figure 2. The relationship between CNNM2 DNA methylation and AS3MT gene expression, stratified for AS3MT haplotype. Scatterplot depicting the relationship between DNA methylation (expressed as fraction of methylated CpG) for CNNM2 (X-axis) and gene expression for AS3MT (3′UTR; expression relative to the median for individuals with 2 copies of the AS3MT haplotype) (Y-axis), where individuals are stratified for AS3MT haplotype (total N = 48). Outliers are denoted by *; the outlier to the left has a relative gene expression of 22.8 and the outlier to the right has a relative gene expression of 24.9. doi:10.1371/journal.pone.0053732.g002
The strongest associations between AS3MT haplotype and DNA methylation were found for CpG sites in CNNM2 (three CpG sites), whose methylation was in turn associated with the expression of AS3MT, CNNM2, NT5C2, and USMG5. CNNM2 is a magnesium transporter that is widely expressed throughout the body. It is mutated in the rare disorder familial hypomagnesemia [15], and CNNM2 SNPs were shown to influence serum magnesium concentrations [16]. NT5C2 is an enzyme that dephosphorylates non-cyclic nucleoside monophosphates to produce nucleosides and inorganic phosphate [17], and its expression in muscle appears important for regulating energy metabolism and glucose transport [18]. Recently, SNPs in NT5C2 and the nearby gene CYP17A1 were found to be associated with fat mass in Japanese women [19]; these SNPs have also been associated with blood pressure [20], [21]. USMG5 was first recognized as a gene product whose mRNA level increased during skeletal muscle growth in rats [22], and more recently it was shown to be important in maintaining the ATP synthase population in mitochondria, suggesting that USMG5 is also important in cellular energy metabolism [23]. A recent genome-wide association study of the arsenic metabolism phenotype found SNPs near AS3MT to be associated with USMG5 expression [24].
Furthermore, our data showed that DNA methylation of three other genes (TRIM8, CALHM2, and C10orf26) was associated with the AS3MT haplotype. One report shows that TRIM8 is a direct target of p53 and that TRIM8 induces p53 stabilization and promotes the degradation of MDM2, which in turn directs the p53 response toward growth arrest rather than apoptosis [25]. Under stress conditions, p53 promotes transcription of TRIM8. CALHM2 belongs to a family of transmembrane glycoproteins that has been suggested to modify calcium homeostasis [26], three members of which are closely situated on chromosome 10 [25]. The function of C10orf26 is unknown. We did not find any association with other genes located near or farther from AS3MT, although several CpG sites per gene were present on the array. Because arsenic is methylated using the same methyl donor (S-adenosylmethionine) as DNA methylation, alterations of DNA methylation have been suggested as a toxic mechanism of arsenic [27]. However, arsenic exposure per se did not appear to strongly modify the methylation or expression of this chromosome region. One speculation is that AS3MT also has other, arsenic-independent functions in the body, and that factors related to such functions regulate AS3MT expression.
Table 4. Correlations* and effect estimates (β)† between total urinary arsenic (As), DNA methylation and gene expression in Argentina, as well as the influence of haplotype on gene expression. A. denotes CpG sites that were positively associated with the AS3MT haplotype and B. denotes CpG sites that were inversely associated with the haplotype. *rs and p-value from Spearman correlation.
Some methodological aspects need comment. First, four of the top methylation sites contained SNPs in or around the CpG site, which could bias the results through impaired base extension, because the probe may not hybridize properly. The closer the SNP is to the CpG site, the stronger the influence. Two SNPs (in CNNM2 cg11667387 and USMG5 cg18367433) were situated less than 10 bp from the CpG site, which warrants caution. However, most of the top methylation sites (eight out of twelve), including the top site cg03493300, had no SNPs near the CpG site.
Second, haplotype was treated as a quantitative variable because our earlier studies [12], [28], [29] have shown a strong allele-dose effect in these populations. The allele-dose effect is also clearly visible in the graphs in the supplemental material. In addition, analyses using haplotype as a qualitative variable gave very similar results.
Third, the efficient AS3MT haplotype is common in Argentina but much rarer in Bangladesh. The resulting lower statistical power in the Bangladeshi population may explain why some associations present in Argentina were not detected in Bangladesh.
Fourth, we measured DNA methylation in blood, which is a mixture of cell types with different methylation patterns. We could not sort cells in the blood samples, as this was very difficult to achieve during the field studies. However, we knew from questionnaire data whether the study participants had any chronic or acute disease at the time of sampling, and these few individuals were excluded from the methylation analysis. Moreover, the consistency of the haplotype associations between the populations indicates that cell-type composition is unlikely to be a major concern for those analyses. We also measured gene expression in blood rather than in liver, the main arsenic-metabolizing organ, and the DNA methylation and gene expression pattern of AS3MT may differ in other tissues. Still, similar activity in blood is likely, given the expression (albeit relatively low) of the AS3MT gene in blood cells [30]. Obtaining tissue samples from individuals in screens of large cohorts is ethically problematic and extremely cumbersome, e.g., if liver biopsies are to be taken and preserved for analysis in the field in the Andes mountains; blood sampling is therefore the method of choice when applicable.
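The second methodological point, treating haplotype as a quantitative (allele-dose) variable and checking it against a qualitative coding, can be illustrated with the two ways of encoding copy number as regression predictors. The sketch uses made-up data with a purely additive effect. It shows that the qualitative coding then recovers effects of about β for one copy and 2β for two copies, which is what agreement between the two analyses would look like.

```python
import numpy as np


def dosage_coding(copies):
    """Quantitative (allele-dose) coding: a single column with 0, 1 or 2."""
    return np.asarray(copies, dtype=float)[:, None]


def categorical_coding(copies):
    """Qualitative coding: indicators for 1 and 2 copies, 0 copies as reference."""
    c = np.asarray(copies)
    return np.column_stack([(c == 1).astype(float), (c == 2).astype(float)])


def fitted_effects(y, X):
    """Least-squares coefficients for the columns of X (intercept dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)[0][1:]


# Made-up example with a purely additive effect of 0.1 per haplotype copy.
copies = [0, 0, 1, 1, 2, 2]
y = 0.2 + 0.1 * np.array(copies)
print(fitted_effects(y, dosage_coding(copies)))       # about [0.1]
print(fitted_effects(y, categorical_coding(copies)))  # about [0.1, 0.2]
```

The dosage coding spends one degree of freedom instead of two, which helps power. A clear departure from the β, 2β pattern in the qualitative fit would argue against the additive assumption.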
Linkage disequilibrium (LD) exists in the region around AS3MT, as noted in Argentina and Bangladesh [12], and the LD block was recently characterized as at least 350 kilobp long in populations in Mexico [31]. A large LD block is present in Asian and European populations as well [10], [31]. Given that this LD block is widespread across populations, and that we show a close relationship between haplotype, gene methylation status, and to some extent gene expression in an even larger region around AS3MT, one can speculate that the LD block contains several genes important for a phenotype, possibly related to arsenic metabolism or to energy metabolism, as suggested by the functional roles of several genes in the region.
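LD between two SNPs is usually summarized by D, D′ and r², computed from a two-locus haplotype frequency and the two allele frequencies. The sketch below shows the standard formulas, returning a signed D′. The frequencies in the usage lines are illustrative and are not estimates from this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """D, signed D' and r^2 for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of alleles A and B.
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / denom
    return d, d_prime, r2


# Illustrative frequencies only.
print(ld_stats(0.7, 0.7, 0.7))   # alleles always inherited together: D' = 1, r^2 = 1
print(ld_stats(0.35, 0.5, 0.7))  # independent loci: D = 0
```

A haplotype block like the one around AS3MT is a run of SNPs with high pairwise D′ (and often high r²), which is why a few tag SNPs can stand in for the full haplotype.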

Study Populations
Main cohort, Argentina. Participants were women (N = 103) living on the Andean plateau in northern Argentina and exposed to varying levels of arsenic through their drinking water [32], [33]. Most of the participants (N = 71) were from the village San Antonio de los Cobres (water arsenic 200 μg/L), which has about 5,000 inhabitants, while the remaining participants (N = 32) were from small surrounding villages (with various levels of water arsenic, range 7-73 μg/L). The participants were recruited with the assistance of medical personnel, except in the small mining village Tolar Grande, where we went from house to house, explained the project, and invited the adults to participate. We included only women, as men were often away from home for work for most of the day and therefore drank water from different sources. The study area has minimal industrial or traffic pollution, and the women in this society rarely drink alcohol or smoke tobacco. Based on the availability of high-quality DNA and RNA, 94 women were included in the DNA methylation analyses and 90 in the gene expression analyses; 81 women were included in both. Furthermore, 55 of the 103 women had previously been analyzed for expression of one transcript of the AS3MT gene [12], and these data were included in the analysis of DNA methylation versus gene expression.
Bangladeshi cohort. In short, our mother-child cohort in rural Bangladesh is nested in a large, randomized, population-based food and multi-micronutrient supplementation trial that evaluated nutritional and environmental impacts on pregnancy outcomes and child health [34], [35], [36]. The cohort consists of a sample of 127 women delivering singleton infants at the central Matlab hospital or any of the four connected subcenters during early daytime.

Ethics Statement
Both verbal and written informed consent were provided by all adult study participants. In Bangladesh, mothers gave verbal and written informed consent for the use of their cord blood samples. In Argentina, all study subjects gave verbal and written consent; for the minor participants (N = 4), informed verbal consent was also obtained from the next of kin, caretakers, or guardians, while the minors themselves gave verbal and written consent. The process was documented in special consent forms. The studies, as well as the consent procedure, were approved by the Ministry of Health in Salta, Argentina, the Ethical Review Committee of ICDDR,B, Bangladesh, and the Regional Ethical Committee at Karolinska Institute, Sweden. The authors complied with all of the legal requirements pertaining to the locations in which the work was done. The data described in our publication are freely available upon request, apart from information that would break the confidentiality rules for the study participants.

Blood and Urine Collection
Argentina. Peripheral blood for DNA extraction was collected in K2EDTA tubes (Vacuette®, catalogue nr. sc-359548), and blood for RNA extraction was collected in PAXgene tubes (Becton Dickinson, catalogue nr. 762165; Franklin Lakes, NJ). Spot urine samples were collected and processed as described previously [37].
Bangladesh. Cord blood specimens (mixed arterial and venous) were collected in heparin-coated sterile vials (Becton Dickinson, catalogue nr 367869) at the subcenter health clinics at delivery. Spot urine samples were collected in gestational weeks 6-8 and 30 and analyzed for arsenic.

Analysis of Arsenic in Blood and Urine
All analyses were performed in the same laboratory. Exposure to inorganic arsenic was assessed by the sum concentration (referred to as total arsenic) of inorganic arsenic, MMA and DMA in urine. Speciation of arsenic metabolites in urine (Argentina) was performed by high performance liquid chromatography (HPLC; Agilent 1100 series system, Agilent Technologies, Germany) coupled with hydride generation and inductively coupled plasma mass spectrometry (ICPMS) (Agilent 7500ce; Agilent Technologies, Japan), employing adequate quality controls [12]. Arsenic in urine from the Bangladeshi women was measured using hydride generation atomic absorption spectrophotometry. The total arsenic concentrations were adjusted to the mean specific gravity (SG) (1.020 g/mL for Argentina and 1.012 for Bangladesh).
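Total arsenic is defined above as the sum of inorganic arsenic, MMA and DMA, and concentrations were adjusted to a reference specific gravity (1.020 for Argentina, 1.012 for Bangladesh). The exact correction formula is not stated here. The sketch assumes the commonly used form C_adj = C × (SG_ref − 1) / (SG_sample − 1) and also computes the metabolite percentages (such as %DMA) used earlier in the text.

```python
def sg_adjust(conc, sg_sample, sg_reference):
    """Specific-gravity correction of a urinary concentration.

    Assumed formula: conc * (sg_reference - 1) / (sg_sample - 1).
    """
    if sg_sample <= 1.0:
        raise ValueError("specific gravity must exceed 1.000")
    return conc * (sg_reference - 1.0) / (sg_sample - 1.0)


def arsenic_summary(inorganic, mma, dma):
    """Total arsenic as the sum of species, plus each species as % of total."""
    total = inorganic + mma + dma
    percentages = {name: 100.0 * value / total
                   for name, value in (("iAs", inorganic), ("MMA", mma), ("DMA", dma))}
    return total, percentages


# A dilute sample (SG 1.010) scaled to the Argentinean reference SG of 1.020.
print(sg_adjust(100.0, 1.010, 1.020))
print(arsenic_summary(10.0, 20.0, 70.0))
```

Under this formula, a sample more dilute than the reference (lower SG) is scaled up and a more concentrated one is scaled down. Metabolite percentages are ratios, so they are unaffected by the dilution correction.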

DNA Isolation and Epigenetic Analysis
DNA was isolated using the QIAamp® DNA Blood Midi kit (Qiagen, catalogue nr 51183). DNA quality was evaluated on a NanoDrop spectrophotometer (NanoDrop Products, Wilmington, DE) and a Bioanalyzer 2100 (Agilent, Santa Clara, CA) and was good (260/280 nm ratio > 1.80). DNA was bisulfite-treated using the EZ DNA Methylation kit (Zymo, catalogue nr D5001). The SCIBLU facility (Lund, Sweden) hybridized 200 ng of bisulfite-treated DNA to the Infinium HumanMethylation450 BeadChip (Illumina, catalogue nr WG-314-1003). The rs3740400 pyrosequencing assay was run on samples from the Argentinean women (N = 94). The assay was designed using PyroMark Assay Design 2.0 software (Qiagen) and encompassed the rs3740400 allelic site together with two additional CpG sites. The forward primer was biotinylated. PCR was performed using PyroMark PCR reagents (Qiagen, catalogue nr 972807). The PCR product was purified using Streptavidin Sepharose High Performance beads (Amersham Biosciences, catalogue nr 17-5113-01). Pyrosequencing was performed on the PSQ HS96 Pyrosequencing System (Qiagen). The presence of SNPs at probe or query sites was evaluated using http://www.rforge.net/IMA/snpsites.txt, which lists SNPs in and around each CpG site.
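The methylation values analyzed on the 450K array are derived from the methylated (M) and unmethylated (U) signal intensities of each probe. The preprocessing used in this study is not described in this section. As an assumption, the sketch shows the Illumina-style β-value, M/(M + U + 100), and the log2-ratio M-value that is often used for statistical testing.

```python
import numpy as np


def beta_values(meth, unmeth, offset=100.0):
    """Illumina-style beta value: M / (M + U + offset), between 0 and 1."""
    meth = np.maximum(np.asarray(meth, dtype=float), 0.0)
    unmeth = np.maximum(np.asarray(unmeth, dtype=float), 0.0)
    return meth / (meth + unmeth + offset)


def m_values(meth, unmeth, alpha=1.0):
    """log2 ratio of methylated to unmethylated intensity, with a small offset."""
    meth = np.maximum(np.asarray(meth, dtype=float), 0.0)
    unmeth = np.maximum(np.asarray(unmeth, dtype=float), 0.0)
    return np.log2((meth + alpha) / (unmeth + alpha))


# Hypothetical intensities for three probes.
print(beta_values([900.0, 4000.0, 50.0], [0.0, 4000.0, 950.0]))
print(m_values([900.0, 4000.0, 50.0], [0.0, 4000.0, 950.0]))
```

The β-value reads directly as the fraction of methylated CpG, the scale used in Fig. 2. The M-value is unbounded and closer to homoscedastic, which some analysts prefer for regression.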

Genotyping
The analyzed AS3MT haplotype consists of (5′ to 3′ direction) rs7085104 G, rs3740400 C, rs3740393 C, rs3740390 A, rs11191439 T, rs11191453 C, rs10748835 A and rs1046778 C. All of these SNPs except rs11191439 are non-coding. Haplotypes were inferred with PHASE [38]. Argentina: the eight AS3MT SNPs were genotyped in blood DNA using Sequenom technology (Sequenom Inc., San Diego, CA) [12]. Genotype data were missing for one individual. Bangladesh: rs7085104, rs3740400, rs3740393, rs3740390 and rs1046778 were genotyped by Taqman allelic discrimination assays (ABI 7900; catalogue nrs C___3284563_10, C__27510174_10, C__25804287_10, C__27510172_10 and C___9596558_10). Genotype frequencies are presented in Table S2. Previous analyses of the linkage disequilibrium (LD) pattern in another study group from Bangladesh showed that these five SNPs are a good proxy for the full eight-SNP haplotype [12].
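The analyses use the number of copies (0, 1 or 2) of this haplotype per individual. The Python sketch below shows one way to derive that count once haplotypes have been phased; it assumes each phased haplotype (for example from PHASE output, whose parsing is not shown) is available as a dictionary from SNP ID to allele. The names `copies_of_major`, `SNPS`, `MAJOR` and `PROXY_SNPS` are hypothetical.

```python
# Eight-SNP AS3MT haplotype, 5' to 3', and its alleles as listed in the text.
SNPS = ("rs7085104", "rs3740400", "rs3740393", "rs3740390",
        "rs11191439", "rs11191453", "rs10748835", "rs1046778")
MAJOR = ("G", "C", "C", "A", "T", "C", "A", "C")

# Five-SNP proxy genotyped in Bangladesh.
PROXY_SNPS = ("rs7085104", "rs3740400", "rs3740393", "rs3740390", "rs1046778")


def copies_of_major(hap1, hap2, snps=SNPS):
    """Count phased haplotypes (0-2) carrying the listed allele at every SNP in `snps`."""
    major = dict(zip(SNPS, MAJOR))

    def is_major(hap):
        return all(hap[snp] == major[snp] for snp in snps)

    return int(is_major(hap1)) + int(is_major(hap2))
```

Passing `snps=PROXY_SNPS` evaluates only the five Bangladeshi SNPs, mirroring the use of that subset as a proxy for the full haplotype.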

Gene Expression Analysis
Argentina. RNA was extracted using the PAXgene Blood RNA kit (PreAnalytiX, catalogue nr 762174). RNA concentration and purity were evaluated on a NanoDrop spectrophotometer, and RNA integrity (RNA integrity number, RIN) was evaluated on a Bioanalyzer 2100; RNA quality was good (RIN >7.5). Gene expression was analyzed with DirectHyb HumanHT-12 v4.0 (Illumina, catalogue nr BD-103-0204) at the SCIBLU facility of Lund University. Background signals were filtered from the gene expression data with the BioArray Software Environment (BASE) [39]. In this study, we evaluated gene expression only for the genes whose DNA methylation pattern was associated with AS3MT haplotype. Additionally, we used previously generated quantitative real-time PCR data on AS3MT expression (an Applied Biosystems Assays-by-Design assay covering the 3′ untranslated region, UTR), as further described in Engstrom et al [12]. These data were available for 55 individuals, who were matched for age, weight, body mass index (BMI) and urinary arsenic and had a narrower range of urinary arsenic than the individuals included in the HumanHT-12 v4.0 analysis (Table 1). AS3MT 3′UTR expression was calculated relative to individuals with two copies of the AS3MT haplotype, whose median expression was set to 1 [12].
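The relative scale for the qPCR data can be illustrated with a short Python sketch: each value is divided by the median of the two-copy reference group, so that group's median becomes 1. It assumes expression values are already on a linear scale (how they were derived from the qPCR measurements is not shown), and the function name is hypothetical.

```python
import numpy as np


def relative_expression(expr, copies, ref_copies=2):
    """Scale expression so that the median of the reference haplotype group equals 1."""
    expr = np.asarray(expr, dtype=float)
    copies = np.asarray(copies)
    ref_median = np.median(expr[copies == ref_copies])
    return expr / ref_median


# Hypothetical linear-scale values for five individuals and their haplotype copies.
rel = relative_expression([2.0, 4.0, 6.0, 1.0, 3.0], [2, 2, 2, 0, 1])
```

Using the median rather than the mean keeps the reference level robust to a few extreme samples in the two-copy group.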

Analysis of SNPs or DNA Methylation Sites in CpG Sites, Shores, Shelves and Islands or Enhancers
Whether the SNPs in the AS3MT haplotype introduced or removed a CpG site was evaluated by inspecting the sequence around each SNP [40]. Whether a SNP or DNA methylation site was situated in a CpG island, shore or shelf was evaluated with EMBOSS CpGPlot, which detects CpG-rich regions [40]. Whether DNA methylation sites were located in enhancer elements was determined from the Illumina annotation (GPL13534, available at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL13534).
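Both checks reduce to simple sequence arithmetic, sketched below in Python. A SNP allele creates a CpG when it forms "CG" with an immediate neighbour; rs3740400 is one example, since its A/A genotype carries no potential CpG site. CpG-rich regions are commonly scored by the observed/expected CpG ratio (CpG count × length / (C count × G count)) and the C+G percentage. The thresholds below (obs/exp > 0.6, C+G > 50%) are what we understand to be the EMBOSS CpGPlot defaults, applied here to a single window only. CpGPlot itself scans sliding windows and also requires a minimum island length, and shores and shelves are usually defined as the regions up to 2 kb and 2 to 4 kb from an island. The flanking sequences and function names are hypothetical.

```python
def allele_forms_cpg(left_flank, allele, right_flank):
    """True if the allele forms a CpG with its immediate neighbours (top strand)."""
    window = (left_flank[-1:] + allele + right_flank[:1]).upper()
    return "CG" in window


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio and C+G percentage of a sequence window."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    n_cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    obs_exp = n_cpg * len(seq) / (n_c * n_g) if n_c and n_g else 0.0
    gc_percent = 100.0 * (n_c + n_g) / len(seq) if seq else 0.0
    return obs_exp, gc_percent


def island_like(seq, min_obs_exp=0.6, min_gc=50.0):
    """Apply CpGPlot-style thresholds to one window (length requirement not checked)."""
    obs_exp, gc_percent = cpg_obs_exp(seq)
    return obs_exp > min_obs_exp and gc_percent > min_gc


# Hypothetical flanks: the C allele creates a CpG with the following G; A does not.
creates = allele_forms_cpg("TTA", "C", "GAT")
removes = allele_forms_cpg("TTA", "A", "GAT")
```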

Statistical Analysis
Methylation levels are expressed as beta (β) values, which represent the fraction of methylation in the sample and range from 0 (unmethylated) to 1 (fully methylated). β values were extracted with BeadStudio software (Illumina, San Diego, CA, USA). Missing values were imputed by k-nearest neighbour imputation (k = 10); in both Argentina and Bangladesh, 0.1% of the values were missing. Imputation can artificially increase correlations in the data, but with so few missing values it did not materially shift the data. Principal component analysis (PCA), which captures the major directions of variation in the data, was performed to evaluate the influence of technical and biological variables on DNA methylation; the principal components were computed by singular value decomposition (SVD). The PCA showed that methylation levels were correlated with analysis plate in Bangladesh (two plates were used there, whereas a single plate was used in Argentina). We removed this association as follows: for each methylation site, a linear model was fitted with analysis plate as the regressor, and the residuals of this model were used as the new methylation levels. The levels of each methylation site were thus unrelated to analysis plate. No other variables had a major impact on overall DNA methylation.
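The three preprocessing steps (imputation, PCA by SVD and plate correction) can be sketched in Python with NumPy on a sites × samples matrix of β values. This is a simplified illustration, not the pipeline that was actually used. The imputation borrows from the k most similar fully observed sites, in the spirit of standard k-nearest-neighbour imputation, and the plate correction fits intercept plus plate indicators per site and keeps the residuals. All function names are hypothetical.

```python
import numpy as np


def knn_impute_rows(B, k=10):
    """Fill NaNs in a sites x samples matrix from the k most similar sites.

    Neighbours are fully observed sites, ranked by root-mean-square distance
    over the samples observed for the site being imputed.
    """
    B = np.asarray(B, dtype=float)
    out = B.copy()
    missing = np.isnan(B)
    complete = np.where(~missing.any(axis=1))[0]  # candidate neighbour sites
    for i in np.where(missing.any(axis=1))[0]:
        obs = ~missing[i]
        dist = np.sqrt(np.mean((B[complete][:, obs] - B[i, obs]) ** 2, axis=1))
        nearest = complete[np.argsort(dist)[:k]]
        out[i, missing[i]] = B[nearest][:, missing[i]].mean(axis=0)
    return out


def remove_plate_effect(B, plate):
    """Replace each site's values by the residuals of a linear model on plate."""
    B = np.asarray(B, dtype=float)
    plate = np.asarray(plate)
    levels = np.unique(plate)
    # Design matrix: intercept plus one indicator per non-reference plate.
    D = np.column_stack([np.ones(plate.size)] +
                        [(plate == lv).astype(float) for lv in levels[1:]])
    coef, *_ = np.linalg.lstsq(D, B.T, rcond=None)  # one fit per site
    return (B.T - D @ coef).T


def pca_svd(B, n_components=5):
    """Sample scores and explained-variance ratios from an SVD of centred data."""
    centred = B - B.mean(axis=1, keepdims=True)  # centre each site across samples
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = vt[:n_components].T * s[:n_components]  # samples x components
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Plotting the first component scores against plate would reveal a batch effect like the one seen in Bangladesh. Note that the plate-corrected residuals are centred on zero within each plate, so they are no longer bounded between 0 and 1 like the original β values.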
The associations of AS3MT haplotype with methylation levels were determined using Pearson correlation, with haplotype coded numerically as the number of copies (0-2). P-values from the 450K array were corrected for multiple testing (adjusting for 450,000 comparisons) with the Benjamini-Hochberg method to obtain false discovery rates (FDR). In this study, FDR < 0.05 was considered significant.
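The genome-wide screen amounts to one Pearson test per site followed by Benjamini-Hochberg adjustment. The Python sketch below assumes a sites × samples NumPy matrix and uses `scipy.stats.pearsonr`; the BH step is written out explicitly (it should agree with standard implementations such as R's `p.adjust(method = "BH")`). Function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def site_correlations(B, copies):
    """Pearson r and p-value between haplotype copies (0-2) and each site (rows of B)."""
    copies = np.asarray(copies, dtype=float)
    r = np.empty(len(B))
    p = np.empty(len(B))
    for i, row in enumerate(B):
        r[i], p[i] = stats.pearsonr(copies, row)
    return r, p


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the original order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

With `r, p = site_correlations(B, copies)`, the significant sites are those with `benjamini_hochberg(p) < 0.05`, matching the FDR threshold used here.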
The top methylation sites were then further evaluated in multivariable models. First, a univariable model was fitted for each methylation site, with the methylation site as the dependent variable and the number of copies of the AS3MT haplotype as a continuous independent variable. A model with the number of haplotype copies as a categorical variable gave very similar results, indicating an allele-dose effect, so haplotype was treated as a continuous variable. To build the multivariable models, we evaluated, in univariable models, the association between each methylation site and other variables potentially influencing DNA methylation. In Argentina these variables were urinary arsenic, age (years), BMI and use of coca leaves; in Bangladesh they were arsenic in maternal urine around gestational weeks 6-8 or 30, birth weight, maternal age (years), maternal BMI, asset score (a proxy for socioeconomic status), sex of the baby and gestational age. Variables significantly associated with a methylation site were included in the multivariable models (Table 3). To assess stratification by haplotype, we also tested whether the haplotype groups differed in these potentially influential variables, using the nonparametric Kruskal-Wallis test (Table 1).
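The model-building strategy (univariable screening of covariates, then a multivariable model with haplotype copies plus the retained covariates) can be sketched in Python with ordinary least squares. The sketch assumes a screening threshold of p < 0.05 and uses NumPy for the fit and `scipy.stats` for t-based p-values and the Kruskal-Wallis test. It is a hypothetical illustration, not the SPSS or R procedure that was actually run.

```python
import numpy as np
from scipy import stats


def ols(y, X):
    """OLS with an intercept; returns coefficients and two-sided t-test p-values."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    pvals = 2 * stats.t.sf(np.abs(beta / se), df=n - k)
    return beta, pvals


def build_model(meth, copies, covariates, alpha=0.05):
    """Keep covariates significant in univariable models, then fit the full model.

    Returns the selected covariate names, the haplotype coefficient and its p-value.
    """
    selected = [name for name, x in covariates.items() if ols(meth, x)[1][1] < alpha]
    X = np.column_stack([copies] + [covariates[name] for name in selected])
    beta, pvals = ols(meth, X)
    return selected, beta[1], pvals[1]


def differs_by_haplotype(variable, copies):
    """Kruskal-Wallis p-value for a covariate across haplotype groups (0, 1, 2 copies)."""
    variable, copies = np.asarray(variable), np.asarray(copies)
    groups = [variable[copies == k] for k in np.unique(copies)]
    return stats.kruskal(*groups).pvalue
```

Here `covariates` would map names such as urinary arsenic or BMI to arrays aligned with the methylation values. A large Kruskal-Wallis p-value for a covariate indicates no evidence that the haplotype groups differ in it.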
Argentina. The methylation sites that were positively associated with AS3MT haplotype were further analyzed for association with gene expression and with total arsenic in urine. The association between AS3MT haplotype and gene expression was also evaluated. These associations were assessed by Spearman's rank correlation. Furthermore, a linear regression analysis was performed with gene expression as the dependent variable and methylation level as the independent variable, adjusting for total arsenic in urine and RIN. Thereafter, a linear regression analysis was performed with gene expression as the dependent variable and copies of AS3MT haplotype, total arsenic in urine and RIN (to adjust for variation in RNA quality) as independent variables. Statistical analyses were performed with SPSS 18.0 (SPSS Inc, Chicago, IL, USA) and R version 2.14.2. Statistical significance refers to P < 0.05 (two-tailed).

Figure S1. Influence of AS3MT rs3740400 on DNA methylation. Genotypes CA and CC are presented, since the genotype AA does not carry any potential CpG site.

Table S5. Correlations (Spearman ranks) between urinary arsenic (As), DNA methylation (horizontally) and gene expression (vertically) for TRIM8, C10orf26, CALHM2 and USMG5 (all CpG sites were negatively associated with the AS3MT haplotype). (DOCX)
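The expression analyses for the Argentinean women combine a rank correlation with an adjusted linear regression. The Python sketch below computes Spearman's rho as the Pearson correlation of average ranks (which is how `scipy.stats.spearmanr` defines it) and estimates the expression-methylation slope adjusted for urinary arsenic and RIN by least squares. The function names are hypothetical and the code only illustrates the model structure described in the text.

```python
import numpy as np
from scipy import stats


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (ties averaged)."""
    rx, ry = stats.rankdata(x), stats.rankdata(y)
    return np.corrcoef(rx, ry)[0, 1]


def adjusted_slope(expr, meth, arsenic, rin):
    """Slope of expression on methylation, adjusted for urinary arsenic and RIN."""
    X = np.column_stack([np.ones(len(expr)), meth, arsenic, rin])
    beta, *_ = np.linalg.lstsq(X, np.asarray(expr, dtype=float), rcond=None)
    return beta[1]  # coefficient for methylation
```

Replacing `meth` with the number of haplotype copies gives the second regression model described above, with the same adjustment for arsenic and RNA quality.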