Self-Reported Ethnicity and Genetic Ancestry in Relation to Oral Cancer and Pre-Cancer in Puerto Rico

Background Hispanics are known to be an extremely diverse and genetically admixed ethnic group. The lack of methodologies to control for ethnicity and the unknown admixture in complex study populations of Hispanics has left a gap in understanding certain cancer disparity issues. Incidence rates for oral and pharyngeal cancer (OPC) in Puerto Rico are among the highest in the Western Hemisphere. We conducted an epidemiological study to examine risk and protective factors, in addition to possible genetic susceptibility components, for oral cancer and precancer in Puerto Rico.
Methodology/Principal Findings We recruited 310 Puerto Rico residents who had been diagnosed with either an incident oral squamous cell carcinoma, oral precancer, or benign oral condition. Participants completed an in-person interview and contributed buccal cells for DNA extraction. Applied Biosystems TaqMan™ primer sets were used for genotyping 12 ancestry informative markers (AIMs). Ancestral group estimates were generated using maximum likelihood estimation software (LEADMIX), and additional principal component analysis was carried out to detect population substructures. We used unconditional logistic regression to assess the contribution of ancestry to the risk of being diagnosed with either an oral cancer or precancer while controlling for other potential confounders. The maximum likelihood estimates showed that study participants had a group average ancestry contribution of 69.9% European, 24.5% African, and 5.7% detectable Native American. The African and Native American group estimates were significantly higher than anticipated. Neither self-identified ethnicity nor ancestry markers showed any significant associations with oral cancer/precancer risk in our study.
Conclusions/Significance The application of ancestry informative markers (AIMs), specifically designed for Hispanics, suggests no hidden population substructure is present based on our sampling and provides a viable approach for the evaluation and control of ancestry in future studies involving Hispanic populations.


Introduction
According to 2000 United States Census data, 80.5 percent of Puerto Ricans considered themselves White, and 19.5 percent reported as Non-White; 8.0 percent claimed African origin (probably from West African ancestral groups, including Ibo and Yoruba people), and only 0.4 percent of Census respondents considered themselves descendants of the Puerto Rican Tainos [1]. The Tainos were a Native American tribe whose members populated the island before the start of the historical Hispanic influence [2][3].
It is documented that throughout the era of the Spanish empire, Puerto Ricans lived under a segregated social structure that was a construct of limited admixture of the three main ancestral population groups [2][3]. The existence of these social structures was recently examined by modern genomic testing technology among healthy Puerto Ricans [4].
Incidence rates for oral and pharyngeal cancer (OPC) in Puerto Rico are among the highest in the Western Hemisphere [5][6][7][8][9]. Further, ethno-regional differences have been reported in which OPC incidence and mortality rates are much higher among Hispanic men living in New York State than among US Hispanic males as a whole [10]. A possible link between ancestral genetic factors and the epidemiological evidence regarding OPC risk among Hispanics has not been investigated previously.
The use of ancestry informative markers allows for the identification of genetic patterns associated with population substructures and can be used to explore whether such markers are related to the risk of oral squamous cell carcinoma or its associated premalignant lesions. To examine risk and protective factors in the high-incidence population of Puerto Rico, we carried out a study supported by the United States National Institutes of Health. One of the main aims of the research project was to identify genetic susceptibility factors influenced by ethnographic differences in the Puerto Rican population. The goal of this analysis was to summarize associations between ethnicity and the risk of both oral premalignant lesions and squamous cell carcinoma among participants in our epidemiological study in Puerto Rico.

Ethics statement
The research project was approved by the Institutional Review Boards at the University of Puerto Rico, Medical Sciences Campus; New York University, and the University of New Mexico.

Study participants
Three hundred and ten participants diagnosed with either a benign oral condition, oral hyperkeratosis or epithelial hyperplasia (HK/EH), oral epithelial dysplasia (OED), or oral squamous cell carcinoma [mean age: 59.13 (SD ±12.75) years] were enrolled from 6 pathology laboratories in Puerto Rico (see Table 1). Participants provided written consent for being part of the research project and donated biological samples for DNA extraction. They also gave permission to review their oral tissue biopsy materials and corresponding H&E stained slides. Based on the latter, experienced, board-certified oral pathologists reviewed and validated each diagnosis.
Participants completed a detailed epidemiologic questionnaire that assessed self-identified race/ethnicity (White, Black, Mestiza and other), lifestyle, nutritional factors (e.g. fruit and vegetable consumption), known risk factors (including alcohol consumption, tobacco use), and oral hygiene practices.

Biological sample collection and genotyping
Buccal cell samples were collected from November 2003 through May 2008 using six cytological brushes at selected sites inside the mouth, followed by mouthwash rinses for additional buccal cell collection. Participants swished with 10 ml of Scope mouthwash and then with 8 ml of distilled water, to which we immediately added 2 ml of 70% ethanol to prevent bacterial and fungal growth during shipping. All biological samples were mailed to the University of New Mexico, where genomic DNA was extracted using the Puregene DNA Buccal Cell Kit (Gentra Systems, Minneapolis, MN). All samples were processed according to the manufacturer's instructions. An average of 70-80 µg of genomic DNA was obtained and quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific Inc, Rockford, IL). Optical density was measured at 260 and 280 nanometers to assess DNA yield and quality. The samples were stored in −80°C freezers prior to genotyping.
Genotype results for the 12 ancestry informative markers were generated using the TaqMan 7900 Real-Time PCR System (Applied Biosystems Inc., Carlsbad, CA) and quality-assured primer sets from TaqMan SNP Genotyping Assays.

Ancestry informative markers (AIMs)
Ancestry informative markers (AIMs) were selected based on previously published information for Hispanics [11][12][13]. Ancestry informative markers are single nucleotide polymorphisms distributed randomly across the human genome and are helpful in discriminating the genetic contributions of the main parental ethnic groups. The selected AIMs were relevant to Puerto Rican parental populations: Africans, Europeans and Indigenous Americans (Table 2). They represent Indigenous American-European, European-African, and Indigenous American-African ancestry differences. The allele frequency difference between two parental groups, called the delta (δ) value, is based on the frequency of the homozygous wild allele in one parental population compared with the frequency of the same allele in the other ancestral population [13]. In addition to the literature data, the presence and frequencies of the homozygous wild allele for all twelve markers were validated using HapMap data from the NCBI website to ensure an accurate and updated selection of markers. Table 2 shows in detail the ancestry informative markers used in this study.
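As an illustration of the delta value described above, the following minimal sketch computes the absolute allele frequency difference for each pair of parental populations. The function names and the example frequencies are hypothetical and are not taken from Table 2; the sketch assumes delta is the absolute difference between the two parental frequencies of the same allele.

```python
from itertools import combinations


def delta(freq_pop_a: float, freq_pop_b: float) -> float:
    """Absolute difference in the frequency of one allele between two populations."""
    for freq in (freq_pop_a, freq_pop_b):
        if not 0.0 <= freq <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    return abs(freq_pop_a - freq_pop_b)


def pairwise_deltas(freqs: dict) -> dict:
    """Delta for every pair of parental populations, keyed by (pop_a, pop_b)."""
    populations = sorted(freqs)
    return {
        (a, b): delta(freqs[a], freqs[b])
        for a, b in combinations(populations, 2)
    }


# Hypothetical frequencies for a single marker (illustrative values only).
EXAMPLE_MARKER = {
    "European": 0.90,
    "African": 0.10,
    "Indigenous American": 0.85,
}

if __name__ == "__main__":
    for pair, value in pairwise_deltas(EXAMPLE_MARKER).items():
        print(pair, round(value, 2))
```

A marker with a large delta for one pair of populations and a small delta for another is informative mainly for the first contrast, which is why a panel typically combines markers covering all three pairwise comparisons.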

Statistical analysis
First, we determined frequencies of self-reported ethnicity among study participants. Next, after genotyping and allele frequency estimation, we used LEADMIX 1.0 (Likelihood Estimation of ADMIXture) software to estimate the contribution of the three main ancestral groups represented in our study sample. LEADMIX is a Fortran computer program that computes maximum likelihood estimates of admixture proportions and genetic drift from population data collected on representative genetic markers. The software was created by Wang at the University of Oxford, Institute of Zoology, London, UK [14]. After registration, the software was downloaded, and an input file was created containing the expected and detected allele frequencies of the 12 ancestral markers. Group-specific ancestry estimates were generated at the University of New Mexico Center for Advanced Computing core facility using 'custom-designed' supercomputer resources available for this project (id2010008).
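The idea behind maximum likelihood estimation of admixture proportions can be sketched with a deliberately simplified model. This is not the LEADMIX algorithm, which also accounts for genetic drift and sampling error in the parental populations. The sketch assumes that the allele frequency in the admixed group is a weighted average of the three parental frequencies, that observed allele counts are binomial, and that a coarse grid search over the proportions is acceptable. All function names and parameters are hypothetical.

```python
import math


def log_likelihood(proportions, parental_freqs, allele_counts, totals):
    """Binomial log-likelihood of observed allele counts for given admixture proportions.

    parental_freqs: one (european, african, indigenous_american) tuple per marker.
    allele_counts:  observed copies of the reference allele per marker.
    totals:         number of chromosomes typed per marker.
    """
    ll = 0.0
    for freqs, count, total in zip(parental_freqs, allele_counts, totals):
        # Expected frequency in the admixed group: weighted mean of parental values.
        p = sum(m * f for m, f in zip(proportions, freqs))
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        ll += count * math.log(p) + (total - count) * math.log(1.0 - p)
    return ll


def estimate_admixture(parental_freqs, allele_counts, totals, steps=100):
    """Grid search over three proportions that sum to 1; returns the best triple."""
    best, best_ll = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            candidate = (i / steps, j / steps, (steps - i - j) / steps)
            ll = log_likelihood(candidate, parental_freqs, allele_counts, totals)
            if ll > best_ll:
                best, best_ll = candidate, ll
    return best
```

With the default `steps=100` the estimates are resolved to one percentage point. A finer grid or a numerical optimizer would be needed for higher precision, and confidence intervals would require additional work such as bootstrapping.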
Then, we compared disease and diagnosis group specific frequencies of each ancestral genetic marker to the expected allele frequencies of AIMs published in the literature. The comparison was made based on the expected frequency values of the wild type allele in each parental population [13]. We used Hardy-Weinberg equilibrium testing to estimate deviation from the expected frequency distributions. Two-sided p-values were used. Unconditional logistic regression was used to examine whether the genotype of each SNP was predictive of disease status. Finally, principal component analysis (PCA) was used to confirm that all of the 12 markers contributed evenly to the genetic structure of our population.
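One common form of Hardy-Weinberg equilibrium testing is a chi-square goodness-of-fit test on genotype counts, sketched below for a biallelic SNP. The source does not state which variant of the test was used, so this is an assumption, and the function name is hypothetical. The p-value uses the closed form for a chi-square distribution with one degree of freedom.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb: observed counts of the two homozygotes and the heterozygote.
    Returns (chi_square_statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # genotype counts exactly in HWE proportions
    print(hwe_chi_square(50, 0, 50))   # no heterozygotes: strong deviation
```

For markers with rare genotypes the chi-square approximation is unreliable, and an exact test is usually preferred.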

Results
Based on the questionnaire responses, self-identified ethnicity did not differ among participants in the different diagnostic categories. Table 1 shows the four main disease categories and the number of participants in each category. Only one individual in the OED group self-identified as Black; the other diagnostic groups did not show remarkable aggregation or significant deviation by ethnicity.
The maximum likelihood estimates calculated by LEADMIX software showed that our study participants had a group average of 69.89% European, 24.45% African, and 5.66% detectable Native American ancestry contribution.
When we individually examined the parental allele frequencies of AIMs among our study participants, the allele frequencies were significantly different in our Puerto Rican study participants compared to the parental groups of Europeans, Africans and Indigenous Americans; however, we did not detect any ancestry markers that would explain a significant portion of any of the disease diagnoses (Table 3).
Using principal component analysis (PCA) to detect ethnic subgroups within our sample population, we did not identify any ancestry marker that showed a statistically significant contribution to an underlying population substructure. Unconditional multiple logistic modeling was used to assess the contribution of the 12 ancestry markers and each of the main parental groups (White European, Black, and Indigenous Americans) to the risk of being diagnosed with either an oral cancer or precancer (relative to that of a benign oral condition) while controlling for other potential confounders, including age, gender, education, smoking, and alcohol consumption. In each instance, the estimated odds ratios were relatively weak and none achieved statistical significance (Table 4).
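To show how an odds ratio and its confidence interval are read, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% interval from a 2x2 table. This is a simplification: the analysis described above used multiple logistic regression with adjustment for confounders, which this sketch does not reproduce. The function name and the counts in the usage example are hypothetical.

```python
import math


def odds_ratio_wald(exposed_cases, unexposed_cases,
                    exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    Returns (odds_ratio, lower_bound, upper_bound). z=1.96 gives a 95% interval.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry a genotype.
    print(odds_ratio_wald(20, 80, 10, 90))
```

An interval that includes 1 corresponds to an association that does not reach statistical significance at the chosen level, which is the pattern reported for every marker in Table 4.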

Discussion
The population in Puerto Rico is historically and anthropologically admixed and segregated at the same time, thereby providing an opportunity to investigate whether an underlying, undetected population substructure could affect the risk of oral cancer or precancer on the island. This analysis serves as a basis for our further genetic susceptibility research including variants in immune system genes and important candidate genes connected with metastatic potential in oral cancer. None of the 12 genotyped ancestry markers showed population substructure among the participants; however, the frequencies were indicative of an admixed population status, a finding further confirmed by our group-specific maximum likelihood estimates. Our study enrolled cases (i.e., persons diagnosed with an oral precancer or cancer) and controls (persons diagnosed with a benign oral condition) through participating pathology laboratories on the island of Puerto Rico. Although we did not apply a population-based recruitment process, our detected maximum likelihood estimates were still very close to the known European contribution to the population (80.5% in the 2000 Census vs. 69.9%) and in keeping with the fact that people from the Iberian Peninsula began to populate Puerto Rico in the early 1500s [2]. New 2010 US Census information shows even closer estimates, as a decreased percentage of Puerto Ricans identified as White (75.8%) and an increased percentage self-reported as Black or African-American (12.4% in 2010, up from 8% in 2000) [15].
The group-specific frequency of African markers based on our maximum likelihood estimation was significantly higher than expected from published 2000 US Census data (24.5% vs. 8%; p<0.0001). Interestingly, the Native American ancestry contribution was much higher in our study population than any comparable population demographic data would indicate (5.7% vs. 0.4%; p<0.0001). These results point toward new avenues in the study of chronic disease development among Puerto Ricans, including anthropological and social determinants.

Limitations of the study
This research was implemented in the midst of changing health care regulations in the United States and Puerto Rico (i.e., introduction of HIPAA). Policy changes and associated uncertainties among healthcare practitioners, pathology laboratories, and the general public made data collection and personal interviews with participants difficult to implement and resulted in a smaller than anticipated sample size. In addition, during implementation of the study, we identified a deficit in the detection of oral premalignant lesions on the island [16][17], which resulted in lower than expected enrollment of persons diagnosed with oral precancerous lesions (HK/EH and OED).
Participation bias in small study samples is an important concern in molecular epidemiology. To address this issue, we made every effort to control for undetected, potential sub-groups that would have posed problems when diagnostic groups were analyzed. We found that the study sample represented the total admixed population well. Nevertheless, more research is needed, preferably by creating a larger, pooled Hispanic cohort study that would be specifically designed to address, in detail, the ancestral contributions to genetic susceptibility for oral cancer, pre-cancer and other chronic diseases.
In summary, we found that neither self-identified ethnicity nor ancestry markers showed any significant associations with oral cancer/precancer risk in our study.
Further, the application of ancestry informative markers (AIMs), specifically designed for Hispanics, provides a viable approach for the evaluation and control of ancestry in future studies involving Hispanic populations.