Prevalence of Pathological Germline Mutations of hMLH1 and hMSH2 Genes in Colorectal Cancer

Abstract The prevalence of pathological germline mutations in colorectal cancer has been widely studied, as germline mutations in the DNA mismatch repair genes hMLH1 and hMSH2 confer a high risk of colorectal cancer. However, because the sample sizes and populations of previous studies differ widely, the conclusions remain controversial. In this study, databases such as PubMed were searched for related papers. The data were imported into Comprehensive Meta-Analysis V2, which was used to estimate the weighted prevalence of hMLH1 and hMSH2 pathological mutations and to compare the prevalence among different family histories, ethnicities and related factors. This study collected and used data from 102 papers. In the Amsterdam-criteria positive group, the prevalence of pathological germline mutations of the hMLH1 and hMSH2 genes was 28.55% (95%CI 26.04%–31.19%) and 19.41% (95%CI 15.88%–23.51%), respectively, and the prevalence of germline mutations in hMLH1/hMSH2 was 15.44%/10.02%, 20.43%/13.26% and 15.43%/11.70% in Asian, American multiethnic and European/Australian populations, respectively. Substitution mutations accounted for the largest proportion of germline mutations (hMLH1: 52.34%, hMSH2: 43.25%). The total prevalence of mutations of hMLH1 and hMSH2 in Amsterdam-criteria positive, Amsterdam-criteria negative and sporadic colorectal cancers was around 45%, 25% and 15%, respectively, and there were no obvious differences in the prevalence of germline mutations among different ethnicities.


Introduction
Colorectal cancer (CRC) is a major worldwide public health problem [1], and is the second leading cause of cancer death in developed countries. In developing countries, CRC represents the sixth or seventh leading cause of cancer death [2].
It is estimated that hereditary nonpolyposis colorectal cancer (HNPCC) accounts for between less than 1% and 13% [3,4] of colorectal cancers, which makes it the most common inherited CRC syndrome [5,6]. HNPCC is characterized by an autosomal dominant inheritance pattern of early-onset colorectal cancer, which is associated with extracolonic malignancies, such as endometrial, urological and upper gastrointestinal cancers [7]. There is no characteristic phenotype associated with HNPCC, and its diagnosis depends on the recognition of a strong family history suggestive of dominant inheritance [8].
HNPCC, also known as Lynch syndrome (LS), is caused by a germline mutation in the DNA mismatch repair (MMR) genes [9,10]. A normal functioning MMR system can recognize and correct the base-pair mismatches and small nucleotide (1-4 base pair) insertion/deletion mutations, which is essential for the maintenance of genomic stability [11].
The hMSH2 gene, whose product is a component of the DNA mismatch repair pathway, was the first gene identified to be associated with HNPCC. Its product serves as the "scout" that recognizes and binds directly to the mismatched DNA sequence [19,20] and can form a heterodimer with hMSH6 when a single base-pair mismatch is recognized or with hMSH3 if two to eight nucleotide insertions or deletions exist [11].
The hMLH1 gene product is also a component of the DNA mismatch repair pathway and has been shown to form heterodimers with the products of the hMLH3, hPMS2 and hPMS1 genes. However, this protein has unknown enzymatic activity and likely acts as a "molecular matchmaker" that recruits other DNA repair proteins to the mismatch repair complex [21].
Since the hMLH1 and hMSH2 genes were identified in humans, the prevalence of their germline mutations has been widely studied, not only in cases of colorectal cancer with a suggestive family history but also in sporadic colorectal cancer. However, the results of these studies are inconsistent because the sample sizes were small and the ethnic backgrounds varied [22,23,24]. Therefore, a systematic review and meta-analysis is essential to provide recommendations for genetic tests based on family history and a basis for the prevention, early diagnosis and treatment of colorectal cancer.

Search strategy and selection criteria
Databases, including PubMed, Embase and the Cochrane Library, were searched for related papers published from January 1993 to March 2011 using the following keywords: hMLH1, hMSH2, mutation, hereditary nonpolyposis colorectal cancer, colorectal cancer and/or carcinoma, tumor or neoplasm. Papers were limited to those published in English that fulfilled the following selection criteria: 1) papers assessing only a specific type of mutation or only specific regions of the genes were excluded; 2) the mutations had to be pathological germline mutations rather than somatic mutations, and studies that reported somatic alterations of the MMR genes were excluded; 3) case reports were excluded; 4) duplicate reports were consolidated by using the latest or the largest version; 5) research on polymorphisms was excluded; 6) Lynch syndrome patients with known MMR gene mutations were excluded; 7) the tested patients were limited to those diagnosed with colorectal cancer rather than other Lynch syndrome-related cancers such as endometrial cancer. The study selection process is shown in Figure S4 in supporting information.

Classification of family history and ethnicity
We categorized colorectal cancer patients who met the stringent Amsterdam criteria (I or II) [25] as the Amsterdam-criteria positive group (AC+). Patients without any family history of cancer, regardless of the age of onset, were categorized in the sporadic cancer group. Others who had a family history but did not strictly conform to the Amsterdam criteria were defined as the Amsterdam-criteria negative group (AC-). Additionally, patients with an ambiguous family history, or without enough information to be re-classified, were placed in the family history not clear group.
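The grouping rule above can be written as a small decision function. This is only an illustrative sketch: the function name, the argument names and the use of None for unreported information are assumptions made for the example, and the Amsterdam-criteria negative group is labelled "AC-" in the code.

```python
from typing import Optional


def classify_family_history(meets_amsterdam: Optional[bool],
                            has_family_history: Optional[bool]) -> str:
    """Assign a patient to one of the four family-history groups.

    None means the source paper did not report the information
    (an assumption of this sketch, not a convention from the paper).
    """
    # Patients meeting Amsterdam criteria I or II form the AC+ group.
    if meets_amsterdam is True:
        return "AC+"
    # Without any statement on family history the case cannot be classified.
    if has_family_history is None:
        return "family history not clear"
    # No family history of cancer, regardless of onset age.
    if has_family_history is False:
        return "sporadic"
    # Family history present but Amsterdam criteria explicitly not met.
    if meets_amsterdam is False:
        return "AC-"
    # Family history present, Amsterdam status unknown.
    return "family history not clear"
```

In this sketch a case with a reported family history but an unreported Amsterdam status falls into the "family history not clear" group, which is one possible reading of the rule.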
Because the information about ethnicity of patients was not well-defined, we had to define the ethnicity based on continents, including Asian, American multiethnic, European/Australian or mixed ethnicities (some studies did not offer this type of data or this data included American, European and Australian).

MSI status and category
If more than 30% of the typically used microsatellite markers show instability, the tumor is considered MSI-high. MSI-stable (MSS) is defined as no markers indicating instability [26,27]. Otherwise, the tumor is defined as MSI-low. Patients without information about microsatellite status are classified as MSI-not identified. Additionally, studies that combined MSI-high and MSI-low tumors are categorized as MSI.
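As a minimal sketch of the per-tumor rule described above (the function and parameter names are hypothetical, and missing marker data are represented by None as an assumption of the example):

```python
from typing import Optional


def classify_msi(unstable_markers: Optional[int],
                 tested_markers: Optional[int]) -> str:
    """Classify a tumor by the fraction of unstable microsatellite markers."""
    # No usable marker information: status cannot be determined.
    if unstable_markers is None or not tested_markers:
        return "MSI-not identified"
    if unstable_markers < 0 or unstable_markers > tested_markers:
        raise ValueError("unstable_markers must lie between 0 and tested_markers")
    # No marker shows instability: microsatellite stable.
    if unstable_markers == 0:
        return "MSS"
    # More than 30% of markers unstable: MSI-high.
    if unstable_markers / tested_markers > 0.30:
        return "MSI-high"
    # At least one unstable marker, but not more than 30%.
    return "MSI-low"
```

For example, two unstable markers out of five tested (40%) would be labelled MSI-high, and one out of five (20%) MSI-low. The study-level "MSI" category for papers that pooled MSI-high and MSI-low tumors is not covered by this per-tumor function.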

Determination of pathogenicity
We determined the pathogenicity of mutations primarily by combining three methods. First, we deferred to the interpretations of the original papers and to the following definition of pathogenicity: a frameshift mutation predicted to result in a truncated protein; a nonsense mutation; a missense mutation ascertained with a functional assay; a large genomic deletion that removed at least one exon; or duplication of an exon, in addition to segregation of the alteration with cancer in the kindred [28]. Second, we used the analytic program PolyPhen to predict whether a mutation was pathogenic [29]; if the PolyPhen score was > 2.0, the change was predicted to affect protein function. Last, we checked two websites, the International Society for Gastrointestinal Hereditary Tumours Incorporated (InSiGHT) database (www.insight-group.org/mutations/) and the MMR Gene Unclassified Variants Database (www.mmruv.info), to further determine pathogenicity. Functional assays are more accurate and objective for testing the pathogenicity of missense variants, but many articles could not perform them because of various limitations; some studies distinguished pathological changes from polymorphisms, or assessed pathogenicity, by checking whether the same variants were found in a control population. For some variants in the InSiGHT database, "Reported pathogenicity" was categorized as reported pathogenic or probably pathogenic while "Concluded pathogenicity" was unknown; we treated these variants according to their reported pathogenicity. In our meta-analysis, reported pathogenic or probably pathogenic mutations that met the definition above were categorized as pathogenic mutations.

Data Extraction
Two investigators (Dandan Li and Fulan Hu) independently extracted data and checked all differences in the variables until agreement was reached on all items. Information such as the first author, year of publication, continent, country, family history, mutation sites, mutation types, MSI phenotype and detection methods was collected from each article.

Statistical analysis
Data were imported into Comprehensive Meta-Analysis V2, which was used to estimate the weighted prevalence and to compare the prevalence among related factors. A significance level (α) of 0.05 was applied. For multiple tests, the α level of 0.05 was adjusted to α divided by the number of tests. Heterogeneity between studies was assessed with meta-regression and the I² statistic. I² values of 25%, 50% and 75% correspond to low, medium and high heterogeneity, respectively [30]. If I² was ≤ 50%, taking into account the characteristics of the data [31], the fixed-effects model was used. Otherwise, a random-effects model was adopted. Publication bias was assessed visually using a funnel plot. The rank correlation method suggested by Begg [32] and the linear regression approach proposed by Egger et al. [33,34] were used to quantitatively analyze potential publication bias.
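The quantities named in this section can be illustrated with a short sketch. It is not the Comprehensive Meta-Analysis implementation: pooling on the logit scale with inverse-variance weights is an assumption made here for illustration, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Adjusted significance level: alpha divided by the number of tests."""
    return alpha / n_tests


def pooled_prevalence_fixed(events: Sequence[int],
                            totals: Sequence[int]) -> Tuple[float, float, float]:
    """Fixed-effect pooled prevalence, Cochran's Q and I-squared (in percent).

    Assumes every study has 0 < events < total, so the logit is defined.
    """
    logits, weights = [], []
    for e, n in zip(events, totals):
        logits.append(math.log(e / (n - e)))          # logit of the proportion
        variance = 1.0 / e + 1.0 / (n - e)            # variance of the logit
        weights.append(1.0 / variance)                # inverse-variance weight
    pooled = sum(w * y for w, y in zip(weights, logits)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logits))
    df = len(logits) - 1
    # I-squared: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    prevalence = 1.0 / (1.0 + math.exp(-pooled))      # back-transform to a proportion
    return prevalence, q, i2


def choose_model(i2_percent: float) -> str:
    """Model choice by the I-squared threshold alone (<= 50% gives fixed effects)."""
    return "fixed" if i2_percent <= 50.0 else "random"
```

The threshold rule in choose_model leaves out the additional judgement about "the characteristics of the data" mentioned above, which cannot be reduced to a formula.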

Results
After filtering for potentially relevant citations, 796 abstracts were retrieved. We then excluded studies that had no clear gene mutation detection data, leaving a total of 279 articles on hMLH1 and hMSH2 germline mutations in colorectal cancer identified in the electronic databases. Of these, only 102 papers were included in this study [6,8,24,26, based on the selection criteria. A clear family history was provided in 82 of these papers. The tested populations were Asian, American, European/Australian and mixed ethnic populations in 22, 11, 63 and 6 papers, respectively. Basic characteristics of the included articles are shown in Table S1 in supporting information.

The prevalence of germline pathological mutations in different family histories
In total, 861 of 7057 and 698 of 7096 colorectal cancer cases reported had hMLH1 and hMSH2 gene mutations, respectively. Additionally, 1526 of 6965 cases had a mutation in one gene or the other when both genes were screened.
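For orientation, the crude (unweighted) proportions implied by these counts can be computed directly. They are not the weighted estimates produced by the meta-analysis software, and the helper name below is hypothetical.

```python
def crude_prevalence(cases: int, screened: int) -> float:
    """Crude prevalence in percent: mutation carriers divided by patients screened."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return 100.0 * cases / screened


# Counts quoted in the text above.
hmlh1 = crude_prevalence(861, 7057)     # hMLH1 carriers among patients screened
hmsh2 = crude_prevalence(698, 7096)     # hMSH2 carriers among patients screened
either = crude_prevalence(1526, 6965)   # either gene, both genes screened

print(f"hMLH1: {hmlh1:.2f}%  hMSH2: {hmsh2:.2f}%  either: {either:.2f}%")
```

Because these proportions pool all family-history groups and ignore study weights, they should be read only as a rough summary alongside the stratified estimates.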
In the hMSH2 gene, the mutation prevalence ranged from 17.56% to 33.78% in the AC+ group, from 10.33% to 20.60% in the AC- group and from 3.64% to 21.90% in the sporadic cancer group (AC+: P = 0.00 < 0.05; AC-: P = 0.91 > 0.05; sporadic: P = 0.00 < 0.05) across the four ethnicities evaluated. In the AC+ and sporadic cancer groups, differences were seen in the mixed ethnicities group compared to the European/Australian group (P = 0.000 < 0.007) and in the Asian group compared to the mixed ethnicities group (P = 0.000 < 0.007), respectively (Table 1).
Among the articles in which both genes were tested, the total mutation prevalence of hMLH1 and hMSH2 in the AC+ group was 38.01%, 54.02%, 42.59% and 66.09% for Asian, American multiethnic, European/Australian and mixed ethnicities, respectively (Table 2). In the AC- group, the prevalence was around 25% (P = 0.83 > 0.05). In sporadic cases, the prevalence varied widely (from 5.31% to 37.63%, P = 0.00 < 0.05) (Table 2). There were obvious differences among these ethnicities in the AC+ and sporadic cancer groups (all P = 0.000 < 0.007). Further analysis showed that these differences lay between Asian and mixed ethnicities and between European/Australian and mixed ethnicities. No differences were observed among the three clearly defined ethnicities.

The mutation distribution in different exons
All of the exons in these two genes showed mutations. The highest mutation prevalence, 3.62%, was found in exon 16 of the hMLH1 gene, with 2.19 mutations/100 bp, which, remarkably, accounted for 16.36% of all mutations. In addition to exon 16, the prevalence of mutation was higher in exon 2, exon 6, exon 8, exon 12, exon 13 and exon 19. Mutations in these seven exons (including exon 16) accounted for 55.45% of the total mutations. In the hMSH2 gene, the mutation prevalence and densities in different exons were generally lower than those in the hMLH1 gene. The highest prevalence of mutation was 2.62% in exon 7. Those in exon 3, exon 5, exon 11 and exon 12 were also higher than in other exons. The total mutations in these five exons accounted for 53.39% of the total mutations (Table 3).

The mutation types
As shown in Table 4, there were three main types of gene mutations: substitutions (including transitions and transversions), deletions or insertions, and large genomic rearrangements. Substitution accounted for 60.97% and 53.77% of all three point mutations in the hMLH1 and hMSH2 genes, respectively. The next most frequent type was deletion, which accounted for 24.15% and 36.98% of the total, respectively.

Prevalence of germline mutations in different subject selection settings
There were 28 population-based series articles and 29 clinic-based series articles evaluated in this study. In the hMLH1 gene, the mutation prevalence was 12.49% (95%CI 8.65%-17.71%) in the population-based group and 17.39% (95%CI 13.62%-21.93%) in the clinic-based group (P = 0.13). In the hMSH2 gene, the mutation prevalence was 10.50% (95%CI 6.94%-15.59%) and 12.03% (95%CI 8.47%-16.80%), respectively (P = 0.62) (Table S5 in supporting information). When the comparison was repeated within each family history group, there were no statistically significant differences in any group.

Publication bias
Funnel plots of the prevalence of pathological mutation in these two genes, both in general and with different family histories, showed some asymmetry, with small studies on the left side of the plot (Figures S1, S2 and S3 in supporting information). Detailed results of an Egger regression, a Begg correlation and a "Trim and Fill" analysis for different family histories with these two genes, both separately and together, are shown in Table 6.
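The idea behind the Egger regression can be sketched as follows: each study's standardized effect (effect divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero suggests funnel-plot asymmetry. The code is a minimal ordinary least squares illustration with hypothetical names. It returns the intercept and its standard error but no P value, and it is not the routine used by the meta-analysis software.

```python
import math
from typing import Sequence, Tuple


def egger_intercept(effects: Sequence[float],
                    std_errors: Sequence[float]) -> Tuple[float, float, float]:
    """Return (intercept, slope, intercept_se) of Egger's regression."""
    n = len(effects)
    if n < 3 or n != len(std_errors):
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / se for se in std_errors]                   # precision
    y = [e / se for e, se in zip(effects, std_errors)]    # standardized effect
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same standard error")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = residual_ss / (n - 2)
    intercept_se = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, slope, intercept_se
```

When the effect does not depend on study size, the fitted intercept is close to zero and the slope estimates the common effect. For prevalence data the effects would typically be entered on a transformed scale such as the logit, which is an assumption of this sketch.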

Discussion
Based on a systematic review and meta-analysis, we found that the total mutation prevalence of hMLH1 and hMSH2 in patients having both genes screened was 44.70%, 24.65% and 11.56% in the AC+, AC- and sporadic cancer groups, respectively. However, the reported mutations in these two genes were very different in Lynch syndrome [6,48,59]. One reason for the difference was that we limited the mutation region to exons and the mutation type to pathogenic mutations, which allowed us to provide more stable mutation prevalence results through a systematic review and meta-analysis.
Although papers on mutations in different ethnicities have been published, no reports have explicitly described any differences among them. Our analysis found that there was no substantial statistical difference between these four ethnicities with different family histories across both genes, either separately or together.
Although the InSiGHT database has collected information about new mutations in different exons, few papers or websites have provided information on exon-specific prevalence and detailed mutation types. Our results showed that a remarkably high prevalence of mutation occurred in exon 16. It was also noteworthy that the mutations in exon 16 and exon 2 were mainly aggregated at c.1852_1854delAAG and c.199G>A, which accounted for 37.78% and 29.03% of the mutations, respectively (data not shown). In hMSH2, the highest mutation prevalence was found in exon 7. The total mutations in exon 3, exon 5, exon 7, exon 11 and exon 12 amounted to 53.39% of the total (Table 3). Therefore, when performing hMLH1 and hMSH2 gene mutation tests, it would be important to focus attention on these exons and their common mutation points.
In Wei W. et al.   specific mutations in this population should be highlighted when screening for mutations in these two genes.
Our results showed that the major mutation types of both genes were substitution and deletion. The substitution of a nucleotide could result in missense, nonsense and silent mutations, while deletion and insertion typically lead to a frameshift. There were no differences in mutation type by ethnicity in hMSH2 (deletion P = 0.18 > 0.05; insertion P = 0.11 > 0.05; substitution P = 0.85 > 0.05), but there were in the hMLH1 gene. For insertion, differences existed between the Asian and European/Australian populations (P = 0.00 < 0.05). Insertion mutations accounted for a larger proportion in the Asian population than in the European/Australian population (Table 4 and Table S4 in supporting information).
These results suggested that not only did point mutations occur frequently in colorectal cancer, but large genomic rearrangements were also present to some extent. Initially, the detection methods for large genomic rearrangements were mainly Southern analysis [115] and conversion analysis [18]. Recently, more sensitive MLPA analysis has been performed for patients who had no point mutations to determine the occurrence of large genomic deletions of these two genes [134]. Among our 102 studies, only five papers used MLPA, separately or combined with other methods, representing 4.90% of the total studies. In these, the prevalence of large genomic rearrangements in hMLH1 and hMSH2 was 6.76% (95%CI 3.11%-14.05%) and 13.56% (95%CI 11.19%-16.32%) (data not shown), respectively, which was higher than the results in Table 4, where the subjects and detection methods were not specified. Therefore, the mutation prevalence in future results is expected to be higher with the use of more sensitive methods to identify large genomic deletions.
Studies have revealed that cases without microsatellite instability may also carry germline mutations. The mutation prevalence ranges widely across different MSI statuses and different family histories [70]. The prevalence of mutation was 53.41% in AC+ patients with tumors exhibiting the MSI-high phenotype, which suggested that the predictive value of MSI-high for mutations in these two genes was 53.41% in the AC+ group. The next highest was 38.80% in the AC- group and then 22.54% in the sporadic group. If we took MSI as one group (combining MSI-high, MSI-low and MSI in which MSI-high and MSI-low could not be distinguished), the corresponding predictive value was 57.12% (95%CI 50.43%-63.55%) in the AC+ group (Table 5).
Several techniques for detecting mutations are commonly used, including immunohistochemistry followed by DNA sequencing, single-strand conformational polymorphism followed by DNA sequencing, heteroduplex analysis followed by DNA sequencing, denaturing gradient gel electrophoresis followed by DNA sequencing, denaturing high-performance liquid chromatography followed by DNA sequencing, and direct DNA sequencing. Analysis of the effect of different detection methods on the prevalence found that, in general, there was no significant difference in prevalence detected by the four methods in the AC+ (P = 0.60 > 0.05) and AC- (P = 0.30 > 0.05) groups. In the sporadic group, there were too few studies to analyze (Table S2 in supporting information).
A distinction between population-based and clinic-based series was made, as this was a potentially important source of bias in the analysis. However, we observed no significant differences in the mutation prevalence between the clinic-based and population-based groups in either gene. When considering the effects of family history, the conclusion did not change (Table S5 in supporting information).
There was higher heterogeneity in total prevalence. Meta-regression results showed that 22.34% of this heterogeneity was explained by different family histories (P = 0.00) (Table S3 in supporting information). Subgroup analysis showed that the heterogeneity was at moderate or low levels for different family histories with respect to the prevalence in these two genes. In addition to family history, factors such as years since publication, different ethnicities and detection methods were also analyzed, and none of them showed any statistically significant effect on heterogeneity (Table S3 in supporting information).
From the funnel plots (Figures S2 and S3 in supporting information), we observed that the plots from the meta-analysis of mutation prevalence with different family histories of the two genes were all skewed to the left. Further quantitative analysis of the extent of asymmetry by the Egger regression and Begg correlation methods found that the left-side asymmetry was statistically significant in some situations (Table 6). This indicated that an increased number of accepted publications and small study sizes had no effect on positive results. This only illustrated that more studies with small sample sizes and lower detection ability were conducted and published. However, we can still use "Trim and Fill" to adjust the value of the original data [135]. We observed that the adjusted value increased from 28.55% to 33.94% in the hMLH1 gene and from 19.41% to 23.00% in the hMSH2 gene in the AC+ group (Table 6).
In addition to mutations in exons, some of the most common pathogenic variants are intronic. We therefore systematically searched papers on associations between intervening sequence or intron area mutations and Lynch syndrome, and we also analyzed the difference in prevalence among related factors such as family history and ethnicity (Table S6 in supporting information). When we calculated the two genes combined, the intronic mutation frequency was 12.30% (95%CI 9.80%-15.33%) in the AC+ group and 5.90% (95%CI 4.08%-8.47%) in the AC- group. Moreover, we found that the most common pathogenic deleterious variants in Lynch syndrome were hMSH2 intron 5 c.942+3A>T and hMLH1 intron 9 c.790+1G>A, with a mutational consequence of deletion of exon 5 in hMSH2 and deletion of exons 9-10 in hMLH1.
Sensitivity analysis indicated that the results of our study were reliable and stable. However, this meta-analysis still has some limitations: the family history of the patients was not always clearly explained, and studies providing insufficient ethnicity information had to be roughly classified. In the sporadic colorectal cancer group, the tested population was usually filtered by MSI phenotype or age of onset [64,129]. Moreover, studies on sporadic colorectal cancer and information on gender were insufficient and need further analysis. In addition, in order to control the quality of articles and apply uniform standards, the written language was limited to English, which may have affected the number of included studies.
Beyond the hMLH1 and hMSH2 genes, we remain concerned about the prevalence of mutations in other MMR genes, in particular hMSH6 and hPMS2. In 2009, our team conducted a systematic review and meta-analysis to determine the frequency of hMSH6 mutations in colorectal and endometrial cancers [136]. Mutations in the hPMS2 gene are a rare cause of Lynch syndrome [15,16], with a mutation frequency currently below 2% (http://www.med.mun.ca/mmrvariants); moreover, few papers have studied hPMS2 gene mutations [137], so we did not include this gene, to avoid destabilizing the results. In spite of these limitations, our results are still reliable and yield important conclusions. Data on sporadic cases were not sufficient or detailed, and, hence, large, well-designed studies with information on ethnicity, gender and age of onset are needed.

Data statement
We declare that all the data analyzed in this paper were extracted from published articles available online, and we take responsibility for the integrity of the data and the accuracy of the data analysis.
Table S1 Characteristics of included studies about weighted prevalence of hMLH1 and hMSH2 germline mutation in colorectal cancer. (DOC)