Skip to main content
Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Increasing the Number of Thyroid Lesions Classes in Microarray Analysis Improves the Relevance of Diagnostic Markers

  • Jean-Fred Fontaine,

    Affiliations Max Delbrück Center for Molecular Medicine, Berlin, Germany, INSERM, UMR 694, Angers, France, Université d'Angers, Angers, France

  • Delphine Mirebeau-Prunier,

    Affiliations INSERM, UMR 694, Angers, France, Université d'Angers, Angers, France, CHU Angers, Laboratoire de Biochimie, Angers, France

  • Mahatsangy Raharijaona,

    Affiliations INSERM, UMR 915, l'institut du Thorax, Nantes, France, Université de Nantes, Nantes, France

  • Brigitte Franc,

    Affiliation Hôpital A Paré, Laboratoire d'Anatomie Pathologique, Boulogne, France

  • Stephane Triau,

    Affiliation CHU Angers, Laboratoire de Pathologie Cellulaire et Tissulaire, Angers, France

  • Patrice Rodien,

    Affiliations INSERM, UMR 694, Angers, France, Université d'Angers, Angers, France, CHU Angers, Département Endocrinologie-Diabétologie-Nutrition, Angers, France

  • Olivier Goëau-Brissonniére,

    Affiliation Hôpital A Paré, Service de Chirurgie vasculaire, Boulogne, France

  • Lucie Karayan-Tapon,

    Affiliation Université de Poitiers, EA 3805, Poitiers, France

  • Marielle Mello,

    Affiliation INSERM, UMR136, Marseille, France

  • Rémi Houlgatte,

    Affiliations INSERM, UMR 915, l'institut du Thorax, Nantes, France, Université de Nantes, Nantes, France

  • Yves Malthiery,

    Affiliations INSERM, UMR 694, Angers, France, Université d'Angers, Angers, France, CHU Angers, Laboratoire de Biochimie, Angers, France

  • Frédérique Savagner

    Affiliations INSERM, UMR 694, Angers, France, Université d'Angers, Angers, France, CHU Angers, Laboratoire de Biochimie, Angers, France, INSERM, UMR 915, l'institut du Thorax, Nantes, France



Genetic markers for thyroid cancers identified by microarray analysis have offered limited predictive accuracy so far because of the few classes of thyroid lesions usually taken into account. To improve diagnostic relevance, we have simultaneously analyzed microarray data from six public datasets covering a total of 347 thyroid tissue samples representing 12 histological classes of follicular lesions and normal thyroid tissue. Our own dataset, containing about half the thyroid tissue samples, included all categories of thyroid lesions.

Methodology/Principal Findings

Classifier predictions were strongly affected by similarities between classes and by the number of classes in the training sets. In each dataset, sample prediction was improved by separating the samples into three groups according to class similarities. The cross-validation of differential genes revealed four clusters with functional enrichments. The analysis of six of these genes (APOD, APOE, CLGN, CRABP1, SDHA and TIMP1) in 49 new samples showed consistent gene and protein profiles with the class similarities observed. Focusing on four subclasses of follicular tumor, we explored the diagnostic potential of 12 selected markers (CASP10, CDH16, CLGN, CRABP1, HMGB2, ALPL2, ADAMTS2, CABIN1, ALDH1A3, USP13, NR2F2, KRTHB5) by real-time quantitative RT-PCR on 32 other new samples. The gene expression profiles of follicular tumors were examined with reference to the mutational status of the Pax8-PPARγ, TSHR, GNAS and NRAS genes.


We show that diagnostic tools defined on the basis of microarray data are more relevant when a large number of samples and tissue classes are used. Taking into account the relationships between the thyroid tumor pathologies, together with the main biological functions and pathways involved, improved the diagnostic accuracy of the samples. Our approach was particularly relevant for the classification of microfollicular adenomas.


Over the past few years, the use of microarray technologies has contributed to the identification of new markers for the diagnosis and prognosis of human tumors. Cancer research usually involves the study of a single class of tumor and the corresponding normal tissue. Increasing the number of binary studies does not necessarily improve the relevance of the molecular signature. Thus, a meta-analysis comparing 40 types of cancer in various tissues relative to their normal counterpart allowed the identification of a common signature essential to carcinogenesis but may fail to distinguish between different classes of tumor affecting a given organ, which may therefore have specific prognoses [1]. Moreover, the same signature may also appear in a variety of other cellular contexts such as inflammatory processes.

Thyroid nodules are extremely common in the adult population, but less than 20% of the nodules are malignant [2]. Papillary thyroid carcinoma (PTC), diagnosed on the basis of characteristic nuclear features, is the most frequent malignant thyroid tumor. According to the 2004 WHO report, the diagnosis of minimal invasive follicular thyroid carcinoma (FTC) is problematic because of its morphological and molecular similarities to benign follicular thyroid adenoma (FTA) [3]. Moreover, atypical or oncocytic features render the differential diagnosis of follicular tumors difficult on histologic examination and call for new molecular or biological markers [4], [5]. To date, microarray analyses of thyroid tumors have essentially compared two classes of tissue [6][11]. These studies have either searched for specific markers by comparing a particular class of thyroid tumor to the corresponding normal tissue, or looked for markers of malignancy by examining the most frequent benign and malignant classes of thyroid tumor (usually the FTA and PTC classes). The predictive accuracy of the markers identified is therefore rather limited with regard to tumors belonging to other classes. For example, the CITED1 gene, which was claimed to be a significant marker distinguishing PTC from normal tissue [9] turned out to be less specific when data from FTC samples were included [8]. Furthermore, this gene did not even figure among the 42 best PTC marker genes detected by a meta-analysis that included benign tumors [12]. Meta-analyses may increase not only the number of classes required to define more relevant markers but also increase the imbalance in the representation of some classes. In a recent analysis, cross-validated marker genes from 21 studies differentiated benign from malignant thyroid tissues [13]. However, the majority of these were pertinent to the diagnosis of PTC since these thyroid cancers accounted for more than 40% of the samples studied. This highlights the bias due to the recruitment of thyroid samples for microarray studies and the consequent failure in identifying classifiers for clinical applications despite the large number of analyses exploited.

Few studies have simultaneously compared more than four types of thyroid tissue [5], [14][17]. In one of these [15], we were able to refine the diagnosis of tumors of uncertain malignancy by the simultaneous analysis of eight types of thyroid tissue. These promising results encouraged further research on relevant markers of thyroid tumors, taking into account more classes and subclasses of interest, as some authors have recommended [18]. In the present study, we have explored gene-expression signatures from the majority of the differentiated follicular pathologies of thyroid tissue. We have used published data or data available from the Gene Expression Omnibus (GEO) of the U. S. National Center for Biotechnology Information, as well as specifically generated data, to simultaneously analyze several datasets containing information on various pathologies, including datasets with many histological subclasses. Our analysis, covering the major subtypes of thyroid tissue, may be expected to enhance microarray data concerning thyroid tumors and lead to the definition of more reliable tissue-specific markers for thyroid lesions.


We have simultaneously analyzed 347 thyroid tissue samples from six datasets (Table 1). The Fontaine dataset, generated by our own laboratory [15], comprised 166 thyroid tissue samples, i.e. about half the total number of samples.

Molecular classification of thyroid pathologies

To analyze the global classification of the follicular thyroid lesions in the Fontaine dataset, we defined the centroid of each class of tissue by a composite signature of mean gene-expression levels. The hierarchical clustering of the class centroids (Figure 1A) revealed three groups. The first group comprised benign thyroid lesions and normal tissues (green boxes for FTA, MNG, WT, AT, and GD); the second contained malignant lesions (red boxes for FTC, TUM and PTC); and the third included oncocytic tumors and microfollicular adenomas (blue boxes for FTAb, OTUM, OTA and OTC). These groups, supported by a high robustness index of 0.938 and a discrepancy index of 0.337, are respectively referred to as the ‘benign’, ‘malignant’ and ‘oncocytic’ groups. The three groups, cross-validated in the Giordano dataset (Figure 1C), presented almost perfect robustness ( = 1) and discrepancy indices ( = 0.001). This dataset showed a highly correlated oncocytic group (OTA, OTC, correlation >0.6) and a group of malignant tumors (PTC, FTC+). The third group, predominantly composed of benign tumors and normal tissues (WT, FTA), also included the FTC- group.

Figure 1. Molecular classification of thyroid tissues.

The centroid signatures of the thyroid tissues were compared to each other in the Fontaine dataset generated by our laboratory [15], and compared to the Giordano dataset [5]. Centroids were defined as the mean gene-expression signature of each tissue class over all the genes. Panel A shows the hierarchical clustering of the tissues in the Fontaine dataset. The dendrogram can be cut into three robust branches (black circles) defining three groups of classes. The groups contain benign lesions and normal tissue (green boxes), malignant tumors (red boxes), or oncocytic tumors and microfollicular adenomas (blue boxes). Panel B shows pairwise correlation coefficients between the tissue classes from the Fontaine dataset. The heatmap represents a symmetrical matrix of color-coded correlation coefficients. Tissue classes are ordered as in the hierarchical clustering and marked with the same three colors (green, red and blue). Panel C shows the hierarchical clustering of the tissues in the Giordano dataset. The dendrogram can be cut into three robust branches (black circles) defining three groups of classes. The groups contain heterogeneous lesions and normal tissue (green boxes), malignant tumors (red boxes), or oncocytic tumors (blue boxes). The two datasets show three groups containing similar tissue classes. WT: wild type tissue; PTC: papillary thyroid carcinoma; FTA: macrofollicular thyroid adenoma; FTAb: microfollicular thyroid adenoma; GMN, multinodular goiter; FTC: follicular thyroid carcinoma (FTC+: with Pax8/PPARγ translocation; FTC-: without Pax8/PPARγ translocation); OTA: oncocytic thyroid adenoma; OTC: oncocytic thyroid carcinoma; TUM: tumor of uncertain malignancy; OTUM: oncocytic tumor of uncertain malignancy; GD: Grave's disease; and AT: autoimmune thyroïditis.

We computed the prediction accuracy for the benign, malignant and oncocytic groups of thyroid tissue classes as defined above. The signature of each group was trained versus all the other samples, and its accuracy was compared with the predictive accuracy of individual lesions (Table 2). In both datasets, the positive accuracy for each group was greater than the best accuracy reported for an individual lesion within the group. The high proportion of PTC (79%) in the malignant group from the Giordano dataset may explain the good accuracy for this group despite the rate of discrepancy shown in the Figure 1C. The separation of the samples into three groups on the basis of class similarities led to improved sample prediction in the two datasets.

Table 2. Prediction of samples by considering three groups of tissue classes. Values can be compared to the range of accuracies for individual lesions that composed each group.

The matrix of 66 pairwise correlation coefficients between the classes (Figure 1B) allowed a detailed observation of the similarities and dissimilarities in the Fontaine dataset. All the classes in the benign/normal group were positively correlated with each other. Some pairs of classes were highly correlated (correlation coefficient >0.4), e.g. FTA with MNG, and AT with GD. All the classes in the malignant group were positively correlated with each other. The TUM class had a stronger correlation with PTC than with FTC (respectively 0.354 and 0.274). This similarity between TUM and PTC has previously been demonstrated within this dataset and with independent samples [15]. In the oncocytic group, the classes of oncocytic tumors were positively correlated with each other, except for the FTAb class. Finally, we were able to identify 220 non-redundant genes for the molecular classification of each of the 11 subcategories of follicular thyroid pathologies explored (Table 3).

Table 3. Classifier genes from the Fontaine dataset (n = 220 genes).

Class prediction from binary and from complete training

We compared classifiers defined from many classes, i.e. the entire dataset (complete training), and two classes, i.e. one pathological class versus normal thyroid tissue (binary training). The predictive accuracy for the samples of a class, i.e. the positive accuracy, was always higher after binary training than after complete training in the two main datasets (Figure 2A). With regard to complete training, only two classes in each dataset had similar performances: PTC and AT in the Fontaine dataset, FTC+ and PTC in the Giordano dataset.

Figure 2. Class prediction of the samples.

Panel A shows the mean positive accuracy of classifiers during intra-dataset cross-validations for the two main datasets, i.e. the Fontaine dataset and the Giordano dataset. Each classifier is composed of 20 genes selected by the Greedy Pairs algorithm and using Diagonal Linear Discriminant Analysis for training and prediction. Some classifiers were trained with samples from the selected class and the normal tissue, i.e. with binary training (black bars). Other classifiers were trained with all the samples of the dataset, i.e. with complete training (gray bars). Panel B shows the class prediction of new samples by binary-trained classifiers in the two datasets. All the new samples come from the same dataset and should have been predicted as negative. The negative predictive value which was deduced from binary training (black bars) is compared to observations on the new samples (gray bars). WT: wild type tissue; PTC: papillary thyroid carcinoma; FTA: macrofollicular thyroid adenoma; FTAb: microfollicular thyroid adenoma; MNG, multinodular goiter; FTC: follicular thyroid carcinoma (FTC+: with Pax8/PPARγ translocation; FTC-: without Pax8/PPARγ translocation); OTA: oncocytic thyroid adenoma; OTC: oncocytic thyroid carcinoma; TUM: tumor of uncertain malignancy; OTUM: oncocytic tumor of uncertain malignancy; GD, Graves' disease; and AT, autoimmune thyroïditis.

The classifiers defined by complete training in the Fontaine dataset were evaluated in five other thyroid datasets (Table 4). With respect to the positive accuracy computed from intra-dataset cross-validations, the PTC, FTC, FTA and OTC classifiers turned out to be more accurate than expected. The OTA classifier was less accurate than expected, with a value of 0.59 versus 0.64, but only one other dataset contained this class of thyroid tissue.

Table 4. Inter-dataset cross-validations. Positive accuracies of the classifiers defined by the Fontaine dataset in five other datasets.

Functional analysis of cross-validated differential genes

In the two largest datasets containing the main kinds of thyroid lesions [5], [15], we searched for cross-validated differential genes. The two lists of differential genes had an intersection of 104 genes that separated five gene clusters from the hierarchical clustering of classes shared by the two datasets (Figure 3 and Table 5). The functions and pathways identified were mainly related to oncocytic and papillary tumors (Clusters 1, 3 and 4). Gene ontologies were associated with oxido-reductive mechanisms (Cluster 1, 28 genes), nitrogen metabolism and cell communication (Cluster 3, 17 genes), glucocorticoid receptor signaling and leukocyte extravasation (Cluster 4, 24 genes). A cluster of genes showed a specific profile for tumors of malignant potential, for which the transcriptional factors involved in the primary anabolic and catabolic cell pathways were down-regulated (Cluster 2, 25 genes). However in Cluster 2, the heterogeneous profiles of some classes (FTA) were suspected of containing molecular subgroups whereas the profiles for WT were almost homogeneous. A cluster of 10 genes (Cluster 5) was not clearly specific to the tissue classes as their profiles differed between the two datasets. In the Fontaine dataset, 52% of the cross-validated genes belonged to at least two of the 11 classes of lesion (data not shown).

Figure 3. Hierarchical clustering of cross-validated differential genes.

The heatmap shows gene-expression levels in the Fontaine dataset for the tissue classes common to the Giordano dataset (104 genes). Functional enrichments are shown for five identified gene clusters on the right, followed by p-values and the genes involved. The black text represents Level 3 Gene Ontology terms; the blue text represents canonical Ingenuity pathways. Gene profiles of the differential genes reflect the class similarities observed. WT: wild type tissue; PTC: papillary thyroid carcinoma; FTA: macrofollicular thyroid adenoma; FTC: follicular thyroid carcinoma; OTA: oncocytic thyroid adenoma; and OTC: oncocytic thyroid carcinoma.

Table 5. Cross validated differential genes from the two main datasets (n = 104).

Independent validation of selected genes by real-time RT-PCR

To test the relevance of some markers selected by cross-validation between the two main datasets, we measured the expression levels of six differential genes, CASP10, CDH16, CLGN, CRABP1, HMGB2 and ALPL2, on 32 new follicular tumor samples, including eight samples each of FTA, FTAb, OTA and FTC (Figure 4A). For all six genes, the expression levels differentiated between follicular adenomas (macro- and micro-follicular) and follicular carcinomas (t-test, p<0.05). In all cases except one (CASP10, F-test, p<0.05), the gene expression level differentiated not only between macro- and micro-follicular adenomas but also between follicular adenomas and oncocytic adenomas, as in the case of CDH16 and CRABP1 genes.

Figure 4. Gene expression of selected markers.

Differential expression of 12 genes measured by real-time quantitative RT-PCR on 32 new follicular thyroid tumors (8 FTA, 8 FTAb, 8 OTA and 8 FTC). The upper and the lower limits of each box represent the upper and the lower quartiles, respectively. The bold lines represent medians. The expression of ADAMTS2, CABIN1, ADLH1A3, USP13, NR2F2, KRTHB5, CASP10, CDH16, CLGN, CRABP1, HMGB2 and ALPL2 genes was referred to the β-globin expression level. Significances of differential expression between the classes were assessed by t-tests and those over the classes by F-tests. FTA: macrofollicular thyroid adenoma; FTAb: microfollicular thyroid adenoma; FTC: follicular thyroid carcinoma; and OTA: oncocytic thyroid adenoma.

To test the relevance of our microarray analysis in identifying subclasses of adenomas, we measured the expression levels of six differential genes from the microfollicular adenoma subclass (ADAMTS2, CABIN1, ALDH13, USP13, NR2F2 and KRTHB5), on the same 32 follicular thyroid samples mentioned above (Figure 4B). Except for NR2F2 (p<0.954), all genes were differentially expressed not only between the three subclasses of adenomas (t-test, p<0.05), but also between the four groups of follicular tumors (F-tests, p<0.05 for ADAMTS2, CABIN1, ALDH13, USP13; p = 0.06 for KRTHB5).

Mutational status of follicular tumors

The mutational status of three genes (TSHR, GNAS and NRAS) was investigated on 59 solitary follicular adenomas (43 from the microarray study, and 16 from the RT-PCR study) to compare these to the expression profile of FTA samples and evaluate their functional status. To identify the role of the PAX8-PPARG translocation in the expression profile of the FTC class, the rearrangement status for PAX8-PPARG was explored on 11 FTC samples (three samples from the microarray study, and eight samples from the RT-PCR study).

In the follicular adenoma samples, we identified mutations in the TSHR gene in seven of the 59 samples (12%): six of which were macrofollicular adenomas from the microarray study and one was a macrofollicular adenoma from the RT-PCR study. All the mutations were situated in the seventh transmembrane domain of the TSH receptor (from codon 619 to codon 633). Our results are in accordance with the mutational status of the TSH receptor gene described in the literature, especially concerning the macrofollicular subtype [19]. One macrofollicular adenoma presented a mutation in codon 201 of the GNAS gene. None of the microfollicular adenomas showed TSHR and GNAS mutations in the four exons explored. Five microfollicular adenomas (four from the microarray study, and one from the RT-PCR study) were positive for the NRAS mutation in codon 61.

No PAX8-PPARG translocation was found in any of the FTC samples studied.

Protein expression of selected differential genes in independent samples

We collected 49 new tissue samples to measure the protein expression of six selected differential genes in the classes common to the two main datasets (FTA, FTC, OTA, OTC, PTC and WT), and also for the AT class as a non-tumoral control (Figure 5). There was significant differential expression between the classes (F-tests, p<0.05) for five proteins (SDHA, CLGN, APOD, CRABP1 and TIMP1). The APOE protein was not differentially expressed (p<0.932). Among the differentially expressed proteins, three had profiles similar to their gene profiles (SDHA, APOD and TIMP1), whereas two proteins (CLGN and CRABP1) showed some differences, as reported in previous studies of selected classes [20], [21]. The expression of the CLGN protein was higher in the FTA and FTC samples, but not in the oncocytic samples, contrary to the gene profile observed. The CRABP1 protein was differentially expressed between the FTA samples compared to the PTC and FTC samples, with a decreased expression of both RNA and protein in the malignant tumors. Considering the appurtenance of this gene to Cluster 2 (Figure 3), the expression of CRABP1 protein could serve to distinguish between the benign and malignant groups of thyroid tissue.

Figure 5. Protein expression of selected genes.

The protein expression of six selected differential genes was measured on 49 new tissue samples. These independent samples represent six tissue classes that are common to the two main datasets (WT, PTC, FTA, FTC, OTA, and OTC). The AT class was added to compare the expression levels with a non-tumoral lesion. Seven samples were used for each class. Protein expression profiles of APOD, CLGN, SDHA, APOE, CRABP1 and TIMP1 are shown by box-plots for the different classes of tissue. Significances of differential expression over the classes were assessed by F-tests. P-values are shown between brackets. WT: wild type tissue; PTC: papillary thyroid carcinoma; FTA: macrofollicular thyroid adenoma; FTC: follicular thyroid carcinoma; OTA: oncocytic thyroid adenoma; and OTC: oncocytic thyroid carcinoma.


The clinical diagnosis of thyroid tumors is subject to significant inter- and intra-observer variations [22], [23]. Since 2001, various molecular markers for thyroid tumors have been proposed on the basis of microarray analysis. However, none of these markers has proved satisfactory in clinical practice, possibly because only a small number of thyroid tumor classes were initially taken into account. In our study, we analyzed six publicly available microarray datasets containing 347 thyroid tissue samples and defining as many as 12 classes of thyroid tumors. One of these datasets, the Fontaine dataset generated by our laboratory, comprised about half the total number of samples, including all the differentiated follicular thyroid lesions.

With regard to the 12 classes of thyroid tumor defined, we found that more than half of the 104 relevant and cross-validated differential genes were also related to other classes, such as AT, GD, TUM, FTAb, which were not shared by the two main datasets, i.e. the Giordano dataset and the Fontaine dataset. This may have affected the relevance of the signature in the classification of the follicular pathologies, explaining the low percentage of genes common to the two classifiers. Although the genes were selected from different sets of classes, discrepancies of some gene profiles, for example in Cluster 5, may be related to incomplete microarray platform compatibility or to differential cellular environment. Interestingly, most of the genes in Cluster 5 were involved in the process of vascular endothelial cell expansion[24][26], which may represent the stromal reaction. Despite the small number of FTC and OTC samples and the microarray platform being restricted to 9,216 probes, our classifier was able to validate the classification of PTC, FTC, FTA and OTC on five other datasets (Table 3). This demonstrated the relevance of our approach to the molecular classification of the main types of differentiated follicular thyroid pathology.

Unsupervised analysis of the class signatures showed a separation into three groups corresponding to benign, malignant, and oncocytic thyroid tumors (Figure 1). The pairwise distinctions of benign/normal and malignant tumors were consistent with published reports, as in the case of PTC and FTA [14]. The surprising proximity of the FTC and FTA classes in the Giordano dataset has already been discussed [4]. This could be related to the high homogeneity in the expression profiles of the FTC samples, which were positive for the PAX8-PPARG rearrangement and significantly related to the malignant profile of the PTC samples. Thus, the clustering of the FTC- and FTA classes appears to be related more to their heterogeneous expression profiles rather than to their tendency to tumoral behavior. These results strongly suggest the existence of molecular subclasses for the FTA and FTC- samples. Interestingly, the FTC samples in the Fontaine dataset, included in the malignant group, were negative for the PAX8-PPARG translocation.

Although the histologic diagnosis of oncocytic follicular tumors presents few problems, the two latest WHO classifications have debated whether these tumors should be considered as an independent class of tumors or as a variant of follicular tumors. This question is still relevant in view of the recent results of miRNA profiling that clearly revealed the segregation of oncocytic tumors from other conventional follicular tumors [27]. According to their specific gene-expression profiles as well as their mutational status for RAS and PAX8-PPARγ genes [7], oncocytic tumors should be considered as distinct from the other follicular tumors. The role of the “oncocytic group” signature in the appropriate classification of benign and malignant tumors is debatable. Considering the mitochondrial richness of follicular thyroid tumors, this signature may highlight the metabolic changes during tumorigenesis. Little is known about the etiology of oncocytomas but their signature is composed of numerous mitochondrial and metabolic genes, and the role played by these genes in tumor aggressivity has not yet been established [7], [20]. Interestingly, we found that microfollicular adenomas (FTAb) were related by their molecular profiles to oncocytic adenomas as well as to carcinomas. Relationships between thyroid metabolism and microfollicular lesions have been explored [28], but never in the context of mitochondrial functions. However, in a recent study, the galectin-3 profile associated with FTAb was reported to be related to its role in the regulation of mitochondrial stability [29]. Galectin-3 has been also involved in thyroid tumorigenesis through its anti-apoptotic activity [30]. Thus, the malignant potential of FTAb is still under discussion [31].

Evaluation of the performance of the classifiers used showed that poor performance was associated with the FTA subclasses in the two main datasets. In our study, the functional status of the FTA samples may have influenced the performance of the classifiers in discriminating true FTA from hyperfunctioning nodules. Only 12% of macrofollicular adenomas presented mutations in the TSH receptor or GNAS genes. These results correlated on microarray analysis with the expression profile of several genes previously associated with the hyperfunctional status of nodules [32]. Even though our FTA group represents a mix of adenomas in terms of functional status, the signature of follicular adenoma defined by our classifier are involved in other processes than iodine uptake and thyroid hormone secretion. As the selected genes are mainly associated to cell metabolism (CABIN1, CAMK2A, DAP, PDK4), signal transduction (ASGR1, PLXNB2, TBC1, PLCG1) and cell adhesion (COL18A1), we believe that the FTA signature we propose can discriminate true FTA from other follicular thyroid tumors. The relevance of constructing subclasses of follicular adenomas was tested on six cross-validated markers and six specific markers from our own analysis (Figure 4). Five specific markers enabled us to segregate micro- and macro-follicular adenomas from oncocytic adenomas and minimally invasive follicular carcinomas, while only one of the six cross-validated markers showed the same performance. Distinguishing between subclasses of adenoma allowed the identification of relevant molecular markers of follicular adenomas. These markers had never been differentially selected during previous microarray analyses of thyroid tumors. However, three of the markers (ADAMTS2, ALDH1A3 and CABIN1) had been proposed as prognostic markers for the recurrence of epithelial tumors [33][35].

The observation of six protein profiles highlights the difficulty of proposing immunomarkers for clinical practice. However, some markers may be relevant for the classification of differentiated follicular pathologies (FTA, FTC, PTC, OTA and OTC). It has been suggested that APOD, which has a specific protein profile for FTC and PTC, may modify the proliferative activity of cancer cells [36]. The differential expression of the folding protein calmegin (CLGN) with FTA and FTC samples may be associated with a difference in the abundance of the protein, as recently described in thyroid tumors [37]. Furthermore, increasing the number of thyroid tumor classes has revealed new aspects of some of the markers. Thus, TIMP1 and CRABP1 were not selected by our classifier but figured in the intersecting differential gene lists. Our data clearly questioned the relevance of TIMP1 as a marker of thyroid cancer. Since high protein levels were observed in the AT and PTC classes, TIMP1 expression may be more closely related to the frequently described lymphocytic infiltration in PTC [38]. Quantitative RT-PCR analysis on new follicular samples confirmed the upregulation of CRABP1 expression in all the adenomas as compared to follicular carcinomas. Underexpression of CRABP1 mRNA was relevant in distinguishing the FTC as well as PTC classes, as reported previously [39], [40]. However, our results differed at the protein level compared to a study on cold thyroid nodules where both the expression of CRABP1 mRNA and the protein were downregulated compared to normal surrounding tissue [41]. As our FTA samples were mostly non-functioning nodules, we postulate that this difference in protein level may be due to the histological class studied, macrofollicular adenomas in the Fontaine dataset, as opposed to the mix of follicular adenomas and adenomatous nodules in the other study. While all these markers can be used to complement current markers [42], the exploration of a large panel of protein markers to improve the differential diagnosis may rapidly become more cumbersome and less relevant than the study of the expression profile of selected genes, especially in the case of suspicious nodular thyroid lesions.

In conclusion, diagnostic tools defined on the basis of thyroid microarray data are more relevant when many samples and tissue classes are used, especially with the inclusion of oncocytic tumors. Our study has revealed additional potentially useful gene and protein markers allowing the classification of several thyroid pathologies. However, to identify early events of transformation, these markers should be viewed in the context of the biological processes in which they participate. We therefore believe that dedicated gene-expression profiling offers the most powerful method of identifying gene groups that correlate to nodular growth, thyroid dedifferentiation or malignancy, taking into account the potential contamination by other cell types such as activated lymphocytes. In this context, the molecular classification of microfollicular adenomas on fine needle aspiration biopsies should have a direct impact in the management of such tumors.

Materials and Methods

Samples and microarray datasets

We analyzed gene-expression data from six microarray datasets containing up to 12 classes of differentiated thyroid pathologies and normal thyroid tissue. The FTA samples were divided into two subclasses according to their follicular architecture, FTA (macrofollicular) and FTAb (microfollicular). The dominant nodule within the multinodular goiter (MNG) was investigated to explore the incidence of genetic or environmental events on the molecular signature of the class of follicular adenoma. When possible, the FTC class was also divided into two subclasses, FTC+ and FTC-, defined respectively by the presence or absence of the PAX8/PPARγ translocation in the samples. Atypical (TUM) and oncocytic features (OT) were individualized on criteria specified elsewhere [15], [20]. We also explored non-tumoral lesions such as those in Graves' disease (GD) and autoimmune thyroiditis (AT) to identify the immunological environment of the tumoral samples. This study is based on the Fontaine dataset generated by our laboratory, which comprises 12 subclasses containing a total of 166 anonymized human thyroid tissues. The five other datasets, particularly the Giordano dataset with 7 subclasses, were used to cross-validate the results of the molecular classification, the classifiers, and the functional analysis of differential genes [5], [43][45]. One dataset (Reyes et al.) has not yet associated citation. Details of the six datasets, containing a total of 347 samples, are summarized in Table 1.

The Fontaine and the Giordano datasets are referred as the main datasets. All the datasets, generated by various microarray platforms (Table 1), are publicly available. The Fontaine dataset (GSE6339), the He dataset (GSE3467) and the Reyes dataset (GSE3678) are hosted by the Gene Expression Omnibus (GEO) database of the U.S. National Center for Biotechnology Information. The Weber dataset was downloaded from European Bioinformatics Institute ArrayExpress database (E-MEXP-97). The two other datasets were downloaded from the authors' websites: the Giordano dataset ( and the Jarzab dataset (

Data analysis

For the Fontaine dataset generated in our laboratory, hierarchical clustering of the data was computed on log-transformed, median gene-centered and normalized data as described elsewhere [15]. Normalized data from the Giordano and Jarzab datasets were downloaded from the authors' websites. For the remaining datasets, raw data were downloaded from the GEO or Array Express databases and normalized with BRB Array Tools v3.7.0b1 developed by Dr. Richard Simon. Since the data originated from three different but one-color platforms, global normalization to the median was used to center the gene-expression values in each array. Genes showing minimal variation across the set of arrays were excluded from the analysis (log-intensity variation P-value>0.15). Hierarchical clustering was done on centered genes by using the centered correlation distance and an average linkage method. Robustness and discrepancy indices were computed from 10,000 data perturbations.

For the two largest datasets [5], [15], we identified genes that were differentially expressed among the classes by using a multivariate 10,000-permutation test, to provide 95% confidence that the number of false discoveries did not exceed 1. Although F-statistics were used for each gene, the multivariate permutation test is non-parametric and does not require the assumption of Gaussian distributions. Functional enrichments in gene clusters were defined at 5% risk for Level 3 Gene Ontology terms by Fatigo tools (adjusted P-values, [46]) as compared to the whole genome, and for canonical pathways by Ingenuity Pathway Analysis tools as compared to the Ingenuity Pathways Knowledge database (

Classifiers were defined for each pathological class and trained either versus the wild type class (binary training) or the entire dataset (complete training). Twenty genes were selected by the Greedy Pairs algorithm, and Diagonal Linear Discriminant Analysis (DLDA) was used for training and sample [47]. Intra- and inter-dataset cross-validations were done by the 0.632+ method and DLDA respectively [48]. Since the classical accuracy coefficient is biased in large datasets containing unbalanced class sizes, we preferred to use the positive accuracy coefficient defined as the geometric mean of the sensitivity (i.e. the proportion of true positives in the samples belonging to the targeted class in the entire datasets) and the positive predictive value (i.e. the proportion of true positives in the samples belonging to the targeted class) [49].

Quantitative RT-PCR analysis

An independent set of 32 follicular tumors, comprising 8 minimally invasive FTC and 24 follicular adenoma (8 OTA, 8 FTA and 8 FTAb), were subjected to real-time quantitative RT-PCR to test the relevance of several selected markers. The anonymized samples were obtained from the Department of Pathology of the University Hospital of Angers, France.

Total RNA was isolated using the TRIzol reagent (Invitrogen, Paisley, UK). RNA integrity was determined using a Bio-Analyzer 2100 (Agilent Technologies, Waldbronn, Germany). Reverse transcription was performed on 1 µg of RNA with Advantage RT-for-PCR kit (Clontech Laboratory, Palo Alto, CA, USA) following the manufacturer's recommendations. Real-time quantification was performed in a 96-well plate using the IQ SYBR Green supermix and Chromo4 (Biorad, Hercules, CA, USA) with the recommended protocol. Twelve genes were explored for their expression levels: CASP10, CDH16, CLGN, CRABP1, HMGB2, ALPL2, ADAMTS2, CABIN1, ALDH1A3, USP13, NR2F2 and KRTHB5. Data were normalized to the β-globin expression level. The significance of differential expression between classes was assessed by t-tests and that over the classes by F-tests. The primer sequences used in this study are specified in Table 6.

Table 6. Primer sequences used for real-time quantitative RT-PCR.

Detection of mutations and PAX8-PPARG rearrangement

The mutational status for the TSHR, GNAS and NRAS genes was determined for all the FTA and FTAb explored in studies by microarray (26 FTA and 17 FTAb) and RT-PCR (8 FTA and 8 FTAb). DNA was isolated from frozen tissues during the guanidium isothiocyanate procedure (TRIzol Reagent, Invitrogen Life Technologies). Exon 2 of the NRAS gene was amplified using primer sequences described elsewhere[50]. Exons 9 and 10 of the TSH receptor gene, and exons 8 and 9 of the Gs alpha gene were amplified using primer sequences previously specified [51]. PCR were performed on 5 µl DNA with the HotGoldstar DNA polymerase (Eurogentec, Seraing, Belgium) according to the manufacturer's recommendations. Amplified fragments were purified and directly sequenced on a CEQ 8000 apparatus, using the CEQ DTCS Quick Start Kit (Beckman Coulter, Fullerton, CA, USA) following the manufacturer's instructions.

The presence of the PAX8-PPARG translocation was evaluated as described elsewhere [52], on the three FTC samples in the microarray study and the eight FTC samples in the RT-PCR study. Quantification, degradation and DNA contamination of RNA were assessed using an RNA 6000 Nano Assay kit (Agilent Technologies, Palo Alto, CA, USA). After reverse transcription, we looked for the translocation in 5 µl cDNA, using primer sequences previously specified [5] and the HotGoldstar DNA polymerase (Eurogentec, Seraing, Belgium) according to the manufacturer's recommendations.

Dot blot analysis

Two micrograms of protein corresponding to the TRIzol fraction of 49 independent and anonymized thyroid tissues were spotted onto nitrocellulose membranes at room temperature, using a Dot Blot apparatus (Biorad, Hercules, CA, USA) following the manufacturer's recommendations. These tissues, provided by the Department of Pathology of the University Hospital of Angers, differed from those used for the microarray and RT-PCR studies. Six groups of thyroid pathologies were defined according to WHO cytological criteria: AT, FTA, FTC, PTC, OTA and OTC and compared to normal thyroid tissue (WT). Seven samples were tested from each group and the assays were performed in duplicate. Seven primary antibodies (all from Abcam, Cambridge, UK) were used at specific dilutions: 1/750 for TIMP1, 1/1000 for APOD, SDHA, CLGN, CRABP1 and APOE, and 1/5000 for α-tubulin as a control. Peroxidase anti-rabbit or anti-mouse secondary antibodies were revealed using the ECL Plus reagent kit (ECL, Amersham, Chalfont, UK). Spots were quantified on a GelDoc XRS apparatus using Quantity One software (Biorad) and expressed in terms of the control value. Significances of differential expression over the classes were assessed by F-tests.


We thank Dominique Couturier and Anne Coutoleau for technical help. We thank Marja Steenman and Kanaya Malkani for the critical reading of this paper.

Author Contributions

Conceived and designed the experiments: RH FS. Performed the experiments: DMP BF ST MM. Analyzed the data: JFF MR FS. Contributed reagents/materials/analysis tools: DMP PR OGB LKT YM FS. Wrote the paper: JFF BF FS.


  1. 1. Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, et al. (2004) Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci U S A 101: 9309–9314.
  2. 2. Hundahl SA, Fleming ID, Fremgen AM, Menck HR (1998) A National Cancer Data Base report on 53,856 cases of thyroid carcinoma treated in the U.S., 1985-1995. Cancer 83: 2638–2648.
  3. 3. Yeh MW, Demircan O, Ituarte P, Clark OH (2004) False-negative fine-needle aspiration cytology results delay treatment and adversely affect outcome in patients with thyroid carcinoma. Thyroid 14: 207–215.
  4. 4. Lacroix L, Lazar V, Michiels S, Ripoche H, Dessen P, et al. (2005) Follicular thyroid tumors with the PAX8-PPARgamma1 rearrangement display characteristic genetic alterations. Am J Pathol 167: 223–231.
  5. 5. Giordano TJ, Au AY, Kuick R, Thomas DG, Rhodes DR, et al. (2006) Delineation, functional validation, and bioinformatic evaluation of gene expression in thyroid follicular carcinomas with the PAX8-PPARG translocation. Clin Cancer Res 12: 1983–1993.
  6. 6. Barden CB, Shister KW, Zhu B, Guiter G, Greenblatt DY, et al. (2003) Classification of follicular thyroid tumors by molecular signature: results of gene profiling. Clin Cancer Res 9: 1792–1800.
  7. 7. Nikiforova MN, Lynch RA, Biddinger PW, Alexander EK, Dorn GW, et al. (2003) RAS point mutations and PAX8-PPAR gamma rearrangement in thyroid tumors: evidence for distinct molecular pathways in thyroid follicular carcinoma. J Clin Endocrinol Metab 88: 2318–2326.
  8. 8. Aldred MA, Huang Y, Liyanarachchi S, Pellegata NS, Gimm O, et al. (2004) Papillary and follicular thyroid carcinomas show distinctly different microarray expression profiles and can be distinguished by a minimum of five genes. J Clin Oncol 22: 3531–3539.
  9. 9. Huang Y, Prasad M, Lemon WJ, Hampel H, Wright FA, et al. (2001) Gene expression in papillary thyroid carcinoma reveals highly consistent profiles. Proc Natl Acad Sci U S A 98: 15044–15049.
  10. 10. Jarzab B, Wiench M, Fujarewicz K, Simek K, Jarzab M, et al. (2005) Gene expression profile of papillary thyroid cancer: sources of variability and diagnostic implications. Cancer Res 65: 1587–1597.
  11. 11. Weber F, Shen L, Aldred MA, Morrison CD, Frilling A, et al. (2005) Genetic classification of benign and malignant thyroid follicular neoplasia based on a three-gene combination. J Clin Endocrinol Metab 90: 2512–2521.
  12. 12. Eszlinger M, Wiench M, Jarzab B, Krohn K, Beck M, et al. (2006) Meta- and reanalysis of gene expression profiles of hot and cold thyroid nodules and papillary thyroid carcinoma for gene groups. J Clin Endocrinol Metab 91: 1934–1942.
  13. 13. Griffith OL, Melck A, Jones SJ, Wiseman SM (2006) Meta-analysis and meta-review of thyroid cancer gene expression profiling studies identifies important diagnostic biomarkers. J Clin Oncol 24: 5043–5051.
  14. 14. Finley DJ, Zhu B, Fahey TJ, III (2004) Molecular analysis of Hurthle cell neoplasms by gene profiling. Surgery 136: 1160–1168.
  15. 15. Fontaine JF, Mirebeau-Prunier D, Franc B, Triau S, Rodien P, et al. (2008) Microarray analysis refines classification of non-medullary thyroid tumours of uncertain malignancy. Oncogene 27: 2228–2236.
  16. 16. Stolf BS, Abreu CM, Mahler-Araujo MB, Dellamano M, Martins WK, et al. (2005) Expression profile of malignant and non-malignant diseases of the thyroid gland reveals altered expression of a common set of genes in goiter and papillary carcinomas. Cancer Lett 227: 59–73.
  17. 17. Yukinawa N, Oba S, Kato K, Taniguchi K, Iwao-Koizumi K, et al. (2006) A multi-class predictor based on a probabilistic model: application to gene expression profiling-based diagnosis of thyroid tumors. BMC Genomics 7: 190.
  18. 18. Eszlinger M, Krohn K, Hauptmann S, Dralle H, Giordano TJ, et al. (2008) Perspectives for improved and more accurate classification of thyroid epithelial tumors. J Clin Endocrinol Metab 93: 3286–3294.
  19. 19. Corvilain B, Van SJ, Dumont JE, Vassart G (2001) Somatic and germline mutations of the TSH receptor and thyroid diseases. Clin Endocrinol (Oxf) 55: 143–158.
  20. 20. Baris O, Savagner F, Nasser V, Loriod B, Granjeaud S, et al. (2004) Transcriptional profiling reveals coordinated up-regulation of oxidative metabolism genes in thyroid oncocytic tumors. J Clin Endocrinol Metab 89: 994–1005.
  21. 21. Wasenius VM, Hemmer S, Kettunen E, Knuutila S, Franssila K, et al. (2003) Hepatocyte growth factor receptor, matrix metalloproteinase-11, tissue inhibitor of metalloproteinase-1, and fibronectin are up-regulated in papillary thyroid carcinoma: a cDNA and tissue microarray study. Clin Cancer Res 9: 68–75.
  22. 22. Franc B, de la SP, Lange F, Hoang C, Louvel A, et al. (2003) Interobserver and intraobserver reproducibility in the histopathology of follicular thyroid carcinoma. Hum Pathol 34: 1092–1100.
  23. 23. Lloyd RV, Erickson LA, Casey MB, Lam KY, Lohse CM, et al. (2004) Observer variation in the diagnosis of follicular variant of papillary thyroid carcinoma. Am J Surg Pathol 28: 1336–1340.
  24. 24. Ao A, Wang H, Kamarajugadda S, Lu J (2008) Involvement of estrogen-related receptors in transcriptional response to hypoxia and growth of solid tumors. Proc Natl Acad Sci U S A 105: 7821–7826.
  25. 25. Sierra JR, Corso S, Caione L, Cepero V, Conrotto P, et al. (2008) Tumor angiogenesis and progression are enhanced by Sema4D produced by tumor-associated macrophages. J Exp Med 205: 1673–1685.
  26. 26. Taylor KL, Leaman DW, Grane R, Mechti N, Borden EC, et al. (2008) Identification of interferon-beta-stimulated genes that inhibit angiogenesis in vitro. J Interferon Cytokine Res 28: 733–740.
  27. 27. Nikiforova MN, Tseng GC, Steward D, Diorio D, Nikiforov YE (2008) MicroRNA expression profiling of thyroid tumors: biological significance and diagnostic utility. J Clin Endocrinol Metab 93: 1600–1608.
  28. 28. Nucera C, Mazzon E, Caillou B, Violi MA, Moleti M, et al. (2005) Human galectin-3 immunoexpression in thyroid follicular adenomas with cell atypia. J Endocrinol Invest 28: 106–112.
  29. 29. Fukumori T, Oka N, Takenaka Y, Nangia-Makker P, Elsamman E, et al. (2006) Galectin-3 regulates mitochondrial stability and antiapoptotic function in response to anticancer drug in prostate cancer. Cancer Res 66: 3114–3119.
  30. 30. Savin S, Cvejic D, Isic T, Paunovic I, Tatic S, et al. (2008) Thyroid peroxidase and galectin-3 immunostaining in differentiated thyroid carcinoma with clinicopathologic correlation. Hum Pathol in press.
  31. 31. Schmid KW, Farid NR (2006) How to define follicular thyroid carcinoma?. Virchows Arch 448: 385–393.
  32. 32. Wattel S, Mircescu H, Venet D, Burniat A, Franc B, et al. (2005) Gene expression in thyroid autonomous adenomas provides insight into their physiopathology. Oncogene 24: 6902–6916.
  33. 33. Carinci F, Lo ML, Piattelli A, Rubini C, Chiesa F, et al. (2005) Potential markers of tongue tumor progression selected by cDNA microarray. Int J Immunopathol Pharmacol 18: 513–524.
  34. 34. Porter S, Scott SD, Sassoon EM, Williams MR, Jones JL, et al. (2004) Dysregulated expression of adamalysin-thrombospondin genes in human breast carcinoma. Clin Cancer Res 10: 2429–2440.
  35. 35. Watanabe T, Kobunai T, Sakamoto E, Yamamoto Y, Konishi T, et al. (2009) Gene expression signature for recurrence in stage III colorectal cancers. Cancer 115: 283–292.
  36. 36. Soiland H, Soreide K, Janssen EA, Korner H, Baak JP, et al. (2007) Emerging concepts of apolipoprotein D with possible implications for breast cancer. Cell Oncol 29: 195–209.
  37. 37. Netea-Maier RT, Hunsucker SW, Hoevenaars BM, Helmke SM, Slootweg PJ, et al. (2008) Discovery and validation of protein abundance differences between follicular thyroid neoplasms. Cancer Res 68: 1572–1580.
  38. 38. Delys L, Detours V, Franc B, Thomas G, Bogdanova T, et al. (2007) Gene expression and the biological phenotype of papillary thyroid carcinomas. Oncogene 26: 7894–7903.
  39. 39. Hawthorn L, Stein L, Varma R, Wiseman S, Loree T, et al. (2004) TIMP1 and SERPIN-A overexpression and TFF3 and CRABP1 underexpression as biomarkers for papillary thyroid carcinoma. Head Neck 26: 1069–1083.
  40. 40. Huang Y, de la CA, Pellegata NS (2003) Hypermethylation, but not LOH, is associated with the low expression of MT1G and CRABP1 in papillary thyroid carcinoma. Int J Cancer 104: 735–744.
  41. 41. Eszlinger M, Krohn K, Berger K, Lauter J, Kropf S, et al. (2005) Gene expression analysis reveals evidence for increased expression of cell cycle-associated genes and Gq-protein-protein kinase C signaling in cold thyroid nodules. J Clin Endocrinol Metab 90: 1163–1170.
  42. 42. Barroeta JE, Baloch ZW, Lal P, Pasha TL, Zhang PJ, et al. (2006) Diagnostic value of differential expression of CK19, Galectin-3, HBME-1, ERK, RET, and p16 in benign and malignant follicular-derived lesions of the thyroid: an immunohistochemical tissue microarray analysis. Endocr Pathol 17: 225–234.
  43. 43. He H, Jazdzewski K, Li W, Liyanarachchi S, Nagy R, et al. (2005) The role of microRNA genes in papillary thyroid carcinoma. Proc Natl Acad Sci U S A 102: 19075–19080.
  44. 44. Jarzab B, Wiench M, Fujarewicz K, Simek K, Jarzab M, et al. (2005) Gene expression profile of papillary thyroid cancer: sources of variability and diagnostic implications. Cancer Res 65: 1587–1597.
  45. 45. Weber F, Shen L, Aldred MA, Morrison CD, Frilling A, et al. (2005) Genetic classification of benign and malignant thyroid follicular neoplasia based on a three-gene combination. J Clin Endocrinol Metab 90: 2512–2521.
  46. 46. Al-Shahrour F, Minguez P, Vaquerizas JM, Conde L, Dopazo J (2005) BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res 33: W460–W464.
  47. 47. Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol 3: 1–21.
  48. 48. Efron B, Tibshirani R (1997) Improvments of cross-validation: the.632+ bootstrap method. 92: 548–560.
  49. 49. Kubat M (1998) Decision trees can initialize radial-basis function networks. IEEE Trans Neural Netw 9: 813–821.
  50. 50. Vasko V, Ferrand M, Di CJ, Carayon P, Henry JF, et al. (2003) Specific pattern of RAS oncogene mutations in follicular thyroid tumors. J Clin Endocrinol Metab 88: 2745–2752.
  51. 51. Bourasseau I, Savagner F, Rodien P, Duquenne M, Reynier P, et al. (2000) No evidence of thyrotropin receptor and G(s alpha) gene mutation in high iodine uptake thyroid carcinoma. Thyroid 10: 761–765.
  52. 52. Kroll TG, Sarraf P, Pecciarini L, Chen CJ, Mueller E, et al. (2000) PAX8-PPARgamma1 fusion oncogene in human thyroid carcinoma. Science 289: 1357–1360.