Clustered Gene Expression Changes Flank Targeted Gene Loci in Knockout Mice

Background Gene expression profiling using microarrays is a powerful technology widely used to study regulatory networks. Profiling of mRNA levels in mutant organisms has the potential to identify genes regulated by the mutated protein. Methodology/Principle Findings Using tissues from multiple lines of knockout mice we have examined genome-wide changes in gene expression. We report that a significant proportion of changed genes were found near the targeted gene. Conclusions/Significance The apparent clustering of these genes was explained by the presence of flanking DNA from the parental ES cell. We provide recommendations for the analysis and reporting of microarray data from knockout mice


INTRODUCTION
Microarray technology has been widely assimilated by gene expression researchers and its success is largely due to its power of assaying thousands of transcripts in a genome wide manner. One of the most popular experimental designs has been to determine the gene expression changes that occur in knockout mice where it is correlated with phenotypic changes. The usual objective is to obtain insights into the role of the mutated protein by identifying particular genes under its control, and retrieve candidate genes that may explain the altered phenotype in the mutant animal.
The creation of knockout mice involves the modification of the genome of mouse embryonic stem (ES) cells through a variety of processes including homologous recombination or other mutagenic procedures [1]. The majority of gene targeting experiments have been performed in ES cells derived from the 129 strain of mice. Following germ-line transmission of the mutant allele, the founder mice are often backcrossed onto other strains, creating hybrid animals. The relative contribution of the genetic background of the ES cell and the strain to which it was bred has been widely studied [2].
We have studied the effects of knockout (KO) mutations on the mechanisms of learning and memory [3,4]. Patterns of neural activity induced by learning or artificial stimuli lead to activation of signal transduction pathways, which ultimately lead to the cellular changes underlying learning and memory. The two major cellular changes involve strength of synaptic transmission (synaptic plasticity) and gene expression (activity-dependent gene expression). It is clear from studies in the sensory-motor synapses of Aplysia californica and the hippocampus of mice and rats that neurotransmitter receptors initiate the signalling events that lead to both synaptic plasticity and activity-dependent gene expression [5,6,7,8,9,10,11]. L-glutamate is the dominant excitatory neurotransmitter in the central nervous system of vertebrates and glutamate receptors are of special relevance to synaptic plasticity. The ionotropic N-methyl-D-aspartate (NMDA) and metabotropic (mGLuR) subtypes of glutamate receptor are commonly involved in experimental plasticity-driven transcriptional changes. Both the NMDA and mGluR receptors are connected by a series of adaptor proteins [12,13] and can be physically isolated from mouse brain in ,2000-kDa multiprotein complexes called NRC or MASC (NRC, NMDA Receptor Complex; MASC, MAGUK Associated Signaling Complex) [14,15,16,17]. NRCs are organized into receptor, adaptor, signalling, cytoskeletal and novel proteins, and includes cell adhesion molecules, kinases, phosphatases, GTPase-activating proteins and Ras and MAPK pathway components. NRC/MASC form a subset of the post-synaptic density which is thought to be comprised of a set of complexes. Such a complex offers an attractive model for the organisation and coordination signalling networks [14,17,18]. It was hypothesised that the function of the NRC is to orchestrate the signalling pathways downstream of receptor activation and coordinate both synaptic plasticity and activity-dependent gene expression amongst other changes.
We were specifically interested in the previously characterized KO mice for PSD-95 and SAP102 [3,4]. Both proteins are from the MAGUK (Membrane Associated GUanylate Kinase) family, which members share a common structure of three PDZ (PSD-95/ Dlg/ZO-1), an SH3 (Src-homology-3) and guanylate kinase (GK) domains and together enable protein-protein interactions. The postsynaptic density 95 (PSD-95) protein is a homolog of the Drosophila suppressor gene, dlg [19] and acts as a scaffold to organize postsynaptic proteins into signalling complexes [4] or anchoring complexes for glutamate receptors and cytoskeletal proteins, as reviewed elsewhere [12]. Mice lacking PSD-95 or SAP102 were found to have enhanced forms of NMDA receptordependent long-term potentiation [3,4,20,21] and exhibited impairments in spatial learning (and other forms of behavioural plasticity) [22,23,24]. Our original aim was to identify those transcripts that were controlled by PSD-95 and SAP102 and therefore try to track their dependent signalling pathways. Surprisingly, and because of the unbiased nature of microarray analyses, we found that the most significantly changing genes were clustered in the neighbourhood of the KO gene. We observed that this phenomenon was frequent in KO studies and, therefore, special care should be taken in the interpretation of this kind of experiments.

RESULTS
To determine the gene expression changes in the hippocampus of mice lacking SAP102 or PSD-95, total RNA was isolated from the hippocampus of SAP102 2/Y and PSD-95 2/2 mutant mice and littermate wild-types. RNA was hybridized to MG-430 2.0 arrays (Affymetrix) that contained 45,037 probe sets excluding controls. The data was normalized and filtered (see Materials and Methods). We first examined the expression data from SAP102 mutants and found that 65% (13/20) of the significantly changed probe sets were encoded by chromosome X, which also encodes SAP102. This unexpected finding suggests that either SAP102 preferentially regulates genes on the X-chromosome or that there is some cisacting effect of the mutation.
We focused on the SAP102 regulated genes on the Xchromosome and found they were clustered around the SAP102/dlgh3 gene ( Figure 1). These genes spanned a region of ,98 Mb ( Figure 1A, Table 1). Within this cluster was the hprt (hypoxanthine guanine phosphoribosyl transferase) gene, a selection marker extensively used in ES cells to generate transgenic mice [25,26,27,28]. In our case, the ES cell clone used to generate our knockout mice was E14TG2a HM-1, which harbours an hprt allele containing a 55 kb-deletion in its 59-flanking region [29]. Using a PCR protocol specifically designed to genotype this deletion [30], we found it was correlated with the reduction of HPRT expression (data not shown). The hprt and SAP102/dlgh3 loci are separated by ,48 Mb and since the hprt locus of 129S5 mice does not contain the deletion, these data suggested that the observed changes in hprt gene expression in SAP102 mutant mice was the result of a linked mutation originated in the ES cells. Following gene targeting of SAP102/dlgh3 gene in E14TG2a ES cells (129P2 background), the chimera was crossed to 129S5 and subsequently backcrossed. This provided an important clue that genetic background of the ES cells might be contributing to the gene expression profiling.
To further explore this hypothesis, we examined the degree of linkage between both loci. Using the MGI database (www. informatics.jax.org) we estimated the percentage of independent segregation of hprt and SAP102/dlg3 to be 27% (27 cM). We tested this using a cross between SAP102/dlg3 heterozygous females and 129S5 (wild type) males, which produced 37 litters, and subsequent genotyping. Only 27% of the heterozygous females had two wild-type alleles of hprt and 29% of E17.5 male embryos showed segregation of hprt and SAP102/dlg3 mutations. These data are in keeping with the conclusion that the entire portion between both loci originated in the original E14TG2a ES cells.
These observations suggest that further backcrossing and marker selection would result in a reduction of the E14TG2a DNA. To explore this possibility we analyzed microarray data from SAP102 2/Y hippocampi that contained the wild-type allele of hprt mutation. In this experiment we used SAP102 2/Y mutant mice that had been backcrossed 3 generations onto the C57BL/6j background. In this case, only six probe sets were changed from the X chromosome (,29% of the significant probe sets; 6/21) and spanned a shorter region of ,48 Mb ( Figure 1A, Table 1). This clustering was significantly different from a random distribution along the X chromosome (P,0.05, x 2 = 26.6 d.f. = 16). Four of the five probe sets were common to the previous experiment (Table 1) and the change was similar in direction and magnitude, thus reproducing the effect of the ES cell-derived sequences and suggesting changes were independent of the host backcrossing genetic background. Since we used C57BL/6j animals for the new analysis, we could take advantage of the availability of primers that distinguished C57-and 129-derived strains genotypes. We focused on Atp7a, which expression was changed in the microarray experiment and is found ,5 Mb from SAP102/dlg3 and tested if it was of 129 or C57BL/6 origin. Using a published genotyping protocol (in which the amplicon covers exon 8 of Atp7a [31]), we found that the all samples that showed changes in Atp7a expression were of 129 origin ( Figure 2). These data further support the conclusion that the changes in gene expression in the mutant mice involving the chromosome X can be explained by genetic background effects.
Similar effects were observed for other mutations involving autosomes. We took advantage of the results obtained in another knockout for a MAGUK protein, PSD-95, which is encoded on chromosome 11. Using an identical microarray platform to that used for SAP102 mutants, we found that ,47% (8/17) of the significantly changed probes in hippocampus were encoded on chromosome 11. As shown in Figure 1B there was significant clustering of probe sets around the dlgh4 locus and spanned a region of ,5 Mb (P,0.001, x 2 = 33.7 d.f. = 11). This shorter length of the ES cell-derived region in PSD-95 mutants may be due to the larger number of backcrosses (.10 in PSD-95 onto a C57BL/6 background) compared to the SAP102 mutants (,48 Mb after 3 backcrossed on C57BL/6). These data are consistent with the view that the number of backcrosses helps to shorten the length of ES cell-derived sequences as well as marker selection (hprt in the SAP102 mutants) and should be considered in microarray profiling experiments in knock-out animals. The above data is based on studies of hippocampus transcript levels and we asked if similar results would be observed in other regions of the nervous system. Using cortex RNA from PSD-95 mutants we found that 4 of the 6 probe sets that were significant in the hippocampus were also significant in the cortex of the same animals (Table 2), thus reproducing the effects of ES-cell derived sequences in gene expression. For further details on the statistical analysis of both SAP102 and PSD-95 mutants, see Tables 3 and 4.
We also validated the microarray experiments performed in SAP102 and PSD-95 mutants by qRT-PCR. As shown in Figure 3 the changes in gene expression detected with microarrays were confirmed using qRT-PCR for SAP102, HPRT, Fundc2, and BB315069 (39 to Fundc2) in the SAP102 mutants and PSD-95, Fbxo39 and BG092359 (39 to Fbxo39) in the PSD-95 mutants.
From the above experiments on brain mRNA from SAP102 and PSD-95 mutants it seems likely that these effects of genes linked to the mutant allele might be generally important in other knockout lines and other tissues. To address this issue we analyzed recent data deposited in ArrayExpress (www.ebi.ac.uk/arrayexpress/) that employed the same format of arrays as used in our study. The purpose was not to exhaustively compile the existent data but to provide further examples. Data from four mutant genes were analyzed: Foxa3 [32], Nebulin (neb) [33], Ercc1 [34] and Fibulin-4 (efemp2) [35] (Figure 4), using RNA derived from testes, skeletal muscle, liver and aorta respectively.
In all four knockouts we found changes in gene expression in genes near to the targeted locus; however the potential boundary between ES cell and host DNA was not clear in all mutants (Table 3, Figure 4). The number of changed genes was plotted for the length of the relevant chromosome and the position of the targeted gene indicated. This allows us to examine clustering 'within chromosomes' and to compare with random distributions (see Materials and Methods). For chromosome 7 and the foxa3 locus, of the 8 genes that changed on this chromosome (including the mutated gene), all were clustered within 28 Mb of the mutation (P,0.01, x 2 = 31.1 d.f. = 14), despite 11 backcrosses onto C57BL/6 strain [32]. For ercc1 and neb mutants, the changed genes were distributed more widely on chromosome 7 and 2 respectively. Although there was a trend toward clustering with the mutant locus, this distribution was not statistically significant (Table 3): for ercc1 there were 72.7% of significant genes within ,60 Mb (,42% of the chromosome); for neb there were 53.3% within ,75 Mb (,42% of the chromosome). The inability to identify more conclusive evidence of linkage in ercc1 or neb mutants was probably due to the large number of changed genes: compared to SAP102, PSD-95 and Foxa3 where 21, 12 and 10 genes respectively were changed, 140 genes were changed in ercc1 mutants and 92 in neb mutants (Table 4). In the case of the efemp2 locus on chromosome 19, there were 4 genes changed; however clustering was not statistically significant (Table 3) perhaps because chromosome 19 is too short and only a small portion (,8% of the chromosome, which was absent of significant genes) was estimated to segregate independently and thus it is possible that the entire chromosome was derived from ES-cell background. Thus using the 'within chromosome' clustering does not reveal statistically significant results for all knockout mice and appears to be influenced by chromosome size and number of significantly changed genes. To overcome these limitations we used an alternative approach where we compared the frequency of changed genes and their contribution to deviation from random distributions on the targeted chromosome with other chromosomes (see Materials and Methods). Using this 'between chromosome' method we found that of 7 mutant examined, 6 showed evidence of clustering of changed genes to the targeted chromosome (Table 4).

DISCUSSION
Microarrays are a powerful method for detecting changes in gene expression and here we examine data from a range of different knockout mice. When comparing RNA from mutant mice and wild type controls, we observed a higher than expected frequency of changed genes near to the targeted/knocked-out gene. These genes were either clustered around the targeted allele or overrepresented on that chromosome. Recent reports found evidence of linked changes in two knockouts [36,37] (one of them published during the preparation of this manuscript). In our studies of PSD-95 and SAP102, the gene was disrupted in 129P2 ES cells and the RNA extracted from mice on a hybrid background (since a distinct mouse strain was used during breeding of the mutants). We consider the most likely explanation for the changes in gene expression of the genes clustered around the knockout locus to be that these genes were of ES cell origin (flanking DNA). In this model, the flanking DNA (from ES cells) contains polymorphisms that alter gene expression relative to the genetic background of the backcrossed mice. We found changes in gene expression in SAP102 mice (from 129P2 ES cells) backcrossed onto 129S5 and C57BL/6 strains. Although 129P2 and 129S5 might be closely related genetic backgrounds, a large variability has been reported between 129 substrains [38,39]. The notion that flanking DNA can contribute to complications in mouse behavioural studies has been recognised and breeding strategies recommended to minimise its effects [40]. We have reported here that although backcrosses shorten the length of ES cell flanking regions more than ten backcrosses was insufficient to completely eliminate them (see PSD-95 and Foxa3 examples), as recognized in [40]. Marker selection (as the example provided by hprt) can accelerate the identification of mice with shorter flanking DNA.
An alternative model, which we cannot exclude, is that the changes in gene expression around the knockout locus were because of the modifications introduced during gene targeting (e.g.  To explore a cis effect in more detail, a future approach could consist of analyzing microarray data from knock-in animals, where the insertion is made into a region lacking functional elements (i.e., coding region or a regulatory element). Microarray analysis in knock-out animals should be carefully examined to avoid misinterpretation in the gene expression profile.
Rather than assign the observed changes to the mutation, the genotype of the changed genes needs to be tested. This current study highlights the effects of flanking DNA of ES cell origin, which is selected during breeding of the mutant mice. It may therefore be necessary to genotype all changed genes before studying mutation-dependent effects. Although this may not be practical in all situations, the simplest approach would be to treat those changed genes on the targeted chromosome with extreme caution and only consider changes on other chromosomes or in those portions of the targeted chromosome estimated to segregate independently. Future array based methods could include mRNA profiling and a separate array for profiling the genotype of the Affy IDs in bold are probe sets that appeared in hippocampus and cortex; Affy IDs in italic are probe sets only appearing in cortex. The difference in probe sets within both tissues was likely due to differences in the statistical power of both analysis (9 hippocampal samples vs.      corresponding probe sets and their surrounding regions. We would also recommend that when gene expression data is reported on mutant mice, that the chromosome location of the genes encoding those mRNAs is clearly reported. Studies on mutant mice on the inbred background of the ES cells would obviate the need to perform genotyping.

Animals
For microarray studies, the ages of mice used were as follows: 6 weeks old (129S5 strain); 12-13 weeks old (C57BL/6j strain). Number of mice: five SAP102 2/Y and five littermate wild-types (129S5 strain); four SAP102 2/Y and three littermate wild-types (C57BL/6j strain); five PSD-95 2/2 and four littermate wild-types (hippocampus, C57BL/6j strain); two PSD-95 2/2 and two littermate wild-types (cortex, C57BL/6j strain). Animals were dispatched by cervical dislocation. All dissections were performed consecutively on the same day at 4uC and tissue kept moist with chilled PBS (Gibco). Collected tissue (hippocampus and cortex) was immediately frozen in liquid nitrogen or immersed into RNAlater reagent (Qiagen) on ice for RNA isolation. Animals were treated in accordance with the UK Animal Scientific Procedures Act (1986).

Nucleic acids isolation
RNeasy Mini and Midi columns (Qiagen) were used to purify the RNA from hippocampus and cortex, respectively. On-column DNase treatment was carried out to remove traces of DNA. During all the purifications, manufacturer's instructions were followed. Samples were concentrated with the addition of 1/10 vol of 3 M NaOAc pH 6 and 3 vol of 100% EtOH and precipitated at 220uC. Once resuspended in 12 ml of RNase-free H 2 O, an aliquot was used to check the RNA integrity in 1% formaldehydeagarose gels.
To isolate genomic DNA, mouse tails were cut into 4-5 pieces and treated with 500 ml of the Lysis Buffer (10 mM Tris-HCl pH = 7.5, 1 mM EDTA, 1% SDS, 10 mM NaCl) and 2.5 ml Proteinase K (20 mg/ml) during 37 uC O/N. Then, 500 ml buffered phenol was added and the mix was incubated 1-2 h with gentle rotation at 4 uC. Once taking the supernatant after a lowspeed spin (5,000 rpm 5 min), 500 ml of isoamylic-chloroformphenol was used and spin at 13,000 rpm for 5 min. DNA was immediately precipitated by adding 1 ml isopropyl alcohol. After spinning at 13,000 rpm for 5 min, the pellet was washed with 500 ml 70% ethanol and resuspend in 100 ml H 2 O.

Microarray hybridization, data acquisition and statistical analyses
For the transcriptome profiling, MG-430 2.0 arrays (Affymetrix) were used, that contain 45,101 probe sets including controls, one array per sample. Briefly, 3 mg of total RNA was reverse transcribed, labelled and hybridized using One-cycle Labelling Kit instructions (Affymetrix). Fluidics Station 450 and GCS3000, both from Affymetrix, were used for the washing and scanning steps respectively.
Signal intensities values and detection flags were extracted using GCOS (Affymetrix), using a Target value of 500 in the scaling step. Data were normalized by the median per gene and per array using GeneSpring v7 (Silicon Genetics): i) ANOVA test (p,0.05) without assuming equal variances, ii) signal intensity must be a Present or Marginal call in all wild-type samples when downregulated and knock-out samples when up-regulated, and iii) fold change .1.8 that was selected arbitrarily for the purposes of the present work. In parallel data was processed using dChip (PM/ MM difference model) [41]. Only those probe sets appearing in both analyses were considered.
The significant genes obtained in the microarray analyses were grouped in bins of 10 Mb to simplify both visualization in the figures and x 2 tests. Centromeres were not considered. We compared the distribution of the significant genes along the targeted chromosome to a random distribution. This random distribution was calculated by dividing the number of significant genes by the number of bins. When testing the distribution of significant genes between all the chromosomes, three types of random distribution were analyzed: equal proportion of significant genes in all the chromosomes, proportion normalized by the length of each chromosome and proportion normalized by the number of genes (based upon Ensembl release 45). The microarray data is deposited in the ArrayExpress database, www.ebi.ac.uk/arrayexpress, accession number E-MEXP-1159. The resulting amplicons were checked for specificity and size in agarose gels. The data generated was analysed in 7500 System SDS v1.2.2 software, that calculated the cycle threshold (C T ) and the fold change ( = 2 2DDCT ). GAPDH was used as endogenous controls. To analyze statistically the quantitative PCR results, Student's t-test was used.
For genotyping, each PCR was performed using 1 ml genomic DNA, 1 U Platinum Taq DNA polymerase (Invitrogen), 5 ml PCR buffer 106 (Invitrogen), 10 mM each dNTP, 50 mM MgCl 2 and 0.5 mM each primer in a final volume of 50 ml. For hprt alleles genotyping, the protocol described in [30] was used with few modifications. Reactions to detect wild-type and mutant alleles were set up in parallel for the same genomic DNA: 1 cycle of 94uC 5 min, 35 cycles of 94uC 30 s, 62uC 30 s for the wild-type allele or 67uC for the null allele, 72uC 60 s and 1 cycle of 72uC 5 min. Addition of 1% DMSO was required for the null allele genotyping. For atp7a allele genotyping, the protocol described in [31] was used with few modifications, in which the conditions were 1 cycle of 94uC 5 min, 35 cycles of 94uC 30 s, 55uC 30 s, 72uC 60 s and 1 cycle of 72uC 5 min.