Principles of MicroRNA–Target Recognition

MicroRNAs (miRNAs) are short non-coding RNAs that regulate gene expression in plants and animals. Although their biological importance has become clear, how they recognize and regulate target genes remains less well understood. Here, we systematically evaluate the minimal requirements for functional miRNA–target duplexes in vivo and distinguish classes of target sites with different functional properties. Target sites can be grouped into two broad categories. 5′ dominant sites have sufficient complementarity to the miRNA 5′ end to function with little or no support from pairing to the miRNA 3′ end. Indeed, sites with 3′ pairing below the random noise level are functional given a strong 5′ end. In contrast, 3′ compensatory sites have insufficient 5′ pairing and require strong 3′ pairing for function. We present examples and genome-wide statistical support to show that both classes of sites are used in biologically relevant genes. We provide evidence that an average miRNA has approximately 100 target sites, indicating that miRNAs regulate a large fraction of protein-coding genes and that miRNA 3′ ends are key determinants of target specificity within miRNA families.


Introduction
MicroRNAs (miRNAs) are small non-coding RNAs that serve as post-transcriptional regulators of gene expression in plants and animals. They act by binding to complementary sites on target mRNAs to induce cleavage or repression of productive translation (reviewed in [1,2,3,4]). The importance of miRNAs for development is highlighted by the fact that they comprise approximately 1% of genes in animals, and are often highly conserved across a wide range of species (e.g., [5,6,7]). Further, mutations in proteins required for miRNA function or biogenesis impair animal development [8,9,10,11,12,13,14,15].
To date, functions have been assigned to only a few of the hundreds of animal miRNA genes. Mutant phenotypes in nematodes and flies led to the discovery that the lin-4 and let-7 miRNAs control developmental timing [16,17], that lsy-6 miRNA regulates left-right asymmetry in the nervous system [18], that bantam miRNA controls tissue growth [19], and that bantam and miR-14 control apoptosis [19,20]. Mouse miR-181 is preferentially expressed in bone marrow and was shown to be involved in hematopoietic differentiation [21]. Recently, mouse miR-375 was found to be a pancreatic-islet-specific miRNA that regulates insulin secretion [22].
Prediction of miRNA targets provides an alternative approach to assign biological functions. This has been very effective in plants, where miRNA and target mRNA are often nearly perfectly complementary [23,24,25]. In animals, functional duplexes can be more variable in structure: they contain only short complementary sequence stretches, interrupted by gaps and mismatches. To date, specific rules for functional miRNA-target pairing that capture all known functional targets have not been devised. This has created problems for search strategies, which apply different assumptions about how to best identify functional sites. As a result, the number of predicted targets varies considerably with only limited overlap in the top-ranking targets, indicating that these approaches might only capture subsets of real targets and/or may include a high number of background matches ([19,26,27,28,29,30]; reviewed by [31]). Nonetheless, a number of predicted targets have proven to be functional when subjected to experimental tests [19,26,27,29].
A better understanding of the pairing requirements between miRNA and target would clearly improve predictions of miRNA targets in animals. It is known that defined cis-regulatory elements in Drosophila 3′ UTRs are complementary to the 5′ ends of certain miRNAs [32]. The importance of the miRNA 5′ end has also emerged from the pairing characteristics and evolutionary conservation of known target sites [26], and from the observation of a nonrandom statistical signal specific to the 5′ end in genome-wide target predictions [27]. Tissue culture experiments have also underscored the importance of 5′ pairing and have provided some specific insights into the general structural requirements [29,33,34], though different studies have conflicted to some degree with each other, and with known target sites (reviewed in [31]). To date, no specific role has been ascribed to the 3′ end of miRNAs, despite the fact that miRNAs tend to be conserved over their full length.
Here, we systematically evaluate the minimal requirements for a functional miRNA-target duplex in vivo. These experiments have allowed us to identify two broad categories of miRNA target sites. Targets in the first category, "5′ dominant" sites, base-pair well to the 5′ end of the miRNA. Although there is a continuum of 3′ pairing quality within this class, it is useful to distinguish two subtypes: "canonical" sites, which pair well at both the 5′ and 3′ ends, and "seed" sites, which require little or no 3′ pairing support. Targets in the second category, "3′ compensatory" sites, have weak 5′ base-pairing and depend on strong compensatory pairing to the 3′ end of the miRNA. We present evidence that all of these site types are used to mediate regulation by miRNAs and show that the 3′ compensatory class of target sites is used to discriminate among individual members of miRNA families in vivo. A genome-wide statistical analysis allows us to estimate that an average miRNA has approximately 100 evolutionarily conserved target sites, indicating that miRNAs regulate a large fraction of protein-coding genes. Evaluation of 3′ pairing quality suggests that seed sites are the largest group. Sites of this type have been largely overlooked in previous target prediction methods.

The Minimal miRNA Target Site
To improve our understanding of the minimal requirements for a functional miRNA target site, we made use of a simple in vivo assay in the Drosophila wing imaginal disc. We expressed a miRNA in a stripe of cells in the central region of the disc and assessed its ability to repress the expression of a ubiquitously transcribed enhanced green fluorescent protein (EGFP) transgene containing a single target site in its 3′ UTR. The degree of repression was evaluated by comparing EGFP levels in miRNA-expressing and adjacent non-expressing cells. Expression of the miRNA strongly reduced EGFP expression from transgenes containing a single functional target site (Figure 1A).
In a first series of experiments we asked which part of the RNA duplex is most important for target regulation. A set of transgenic flies was prepared, each of which contained a different target site for miR-7 in the 3′ UTR of the EGFP reporter construct. The starting site resembled the strongest bantam miRNA site in its biological target hid [19] and conferred strong regulation when present in a single copy in the 3′ UTR of the reporter gene (Figure 1B). We tested the effects of introducing single nucleotide changes in the target site to produce mismatches at different positions in the duplex with the miRNA (note that the target site mismatches were the only variable in these experiments). The efficient repression mediated by the starting site was not affected by a mismatch at positions 1, 9, or 10, but any mismatch in positions 2 to 8 strongly reduced the magnitude of target regulation. Two simultaneous mismatches introduced into the 3′ region had only a small effect on target repression, increasing reporter activity from 10% to 30%. To exclude the possibility that these findings were specific for the tested miRNA sequence or duplex structure, we repeated the experiment with miR-278 and a different duplex structure. The results were similar, except that pairing of position 8 was not important for regulation in this case (Figure 1C). Moreover, some of the mismatches in positions 2-7 still allowed repression of EGFP expression up to 50%. Taken together, these observations support previous suggestions that extensive base-pairing to the 5′ end of the miRNA is important for target site function [26,27,29,32,34].
We next determined the minimal 5′ sequence complementarity necessary to confer target regulation. We refer to the core of 5′ sequence complementarity essential for target site recognition as the "seed" (Lewis et al. [27]). All possible 6mer, 5mer, and 4mer seeds complementary to the first eight nucleotides of the miRNA were tested in the context of a site that allowed strong base-pairing to the 3′ end of the miRNA (Figure 2A). The seed was separated from a region of complete 3′ end pairing by a constant central bulge. 5mer and 6mer seeds beginning at positions 1 or 2 were functional. Surprisingly, as few as four base-pairs in positions 2-5 conferred efficient target regulation under these conditions, whereas bases 1-4 were completely ineffective. 4mer, 5mer, or 6mer seeds beginning at position 3 were less effective. These results suggest that a functional seed requires a continuous helix of at least 4 or 5 nucleotides and that there is some position dependence to the pairing, since sites that produce comparable pairing energies differ in their ability to function. For example, the first two duplexes in Figure 2A (4mer, top row) have identical 5′ pairing energies (ΔG for the first 8 nt was −8.9 kcal/mol), but only one is functional. Similarly, the third 4mer duplex and fourth 5mer duplex (middle row) have the same energy (−8.7 kcal/mol), but only one is functional. We thus do not find a clear correlation between 5′ pairing energy and function, in contrast to the correlation reported in [34]. These experiments also indicate that extensive 3′ pairing of up to 17 nucleotides in the absence of the minimal 5′ element is not sufficient to confer regulation.
Consequently, target searches based primarily on optimizing the extent of base-pairing or the total free energy of duplex formation will include many nonfunctional target sites [28,30,35], and ranking miRNA target sites according to overall complementarity or free energy of duplex formation might not reflect their biological activity [26,27,28,30,35].
To determine the minimal lengths of 5′ seed matches that are sufficient to confer regulation alone, we tested single sites that pair with eight, seven, or six consecutive bases to the miRNA's 5′ end, but that do not pair to its 3′ end (Figure 2B). Surprisingly, a single 8mer seed (miRNA positions 1-8) was sufficient to confer strong regulation by the miRNA. A single 7mer seed (positions 2-8) was also functional, although less effective. The magnitude of regulation for 8mer and 7mer seeds was strongly increased when two copies of the site were introduced in the UTR. In contrast, 6mer seeds showed no regulation, even when present in two copies. Comparable results were recently reported for two copies of an 8mer site with limited 3′ pairing capacity in a cell-based assay [34]. These results do not support a requirement for a central bulge, as suggested previously [29].
We took care in designing the miRNA 3′ ends to exclude any 3′ pairing to nearby sequence according to RNA secondary structure prediction. However, we cannot rule out the possibility that extensive looping of the UTR sequence might allow the 3′ end to pair to sequences further downstream in our reporter constructs. Note, however, that even if remote 3′ pairing were occurring and required for function of 8- and 7mer seeds, it is not sufficient for 5′ matches with fewer than seven complementary bases (all test sites are in the same sequence context; Figure 2B). In addition, pairing at a random level will occur in any sequence if long enough loops are allowed. However, whether the ribonucleoprotein complexes involved in translational repression require 3′ pairing, and whether they are able to allow extensive looping to achieve this, remains an open question. Computationally, remote 3′ pairing cannot be distinguished from random matches if loops of any length are allowed. On this basis any site with a 7- or 8mer seed has to be taken seriously, especially when evolutionarily conserved.
From these experiments we conclude that (1) complementarity of seven or more bases to the 5′ end of the miRNA is sufficient to confer regulation, even if the target 3′ UTR contains only a single site; (2) sites with weaker 5′ complementarity require compensatory pairing to the 3′ end of the miRNA in order to confer regulation; and (3) extensive pairing to the 3′ end of the miRNA is not sufficient to confer regulation on its own without a minimal element of 5′ complementarity.
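The seed definitions used in these experiments can be illustrated with a short Python sketch that looks for perfect Watson-Crick seed matches in a UTR sequence. This is only an illustration under stated assumptions, not the search procedure used in this study: the function names are hypothetical, sequences are assumed to be RNA strings written 5′ to 3′, G:U pairs are not counted as matches, and the 6mer window (miRNA positions 2-7) is an assumption, since only the 8mer (positions 1-8) and 7mer (positions 2-8) windows are stated explicitly.

```python
# Watson-Crick complements only; G:U wobble pairs are deliberately excluded.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Seed classes as (label, first miRNA position, length); positions are 1-based.
# The 6mer window (positions 2-7) is an assumption made for this sketch.
SEED_CLASSES = (("8mer", 1, 8), ("7mer", 2, 7), ("6mer", 2, 6))


def seed_site(mirna, start, length):
    """Return the target sequence (5'->3') that pairs perfectly with the
    miRNA positions start..start+length-1 (1-based, miRNA written 5'->3')."""
    window = mirna[start - 1:start - 1 + length]
    return "".join(COMPLEMENT[base] for base in reversed(window))


def find_seed_matches(mirna, utr, start, length):
    """Return the 0-based UTR offsets of every perfect seed match,
    including overlapping ones."""
    site = seed_site(mirna, start, length)
    hits, pos = [], utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


def best_seed_class(mirna, utr):
    """Return the longest seed class with at least one match, or None."""
    for label, start, length in SEED_CLASSES:
        if find_seed_matches(mirna, utr, start, length):
            return label
    return None
```

For a miRNA whose first eight nucleotides are UGGAAGAC, `seed_site(mirna, 1, 8)` returns `GUCUUCCA`, the reverse complement of those eight positions. The sketch says nothing about 3′ pairing, which has to be evaluated separately.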

The Effect of G:U Base-Pairs and Bulges in the Seed
Several confirmed miRNA target genes contain predicted binding sites with seeds that are interrupted by G:U base-pairs or single nucleotide bulges [17,19,26,36,37,38,39]. In most cases these mRNAs contain multiple predicted target sites and the contributions of individual sites have not been tested. In vitro tests have shown that sites containing G:U base-pairs can function [29,34], but that G:U base-pairs contribute less to target site function than would be expected from their contribution to the predicted base-pairing energy [34]. We tested the ability of single sites with seeds containing G:U base-pairs and bulges to function in vivo. One, two, or three G:U base-pairs were introduced into single target sites with 8mer, 7mer, or 6mer seeds (Figure 3A). A single G:U base-pair caused a clear reduction in the efficiency of regulation by an 8mer seed site and by a 7mer seed site. The site with a 6mer seed lost its activity almost completely. Having more than one G:U base-pair compromised the activity of all the sites. As the target sites were designed to allow optimal 3′ pairing, we conclude that G:U base-pairs in the seed region are always detrimental.
Single nucleotide bulges in the seed are found in the let-7 target lin-41 and in the lin-4 target lin-14 [17,36,37]. Recent tissue culture experiments have led to the proposal that such bulges are tolerated if positioned symmetrically in the seed region [29]. We tested a series of sites with single nucleotide bulges in the target or the miRNA (Figure 3B). Only some of these sites conferred good regulation of the reporter gene. Our results do not support the idea that such sites depend on a symmetrical arrangement of base-pairs flanking the bulge. We also note that the identity of the bulged nucleotide seems to matter. While it is clear that some target sites with one nucleotide bulge or a single mismatch can be functional if supported by extensive complementarity to the miRNA 3′ end, it is not possible to generalize about their potential function.

Functional Categories of Target Sites
While recognizing that there is a continuum of base-pairing quality between miRNAs and target sites, the experiments presented above suggest that sites that depend critically on pairing to the miRNA 5′ end (5′ dominant sites) can be distinguished from those that cannot function without strong pairing to the miRNA 3′ end (3′ compensatory sites). The 3′ compensatory group includes seed matches of four to six base-pairs and seeds of seven or eight bases that contain G:U base-pairs, single nucleotide bulges, or mismatches.
We consider it useful to distinguish two subgroups of 5′ dominant sites: those with good pairing to both 5′ and 3′ ends of the miRNA (canonical sites) and those with good 5′ pairing but with little or no 3′ pairing (seed sites). We consider seed sites to be those where there is no evidence for pairing of the miRNA 3′ end to nearby sequences that is better than would be expected at random. We cannot exclude the possibility that some sites that we identify as seed sites might be supported by additional long-range 3′ pairing. Computationally, this is always possible if long enough loops in the UTR sequence are allowed. Whether long loops are functional in vivo remains to be determined.
Canonical sites have strong seed matches supported by strong base-pairing to the 3′ end of the miRNA. Canonical sites can thus be seen as an extension of the seed type (with enhanced 3′ pairing in addition to a sufficient 5′ seed) or as an extension of the 3′ compensatory type (with improved 5′ seed quality in addition to sufficient 3′ pairing). Individually, canonical sites are likely to be more effective than other site types because of their higher pairing energy, and may function in one copy. Due to their lower pairing energies, seed sites are expected to be more effective when present in more than one copy. Figure 4 presents examples of the different site types in biologically relevant miRNA targets and illustrates their evolutionary conservation in multiple drosophilid genomes.
Most currently identified miRNA target sites are canonical. For example, the hairy 3′ UTR contains a single site for miR-7, with a 9mer seed and a stretch of 3′ complementarity. This site has been shown to be functional in vivo [26], and it is strikingly conserved in the seed match and in the extent of complementarity to the 3′ end of miR-7 in all six orthologous 3′ UTRs.
Although seed sites have not been previously identified as functional miRNA target sites, there is some evidence that they exist in vivo. For example, the Bearded (Brd) 3′ UTR contains three sequence elements, known as Brd boxes, that are complementary to the 5′ region of miR-4 and miR-79 [32,40]. Brd boxes have been shown to repress expression of a reporter gene in vivo, presumably via miRNAs, as expression of a Brd 3′ UTR reporter is elevated in dicer-1 mutant cells, which are unable to produce any miRNAs [14]. All three Brd box target sites consist of 7mer seeds with little or no base-pairing to the 3′ end of either miR-4 or miR-79 (see below). The alignment of Brd 3′ UTRs shows that there is little conservation in the miR-4 or miR-79 target sites outside the seed sequence, nor is there conservation of pairing to either miRNA 3′ end. This suggests that the sequences that could pair to the 3′ end of the miRNAs are not important for regulation as they do not appear to be under selective pressure. This makes it unlikely that a yet unidentified Brd box miRNA could form a canonical site complex.
The 3′ UTR of the HOX gene Sex combs reduced (Scr) provides a good example of a 3′ compensatory site. Scr contains a single site for miR-10 with a 5mer seed and a continuous 11 base-pair complementarity to the miRNA 3′ end [28]. The miR-10 transcript is encoded within the same HOX cluster downstream of Scr, a situation that resembles the relationship between miR-iab-5p and Ultrabithorax in flies [26] and miR-196/HoxB8 in mice [41]. The predicted pairing between miR-10 and Scr is perfectly conserved in all six drosophilid genomes, with the only sequence differences occurring in the unpaired loop region. The site is also conserved in the 3′ UTR of the Scr genes in the mosquito, Anopheles gambiae, the flour beetle, Tribolium castaneum, and the silk moth, Bombyx mori. Conservation of such a high degree of 3′ complementarity over hundreds of millions of years of evolution suggests that this is likely to be a functional miR-10 target site. Extensive 5′ and 3′ sequence conservation is also seen for other 3′ compensatory sites, e.g., the two let-7 sites in lin-41 or the miR-2 sites in grim and sickle [17,26,36].

The miRNA 3′ End Determines Target Specificity within miRNA Families
Several families of miRNAs have been identified whose members have common 5′ sequences but differ in their 3′ ends. In view of the evidence that 5′ ends of miRNA are functionally important [26,27,29,42], and in some cases sufficient (present study), it can be expected that members of miRNA families may have redundant or partially redundant functions. According to our model, 5′ dominant canonical and seed sites should respond to all members of a given miRNA family, whereas 3′ compensatory sites should differ in their sensitivity to different miRNA family members depending on the degree of 3′ complementarity. We tested this using the wing disc assay with 3′ UTR reporter transgenes and overexpression constructs for various miRNA family members.
miR-4 and miR-79 share a common 5′ sequence that is complementary to a single 8mer seed site in the bagpipe 3′ UTR (Figure 5A and 5B). The 3′ ends of the miRNAs differ. miR-4 is predicted to have 3′ pairing at approximately 50% of the maximally possible level (−10.8 kcal/mol), whereas the level of 3′ pairing for miR-79 is approximately 25% maximum (−6.1 kcal/mol), which is below the average level expected for random matches (see below). Both miRNAs repressed expression of the bagpipe 3′ UTR reporter, regardless of the 3′ complementarity (Figure 5B). This indicates that both types of site are functional in vivo and suggests that bagpipe is a target for both miRNAs in this family.
To test whether miRNA family members can also have nonoverlapping targets, we used 3′ UTR reporters of the proapoptotic genes grim and sickle, two recently identified miRNA targets [26]. Both genes contain K boxes in their 3′ UTRs that are complementary to the 5′ ends of the miR-2, miR-6, and miR-11 miRNA family [26,32]. These miRNAs share residues 2-8 but differ considerably in their 3′ regions (Figure 5A). The site in the grim 3′ UTR is predicted to form a 6mer seed match with all three miRNAs (Figure 5C, left), but only miR-2 shows the extensive 3′ complementarity that we predict would be needed for a 3′ compensatory site with a 6mer seed to function (−19.1 kcal/mol, 63% maximum 3′ pairing, versus −10.9 kcal/mol, 46% maximum, for miR-11 and −8.7 kcal/mol, 37% maximum, for miR-6). Indeed, only miR-2 was able to regulate the grim 3′ UTR reporter, whereas miR-6 and miR-11 were non-functional.
The sickle 3′ UTR contains two K boxes and provides an opportunity to test whether weak sites can function synergistically. The first site is similar to the grim 3′ UTR in that it contains a 6mer seed for all three miRNAs but extensive 3′ complementarity only to miR-2. The second site contains a 7mer seed for miR-2 and miR-6 but only a 6mer seed for miR-11 (Figure 5C, right). miR-2 strongly downregulated the sickle reporter, miR-6 had moderate activity (presumably via the 7mer seed site), and miR-11 had nearly no activity, even though the miRNAs were overexpressed. The fact that a site is targeted by at least one miRNA argues that it is accessible (e.g., miR-2 is able to regulate both UTR reporters), and that the absence of regulation for other family members is due to the duplex structure. These results are in line with what we would expect based on the predicted functionality of the individual sites, and indicate that our model of target site functionality can be extended to UTRs with multiple sites. Weak sites that do not function alone also do not function when they are combined.
To show that endogenous miRNA levels regulate all three 3′ UTR reporters, we compared EGFP expression in wild-type cells and dicer-1 mutant cells, which are unable to produce miRNAs [14]. dicer-1 clones did not affect a control reporter lacking miRNA binding sites, but showed elevated expression of a reporter containing the 3′ UTR of the previously identified bantam miRNA target hid (Figure 5D). Similarly, all 3′ UTR reporters above were upregulated in dicer-1 mutant cells, indicating that bagpipe, sickle, and grim are subject to repression by miRNAs expressed in the wing disc. Taken together, these experiments indicate that transcripts with 5′ dominant canonical and seed sites are likely to be regulated by all members of a miRNA family. However, transcripts with 3′ compensatory sites can discriminate between miRNA family members.
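The site categories and the family-member experiments described above can be summarized as a small decision rule. The Python sketch below is a simplification, not a validated predictor: the function name is hypothetical, and the two cutoffs (`canonical_min`, `compensatory_min`) are illustrative defaults chosen only so that the bagpipe and grim examples fall on the expected side. As noted in the text, 3′ pairing quality is a continuum, and sites with bulges or mismatches cannot be reliably generalized.

```python
def classify_site(seed_len, seed_perfect, frac_3p,
                  canonical_min=0.5, compensatory_min=0.6):
    """Assign a target site to one of the categories discussed in the text.

    seed_len      -- number of paired bases in the 5' seed
    seed_perfect  -- True if the seed has no G:U pairs, bulges, or mismatches
    frac_3p       -- 3' pairing as a fraction (0..1) of the maximal 3' pairing energy
    canonical_min, compensatory_min -- illustrative cutoffs (assumptions)
    """
    # 5' dominant sites: an uninterrupted seed of seven or more bases.
    if seed_perfect and seed_len >= 7:
        return "canonical" if frac_3p >= canonical_min else "seed"
    # Weaker or interrupted seeds need strong compensatory 3' pairing.
    if seed_len >= 4 and frac_3p >= compensatory_min:
        return "3' compensatory"
    return "not predicted to function"
```

With these defaults, the 8mer bagpipe site is classified as canonical for miR-4 (about 50% of maximal 3′ pairing) and as a seed site for miR-79 (about 25%), while the 6mer grim site is 3′ compensatory for miR-2 (63%) and not predicted to function for miR-11 (46%), in line with the reporter results.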

Genome-Wide Occurrence of Target Sites
Experimental tests such as those presented above and the observed evolutionary conservation suggest that all three types of target sites are likely to be used in vivo. To gain additional evidence we examined the occurrence of each site type in all Drosophila melanogaster 3′ UTRs. We made use of the D. pseudoobscura genome, the second assembled drosophilid genome, to determine the degree of site conservation for the three different site classes in an alignment of orthologous 3′ UTRs. From the 78 known Drosophila miRNAs, we selected a set of 49 miRNAs with non-redundant 5′ sequences. We first investigated whether sequences complementary to the miRNA 5′ ends were better conserved than would be expected for random sequences. For each miRNA, we constructed a cohort of ten randomly shuffled variants. To avoid a bias for the number of possible target matches, the shuffled variants were required to produce a number of sequence matches comparable (±15%) to the original miRNAs for D. melanogaster 3′ UTRs. 7mer and 8mer seeds complementary to real miRNA 5′ ends were significantly better conserved than those complementary to the shuffled variants. This is consistent with the findings of Lewis et al. [27] but was obtained without the need to use a rank and energy cutoff applied to the full-length miRNA target duplex, as was the case for vertebrate miRNAs. Conserved 8mer seeds for real miRNAs occur on average 2.8 times as often as seeds complementary to the shuffled miRNAs (Figure 6A). For 7mer seeds this signal was 2:1, whereas 6mer, 5mer, and 4mer seeds did not show better conservation than expected for random sequences. To assess the validity of these signals and to control for the random shuffling of miRNAs, we repeated this procedure with "mutant" miRNAs in which two residues in the 5′ region were changed. There was no difference between the mutant test miRNAs and their shuffled variants (Figure 6A).
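The shuffled-control procedure described above can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis: all function names are hypothetical, `count_matches` stands for a caller-supplied function that returns the number of seed matches a sequence produces in the D. melanogaster 3′ UTR set, and the signal is assumed to be the number of conserved seed matches for the real miRNA divided by the mean for its shuffled cohort.

```python
import random


def within_tolerance(real_count, control_count, tolerance=0.15):
    """True if the control's match count is within +/- tolerance of the real one."""
    return abs(control_count - real_count) <= tolerance * real_count


def make_shuffled_controls(mirna, count_matches, n=10, tolerance=0.15,
                           rng=None, max_tries=10000):
    """Build up to n shuffled variants of mirna whose match counts are
    comparable to the real miRNA. count_matches is a caller-supplied
    (hypothetical) function mapping a sequence to its number of matches."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    real_count = count_matches(mirna)
    controls = []
    for _ in range(max_tries):
        if len(controls) == n:
            break
        bases = list(mirna)
        rng.shuffle(bases)
        candidate = "".join(bases)
        # Skip the unshuffled sequence and duplicates.
        if candidate == mirna or candidate in controls:
            continue
        if within_tolerance(real_count, count_matches(candidate), tolerance):
            controls.append(candidate)
    return controls


def conservation_signal(real_conserved, control_conserved):
    """Conserved matches for the real miRNA relative to the control mean.
    Assumes the controls have at least one conserved match in total."""
    mean_control = sum(control_conserved) / len(control_conserved)
    return real_conserved / mean_control
```

A signal of 1 means the real miRNA is no better conserved than its shuffled variants. Shuffling the whole sequence preserves base composition but not dinucleotide content, which is a design choice of this sketch.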
This indicates that a substantial fraction of the conserved 7mer and 8mer seeds complementary to real miRNAs identify biologically relevant target sites. 3′ compensatory and canonical sites depend on substantial pairing to the miRNA 3′ end. For these sites, we expect UTR sequences adjacent to miRNA 5′ seed matches to pair better to the miRNA 3′ end than to random sequences. However, unlike 5′ complementarity, 3′ base-pairing preference was not detected in previous studies looking at sequence complementarity and nucleotide conservation because UTR sequences complementary to the miRNA 3′ end were not better conserved than would be expected at random [27].
On this basis, we decided to treat the 5′ and 3′ ends of the miRNA separately. For the 5′ end, seed matches were required to be fully conserved in an alignment of orthologous D. melanogaster and D. pseudoobscura 3′ UTRs (we expected one-half to two-thirds of these matches to be real miRNA sites). We first investigated the overall conservation of UTR sequences adjacent to the conserved seed matches and found that overall the sequences are not better conserved than a random control with shuffled miRNAs (Figure 6B). For both real and random matches, the number of sites increases with the degree of 3′ conservation (up to the 80% level), reflecting the increased probability that sequences adjacent to conserved seed matches will also lie in blocks of conserved sequence (Figure 6B). For real 7mers and 8mers we found a slightly higher percentage of sites between 30% and 80% identity than we did for the shuffled controls. In contrast, the ratio of sites with over 80% sequence identity was smaller for real 7- or 8mers than for random ones, meaning that in highly conserved 3′ UTR blocks (>80% identity) the ratio of random matches exceeds that of real miRNA target sites. This caused us to question whether the degree of conservation for sequences adjacent to seed matches correlates with miRNA 3′ pairing as would be expected if the conservation were due to a biologically relevant miRNA target site. Indeed, we found that the best conserved sites adjacent to seed matches (i.e., those with zero, one, or two mismatches in the 3′ UTR alignment) and the least conserved sites (i.e., those with only three, two, or one matching nucleotides) are not distinguishable in that both pair only randomly to the corresponding miRNA 3′ end (approximately 35% maximal 3′ pairing energy, data not shown).
The observation that miRNA target sites do not seem to be fully conserved over their entire length is consistent with the examples shown in Figure 4 in which only the degree of 3′ pairing but not the nucleotide identity is conserved (miR-7/hairy), or at least the unpaired bulge is apparently not under evolutionary pressure (miR-10/Scr). Although this result obviously depends on the evolutionary distance of the species under consideration (see [43] for a comparison of mammalian sites), it shows that conclusions about the contribution of miRNA 3′ pairing to target site function cannot be drawn solely from the degree of sequence conservation.
We therefore chose to evaluate the quality of 3′ pairing by the stability of the predicted RNA-RNA duplex. We assessed predicted pairing energy between the miRNA 3′ end and the adjacent UTR sequence for both Drosophila species and used the lower score. Use of the lower score measures conservation of the overall degree of pairing without requiring sequence identity. Figure 6C shows the distribution of the 3′ pairing energies for all conserved 3′ compensatory miR-7 sites identified by a 6mer seed match, compared to the distribution of 50 miR-7 sequences shuffled only in the 3′ part, leaving the 5′ unchanged. This means that real and shuffled miRNAs identify the same 5′ seed matches in the 3′ UTRs, which allows us to compare the 3′ pairing characteristics of the adjacent sequences. We also required 3′ shuffled sequences to have similar pairing energies (±15%) to their complementary sequences and to 10,000 randomly selected sites to exclude generally altered pairing characteristics. The distributions for real and shuffled miRNAs were highly similar, with a mean of approximately 35% of maximal 3′ pairing energy and few sites above 55%. However, a small number of sites paired exceptionally well to miR-7 at energies that were far above the shuffled averages and not reached by any of the 50 shuffled controls. This example illustrates that there is a significant difference between real and shuffled miRNAs for the sites with the highest 3′ complementarity, which are likely to be biologically relevant. Sites with weaker 3′ pairing might also be functional, but cannot be distinguished from random matches and can only be validated by experiments (see Figure 5). To provide a global analysis of 3′ pairing comprising all miRNAs and to investigate how many miRNAs show significantly non-random 3′ pairing, we considered only the sites within the highest 1% of 3′ pairing energies.
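The use of the lower of the two species scores can be made concrete with a short Python sketch. It assumes that duplex energies have already been computed with an RNA folding tool and are given in kcal/mol (negative values, more negative meaning more stable), and that the maximal 3′ pairing energy is the energy of the miRNA 3′ end paired to its perfect complement. The function names are hypothetical, and "lower score" is interpreted here as the weaker of the two pairings.

```python
def pairing_fraction(energy, max_energy):
    """Express a duplex energy as a fraction of the maximal 3' pairing energy.
    Both energies are in kcal/mol; max_energy must be negative."""
    if max_energy >= 0:
        raise ValueError("max_energy must be negative")
    # Unfavorable (non-negative) energies count as no pairing.
    return max(0.0, energy / max_energy)


def conserved_pairing(energy_mel, energy_pse, max_energy):
    """Score a site by the weaker of its D. melanogaster and D. pseudoobscura
    pairings, so that only pairing retained in both species scores well."""
    return min(pairing_fraction(energy_mel, max_energy),
               pairing_fraction(energy_pse, max_energy))
```

For example, a site pairing at −20 kcal/mol in one species and −15 kcal/mol in the other, with a maximum of −20 kcal/mol, scores 0.75. Because only energies are compared, the two orthologous sequences may differ in nucleotide identity and still score well.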
The average of the highest 1% of 3′ pairing energies of each of 58 3′ non-redundant miRNAs was divided by that of its 50 3′ shuffled controls. This ratio is one if the averages are the same, and increases if the real miRNA has better 3′ pairing than the shuffled miRNAs. To test whether a signal was specific for real miRNAs, we repeated the same protocol with a mutant version of each miRNA. The altered 5′ sequence in the mutant miRNA selects different seed matches than the real miRNA and permits a comparison of sequences that have not been under selection for complementarity to miRNA 3′ ends with those that may have been. Figure 6D shows the distribution of the energy ratios for canonical (left) and 3′ compensatory sites (right) for all 58 real and mutated 3′ non-redundant miRNAs. Most real miRNAs had ratios close to one, comparable to the mutants. But several had ratios well above those observed for mutant miRNAs, indicating significant conserved 3′ pairing.
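The energy ratio described above can be sketched as follows. The sketch takes 3′ pairing values expressed as fractions of the maximal pairing energy (higher means stronger pairing), one list for the real miRNA and one list per 3′ shuffled control. Averaging the per-control top-1% means is one possible reading of the procedure and is an assumption, as are the function names.

```python
def top_fraction_mean(values, fraction=0.01):
    """Mean of the highest `fraction` of values (at least one value)."""
    k = max(1, int(len(values) * fraction))
    top = sorted(values, reverse=True)[:k]
    return sum(top) / k


def energy_ratio(real_values, shuffled_value_sets, fraction=0.01):
    """Top-fraction mean for the real miRNA divided by the average
    top-fraction mean across its shuffled controls."""
    real_mean = top_fraction_mean(real_values, fraction)
    control_means = [top_fraction_mean(v, fraction) for v in shuffled_value_sets]
    return real_mean / (sum(control_means) / len(control_means))
```

A ratio near one indicates that the best sites of the real miRNA pair no better than the best sites of its shuffled controls.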
A small fraction of sites show exceptionally good 3′ pairing. If we use 3′ pairing energy cutoffs to examine site quality for all miRNAs, we expect sites of this type to be distinguishable from random matches. The ratio of the number of sites above the cutoff for real versus 3′ shuffled miRNAs was plotted as a function of the 3′ pairing cutoff (Figure 6E). For low cutoffs the ratio is one, as the number of sites corresponds to the number of seed matches (which is identical for real and 3′ shuffled miRNAs). For increasing cutoffs, the ratios increase once a certain threshold is reached, reflecting overrepresentation of sites that pair favorably to the real miRNA 3′ end but not the 3′ shuffled miRNAs. The maximal ratio obtained for mutated miRNAs never exceeded five, which we used as the threshold level to define where significant overrepresentation begins. For 8mer seed sites overrepresentation began at 55% maximal 3′ pairing; for 7mer seed sites, at 65%; for 6mer seed sites, at 68%; and for 5mer seed sites, at 78%. There was no statistical evidence for sites with 4mer seeds.
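The cutoff scan described above can be illustrated with a short Python sketch. It assumes 3′ pairing values given as fractions of the maximal pairing energy and uses the ratio threshold of five stated in the text; the function names and the handling of cutoffs at which no control site remains are assumptions of this sketch.

```python
def sites_above(values, cutoff):
    """Number of sites whose 3' pairing fraction is at or above the cutoff."""
    return sum(1 for v in values if v >= cutoff)


def overrepresentation(real_values, shuffled_value_sets, cutoff):
    """Ratio of real sites above the cutoff to the mean number of such sites
    across the shuffled controls. Returns None if no control site remains."""
    real_n = sites_above(real_values, cutoff)
    control_n = (sum(sites_above(v, cutoff) for v in shuffled_value_sets)
                 / len(shuffled_value_sets))
    if control_n == 0:
        return None
    return real_n / control_n


def first_significant_cutoff(real_values, shuffled_value_sets, cutoffs,
                             threshold=5.0):
    """Lowest cutoff at which the ratio exceeds the threshold, or None."""
    for cutoff in sorted(cutoffs):
        ratio = overrepresentation(real_values, shuffled_value_sets, cutoff)
        if ratio is not None and ratio > threshold:
            return cutoff
    return None
```

At a cutoff of zero every seed match is counted, so the ratio is one whenever real and 3′ shuffled miRNAs share the same seed matches, as stated in the text.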
We also tested whether sequences forming 7mer or 8mer seeds containing G:U base-pairs, mismatches, or bulges were better conserved if complementary to real miRNAs. We did not find any statistical evidence for these seed types. Analysis of 3′ pairing also failed to show any non-random signal for these sites. This suggests that such sites are few in number genome-wide and are not readily distinguished from random matches. Nonetheless, our experiments do show that sites of this type can function in vivo. The let-7 sites in lin-41 provide a natural example.

Most Sites Lack Substantial 3′ Pairing
The experimental and computational results presented above provide information about 5′ and 3′ pairing that allows us to estimate the number of target sites of each type in Drosophila. The number of 3′ compensatory sites cannot be estimated on the basis of 5′ pairing, because seed matches of four, five, or six bases cannot be distinguished from random matches: randomly conserved, non-functional matches predominate (Figure 6A). Significant 3′ pairing can be distinguished from random matches for 6mer sites above 68% maximal 3′ pairing energy, and above 78% for 5mers (Figure 6E). Using these pairing levels gives an estimate of one 3′ compensatory site on average per miRNA. The experiments in Figure 5 provide an opportunity to assess the contribution of 3′ pairing to the ability of sites with 6mer seeds to function. The 6mer K box site in the grim 3′ UTR was regulated by miR-2 (63% maximal 3′ pairing energy), but not by miR-11, which has a predicted 3′ pairing energy of 46%. Similarly, the 6mer seed sites for miR-11 in the sickle 3′ UTR had 3′ pairing energies of approximately 35% and were non-functional. Using the 63% and 46% levels as cutoffs gives lower and upper estimates of one and 20 3′ compensatory 6mer sites on average per miRNA, respectively. For 5mer sites, the examples in Figure 1 show that sites with 76% and 83% maximal 3′ pairing do not function. At the 80% threshold level, we expect less than one additional site on average per miRNA, suggesting that 3′ compensatory sites with 5mer seeds are rare. The predicted miR-10 site in Scr (see Figure 4) is one of the few sites with a 5mer seed that reaches this threshold (100% maximum 3′ pairing energy; −20 kcal/mol). It is likely that other sites in this group will also prove to be functionally important.
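As a rough illustration of how the statistical cutoffs quoted above could be applied to individual sites, the following Python sketch sorts a site into one of three groups from its seed length and its 3′ pairing level. The cutoff values for 6mer and 5mer seeds (68% and 78%) are those given in the text; the function, the constant name, and the group labels are hypothetical, and the sketch is not the procedure used to derive the estimates.

```python
# Percent of maximal 3' pairing above which sites are statistically
# distinguishable from random matches (values quoted in the text).
SIGNIFICANT_3P_CUTOFF = {6: 68.0, 5: 78.0}


def classify_site(seed_length, three_prime_pct):
    """Assign a site to a broad class from seed length and 3' pairing level."""
    if seed_length >= 7:
        # 7mer and 8mer seeds can function with little or no 3' pairing.
        return "5' dominant"
    cutoff = SIGNIFICANT_3P_CUTOFF.get(seed_length)
    if cutoff is not None and three_prime_pct > cutoff:
        return "3' compensatory candidate"
    return "not distinguishable from random"


print(classify_site(8, 20.0))   # strong seed, weak 3' pairing
print(classify_site(5, 100.0))  # e.g., a 5mer seed with maximal 3' pairing
print(classify_site(6, 63.0))   # below the statistical cutoff for 6mers
```

The last call shows the limitation of a purely statistical cutoff: a 6mer site with 63% maximal 3′ pairing falls below 68%, although the grim K box site at this level was regulated by miR-2 in the experiments described above.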
The overrepresentation of conserved 5′ seed matches (see Figure 6A) suggests that approximately two-thirds of sites with 8mer seeds and approximately one-half of the sites with 7mer seeds are biologically relevant. This corresponds to an average of 28 8mers and 53 7mers, for a total of 81 sites per miRNA. We define canonical sites as those with meaningful contributions from both 5′ and 3′ pairing. Given that 7mer and 8mer seed matches can function without significant 3′ pairing, it is difficult to assess at what level 3′ pairing contributes meaningfully to their function. The range of 3′ pairing energies that were minimally sufficient to support a weak seed match was between 46% and 63% of maximum pairing energy (see Figure 5C). If we take the 46% level as the lower limit for meaningful 3′ pairing, over 95% of sites would be considered seed sites. This rises to 99% for pairing energies that can be statistically distinguished from noise (55% maximal; see Figure 6E) and remains over 50% even for pairing energies at the average level achieved by random matches (30% maximal). It is clear from this analysis that the majority of miRNA target sites lack substantial pairing to the miRNA 3′ end in nearby sequences. Indeed, the 3′ pairing levels for the three seed sites for miR-4 in Brd are all less than 25% (i.e., below the average for random matches), and Brd was thus not predicted as a miR-4 target previously [26,28,35].
Again, we note the caveat that some of the sites that we identify as seed sites could in principle be supported by 3′ pairing to more distant upstream sequences, but also that such sites would be difficult to distinguish from background computationally and that it is unclear whether large loops are functional. If there were statistical evidence for 3′ pairing that is lower than would be expected at random for some sites, this would be one line of argument for a discrete functional class that does not use 3′ pairing and would therefore suggest selection against 3′ pairing. Although the overall distribution of 3′ pairing energies for real miRNA 3′ ends adjacent to 8mer seed matches is very similar to the random control with 3′ shuffled sequences (Figure 7; R² = 0.98), we observed a small but significant overrepresentation of real sites on both sides of the random distribution, which leads to a slightly wider distribution of real sites at the expense of the peak values around 30% pairing. Bearing in mind that one-third of 8mer seed matches are false positives (see Figure 6A), we can account for the noise by subtracting one-third of the random distribution. We then see two peaks at around 20% and 35% maximum pairing energy, separated by a dip. Subtracting more (e.g., one-half or two-thirds) of the random distribution increases the separation of the two peaks, suggesting that the underlying distribution of 3′ pairing for real 8mer seed sites might indeed be bimodal. This effect is still present, though less pronounced, if 7mer seed matches are included. No such effect is seen for the combined 5mer and 6mer seed matches. In addition, we see no difference between a random (noise) model that evaluates 3′ pairing of 3′ shuffled miRNAs to UTR sites identified by real miRNA seed matches and a random model that pairs the real (i.e., non-shuffled) miRNA 3′ end to randomly chosen UTR sequences, thus excluding bias due to shuffling.
Overall, these results suggest that there might indeed be a bimodal distribution due to an enrichment of sites with both better and worse 3′ pairing than would be expected at random. We take this as evidence that seed sites are a biologically meaningful subgroup within the 5′ dominant site category.
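The background subtraction used to reveal the two peaks can be sketched in Python as follows. The sketch assumes that both distributions are available as histograms over the same 3′ pairing bins and that the random control is rescaled to the same total number of sites as the real distribution before a fraction of it is subtracted; both the rescaling and the toy histograms are assumptions made for illustration, not data or procedures from the analysis.

```python
def subtract_background(real_hist, random_hist, noise_fraction):
    """Bin-wise real counts minus `noise_fraction` of the rescaled random counts."""
    if len(real_hist) != len(random_hist):
        raise ValueError("histograms must use the same bins")
    # Assumption: rescale the random control to the same total as the real sites.
    scale = sum(real_hist) / sum(random_hist)
    return [r - noise_fraction * scale * b for r, b in zip(real_hist, random_hist)]


def local_peaks(values):
    """Indices of interior bins that are higher than both neighbours."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    ]


# Invented toy histograms over five 3' pairing bins (not data from the paper).
real_hist = [12, 34, 48, 34, 12]
random_hist = [10, 30, 60, 30, 10]
residual = subtract_background(real_hist, random_hist, 0.5)
print(residual, local_peaks(residual))
```

In the toy example the real distribution has a single peak but is slightly broader than the random one; after half of the random distribution is removed, the residual shows two peaks flanking a dip, which is the qualitative behaviour described for the 8mer seed matches.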
Overall, these estimates suggest that there are over 80 5′ dominant sites and 20 or fewer 3′ compensatory sites per miRNA in the Drosophila genome. As estimates of the number of miRNAs in Drosophila range from 96 to 124 [44], this translates to 8,000–12,000 miRNA target sites genome-wide, which is close to the number of protein-coding genes. Even allowing for the fact that some genes have multiple miRNA target sites, these findings suggest that a large fraction of genes are regulated by miRNAs.
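The order of magnitude of this genome-wide figure can be checked with a short calculation. The text does not state exactly how the range was computed, so the following Python sketch is only one plausible reconstruction: it combines the per-miRNA site estimates given above with the lowest and highest estimates of the number of Drosophila miRNAs. The variable names are hypothetical.

```python
FIVE_PRIME_DOMINANT = 28 + 53   # 8mer + 7mer seed sites per miRNA (from the text)
COMPENSATORY_RANGE = (1, 20)    # 3' compensatory sites per miRNA (from the text)
MIRNA_COUNT_RANGE = (96, 124)   # estimated number of Drosophila miRNAs [44]

# Fewest sites per miRNA times fewest miRNAs, and most times most.
low = (FIVE_PRIME_DOMINANT + COMPENSATORY_RANGE[0]) * MIRNA_COUNT_RANGE[0]
high = (FIVE_PRIME_DOMINANT + COMPENSATORY_RANGE[1]) * MIRNA_COUNT_RANGE[1]
print(low, high)
```

Under these assumptions the bounds are 7,872 and 12,524 sites, consistent with the rounded range of 8,000–12,000.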

Discussion
We have provided experimental and computational evidence for different types of miRNA target sites. One key finding is that sites with as little as seven base-pairs of complementarity to the miRNA 5′ end are sufficient to confer regulation in vivo and are used in biologically relevant targets. Genome-wide, 5′ dominant sites occur 2- to 3-fold more often in conserved 3′ UTR sequences than would be expected at random. The majority of these sites have been overlooked by previous miRNA target prediction methods because their limited capacity to base-pair to the miRNA 3′ end cannot be distinguished from random noise. Such sites rank low in search methods designed to optimize overall pairing energy [16,17,26,27,28,30,35]. Indeed, we find that few seed sites scored high enough to be considered seriously in these earlier predictions, even when 5′ complementarity was given an additional weighting (e.g., [28,43]). We thus suspect that methods with pairing cutoffs would exclude many, if not all, such sites. In a scenario in which protein-coding genes acquire miRNA target sites in the course of evolution [4], it is likely that seed sites with only seven or eight bases complementary to a miRNA would be the first functional sites to be acquired. Once present, a site would be retained if it conferred an advantage, and sites with extended complementarity could also be selected to confer stronger repression. In this scenario, the number of sites might grow over the course of evolution so that ancient miRNAs would tend to have more targets than those more recently evolved. Likewise, genes that should not be repressed by the miRNA milieu in a given cell type would tend to avoid seed matches to miRNA 5′ ends ("anti-targets" [4]).
Figure 7. Shown is the distribution (number of sites versus 3′ pairing) for 8mer seed matches identified genome-wide for 58 3′ non-redundant miRNAs (black) compared to a random control using 50 3′ shuffled miRNAs per real miRNA (grey). Note that the distribution for real miRNAs is broader at both the high and low end than the random control and has shoulders close to the peak. The red, blue, and green curves show the effect of subtracting background noise (random matches) from the real matches at three different levels, which reveals the real matches underlying these shoulders. DOI: 10.1371/journal.pbio.0030085.g007
Although a 7- to 8mer seed is sufficient for a site to function, additional 3′ pairing increases miRNA functionality. The activity of a single 7mer canonical site is expected to be greater than that of an equivalent seed site. Likewise, the magnitude of miRNA-induced repression is reduced by introducing 3′ mismatches into a canonical site. Genome-wide, there are many sites that appear to show selection for conserved 3′ pairing and, interestingly, many sites that appear to show selection against 3′ pairing. In vivo, canonical sites might function at lower miRNA concentrations and might repress translation more effectively, particularly when multiple sites are present in one UTR (e.g., [42]). Efficient repression is likely to be necessary for genes whose expression would be detrimental, as illustrated by the genetically identified miRNAs, which produce clear mutant phenotypes when their targets are not normally repressed ("switch targets" [4]). Prolonged expression of the lin-14 and lin-41 genes in Caenorhabditis elegans mutant for lin-4 or let-7 causes developmental defects, and their regulation involves multiple sites [17,36,37]. Similarly, multiple target sites allow robust regulation of the pro-apoptotic gene hid by bantam miRNA in Drosophila [19]. More subtle modulation of expression levels could be accomplished by weaker sites, such as those lacking 3′ pairing. Sites that cannot function efficiently alone are in fact a prerequisite for combinatorial regulation by multiple miRNAs.
Seed sites might thus be useful for situations in which the combined input of several miRNAs is used to regulate target expression. Depending on the nature of the target sites, any single miRNA might not have a strong effect on its own, while being required in the context of others.

Complementarity Distinguishes miRNA Family Members
3′ compensatory sites have weak 5′ pairing and need substantial 3′ pairing to function. We find genome-wide statistical support for 3′ compensatory sites with 5mer and 6mer seeds and show that they are used in vivo. Furthermore, these sites can be differentially regulated by different miRNA family members depending on the quality of their 3′ pairing (e.g., regulation of the pro-apoptotic genes grim and sickle by miR-2, miR-6, and miR-11). Thus, members of a miRNA family may have common targets as well as distinct targets. They may be functionally redundant in regulation of some targets but not others, and so we can expect some overlapping phenotypes as well as differences in their mutant phenotypes.
Following this reasoning, it is likely that the let-7 miRNA family members differentially regulate lin-41 in C. elegans [17,45]. The seed matches in lin-41 to let-7 and the related miRNAs miR-48, miR-84, and miR-241 are weak, and only let-7 has strong 3′ pairing. On this basis, it seems likely that lin-41 is regulated only by let-7. In contrast, hbl-1 has four sites with strong seed matches [38,39], and we expect it to be regulated by all four let-7 family members. As all four let-7-related miRNAs are expressed similarly during development [6], their role as regulators of hbl-1 may be redundant. let-7 must also have targets not shared by the other family members, as its function is essential. lin-41 is likely to be one such target.
The idea that the 3′ end of miRNAs serves as a specificity factor provides an attractive explanation for the observation that many miRNAs are conserved over their full length across species separated by several hundreds of millions of years of evolution. 3′ compensatory sites may have evolved from canonical sites by mutations that reduce the quality of the seed match. This could confer an advantage by allowing a site to become differentially regulated by miRNA family members. In addition, sites could retain specificity and overall pairing energy, but with reduced activity, perhaps permitting discrimination between high and low levels of miRNA expression. This might also allow a target gene to acquire a dependence on inputs from multiple miRNAs. These scenarios illustrate a few ways in which more complex regulatory roles for miRNAs might arise during evolution.

A Large Fraction of the Genome Is Regulated by miRNAs
Another intriguing outcome of this study is evidence for a surprisingly large number of miRNA target sites genome-wide. Even our conservative estimate is far above the numbers of sites in recent predictions, e.g., seven or fewer per miRNA [27,28,29]. Our estimate of the total number of targets approaches the number of protein-coding genes, suggesting that regulation of gene expression by miRNAs plays a greater role in biology than previously anticipated. Indeed, Bartel and Chen [46] have suggested in a recent review that the earlier estimates were likely to be low, and a recent study by John et al. [43], published while this manuscript was under review, predicts that approximately 10% of human genes are regulated by miRNAs. We agree with these authors' suggestion that this is likely an underestimate, because their method identifies an average of only 7.1 target genes per miRNA, with few that we would classify as seed sites lacking substantial 3′ pairing. A large number of target sites per miRNA is also consistent with combinatorial gene regulation by miRNAs, analogous to that by transcription factors, leading to cell-type-specific gene expression [47]. Sites for multiple miRNAs allow for the possibility of cell-type-specific miRNA combinations to confer robust and specific gene regulation.
Our results provide an improved understanding of some of the important parameters that define how miRNAs bind to their target genes. We anticipate that these will be of use in understanding known miRNA-target relationships and in improving methods to predict miRNA targets. We have limited our evaluation to target sites in 3′ UTRs. miRNAs directed at other types of targets or with dramatically different functions (e.g., in regulation of chromatin structure) might well use different rules. Accordingly, there may prove to be more targets than we can currently estimate. Further, there may be additional features, such as overall UTR context, that either enhance or limit the accessibility of predicted sites and hence their ability to function. For example, the rules about target site structure cannot explain the apparent requirement for the linker sequence observed in the let-7/lin-41 regulation [48]. Further efforts toward experimental target site validation and systematic examination of UTR features can be expected to provide new insight into the function of miRNA target sites.