Primary Prevention of Cannabis Use: A Systematic Review of Randomized Controlled Trials

A systematic review of primary prevention was conducted for cannabis use outcomes in youth and young adults. The aim of the review was to develop a comprehensive understanding of prevention programming by assessing universal, targeted, uni-modal, and multi-modal approaches as well as individual program characteristics. Twenty-eight articles, representing 25 unique studies, identified from eight electronic databases (EMBASE, MEDLINE, CINAHL, ERIC, PsycINFO, DRUG, EBM Reviews, and Project CORK), were eligible for inclusion. Results indicated that primary prevention programs can be effective in reducing cannabis use in youth populations, with statistically significant effect sizes ranging from trivial (0.07) to extremely large (5.26), with the majority of significant effect sizes being trivial to small. Given that the preponderance of significant effect sizes were trivial to small and that percentages of statistically significant and non-statistically significant findings were often equivalent across program type and individual components, the effectiveness of primary prevention for cannabis use should be interpreted with caution. Universal multi-modal programs appeared to outperform other program types (i.e., universal uni-modal, targeted multi-modal, targeted uni-modal). Specifically, universal multi-modal programs that targeted early adolescents (10–13 year olds), utilized non-teacher or multiple facilitators, were short in duration (10 sessions or less), and implemented booster sessions were associated with large median effect sizes. While there were studies in these areas that contradicted these results, the results highlight the importance of assessing the interdependent relationship of program components and program types. Finally, results indicated that the overall quality of included studies was poor, with an average quality rating of 4.64 out of 9. Thus, further quality research and reporting and the development of new innovative programs are required.


Introduction
Cannabis is the most widely used illicit drug worldwide [1] and its use is particularly prominent among adolescents and young adults [2,3]. For example, in 2010, 15.7% of Australian youth aged 14 to 19 years had used cannabis in the last 12 months, whereas only 4.7% of Australians over 40 years of age had used [3]. Early initiation of cannabis use is associated with more intensive cannabis use [1,4], and thus, a greater likelihood of developing dependence [5]. In addition, early initiation is associated with an increase in other health risk behaviors [6,7], poor educational outcomes [8,9], impaired cognitive functioning [10], and an increased risk of mental health issues [11,12]. The risks associated with cannabis use are becoming increasingly concerning given recent increases in cannabis use among young people (12.9% of 14 to 19 year olds had used cannabis in 2007) [13]. Furthermore, only 29% of individuals who meet criteria for cannabis dependence seek treatment [14]. Of those who seek treatment, only 31 to 36% experience clinically significant reductions in their use [15,16]. Given the extent of associated adverse effects for early and heavy use and the less than optimal treatment outcomes, primary prevention of cannabis use is critical.
As few researchers have posited what program components may lead to effective prevention of cannabis use, examination of the broader substance use literature may be informative in identifying key components for consideration. When attempting to prevent substance use in general, researchers have highlighted that theoretical models, program design, program facilitators, the developmental stage of participants, and program duration are key variables to consider for program efficacy [17]. Programs that have adopted psychosocial skills-based approaches with interactive designs have performed better than affective or knowledge-based approaches utilizing non-interactive, didactic designs [18][19][20]. Furthermore, programs that have adopted non-teachers as facilitators have consistently outperformed programs utilizing teachers [17,21]. Family interventions have also been examined as non-school-based interventions [22]. While evidence for optimal duration and timing of program delivery is largely inconclusive [17,20], evidence from a systematic review on substance use prevention indicates that the implementation of booster sessions leads to greater and longer lasting program efficacy [23]. While these substance use literature findings are informative, further research is needed to confirm that these patterns of efficacy are upheld when cannabis use outcomes are assessed independently.
To date, two meta-analyses and one systematic review have assessed cannabis use prevention specifically [24][25][26]. All three reviews concluded that prevention strategies had potential for effectively reducing cannabis use in adolescents. In accordance with the broader substance use literature, interactive program designs were associated with greater program efficacy. Similarly, programs utilizing multiple theoretical models to inform program design and those adopting non-teacher facilitators (i.e., peer leaders or mental health counselors) were found to be more efficacious in reducing cannabis use. Conclusive statements regarding other key program characteristics (e.g., program duration, program size, and specifics of program theory) were not tenable as inconsistent evidence was presented across studies and/or limited data was available to make meaningful analyses [26].
These reviews also are limited in that they focused exclusively on school-based (uni-modal) programs and did not differentiate between universal and targeted intervention programming. Recent developments in the primary prevention literature implicate the importance of multi-modal approaches that utilize family, peer, community, and school-based components [27]. A review that considers multi-modal and uni-modal approaches will allow for a more rigorous assessment of substance use prevention programs. Furthermore, research is needed to determine the relative efficacy of universal (i.e., programs that attend to the general student population) versus targeted/selective programs (i.e., those that attend to high-risk populations). Some authors suggest that prevention strategies are unlikely to be universally effective because adolescents are not universally at risk [28]. Conversely, other authors have demonstrated that universal programs are effective for a variety of youth, including those at high-risk for substance use [29]. Thus, a consideration of various types of approaches and populations will enable a more rigorous review of prevention components and the extent of their efficacy.
There are a number of methodological considerations regarding the aforementioned meta-analyses and systematic review. All three reviews assessed quasi-experimental as well as experimental study designs. While this approach is reasonable, a sole focus on randomized controlled trials (RCTs) is suitable for systematic review as RCTs are more likely to provide unbiased information [30]. In addition, meta-analytic procedures are inappropriate if the heterogeneity is too large [31]. Marked clinical, methodological, and statistical heterogeneity was evident across studies included in all three reviews. Adopting meta-analytic techniques in these instances may be misleading; therefore, a systematic review that offers a qualitative approach may be more appropriate [30]. In addition, despite the marked heterogeneity of effect sizes, only the systematic review [26] included an assessment of study quality. This review, however, included only three programs with cannabis use outcomes due to restricted inclusion criteria, which limited its capacity to assess outcomes and limited the generalizability of its results. A review that considers study quality while maintaining rigorous inclusion criteria pertaining to study design may enable a more comprehensive understanding of program efficacy.
As such, the current study sought to extend existing literature by conducting a systematic review of cannabis use treatment outcomes in primary prevention research. The current review sought to determine the relative effectiveness of universal, targeted, uni-modal, and multi-modal programs, and to explore whether the effectiveness of these programs in deterring cannabis use differed as a function of participant age, program facilitators, program duration, program booster sessions, and program content. This study also assessed if study quality was associated with program effectiveness. The expectation was that a more comprehensive understanding would facilitate a more strategic and informed prevention approach to address the increasingly problematic prevalence of cannabis use in adolescent and youth populations.

Search procedures and selection criteria
We conducted a comprehensive search of eight electronic databases (EMBASE, MEDLINE, CINAHL, ERIC, PsycINFO, DRUG, EBM Reviews, and Project CORK) to identify relevant studies published between 1987 and January 2011. Key search terms were "cannabis* OR marijuana* OR tetrahydrocannabinol" AND "adolescen* OR child* OR youth* OR young adult* OR student*" AND "prevention* OR early intervention* OR program evaluation* OR school-based*" (for an example of a full electronic search strategy see Appendix S1). This search identified 1975 records, of which 284 were duplicates. An additional 33 articles were discovered using the reference lists of articles identified through the search. Of the 1724 available records, an independent review of the titles and abstracts performed by two authors demonstrated that 1617 did not assess cannabis use or primary prevention and were omitted. Thus, 107 full-text articles were assessed for eligibility. Eligibility was assessed independently by two authors and discrepancies were resolved by a third author.
To be eligible for inclusion, studies must have (a) assessed cannabis use, (b) implemented the program as a primary prevention effort, (c) examined young people (children, adolescents, or young adults [≤24 years]), and (d) presented original data. In addition, studies had to be RCTs published in a peer-reviewed journal in English. Overall, 49 RCTs reached the standards stipulated by the inclusion criteria. Figure 1 presents a flow-diagram of the selection process.
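The record counts reported above can be checked with simple arithmetic. The following is a minimal sketch in Python, not part of the original review procedure; the variable names are illustrative, and only the counts themselves are taken from the text.

```python
# Record counts as reported in the search and selection description.
records_identified = 1975          # hits across the eight databases
duplicates = 284                   # duplicate records removed
from_reference_lists = 33          # articles added from reference lists
excluded_on_title_abstract = 1617  # omitted after title/abstract review
rcts_meeting_criteria = 49         # RCTs meeting the inclusion criteria

# Records available for title/abstract screening.
records_screened = records_identified - duplicates + from_reference_lists

# Full-text articles assessed for eligibility.
full_text_assessed = records_screened - excluded_on_title_abstract

# Full-text articles that did not meet the inclusion criteria.
full_text_excluded = full_text_assessed - rcts_meeting_criteria

print(records_screened, full_text_assessed, full_text_excluded)
```

The two intermediate totals reproduce the 1724 available records and 107 full-text articles stated in the text.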

Data extraction
Two authors independently extracted data from eligible studies and discrepancies were resolved by discussion with a third author. Information collected included the study design (control group, comparison groups), demographic characteristics (age, gender, and ethnic distributions), intervention components, theoretical foundations, details of the outcome measures, statistical analysis procedures, and cannabis-specific outcomes. When a study did not report participants' ages, age was approximated from available information. For example, when a study reported a seventh grade sample, the average age of seventh grade students, 12 years, was used as an approximation. Additionally, when an age range was presented, the middle value of that range was documented.
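The age approximation rules described above could be sketched as follows. This is an illustration only: the function name, the order in which the rules are tried, and the grade-to-age offset of 5 years (generalized from the single seventh grade example, 7 + 5 = 12) are assumptions rather than details reported by the review.

```python
# Assumption: average age is roughly grade + 5, generalized from the
# one example in the text (seventh grade -> 12 years).
GRADE_TO_AGE_OFFSET = 5


def approximate_age(reported_age=None, age_range=None, grade=None):
    """Return a single age value for a study sample (hypothetical helper)."""
    if reported_age is not None:
        # A reported age is used as-is.
        return float(reported_age)
    if age_range is not None:
        # The middle value of a reported age range is documented.
        low, high = age_range
        return (low + high) / 2
    if grade is not None:
        # Fall back to the average age for the reported school grade.
        return float(grade + GRADE_TO_AGE_OFFSET)
    raise ValueError("no age information available")


print(approximate_age(grade=7))
print(approximate_age(age_range=(14, 17)))
```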

Quality assessment
Study quality was assessed using an adapted version of the EPOC Risk of Bias criteria [32]. As this review only included RCTs, two EPOC criteria (allocation sequence and allocation adequacy) were deemed inappropriate. Three additional criteria (adherence adequacy, exposure adequacy, and reliability of outcome measures) were identified as important criteria for primary prevention and were added. These amendments resulted in a total of nine quality criteria that assessed study methodology, fidelity of program implementation, appropriateness of outcome measures, and statistical procedures (for full details see Appendix S2). Quality criteria were scored dichotomously, where 0 = unclear/did not fulfill criteria and 1 = did fulfill criteria. Thus, quality ratings could range from 0 to 9, with higher scores indicating better quality. As with study eligibility, quality was independently assessed by two authors and all differences were resolved by discussions with a third author.
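The dichotomous scoring scheme can be illustrated with a short sketch. The criterion identifiers below are shorthand labels chosen for this example, based on the nine criteria named in the results; they are not the wording used in Appendix S2, and the example ratings are hypothetical.

```python
# Shorthand labels for the nine quality criteria (illustrative names).
CRITERIA = (
    "baseline_outcomes",
    "baseline_characteristics",
    "incomplete_data",
    "blinding",
    "contamination",
    "selective_reporting",
    "adherence_adequacy",
    "exposure_adequacy",
    "outcome_reliability",
)


def quality_score(ratings):
    """Sum dichotomous ratings: 1 = criterion fulfilled, 0 = unclear or not fulfilled.

    A criterion missing from `ratings` is treated as unclear and scores 0.
    """
    return sum(1 for criterion in CRITERIA if ratings.get(criterion, False))


# Hypothetical study that fulfills four of the nine criteria.
example = {
    "baseline_outcomes": True,
    "baseline_characteristics": True,
    "contamination": True,
    "selective_reporting": True,
    "blinding": False,
}
print(quality_score(example))
```

Because unclear reporting is scored the same as an unmet criterion, a low score under this scheme may reflect poor reporting as much as poor methodology.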

Data Synthesis
To adequately compare the effectiveness of the primary prevention programs, standardized effect size estimates were calculated using Cohen's d [33]. Cohen's d is defined as the difference between two means divided by the pooled standard deviation. According to Cohen [33], an effect size can be described as small, moderate, or large, corresponding approximately to d values of 0.2, 0.5, and 0.8, respectively. All effect size values were presented in the desired direction, such that positive values reflect a better outcome for treatment participants compared to controls. In addition, 95% confidence intervals (CIs) for each effect size value were calculated. Effect size and CI values not only facilitate a conventional statistical significance test, but also provide detailed information on effect magnitude and test precision, enabling an assessment of clinical significance [34]. For this review, a 95% CI that did not include zero was considered statistically significant. It is important to note that statistical significance was often inconsistent within studies for different outcome measures at different time points. Thus, a single study could be assessed as having both significant and non-significant findings. Furthermore, multiple effect size statistics (e.g., different outcome measures, different follow-up periods) were often included for a single study, as information (e.g., degrees of freedom and inter-correlation values) needed to combine these effects was often not available [35].
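As an illustration of these definitions, the sketch below computes Cohen's d from group means and standard deviations, attaches an approximate 95% CI, and applies the significance rule. Two points are assumptions rather than reported details: the sign convention (control minus treatment, so that lower use in the treatment group gives a positive d) and the large-sample standard error formula, since the review does not state which CI formula was used.

```python
import math

Z_95 = 1.96  # normal critical value for a two-sided 95% interval


def cohens_d(mean_ctrl, sd_ctrl, n_ctrl, mean_treat, sd_treat, n_treat):
    """Difference between two means divided by the pooled standard deviation.

    Assumes lower scores (less cannabis use) are better, so the difference
    is taken as control minus treatment.
    """
    pooled_var = ((n_ctrl - 1) * sd_ctrl ** 2 + (n_treat - 1) * sd_treat ** 2) / (
        n_ctrl + n_treat - 2
    )
    return (mean_ctrl - mean_treat) / math.sqrt(pooled_var)


def d_confidence_interval(d, n_ctrl, n_treat):
    """Approximate 95% CI using a common large-sample standard error for d."""
    se = math.sqrt(
        (n_ctrl + n_treat) / (n_ctrl * n_treat) + d ** 2 / (2 * (n_ctrl + n_treat))
    )
    return d - Z_95 * se, d + Z_95 * se


def is_significant(ci):
    """A 95% CI that does not include zero is treated as significant."""
    low, high = ci
    return low > 0 or high < 0


# Hypothetical data: controls report more use than treatment participants.
d = cohens_d(mean_ctrl=2.0, sd_ctrl=2.0, n_ctrl=100,
             mean_treat=1.0, sd_treat=2.0, n_treat=100)
ci = d_confidence_interval(d, 100, 100)
print(round(d, 2), tuple(round(x, 2) for x in ci), is_significant(ci))
```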
The studies included in this systematic review varied widely in their statistical reporting. As a result, multiple techniques were applied to compute Cohen's d and 95% CIs. When available, Cohen's d and CIs were reported as stipulated in the published papers or were calculated from means and standard deviations provided in the published papers. On occasion, however, the reported effect size estimate did not have corresponding confidence limits, or means and standard deviations were not reported. In these cases, an online effect size calculator [36] or appropriate conversion formulae were applied to estimate Cohen's d from standard errors and mean values, odds ratios, F-values, or chi-square values. For example, the natural logarithm of an odds ratio can be converted to an effect size estimate by dividing by 1.81 [37]. When group sample size values were unknown, group equivalency was assumed. When these conversion formulae were not applicable, authors were contacted to provide means, standard deviations, and sample size values to allow for calculation of Cohen's d and CIs. If authors were unable to provide the required information or failed to reply to two email communications, their articles were excluded from this systematic review (n = 21; for a list of excluded studies see Appendix S3).
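The odds ratio conversion mentioned above can be sketched as follows. The function names are illustrative, and applying the same conversion to the CI limits of the odds ratio is an assumption of this sketch, not a procedure described in the review.

```python
import math

LOG_OR_DIVISOR = 1.81  # divisor given in the text [37]


def odds_ratio_to_d(odds_ratio):
    """Convert an odds ratio to an effect size estimate: ln(OR) / 1.81."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) / LOG_OR_DIVISOR


def odds_ratio_ci_to_d(or_low, or_high):
    """Convert the CI limits of an odds ratio in the same way (assumption)."""
    return odds_ratio_to_d(or_low), odds_ratio_to_d(or_high)


# Hypothetical example: odds ratio of 1.5 with 95% CI 1.1 to 2.0.
print(round(odds_ratio_to_d(1.5), 2))
print(tuple(round(x, 2) for x in odds_ratio_ci_to_d(1.1, 2.0)))
```

Note that an odds ratio below 1 yields a negative value, so the sign may need to be reversed to present the effect in the desired direction, depending on how the odds ratio was coded in the original study.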
A formal meta-analysis was not conducted as marked heterogeneity of study design, participant characteristics, intervention components, outcome measures, effect statistics, and study quality rendered a meta-analysis meaningless [30]. As a result, a narrative synthesis approach was applied. Included studies were categorized on a number of program components in order to effectively assess the relative efficacy of universal, targeted, uni-modal, and multi-modal program designs. These syntheses considered statistically significant and non-significant results separately, as well as differential median effect size magnitudes for statistically significant outcomes.
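The synthesis step could be sketched as below: outcomes are split by statistical significance, and the median effect size is taken over the significant outcomes only, optionally with a study left out as a sensitivity check. The data structure, function name, and all values are hypothetical and serve only to illustrate the approach.

```python
from statistics import median


def summarize(outcomes, exclude_studies=()):
    """Count significant and non-significant outcomes and take the median
    effect size of the significant ones (CI excluding zero)."""
    kept = [o for o in outcomes if o["study"] not in exclude_studies]
    significant = [o["d"] for o in kept if o["ci_low"] > 0 or o["ci_high"] < 0]
    return {
        "n_significant": len(significant),
        "n_non_significant": len(kept) - len(significant),
        "median_significant_d": median(significant) if significant else None,
    }


# Hypothetical outcomes for one program category.
outcomes = [
    {"study": "A", "d": 0.10, "ci_low": 0.02, "ci_high": 0.18},
    {"study": "B", "d": 0.30, "ci_low": 0.05, "ci_high": 0.55},
    {"study": "C", "d": 0.05, "ci_low": -0.10, "ci_high": 0.20},
    {"study": "D", "d": 2.00, "ci_low": 1.50, "ci_high": 2.50},
]

all_studies = summarize(outcomes)
without_outlier = summarize(outcomes, exclude_studies=("D",))
print(all_studies)
print(without_outlier)
```

Because the median is taken over few values per category, a single study with very large effects can shift it substantially, which is why a with-and-without comparison is informative.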

Results
Overall, 28 articles were included in this systematic review, representing 25 unique RCTs. Most of the included studies were conducted in the United States (n = 21), with the remaining studies implemented in Australia (n = 2), the United Kingdom (n = 1), and Europe (n = 1). The number of participants varied considerably across prevention programs, with baseline sample sizes ranging from 64 to 7079 (M = 1933; SD = 2068). Although we attempted to extract data pertaining to theoretical foundations, a meaningful analysis was not feasible as insufficient information was reported to adequately differentiate programs based on theory. Tables 1 to 4 present the program characteristics, effectiveness data, and study quality ratings for the universal and targeted prevention programs subdivided by uni-modal and multi-modal program design.

Cannabis specific program content
The majority of studies (60%, n = 15) did not report cannabis-specific content components. Rather, many of the studies reported targeting substance use in general or tobacco, alcohol, and other drugs in combination as content areas. Three studies (12%; [38,39,40]) were alcohol-specific programs and thus did not address cannabis use in their program content (but did measure cannabis use as an outcome). While six studies (24%) specified that cannabis-specific content was included in their program, only one study (4%; [41]) specified an entire module devoted to cannabis. Of those programs reporting cannabis-specific program content, three (50%) reported statistically significant findings (d = 0.09 to 0.22, Mdn = 0.12) in comparison to 12 (63%) studies that did not include cannabis-specific content (d = 0.07 to 5.26, Mdn = 0.30).

General program content
The vast majority (84%, n = 21) of the 25 included studies reported some form of psychoeducation. Other typical content included social skills training (64%, n = 16), risk resiliency/refusal skills training (60%, n = 15), and decision making skills training (40%, n = 10). Six studies (24%) covered all four content areas, of which five reported statistically significant results (d = 0.09 to 5.26, Mdn = 1.19). Five studies covered three of the four content areas, all of which reported statistically significant results (d = 0.07 to 0.90, Mdn = 0.16, n = 5). Nine studies included content for only two areas, of which four reported statistically significant results (d = 0.08 to 0.74, Mdn = 0.15), and four studies reported inclusion of only one area, of which one was significant (d = 0.20). One study [42] used motivational enhancement therapy and did not apply any of the aforementioned content areas, though results were not statistically significant. Further analysis of specific content components was not possible.

Outcome measures and assessment intervals
The vast majority (92%; n = 23) of the 25 included studies reported outcome measures relating to frequency of use. These measures were dichotomous or continuous and time intervals measured ranged from weekly use to lifetime use, with some studies assessing multiple time periods (e.g., monthly and lifetime use). Three studies reported alternate outcomes in addition to frequency of use, with one reporting stages of initiation [39] and two studies reporting quantity of use [42,43]. The two studies not reporting frequency of use measured new user proportion [44,45]. On average, studies included three outcome measures (1 to 18, Mdn = 2). Of the 25 studies, 13 (52%) did not present post-test data, while four (16%) reported post-test data only. Thirty-six percent (n = 9) of trials completed follow-up within one year, while 28% (n = 7) had follow-up tests for two years, and 20% (n = 5) had follow-up periods greater than two years post program implementation.

Data synthesis of program design
Universal programs. Of the set of 25 studies, 15 (60%) were universal prevention programs. The majority of these studies (60%, n = 9) utilized middle school students. All recruited from a school setting except one [40], which recruited from community-based organizations. Nine (60%) of the universal programs had a significant finding (d = 0.08 to 5.26, Mdn = 0.36). The universal program conducted by Schinke and colleagues [40] obtained substantially larger effect sizes than the other universal programs (d = 1.63 to 5.26, Mdn = 2.19). When excluding this study, the median effect size of the other significant findings reduced to 0.14.
Uni-modal programs. Uni-modal programs are those that adopted a single modality for prevention implementation. Fourteen (56%) studies utilized uni-modal program designs, of which seven (50%) were universal programs and seven (50%) were targeted. Many uni-modal programs were implemented in school settings (n = 9). Other delivery modes included the family home [47], community-based organizations [48], an HIV clinic [42], and a computer [49]; one further study delivered weekly classes in an unspecified location for participants recruited from numerous sources, including medical clinics and staff outreach [50]. Nine of the uni-modal programs reported significant findings (d = 0.09 to 0.74, Mdn = 0.20), of which four were universal (d = 0.09 to 0.22, Mdn = 0.13) and five were targeted (d = 0.14 to 0.74, Mdn = 0.20).
Multi-modal programs. The remaining 11 (44%) studies utilized multi-modal program designs, of which eight (73%) were universal programs and three (27%) were targeted. The primary sites for multi-modal program implementation were middle and high school settings (n = 9), with one study utilizing a college setting [38] and another utilizing community-based organizations [40]. The core components of the multi-modal interventions involved drug prevention programs predominantly delivered through school curriculums (n = 7), with other programs utilizing a CD-Rom intervention [40], a child-skills workshop [44], a motivational interviewing session [38], and a one-on-one health consultation [51]. Parent and family-based intervention components were most commonly adopted in conjunction with these core components (n = 8). The parent components varied from intensive skills training workshops [44,45,52,53] to take-home handbooks and information pamphlets that could be used as a basis for discussion [38,51]. Other additional components included peer involvement (n = 2), community leadership/mentoring (n = 2), mass media coverage (n = 2), and school community development (n = 1). Of the eleven multi-modal programs, six reported significant findings (d = 0.07 to 5.26, Mdn = 0.68), of which five were universal (d = 0.08 to 5.26, Mdn = 0.90) and one was targeted (d = 0.07 to 0.12, Mdn = 0.10). When excluding the Schinke and colleagues study [40], the median effect size of the other statistically significant multi-modal study findings reduced to 0.14 and to 0.17 for the universal multi-modal study findings.

Synthesis of individual program components
Participant age. Participant ages at baseline ranged from 11 to 21 years. Studies were divided into three categories to assess the optimal period of intervention: early adolescence (11 to 13 years, n = 14), middle adolescence (14 to 17 years, n = 9), and late adolescence/young adult (18+ years, n = 2). Of those targeting early adolescence, ten (71%) programs yielded significant findings (d = 0.07 to 5.26, Mdn = 0.17), whereas only five (56%) programs targeting middle adolescence yielded significant findings (d = 0.14 to 0.74, Mdn = 0.2). No significant findings were reported for the late adolescence/young adult age group (d = -0.07 to 0.34, Mdn = 0.10, n = 2). Table 5 presents the outcome data and quality ratings for participant age across universal and targeted programs, subdivided by uni-modal and multi-modal program design. Universal multi-modal programs delivered during early adolescence were associated with a large median effect size, but only a small median effect size (d = 0.17) when excluding the Schinke and colleagues study [40]. While targeted uni-modal programs delivered during middle adolescence were associated with small effect sizes in comparison to trivial effect sizes for programs delivered during early adolescence, there were an equal number of outcome measures in this category with statistically non-significant results. For universal uni-modal and targeted multi-modal programs, no statistically significant findings yielded greater than trivial median effect sizes and these effects were not reliable.
Table 5. Significant and non-significant outcome data and quality ratings, as a function of program design for participant age.
Program facilitators. Of the 23 included studies, approximately 67% (n = 6) of teacher-facilitated programs reported significant results (d = 0.07 to 0.68, Mdn = 0.11), compared to 37% (n = 7) of non-teacher facilitated programs (d = 0.14 to 2.86, Mdn = 0.51), and 100% (n = 2) of programs utilizing multiple leaders (d = 0.75 to 5.26, Mdn = 2.38). The non-teacher facilitated programs associated with statistically significant effect sizes were facilitated by project workers (n = 2), health professionals (n = 1), peer leaders (n = 1), parents (n = 1), or were self-facilitated computer programs (n = 2). The multiple-facilitator programs utilized a teacher in combination with a project worker [45] or a parent in combination with a CD-Rom intervention [40]. Table 6 presents outcome data and quality ratings for program facilitators across universal and targeted programs, subdivided by uni-modal and multi-modal program design. Universal multi-modal programs were reliably associated with large effect sizes for programs utilizing non-teacher facilitators and multiple facilitators, as only one study contradicted this finding. While targeted uni-modal programs were associated with small effect sizes for programs utilizing non-teacher facilitators, there were an almost equal number of statistically non-significant findings in this category. Of those teacher-facilitated programs reporting statistically significant results, no median effect size estimates were greater than trivial, and there were an equal or greater number of outcome measures with statistically non-significant results within each program design category.
Table 6. Significant and non-significant outcome data and quality ratings, as a function of program design for program facilitator.
Program duration. Program duration was assessed according to the number of sessions delivered, divided into two categories: short programs (ranging from 0 to 10 sessions, Mdn = 5, n = 13) and long programs (ranging from 11 to 23 sessions, Mdn = 15, n = 12). It is important to note, however, that the length of these sessions varied from approximately 13 minutes to 90 minutes, and that the total period of time taken to implement an intervention program varied from 1 day to 3 years. Seven (54%) of the programs categorized as short (d = 0.07 to 5.26, Mdn = 0.53) and eight (67%) of the long programs (d = 0.08 to 0.75, Mdn = 0.14) yielded significant findings. Table 7 presents outcome data and quality ratings for program duration across universal and targeted programs, subdivided by uni-modal and multi-modal program design. While short universal multi-modal programs were associated with large effect sizes in comparison to trivial effect sizes for long programs, there were an equal number of short programs in this category with statistically non-significant results. Further, when removing the Schinke and colleagues study [40], the median effect size for statistically significant short universal multi-modal findings was moderate in size (d = 0.70) rather than large. One long targeted uni-modal program yielded a statistically significant small effect size and another obtained statistically non-significant results in this category.
Booster sessions. Booster sessions were often implemented in addition to the core program sessions. The number of booster sessions ranged from 1 to 15, with specific information on session length, time of implementation, and content often not reported. For these reasons, studies were divided into two categories, those that implemented boosters (n = 9) and those that did not (n = 16). Eighty-nine percent of programs that utilized boosters reported significant findings (d = 0.07 to 5.26, Mdn = 0.15, n = 8), in contrast to only 44% of programs that did not utilize boosters (d = 0.14 to 0.90, Mdn = 0.22, n = 7). Table 8 presents the outcome data and quality ratings for the presence of booster sessions across universal and targeted programs, subdivided by uni-modal and multi-modal program design. Unlike other program components, studies with booster sessions were more often associated with statistical significance than studies without booster sessions; however, the effects were often trivial or small. Only two universal multi-modal programs were associated with moderate to large effect sizes, and one of these studies was the Schinke and colleagues study [40]. Programs without boosters more often reported non-significant findings, and when statistically significant findings were reported, their median effect size was small or trivial.

Evaluation of study quality
The two independent raters agreed on 97% of the quality scores. After discrepancies were resolved, quality scores ranged from 1 to 8, with an average quality rating of 4.64. This indicates that for many of the studies, about half of the quality criteria were either not met or were unclear. Quality ratings were consistent across universal uni-modal (Mdn = 4.0, range 3-6), universal multi-modal (Mdn = 4.5, range 3-8), and targeted uni-modal (Mdn = 5.0, range 4-8) program designs, with lower quality reported for targeted multi-modal programs (Mdn = 2, range 1-3). The two highest quality studies were both conducted by Schinke and colleagues [40,47]. Their universal multi-modal study produced the largest effect sizes examined in this review, while their targeted uni-modal study was associated with trivial to small effects.
Overall, irrespective of program design, most studies scored positively on criteria assessing baseline outcomes (92%) and baseline characteristics (84%). A moderate percentage of studies scored positively on criteria assessing contamination (64%) and selective reporting (72%). In contrast, only half of all studies (52%) scored positively on the incomplete data criterion. The poor scores for incomplete data were a result of extremely high attrition rates, poor statistical treatment of missing data, or a lack of reporting. Of those studies scoring zero for poor attrition, attrition rates ranged from 23% to 47%, for follow-up periods from immediate post-test to 6 years.
Quality scores were particularly problematic on criteria assessing program exposure, program adherence, blinding, and reliability of outcome measure with only 20%, 20%, 24%, and 40% of studies, respectively, scoring positively on these measures. While these scores could simply be a result of poor reporting, not being able to adequately assess these methodological components prevents a meaningful analysis of program curriculum.

Discussion
This systematic review evaluated the effectiveness of primary prevention programs in averting young people from using cannabis. The current study extended knowledge gained from previous reviews by assessing the relative efficacy of universal, targeted, uni-modal, and multi-modal approaches, and evaluating whether the effectiveness of individual program components was related to program type. Overall, results suggest that primary prevention programs may be able to deter young people from using cannabis, with statistically significant effect sizes ranging from trivial (0.07) to extremely large (5.26). Despite this potential, evidence was largely inconclusive regarding a distinctive pattern of program efficacy as the percentages of statistically significant and non-statistically significant findings were often equivalent across program type and individual components. A consideration of the magnitude of statistically significant median effect size estimates, however, did reveal components that may more strongly influence program efficacy. The efficacy of these components must be interpreted with caution as they may be unreliable and biased given the equivalency of significant and non-significant data across categories, as well as the variability of study quality. Furthermore, a visual inspection of the data indicated that one study [40] had dramatically larger effect size values than all other included studies (d range = 1.63 to 5.26), which may have overestimated the magnitude of median effect sizes. As such, results for categories where this study was included must be interpreted with caution as analysis without this study included often resulted in substantially smaller effects.
Table 7. Significant and non-significant outcome data and quality ratings, as a function of program design for program duration.

Cannabis-specific content
Most of the studies included in this review did not include cannabis-specific content, hindering efforts to make conclusive statements about the relative efficacy of cannabis-specific versus general substance use prevention programming. Despite this, some programs were able to effectively prevent cannabis use for at least a short period. Tobler et al. [24] found that programs that were effective in preventing cannabis use were similarly effective for alcohol and tobacco use. As such, they suggested that cannabis did not require a singularly focused program. Similarly, Foxcroft et al. [56] found no apparent differences in program efficacy in an assessment of alcohol-specific versus multi-drug prevention programming. Given that some programs included in the current review were able to avert cannabis use, despite not exclusively focusing on cannabis-related content, it may be that specific content is not a necessary component for program efficacy. In fact, Tobler et al. [24] suggested that the relative effectiveness of a program was dependent on delivery method, not program content. Research that specifically examines the relative efficacy of a singularly focused cannabis-use prevention program versus a multi-drug prevention program is needed to test these suggestions.

Program type
Prior to this study, little research had examined the relative efficacy of universal, targeted, uni-modal, and multi-modal programs, with the preponderance of cannabis-specific and general substance use prevention reviews and meta-analyses assessing universal school-based (uni-modal) approaches only.
Overall, a consideration of statistically significant median effect size magnitudes revealed that universal multi-modal programs may be more effective in averting cannabis use than universal uni-modal, targeted uni-modal, and targeted multi-modal programs. Importantly, all three targeted multi-modal program studies were of poor quality; thus, results pertaining to this category may be unreliable and should be interpreted with caution. Furthermore, the effectiveness of universal multi-modal studies appeared to be much greater when the study by Schinke and colleagues was included (Mdn d = 0.90 versus 0.17). As this study was of better quality than the others, universal multi-modal programs may be quite effective in preventing cannabis use when they are adopted and implemented as intended.
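The sensitivity of a median effect size to a single study that contributes several very large values can be illustrated with a short sketch. All values below are hypothetical and chosen only for illustration (the large values fall within the d range of 1.63 to 5.26 noted for the outlying study); they are not the data extracted in this review, and the variable and function names are likewise assumptions.

```python
from statistics import median

# Hypothetical Cohen's d values for illustration only; NOT data from the review.
other_studies = [0.10, 0.17, 0.24]
# Hypothetical values from one study with very large effects,
# chosen within the d range (1.63 to 5.26) mentioned in the text.
large_effect_study = [1.63, 3.00, 5.26]


def median_with_and_without(base, extra):
    """Return (median including the extra study, median excluding it)."""
    return median(base + extra), median(base)


mdn_included, mdn_excluded = median_with_and_without(other_studies, large_effect_study)
print(f"Mdn d with study included: {mdn_included:.2f}")
print(f"Mdn d with study excluded: {mdn_excluded:.2f}")
```

When one study supplies a large share of the significant effect sizes in a category, the category median can shift from small to large, which is why reporting results both with and without such a study is informative.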
The finding that universal multi-modal programs may outperform universal uni-modal, targeted uni-modal, and targeted multi-modal programs implies that the combination of multi-modal and universal strategies may be particularly important. There is growing evidence which suggests that multi-modal prevention strategies improve effectiveness [27,57], a finding which is consistent with problem-behavior theory that espouses the importance of concurrently targeting multiple domains [6,58]. Thus, it is not surprising that multi-modal programs appear to outperform uni-modal programs. The fact that this effect was only evident for universal programs, however, is somewhat puzzling given that the limited evidence available points to the potential efficacy of both universal and targeted approaches [59]. It may be that utilizing a multi-modal strategy which facilitates a broad-spectrum approach is necessary for universal programs which endeavor to reach a widely varied population. Conversely, targeted programs which are designed to address an identified high-risk population may not similarly necessitate a multi-modal approach and perhaps a multi-modal strategy, in targeting multiple domains, may actually dilute and offset important messages. Alternatively, it may be that the poor quality of the targeted multi-modal programs resulted in an underestimation of efficacy or that targeted programs overall are not effective. Given that most of the research to date has focused on universal programming, future research is needed which specifically addresses targeted approaches.
Table 8. Significant and non-significant outcome data and quality ratings, as a function of program design for booster sessions.

Individual program components
Participant age. Overall, programs targeting early and middle adolescence may yield small beneficial effects, whereas programs targeting late adolescence may not be effective, as no statistically significant results were reported. Given that the estimated age of initiation for cannabis use is between 15.9 and 18.4 years [60,61], and that early initiation is associated with an increased risk for problematic outcomes, it may be that utilizing primary prevention strategies for late adolescence (greater than 18 years) is too late to avert use. The substance use literature has predominantly reported inconclusive findings regarding the optimal developmental stage for program delivery [17,20]. An analysis of participant age as a function of program type revealed that universal multi-modal programs may benefit from addressing early adolescence, whereas targeted uni-modal programs may be more effective when targeting middle adolescence, though the latter finding was not reliable. Future research should directly examine the relative efficacy of cannabis prevention delivered during early and middle adolescence.
Program facilitator. Consistent with the general substance use [17,21] and cannabis-specific [25] prevention literature, programs utilizing non-teacher facilitators or multiple facilitators may be more effective than programs utilizing teachers only, when considering the magnitude of statistically significant median effect sizes. This finding was evident only for universal multi-modal programs, with either statistically non-significant or unreliable outcome data reported across other categories. Importantly, an assessment of studies reporting statistical significance indicated that programs utilizing teachers and multiple facilitators were more likely to yield statistically significant outcomes than programs utilizing non-teacher facilitators. These marked discrepancies in patterns of results highlight the importance of conducting a holistic assessment of both statistical and clinical significance. Insufficient data were available to further specify which non-teacher facilitators and which combinations of multiple facilitators were more effective, as at most two studies per category reported statistically significant effect sizes for non-teacher facilitators and multiple facilitators. Future research is needed to clarify these findings.
Program duration. Inconsistent with previous cannabis-specific research [25], the results of the current study suggest that programs shorter in duration may be more efficacious than longer programs. On the whole, results pertaining to program duration largely have been inconclusive [20,62]. Tobler et al. [17], however, found that while program efficacy was not related to program duration overall, a re-analysis of duration as a function of program type (i.e., interactive or non-interactive) revealed that interactive programs benefited from longer duration, a benefit which was not evident for non-interactive programs. In accordance, the results of the current study revealed apparent differences in program efficacy as a function of program type; specifically, universal multi-modal programs may be more effective when shorter in duration, whereas targeted uni-modal programs may benefit from longer duration, though this latter result was not reliable. Further research is needed to confirm the relative efficacy of program duration (both time period and number of sessions) as a function of program type.
Boosters. The majority of studies utilizing booster sessions yielded statistically significant results. An assessment of the magnitude of statistically significant median effect size data, however, revealed that only universal multi-modal programs were reliably associated with effective outcomes when boosters were implemented. While this finding may be surprising given that previous literature has consistently found that booster sessions are associated with both increasing and maintaining program efficacy [23,63], it may be that the effectiveness of boosters is intrinsically linked to and may be dependent on aspects of program design (e.g., content or format; [62]). Gottfredson and Wilson [59] suggest that conclusions pertaining to booster sessions are largely drawn from a handful of instructional programs, presenting analogous program designs (e.g., LST) [54], thus future research should continue to assess the relative efficacy of implementing boosters for a variety of program designs. In addition, further research is needed to ascertain the specific details (e.g., length, time of implementation, and content) of effective booster sessions, an assessment that was beyond the scope of the current review due to poor reporting.

Quality of studies
Despite adopting rigorous inclusion criteria such that only RCTs published in peer-reviewed English-language journals were eligible for inclusion, on the whole, the quality of included studies was quite poor. Problems of poor quality were related to high levels of attrition and missing data, the use of inadequate outcome measures, inadequate reporting of program components and implementation fidelity, as well as generally poor reporting of salient methodological features (e.g., methods of blinding and baseline cannabis use outcomes). While high levels of attrition, in general, pose a serious threat to internal and external validity [64], in the current review levels of attrition were particularly problematic as they often were differentially related to baseline cannabis use. While some studies applied statistical methods to reduce the bias caused by differential attrition (e.g., intent-to-treat analysis, inclusion of baseline cannabis use as a covariate), future research should focus on maintaining retention throughout follow-up periods [65,66]. In addition, included studies often failed to report or assess implementation fidelity (i.e., program exposure and program adherence), rendering an assessment of specific program curriculum components meaningless. High implementation fidelity has been highlighted as a key feature in achieving program effectiveness [67]; thus, reporting implementation fidelity in prevention trials is necessary to facilitate an adequate and appropriate assessment of program efficacy.
Marked variability of statistical methodology, cannabis use outcomes, and follow-up periods across studies rendered a meaningful meta-analysis untenable. Future research may benefit from the development of a standardized procedure for assessing cannabis use which stipulates the use of psychometrically verified assessments as well as a directive timeline for appropriate follow-up intervals. In addition, studies often failed to report important statistical information (e.g., group sample size, standard deviations, mean scores), which is required to calculate effect size and confidence interval values. Solely presenting traditional null hypothesis testing is no longer considered sufficient as it only provides information regarding statistical significance and does not facilitate a meaningful assessment of clinical or practical significance.
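To show why group means, standard deviations, and sample sizes are all needed, the following is a minimal sketch of how a standardized mean difference (Cohen's d) and an approximate 95% confidence interval can be computed from them. It assumes a pooled-standard-deviation form of d and a common large-sample approximation for its standard error; the exact formulas used in this review are not stated in this section, and the function names, the example numbers, and the sign convention (program group minus control group) are illustrative assumptions.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference (program minus control) using the pooled SD."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def d_confidence_interval(d, n_t, n_c, z=1.96):
    """Approximate CI for d from the large-sample standard error (an assumption)."""
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d - z * se, d + z * se


# Hypothetical summary statistics for illustration only; not data from any trial.
d = cohens_d(mean_t=10.0, sd_t=2.0, n_t=50, mean_c=11.0, sd_c=2.0, n_c=50)
ci_low, ci_high = d_confidence_interval(d, n_t=50, n_c=50)
print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

If any one of the six inputs is missing from a trial report, neither the effect size nor its interval can be recovered from a p-value alone, which is the reporting gap described above.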

Implications for Research
Future research must become more rigorous in reporting important methodological program characteristics (e.g., program content, delivery mode, and program fidelity) and salient statistical information (e.g., means and standard deviations, sample size values, and problems with attrition). The development of such procedures will not only enable a quantitative assessment of outcome data, which was not feasible in the current review, but will also facilitate a more rigorous assessment of all program components. In addition, results of the current study highlight the importance of assessing the inter-dependent relationship between individual program components and program type. Specifically, individual program components appeared to affect program efficacy as a function of program design. Thus, it seems pertinent that future research acknowledges the inter-dependent nature of different elements of program design and modifies assessment procedures to reflect these relationships. Furthermore, future research should evaluate the potential efficacy of universal multi-modal designs that target early adolescence, are short in duration, implement boosters, and utilize non-teacher facilitators, rather than simply continuing to disseminate existing programs.
In addition, research is needed to assess theoretical foundations for effective program development and to identify specific elements of effective program content and design. Prior research in the broader substance use literature [18–20] has often dichotomized information on program theory (e.g., psychosocial and non-psychosocial), program content (e.g., affective vs. knowledge-based), and program design (e.g., interactive and non-interactive) to provide a basic understanding of effective components. While these categorizations are informative, they do not provide sufficient or holistic information to enable specific developments in program components or allow for program replication. If the quality of program reporting improves, a more rigorous assessment of effective theory-driven components, content areas (e.g., interpersonal skills, intrapersonal development, and substance use knowledge), and delivery methods (e.g., discussions, role plays, and computer activities) may be feasible. As the word limit of certain scientific journals may not allow for sufficient reporting, a more rigorous examination of program manuals is required or authors need to focus on submitting to journals without strict word limits. Further research is also needed to substantiate suggestions that cannabis-specific content may not provide any additional benefit in preventing cannabis use, over and above general substance use prevention strategies. In addition, the long-term follow-up of prevention studies needs to extend past high school. In order to determine if primary prevention programs are effective in reducing harms, not just use, research needs to examine how long programs can delay use and whether programs universally prevent cannabis use or only work for those who would have become infrequent users.
Finally, as the preponderance of studies included in this review were conducted in the United States, the above recommendations for future prevention research need to be carried out in a variety of different countries. Prevention programs that work in the United States may not be effective in countries or cultures that have different rates of teenage cannabis use or different views on how and when cannabis use should be discussed. For example, educational systems in countries other than the United States may prohibit drug prevention programs from being delivered during early adolescence. Cultural modifications need to be examined for their efficacy before policymakers adopt a specific program.

Implications for Policy
Primary prevention strategies are often considered the most valuable approaches for targeting substance use as they can be extremely cost-effective [10,68] and have an enormous capacity to prevent substance use across a wide range of individuals. Thus, it is imperative that dissemination of primary prevention is consistent with the scientific literature. Given that reliable and clearly discernible patterns of program efficacy remain largely inconclusive, further research is needed to identify effective prevention strategies. At this point in time, policymakers should be concerned with allotting money for high-quality primary prevention research studies with long-term follow-up that continues beyond high school. Although this review has highlighted that some programs work, their effects often are trivial to small over a few years, and often they are not adopted or implemented as intended. If programs cannot be carried out with high fidelity during the research process, it is likely they also will be poorly adopted and implemented once mandated. The two studies that achieved the highest quality ratings in this review examined programs delivered by a computer. Given the growing use of technology, and the ability of computerized interventions to be delivered as intended, policymakers should strongly consider their use for program delivery. Further, policymakers need to strongly consider the potential benefits of programs that target early adolescence, are short in duration, and utilize booster sessions and non-teacher facilitators, rather than simply disseminating programs that may appeal to educational systems or concerned adults. After all, it is the children for whom the programs are intended. Lastly, policymakers need to consider that when programs work, they may only work minimally, and additional consideration of secondary prevention programs may be necessary.

Strengths and limitations
The current review extended previous literature by providing a broad-spectrum approach to assessing primary prevention program efficacy. The inclusion of universal and targeted programs, uni-modal and multi-modal strategies, in addition to the assessment of individual program components, facilitated a comprehensive synthesis of existing primary prevention strategies. In addition, given that cannabis is the most widely used illicit drug worldwide [1,69] and that individually focused reviews of cannabis prevention are scarce, the specific focus on cannabis use adopted in this review provided much needed knowledge and direction. Furthermore, including quality ratings as well as presenting both significant and non-significant outcome data provides an unbiased and comprehensive synthesis of included studies.
Nevertheless, there are several limitations that must be considered. First, the inclusion of only peer-reviewed published articles may have inadvertently biased prevention results, as studies reporting statistically significant findings are more likely to be published [70]. The inclusion of grey literature, in addition to published literature, may provide a more wide-ranging overview of available programs. Second, an appropriate meta-analysis was not feasible due to the marked heterogeneity of study design and measurement outcomes. As a result, a narrative review that synthesized a wide variety of approaches was utilized. The diversity within a prevention strategy (e.g., different delivery modalities for universal uni-modal programs) may have obscured important findings. Third, multiple effect size values were often reported for an individual study and the inconsistency of the magnitude of reported effect size values across studies may have skewed results. An improvement of methodological and reporting quality of prevention trials would facilitate an appropriate quantitative analysis. Fourth, the individual program components assessed in this study also were limited due to poor reporting of salient and important information. Fifth, all included studies relied on self-reported cannabis use, which may not provide an accurate measure of cannabis use; however, evidence has supported the validity of such self-reported data [71]. Lastly, 21 of the 49 articles that met inclusion criteria were excluded due to the unavailability of data. If data were available, inclusion of these studies may have resulted in vastly different outcomes.

Conclusions
Overall, the current study suggests primary prevention programming may avert cannabis use. Although reliable and discernible patterns for program efficacy remain largely inconclusive, results of the current study highlight the importance of assessing the relative efficacy of all program types and the interdependent relationship of program type and individual program components. Substantial work is needed to improve the methodological and statistical reporting quality of effectiveness trials. The improvement of study quality, in addition to continued research developing new models of prevention programs that consider the inter-dependent nature of individual program components, will enable a successful approach to preventing cannabis use. Given the high prevalence of cannabis use in young people [2,3] and the extent of problems associated with early initiation [6,7], further developments in this area are pertinent.

Supporting Information
Appendix S1 Search strategy for CINAHL (Ovid) 1987-2011. Limits included English language, peer reviewed and human. (DOCX)
Appendix S2 Quality Criteria (adapted from EPOC Risk of Bias).