Inquiry-based training improves teaching effectiveness of biology teaching assistants

Graduate teaching assistants (GTAs) are used extensively as undergraduate science lab instructors at universities, yet they often have minimal instructional training and little is known about effective training methods. This blind randomized control trial assessed the impact of two training regimens on GTA teaching effectiveness. GTAs teaching undergraduate biology labs (n = 52) completed five hours of training in either inquiry-based learning pedagogy or general instructional “best practices”. GTA teaching effectiveness was evaluated using: (1) a nine-factor student evaluation of educational quality; (2) a six-factor questionnaire for student learning; and (3) course grades. Ratings from both GTAs and undergraduates indicated that the inquiry-based learning pedagogy training had a positive effect on GTA teaching effectiveness.


Introduction
Graduate teaching assistant (GTA)-run introductory science courses are the norm at higher education institutions in North America, Australia and New Zealand, and are becoming more prevalent elsewhere [1]. GTAs are part-time employees (often research students) hired to lead lab sessions, grade papers, and provide assistance to course instructors, and account for many of the contact hours undergraduates have with the department. GTAs have a powerful influence on undergraduate student learning, but not without cost [2,3]. Graduate students gain income and teaching experience, and science departments gain a pool of inexpensive laborers versed in discipline-specific content, but because GTA teaching can be inconsistent, and students find it difficult to learn from inexperienced teachers, ensuring that undergraduates receive high-quality teaching from GTAs is one of the main teaching issues facing science departments [4][5][6][7].
One solution includes general teaching training for GTAs, but the evidence for the effectiveness of this training is mixed [2][3][4][8,9]. The training content is not standardized, as it is for schoolteachers, and attendance is not compulsory. A 1997 survey of 153 biology departments in the United States indicated that 49% of GTA programs required no formal training whatsoever [10]. For those programs that do require formalized GTA training, the form that training takes can vary widely, from university-wide mass orientation workshops to subject-based instruction from instructors, supervisors or peers [4]. Content might be as important as format: a 2004 study cited the lack of adequate preparation for facilitating open inquiry labs as a major difficulty both for undergraduate students and the GTAs themselves [5].
Well-constructed science lab activities are powerful learning tools; through guided inquiry, undergraduates gain first-person experience of scientific principles and phenomena learned in lectures, and learn to employ experimental methods to solve discrete problems [11]. As such, training that exposes GTAs to the inquiry process may improve teaching effectiveness. Common misperceptions about undergraduate labs include: (a) that the purpose of lab activity is to recapitulate content learned through lectures or from readings; or (b) that the purpose is to familiarize students with cutting-edge experimental techniques. Such misperceptions ignore what is unique about the lab experience: the opportunity to practice applying the scientific method. Many introductory labs deliberately use older, highly predictable experimental setups to teach students basic scientific inquiry skills.
Inquiry-based learning is a pedagogical approach rooted in constructivist learning theories that advocates teaching cognitive skills through open-ended exploration by the learner [12,13]. Inquiry activities commonly include designing protocols, developing procedures and proposing unified explanations for experimental or anecdotal phenomena. Structured-inquiry activities, which are commonly used as undergraduate lab exercises, typically ask students to propose specific explanations, rooted in relevant course content, for data gathered from experimental trials [14,15]. Teaching "facts", "knowledge" or "content" is only the secondary objective of such activities: the main goal is to develop the reasoning skills that are required to plan, execute and interpret scientific experiments.
Successful inquiry-oriented initiatives such as POGIL (Process Oriented Guided Inquiry Learning) emphasize the coordination of content and methodological skills during science learning activities [16][17][18], but implementing large-scale reorganization of teaching materials and curricular content calls for substantial investment in time and money, and may be beyond the capability of many course instructors. Moreover, reorienting major introductory course lecture content toward inquiry learning is a considerable effort, and requires departmental consensus to implement effectively. The current study investigated whether it is possible to increase learning effectiveness in an inquiry-based activity without taking on large-scale curricular redesign.
Previous studies have suggested that training GTAs on the use of inquiry-based methods will improve GTA teaching effectiveness in undergraduate labs [19,20]. Although few empirical studies have identified which kinds of GTA training content actually improve general lab GTA teaching effectiveness, and none have specifically focused on inquiry-based methods, some authors have used correlational data or made general recommendations regarding GTA teaching [7,13,21]. Inquiry-based methods may be of particular value to GTAs teaching introductory labs, since labs are designed to teach scientific enquiry as well as course content [22][23][24].
Here, our aim was to test whether grounding GTA training in inquiry-based learning theory would measurably improve teaching effectiveness in undergraduate biology labs, as assessed by undergraduate and GTA responses to online questionnaires, as well as standardized student grades. We predicted that explaining inquiry-based methods would improve lab GTA teaching performance. Because GTA training is extremely brief (typically less than five hours per year), it might also be reasonable to argue that the limited time would be better spent on practical teaching activities than on theoretical ones. To test this hypothesis, we conducted a semi-randomized control trial to compare two training regimens for GTAs of introductory biology labs: a control regimen that taught various "best practices" associated with lab teaching; and an experimental regimen that explained inquiry-oriented pedagogy to GTAs.

Methods
i) Participants
This study took place during the first semester of the academic year at a large research-intensive North American university with about 25,000 students. Of 126 GTAs in total, 54 volunteered to take part in the study and met the study inclusion criteria. Each of these GTAs was employed as a GTA in a single lab section of an introductory biology course in the Fall 2012 semester, and was a graduate student registered in an MSc or PhD program in the Department of Biology during the same semester. An online pre-assignment questionnaire recorded experience and program data for each GTA two weeks before the onset of the training. On the basis of these data, participants were divided into two strata by academic program (Masters or PhD), and GTAs within each stratum were allocated to either a "best practice" (Control) or inquiry-based learning pedagogy (Inquiry) training group. This allocation was semi-random: the proportion of GTAs in each academic program was held constant between training groups to balance the experimental design and avoid confounding by academic program, but allocation was otherwise random. Two participants from the control group discontinued the study after the first training session; 52 GTAs finished the training sessions (24 control and 28 inquiry GTAs) and all of these 52 participants completed an online survey conducted at the end of the semester.
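The semi-random allocation described above can be illustrated with a short sketch. This is not the authors' procedure, which the text does not specify in detail; it is a minimal stratified randomization under the assumption that GTAs are shuffled within each academic program and then split as evenly as possible. The function name, the `(gta_id, program)` input format and the even split are all hypothetical.

```python
import random

def allocate_stratified(gtas, seed=None):
    """Assign GTAs to 'control' or 'inquiry' at random within each program.

    gtas: list of (gta_id, program) tuples, e.g. ("g01", "MSc").
    Returns a dict mapping gta_id -> training group.
    Assumption: each program stratum is split as evenly as possible, so the
    Masters-to-PhD ratio is the same in both training groups.
    """
    rng = random.Random(seed)  # seeded generator makes the allocation reproducible
    assignment = {}
    for program in sorted({p for _, p in gtas}):
        members = [gta_id for gta_id, p in gtas if p == program]
        rng.shuffle(members)  # random order within the stratum
        half = len(members) // 2
        for i, gta_id in enumerate(members):
            # first half of the shuffled stratum -> control, remainder -> inquiry
            assignment[gta_id] = "control" if i < half else "inquiry"
    return assignment

# Hypothetical example: 10 Masters and 8 PhD students
example = [(f"g{i:02d}", "MSc") for i in range(10)] + \
          [(f"g{i:02d}", "PhD") for i in range(10, 18)]
groups = allocate_stratified(example, seed=2012)
```

Shuffling inside each stratum keeps the assignment of any individual GTA random while fixing the program ratio between groups, which is the balancing property the paragraph describes.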
GTAs were blind to their training group assignment and were not given specific details about the experimental design. Details about their lab section were collected during the training group workshops and from the online survey. This information allowed for pre-training responses to be matched with later survey responses as well as matching undergraduate students to their lab section GTA while protecting participant confidentiality.
Undergraduate students were recruited to rate their GTAs through advertisement posters, email invitations, and social media announcements. Undergraduates were told that GTAs were part of a training experiment, and were part of more than one GTA training group, but were not told: (a) what the training groups were, or (b) which training group their GTA had been allocated to. A total of 602 undergraduates provided responses, with 352 (58.5%) completing the entire online survey. Approximately 1,250 undergraduate students were enrolled in undergraduate biology during this semester. All GTA lab groups were represented by at least one undergraduate.
At the end of the academic semester, all 16 course professors were contacted by email and notified of the nature of the training experiment (i.e., that GTA training was being manipulated in a controlled experiment, although the training groups, training regimens and the GTAs involved were not revealed), then asked to provide average course grades. Average course marks for 49 of the 52 sections taught by GTAs taking part in the experiment were collected.

ii) Training Workshops
The general format of the training for both groups was the same and the same instructor taught all sessions. Each training group attended a pair of workshops; each workshop lasted 2.5 hours for a total of 5 hours of training. The two workshops were held one week apart. The first workshop was oriented toward teaching and the second was oriented toward grading. All workshops involved a mix of PowerPoint® slides, group discussions, peer-led problem-solving sessions and question-and-answer sessions. Both training groups included the same core topics. The first session included information on: lecturing, moderating group discussions, facilitating lab activities, diagnosing problems in student understanding, giving effective prelab talks, dealing with student questions, and encouraging/motivating students in the lab. The second session included material related to marking: grading lab reports, providing useful feedback, rubric design, and speed grading tips.
Inquiry-Based Training Boosts Biology TA Teaching. PLOS ONE | www.plosone.org
Because our aim was to investigate whether inquiry pedagogy training improved GTA teaching effectiveness, the training groups mainly differed in terms of whether inquiry-based learning was explained as a theory of learning to be used in the science lab. Content for both GTA training groups was drawn from previous recommendations [9,25,26] and was common between the two training groups.
The areas of teaching focus for the control ("best practices") training group seminars were: (1) to identify effective and efficient strategies for GTAs to teach content knowledge to undergraduates during lab activities; and (2) to ensure that lab activity assessments were marked fairly. In general, the material taught to this group was designed to reflect the status quo GTA training curriculum at North American universities, and so "best practices" identified in GTA training materials or educational studies were provided to GTAs [26][27][28]. The training group focused on specific teaching practices of the GTAs themselves rather than explaining how to teach scientific enquiry; because many of the "best practices" for GTAs related to the teaching of content knowledge (e.g., "tips for teaching undergraduate students to use a light microscope") rather than the scientific method, this training group did not explicitly teach GTAs how to teach higher-order cognitive skills. A full list of the activities that took place in the two control training group seminars (teaching and assessment) is given in Tables 1 and 2.
For the inquiry pedagogy training group, teaching and grading were presented from a constructivist perspective: learning objectives were explained using the revised Bloom's taxonomy, scaffolding and inquiry facilitation were explained as teaching methods that help students learn methodological skills, and giving feedback was oriented toward encouraging further inquiry [13,20,21][29][30][31][32][33]. The areas of teaching focus for the inquiry pedagogy training group seminars were: (1) to teach GTAs how inquiry-based practices teach students to reason independently and apply the scientific method; (2) to teach GTAs how to facilitate structured inquiry and open-ended learning as lab activities; and (3) to teach GTAs how to assess inquiry. Instead of being taught how to teach undergraduates to use a light microscope, GTAs were taught how to facilitate and evaluate student-centered inquiry with respect to using a light microscope to ask scientific questions. A full list of the activities that took place in the two inquiry-based learning pedagogy training group seminars (teaching and assessment) is given in Tables 3 and 4.
The common two-seminar format meant that both training groups used the same peer-led learning activities. For example, GTAs of both training groups paired up to spend time practicing answering common student questions, but these activities took different forms based on the learning goals of the respective training group. GTAs in the control training group practiced answering questions about lab protocol by answering common questions about the relevant section of the lab manual. GTAs in the inquiry pedagogy training group practiced answering questions about lab protocol by learning to design questions to help students figure out the next steps in the protocol for themselves. Extensive efforts were made to ensure the consistency and reliability of the workshops for both training groups. GTAs in both training groups were instructed not to discuss the content of their GTA training sessions with one another or with undergraduates.

iii) Measures
We assessed teaching effectiveness for the GTAs in the two training groups using three measures: (a) a 32-item, nine-factor student evaluation of educational quality (SEEQ) [34][35][36]; (b) a 6-item cognitive learning evaluation (CLE) questionnaire; and (c) standardized mean student grade.
a) Student Evaluation of Educational Quality. The Student Evaluation of Educational Quality (SEEQ) inventory is a nine-factor, 32-item validated survey instrument that has been widely used to evaluate undergraduate instruction at North American universities. The original SEEQ inventory [34] was written to appraise course instructors, but for this study we adapted it to evaluate GTAs through simple modifications of the text (see Table 5). Nine identifiable dimensions of teaching quality can be evaluated using SEEQ: (1) learning/academic value; (2) instructor enthusiasm; (3) organization; (4) group interaction; (5) individual rapport; (6) breadth of coverage; (7) examination and grading; (8) assignments; and (9) overall instructional ability. We used the first eight factors from Marsh's original SEEQ inventory to create two SEEQ instruments (available from the original papers [28,30]). We changed Factor 9 from its original (1982) version. Marsh's 1982 version of the SEEQ inventory did not use "overall instructional ability" but rather "course difficulty" as the ninth SEEQ factor [34,36]. GTAs do not have control over course difficulty because they do not design the course. As such, we opted to use a newer version of SEEQ developed by the University of Saskatchewan, which includes "overall instructional ability" as factor 9. After these modifications, two SEEQ instruments were created: one was used by GTAs for self-evaluation, and the other by undergraduates to rate their GTA (see Table 5 for examples). SEEQ responses were given using a five-point Likert scale and scored as: "strongly disagree" = 1; "disagree" = 2; "neutral" = 3; "agree" = 4; and "strongly agree" = 5. When participants selected "not applicable", that question was given a null value and excluded from further analyses.
b) Cognitive Learning Evaluation. The cognitive learning evaluation (CLE) instrument is a short six-factor questionnaire evaluating individual learning outcomes derived from Bloom's revised taxonomy of learning [37]. The six factors assessed were: (1) knowledge; (2) comprehension; (3) problem-solving; (4) conceptual-analytic; (5) planning; and (6) evaluation (Table 6). Factors are arranged in a hierarchical order of increasing learning depth and difficulty of instruction, on the assumption that higher factors represent more in-depth learning. Each CLE factor was assessed by a single item. Responses were collected using a five-point Likert scale and scored as: "strongly disagree" = 1; "disagree" = 2; "neutral" = 3; "agree" = 4; and "strongly agree" = 5. When participants selected "not applicable", that question was given a null value and excluded from further analyses.
c) Grades. Undergraduates self-reported their final grades for the course. Final grades reflected contributions from both lab and lecture components, although GTAs graded only lab assignments. Grading frameworks differed amongst courses, so grades were standardized by subtracting course mean GPA scores (calculated by averaging undergraduate self-reported grades per course) from individual GTA lab group GPA scores. The resulting standardized grades ranged from 3.7 to -3.1, and represented how much better (i.e., positive values) or worse (i.e., negative values) a given GTA lab group mean grade was than the average grade for any group in that course (in terms of GPA). Standardized grades were used instead of raw grades to account for the variation between courses, professors and lab material.
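The scoring rules and the grade standardization described in this section can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names, the record layout `(course, section, gpa)` and the sample values are hypothetical, and the GPA scale used in the study is not assumed here.

```python
from collections import defaultdict
from statistics import mean

# Likert labels and scores as given in the text
LIKERT_SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def score_response(label):
    """Return the 1-5 score for a Likert label, or None for a null response
    such as 'not applicable'."""
    return LIKERT_SCORES.get(label.strip().lower())

def factor_mean(labels):
    """Mean score across the items of one factor, excluding null responses."""
    scores = [s for s in map(score_response, labels) if s is not None]
    return mean(scores) if scores else None

def standardized_grades(records):
    """Lab-section mean GPA minus the mean GPA of the course it belongs to.

    records: iterable of (course, section, gpa) self-reported grades.
    The course mean averages all self-reported grades in that course.
    Returns a dict keyed by (course, section).
    """
    by_course = defaultdict(list)
    by_section = defaultdict(list)
    for course, section, gpa in records:
        by_course[course].append(gpa)
        by_section[(course, section)].append(gpa)
    return {
        key: mean(grades) - mean(by_course[key[0]])
        for key, grades in by_section.items()
    }

# Hypothetical data: one course, two lab sections
records = [("BIOL-A", "L1", 3.0), ("BIOL-A", "L1", 4.0),
           ("BIOL-A", "L2", 2.0), ("BIOL-A", "L2", 3.0)]
print(standardized_grades(records))
```

A positive value means the lab section's mean grade sits above its course mean, and a negative value means it sits below, which matches the interpretation given for the standardized grades.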

iv) Procedures
GTAs completed the pre-assignment questionnaires during the first week of a 13-week university semester. Training sessions were held on the third and fifth weeks of that same semester.
The SEEQ, CLE, and other demographic measures were collected from GTAs and undergraduates using a 116-item online survey hosted by Qualtrics Inc. The surveys for GTAs and undergraduate students were nearly identical, with only minor modifications relevant to their roles (e.g., references to "me (the GTA)" in the GTA survey were "my GTA" in the undergraduate survey). GTAs completed the surveys near the end of the semester. Approximately 7 weeks elapsed between the first GTA training session and the completion of the survey. Undergraduates were surveyed during a 45-day period beginning on the last day of exams for that semester. Data from course professors were collected during this same 45-day period.

v) Analyses
A pretest logistic regression of program distribution indicated that there were no significant differences in the ratio of Masters to PhD students between training groups (R² = .0003, χ²(1,1) = 0.02, p = .895). A similar pretest one-way ANOVA showed no significant difference in GTA experience (in years of teaching) between the two training groups (F(1,50) = 0.01, p = .933). Shapiro-Wilk tests indicated that all of the dependent variables were normally distributed.
For the SEEQ and CLE questionnaires, we ran multivariate analyses because the responses were composed of multiple non-independent subscales. Responses for these instruments were analyzed using a 2x2 between-subjects MANCOVA, and follow-up ANCOVA tests were performed where the multivariate response was significantly predicted by training group (similar to the analyses suggested in other studies [38,39]). For follow-up tests, Type I error was controlled for using a Bonferroni-corrected alpha value (e.g., the SEEQ inventory had 9 factors, so we used α = .0056). We ran separate MANCOVAs for GTA self-evaluation (n = 52 for both SEEQ and CLE) and for undergraduate evaluation (n = 47 for SEEQ, n = 50 for CLE). Two independent variables were included in our MANCOVAs: training group (control or inquiry) and GTA academic program (Masters or PhD), while previous GTA experience (in years of teaching) was included as a covariate. The dependent variables were the nine SEEQ factors and the six CLE factors. GTA self-responses were analyzed as raw ratings. Undergraduate responses were pooled for each GTA lab section and analyzed as a mean rating for each GTA. We assessed significance of training group as a multivariate effect using the Pillai's Trace statistic, since Pillai's Trace is the multivariate measure which is both: (1) most robust to multivariate non-normality; and (2) conservative when the sample size is small (<100) [40].
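Two of the steps above are simple arithmetic and can be sketched directly: the Bonferroni-corrected alpha and the pooling of undergraduate ratings into one mean per GTA lab section. The sketch assumes a family-wise alpha of .05, which is consistent with the reported α = .0056 for nine factors but is not stated explicitly in the text; the function names and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test alpha after Bonferroni correction.
    Assumption: family-wise alpha of .05 unless stated otherwise."""
    return family_alpha / n_tests

def pool_by_section(ratings):
    """Mean undergraduate rating per GTA lab section.

    ratings: iterable of (section_id, score) pairs; a score of None marks a
    null ('not applicable') response and is skipped.
    """
    by_section = defaultdict(list)
    for section_id, score in ratings:
        if score is not None:
            by_section[section_id].append(score)
    return {section: mean(scores) for section, scores in by_section.items()}

print(round(bonferroni_alpha(9), 4))  # nine SEEQ factors
print(round(bonferroni_alpha(6), 4))  # six CLE factors (derived here, not reported in the text)
print(pool_by_section([("L1", 4), ("L1", 5), ("L1", None), ("L2", 3)]))
```

Pooling to one value per lab section makes the GTA, rather than the individual undergraduate, the unit of analysis, which is why the undergraduate analyses have at most one observation per GTA.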
We also ran each MANCOVA without the covariate (i.e., as a MANOVA) to see if the significance levels of any multivariate or univariate effects depended on its inclusion. These analyses indicated that this was not the case, and we therefore present only the MANCOVA results here.
To analyze grade data, we used a two-factor ANCOVA (n = 49) to examine standardized grade scores, again with training group (control or inquiry) and GTA academic program (Masters or PhD) as independent variables and GTA experience (in years of teaching) as a covariate. Again, the analysis was rerun without the covariate (i.e., as an ANOVA) and the overall findings were consistent with the ANCOVA results. As such, only the ANCOVA results are reported here.
All statistical analyses were performed using SPSS 20. Effect sizes are reported using partial η² for multivariate tests and r for univariate tests. Partial η² indicates the estimated amount of the total variation in the response due to variation in an individual predictor [41].
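For readers who want to relate the reported statistics to one another, the standard definitions can be written out as follows. This is a sketch of textbook formulas, not a description of the authors' SPSS settings. It assumes a hypothesis with one degree of freedom (two training groups), so that s = 1; V denotes Pillai's trace, and df_1 and df_2 are the numerator and denominator degrees of freedom of the reported F statistic.

```latex
% Univariate partial eta squared
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}

% Multivariate partial eta squared from Pillai's trace V,
% with s = \min(p,\ df_{\text{hypothesis}}); here s = 1
\eta_p^2 = \frac{V}{s} = V

% Exact F statistic when s = 1
F(df_1, df_2) = \frac{V}{1 - V} \cdot \frac{df_2}{df_1}

% Check against the training-group effect reported in the Results
% (V = 0.51, df_1 = 9, df_2 = 34):
F = \frac{0.51}{0.49} \cdot \frac{34}{9} \approx 3.93
```

Under these assumptions, Pillai's trace and the multivariate partial η² coincide, which agrees with the paired values reported in the Results (for example, Pillai's trace = 0.51 and partial η² = 0.51).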

vi) Research Ethics
This study was examined and approved by the Carleton University Research Ethics Board (Project Number 13-0563). Written informed consent was obtained from all human subjects involved in this study, both GTA participants and undergraduate survey respondents. Subjects were specifically informed that: (1) they were under no obligation to continue their participation in the research project; (2) there were no known risks associated with participation; and (3) no identifying information would be collected, responses would be kept anonymous, and results would be reported only in aggregate.

Results
a) Student Evaluation of Educational Quality Results
There was not a significant effect of the covariate, GTA experience (in years), on the undergraduate responses to the SEEQ questionnaire (see Table 3 for a list of SEEQ Factors) (Pillai's trace = 0.18, F(9,34) = 0.81, p = .61, partial η² = 0.18, observed power = .33). There was a significant multivariate main effect for training group (Pillai's trace = 0.51, F(9,34) = 3.93, p = .002, partial η² = 0.51, observed power = .98; Table 7). However, there was neither a significant effect of GTA [...]

Table 5 (partial). SEEQ factors with descriptions and sample items.
(2) Instructor enthusiasm. Refers to the instructor's ability to create attentiveness and interest in the educational material on behalf of the student. High instructor ratings indicate that the instructor is creating engagement through dynamic presentation, and relating course material in a way that evokes interest. Sample item: "My GTA was energized and dynamic in conducting the course"
(3) Organization (4 items). Refers to the structure and transparency of the instructor's explanation of subject matter. High instructor ratings indicate that the instructor is relating information clearly and precisely, in a way that is easy for students to understand. Sample item: "GTA explanations were clear"
(4) Group Interaction (4 items). Refers to the ability to foster academically useful social interactions within the classroom. High instructor ratings indicate that the instructor is encouraging group work in a positive way, and is motivating students to share knowledge effectively. Sample item: "Students were invited to express their own ideas and/or question the GTA"
Note: sample items shown refer to the undergraduate survey; GTA survey contained the same items, but replaced "my GTA" with "I" (e.g., "Overall, I am a good teacher").
Items were rated on a five-point Likert scale (strongly disagree, disagree, neutral, agree, strongly agree) with a "NA/Don't Know" null response included.

Table 6 (notes only).
Note: sample items shown refer to undergraduate survey; GTA survey contained the same items, but replaced "my GTA" with "I" and "me" with "undergraduate students" (e.g., "I helped undergraduate students learn problem-solving skills").
Items were rated on a five-point Likert scale (strongly disagree, disagree, neutral, agree, strongly agree) with a "NA/Don't Know" null response included.

Discussion
These results suggest that offering inquiry-oriented pedagogy GTA training can yield gains in academic performance in undergraduate science labs and improve GTA teaching (as evaluated by both undergraduates and GTAs), particularly with respect to scientific reasoning. Inquiry pedagogy training group GTAs outperformed control training group GTAs, both with respect to SEEQ/CLE questionnaires and to self-reported undergraduate grades, and there were no items on either survey in which control training group GTAs received significantly higher ratings than inquiry pedagogy training group GTAs (this was true for both undergraduate evaluations and GTA self-evaluations). From the student- and self-evaluation data, GTAs who completed the inquiry pedagogy training received higher ratings than those in the control training group in two main areas: enthusiasm and interpersonal skills, and organization and academic value. This first result is fairly straightforward: the inquiry pedagogy training group was rated significantly higher than control training group GTAs for enthusiasm (SEEQ factor 2) and individual rapport (SEEQ factor 5), both of which are important skills in an interactive role such as a lab facilitator. The second result was that inquiry pedagogy training group GTAs received higher student ratings for organization (SEEQ factor 3) and assignments (SEEQ factor 7), both areas of the lab in which the GTA has only partial control, suggesting improvement in general teaching ability and planning skill. More specifically, undergraduates rated GTAs who completed the inquiry pedagogy training higher than those who completed the control training regimen for lower-order cognitive skills (e.g., SEEQ factor 1, CLE factors 2 and 3), higher-order cognitive skills (CLE factors 5 and 6) and overall measures of teaching ability (SEEQ factor 9).
Taken together, these results indicate that GTAs who completed the inquiry pedagogy training were rated as better organized, provided better feedback on assignments, and were better overall teachers of both higher- and lower-order skills than the control training group GTAs. We note that the differences between mean Likert scores for the two treatments are small in some cases. Likert scores have a limited range, given that only five responses are possible (i.e., a score of 1-5), so consistently lower or higher values are required for one training group to register as significantly different from the other. Since our analysis used a fairly conservative Bonferroni correction to decrease the likelihood of Type I errors, small but statistically significant differences between training groups for a specific SEEQ or CLE factor likely indicate improvements that are real but small.
Self-reported grade data confirmed the advantage of inquiry pedagogy training: the lab groups of GTAs who completed the inquiry-based learning pedagogy training reported significantly higher standardized grades than the lab groups of GTAs who completed the control training regimen. These results suggest that undergraduate students in the lab sections of inquiry pedagogy training group GTAs had higher overall course grades than did students in the lab sections of control training group GTAs. Because course grades include both a lab and a lecture component, higher overall grades may not be due entirely to the influence of the lab GTA, although effective lab teaching (from the GTA) may have additional in-lecture benefits that we did not quantify. Both training groups outperformed the average grade of the course, perhaps because some GTAs in the course did not have any training whatsoever. The finding that the grades of both GTA groups were higher than the course mean indicates that there might be some benefit to the control "best practice" training. The best practice training focused on practical skills and did use elements deemed successful in other studies [5]. It may be that this type of GTA training improves the academic value of lab activities for undergraduate students, but that GTAs, who are not expert teachers, are not able to identify how or why. We conclude that the recorded differences in (undergraduate- and GTA-rated) teaching ability between inquiry pedagogy training group GTAs and control training group GTAs are due to the fact that inquiry pedagogy GTA training improves GTAs' ability to facilitate guided-inquiry activities related to scientific reasoning. While many methods can be used to teach undergraduates content, structured-inquiry activities (i.e., labs) are specifically designed to ask students to use higher-order cognitive skills as they employ the scientific method to solve problems.
GTAs are evidently better able to teach undergraduates to "reason like scientists" after inquiry-based learning pedagogy training. We note that students rated inquiry pedagogy training group GTAs as better teachers of both the particular SEEQ factors related to inquiry facilitation (including academic value, enthusiasm, group interaction and rapport) and higher-order cognitive skills (i.e., higher CLE factors). We suggest that inquiry pedagogy training, which encourages GTAs to use active feedback as the lab activity is performed, enabled inquiry GTAs to teach more effectively using guided inquiry.
There are several limitations related to this study. First, our sample size was small (52 GTAs), and the trial had only two training groups. Second, although there was a highly significant predictive relationship between training group and mean standardized grade for GTA lab section, lab grades were only part of the total grade; it is therefore unclear to what degree this treatment might predict improvement in student grades. Third, we did not directly assess GTA teaching skills, and instead used student and GTA evaluations, along with self-reported student grades, to estimate actual GTA teaching ability. We also assume that the students who participated in our survey were representative of the larger undergraduate population. Our main recommendation is further study, on a larger scale, of inquiry-based learning pedagogy training for science lab GTAs. Although we believe our conclusions are supported by the data collected, a large, hierarchically-designed intervention experiment, involving a greater number of replicate training groups, would provide a more powerful test of our hypothesis. Further study of the reasons for the differences between undergraduate and GTA questionnaire responses with respect to individual SEEQ and CLE factors might also be valuable; it is unclear why undergraduates seem better able to identify differences between training groups than the GTAs themselves.

Conclusion
In conclusion, our data indicate that providing GTAs with a theoretical understanding of the guided-inquiry methods that underpin lab activities increases the quality of GTA teaching in introductory science labs, especially in areas related to scientific reasoning. Across all measures, inquiry-based learning pedagogical training for GTAs seems to improve teaching, even when training is limited to only five hours. We believe that our data indicate that there is value in including information on inquiry-oriented teaching methodologies in future GTA training regimens, especially where those GTAs teach inquiry-oriented activities such as teaching labs.