Table 1.
The scientific and engineering practices and crosscutting concepts as listed in the Framework [19].
Table 2.
Constructed and selected response criteria for the scientific practice of Developing and Using Models from the 3D-LAP. Differences are highlighted in bold.
Fig 1.
Examples of the criteria found in the 3D-LAP for core ideas, one each from biology, chemistry, and physics.
Fig 2.
Assessment task from an introductory biology course that elicits evidence of three-dimensional learning.
Panel A shows the parts of the task that meet each of the criteria for the Developing and Using Models scientific practice (see Table 2). Panel B shows the parts of the task that meet the criteria for both the crosscutting concept (Structure and Function, given in the text above) and core idea (Structure and Function, see Fig 1). Further analysis of this task is provided in the S1 Supporting Information.
Fig 3.
Comparison of two exams characterized using the 3D-LAP.
The first row of each diagram shows the question number. In the last three rows, cells shaded blue, green, and red indicate evidence for a scientific practice, crosscutting concept, or core idea, respectively. Questions 21–23 on the Chemistry B exam are constructed response. All other questions shown are selected response.
Fig 4.
Comparison of the percentage of exam points assigned to tasks that were coded with each of the three dimensions for two exams per discipline.
The Chemistry A and B exams are the same ones shown in Fig 3.
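As an illustration of the quantity plotted here, the following minimal sketch computes, for each dimension, the percentage of exam points assigned to tasks coded with that dimension. The data layout (a list of dictionaries with `points` and `dimensions` keys), the dimension labels, and the function name are assumptions made for this example and are not taken from the 3D-LAP itself.

```python
from typing import Dict, List

# Hypothetical dimension labels chosen for this sketch.
DIMENSIONS = ("scientific_practice", "crosscutting_concept", "core_idea")


def percent_points_by_dimension(tasks: List[dict]) -> Dict[str, float]:
    """Return the percentage of total exam points coded with each dimension.

    Each task is assumed to be a dict with:
      - "points": the points the task is worth
      - "dimensions": the set of dimension labels the task was coded with
    A task coded with several dimensions counts toward each of them,
    so the percentages need not sum to 100.
    """
    total_points = sum(task["points"] for task in tasks)
    if total_points <= 0:
        raise ValueError("exam must have a positive number of points")

    percentages = {}
    for dim in DIMENSIONS:
        coded_points = sum(
            task["points"] for task in tasks if dim in task["dimensions"]
        )
        percentages[dim] = 100.0 * coded_points / total_points
    return percentages


# Invented example exam: three tasks worth 20 points in total.
example_exam = [
    {"points": 10, "dimensions": {"core_idea"}},
    {"points": 5, "dimensions": {"scientific_practice", "core_idea"}},
    {"points": 5, "dimensions": set()},  # a zero-dimensional task
]
example_result = percent_points_by_dimension(example_exam)
```

Because a single task can carry more than one dimension, the three percentages are computed independently rather than as shares of a whole.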
Fig 5.
Comparison of two exams from each discipline, displaying the percentage of exam points assigned to tasks coded with scientific practices, crosscutting concepts, and core ideas.
This representation shows that, although there are few zero-dimensional tasks in physics, the vast majority of the tasks address core ideas, and the other two dimensions are almost never elicited.
Table 3.
Possible coding of tasks by two coders and the resulting code for inter-rater reliability.
Table 4.
Percent agreement values for each of the three dimensions, used to determine inter-rater reliability when applying the 3D-LAP to the assessment tasks in our dataset.
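For readers unfamiliar with percent agreement, the sketch below shows one common way to compute it per dimension: the share of tasks on which two coders assigned the same code. The data layout, dimension labels, and function name are assumptions made for this example; the example codes are invented and do not reproduce the values reported in Table 4.

```python
from typing import Dict, List

# Hypothetical dimension labels chosen for this sketch.
DIMENSIONS = ("scientific_practice", "crosscutting_concept", "core_idea")


def percent_agreement(
    coder_a: List[Dict[str, bool]],
    coder_b: List[Dict[str, bool]],
) -> Dict[str, float]:
    """Return, per dimension, the percentage of tasks coded identically.

    Each list holds one dict per task, mapping a dimension label to
    True (evidence for the dimension) or False (no evidence).
    The two lists must describe the same tasks in the same order.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same number of tasks")
    if not coder_a:
        raise ValueError("at least one task is required")

    agreement = {}
    for dim in DIMENSIONS:
        # Count tasks where both coders gave the same code for this dimension.
        matches = sum(1 for a, b in zip(coder_a, coder_b) if a[dim] == b[dim])
        agreement[dim] = 100.0 * matches / len(coder_a)
    return agreement


# Invented codes for four tasks from two coders.
codes_a = [
    {"scientific_practice": True, "crosscutting_concept": False, "core_idea": True},
    {"scientific_practice": False, "crosscutting_concept": False, "core_idea": True},
    {"scientific_practice": True, "crosscutting_concept": True, "core_idea": True},
    {"scientific_practice": False, "crosscutting_concept": False, "core_idea": False},
]
codes_b = [
    {"scientific_practice": True, "crosscutting_concept": False, "core_idea": True},
    {"scientific_practice": True, "crosscutting_concept": False, "core_idea": True},
    {"scientific_practice": True, "crosscutting_concept": True, "core_idea": True},
    {"scientific_practice": False, "crosscutting_concept": False, "core_idea": True},
]
agreement_result = percent_agreement(codes_a, codes_b)
```

Percent agreement does not correct for agreement expected by chance, so it is usually read alongside the coding scheme summarized in Table 3 rather than as a standalone statistic.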