Fig 1.
Extraction and processing of text from curricular documents.
Slides, notes, and other documents were collected for every session in the pre-clerkship curriculum for one medical school class. Text was extracted from the documents using PDFMiner. Text from documents for the same session was combined and then processed to remove non-alphanumeric characters and common words, resulting in a collection of meaningful words. The LDA MALLET library was used to generate a topic model.
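The combining and cleaning steps described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the text has already been extracted from each file, and the names `STOP_WORDS`, `combine_by_session`, `clean_text`, and `preprocess` are hypothetical. The stop-word list in particular is a tiny placeholder, since the source does not say which list of common words was used.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (assumption: the source
# does not specify which "common words" were removed).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "to",
              "is", "are", "for", "with", "on"}


def combine_by_session(documents):
    """Concatenate the extracted text of all documents sharing a session id.

    `documents` is an iterable of (session_id, text) pairs.
    """
    combined = defaultdict(list)
    for session_id, text in documents:
        combined[session_id].append(text)
    return {sid: " ".join(parts) for sid, parts in combined.items()}


def clean_text(text):
    """Lower-case, replace non-alphanumeric runs with spaces, drop stop words."""
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return [token for token in text.split() if token not in STOP_WORDS]


def preprocess(documents):
    """Return {session_id: list of remaining words} ready for topic modeling."""
    combined = combine_by_session(documents)
    return {sid: clean_text(text) for sid, text in combined.items()}
```

The resulting per-session word lists are the kind of input a topic-modeling tool would consume; the topic-modeling step itself is left out of the sketch.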
Table 1.
Most relevant words for select topics.
Table 2.
Mapping of sessions to competencies.
Fig 2.
Quantitative mapping of sessions to competencies.
Each topic was mapped to one competency based on an evaluation of the 30 most salient words for the topic. The scores for how well each topic represented the content of each session were summed across all sessions to generate a total score for each topic. The total scores for topics that mapped to the same sub-competency were summed to generate a final score for each sub-competency. HP: Health Promotion and Disease Prevention, MTD: Mechanism and Treatment of Disease, CR: Clinical Reasoning, PC: Patient Care, PR: Professionalism, CM: Communication, RS: Responsibility to Society, CDK: Creation and Dissemination of Knowledge, PS: Physician as Scientist.
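The two-stage summation in this caption can be illustrated with a short sketch. It assumes each session comes with one score per topic (for example, the topic weights from a topic model) and that the topic-to-competency assignment is supplied as a list. The function name `competency_scores` and both input layouts are hypothetical.

```python
from collections import defaultdict


def competency_scores(session_topic_scores, topic_to_competency):
    """Sum topic scores over sessions, then over topics sharing a competency.

    session_topic_scores: {session_id: [score for topic 0, topic 1, ...]}
    topic_to_competency:  list where entry i is the code (e.g. "MTD")
                          assigned to topic i (assumed, hand-curated input).
    """
    # Stage 1: total score per topic across all sessions.
    topic_totals = [0.0] * len(topic_to_competency)
    for scores in session_topic_scores.values():
        for topic, score in enumerate(scores):
            topic_totals[topic] += score

    # Stage 2: add up the totals of topics mapped to the same competency.
    final = defaultdict(float)
    for topic, total in enumerate(topic_totals):
        final[topic_to_competency[topic]] += total
    return dict(final)
```

For example, with two sessions scored `[0.5, 0.25, 0.25]` and `[0.25, 0.5, 0.25]` and topics mapped to `["MTD", "CR", "MTD"]`, the topic totals are 0.75, 0.75, and 0.5, so MTD receives 1.25 and CR receives 0.75.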
Table 3.
Mapping course content to competencies.
Table 4.
Mapping pedagogies to competencies.
Fig 3.
Change in use of gender and gender identity terms across four years of pre-clerkship curricula.
The total number of gender identity terms used in four years of pre-clerkship curricula (triangles). The number of sessions in which 5% of the terms in the documents were gender identity terms (circles).
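Both quantities plotted in this figure could be computed along the following lines. This is a sketch under stated assumptions: the 5% criterion is read here as "at least 5%", the term list is supplied by the caller (the source does not list the terms), and `gender_term_stats` is a hypothetical name.

```python
def gender_term_stats(session_tokens, gender_terms, threshold=0.05):
    """Return (total term count, number of sessions at or above threshold).

    session_tokens: {session_id: list of words in that session's documents}
    gender_terms:   set of terms to count (assumed input, not from the source)
    threshold:      minimum fraction of a session's words that must be
                    gender identity terms (assumption: "at least 5%")
    """
    total_terms = 0
    flagged_sessions = 0
    for tokens in session_tokens.values():
        hits = sum(1 for token in tokens if token in gender_terms)
        total_terms += hits
        # Skip empty sessions to avoid dividing by zero.
        if tokens and hits / len(tokens) >= threshold:
            flagged_sessions += 1
    return total_terms, flagged_sessions
```

Running this once per class year would give one triangle (total terms) and one circle (flagged sessions) for each year.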
Fig 4.
Integration of content has declined over the past four years.
A. A network map of sessions in the pre-clerkship curriculum. Each node represents a session, and a line between nodes indicates that the sessions cover related content. Sessions from three courses are highlighted: Clinical Skills (green), Anatomy (yellow), and Homeostasis (red). The network map can be accessed online at http://medcurriculum.org/pre_clerkship_network. B. Distribution of connections for sessions. The histogram shows the number of sessions that are connected to … Integration for each class year was determined by two metrics of connectivity in network graphs of sessions. Network density (circles) is a measure of the fraction of all possible connections in a network graph that exist in the graph. Mean connections (triangles) is the mean number of connections for each session in the graph. Both show a slight decrease over the past four years. B. Integration within courses has remained stable while integration between courses has declined. The mean numbers of connections between sessions in the same course (triangles) and between sessions in different courses (circles) are shown.
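The connectivity metrics named in this caption have simple closed forms for an undirected graph with n sessions and m connections: density is 2m / (n(n - 1)) and the mean number of connections per session is 2m / n. The sketch below computes both, and also splits the mean into within-course and between-course parts. It assumes the connections are already given as pairs of session ids; how two sessions were judged to "cover related content" is not specified in the source, and the names `network_metrics` and `course_of` are hypothetical.

```python
def network_metrics(sessions, edges, course_of):
    """Compute density and mean connections for an undirected session graph.

    sessions:  list of session ids (nodes)
    edges:     iterable of (session_a, session_b) pairs (assumed input)
    course_of: {session_id: course name}
    """
    n = len(sessions)
    # Treat (a, b) and (b, a) as the same connection and ignore self-loops.
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    m = len(unique_edges)

    # An edge is "within course" when both endpoints share one course.
    within = sum(1 for e in unique_edges
                 if len({course_of[s] for s in e}) == 1)
    between = m - within

    return {
        # Fraction of the n(n-1)/2 possible connections that exist.
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        # Each connection contributes to the count of two sessions.
        "mean_connections": 2 * m / n if n else 0.0,
        "mean_within_course": 2 * within / n if n else 0.0,
        "mean_between_courses": 2 * between / n if n else 0.0,
    }
```

Computing these values separately for each class year's graph would yield the year-over-year series the panels describe.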