The authors have declared that no competing interests exist.
Conceived and designed the experiments: SVD BYH. Performed the experiments: SVD BYH. Analyzed the data: SVD. Wrote the paper: SVD BYH.
Neurotree is an online database that documents the lineage of academic mentorship in neuroscience. Modeled on the tree format typically used to describe biological genealogies, the Neurotree web site provides a concise summary of the intellectual history of neuroscience and relationships between individuals in the current neuroscience community. The contents of the database are entirely crowd-sourced: any internet user can add information about researchers and the connections between them. As of July 2012, Neurotree has collected information from 10,000 users about 35,000 researchers and 50,000 mentor relationships, and continues to grow. The present report serves to highlight the utility of Neurotree as a resource for academic research and to summarize some basic analysis of its data. The tree structure of the database permits a variety of graphical analyses. We find that the connectivity and graphical distance between researchers entered into Neurotree early has stabilized and thus appears to be mostly complete. The connectivity of more recent entries continues to mature. A ranking of researcher fecundity based on their mentorship reveals a sustained period of influential researchers from 1850–1950, with the most influential individuals active at the later end of that period. Finally, a clustering analysis reveals that some subfields of neuroscience are reflected in tightly interconnected mentor-trainee groups.
Neuroscience is a highly interdisciplinary field that draws researchers from a variety of backgrounds ranging across the sciences and humanities. Understanding how ideas are drawn into neuroscience from other fields and how they interact is of central interest to the history of science. Given the large size of the field (the annual meeting of the Society for Neuroscience regularly draws over 30,000 attendees), it is becoming increasingly difficult even for active neuroscientists to simply observe and describe the trends governing the field. These problems are ripe for computational tools that enable systematic organization and study of large data sets containing information about individual neuroscience researchers.
An academic mentorship database provides several additional benefits to a research community, allowing new members to learn the lay of the land and to place themselves within the context of their field. Several fields of science have published their own mentorship history in some form or another, including mathematics, computer science, primatology and physics
This report describes Neurotree
The dataset contained in Neurotree provides a valuable resource for quantitative study of the individuals and disciplines that have influenced neuroscience throughout its development. Because mentors often train multiple students, understanding academic mentorship also allows one to follow the divergence of theories and techniques through different descending branches of the tree. Here we describe the data that constitutes Neurotree, assess how completely and accurately it documents mentorships, and illustrate how it can be used to understand large-scale trends in the field of neuroscience.
Neurotree is accessed through a public website at
This plot shows a typical mentorship tree diagram for one researcher in Neurotree, Robert Yerkes, modified to fit in print format and to display extra details from the database. The tree mimics the style of biological family trees, in which the central node is linked downward to children (trainees) and up to parents (mentors). Each node in this graph is annotated with the year in which it was added to Neurotree (e.g., “added 2006”), illustrating how the tree has filled in over time. Numbers (e.g., “+37”) on nodes at the top and bottom of the tree indicate the number of ancestors and descendants, respectively, from that node.
Information about each researcher in Neurotree can also be displayed in a more detailed format, as in the example of Robert Yerkes shown here. Detailed information, when available, includes a photo or drawing, links out to various analyses on the Neurotree site, biographical notes and a link to a relevant off-site web page. In addition, dates and locations of mentor relationships are provided when available.
The core of Neurotree is a relational database consisting of two main tables (
Connection type | Description |
0 | Research assistant. Undergraduate, pre-bachelor's degree. |
1 | Graduate student. Work leads to master's or doctoral dissertation. |
2 | Postdoctoral fellow. Short-term employment after earning doctorate. |
3 | Research scientist. Long-term employment after doctorate. |
4 | Collaborator. Non-directional, work together influenced each other's thinking. |
Additional database tables are designed to link researchers to a set of institutions and research areas. We have imposed no restrictions on the contents of these auxiliary tables. Thus if a new institution or research area is entered by a user, this will result in the addition of a new entry to the respective table.
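The structure described above can be sketched as a pair of record types. This is a minimal illustration only: it assumes the two main tables hold researchers and connections, and all class and field names (`Researcher`, `Connection`, `start_year`, and so on) are hypothetical, since the actual column names are not given here. Only the connection type codes are taken from the table.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ConnectionType(IntEnum):
    """Connection type codes as listed in the connection type table."""
    RESEARCH_ASSISTANT = 0
    GRADUATE_STUDENT = 1
    POSTDOCTORAL_FELLOW = 2
    RESEARCH_SCIENTIST = 3
    COLLABORATOR = 4  # non-directional


@dataclass(frozen=True)
class Researcher:
    # Hypothetical fields; free-text values mirror the unrestricted
    # institution and research-area entries described in the text.
    researcher_id: int
    name: str
    institution: Optional[str] = None
    research_area: Optional[str] = None


@dataclass(frozen=True)
class Connection:
    # One row per relationship, pointing from mentor to trainee.
    mentor_id: int
    trainee_id: int
    kind: ConnectionType
    start_year: Optional[int] = None  # dates are often missing
    end_year: Optional[int] = None


def is_directional(conn: Connection) -> bool:
    """Collaborations are the only non-directional connection type."""
    return conn.kind != ConnectionType.COLLABORATOR
```

Keeping institution and research area as free text matches the unrestricted approach described here, at the cost of the inconsistencies that such an approach allows.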
The simple architecture of the tree reflects an attempt to incorporate information into the Neurotree database using an organic, unrestricted approach. The field defining an individual's research area is not restricted to a fixed set of terms and, instead, can be whatever the user adding data to the site considers appropriate. This approach leads to an obvious potential for lack of consistency, but at the same time permits a very flexible and dynamic catalog of research areas in neuroscience, which evolve rapidly. Most importantly, the flexible structure of the Neurotree database means that future researchers can amend, improve, and augment it.
As it has grown, Neurotree has been confronted with ambiguities over the precise definition of mentoring relationships, relying on modern terms such as “research assistant” and “postdoctoral fellow.” Such designations have evolved over the course of history, and differences persist between countries today. In addition, many influential individual careers have taken idiosyncratic paths that do not reduce easily to a simple set of relationships. Our preference is to be pragmatic and suggest that contributors use the term that seems most appropriate based on the stage of an individual's career and to document all relationships that substantially influenced the trainee's work (see
Another technical problem is that some scientists have multiple home institutions. We have adopted a policy that the most recent home institution should be the official institution. We acknowledge that this leads to some confusion, as when a scientist is closely associated with one location and then, near retirement, moves to another one. We plan to revise the database structure to track institutional affiliation over time.
Like Wikipedia and other crowd-sourced projects, Neurotree is publicly editable, and, as a consequence, is not guaranteed to be accurate. Formal documentation is not required for submissions, but we have implemented a simple reporting system for flagging and resolving possible errors. Error reports can be submitted by any site visitor. A volunteer group of editors validates these reports and makes appropriate changes to the database. Generally, error reports can be checked against information publicly available on the Internet. If the need arises for a more extensive discussion, editors may choose to contact the individuals who entered the information in question or to open the discussion with other editors. In the case of discrepancies that cannot be definitively resolved (typically in the case of historical figures whose biographies may be incomplete), the information in the tree is labeled as potentially unreliable.
Beyond inaccuracies in the data, the database may also be incomplete, as only a subset of mentors or trainees may be listed for any given individual. In this study, we explore how several statistical properties of the tree have evolved over time in order to understand how complete the information in the current tree is.
Neurotree can be described as a graph composed of nodes (researchers) and directional edges (mentor relationships,
The distance,
In order to characterize how prolific one individual has been in training researchers who have themselves been productive, it is possible to count offspring using the directional information in the graph. A mentorship tree is defined as the graph of nodes along the mentor-to-trainee axis from an individual researcher. In this way, the total impact,
An alternative metric has been proposed for studying fecundity based only on the researchers trained directly by a mentor rather than iteratively across generations
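The two fecundity measures discussed above can be illustrated with a short sketch. The exact formula is not reproduced in this text, so the code assumes one plausible form: each descendant contributes `decay ** d`, where `d` is the shortest number of mentorship steps from the mentor. The function names, the `decay` parameter and its default of 0.5 are illustrative assumptions, not the values used in the analysis.

```python
from collections import deque


def fecundity_index(trainees, mentor, decay=0.5):
    """Sum decay**d over every descendant of `mentor`, where d is the
    shortest number of mentorship steps from `mentor` to that descendant.

    `trainees` maps a researcher to the list of their direct trainees.
    """
    depth = {mentor: 0}
    queue = deque([mentor])
    total = 0.0
    while queue:
        node = queue.popleft()
        for child in trainees.get(node, ()):
            if child not in depth:  # count each descendant only once
                depth[child] = depth[node] + 1
                total += decay ** depth[child]
                queue.append(child)
    return total


def direct_trainee_count(trainees, mentor):
    """Alternative metric: only researchers trained directly by the mentor."""
    return len(set(trainees.get(mentor, ())))


# Toy data: A trained B and C; B trained D.
toy = {"A": ["B", "C"], "B": ["D"]}
print(fecundity_index(toy, "A"))       # two trainees at depth 1, one at depth 2
print(direct_trainee_count(toy, "A"))  # B and C only
```

With `decay=1` the index reduces to a plain count of descendants, which favors early researchers; smaller values weight direct trainees more heavily.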
Given that neuroscience, like any academic discipline, contains a number of sub-fields, one might expect clustering, in which researchers tend to train others who continue work within their subfield rather than in a new, completely unrelated field. We studied this problem by clustering the Neurotree database according to mentorship relationships. A sparse connection matrix was defined,
Clusters were assigned numbers based roughly on chronology, ordered by the average generation of each group. In order to characterize their basic features, each cluster was labeled with two representative researchers (the two individuals with shortest distance to other members of the cluster) and two representative research areas (the most common research areas across all members of the cluster). The clusters were plotted using open source software (Graphviz,
The single parameter required by the spectral factorization algorithm, cluster count, was not identified by objective criteria. A value of 60 produced a number of clusters that could be plotted on a single graph and that demonstrated the variable topology of clusters identified by spectral factorization (large versus small, tightly versus loosely connected, etc.; see Results). Changing the number of clusters over a range of 40–80 did not have a major impact on the patterns observed in these properties.
It is likely that other, more advanced clustering algorithms may provide cleaner and more interpretable results. We chose the spectral factorization method for this study as a compromise between a more standard
Neuroscience is an evolving field, and many features of the graph are likely also to evolve over time. Thus to understand the tree, it can be helpful to measure statistics as a function of the time at which researchers performed their work. The database has a capacity for logging the dates of mentor relationships, but this information is often incomplete. As an alternative to using absolute dates, we labeled each researcher with their
Data contained in Neurotree are available for export under the Creative Commons License 3.0. The data may be used freely by other researchers, and publications using the data should cite this publication as a source. Instructions for requesting the data are included in the site FAQ at
Data in Neurotree are collected from publicly available web sites and databases. Thus this study represents an analysis of information in the public domain. In order to respect potential privacy concerns, we have given individuals the opportunity to have their information removed from the Neurotree simply by submitting an error report or contacting the site administrators.
Neurotree
As the site was indexed by search engines and subsequently discovered by researchers with related interests, the scope of the database grew unexpectedly beyond its original focus. An example illustrates how the tree filled in around one researcher in the database, Robert Yerkes, a comparative psychologist (
The site has also taken on a number of new functions in addition to its original design as an educational resource. Neurotree can serve as a tool for disambiguating between researchers with the same name, a problem that occurs frequently in such a large field. It is also used as a professional networking tool, enabling journal editors, employers and potential collaborators to learn more about individuals they encounter in the community.
As the site has grown, we have taken a broad view of the term “neuroscience”, and have chosen to err on the side of inclusiveness. Neuroscience has been and continues to be a highly interdisciplinary field, and maintaining information about the relationship between neuroscience and related fields is valuable in and of itself. Thus we have deliberately encouraged people to submit information about connections between neuroscientists and well-known individuals in other fields. This information provides insight into connections across a broader academic community and with the historical roots of the field.
The site continues to grow and, as of January 2012, draws an average of 25,000 unique visitors each month. The original database was seeded with about 500 researchers and has since grown to 35,000, with about 300 added each month (
One of the benefits of any genealogical database is the ability to map the connections linking members of the tree. Neurotree can be described mathematically as a graph, with nodes (researchers) connected by edges (mentor relationships,
The graphical structure of Neurotree permits a number of analyses, some of which we demonstrate below. Because the database depends on contributions of volunteer users, however, the results of any analysis must be interpreted with the caveat that information in the database is not complete. As the tree matures and fills in, we expect the data to become increasingly reliable.
In order to assess the reliability of the current database, we performed simple spot checks on its content. First we examined the accuracy of 100 randomly selected researchers in Neurotree compared to information available elsewhere on the Internet. Of these, 72 were verified to have correct institutional affiliation, 13 had positively identified errors, and no information was available about the accuracy of the final 15. Given that information is subject to change sporadically during a career, one less stringent concern is that information be accurate for researchers who are retired or no longer active in research. Of the 85 individuals identified outside of the database, 13 were no longer active, and 12 of these were accurately documented in Neurotree.
To assess how completely Neurotree represents the field, we also compared faculty rosters between Neurotree and three departmental web sites in institutions that varied in size and geography (
Department | Count | Listed in Neurotree | Correct institution |
Hebrew University, ICNC | 27 | 22 | 20 |
Reed College, Psychology | 9 | 6 | 6 |
University of Michigan, Neuroscience | 114 | 67 | 58 |
Totals | 150 | 95/150 | 84/95 |
Percent | | 63% | 88% |
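The totals and percentages in the faculty roster table follow directly from its three department rows. The short check below re-derives them; the variable names are illustrative.

```python
# (faculty count, listed in Neurotree, correct institution) per department
rows = {
    "Hebrew University, ICNC": (27, 22, 20),
    "Reed College, Psychology": (9, 6, 6),
    "University of Michigan, Neuroscience": (114, 67, 58),
}

faculty = sum(r[0] for r in rows.values())
listed = sum(r[1] for r in rows.values())
correct = sum(r[2] for r in rows.values())

coverage = listed / faculty   # fraction of faculty found in Neurotree
accuracy = correct / listed   # fraction of those with the right institution

print(f"Listed in Neurotree: {listed}/{faculty} = {coverage:.0%}")
print(f"Correct institution: {correct}/{listed} = {accuracy:.0%}")
```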
Finally, we assessed how accurately and completely mentorship records were documented for five research groups by comparing trainee lists from public web pages and information in Neurotree (
Mentor | Institution | In lab web site | In Neurotree | In Neurotree, not lab site |
C. Daniel Salzman | Columbia University | 14 | 8 | 0 |
Patricia Kuhl | University of Washington | 15 | 5 | 1 |
Barbara Chapman | University of California, Davis | 7 | 8 | 2 |
Robert Malenka | Stanford University | 15 | 25 | 14 |
Lynn Robertson | University of California, Berkeley | 16 | 11 | 0 |
Totals | | 67 | 40/67 | 17/57 |
Percent | | | 60% | 30% |
Lists of trainees (postdoctoral fellows only for Malenka, graduate students and postdoctoral fellows for all others) were compared between the websites of principal investigators who publish this information and mentorship data in Neurotree. In a few cases, Neurotree documented relationships that did not appear in the lab web sites (right column). All of these relationships were confirmed as accurate by a Medline publication record.
For a more quantitative analysis of the maturity of Neurotree's connectivity, we measured the temporal dynamics of three statistics: the fraction of researchers linked in the main graph (
The trajectory of statistics for the first 1000 nodes follows a distinct pattern from that of the entire tree. The 1000th researcher was added when Neurotree had been online for 10 months. After that time, the fraction of these nodes connected to the main graph steadily increased until about month 60, at which point the fraction reached an asymptote of 96%. Simultaneously, the average distance between each pair of these nodes dropped and also reached an asymptote of 5.5 steps. Finally, the average number of connections per researcher to other researchers stabilized at 2.5. A slight ongoing rise in the number of connections appears to reflect new connections that continue to form between members of the group. The fact that the number of connections within the first 1000 entries has stabilized does not mean that connections of these nodes with the rest of the tree have done the same. The average number of connections from this group to the entire tree has grown to 10, and continues to grow at a rate of 0.8 connections per year (data not shown). This ongoing growth likely reflects both the entry of new researchers into the field and the filling in of earlier connections.
In contrast to the first 1000 entries, we observed that the statistics of the full tree have remained more or less flat since the first year online. The fraction of nodes connected to the main graph has remained stable at about 80%. The mean distance between nodes has very slowly risen from 10 to 11 steps. Finally, the average number of connections per node has remained nearly constant at 2.2. This suggests a balance between the rate at which new entries are added and connections between older entries are filled in more completely.
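The three connectivity statistics tracked above can be computed from a list of mentor-trainee pairs. The sketch below is illustrative rather than the code used for the analysis. It assumes that the "main graph" is the largest connected component when edges are treated as undirected, that distance is the shortest path length in steps, and that connections per researcher is the undirected degree. All function names are hypothetical.

```python
from collections import deque


def build_undirected(edges):
    """Adjacency sets for an undirected view of (mentor, trainee) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, start):
    """Shortest step counts from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def graph_statistics(nodes, edges):
    """Return (fraction in main graph, mean distance, mean connections)."""
    adj = build_undirected(edges)
    for n in nodes:
        adj.setdefault(n, set())  # keep researchers with no connections
    if not adj:
        return 0.0, 0.0, 0.0
    # Largest connected component, taken here as the "main graph".
    seen, main = set(), set()
    for n in adj:
        if n not in seen:
            comp = set(bfs_distances(adj, n))
            seen |= comp
            if len(comp) > len(main):
                main = comp
    # Mean shortest-path distance over all pairs inside the main graph.
    total = pairs = 0
    for n in main:
        for m, d in bfs_distances(adj, n).items():
            if m != n:
                total += d
                pairs += 1
    mean_distance = total / pairs if pairs else 0.0
    mean_degree = sum(len(v) for v in adj.values()) / len(adj)
    return len(main) / len(adj), mean_distance, mean_degree


# Toy example: a chain A-B-C-D plus one unconnected researcher E.
frac, dist, degree = graph_statistics(
    ["A", "B", "C", "D", "E"], [("A", "B"), ("B", "C"), ("C", "D")]
)
```

Running one breadth-first search per node is quadratic in the size of the main graph, so for a tree with tens of thousands of researchers the mean distance would more practically be estimated from a random sample of start nodes.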
As illustrated by the example tree (
We measured a fecundity index by counting iteratively the number of trainees and trainees of those trainees, normalized exponentially by the number of steps from the original mentor (see Methods,
The 25 researchers with the highest fecundity index appear in
Rank | Name | Institution | Year | Gen | Rank (Alt, 1) | Rank (Alt, 1/4) | Rank (Alt, 1/10) |
1. | John Eccles | Australian National University | 1937 | 20 | 153 | 1 | 11 |
2. | Charles Sherrington | University of Oxford | 1901 | 19 | 117 | 8 | 168 |
3. | Stephen Kuffler | Harvard University | 1962 | 21 | 167 | 2 | 26 |
4. | Karl Lashley | Harvard University | 1924 | 20 | 159 | 5 | 80 |
5. | John Langley | University of Cambridge | 1900 | 19 | 113 | 111 | 849 |
6. | Michael Foster | University of Cambridge | 1870 | 18 | 109 | 162 | 1342 |
7. | Edgar Adrian | University of Cambridge | 1923 | 20 | 155 | 43 | 273 |
8. | Donald Hebb | McGill University | 1952 | 21 | 199 | 9 | 58 |
9. | Robert Yerkes | Yale University | 1918 | 19 | 154 | 105 | 645 |
10. | Johannes Müller | Humboldt Universität zu Berlin | 1842 | 16 | 75 | 81 | 178 |
11. | Wilhelm Wundt | University of Leipzig | 1886 | 17 | 111 | 106 | 206 |
12. | Bernard Katz | University College London | 1952 | 21 | 204 | 19 | 82 |
13. | Torsten Wiesel | Rockefeller University | 1974 | 22 | 267 | 4 | 14 |
14. | Keith Lucas | University of Cambridge | 1904 | 19 | 136 | 236 | 1826 |
15. | Hans-Lukas Teuber | Mass. Inst. of Technology | 1965 | 21 | 231 | 13 | 85 |
16. | John Black Johnston | University of Minnesota | 1907 | 22 | 157 | 231 | 2077 |
17. | John Watson | Johns Hopkins University | 1916 | 19 | 158 | 234 | 2095 |
18. | Clinton Woolsey | University of Wisconsin | 1964 | 21 | 230 | 25 | 141 |
19. | Philip Bard | Johns Hopkins University | 1928 | 20 | 197 | 77 | 701 |
20. | Hugo Munsterberg | Harvard University | 1902 | 18 | 137 | 193 | 738 |
21. | John Fulton | Yale University | 1932 | 20 | 168 | 62 | 255 |
22. | Wilder Penfield | McGill University | 1952 | 20 | 181 | 91 | 535 |
23. | Hallowell Davis | Harvard University | 1935 | 21 | 170 | 118 | 598 |
24. | Archibald Hill | University College London | 1915 | 20 | 171 | 82 | 264 |
25. | Julius Axelrod | National Inst. of Mental Health | 1962 | 22 | 255 | 12 | 44 |
Year refers to the first year a degree was awarded to a trainee of this mentor. Generation (Gen) refers to the number of mentorship steps back to the oldest common ancestor. Rankings for these individuals based on measures with alternative (Alt) normalization factors appear in the columns at right. At the extremes,
To illustrate the importance of appropriate normalization in the fecundity calculation,
One challenge to precise interpretation of the temporal features of these data is that dates are not recorded for a substantial number of connections in Neurotree. In order to include a larger pool of researchers in the analysis of temporal dynamics, we computed the mentorship generation for each researcher by counting the number of steps back directly to their oldest ancestor. As discussed above, 64% of researchers in Neurotree can trace their mentorship back to a single individual. When we compared generation versus first mentoring year for the subset of researchers with appropriate data, we found a very strong correspondence (
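Mentorship generation can be computed without any date information. The sketch below is one possible reading of the definition: it assumes that a researcher with no recorded mentor has generation 0 and that, when a researcher has several mentors, the longest chain back to a root is used. That tie-breaking rule and the function name are assumptions made for illustration.

```python
def generations(mentors):
    """Map each researcher to the length of their longest mentor chain.

    `mentors` maps a researcher to the list of their mentors. Researchers
    with no recorded mentor are treated as roots with generation 0.
    """
    gen = {}

    def visit(person, path):
        if person in gen:
            return gen[person]
        if person in path:
            raise ValueError(f"cycle in mentorship data at {person!r}")
        path.add(person)
        parents = mentors.get(person, ())
        # A root has no parents, so the default of -1 yields generation 0.
        gen[person] = 1 + max((visit(p, path) for p in parents), default=-1)
        path.discard(person)
        return gen[person]

    people = set(mentors)
    for parents in mentors.values():
        people.update(parents)
    for person in people:
        visit(person, set())
    return gen


# Toy data: A mentored B, B mentored C, and D trained with both A and C.
toy_mentors = {"B": ["A"], "C": ["B"], "D": ["A", "C"]}
toy_generations = generations(toy_mentors)
```

The cycle check guards against data-entry errors in a publicly editable tree, where a mistaken link could make a researcher their own ancestor.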
Using mentorship generation as a proxy measurement for time, we could then study the timecourse of fecundity across the field (
To confirm that mentorship generation captures the essential features of a more strictly defined temporal analysis, we repeated the analysis of fecundity over time, but now focused only on the subset of 4654 researchers for which mentorship dates were available (
In order to study the relationship between mentorship groups and research areas within neuroscience, the entire set of connected nodes (30055/35953 researchers) was clustered into 60 groups based on the strength of mentorship connections (see
Clusters were derived by spectral factorization of the sparse matrix of mentor relationships between all researchers connected in the main Neurotree graph. Each box describes a cluster, numbered according to the average generation of researchers in the cluster. Clusters are plotted in roughly chronological order from top to bottom. Each cluster is labeled with the names of the two researchers with the smallest mean distance to other researchers in the cluster and by the two most common research areas in the cluster. Lines connecting clusters indicate the relative strength of connections between them (dotted: 1–5, solid: 6–20, bold: 21+).
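The line-style convention in this caption maps directly onto Graphviz edge styles. The following sketch generates DOT source for a cluster graph under that convention. It assumes connection strength is an integer count of mentor relationships between two clusters. The function names and the `c<number>` node identifiers are hypothetical, and labels are assumed not to contain double quotes.

```python
def edge_style(strength):
    """Map an inter-cluster connection count to a Graphviz line style."""
    if strength <= 0:
        return None  # no line drawn
    if strength <= 5:
        return "dotted"
    if strength <= 20:
        return "solid"
    return "bold"


def clusters_to_dot(labels, strengths):
    """Return DOT source for an undirected cluster graph.

    `labels` maps cluster number -> box label; `strengths` maps a
    (cluster_a, cluster_b) pair -> number of mentor connections.
    """
    lines = ["graph clusters {", "  node [shape=box];"]
    for number in sorted(labels):
        lines.append(f'  c{number} [label="{number}: {labels[number]}"];')
    for (a, b), strength in sorted(strengths.items()):
        style = edge_style(strength)
        if style is not None:
            lines.append(f"  c{a} -- c{b} [style={style}];")
    lines.append("}")
    return "\n".join(lines)


# Toy example with two of the cluster labels mentioned in the text.
dot_source = clusters_to_dot(
    {10: "Hebb / Thompson", 13: "Sejnowski / Wiesel"},
    {(10, 13): 7},
)
print(dot_source)
```

The resulting text can be passed to the Graphviz `dot` program for layout.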
Each cluster was labeled with the names of the two researchers with the lowest mean distance to other researchers in that cluster and by the two research areas most frequently occurring across the cluster. Despite being derived through independent metrics, research areas representing a cluster typically show an obvious relationship to the representative researchers. For example, in cluster 10, Donald Hebb and Richard Thompson are both associated with the study of memory. Likewise, for cluster 13, Terrence Sejnowski and Torsten Wiesel are associated with visual and systems neuroscience. The information in Neurotree about research areas is not complete, as it depends on unconstrained choices by users entering the data. This incomplete information could lead to some of the apparent discrepancies (e.g., cluster 26, Rakic and Greenberg are not immediately associated with pain research).
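The labeling procedure can be sketched as follows. This is an illustration under stated assumptions: distances are shortest paths in the full undirected graph, ties are broken alphabetically, each researcher has at most one research area, and the function names are hypothetical.

```python
from collections import Counter, deque


def bfs_distances(adj, start):
    """Shortest step counts from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def representatives(adj, members, k=2):
    """Pick the k members with the lowest mean distance to other members."""
    members = set(members)
    scores = []
    for person in members:
        dist = bfs_distances(adj, person)
        others = [dist[m] for m in members if m != person and m in dist]
        if others:
            scores.append((sum(others) / len(others), person))
    return [person for _, person in sorted(scores)[:k]]


def common_areas(areas, members, k=2):
    """Pick the k most frequent research areas among cluster members."""
    counts = Counter(areas[m] for m in members if areas.get(m))
    return [area for area, _ in counts.most_common(k)]


# Toy cluster: a chain A-B-C-D-E, so C is the most central member.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
areas = {"A": "memory", "B": "memory", "C": "vision", "D": "memory", "E": "vision"}
cluster = ["A", "B", "C", "D", "E"]
reps = representatives(adj, cluster)
top_areas = common_areas(areas, cluster)
```

Because the two labels come from unrelated computations (graph distance versus area frequency), agreement between them is informative rather than built in.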
Some neighboring clusters identify logical divisions between research areas. For example, cluster 15 (Schiller/Merzenich) captures a number of researchers who study sensory processing in non-human primates while cluster 23 (Gabrieli/D'Esposito) includes researchers who study similar problems of representation in humans.
When the clusters were studied more quantitatively, a few additional features were noteworthy. First, the size of clusters varied substantially (
A more informative analysis may be to study how tightly coupled clusters are, relative to their average distance from other clusters (
Data documenting the tradition of academic mentorship naturally provoke curiosity in most people who have participated in the system. Each of us has received training from someone, who in turn was trained by someone else, and the whole process continues, iteratively, into the unknown past. An understanding of one's academic mentorship allows one to connect oneself to the historical development of a field. A genealogical tree also provides the opportunity to see otherwise invisible links between ourselves and our colleagues, our friends, and important figures in the field. For these reasons, academic genealogies have been created for many fields. Neurotree is an attempt to do so for the large and diverse field of neuroscience.
Good mentoring is a skill that can differentiate successful from unsuccessful lab leaders. As of yet, very little is known about how important mentorship skills are in producing successful progeny
More generally, Neurotree provides an important tool for the study of the birth, life, and death of ideas. Central to the function of academic mentorship is the transmission of ideas from mentor to trainee. Thus having a clear and full database of individuals and their relationships can serve as a tool for studying the life cycle of ideas.
We argue that Neurotree has a specific role in the field of neuroscience. It provides a single repository for valuable information that is both highly specific and well-defined (such as mentor-trainee relationships) and that is more open-ended (such as field of interest). Additionally, Neurotree presents an opportunity to sort out potential confusion regarding multiple researchers with the same name. While there is no current widely accepted unique identifier for individual scientists, the Neurotree database can help discriminate among individuals.
As an experiment in crowd-sourcing the acquisition of data, Neurotree has been successful thus far. The Society for Neuroscience, whose academic focus encompasses a similar scope to that of Neurotree, lists 41,000 current members. This number does not include historical figures or neuroscientists who have not joined the Society, but its order of magnitude matches that of the number of researchers listed in Neurotree. Given its record of growth, we expect Neurotree to develop a progressively more complete description of the field, thereby allowing reliable and unbiased sampling of mentorship relationships, and increasingly accurate measures of progeny counts and connection distances. We have identified limitations to the scope of the dataset, both in its accuracy and completeness, and it remains an open question as to how completely these gaps can be filled with the current crowd-sourcing approach.
Beyond the general problem of sampling, crowd-sourcing efforts face the challenge of possible bias in how data is sampled
Even with an incomplete data set, we have demonstrated approaches for approximating missing data from the database (e.g., using generation as a substitute for first year of mentoring). This has permitted us to include a much larger data set into the analysis of historically influential figures in the field.
Neurotree can serve a number of functions, all of which would be improved if data in the tree were more complete and accurate. Substantial data sources exist in the public sphere online that could be used to automatically or semi-automatically fill in gaps in institutional affiliation, mentorship dates, and research areas in the current database. These resources include structured databases (e.g.,
Numerous additional data resources exist that can be incorporated into Neurotree. Information about the contents of publications (methods, preparations, scientific questions) can be linked to individual researchers, providing a means of systematically studying the relationship between mentorship and the experimental approaches adopted by trainees. As expanded scientific content is linked to researchers, Neurotree will provide an increasingly powerful tool for studying the evolution of the field.
The software that forms the basis for Neurotree can be readily adjusted to make a database for any academic field. Based on unsolicited requests, we have created academic trees for other disciplines that, as far as we know, lack one of their own. These other trees include history, linguistics, and marine ecology, as well as a dozen others. Although they are given their own tree for display, they draw from the same database. This shared database permits cross-listing researchers between trees in different disciplines, so that, as the trees fill in, it will be possible to trace the larger-scale linkages between fields. Furthermore, it will be possible to study not only the graphical properties of mentor relationships within neuroscience but also how ideas and trends have traveled between fields.
The authors would like to thank the thousands of contributors to Neurotree who have made this experiment possible. We also thank Jack Gallant for critical early contributions to the data set and Henry Cooney for assistance with data analysis.