Systematic shifts in scaling behavior based on organizational strategy in universities

Building better theories of cities, companies, and other social institutions such as universities requires that we understand the tradeoffs and complementarities that exist between their core functions, and that we understand the bounds to their growth. Scaling theory has been a powerful tool for addressing such questions in diverse physical, biological and urban systems, revealing systematic quantitative regularities between size and function. Here we apply scaling theory to the social sciences, taking a synoptic view of an entire class of institutions. The United States higher education system serves as an ideal case study, since it includes over 5,800 institutions with shared broad objectives, but ranges in strategy from vocational training to the production of novel research, contains public, nonprofit and for-profit models, and spans sizes from 10 to roughly 100,000 enrolled students. We show that, like organisms, ecosystems and cities, universities and colleges scale in a surprisingly systematic fashion following simple power-law behavior. Comparing seven commonly accepted sectors of higher education organizations, we find distinct regimes of scaling between a school’s total enrollment and its expenditures, revenues, graduation rates and economic added value. Our results quantify how each sector leverages specific economies of scale to address distinct priorities. Taken together, the scaling of features within a sector, along with the shifts in scaling across sectors, implies that there are generic mechanisms and constraints shared by all sectors, which lead to tradeoffs between their different societal functions and roles. We highlight the strong complementarity between public and private research universities, and community and state colleges, which all display superlinear returns to scale.
In contrast to the scaling of biological systems, our results highlight that much of the observed scaling behavior is modulated by the particular strategies of organizations rather than an immutable set of constraints.


Appendix A: Data documentation
Original data sources
For this paper, we conducted our analysis on a combined dataset of matched data from the Delta Cost Project, based on the Integrated Postsecondary Education Data System (IPEDS), supplemented with variables on completion metrics, selectivity and mid-career returns to education from the College Scorecard project, the Brookings Value-Added dataset and the Equality of Opportunity Project, respectively.
IPEDS is the main postsecondary data collection program of the National Center for Education Statistics (NCES). It administers mandatory surveys to all postsecondary educational institutions in the United States and its territories that have a Program Participation Agreement (PPA) with the appropriate office at the U.S. Department of Education. This PPA makes the institution eligible to accept students receiving Federal Student Aid, as authorized by Title IV of the Higher Education Act of 1965 (Ginder, Kelly-Reid and Mann 2014; Higher Education Act 1965). Therefore, universities that opt out of receiving federal funding for various reasons are not included in our analysis. However, for the purposes of this paper, the general terms "universities" and "schools" refer to these Title IV postsecondary schools. The Higher Education Act of 1965 more specifically defines these Title IV institutions as degree-granting institutions that admit as regular students only persons with a high school diploma or equivalent, are legally authorized by a state, and are accredited by an agency recognized by the U.S. Secretary of Education.
We primarily use 2013 data from the Delta Cost Project (www.deltacostproject.org), founded in 2007 by an independent non-profit of the same name. This dataset, which we abbreviate to "Delta data" or "Delta," brings together selected variables from the IPEDS surveys (most notably Institutional Characteristics, Fall Enrollment, Finance, Fall Staff and Graduation Rates) in order to facilitate in-depth comparisons between universities in the IPEDS universe (Hurlburt, Peek and Sun 2017). The Delta Cost Project standardizes data that had been imputed differently across reporting standards and years, as institutional groupings and the surveys themselves change over time (Hurlburt, Peek and Sun 2017; see also Appendix E). Key variables we used in the analysis include enrollment, faculty, revenues and expenditures; see Appendix B for a full list.
We supplement Delta data with data from the College Scorecard project, here "Scorecard" (U.S. Department of Education 2015, 2018, https://collegescorecard.ed.gov/), in order to examine student completion outcomes. The College Scorecard database was assembled primarily to help students and families assess college options. Scorecard integrates IPEDS metrics with National Student Loan Data System (NSLDS) data on completion, earnings and loan repayment for cohorts of Federal Student Aid-receiving (FSA) students, who are tracked via the Free Application for Federal Student Aid (FAFSA), the application for all federal student aid programs. The Delta Cost Project dataset also contains graduation rate data for full-time, first-time degree-seeking students. Scorecard outcomes are more complete because they follow cohorts of FSA students, as discussed in Appendix G.
We acknowledge here that IPEDS has recently added the Outcome Measures (OM) Survey, which includes 2015 and 2016 graduation rate data for 6-year and 8-year cohorts of not only full-time, first-time students, but all undergraduates, broken down by full-/part-time and first-/non-first-time (transfer) status. This complete coverage of all undergraduate students is lacking in the outcome measures we use, namely FSA completions, and would circumvent many of the problems and assumptions outlined in Appendix G. However, one advantage of examining FSA completions over OM is that FSA attributes transfer-student completions to their original institution, whereas OM does not distinguish between transferring away and dropping out. In the community college sector, where transferring is often considered a desirable outcome, using OM would make success rates appear artificially low. Although it exceeds the current scope, IPEDS Outcome Measures data may merit future investigation.
Post-graduation student performance metrics, such as mean earnings after graduation, all come from the Mobility Report Cards of Chetty, Friedman, Saez, Turner, and Yagan (2017), part of the Opportunity Insights Project, downloadable from https://opportunityinsights.org/data/mrc_table3.csv. These data provide the average earnings in 2014 of students who attended a particular institution for at least a year (whether they graduated or not), for cohorts born from 1980 to 1991. We use the data for the cohort born in 1984. This dataset is the best existing estimate of earnings, as it was obtained by matching records of student attendance with tax filings.
Two more datasets are used as complements. First, the Brookings Value-Added dataset is used because it contains data on normalized SAT entrance scores (on the quantitative section of the SAT) for 3,626 institutions. Second, we make use of the Higher Education R&D Survey (HERD) sponsored by the NSF, which requests that institutions of higher learning granting bachelor's or higher degrees report the sources of their R&D funding on a yearly basis. The 2013 HERD dataset includes 886 universities for which we have a valid IPEDS UNITID. We include the reported revenue (other than institutions' own funds) as a robustness check for the Delta data, especially since Delta data do not provide a direct measure of research revenue (we used the revenue component "Government Grants and Contracts" as a proxy for research revenue).

Merging the datasets
To merge College Scorecard and Brookings to Delta data, we begin with all 5,804 entries in Delta data that have values for total enrollment and remain after our outlier analysis described in Appendix D; we then search the pool of 7,804 Scorecard entries for entries with matching IPEDS UNITIDs. We append matches from Scorecard to Delta to create the combined dataset used in our analysis.
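The UNITID-based merge described above can be sketched with pandas; this is a hypothetical illustration, and the column names (`total_enrollment`, `completion_rate`) are assumptions rather than the actual Delta/Scorecard field names.

```python
# Sketch (not the authors' actual code) of appending Scorecard variables
# onto Delta entries by matching IPEDS UNITIDs.
import pandas as pd

delta = pd.DataFrame({
    "UNITID": [100654, 100663, 100690],
    "total_enrollment": [4505, 11502, 322],
})
scorecard = pd.DataFrame({
    "UNITID": [100654, 100663, 999999],
    "completion_rate": [0.29, 0.53, 0.41],
})

# Left-join Scorecard onto Delta, keeping every Delta entry;
# indicator="left_only" flags Delta schools with no Scorecard match.
merged = delta.merge(scorecard, on="UNITID", how="left", indicator=True)
delta_only = merged[merged["_merge"] == "left_only"]

print(len(merged), len(delta_only))  # 3 merged rows, 1 Delta-only school
```

The `indicator` column makes it easy to audit unmatched entries, mirroring the accounting of Delta-only and Scorecard-only observations described below.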
On one hand, there are 238 universities in Delta with valid enrollment data but no matching UNITIDs in Scorecard, which enrolled a total of 197,719 students. These Delta-only observations include the U.S. military academies, a few small research universities, and (mainly) 2yr and 4yr for-profit institutions. A list accompanies the other data files used in our analysis. The main example is the 26 campuses of DeVry University, which enrolled over 74,000 students combined. The problem arises because Scorecard groups together campuses or institutions that jointly report earnings and other data for FSA-receiving students, while Delta data do not include such data and separate these schools by campus. We return to this problem in Appendix G on completion rate analyses.
On the other hand, there are 1,508 entries from Scorecard with UNITIDs that did not appear in Delta; however, 688 of these observations, accounting for 1,696,272 undergraduate students (neither total enrollment nor graduate enrollment is included in College Scorecard), actually find their way into Delta data as "child institutions" grouped under "parent institutions," and of these, 441 institutions with 405,075 students have usable FSA completions data. The remaining 820 Scorecard schools, which matched neither a Delta data entry nor a child institution wrapped into one, account for only 143,179 students. Appendix E explains in detail how Delta groups together campuses or institutions with a common IPEDS reporting history while Scorecard disaggregates them into multiple institutions, and it addresses how this grouping affects our analysis. We conclude that campus and institutional grouping does not affect our main findings.

Figure 4 in the main text results from combining tuition data from Scorecard, data on incoming student test scores from Brookings, and data on earnings from the Equality of Opportunity project. The unit of analysis for the Mobility data is the super-OPEID, derived from the Office of Postsecondary Education Identifiers; some OPEID6 units are aggregated into a super-OPEID because multiple OPEID6 units were found to match the EIN-ZIP numbers indexing colleges in the 1098-T forms used as one of the primary datasets by Chetty et al. (2017). Hence, as we merge the data on average earnings with the tuition data from Scorecard and the test data from Brookings, we face a downscaling problem: for OPEID6 units that are part of a super-OPEID, we have to calculate the average for the super-OPEID that each institution is part of. The Equality of Opportunity dataset has earnings data for the cohort born in 1984 for 2,756 schools.
2,482 of these could be matched to Scorecard using OPEID6 values, supplemented by a fuzzy string matching approach on institution names (because Scorecard appears to contain some false attributions of OPEID6 values). Overall, we have earnings data for 80% of public institutions, 58% of private non-profit institutions and 38% of private for-profit institutions. The schools with missing data tend to be smaller (fewer than 5,000 students). In total, 2,183 schools have data for all three of the following variables: earnings, standardized incoming test scores, and tuition.

Table B-1 includes detailed descriptions of each variable used in the analysis. They fall into seven categories: enrollment, revenues, expenditures, faculty, student outcomes, student earnings, and institutional characteristics. All revenue and expenditure data originally come from the IPEDS Finance Survey through three reporting standards: (1) Governmental Accounting Standards Board (GASB) Statements 34 and 35 for most public institutions, (2) Financial Accounting Standards Board (FASB) for private non-profit and some public institutions (IPEDS 2014), or (3) FASB for for-profit institutions. All revenue and expense categories used here are reported consistently across the three standards, with relevant exceptions noted below. See the Hanover Research report (2014), Goldstein and Menditto (2017), and this IPEDS webpage: https://nces.ed.gov/ipeds/report-your-data/data-tip-sheet-distinguishing-finance-standards-fasbgasb
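The name-based fuzzy matching mentioned above can be sketched with the standard library; `difflib` is one simple choice, not necessarily the tool used by the authors, and the institution names below are illustrative.

```python
# Hedged sketch of fuzzy name matching used as a fallback when OPEID6
# values disagree between datasets.
import difflib

scorecard_names = [
    "University of Alabama at Birmingham",
    "Amridge University",
    "University of Alabama in Huntsville",
]

# A mobility-dataset name with slightly different formatting.
query = "Univ. of Alabama At Birmingham"

# Compare lowercased names; cutoff=0.8 rejects weak matches.
matches = difflib.get_close_matches(query.lower(),
                                    [n.lower() for n in scorecard_names],
                                    n=1, cutoff=0.8)
print(matches)  # → ['university of alabama at birmingham']
```

In practice one would review borderline matches by hand, since similarity thresholds trade false matches against missed ones.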

Appendix B: Precise definitions for key variables used
The "Data Source" column references the dictionaries, glossaries and other documents where the original variable definitions are explained. For readability we condensed, rearranged or rewrote each definition. Where we merged variables, or where the Delta Cost Project merged IPEDS variables, this is explained in the first sentence of the definition. Delta = Delta Cost Project dataset; the variable dictionary, the data mapping file from IPEDS to Delta, and the data file documentation (Hurlburt, Peek and Sun 2015) are all available at https://www.deltacostproject.org/delta-cost-project-database. IPEDS = Integrated Postsecondary Education Data System; descriptions of the IPEDS surveys are available in the NCES methodology report (Ginder, Kelly-Reid, and Mann 2014).

Total revenue

IPEDS, Finance, Delta
The sum of all 2012-13 fiscal year revenue categories: student tuition, government student aid, government appropriations, government grants and contracts, donation and investment revenue, and auxiliary revenue. Institutional grant aid is not included.

Total revenue
Net tuition revenue

IPEDS, Finance, Delta
The net amount of tuition revenue paid directly by students (not including Pell, Federal, State, and Local grants), whether from savings, Federal Student Loans, or other loan programs. Third-party grants and scholarships and Veterans Affairs education benefits under the Post-9/11 or Montgomery GI Bill are also included. This amount approximates the total cost of education to students and families.

Component of revenue
Government student aid

IPEDS, Finance, Delta
Total student aid grant amounts, calculated as the sum of scholarships and fellowships from local government, those from state government, and the gross amount of federal educational grant aid disbursed or otherwise made available to recipients by the institution. Pell grants and all other grants issued by federal agencies are included, except Veterans Affairs education benefits. Loans and federal work study are also excluded.

Component of revenue
Government appropriations

IPEDS, Finance, Delta
The sum of local, state and federal appropriations. Below the state level, local appropriations consist of education district taxes assessed directly by an institution or on behalf of an institution, received immediately in full amount, and similar revenues from local policy based on collections of other taxes or resources (sales taxes, gambling taxes, etc.). State and federal appropriations are revenues received by the institution through acts of their respective legislative bodies. Funds reported in this category are for meeting current operating expenses, not for specific projects or programs. (In the Delta data dictionary, federal appropriations are grouped with federal grants and contracts as restricted revenue, unlike state and local appropriations. However, the precise functions of these federal monies are not specified, so we assume they are more similar to state and local appropriations than to government grants and contracts.) As such, grants, contracts and capital appropriations are excluded at all levels.

Component of revenue
Government grants and contracts, or Research revenue

IPEDS, Finance, Delta
The sum of local, state and federal grants and contracts. These revenues are reserved for research, training programs, or public service activities for which expenditures are reimbursable under the terms of a grant or contract issued by a government agency at any level (such as the National Science Foundation or California Energy Commission). Pell Grants are excluded if they were reported as federal grants. We use this variable as a proxy for research revenue.

Component of revenue
Donation and investment revenue

IPEDS, Finance, Delta
The total amount of revenue coming from private gifts, grants and contracts, affiliated entities, and investment returns. Gifts are private donations involving no legal consideration, while private grants and contracts stipulate provision of specific goods and services to the funder for receipt of the funds. They directly relate to nonauxiliary institutional purposes, and include estimated dollar amount of contributed services. Affiliated entities include fundraising foundations, booster clubs, and other non-consolidated institutionally-related organizations. Investment income encompasses interest, dividend, rental and royalty income revenues derived from the institution's investments, especially those of endowment funds. (Endowment investment income may take the form of, and includes, both realized and unrealized gains and losses at FASB institutions, but not GASB institutions, which may explain some but not all of the gap between the endowments of public and private research universities.)

Component of revenue
Auxiliary revenue

IPEDS, Finance, Delta
The total amount of revenue from all operations of auxiliary enterprises, hospitals, independent enterprises, and other sources. Auxiliary enterprises are essentially self-supporting fee-based operations of the institution that exist to furnish a service to students, faculty, or staff, including residence halls, food services, student health services, intercollegiate athletics (only if essentially self-supporting), college unions, college stores, faculty and staff parking, and faculty housing. Independent enterprises are generally limited to major federally funded research and development centers, though expenses managed as investments of endowment funds are excluded. See the Delta variable dictionary for a full description of miscellaneous/other revenues.

Component of revenue
Total expenditure

IPEDS, Finance, Delta
The sum of all 2012-13 fiscal year expense categories: instructional, research, public service, maintenance, student service, and grants and auxiliary expenditures. Institutional grant aid, depreciation, and interest payments are not included.

Total expenditure
Instructional expenditure

IPEDS, Finance, Delta
Expenses of the colleges, schools, departments, and other instructional divisions of the institution, including general academic instruction, occupational and vocational instruction, community education, preparatory and adult basic education, special and extension sessions (all of which are both credit and non-credit). This category also includes expenses for departmental research and public service that are not separately budgeted.

Component of expenditure
Research expenditure

IPEDS, Finance, Delta
Expenses for activities specifically organized to produce research outcomes and commissioned by an agency either external to the institution or separately budgeted by an organizational unit within the institution. The category includes institutes and research centers, and individual and project research.

Component of expenditure
Public service expenditure

IPEDS, Finance, Delta
Public service activities primarily provide non-instructional services beneficial to individuals and groups external to the institution, such as conferences, institutes, general advisory services, reference bureaus, community services, cooperative extension services, and public broadcasting services.

Component of expenditure
Maintenance expenditure

IPEDS, Finance, Delta
The sum of expenditures on academic support, institutional support, and operation and maintenance activities. Academic support expenditures support the primary institutional missions identified by IPEDS: instruction, research, and public service. Relevant services and activities include the retention, preservation, and display of educational materials (for example, libraries, museums, and galleries), academic administration (including academic deans but not department chairpersons), course and curriculum development expenses, formally organized academic personnel development, and services like information technology. Institutional support activities are typically understood as administration or bureaucracy. They include expenses for general administrative services, central executive-level activities concerned with management and long-range planning, legal and fiscal operations, space management, employee personnel and records, logistical services such as purchasing and printing, and public relations and development. Operation and maintenance activities service campus grounds and facilities used for general, nonauxiliary purposes, with expenses including utilities, fire protection and property insurance. (GASB standards separate out operation and maintenance costs, but FASB standards embed them in each functional expense category, such as instruction, research and student services. The Delta Cost Project harmonized all universities by subtracting operation and maintenance costs from all categories and recombining them separately.) See the Delta documentation (Hurlburt, Peek and Sun 2017).

Component of expenditure
Student service expenditure

IPEDS, Finance, Delta
Student service includes admissions, registrar activities, and activities whose primary purpose is to contribute to students' emotional and physical well-being and to their intellectual, cultural, and social development outside the context of the formal instructional program. Examples include student activities, cultural events, student newspapers, intramural athletics, student organizations, supplemental instruction outside the normal administration, and student records. Intercollegiate athletics and student health services may also be included except when operated as self-supporting auxiliary enterprises.

Component of expenditure
Grants and auxiliary expenditure

IPEDS, Finance, Delta
The sum of grants to third parties and the operating expenses of all auxiliary enterprises, independent enterprises, and hospitals that are reported as part of the university. Grants include scholarships and fellowships paid to students or third parties for goods and services not provided by the institution, such as off-campus housing. These third-party grants represent all financial aid counted towards total expenditures, since aid packages that students direct towards tuition and fees are not counted. See auxiliary revenue for a definition of auxiliary and independent enterprises.

Component of expenditure
Total faculty

IPEDS, Fall Staff, Delta
The total number of persons identified by the university in 2013 whose initial assignments principally entail conducting instruction, research or public service. They may hold academic rank titles of professor, associate professor, assistant professor, instructor, lecturer, an equivalent, or even executive titles.

Total net assets

IPEDS, Finance, Delta
"Total net assets is the sum of net assets invested in capital assets, net of related debt, restricted-expendable net assets, restricted-nonexpendable net assets, and unrestricted net assets. It can be calculated as the difference between total assets and total liabilities." (Copied from the Delta dictionary.) Used as an alternative size variable (see Appendix C).

Appendix C: Total enrollment headcount as the underlying size variable
We used total enrollment headcount as the fundamental measure of scale for the entirety of our analysis. We chose this measure of size because our focus is on the university as a fundamental service to the student, and this measure allows us to naturally compare inputs and outputs relative to the student, along with the tradeoffs associated with other functions of the university.

Other enrollment variables as the underlying size variable
Instead of total enrollment, we could have used full-time equivalent (FTE) enrollment, to account for the fact that many students are not full-time students. There are pros and cons to each measure. Total enrollment is a good measure of the fixed cost associated with administering and hosting a student, but an imperfect measure of the variable cost of the time spent on instruction. Total enrollment is also a good measure of the number of students interacting with the institution and with each other, thereby generating possible network economies. However, FTE is a better measure for any input or output variable that is sensitive to the time spent by students on campus, such as teaching expenditure and tuition costs.

[Figure: Total expenditure versus alternative size variables, with fitted scaling exponents α_enrollment = 1.03 ± 0.01, α_faculty = 1.05 ± 0.01, α_asset = 0.76 ± 0.01, α_employee = 1.08 ± 0.01.]

For robustness, in Table C-1 we thus reproduce Table 2 of the main text using FTE as our measure of size (we do not include student outcomes here, since they are analyzed as a function of cohort size instead of total enrollment). We see that the coefficients are very similar to, and share overlapping confidence intervals with, those in Table 2, with exceptions noted below. There are a few instances where the exponents using total enrollment and FTE differ such that their confidence intervals do not overlap, or do not exhibit the same overall scaling behavior.
1) Tuition scaling for state colleges changes from 1.04 ± 0.06 versus total enrollment to 1.11 ± 0.05 versus FTE, and for community colleges from 0.89 ± 0.04 to 0.95 ± 0.04, adding some nuance to our conclusion that at scale the affordability of these schools remains constant or increases. Indeed, the sublinear scaling shown in Table 2 may be a result of the cheaper tuition for part-time students and their growing share of enrollment at state and community colleges. We note that the sublinear scaling of public funding is extremely robust. Hence, the economies of scale of community and state colleges are in large part associated with (or a consequence of) this reduction in public funding. It is less clear whether they lead to a reduction in the costs to students.

2) Using FTE shifts three scaling exponents enough to change their overall scaling behavior, because of their proximity to the somewhat arbitrary boundaries, 0.95 to 1.05, that we use to define linear scaling. However, each of these changes is slight and occurs well within the original exponent's confidence intervals. The changes do not alter our comparisons of scaling relationships within or between sectors in the main text.
a) Teaching expenditures at state colleges change from 0.90 ± 0.04 to 0.95 ± 0.04.
b) Faculty pay at state colleges changes from 0.91 ± 0.04 to 0.95 ± 0.03.
c) Maintenance expenditures at community colleges change from 0.88 ± 0.02 to 0.93 ± 0.02.
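The exponents and confidence intervals compared above come from ordinary least squares fits in log-log space. The following is a minimal sketch of such a fit on synthetic data, not the paper's actual pipeline.

```python
# Minimal sketch of estimating a scaling exponent b in Y ~ Y0 * N^b by
# ordinary least squares in log-log space, with a ~95% confidence interval.
# The data are synthetic; exponent and noise level are illustrative.
import numpy as np

rng = np.random.default_rng(0)
enrollment = np.logspace(2, 5, 200)            # sizes from 1e2 to 1e5
true_b = 1.10
expenditure = 3e4 * enrollment**true_b * rng.lognormal(0.0, 0.2, 200)

x = np.log10(enrollment)
y = np.log10(expenditure)

# Fit y = a + b*x; cov=True also returns the parameter covariance matrix.
(b, a), cov = np.polyfit(x, y, 1, cov=True)
se_b = np.sqrt(cov[0, 0])
ci = (b - 1.96 * se_b, b + 1.96 * se_b)

print(f"exponent b = {b:.3f}, 95% CI ≈ ({ci[0]:.3f}, {ci[1]:.3f})")
```

With three decades of size range and modest lognormal noise, the recovered exponent lands close to the true value, which is why exponents in Tables 2 and C-1 carry tight uncertainties.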

Appendix D: University sector definitions and outlier analysis
Our main findings rely upon grouping universities into distinct sectors for comparison, so the exact definition of sectors crucially determines the results and their interpretation. In this Appendix, we discuss (1) the detailed definitions of the seven sectors used in the main text, (2) the rationale for merging the for-profit 2yr and 2yr- sectors, and (3) different cases of outlier institutions, how they impact the scaling of their sector, and why we exclude or include them.

Seven sector definitions
Our sector classification in Table D-1 encompasses the main institutional archetypes typically discussed in US higher education (Bok 2013). As mentioned in the main text, the seven sectors we identify are drawn conventionally along the dimensions of control, level, and research activity. Control categories are public, private non-profit, and private for-profit, each representing a different "business model." Level categories are 4-year and higher (4yr+), 2-but-less-than-4-year (2yr), and less than 2-year (2yr-), based on the typical time to completion of the highest-offered degree or certification at the institution. Of the resulting nine sectors, we excluded the three with fewer than 100,000 students each, together comprising barely half a percent of total enrollment in the Delta dataset. We combined the private for-profit 2yr and 2yr- sectors because they share a professional focus and exhibit almost exactly the same scaling behavior, as shown below in Table D-2. As for research, we observe that its presence as a primary institutional objective, in addition to education, entails fundamentally different financial flows and distinct scaling in all key variables, and therefore a different institutional type. At the four-year-and-higher level we separated research and non-research universities using the 2010 Carnegie Classification, for both public and private non-profit institutions. We grouped all three tiers of research activity into the research sectors, and the remaining four-year master's and baccalaureate Carnegie sectors into the non-research sectors. We excluded all Puerto Rican and U.S. Territory schools, as well as medical schools classified as non-research four-year institutions. It is useful to understand how the size ranges of the sectors compare before conducting the scaling analysis. The box-and-whisker plot below, Figure D-1, shows the distribution of enrollment in different sectors.
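The sector partition described above amounts to a decision rule on control, level, and Carnegie research status. The sketch below is a hypothetical summary; the argument names and label strings are our own shorthand, not the actual Delta or Carnegie variables.

```python
# Hypothetical sketch of the seven-sector assignment described above.
# Input encodings ("public", "4yr+", etc.) are illustrative assumptions.
def assign_sector(control, level, carnegie_research):
    """Return one of the seven sector labels, or None for excluded groups."""
    if control == "for-profit" and level in ("2yr", "2yr-"):
        return "professional schools"          # merged for-profit sector
    if control == "public" and level == "4yr+":
        return "public research" if carnegie_research else "state colleges"
    if control == "nonprofit" and level == "4yr+":
        return "private research" if carnegie_research else "private colleges"
    if control == "for-profit" and level == "4yr+":
        return "for-profit 4yr+"
    if control == "public" and level == "2yr":
        return "community colleges"
    return None   # the three small sectors excluded (<100,000 students each)

print(assign_sector("public", "2yr", False))   # → community colleges
```

Encoding the partition as an explicit function makes the exclusions auditable: any (control, level) combination not matched above falls into the excluded remainder.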
Note how the middle two quartiles of the sectors' size ranges hardly overlap, though their full ranges consistently do.

Merging for-profit 2yr and 2yr- universities to create the professional school sector
We noticed several important similarities between private for-profit 2yr institutions and private for-profit less-than-2yr (2yr-) institutions. The two sectors share a vocational focus, comprising mostly programs in cosmetology, nursing, and technical areas. Likely as a consequence, they exhibit very similar scaling behavior: the confidence intervals of their respective scaling exponents for key variables overlap, as shown in Table D-2. Furthermore, the scaling intercepts are only ever slightly different between the two sectors, which occupy similar size ranges (though the less-than-2yr schools are generally smaller). For these reasons, it makes sense to consider whether they constitute a single institutional sector. We tested this directly by regressing on the entire sample of 2,142 schools and comparing the scaling results of the individual sectors to these new results, in column three of Table D-2. Table D-2 shows the exponents of key variables in three sectors: private for-profit 2yr, private for-profit 2yr-, and their combined sector (professional schools). Each table entry is a fit-line equation in log-log space with the exponent value in bold. FSA completions data are not used; see Appendix G for our explanation.
We see that total expenditures in the combined sector scale with an exponent exceeding 1.05, which qualifies as superlinear, despite both constituent sectors exhibiting linear scaling (0.95 ≤ b ≤ 1.05) for this variable. However, all confidence intervals overlap, and in the main text we consider total expenditures of professional schools to scale linearly or very slightly superlinearly, given the arbitrariness of the 1.05 boundary. We thus find that the combined sector displays overall the same scaling as the 2yr and 2yr- sectors, so we use it throughout our analysis under the name professional schools.
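The two conventions used above, the linear-scaling band 0.95 ≤ b ≤ 1.05 and the confidence-interval overlap test for comparing sectors, can be written out explicitly. This is a small sketch with illustrative numbers.

```python
# Sketch of the classification conventions used above: exponents within
# [0.95, 1.05] are called linear, below sublinear, above superlinear; two
# sectors' exponents are treated as statistically indistinguishable when
# their ~95% confidence intervals overlap.
def scaling_regime(b, lo=0.95, hi=1.05):
    if b < lo:
        return "sublinear"
    if b > hi:
        return "superlinear"
    return "linear"

def cis_overlap(b1, se1, b2, se2, z=1.96):
    lo1, hi1 = b1 - z * se1, b1 + z * se1
    lo2, hi2 = b2 - z * se2, b2 + z * se2
    return max(lo1, lo2) <= min(hi1, hi2)

print(scaling_regime(1.06))                  # → superlinear
print(cis_overlap(1.06, 0.03, 1.00, 0.03))   # → True (CIs overlap)
```

An exponent of 1.06 with a standard error of 0.03 thus sits nominally above the 1.05 boundary while remaining consistent with linear scaling, which is exactly the borderline case discussed for professional-school expenditures.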

Introduction to outlier analysis
In our approach, as we group universities into sectors and run regressions on them, scaling exponents are not only our results of interest but also help us differentiate between university sectors. However, our groupings are legitimate only insofar as the residuals of the data are uniformly distributed across size. In other words, curvature or outliers that are heavily concentrated at the head or the tail of the scaling fit line can disproportionately influence the exponent value, compared to those located in the middle of the domain. Besides challenging the robustness of our analysis, outliers themselves also reveal the underlying constraints of the system and potential opportunities to break them. Therefore, we find it crucial to identify outliers and investigate what they do differently to leverage synergies or efficiencies at scale, thereby strengthening our understanding of scaling mechanisms and university strategies.
By filtering out data points with high residuals, we identified four main groups of outliers, each explained in the special case sections below. We scrutinized these groups carefully in two ways. On one hand, we compared the scaling of these subsets of data to all other universities in the sector, using total expenditure versus total enrollment as our test case. On the other, we confirmed that we have theoretically sound reasons for excluding or including them. We summarize our treatment of outlier institutions here:
• Universities with medical schools or hospitals are excluded from non-research 4yr+ sectors (state colleges and non-profit private colleges)
• Research universities with medical schools or hospitals are included, in both the public and private non-profit domains
• Rockefeller University is excluded from the private research university sector
• Online-only schools and schools based in U.S. Territories are excluded from all sectors
We recognize that other institutional types also tend to have high residuals, such as some stand-alone law schools, art and theater institutes, business schools, and grouped schools that combine multi-sector colleges into one. However, we decided not to analyze them in depth, believing that their populations are too small to significantly skew our outcomes.
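The residual-based screening described above can be sketched as follows. The data are synthetic, and the threshold k = 2.5 is an arbitrary illustration, not the cutoff used in our analysis:

```python
import numpy as np

def flag_outliers(enrollment, expenditure, k=2.5):
    """Fit a power law in log-log space and flag schools whose residuals
    exceed k standard deviations (a hypothetical screening rule)."""
    x, y = np.log10(enrollment), np.log10(expenditure)
    b, a = np.polyfit(x, y, 1)            # slope (exponent) and intercept
    resid = y - (a + b * x)
    return np.abs(resid) > k * resid.std(), resid

# Hypothetical sector: 99 schools on a b = 0.9 fitline, plus one
# medical-school-like outlier spending ~10x what its size predicts.
x = np.linspace(2, 4, 99)                  # log10 enrollment, 100 to 10,000
enroll = np.concatenate([10**x, [10**3.0]])
spend = np.concatenate([10**(4 + 0.9 * x), [10**(4 + 0.9 * 3.0 + 1.0)]])
mask, resid = flag_outliers(enroll, spend)
print(mask.sum())   # → 1: only the high-spending school is flagged
```

Flagged schools are then examined individually, as in the special case sections, rather than dropped automatically.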
Outlier analysis helped us fine-tune our sectors into groups of comparable institutions, as explained fully below. Though these groups were revealed through the analysis, we feel well justified in removing them on purely theoretical grounds. Nonetheless, a more holistic approach might account for a spectrum of institutional types: for example, private research universities range from having no medical school, to maintaining a distant affiliation with one, to having one completely integrated into the greater university. To determine natural classes of institutions, future analyses may wish to apply algorithmic classification schemes a priori.

Special case: research universities
After the other exclusions in this appendix, a total of 268 Delta schools fall in the three tiers of "doctoral universities" set forth by the 2010 Carnegie Classification (Carnegie Foundation for the Advancement of Teaching, 2011). These three tiers of research activity are Tier 1 - Research Universities (very high research activity), Tier 2 - Research Universities (high research activity), and Tier 3 - Doctoral/Research Universities. This combined class includes only 4yr+ schools that awarded twenty or more research/scholarship doctorates between Fall 2008 and Spring 2009, according to the IPEDS Completions Survey. We term these schools "Research Universities." By control, 160 are public, 102 are private non-profit, and 6 are private for-profit. As shown in Figure D-2 and Figure D-3 respectively, public and private research universities (circled in green) exhibit markedly different scaling from their non-research counterparts. The clusters of research universities cover most of the upper-end curvature above the fitline, and exhibit superlinear scaling very different from that of non-research universities. The same patterns are not observed for the six for-profit research universities, which fall neatly around the scaling relationships for all for-profit 4yr+ schools, and do not change the scaling when removed.

The research cluster also overlaps with medical schools at larger sizes, especially those typically associated with prestige and extensive facilities. Both research activity and medical schools can explain the outlier curvature at the tip of the graph, and separating either can improve the robustness of the scaling model on the rest of the data. But why do we choose research, rather than medical schools, as the breaking point for sector definition? We argue that medical schools as a set can have very different internal structures depending on whether they are stand-alone institutions or are hosted within big research universities. Meanwhile, we reason that research universities with and without medical schools are more likely to share institutional structure, whatever that may consist of. Furthermore, research is a widely accepted sector divider when discussing American universities, both informally and in the higher education literature (Bok 2013, Crow and Dabars 2015). This allows us to engage prevailing hypotheses about scale in higher education. Therefore, we are confident in dividing the public 4yr+ and private non-profit 4yr+ universities each into research and non-research sectors. We decided to include research universities that have a medical school or hospital, because their medical facilities cannot be clearly separated from their institution-wide research activities.

Special case: universities that grant medical degrees or have hospitals
There are 162 schools that grant medical degrees and 84 schools that have hospitals in our Delta dataset. Most of these universities are in the public and private non-profit 4yr+ realm, and may or may not have a research component. Moving forward, we refer to them simply as medical schools and hospitals.
From Figure D-4 and Figure D-5 we can see that medical schools (in red or yellow) as a subset of the data have a much higher normalization constant, meaning that their total expenditure and revenue are higher on average. In particular, universities that have hospitals, or both hospitals and medical degrees, tend to have even larger residuals than those that only grant medical degrees, implying that it is the nature of medical practice to expand financial flows. Furthermore, many medical schools and hospitals enroll only graduate students, spend disproportionately large amounts per student, and often function primarily as free-standing medical institutions rather than as institutes of postsecondary education. For all these reasons, we are confident in excluding these institutions from the non-research 4yr+ private colleges and state colleges. Removing medical schools does not impact the scaling exponent of total expenditure for non-profit private colleges, but increases state colleges' scaling exponent from 0.69 to 0.84, with a narrower confidence interval. We apply these exclusions only to the non-research college sectors. As mentioned, we still include research universities that report having a medical school or hospital.

Special case: Rockefeller University
Rockefeller awarded its first Ph.D. degree only in 1954 and currently operates a very small graduate program. It has a faculty of 82 largely independent researchers who supervise a staff of more than 1,500 researchers, postdocs, clinicians and technicians, and only 175 graduate students (Ph.D. and M.D./Ph.D.). Perhaps unsurprisingly, the institute's mission statement, unchanged "since 1913," makes no mention of education (https://www.rockefellerfoundation.org/about-us/).

As such, based on its unique history and organizational structure, we can safely exclude Rockefeller from a comparative analysis, as it is truly sui generis. In addition, because Rockefeller enrolls 100% graduate students, we checked whether the undergraduate share of enrollment at universities is correlated with the residuals. One could argue that Rockefeller is less like a research university and closer to a stand-alone medical school, which would also be composed only of graduate students and focused on biomedical research and practice. Figure D-7 and Figure D-8 confirm that Rockefeller and some stand-alone medical schools all have similarly high expenditure residuals and negligible undergraduate enrollment. For public 4yr+ universities, outlier schools also have high graduate enrollment, but for private 4yr+ universities the correlation between percentage of undergraduate enrollment and residuals is weak and noisy: many theological seminaries and other special-focus institutions are also graduate-only, but unlike medical schools, they conform perfectly to the rest of the sector's scaling. Therefore, we conclude that undergraduate share of enrollment, or the presence of undergraduates, is insufficient grounds to exclude Rockefeller and/or other graduate-only institutions.

California Institute of Technology
California Institute of Technology (CalTech) is the second-most significant outlier in the private research university sector. Its enormous total revenue and expenditure stem from the Jet Propulsion Laboratory (JPL), a federally funded research center that it houses. Much like medical schools, such university research and development centers exist on a spectrum, from practically independent operations to deeply embedded fixtures of the greater university. We have no comprehensive, specific method to determine which labs are separable from institution-wide research activities. Consequently, we keep CalTech and all such "parent" institutions in the dataset (see Appendix E).

Special case: online schools
In recent years, online programs have become popular targets for universities, accompanying the rise of online education technology. Intuitively, online programs can exhibit large economies of scale: once the initial costs of establishment are covered, universities can potentially attract more out-of-state and non-US students and invest less in physical facilities per student, without the quality of education suffering at scale. We used the variable "DISTANCEONLY" in College Scorecard to distinguish schools that are entirely online. Of the 32 online schools, 22 are for-profit 4yr+ colleges, and another seven are non-profit private colleges.
For both sectors, online schools scale sublinearly; in particular, they scale even more sublinearly than the non-profit private college sector as a whole. Because we set a threshold of fifteen points to run a regression, we only show the graph for the for-profit 4yr+ sector. Including online programs does not change the scaling exponents, because these schools compose a small percentage of all schools considered in the regression. However, we still decided to set aside online schools: they display a distinct scaling relationship, function very differently from universities with in-person campuses, and their coverage in our dataset is too sparse to analyze reliably.

Special case: U.S. territories universities
Similar to online programs, we also excluded universities located in U.S. territories from all sectors, including those in Puerto Rico and the Pacific Islands. We assume their economies and higher education systems are qualitatively different, with much lower costs of living and labor than in the fifty states. 94 of these 132 schools are private non-profit colleges or professional schools, which exhibit sublinear scaling in their own sectors (Figure


Special case: liberal arts colleges
It should be noted that the non-profit private colleges sector in our analysis contains a fairly large amount of institutional diversity. This sector is defined as all private non-profit colleges in the non-research classes of four-year universities, per the 2010 Carnegie Classification. These classes include Master's Colleges and Universities, Baccalaureate Colleges, Baccalaureate/Associate's Colleges, the entire swath of Special Focus Institutions (excluding medical schools, as noted), and the few universities outside the classification system. Notable examples include schools that award doctoral degrees, such as the Rocky Mountain University of Health Professions. However, an important subcategory within this sector, one which represents a well-known institutional archetype of US education, is the "liberal arts" college. It is possible to restrict our analysis within this broader sector to only the Baccalaureate Colleges of the Carnegie Classification. Doing so, we find different scaling relationships than for the overall sector (Table D-4); most notably, the baccalaureate colleges exhibit superlinear scaling for expenditure on instruction and faculty pay.

Appendix E: Delta Cost Project grouping of campuses and institutions
The Delta Cost Project groups schools in a way that complicates both sector definitions and the institutional unit of analysis (Hurlburt, Peek and Sun 2017; Jaquette and Parra 2016). In their paper "The Problem with the Delta Cost Project," Jaquette and Parra advise against using Delta data for cross-sector comparisons. Here, we explain how Delta groups universities and how this raises three main considerations for our scaling analysis: (1) possible statistical issues with all our scaling relationships due to mismeasurement of university size; (2) matching data with the College Scorecard dataset for completions; and (3) grouping of universities across sectors. In total, we demonstrate that Delta grouping has no major effect on our results.

Delta grouping of universities explained: multiple-campus institutions and multiple-institution systems
It is very common for universities to exist in multiple locations, or to maintain organizational ties with other universities. Consequently, "NCES allows certain institutions ('parent institutions') to report data for branch campuses or other affiliated institutions ('child institutions') for various IPEDS surveys," as stated on page 2 of the Delta data file documentation (Hurlburt, Peek and Sun 2017). This complicates IPEDS data, for two reasons: (1) institutions may choose to report certain finance data at the parent level, while reporting other data separately by child institution, and (2) institutional parent-child relationships change when, for example, campuses are added, or schools merge. The Delta Cost Project maintains consistency of the data by grouping together universities that have reported any IPEDS data together at any time in the program's history, from 1987 to 2015.
The result is that Delta data collapses many types of related institutions into a single entry bearing all the institutional characteristics (IPEDS UNITID, name, sector, SAT/ACT scores) of one particular school's location. We call such entries "Delta grouped" universities. They are denoted in the dataset by the boolean variable "isgrouped?", where 1 means grouped and 0 means not grouped, or "ungrouped." See the Delta Cost Project's Parent Child Master List for a crosswalk between 568 of the 616 Delta grouped institutions and their respective 1,431 child institutions: https://deltacostproject.org/sites/default/files/database/fy2015_parent-child-master-list.xls.
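The collapsing of child institutions into a parent entry can be illustrated with a toy example. The UNITIDs, column names, and figures below are hypothetical; the real Delta files are far richer:

```python
import pandas as pd

# Hypothetical mini-dataset: two child campuses report under one parent UNITID,
# mimicking how Delta collapses parent-child groups into a single entry.
ipeds = pd.DataFrame({
    "unitid":     [1001, 1002, 1003, 2001],
    "parent_id":  [1001, 1001, 1001, 2001],   # 1002/1003 report under 1001
    "enrollment": [30000, 8000, 5000, 12000],
    "total_exp":  [2.1e9, 0.3e9, 0.2e9, 0.9e9],
})

# Delta-style grouping: sum quantitative variables over each parent, keeping
# only the parent's identity; flag entries that absorbed child institutions.
delta = (ipeds.groupby("parent_id", as_index=False)[["enrollment", "total_exp"]]
              .sum()
              .assign(isgrouped=lambda d: d["parent_id"].map(
                  ipeds["parent_id"].value_counts() > 1)))
print(delta)
```

The grouped entry for parent 1001 carries the combined enrollment of all three campuses, which is exactly the mismeasurement of university size discussed in the next section.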
Table E-1 shows the extent of Delta grouping, measured by the percentage of schools that Delta groups and their share of total enrollment in each sector. Note that throughout the entire system, 10% of schools are grouped, yet they enroll 27% of students, because grouped entries are disproportionately large and many are large public research universities. One grouped entry, for example, collapses data from all 22 of a system's campuses, spread throughout the state and ranging from localized associates- and bachelors-granting campuses, to residential four-year campuses, to a graduate professional school. Another example is the grouping of medical schools and hospitals with their associated research universities (see Appendix D).
In the second case of university grouping, Delta groups some 247 Title-IV institutions, accounting for 1,291,201 undergraduate students, into systems listed under one "linchpin" institution selected somewhat arbitrarily (Delta Cost Project 2011, Jaquette and Parra 2016). The most startling example is the University of Texas-Austin. It appears as the single largest public research university in Delta data, with total enrollment of about 219,000 students, because it combines values from all thirteen other Title-IV institutions in the University of Texas (UT) system, such as UT Arlington, UT Brownsville and UT El Paso. (We note that not all university systems are grouped; for instance, each university in the University of California system appears separately in Delta data.) Jaquette and Parra (2016) emphasize that such grouping of multiple-institution systems leaves Delta lacking entries for many Title-IV institutions, particularly public universities (8% missing) as opposed to private non-profits (only 0.5% missing).
In the first case, of multiple-campus institutions, we have no clean-cut way of distinguishing how functionally separate the campuses are; in the second case, we can be sure the grouped institutions are functionally separate schools. Thus, to be conservative, we must assume that any instance of grouping in Delta data erroneously combines functionally separate institutions.

Statistical problem with grouping, and its overall bearing on our scaling results
The grouping of arguably distinct data points artificially reduces the combined weight of the affected schools on the regression and creates an aggregate point with a new residual, located systematically up and to the right of where its constituent points would have been. This could bias the regression fit, the fitline slope, and thus the scaling result, all depending on the distribution of points within each grouping, along both the vertical and horizontal dimensions.
To test the severity of this problem, we compare the scaling of all main variables with and without these Delta grouped schools. We check total revenues, total expenditures, tuition, faculty, and faculty pay, each versus enrollment, in every sector. We find that the removal of grouped schools has almost no effect on the scaling relationships: not one of the exponents changes enough for its new confidence interval not to overlap with the old one. The only sets of exponents that raise concern about Delta grouping are the following:
1) Removing Delta grouped schools shifts three scaling exponents enough to change their overall scaling behavior, because of their proximity to the somewhat arbitrary boundaries that we use to define linear scaling, 0.95 to 1.05. However, each of these changes is slight and occurs well within the original exponent's confidence interval. The changes do not alter our comparisons of scaling relationships within or between sectors in the main text.
a) Maintenance expenditure for public research universities changes from 1.07 ± 0.18 to 1.04 ± 0.14
b) Grants and auxiliary expenditures for non-profit private colleges change from 1.05 ± 0.07 to 1.07 ± 0.08
c) Auxiliary revenues for for-profit colleges change from 0.93 ± 0.14 to 0.97 ± 0.15
2) Tuition scaling for public research universities changes from 1.25 ± 0.13 to 1.34 ± 0.16 when grouped universities are excluded (see Figure E-1). However, this significant change to the exponent does not bring into question the superlinearity of the relationship, nor does it affect how tuition scaling compares to the scaling of other salient university characteristics.

3) Total faculty scaling for professional schools changes from 0.76 ± 0.02 to 0.72 ± 0.02 when grouped universities are excluded (see Figure E-2). Though the exponents' confidence intervals do not overlap, both are clearly sublinear, and the change does not affect how faculty scaling compares to the scaling of other salient university characteristics.
In the last two instances, as shown in Figure E-1 and Figure E-2, the scaling of grouped schools is closer to linear than the scaling of the entire sector (which includes both grouped and ungrouped schools). This effect can be explained mathematically and is to be expected. Let $N_0$ and $Y_0$ be the minimum values of size and of an associated feature of a university, respectively. Then, for a school at the maximum size $(N_m, Y_m)$, the largest possible error is produced by grouping $n = N_m / N_0$ minimum-size schools into one entry. In this case, for the fitted scaling relationship $Y = Y_0 N^{\beta}$, the relative error at the maximum size is $n^{1-\beta} - 1$, which will overestimate the general scaling if $\beta < 1$ and underestimate it if $\beta > 1$. For example, if $N_m = 30N_0$ and $\beta = 0.80$, the relative error will be roughly 100%, and if $N_m = 1000N_0$ and $\beta = 0.80$, it will be roughly 300%. We observe that grouped campuses fall above the scaling relationship when $\beta < 1$ in Figure E-2, and below it when $\beta > 1$ in Figure E-1.
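As a quick numerical check of this grouping error (a sketch; the helper name is ours): collapsing $n$ minimum-size schools into one data point under a fitted exponent $\beta$ gives a relative error of $n^{1-\beta} - 1$ at the top of the size range.

```python
def grouping_relative_error(n, beta):
    """Relative error at the maximum size when n minimum-size schools are
    collapsed into one data point, under a fitted scaling Y = Y0 * N**beta."""
    # The grouped point has feature n*Y0 at size n*N0, while the fitline
    # predicts Y0 * n**beta, giving a relative error of n**(1 - beta) - 1.
    return n ** (1.0 - beta) - 1.0

print(round(grouping_relative_error(30, 0.80), 2))    # → 0.97, roughly 100%
print(round(grouping_relative_error(1000, 0.80), 2))  # → 2.98, roughly 300%
print(grouping_relative_error(50, 1.25) < 0)          # superlinear: falls below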

Removing these grouped schools typically raises superlinear exponents and lowers sublinear exponents, in line with the reasoning given here. However, it impacts each scaling relationship differently and, as we have emphasized, only very slightly.
This resilience of scaling relationships to removal of grouped institutions is an interesting result in its own right, and merits further investigation. The simplest explanation is that grouped schools are so few, and collapse together so few campuses, that they hardly impact the regression. We also think it is possible that for certain variables and sectors, grouped schools may actually follow the nonlinear scaling of the ungrouped schools, counter to their tendency towards linearity explained above. For example, grouped public research universities may disproportionately include medical schools, which may keep up their scaling exponent at the level of ungrouped research universities. Perhaps some other source of variance within the grouped universities keeps them within the overall variance. The implications run deep. We would expect grouped schools to tend toward linear scaling, if indeed the observed economies of scale depend upon greater populations of students, faculty and staff interacting in one place-as we hypothesize throughout the main paper, and analogous to the elegant interaction model of urban scaling, advanced by Luis Bettencourt (2013). The fact that we do not observe this strongly and uniformly across each sector, barring other explanations, may call into question our very notion of an institution. With more clearly disaggregated data, perhaps directly sourced from IPEDS, and better-defined levels of aggregation (separateness of campuses, student bodies, administrations, finances, government oversight, etc.), we could study how distributed institutional structure affects their scaling in general. We project that number and size of affiliate campuses could affect bureaucratic scaling differently from that of core functions, e.g. instruction and research, and that these very institutional economies of scale make it advantageous for universities to operate multiple campuses. 
Such research could shed light on the tradeoffs to institutional interconnectedness, decentralization, and complexity.
We conclude that grouping of schools does not affect our scaling results in general, and we continue to include Delta grouped universities in all sectors.

Delta grouping and matching with FSA Scorecard data
Delta grouping of universities raises a matching problem when using Delta data in combination with any other data on U.S. universities. The desired dataset may contain individual campuses or institutions that Delta instead groups together, so when simply merging this dataset to Delta data, only its data on the parent institution is appended to each Delta grouped university, and all its data on corresponding child institutions is omitted.
This institutional matching issue applies in every instance that we merge a dataset to Delta: Scorecard for FSA completions data, Chetty Mobility Report Cards for earnings data, Brookings for test score data, and HERD for research expenditure data. In each instance, we test whether it affects our results by removing all Delta-grouped institutions from the combined dataset, as described in the previous section, thereby narrowing the sample of universities in every sector to Delta "ungrouped" schools. These are single-campus schools that have never reported data to IPEDS together with another institution.
It is particularly important to examine how matching between Delta and Scorecard (explained in Appendix A) impacts our scaling analyses of FSA completions, because in our main scaling results by sector, we feature FSA completions alongside various revenue, expenditure, and faculty variables-all of which come from Delta data, as does total enrollment.
We compared results with and without Delta grouped schools for FSA completions versus FSA cohort in Table E-3, and for FSA cohort versus total enrollment in Table E-4. Table E-3 shows FSA completions versus FSA cohort for grouped schools, ungrouped schools and all schools, by sector. Neither variable comes from Delta. We observe that FSA completion scaling is all but identical between Delta ungrouped schools and all Delta schools. These results appeared in Table E-2 but are worth reiterating in this section. Delta grouped schools themselves scale somewhat differently, but their scaling exponent confidence intervals still overlap with the others. Disregard the results for professional schools and for-profit colleges, which we exclude (see Appendix G). A Delta grouped university has the total enrollment of all its constituent institutions combined, but the FSA completions and FSA cohort of only the parent institution. In the completions versus cohort relationships, each Delta grouped system is thus included as the parent institution; these points are valid but omit child institution data. However, in the cohort versus total enrollment relationships, each Delta grouped system has an erroneously small FSA cohort relative to its total enrollment, and would appear on a plot below and to the right of the fitline. This explains why FSA cohort versus total enrollment scaling exponents increase when grouped schools are removed, and why these results are much more sensitive to the removal of grouped schools than completions versus cohort scaling.

Grouping of universities from multiple sectors
The Delta Cost Project is prone to grouping Title-IV institutions from different sectors. Such entries take the institutional characteristics of the linchpin (parent) institution, including its name and sector. We were not able to determine precisely how many of the 247 grouped Title-IV systems this affects, but from scanning the Parent Child Master List (https://deltacostproject.org/sites/default/files/database/fy2015_parent-child-master-list.xls) we estimate the number is low.
Delta acknowledges that its method of assigning the linchpin institution, and thereby the sector of the grouped university, is sometimes unclear (Delta Cost Project 2011, Jaquette and Parra 2016). We observe from the Parent Child Master List that Delta generally designates as the parent of a system the institution with the highest educational level. In other words, Delta incorporates two-year colleges into four-year colleges, and incorporates both two- and four-year colleges into research universities. We are confident that grouping these institutions in this way is no different from the much more commonplace Delta grouping of multiple-campus institutions, where complicated combinations of institutional types effectively roll schools and campuses that would fall in different sectors up into one entry. In other words, we suppose that grouping the University of Maine System is akin to grouping schools such as the University of New Mexico and Kent State University in Ohio, which we justified earlier in this appendix. Therefore, we feel confident that grouping schools into a sector with a higher educational level does not significantly affect our results.
Grouping across sectors is problematic when it places schools at a different educational level than the one where they belong. The CUNY system is one such example: it groups not only community colleges but also graduate institutions under a state college (non-research public 4yr+) parent. On closer examination, we find that CUNY reported awarding 463 doctoral degrees, far more than the twenty needed to qualify as a research university according to the Carnegie Classification (Carnegie Foundation for the Advancement of Teaching, 2011). Additionally, it reports spending $132,065,820 on research (roughly 3% of total expenditure) in 2013.

Appendix F: Revenue and expenditure component analyses
In this section we explain (1) the method by which we constructed the bar charts shown in Figure 3 of the main text and (2) how to interpret them, with potential caveats in mind.

Constructing and reading the component bar charts
In the bar charts (see Figure F-1 as an example), each colored band corresponds to the per-student amount of a given revenue or expenditure component, at a particular size bin along the horizontal axis of the logarithm of total enrollment. Thus, as enrollment grows from left to right, a widening band indicates an increasing per-student amount, or superlinear scaling of that variable, while a narrowing band indicates diminishing returns and sublinear scaling. For instance, Figure F-1 shows that student tuition revenue scales superlinearly at public research universities, from about $5,000 on average per student at a typical university of 10,000 students, up to about $9,000 at enrollments around 100,000.
The width of each band, representing the per-student value of a revenue or expenditure variable at that size, is calculated from the fitted scaling equation for that particular component, evaluated at the bin size. Bins are evenly spread out in the logarithm of enrollment, with an arbitrary step size of $10^{0.5}$, within the actual range of enrollment. Only bins that contain more than five schools, and components that have more than 15 data points, are shown in the final visualization. In the revenue component bar chart, from bottom to top, the bands are: student tuition, government student grant aid, and government appropriations, all unrestricted but ostensibly dedicated toward covering educational and institutional costs; government grants and contracts dedicated toward research; unrestricted private funding, including donations, endowment and investment returns; and auxiliary revenues, restricted to auxiliary expenditures. Table B-1 of Appendix B defines each variable in detail.
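This construction can be sketched as follows, using the tuition example from above; the function and anchor values are our own illustration. For a fitted component $Y = Y_0 N^{\beta}$, the band width is the per-student value $Y/N = Y_0 N^{\beta - 1}$:

```python
import numpy as np

def per_student_band(beta, y_per_student_ref, n_ref, bins):
    """Bar-chart band width (per-student dollars) at each enrollment bin,
    from a fitted scaling Y = Y0 * N**beta: per-student value = Y0 * N**(beta-1),
    anchored at a reference per-student value at enrollment n_ref."""
    return y_per_student_ref * (bins / n_ref) ** (beta - 1.0)

# Half-decade log bins, i.e. a step of 10**0.5, from 10,000 to 100,000 students
bins = 10 ** np.arange(4.0, 5.1, 0.5)

# Tuition at public research universities: beta ≈ 1.25 (Appendix E), anchored
# at roughly $5,000 per student for a 10,000-student university.
band = per_student_band(1.25, 5000.0, 1e4, bins)
print(np.round(band))   # widening band: superlinear scaling
```

The band widens from about $5,000 to nearly $9,000 per student across a decade of enrollment, matching the reading of Figure F-1 described above.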
In the expenditure component bar chart, up to six expenditure component variables are shown, where data are available. The bottom three bands (instruction, research, and public service, from bottom to top) represent direct expenditures on three productive institutional functions, thereby outlining the scope of activities in each sector. The maintenance component above them generally (though not exclusively) captures the various operational and administrative costs. The top two bands, student services and grants and auxiliary, represent costs that are less relevant to university functions. Note that we group grants and auxiliary into one variable in the expenditure component bar chart, as grants contain a very small portion of total expenditure and would add unnecessary granularity to the visualization.

Qualifications to the bar chart visualization
The overall height of the bars, as the summation of the component regression fitlines, corresponds roughly to the overall scaling behavior of total expenditure or revenue: increasing height indicates superlinearity, whereas decreasing height indicates the opposite. However, the exponent of the summation, $\beta_{sum}$, does not perfectly match the scaling of the total variable. While the aggregated fitline describes the data at higher resolution, it may overweight components that have the fewest data points. For example, most professional schools have no grants and auxiliary expenditure, but the aggregated fitline treats all professional schools as having such spending, in amounts predicted by the few schools that do. Consequently, the expenditure bar chart in Figure F-2 slopes downward, indicating diminishing returns, and the corresponding summation of component fitlines, $\beta_{sum}$ in Figure F-3, likewise scales less than linearly. This is misleading: $\beta_E$, the scaling exponent of total expenditure versus total enrollment, also shown in Figure F-3, is 1.06 ± 0.02. (The for-profit 2yr and 2yr-less sectors both have linear scaling for total expenditure; when they were combined into the professional schools sector, the exponent became barely superlinear. See Appendix D.) This instance is the only mismatch we found between exponent values derived from summation and from data. All other bar charts, like the revenue component bar chart for professional schools (below in Figure F-2), give an accurate picture of overall scaling, where a roughly constant height suggests linearity; the component line plot in Figure F-4 shows there is no substantive difference between $\beta_{sum}$ and $\beta_R$ for professional schools. Nonetheless, throughout the rest of the project, we only use scaling results derived directly from regression on total revenue and expenditure data to make statements about overall financial throughput in a given sector.
Figure F-4 shows the scaling of the components of revenue that have more than 15 data points for professional schools. In this case β_sum, calculated by adding up all component fitlines, scales extremely close to β_R, the scaling exponent of total revenue, except at very small sizes. (Minor note: four components are shown in the line graph, but only three appear in the revenue bar chart in Figure F-2. This is because endowment revenue in this sector is extremely small, about $23 per capita, so it is almost invisible in the bar chart, though it is indeed there.)
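The mismatch between β_sum and a direct fit can be illustrated with synthetic data. The sketch below uses entirely invented numbers (not our actual fits): it builds two power-law components with different exponents, fits each in log-log space, sums the fitlines, and compares the exponent implied by the sum with a direct regression on the total.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical enrollment sizes for a set of schools (log-uniform).
N = np.exp(rng.uniform(np.log(100), np.log(20000), 500))

# Two hypothetical expenditure components with different scaling exponents.
comp_a = 2000 * N**1.10 * np.exp(rng.normal(0, 0.2, N.size))  # superlinear
comp_b = 5e5 * N**0.70 * np.exp(rng.normal(0, 0.2, N.size))   # sublinear
total = comp_a + comp_b

def fit_exponent(x, y):
    """OLS slope of log(y) on log(x): the scaling exponent."""
    slope, _ = np.polyfit(np.log(x), np.log(y), 1)
    return slope

# Direct regression on the total ...
beta_total = fit_exponent(N, total)

# ... versus the exponent implied by summing the component fitlines.
fits = [np.polyfit(np.log(N), np.log(c), 1) for c in (comp_a, comp_b)]
summed = sum(np.exp(b) * N**m for m, b in fits)
beta_sum = fit_exponent(N, summed)
```

Because a sum of power laws is not itself a power law, `beta_sum` and `beta_total` agree only approximately; the gap widens when one component is fit from few data points but extrapolated to all schools, which is exactly the situation described above.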

Caveats regarding the data
First, we must point out the main disadvantage of condensing many scaling results into a single figure, as we do with each bar chart: to ensure readability, we omit precise scaling exponent values, confidence intervals, the number of schools per size bin, and all other descriptive statistics. The exponent values and confidence intervals for all these variables are provided in Appendix J.
Second, while IPEDS expense category definitions are precise, university reporting may not be reliable. For example, classifying a particular staff member under maintenance or student services is a judgment call that undoubtedly varies within and between universities. Such fuzzy reporting blurs the lines between purpose areas and limits our assessment of tradeoffs. For example, the IPEDS expenditure variable of academic services has an educational element, but we combine it with operation maintenance and institutional support expenditures into one variable, maintenance expenditures, as our best measure of bureaucracy. Importantly, there is also a grey area between instruction and research expenditures. It is unclear under which category research universities report expenses that have both educational and research components, such as wages to graduate students. The magnitude and distribution of these potential reporting errors are unknown, but we presume they do not affect the key analyses.
Third, bar charts for some sectors lack certain revenue and expenditure component variables. In most cases these absences are expected: non-research universities do not spend money on research or public service, nor do private universities typically receive government appropriations or grants and contracts. The only exception is maintenance expenditures for professional schools, which have 17 data points, barely crossing our threshold for regression.
Finally, while IPEDS has a well-defined category for R&D expenditure, it does not have a satisfactory category for R&D revenue (i.e., funds raised by the university specifically for, and as a result of, its research activities). In the paper we used a proxy: the amount of government grants (federal, state and local) that are for specific programs, research and projects, excluding Pell grants. This is of course an imperfect measure. To gauge its reliability, we additionally examine here data from the Higher Education Research and Development Survey (HERD) carried out annually by the NSF (see Appendix A). Unfortunately, the alignment between this dataset and our other datasets is low, because the NSF asks administrators to fill in the survey for single campuses. This means that the R&D revenue of the larger universities present in the Delta data, many of which comprise multiple campuses (see Appendix E), will be underestimated, causing noise at the higher end of the scale. Additionally, given that these data are reported to the NSF, there may be biases in the reporting.
We find that R&D expenditure from HERD and R&D expenditure from Delta Cost are in reasonably good agreement, except at higher scales, where the HERD data show systematically lower expenditure (either because of the difference in the unit of analysis between the two datasets or because of inconsistent reporting by different institutions). R&D revenue from HERD is imperfectly correlated with the government project grants and contracts data we use in our main analysis. Re-estimating the scaling relationship for R&D revenue (a measure of R&D output) using the HERD data, we find that it is extremely uncertain: the scaling coefficients are of the order of those shown in the paper but carry much greater uncertainty (see Table F-1). We believe this is due to the difference in the unit of analysis between HERD and Delta data. It also points to the fact that better data on research output are needed to deepen our understanding of how institutional size affects the research output of a university.
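The merge-and-refit procedure above can be sketched minimally with pandas. All records and column names below are invented stand-ins (the real IPEDS/HERD variable names differ), and the fit returns both the exponent and its standard error, since the uncertainty is the point of the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two datasets; real column names differ.
delta = pd.DataFrame({
    "unitid": [1, 2, 3, 4],
    "enrollment": [2000, 8000, 30000, 60000],
    "gov_project_grants": [1e6, 8e6, 6e7, 2e8],  # our R&D-revenue proxy
})
herd = pd.DataFrame({
    "unitid": [1, 2, 3, 4],
    "rd_expenditure": [9e5, 7e6, 4e7, 9e7],  # single-campus figures
})

# Keep only institutions present in both surveys.
merged = delta.merge(herd, on="unitid", how="inner")

def scaling_fit(x, y):
    """Exponent and its standard error from a log-log OLS fit."""
    coef, cov = np.polyfit(np.log(x), np.log(y), 1, cov=True)
    return coef[0], np.sqrt(cov[0, 0])

beta_proxy, se_proxy = scaling_fit(merged["enrollment"], merged["gov_project_grants"])
beta_herd, se_herd = scaling_fit(merged["enrollment"], merged["rd_expenditure"])
```

With the real data, the HERD-based exponent comes with a much larger standard error than the proxy-based one, which is what Table F-1 reports.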

Appendix G: Completion rate analyses
To examine how postsecondary educational programs perform with scale, our analysis includes data on the scaling of total completions. We use the two best publicly available and most widely reported measures of completions in U.S. higher education to measure educational output in this most basic sense: completion rates of Federal Student Aid-receiving student cohorts (referred to as FSA completions) and completion rates for first-time full-time students (referred to as FTFT completions). In this Appendix, we explain (1) how we compose the two total completion metrics from raw data; (2) why we use FSA and FTFT cohorts as the size variables for our completion scaling analyses, how these cohorts relate to total enrollment, and why we favor FSA results over FTFT results; and (3) the special cleaning we applied to the data to avoid the impact of a campus grouping problem in the FSA data (separate from the Delta grouping problem explained in Appendix E).

Total Completions Metrics for FSA and FTFT cohorts
The National Student Loan Data System (NSLDS) tracks every Federal Student Aid (FSA)-receiving student through their entire postsecondary career. The data used here capture the 2013 educational outcomes of federal student loan-receiving students whose first aid year was 2007 (grant-receiving students were not tracked until 2012). We use FSA Cohort (6yr), the population of that cohort, instead of total enrollment as the relevant measure of size for the scaling of this outcome variable. Each university reports a complete spectrum of all possible student outcomes for its FSA cohort. These outcomes include the percentage of FSA students who graduated within six years from the original institution, from a different four-year institution, and from a different two-year institution. We sum these percentages to obtain the six-year completion rate at each institution of original enrollment. See pages 21-23 of the College Scorecard Full Data Documentation (https://collegescorecard.ed.gov/assets/FullDataDocumentation.pdf) for full variable descriptions and Appendix B for more details. Note that because of privacy considerations, College Scorecard suppresses any datum with under 30 students, for any student outcome at any school. Some small schools are thus excluded from our completion rate analysis.
Total completion data for full-time first-time (FTFT) students are found in the Delta data, which sources them from the IPEDS Graduation Rate Survey. Unlike the FSA completion rate data, the FTFT completion rate only includes students who graduate from the same institution. In the community college sector, transferring to a 4yr school is a common and favorable outcome, so we find FTFT completions less informative than FSA in this sector. In the professional school sector, none of the 1,393 2yr schools report a graduation rate. The scaling result we use describes the 730 of 798 2yr colleges that do report a 3yr FTFT graduation rate.
For the FTFT completion scaling results, this documented graduation rate for FTFT students is regressed versus the corresponding FTFT student cohort.
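The composition of the FSA completion metric and its scaling regression can be sketched as follows. All numbers and column names below are invented stand-ins for the Scorecard variables (the three destination shares correspond to graduating at the original, a different 4yr, or a different 2yr institution).

```python
import numpy as np
import pandas as pd

# Hypothetical Scorecard-style shares of each FSA cohort that completed
# within six years, by destination. Real Scorecard variable names differ.
df = pd.DataFrame({
    "fsa_cohort": [250, 900, 4000, 12000],
    "comp_orig": [0.35, 0.42, 0.48, 0.55],  # completed at original school
    "comp_4yr":  [0.08, 0.07, 0.06, 0.05],  # completed at another 4yr
    "comp_2yr":  [0.03, 0.02, 0.02, 0.01],  # completed at another 2yr
})

# Six-year completion rate = sum of the three destination shares.
df["completion_rate"] = df[["comp_orig", "comp_4yr", "comp_2yr"]].sum(axis=1)

# Total completions, the quantity regressed against cohort size.
df["total_completions"] = df["completion_rate"] * df["fsa_cohort"]

# Scaling exponent of completions versus FSA cohort (log-log OLS).
beta, intercept = np.polyfit(np.log(df["fsa_cohort"]),
                             np.log(df["total_completions"]), 1)
```

The FTFT regression described above has the same shape, with the FTFT cohort as the size variable and same-institution graduates as the outcome.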

FSA and FTFT cohorts versus total enrollment
Total enrollment, including full-time and part-time students, undergraduates and graduates, is the key size variable against which all the aforementioned institution-wide financial flows were regressed. However, no completion data matched to total enrollment are available. In Table 2 of the main text and throughout, we therefore mix size variables when assessing each sector's functional tradeoffs: we use the completion cohort to assess the scaling of educational output (via completions), but total enrollment to assess the scaling of affordability (via tuition) and all other variables and purposes. This crisscrossing of size variables limits to some extent our ability to accurately assess tradeoffs. We assume that the incongruence is small enough for our results to remain meaningful, and here we examine the validity of this assumption.
What proportions of total enrollment do FSA and FTFT students constitute? Table G-2 indicates that FSA cohorts on average represent between 45% and 92% of one-year total enrollment across all sectors, while this share is between 24% and 26% for FTFT cohorts. In addition, more than 80% of schools have both FSA and FTFT cohort data in most sectors, supporting the use of these two metrics to construct narratives of completion against revenue and expenditure.
Table G-2 also indicates that the proportions of total enrollment made up by FSA and FTFT cohorts change with scale. Typically, the proportion of FSA students decreases as total enrollment increases, indicating that larger schools may have greater shares of wealthier students. Hence FSA cohorts become less representative, and FSA completions lose some of their predictive power, with scale. In contrast, FTFT cohorts grow more than proportionally with total enrollment for research schools, suggesting that FTFT completions may actually be a more reliable measure of overall graduation rate and educational throughput for larger research schools. Consequently, since the time commitment and economic background of students differ across the stratified education sectors according to their missions, we can choose one metric over the other depending on the size of schools and their particular student composition. For example, in the case of community colleges, we trust FSA rather than FTFT, as this sector attracts many part-time and less wealthy students.
Overall, we give more weight to the results derived from FSA completions than from FTFT completions, because aid-receiving cohorts represent a larger proportion of the student body, may be more representative, and FSA completions include the possibility that a student graduates at a transfer institution.

Cleaning for the FSA campus grouping problem
Figure G-2 shows the cleaned data by itself, for the for-profit colleges. While the streaks are gone and the data look significantly better, the majority of points had to be removed, and a problem remains. The FSA cohort of a university, or any component of its enrollment, should never exceed total enrollment: we should see a hard cut-off at the one-to-one line, as we do in the public and private non-profit sectors. Instead, we observe several points far above the one-to-one line (not shown, but easily visualized using the tick marks). We observe the same for professional schools. This leads us to believe that our method of removing schools with inappropriately grouped FSA completion data has failed in these sectors, and that their grouped FSA data could not be harmonized with the Delta data's grouping. We therefore discard FSA completion data entirely for for-profit colleges and professional schools, while in the other sectors we remove only the few schools with inappropriately grouped FSA completion data.
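The one-to-one-line check described above amounts to a simple filter. A sketch with invented records and hypothetical column names:

```python
import pandas as pd

# Hypothetical merged records; a cohort should never exceed enrollment.
df = pd.DataFrame({
    "unitid": [10, 11, 12, 13],
    "total_enrollment": [1200, 5000, 800, 20000],
    "fsa_cohort": [600, 5600, 500, 9000],  # unitid 11 violates the cap
})

# Flag rows above the one-to-one line: likely mis-grouped campuses.
bad = df["fsa_cohort"] > df["total_enrollment"]
clean = df[~bad]
```

In sectors where such violations are rare, dropping the flagged rows suffices; where they are pervasive, as for for-profit colleges and professional schools, the whole sector's FSA data is discarded instead.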

Appendix H: Tuition, selectivity and post-graduation earnings
We turn here to a few concerns and robustness tests relevant for Figure 4. The unit of analysis in the Equality of Opportunity dataset of earnings does not align perfectly with that of Scorecard, because some OPEID campuses are grouped under a single super-OPEID, as discussed in Appendix A. The alignment is poorer still with the Delta dataset, since that dataset uses UNITIDs (with its own idiosyncratic groupings of campuses). Hence, we reproduced all the graphs, omitting any grouped campuses (whether super-OPEIDs or the groupings in Delta), to check that the qualitative patterns discussed in the Discussion are robust to these data merging problems, finding that only a few observations are affected by this unit problem.
Another potential issue in Figure 4 is that we are using data from 2013, as in the rest of the paper. However, the earnings data displayed are for individuals born in 1984, who started college around 2002-2003. The relationship between these two variables is informative of the tradeoff between tuition and mid-career earnings that students in 2013-2014 might have considered by gauging the mid-career earnings of young adults in their 30s. Yet we may also want to know the relationship between earnings and the tuition paid by the students whose earnings we analyze. We thus reproduce the graphs here with tuition data from 2003. We see that out-of-state tuition (i.e., market price) was considerably lower at the high end (for non-profit private colleges and private research universities), while the distribution of average net tuition has remained more constant. The relative position of sectors in the curve relating out-of-state tuition to mean earnings is similar to that in Figure 4 of the main text. When looking at net tuition paid by students, we see that sectors were more starkly differentiated in the financial added value they provided in 2003 (second panel), and very clearly dominated by the public sector schools.

Finally, some of the SAT score data from Brookings are imputed for schools that are missing them, generally because they are fairly unselective. The additional imputed data make it easier to discuss all sectors. However, the model for this imputation does include tuition and the Carnegie classification, so we reproduce the Figure from the main text here with raw score data from Scorecard. This covers fewer schools and almost entirely leaves out certain sectors, but it shows that the common monotonic relationship between selectivity, tuition and earnings is apparent in the raw data, as is the observation that non-profit private colleges are much more heterogeneous than other sectors.
We see, for example, that the public and private research universities not only have steeper scaling relationships of instruction with total enrollment but also a larger overall magnitude at most enrollments.
Similarly, for considerations of scaling up higher education, we are interested in the scales at which certain sectors outcompete one another. These crossing points are implicit in the absolute magnitude analysis discussed above and in Figures I-1 to I-11, and can be seen more easily by looking at the scaling relationships on a per capita basis, overlaid for all sectors in a single figure. We see that, for all sizes, community colleges have the lowest completion rates and private research universities the highest. In between, the relative performance of sectors depends on scale. For example, small private non-profit colleges do better than small state colleges, but as these schools get bigger, their performance becomes similar.
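Crossing points follow directly from the fitted power laws. If a variable scales as Y = a N^β, the per capita curve is a N^(β-1), and two sectors with fits (a1, β1) and (a2, β2) cross where a1 N^(β1-1) = a2 N^(β2-1). The sketch below uses hypothetical prefactors and exponents, not our fitted values.

```python
def crossing_enrollment(a1, b1, a2, b2):
    """Size N* where two per capita power laws a * N**(b - 1) intersect.

    Setting a1 * N**(b1 - 1) == a2 * N**(b2 - 1) and solving for N gives
    N* = (a1 / a2) ** (1 / (b2 - b1)).
    """
    return (a1 / a2) ** (1.0 / (b2 - b1))

# Hypothetical sector fits: a sublinear and a superlinear sector.
n_star = crossing_enrollment(a1=5000, b1=0.95, a2=1000, b2=1.15)

# Sanity check: per capita values match at the crossing point.
pc1 = 5000 * n_star ** (0.95 - 1)
pc2 = 1000 * n_star ** (1.15 - 1)
```

With these illustrative parameters the sublinear sector dominates per capita below N* and the superlinear sector above it, which is the qualitative pattern behind the sector comparisons described above.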

Appendix J: Final table of all scaling results
The functional form of the scaling relationships, the overall confidence in the scaling parameters, and the number of schools considered in each analysis are all useful for future efforts, and provide details not readily available in all of our plots and analyses, such as our consideration of the components of revenue and expenditure. We therefore provide all of the essential scaling relationships from this study in Table J-1.