The number of undocumented immigrants in the United States: Estimates based on demographic modeling with data from 1990 to 2016

We apply standard demographic principles of inflows and outflows to estimate the number of undocumented immigrants in the United States, using the best available data, including some that have only recently become available. Our analysis covers the years 1990 to 2016. We develop an estimate of the number of undocumented immigrants based on parameter values that tend to underestimate undocumented immigrant inflows and overstate outflows; we also show the probability distribution for the number of undocumented immigrants based on simulating our model over parameter value ranges. Our conservative estimate is 16.7 million for 2016, nearly fifty percent higher than the most prominent current estimate of 11.3 million, which is based on survey data and thus different sources and methods. The mean estimate based on our simulation analysis is 22.1 million, essentially double the current widely accepted estimate. Our model predicts a similar trajectory of growth in the number of undocumented immigrants over the years of our analysis, but at a higher level. While our analysis delivers different results, we note that it is based on many assumptions. The most critical of these concern border apprehension rates and voluntary emigration rates of undocumented immigrants in the U.S. These rates are uncertain, especially in the 1990s and early 2000s, which is when, according to both our modeling and the very different survey data approach, the number of undocumented immigrants increases most significantly. Our results, while based on a number of assumptions and uncertainties, could help frame debates about policies whose consequences depend on the number of undocumented immigrants in the United States.

The number of undocumented immigrants at time $t$ is:

$$N_t = N_{t-1} + I_t - O_t, \quad t > 0 \tag{S1}$$

where $I_t$ and $O_t$ are the population inflows and outflows at time $t$, respectively (Table 1 summarizes [...]). [...] is available for every year in our timespan [3]. Table 2 provides the number of visas issued for each year. Let [...]

Footnote 1. The visa overstay number only includes arrivals via air and sea.
The assumption that the rate of overstays for all previous years is equal to the 2016 rate is in fact quite conservative.

Let $\tau_j$ be the number of years a newly arriving undocumented immigrant in year $j$ remains in the country. Then $\Pr\{\tau_j \ge k\}$ is the probability that a new arrival in year $j$ is still present $k$ years later. The total number of visa overstayers present at year [...]

According to the current survey data approach, approximately 41% of undocumented immigrants are visa overstayers [4], which translates to a visa overstay population of 4.6 million in 2015. For formula (S4) to generate as many overstayers as the 4.6 million in the 11.3 million estimate, we would need to increase the visa overstay rate to $1.1 \times r$.
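The cohort logic behind the overstayer count can be sketched in a few lines. This is a minimal illustration, assuming the stock at year $t$ is the sum over arrival years $j$ of new overstayers (the rate $r$ times visas issued) weighted by the survival probability $\Pr\{\tau_j \ge t - j\}$. The function names, the visa counts, and the geometric survival curve are hypothetical placeholders, not the values in Table 2 or the emigration rates used in the analysis.

```python
def overstayer_stock(visas, r, survival, t):
    """Overstayers present at year index t.

    visas[j]       -- visas issued in year j (hypothetical numbers below)
    r              -- visa overstay rate
    survival(j, k) -- Pr{tau_j >= k}: chance a year-j arrival is present k years later
    """
    return sum(r * visas[j] * survival(j, t - j) for j in range(t + 1))


def rate_multiplier(target_stock, visas, r, survival, t):
    """Factor by which r must be scaled to reach target_stock.

    The stock is linear in r, so the factor is a simple ratio.
    """
    return target_stock / overstayer_stock(visas, r, survival, t)


# Illustrative inputs only (assumptions, not the paper's data).
visas = [1_000_000, 1_100_000, 1_200_000]
r = 0.01


def survival(j, k):
    # Assumed 10% annual exit rate, identical for every cohort.
    return 0.9 ** k


stock = overstayer_stock(visas, r, survival, t=2)
multiplier = rate_multiplier(1.1 * stock, visas, r, survival, t=2)
print(stock, multiplier)
```

Because the stock scales linearly with the overstay rate, matching a larger target population only requires multiplying $r$ by the ratio of the target to the modeled stock, which is the kind of adjustment the comparison with the 4.6 million figure describes.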
II. Illegal Border Crossers. We estimate the number of individuals who successfully cross the border in year $t$, $B_t$, using the data provided in the recent DHS report [5]. The report uses a repeated trials model [6], combined with data on apprehensions at the border, to estimate the rate of apprehension of individuals attempting to cross the Southern Border for each year from 2005 to the present (see Figure 1 for their results). [...] In the notation used below, $p_t$ is the probability that an individual is apprehended on a given attempt in year $t$, and $d_t$ is the fraction of apprehended individuals who give up.

Let $C_t$ be the total number of individuals who wish to cross the border in year $t$ and will make at least one attempt. $C_t p_t$ individuals will be apprehended on their first attempt, and a fraction $(1 - d_t)$ of these will attempt to cross again. It follows that $C_t p_t^2 (1 - d_t)$ individuals will attempt to cross a second time and be apprehended on their second attempt. Continuing in this way, a geometric series is generated that provides a formula for the total number of apprehensions that will be made, as well as the total number of repeat apprehensions, that is, apprehensions of individuals who tried to cross and were apprehended at least once earlier in the year. Let $A_t$ denote the total number of apprehensions (see Table 3), and $\bar{A}_t$ denote the number of repeat apprehensions. DHS [5] provides data for both of these.
Applying the logic of the model: [...] (S8)

It follows using algebra that [...]

Now let $Q_t$ denote the number of individuals who give up without having crossed successfully: [...]

Rearranging (S6): [...]

The number of successful border crossers $B_t$ is equal to the difference between the initial pool of individuals who wish to cross, $C_t$, and the number who give up, $Q_t$ (all others eventually make it across successfully in this model). Thus $B_t = C_t - Q_t$. Finally, [...]

We make a few notes about this formula. First, the probability of apprehension is assumed to be constant across attempts. The rate could decrease across attempts if individuals learn how to escape detection, or it could increase through a selection effect, since individuals better able to escape detection make it through after just one or a few trials. Second, the DHS estimates of the apprehension rates in [5] are subject to uncertainty. However, their estimates are larger than those elsewhere in the literature [7,8], thus contributing to our overall conservative estimate (underestimate) of the number of border crossers. Third, we compared the above model with models where individuals quit if they fail $n$ times ($n > 2$). The results show that the number of border crossers in the repeated trials model is indeed lower than in these alternative models. Thus our model is again conservative in terms of the number of crossers we use in our analysis.
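The geometric-series logic of the repeated trials model can be checked numerically. The sketch below assumes, as in the verbal description, a constant per-attempt apprehension probability $p_t$ and a fraction $d_t$ of apprehended individuals who give up. Under those assumptions the series sums to $A_t = C_t p_t / (1 - p_t(1 - d_t))$, the repeat apprehensions are $\bar{A}_t = p_t(1 - d_t) A_t$, the number who give up is $Q_t = d_t A_t$, and hence $B_t = C_t - Q_t = A_t (1 - p_t)/p_t$. These closed forms are a reconstruction from the description of the model, not a transcription of the numbered equations, and all function names and input values are hypothetical.

```python
def repeated_trials(C, p, d):
    """Closed-form outcomes for C would-be crossers in one year.

    p -- probability of apprehension on any single attempt
    d -- fraction of apprehended individuals who give up
    """
    q = p * (1 - d)        # apprehended and then tries again
    A = C * p / (1 - q)    # total apprehensions (sum of the geometric series)
    A_repeat = q * A       # apprehensions of people already caught that year
    Q = d * A              # individuals who give up
    B = C - Q              # successful crossers
    return A, A_repeat, Q, B


def crossers_from_apprehensions(A, p):
    """Successful crossers implied by total apprehensions A and rate p."""
    return A * (1 - p) / p


def series_apprehensions(C, p, d, max_attempts=200):
    """Sum the series term by term: C*p, C*p^2*(1-d), C*p^3*(1-d)^2, ..."""
    return sum(C * p ** n * (1 - d) ** (n - 1) for n in range(1, max_attempts + 1))


# Hypothetical inputs for illustration only.
C, p, d = 1_000_000, 0.4, 0.25
A, A_repeat, Q, B = repeated_trials(C, p, d)
print(A, A_repeat, Q, B)
print(crossers_from_apprehensions(A, p))
print(series_apprehensions(C, p, d))
```

Under these assumptions the number of crossers implied by a given apprehension count depends on $p_t$ alone and falls as $p_t$ rises, which is consistent with the remark that higher assumed apprehension rates make the estimate more conservative.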
Most experts agree that the apprehension rate was significantly lower in earlier years and has been steadily increasing [7,8]. Further support comes from the fact that the number of border agents has increased dramatically over the timespan of our analysis [9] (see Table 4). Moreover, the number of hours spent by border agents patrolling the immediate border area increased by more than 300% between 1992 and 2004, and new infrastructure (e.g., fences) and technologies (e.g., night vision equipment, sensors, and video imaging systems) were introduced during this period [10]. Thus, for our conservative estimate we assume that the apprehension rate in years [...]

II. Mortality Rate. We set the mortality rate, $\delta$, equal to 0.7%, the age-adjusted mortality rate reported by the Centers for Disease Control and Prevention [18]. Note that this is generally viewed in the literature as an overestimate [13]. To further check that this rate is an overestimate, we combined the age, gender, and country of birth distributions of undocumented immigrants reported in [17,19] with CDC mortality rates [18] (CDC reports death rates by age, race, and Hispanic origin). The resulting mortality rate is less than 0.2%, much lower than the mortality rate we consider. Note that the mortality rate is quite small and does not have a large impact on our estimates.
III and IV. Deportations and Adjustments. The annual number of deportations and adjustments (changes from illegal to legal status), which we denote $D_t$, is taken directly from published data [13,20,21]. To overestimate the outflows, we include Deferred Action for Childhood Arrivals (DACA) recipients in the annual adjustments [22]. Table 5 presents the annual number of deportations and adjustments in our timespan.
We use the following procedure to calculate our conservative estimate of the population of undocumented immigrants at each time $t$. Since the emigration rate depends on the duration of stay, we must keep track of entry times. If $t \le 10$, calculating $N_t$ is straightforward; from equation (S1) we get: [...]

If $t > 10$, however, the formula becomes more complicated, because for individuals who have been in the country for more than 10 years the annual exit rate falls to $\mu_l + \delta$, so that a fraction $(1 - \mu_l - \delta)$ of them remains each year. To incorporate this into equation (S1) let: [...]

The number of undocumented immigrants at time $t > 10$ is then: [...]

We address parameter uncertainty by establishing ranges for key parameters.
These key parameters are: (i) the visa overstay rate, $r$; (ii) border apprehension rates for individuals attempting to cross the border illegally, $p = \{p_1, \ldots, p_{27}\}$ (recall $t = 27$ corresponds to the year 2016); (iii) the voluntary emigration rate, which for the first year is set separately for illegal border crossers, $\mu_s^b$, and visa overstayers, $\mu_s^o$, then jointly for both groups for years 2-10, $\mu_m$, and jointly for years 10 and above, $\mu_l$; and (iv) the mortality rate, $\delta$. For illegal border crossers we also establish a cohort-specific range for the first-year rate for each annual cohort from 1991 to 2016, $\mu_s^b \equiv \{\mu_{s,1}^b, \ldots, \mu_{s,27}^b\}$.
For each parameter we establish a uniform distribution over a set range (we will describe the parameter ranges in the next section).
To include the second source of variability, the inherently stochastic nature of the population, we impose a Poisson structure on our model. Specifically, conditional on all parameter values, which we represent by $\alpha \equiv \{r, p, \mu_s^b, \mu_s^o, \mu_m, \mu_l, \delta\}$, we model the overall population as the sum of Poisson variables, each of which counts the number of people who enter at a given time and exit at a future time.
Formally, let $\Lambda_{j,k}$ denote the number of arrivals at year $j$ who are still present $k$ years later, and $\Pr\{\tau_j^i \ge k\}$, $i \in \{o, b\}$, denote the probability that an individual undocumented immigrant in the cohort of type $i$ (overstayer or border crosser) arriving at year $j$ is still present $k$ years later. Then, [...] and the overall population is [...]

We assume that the Poisson variables $S_j(\alpha)$ and $B_j(\alpha)$ are mutually independent conditional on the parameters $\alpha$ for all time periods $j$, and also that $S_j(\alpha)$ is independent of $\tau_j^o(\alpha)$, and $B_j(\alpha)$ of $\tau_j^b(\alpha)$, for all $j$, again conditional on the parameters.
The first assumption means that, given the parameter values, the number of visa overstayers in any given year does not depend upon the number of border crossers in any other year. This is a reasonable assumption, as possible correlations that might arise between these two arrival types are already captured in the parameters.
The second assumption simply means that the duration an arriving individual remains in the country depends upon the year of arrival, but does not depend upon the number of arrivals. Since the sum of independent Poisson variables is also Poisson, the population size $N_t$ conditional on the parameters $\alpha$ is also Poisson distributed, that is: [...]

Thus, each simulation run follows two steps: (i) a random draw of the parameter vector, which we denote by $\alpha$, and a draw for the initial population of undocumented immigrants in 1990, denoted by $n_0$; and (ii) conditional upon $\alpha$, a draw for the population at year $t$, $n_t(\alpha)$, for $t = 1, 2, \ldots, 27$.
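The two-step structure of a simulation run can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the simulation code behind the reported results: arrivals are held fixed rather than derived from $r$ and $p$, a single first-year rate is used for both arrival types, the 1990 initial population is omitted, and years 2-10 are assigned the medium-term rate. The parameter ranges, arrival counts, and function names are hypothetical, and the Poisson sampler uses a normal approximation for large means to stay within the standard library.

```python
import math
import random


def survival(k, mu_s, mu_m, mu_l, delta):
    """Pr{tau >= k} under duration-dependent exit rates.

    Year 1 uses mu_s, years 2-10 use mu_m, later years use mu_l;
    delta (mortality) applies every year.
    """
    prob = 1.0
    for year in range(1, k + 1):
        mu = mu_s if year == 1 else (mu_m if year <= 10 else mu_l)
        prob *= 1.0 - mu - delta
    return prob


def sample_poisson(lam, rng):
    """Poisson draw using only the standard library.

    Knuth's multiplication method for small means; a rounded normal
    approximation (an assumption made for speed) for large means.
    """
    if lam < 30.0:
        threshold, count, product = math.exp(-lam), 0, rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
        return count
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))


def simulate_once(arrivals, ranges, t, rng):
    """One run: (i) draw the parameters, (ii) draw N_t conditional on them."""
    alpha = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
    # A sum of independent Poisson counts is Poisson, so a single draw
    # with the summed cohort means is enough.
    mean = sum(arrivals[j] * survival(t - j, **alpha) for j in range(t + 1))
    return sample_poisson(mean, rng)


# Hypothetical ranges and arrivals, for illustration only.
ranges = {
    "mu_s": (0.20, 0.40),
    "mu_m": (0.01, 0.03),
    "mu_l": (0.005, 0.015),
    "delta": (0.002, 0.007),
}
arrivals = [500_000] * 5
rng = random.Random(0)
draws = [simulate_once(arrivals, ranges, t=4, rng=rng) for _ in range(1000)]
print(min(draws), sum(draws) / len(draws), max(draws))
```

In a sketch like this, most of the spread across runs comes from the parameter draw in step (i); the Poisson draw in step (ii) adds comparatively little variation when the population mean is in the millions.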

Parameter Ranges
The parameters are uniformly drawn from the following ranges: [...]

ii. For illegal border crossers, there are data indicating that first-year rates vary across cohorts [8]. To incorporate this, we assume that a voluntary emigration rate is drawn for each cohort year from a uniform distribution that is specific to that cohort's year of initial entry; the lower bound of this range is set by the numbers in [8] and the upper bound is set at 0.50.

5. To capture circular flows, we impose a negative correlation between the first-year emigration rate and the border apprehension rate for illegal border crossers; based on our own analysis of annual data from the best recent study [8] we use a correlation of -0.5. Specifically, we generate two correlated random variables, one for the probability of apprehension and the other for the first-year emigration rate of border crossers, from the ranges described above.
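One common way to generate two correlated draws that are each uniform on a given range is a Gaussian copula. The text does not state which construction is used, so the sketch below is an assumption offered for illustration only. The function names, the apprehension-rate range, and the lower bound of the emigration range are hypothetical; only the upper bound of 0.50 and the correlation of -0.5 come from the description above.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def correlated_uniform_pair(rho, range_a, range_b, rng):
    """Draw (a, b), each uniform on its own range, linked by a Gaussian copula.

    rho is the correlation of the underlying normal variables; the
    correlation of the resulting uniforms is slightly closer to zero.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    a = range_a[0] + normal_cdf(z1) * (range_a[1] - range_a[0])
    b = range_b[0] + normal_cdf(z2) * (range_b[1] - range_b[0])
    return a, b


def pearson(xs, ys):
    """Sample Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ranges: apprehension probability and first-year emigration rate.
rng = random.Random(0)
pairs = [
    correlated_uniform_pair(-0.5, (0.30, 0.50), (0.20, 0.50), rng)
    for _ in range(20_000)
]
p_draws, mu_draws = zip(*pairs)
corr = pearson(p_draws, mu_draws)
print(corr)
```

With this construction a normal-scale correlation of -0.5 yields a correlation of about -0.48 between the uniform draws, since the uniform-scale value is $(6/\pi)\arcsin(\rho/2)$; hitting exactly -0.5 on the uniform scale would need a slightly stronger normal-scale correlation.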