Abstract
Most migration worldwide is internal, taking place within national borders. This makes internal migration intensities an important component for understanding the dynamics of population change in size, composition and geographic distribution. While incorporating migration into demography’s quantitative framework allows a description of population change across both time and space, and mathematical and conceptual frameworks for migration have been developed, researchers lack a public repository of historical age-origin-destination-specific migration probabilities that is in a common format and spans a range of countries. Addressing this requires a robust method for inferring migration probabilities from census and survey data when there are significant levels of uncertainty from small-sample noise and age aggregation. In this paper we extend the P-TOPALS and P-spline methods for smoothing migration probabilities so that they apply to age-grouped data, and use them to develop a methods protocol for a harmonised, common-format, multi-nation Human Internal Migration Database. We find that our method outperforms a hybrid spline-parametric method in terms of both accuracy and plausibility. We illustrate the method by estimating complete age-origin-destination migration probabilities for more than 50 countries using microdata samples from IPUMS International. This work advances the stock of migration data on which demographers and others can draw in the analysis and projection of population change.
Citation: Dyrting S, Taylor A (2024) Estimating complete migration probabilities from grouped data: A methods protocol for developing a global Human Internal Migration Database. PLoS ONE 19(12): e0315389. https://doi.org/10.1371/journal.pone.0315389
Editor: Bernardo Lanza Queiroz, Universidade Federal de Minas Gerais, BRAZIL
Received: May 10, 2024; Accepted: November 23, 2024; Published: December 10, 2024
Copyright: © 2024 Dyrting, Taylor. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All migration data consisting of counts of exposed population and movers by age, as well as estimated migration probabilities are deposited on Open Science Framework: <https://osf.io/vmrfk>
Funding: The authors were in part funded by a grant for independent demographic research by the Northern Territory Department of Treasury and Finance. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The majority of people who migrate move within their own country rather than internationally [1, 2]. Internal migration, along with fertility and mortality, is one of the primary processes for understanding and describing the dynamics of spatial changes in the sizes and structures of sub-national populations. This is especially true for countries that have completed the first demographic transition, since that process leads to a convergence towards lower vital rates [3], although significant intra- and international variations in fertility and mortality rates do remain, for example in mortality rates within the United States [4]. Because of this convergence, internal migration has become the key driver of population change, and a complete picture of migration probabilities is important for constructing sub-national multiregional life tables [5], stable age structures [6], subnational population projections [7, 8], and measures of the intensity of migration and its effectiveness in redistributing population [3, 9, 10]. Internal migration is also increasingly a focus for government policies and initiatives for population attraction in countries and sub-national jurisdictions experiencing reduced natural growth rates and increasing rates of urbanization and regional depopulation [11, 12]. In this article migrations are internal unless stated otherwise.
The development of a theoretical framework for migration began with Ravenstein’s observation in the late 1800s that human movement was patterned [13, 14]. The literature on migration evolved to characterise its diffusive nature as being a balance between new opportunities for the individual at the destination and intervening ones at the origin [15], with differences in actual and perceived positive and negative factors at the origin and destination, tempered by intervening obstacles [16]. Migration has subsequently been described as being selective by demographic and socioeconomic characteristics [17, 18], and part of a mobility transition progressing sequentially in time and radiating in space as a society passes through modernisation [19]. Empirical studies have continued to document the evolution of migration in European countries and their former colonies [20–24] and beyond [3, 25, 26].
Incorporating migration into demography’s quantitative framework allows a description of population change in size and composition across both time and space. In this article we work within the multistate demography framework [27–29]. In this approach, the fundamental variable is a complete schedule of migration rates, a matrix of probabilities that a person at place O (the origin) with age x will be at place D (a potential destination) with age x + n an interval n years later. The labels O and D refer to a given spatial decomposition and age x ranges by single year increments from birth to the limit of human life ω. The triplet (O,D,x) therefore labels the three dimensions of a migration ’cube’ [30]. As well as allowing the aforementioned calculations at high age resolution, robust methods for estimating complete migration probability schedules are a necessary precursor to resolving one of the main problems frustrating spatial demography and subsequently its application to infrastructure and service planning: data. This is because, in migration studies, mathematical and conceptual frameworks are more advanced than the available data [31] and, while for vital processes public repositories of historical data exist [32, 33], for internal migration there are no such repositories for the demographer to draw from.
The works of P. Rees & Kupiszewski [30] and Bell et al. [34] show that data problems for migration take a number of forms. First, most data on internal migration are collected periodically rather than continuously, either by population censuses or surveys, and usually every 10 or 5 years. This impacts the quality of the estimated migration probabilities by preventing the pooling of counts over consecutive years. Secondly, there are significant variations in the types of migration data collected between countries, with population registers capturing changes in address (migration events) and population censuses and surveys recording addresses at two or more points in time usually separated by a fixed duration (migration transitions) [35, 36]. Even for transition-style data there are variations in the interval n, with Bell et al. [34] finding that, of the 142 countries they surveyed that collected internal migration data in the 2000 UN census round, 29 (20%) collected 1-year data, 52 (37%) collected 5-year data, and 32 (23%) collected data over some other interval. The enduring and ongoing nature of migration as a demographic event means that transition data over long intervals will underestimate the number of moves, making migration probabilities over different intervals not directly comparable [36, 37].
Lastly, internal migration data is mostly compiled and published by national statistics offices based on varying sources, methods and formats, making international comparative demographic studies of internal migration probabilities labour intensive in proportion to the number of countries in the study. The work required includes obtaining access permission, data extraction, and potentially significant fees for bespoke data requests for tables which are not made publicly available [30, 38]. Additionally, published data is often incomplete, such that it is not possible to derive accurate migration probabilities for all combinations of origins, destinations, and ages, either because data is missing or because sample-based estimates of probabilities are noisy with high standard errors. The problem of incomplete data is particularly prevalent for internal migration because one cannot avoid estimating quantities at the subnational level.
Converting migration data to complete schedules of internal migration probabilities would mitigate many of these problems, and free the user from issues of data type (transition or event) by harmonising into one common data structure. This is relevant for subnational population projections where age groups must map into each other over time as individual age groups get one increment older, and also facilitates the fitting of model migration schedules, which are more highly parameterised than mortality and fertility models due to the presence of multiple peaks [39]. Complete migration probability schedules also allow a wider dissemination of migration data because estimates of migration probabilities that account for finite sample noise are less disclosive than the data they are derived from, being in effect a data treatment technique for aggregated data [40], alleviating potential ethical issues around the identification of individuals or indeed minority groups.
In the case of transition-style migration measures, the data from which migration probabilities are derived are census or survey origin-destination matrices by sex and age. In general, the age variable is grouped. That is, some or all values represent an age range spanning an interval greater than one year. It is difficult to assess the extent of grouped migration data because, as both P. Rees & Kupiszewski [30] and Bell et al. [38] stress in their review of national statistical offices, there can be considerable variance between the data that is collected (available ’in principle’) and data that is published (available ’off the shelf’). If we consider off-the-shelf census data provided by national statistics offices from the group of six countries listed in Bell et al. [38, Census Catalogue section] as recording changes in address over two intervals (Australia, 2011; Canada, 2001; Malta, 2005; Greece, 2001; Portugal, 2001; South Korea, 2000), and confine ourselves to first subnational geographic levels, we find that Australia provides access to origin-destination tables by single-year of age [41], Canada provides origin-destination tables without age disaggregation and tables of out- and in-migration by five-year age groups [42], Malta provides tables by ten-year age groups [43], Greece by ten-year age groups [44], the tables provided by Portugal are not disaggregated by age [45], and South Korea provides counts of out- and in- migrants by five-year age groups [46]. Of the five countries that publish data disaggregated by age, four used aggregated age groups. A solution to this data problem in spatial demography will therefore require a method for inferring migration probabilities from age-specific origin-destination matrices that can be applied to both single-year and age-grouped data.
Current methods for inferring single-year of age migration probabilities for grouped data are based on the property that grouped probabilities are a weighted average where the weights are proportional to the single-year population [47]. Spline methods seek to construct a complete curve that reproduces the grouped probabilities. If the weights do not vary appreciably over each age group, then we can approximate them by equal weights. This approximation was used by Campbell [48] for the case of five-year age groups to interpolate a complete curve from grouped data using Sprague multipliers [49], and by Rogers et al. [47] to interpolate out-migration from Stockholm using cubic splines [50]. If it is further assumed that probabilities are approximately linear in age over the group interval, then the grouped probability is approximately equal to the probability at the mid point of the age group. This approximation was used in Rogers et al. [51] as a basis for cubic spline interpolation.
Internal migration probabilities have a common profile over the life course: decreasing with age for children, then increasing for young adults as they move for further education or careers, reaching a maximum in the twenties, thereafter declining with age, with possibly a secondary increase at ages associated with peaks in retirement or post-retirement moves [18]. Model migration schedules describe this pattern using a specific functional form for each component, the magnitude, position, and width of which is explained by a set of parameters [39, 47, 52–54]. One of the first applications of model migration schedules was in the expansion of grouped data [47], where population weights were used to estimate one-year rates from five-year age data. Rogers & Castro [55] advocated the use of the mid-point approximation for fitting the model parameters, and Rogers et al. [51] used a hybrid approach, interpolating a complete curve using cubic splines and the mid-point approximation, and then fitting the result with a model migration schedule.
While methods based on splines and model migration schedules have their advantages, they both have limitations that make them unsuitable as general frameworks for estimating migration probabilities from grouped data. Spline methods are easy to calibrate and have great flexibility in the age profiles they can fit because they only assume the profile is locally polynomial. However, they make assumptions about the population distribution or migration probability over each age group which are not likely to hold as the length of the group interval increases. For example, when the age groups are terminated by an open interval such as 80 years and over, the population weights will exponentially decrease rather than remain constant, and for age groups spanning a local maximum such as a student or labour peak, migration probabilities will display significant convexity rather than change linearly with age. Spline methods also assume observations are free from significant sample noise which limits their application to large populations and age groupings greater than one year. Model migration schedules have the advantage that, when properly calibrated, they will produce complete schedules that reflect the features their parameters encode, but their accuracy is constrained by the assumed functional form. As a result, over time they have accreted ever more peaks (childhood, student, labour, retirement, post-retirement [39]), which in turn can make them difficult to calibrate due to their large number of parameters (the most parsimonious has seven [47]). Furthermore, as the group interval increases, the user is confronted with the problem of controlling this set of parameters in the face of a decreasing number of observations.
In light of these constraints and issues, here we aim to develop a general framework for estimating migration probabilities from age-specific origin-destination matrices that can be used as the basis of a methods protocol for a Human Internal Migration Database (HIMD). Our approach is to generalise a recent method for smoothing single year of age data [56, 57] to the case of grouped ages. This new method combines the strengths of both splines and model migration schedules (flexibility, ease of calibration, ability to specify views on the reasonable form of the age distribution), accounts for sample noise, is stable when the number of age intervals becomes small and can be applied to a general age abridgement structure. We test this method to assess whether it is an improvement over an existing method both in terms of measures of goodness of fit to the observed data and plausibility of the generated profiles. From this, we propose a migration database methods protocol and illustrate its utility by generating complete schedules of internal migration cubes for 54 countries.
In the next section we introduce the problem of estimating probabilities from age-grouped data, its decomposition into the two problems of estimating generation and distribution components, and the solutions to these two problems using P-TOPALS and P-spline methods respectively. The data and methods section describes the preparation of migration data from IPUMS-I microdata samples, and the process of estimating complete migration schedules from grouped data using P-TOPALS/P-splines and a comparison method. In the results section we compare the two methods using metrics of best-fit and demographic plausibility. In the final two sections we discuss the implications of our results and conclude with possible directions for future development.
Grouped migration probabilities
Assume a country has been divided into d + 1 geographical subunits and consider the internal movement of its population over a time interval n from a given origin O to the d possible destinations D which we index from 1 to d. Grouped migration data of transition type consists of counts

$${}^{n}_{b}M_{j} = \left[{}^{n}_{b}M_{j,1}, \ldots, {}^{n}_{b}M_{j,g}\right]' \tag{1}$$

of ${}^{n}_{b}M_{j,i}$ migrants to destination j in the age group $[a_i+n,\, a_i+n+b_i)$ out of an exposed population

$${}_{b}N = \left[{}_{b}N_{1}, \ldots, {}_{b}N_{g}\right]' \tag{2}$$

of ${}_{b}N_{i}$ in the age group $[a_i,\, a_i+b_i)$ for i = 1,…,g. The final age interval can be open ($b_g = \infty$) or closed ($b_g < \infty$). Age groups are usually contiguous ($a_{i+1} = a_i + b_i$) but the methods described here also apply when some or all of the intervals overlap or are disjoint. The population ${}_{b}N$ is the number of people who are at origin O at the beginning of the time interval and who are also in the country at the end of the interval. It therefore excludes people who are not alive or who are not in the country after time n. Our aim is to use ${}^{n}_{b}M_{j}$ and ${}_{b}N$ to estimate the probability ${}^{n}m_{j,x}$, defined to be the probability a person initially aged x at origin O is in destination j after the time interval n, for all ages x = 0,1,…,ω. To be precise, the ${}^{n}m_{j,x}$ are probabilities conditional on survival and remaining in the country, but we will refer to them simply as probabilities. We take the approach of Dyrting & Taylor [57] and use the generation-distribution framework [58] to factor the (ω+1)×1 vector ${}^{n}m_{j}$ of probabilities of migrating to destination j into the product

$${}^{n}m_{j} = {}^{n}m \; {}^{n}c_{j} \tag{3}$$

where the out-migration probability ${}^{n}m$ is defined to be the (ω+1)×1 vector probability of out-migrating from origin O irrespective of destination and the migration ratio ${}^{n}c_{j}$ is defined to be the (ω+1)×1 vector probability of migrating to destination j conditional on out-migrating from origin O. Here, and in the following, all matrix operations and functions act elementwise (obtained by operating one element of the matrix at a time) unless stated otherwise. Sample out-migration probabilities are calculated by taking the ratio

$${}^{n}_{b}\hat{m} = \frac{{}^{n}_{b}M}{{}_{b}N} \tag{4}$$

of the total number of migrants

$${}^{n}_{b}M = \sum_{j=1}^{d} {}^{n}_{b}M_{j} \tag{5}$$

and the initial population. Sample migration ratios ${}^{n}_{b}\hat{c}_{j}$ are calculated by taking the ratio

$${}^{n}_{b}\hat{c}_{j} = \frac{{}^{n}_{b}M_{j}}{{}^{n}_{b}M} \tag{6}$$

of migrants to destination j and the total number of migrants. In the generation-distribution framework, estimating ${}^{n}m_{j}$ from migration data is equivalent to estimating out-migration probability and migration ratios from sample probabilities ${}^{n}_{b}\hat{m}$ and ${}^{n}_{b}\hat{c}_{j}$.
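As an illustration of Eqs (3) to (6), the following minimal sketch computes sample out-migration probabilities and migration ratios from grouped counts. The function name, the nested-list layout and the counts themselves are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
def sample_probabilities(movers, exposed):
    """Sample out-migration probabilities and migration ratios by age group.

    movers[j][i] : migrants to destination j from age group i (hypothetical layout)
    exposed[i]   : exposed population in age group i
    """
    d = len(movers)
    g = len(exposed)
    # Eq (5): total migrants in each age group, summed over destinations
    total = [sum(movers[j][i] for j in range(d)) for i in range(g)]
    # Eq (4): out-migration probability = total migrants / exposed population
    out_prob = [total[i] / exposed[i] for i in range(g)]
    # Eq (6): migration ratio = migrants to j / total migrants
    # (left as None when an age group has no migrants at all)
    ratios = [
        [movers[j][i] / total[i] if total[i] > 0 else None for i in range(g)]
        for j in range(d)
    ]
    return out_prob, ratios


# Made-up counts: 3 destinations (rows) by 3 age groups (columns)
movers = [[30, 80, 10], [20, 40, 5], [10, 80, 5]]
exposed = [1000, 2000, 500]

out_prob, ratios = sample_probabilities(movers, exposed)

# Eq (3): destination-specific probability = out-migration probability x ratio
dest_prob = [[out_prob[i] * ratios[j][i] for i in range(3)] for j in range(3)]
```

By construction the ratios for each age group sum to one over destinations, and the product in the last line recovers the direct ratio of destination-specific movers to the exposed population.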
This problem presents a number of difficulties for any estimation method. First, grouping can significantly obscure the age-dependence of migration because it averages probabilities over the age interval. Secondly, the number of sample probabilities is usually less than the number of probabilities to estimate, g ≤ ω + 1, by a factor approximately equal to the average group interval. Lastly, when regarded as a measurement of actual probabilities, sample probabilities will have components of sample noise, notwithstanding that age grouping is sometimes used to mitigate the effect of small populations. The P-TOPALS approach is well suited to handling these problems because, like model migration schedules, it is open to views on the plausible age profile of migration probabilities, it can generate estimates from even a moderate number of observations because it uses low-dimensional penalties, and it explicitly accounts for and adapts to sample noise by using maximum likelihood fitting.
Estimating out-migration
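Both the number of migrants and the exposed population within each age group can be written in terms of numbers in each single-year age group. It follows [47] that a grouped probability can be written as a weighted average of single-year of age probabilities

$${}^{n}_{b}m = w \cdot {}^{n}m \tag{7}$$

where A⋅B denotes matrix multiplication and the g × (ω + 1) weight matrix w has elements

$$w_{ix} = \begin{cases} N_x / {}_{b}N_i, & a_i \le x < a_i + b_i \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

with $N_x$ the exposed population at single-year age x.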
A person exposed to the risk of out-migrating who is initially aged x will span the age range [x,x+n) over the interval n. Therefore, the multi-year probability can be expressed in terms of implied one-year probabilities mx through the expression
(9)
Eqs (7) and (9) illustrate that multi-year transition-style age-grouped migration probabilities are related to implied one-year probabilities by a combination of geometric and arithmetic averaging so that the probability is an average of mx over the age range [ai,ai+bi+n).
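The weighted average of Eq (7) can be checked numerically. The sketch below assumes that the weights of Eq (8) are each single-year population divided by its group total, which is what "weights proportional to the single-year population" implies; the helper names and the toy populations are hypothetical. It does not implement Eq (9).

```python
def weight_matrix(pop, groups):
    """Build a g x (omega + 1) weight matrix.

    pop[x]  : exposed population at single-year age x
    groups  : list of (start_age, width); width None marks an open interval
    """
    n_ages = len(pop)
    w = []
    for start, width in groups:
        stop = n_ages if width is None else min(start + width, n_ages)
        group_pop = sum(pop[start:stop])
        row = [0.0] * n_ages
        for x in range(start, stop):
            # assumed form of Eq (8): single-year share of the group population
            row[x] = pop[x] / group_pop
        w.append(row)
    return w


def grouped_probabilities(w, prob):
    """Eq (7): matrix product of the weights with single-year probabilities."""
    return [sum(w_ix * m_x for w_ix, m_x in zip(row, prob)) for row in w]


pop = [100, 90, 80, 70, 60, 50]                 # toy single-year populations
prob = [0.10, 0.08, 0.06, 0.05, 0.05, 0.04]     # toy single-year probabilities
groups = [(0, 2), (2, 2), (4, None)]            # two closed groups, one open

w = weight_matrix(pop, groups)
grouped = grouped_probabilities(w, prob)

# Direct calculation: grouped movers divided by grouped exposed population
movers = [n * m for n, m in zip(pop, prob)]
direct = [
    sum(movers[0:2]) / sum(pop[0:2]),
    sum(movers[2:4]) / sum(pop[2:4]),
    sum(movers[4:]) / sum(pop[4:]),
]
```

The two calculations agree, whereas an equal-weight average of the first group (0.09) differs slightly from the population-weighted value, which is the approximation error discussed in the Introduction.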
In the P-TOPALS approach [56] we represent the (ω+1)×1 vector of probabilities m = [m0,…,mω]’ relative to a standard migration curve
(10)
where the standard is an (ω + 1) × 1 vector, B is an (ω+1)×l matrix of B-spline functions arrayed columnwise, and θ is an l×1 vector of spline weights. The number of B-splines l is determined by the number of spline knots [59]. The relational form of Eq (10) makes it possible, via the standard, to express views on the age structure of out-migration that are otherwise obscured by the effect of age-grouping over a multi-year interval. The spline weights θ are found by maximising the function
(11)
with the first term equal to the log-likelihood assuming sample noise has a Poisson distribution and the second term controlling the smoothness of the B-spline component by penalizing k-order differences in θ. Maximising ℒ(θ) leads to a system of nonlinear equations which can be solved by iterated linear regressions [56]. The derivation of this iteration is given in S1 Appendix.
The P-spline approach uses a finely spaced set of spline knots and controls the amount of smoothing through the choice of the penalty size λ. We see from Eq (11) that, for small values of λ, the maximum of ℒ(θ) will be determined by the first term. In this case, when the knot spacing is finer than the age grouping, the fitted grouped probabilities will approximately equal the sample probabilities, showing that for small penalties P-TOPALS will approximate an interpolation of the sample probabilities. When the penalty is large, the second term in Eq (11) will dominate and in this case the B-spline component B⋅θ will approximate a (k-1)-order polynomial [60]. As the penalty ranges from small to large, the P-TOPALS fit changes from a fit using l parameters to an approximate fit using k parameters. The penalty can be chosen automatically by optimising an information criterion, such as the Akaike information criterion [61] or Bayesian information criterion [62], that balances the improvement in fit against the increased number of effective parameters [56].
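To see why a large penalty drives the spline weights towards a low-order polynomial, the sketch below evaluates a k-order difference penalty on two sets of weights. It assumes the penalty has the form λ times the sum of squared k-order differences of θ (any constant factor used in Eq (11) is not reproduced), and the weight vectors are invented for illustration.

```python
def differences(theta, k):
    """Apply first-order differencing k times to a list of spline weights."""
    for _ in range(k):
        theta = [b - a for a, b in zip(theta, theta[1:])]
    return theta


def penalty(theta, k, lam):
    """Assumed penalty form: lam * sum of squared k-order differences."""
    return lam * sum(v * v for v in differences(theta, k))


# Weights that are linear in the knot index: a polynomial of degree 1
linear = [0.5 + 0.1 * i for i in range(8)]
# Weights that oscillate from knot to knot
wiggly = [0.5, 0.9, 0.4, 1.0, 0.3, 1.1, 0.2, 1.2]

p_linear_k2 = penalty(linear, k=2, lam=10.0)   # essentially zero
p_wiggly_k2 = penalty(wiggly, k=2, lam=10.0)   # strictly positive
p_linear_k1 = penalty(linear, k=1, lam=10.0)   # positive: only constants are free
```

Weights that follow a polynomial of degree k-1 in the knot index incur no penalty, so as λ grows the penalised fit is pushed towards that family, while rapidly varying weights are penalised heavily.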
Estimating migration ratios
The age-grouped migration ratio for destination j is related to grouped out-migration ${}^{n}_{b}m$ and destination-specific out-migration ${}^{n}_{b}m_{j}$ through the expression

$${}^{n}_{b}c_{j} = \frac{{}^{n}_{b}m_{j}}{{}^{n}_{b}m} \tag{12}$$

It is difficult to estimate ratios because there is an additional constraint that the sum of ratios over all destinations must equal one. We generalise the approach used by Dyrting & Taylor [57] for ungrouped ratios and introduce grouped conditional ratios ${}^{n}_{b}a_{j}$, defined to be the probability of migrating to destination j conditional on not migrating to destinations 1,…,j-1. It is related to ${}^{n}_{b}c_{j}$ by the expression

$${}^{n}_{b}c_{j} = {}^{n}_{b}a_{j} \; {}^{n}_{b}u_{j} \tag{13}$$

where ${}^{n}_{b}u_{j}$ is the product

$${}^{n}_{b}u_{j} = \prod_{k=1}^{j-1} \left(1 - {}^{n}_{b}a_{k}\right) \tag{14}$$

Single-year conditional ratios ${}^{n}a_{j}$ are similarly defined in terms of ${}^{n}c_{j}$. The advantage of using conditional ratios is that they are only subject to the condition $0 \le a_j \le 1$, and are also related by a weighted average similar to Eq (7),

$${}^{n}_{b}a_{j} = w_{j} \cdot {}^{n}a_{j} \tag{15}$$

Details of the derivation of the weight matrix are given in S1 Appendix.
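The relationship between migration ratios and conditional ratios can be illustrated for a single age group. The sketch below follows the verbal definition given above (the probability of migrating to destination j conditional on not migrating to destinations 1,…,j-1); the function names and the example ratios are hypothetical.

```python
def to_conditional(ratios):
    """Convert migration ratios c_1..c_d (summing to one) to conditional ratios.

    a_j = c_j / (1 - c_1 - ... - c_{j-1})
    """
    remaining = 1.0          # share of migrants not yet allocated to a destination
    conditional = []
    for c in ratios:
        conditional.append(c / remaining if remaining > 0 else 0.0)
        remaining -= c
    return conditional


def from_conditional(conditional):
    """Invert the transform: c_j = a_j * prod_{k<j} (1 - a_k)."""
    remaining = 1.0
    ratios = []
    for a in conditional:
        ratios.append(a * remaining)
        remaining *= (1.0 - a)
    return ratios


ratios = [0.5, 0.3, 0.2]                 # made-up ratios for three destinations
conditional = to_conditional(ratios)     # approximately [0.5, 0.6, 1.0]
recovered = from_conditional(conditional)
```

Any set of conditional ratios between 0 and 1 maps back to ratios that are non-negative and sum to at most one, which is why each destination can be estimated without an explicit sum-to-one constraint. When the ratios cover all destinations, the last conditional ratio is equal to one.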
In the method given in Dyrting & Taylor [57] the (1+ω)×1 conditional migration ratio vector is expressed in terms of B-splines
(16)
and the spline weights φj are determined by maximising the penalised likelihood function
(17)
where
(18)
The first term in Eq (17) is the log-likelihood function assuming migration to destination j follows a binomial process (a person has or has not migrated to j), and the second term penalises first-order differences in the spline weights. Similar to out-migration, the above equation can be solved for φj using iterated regressions, and the penalty can be set manually or determined by optimising an appropriate information measure such as the corrected Akaike information criterion [57, 63].
Materials and methods
IPUMS International [IPUMS-I, 64, 65] is a project which inventories, stores, harmonises and makes available census and survey microdata from around the world. There are 149 microdata samples in IPUMS-I with variables that allow the calculation of sample migration probabilities between first-level (sub-national) administrative units. From these samples we derived counts of movers and the exposed population bN by age group. Table 1 gives a summary of the samples by country, showing the range of years, maximum number of destinations (dmax), migration intervals (n), number of samples (S), and the range of values for the maximum finite age group (bmax) in each case. Of the 54 countries, 3 (Ireland, Israel, and United Kingdom) have grouped ages, accounting for 10 samples. Also shown in Table 1 is the range of Whipple’s Index of digital preference (W), a method for measuring age heaping to identify variability in the quality of age reporting across regions or countries, for each country [49, 66, 67], and the number of samples (G) that have grouped ages or would require grouped ages because their Whipple index was greater than or equal to 110. The latter is the threshold used by United Nations [68] to categorise data as being either "highly accurate" or "fairly accurate" data (W < 110) or "approximate", "rough", or "very rough" (W ≥ 110).
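For readers unfamiliar with Whipple's index, the sketch below uses its standard formulation: five times the population reporting ages 25, 30, …, 60, divided by the total population aged 23 to 62, expressed as a percentage. That the samples here were screened with exactly this variant is an assumption, and the function name and test populations are hypothetical.

```python
def whipple_index(pop):
    """Whipple's index from single-year counts pop[x], x = 0..62 or beyond.

    Standard formulation: 100 * 5 * (population at ages ending in 0 or 5
    between 25 and 60) / (population aged 23 to 62 inclusive).
    """
    heaped = sum(pop[x] for x in range(25, 61, 5))   # ages 25, 30, ..., 60
    total = sum(pop[x] for x in range(23, 63))       # ages 23, 24, ..., 62
    return 100.0 * 5.0 * heaped / total


# No digit preference: equal numbers at every age
uniform = [1000] * 100
w_uniform = whipple_index(uniform)       # 100.0

# Mild heaping: 20% extra reported at ages ending in 0 or 5
heaped_pop = [1200 if x % 5 == 0 else 1000 for x in range(100)]
w_heaped = whipple_index(heaped_pop)

# Threshold used in the text to decide whether ages need grouping
needs_grouping = w_heaped >= 110
```

An index of 100 indicates no preference for terminal digits 0 and 5, and 500 would mean every reported age ends in 0 or 5.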
The migration dataset was divided into two subsets: Ds, the 73 samples with ungrouped data, and De, consisting of the 76 samples where the data was grouped or required grouping. For samples in De with W ≥ 110, age was aggregated into five-year age groups. Digital preference can sometimes be stronger for ages equal to multiples of 10 compared to odd multiples of 5. This can lead to populations aggregated into five-year age groups displaying a marked "sawtooth" pattern of alternating high and low values which is quantified by a value for the sawtooth index ST [69] greater than 1. The samples that were grouped due to a high Whipple index were then examined for preferences in reporting age with end digit 0 over end digit 5. If a preference was evident visually or if ST was high (ST ≥ 1.11), ages over a given value (usually 40 years but for some samples lower) were aggregated into ten-year groups. Samples with an open age group were examined for age overstatement using an index OS based on Coale & Kisker [70] equal to the population of the open age group as a proportion of the population over 70 divided by the same ratio for a reference population. For the latter we used the United Nations [71] estimate for that country and year. For samples with OS ≥ 100, the open age interval was not included. Data were deposited in an OSF repository, available to the reader from the link provided in [72].
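The age-grouping rules described above can be sketched as follows: five-year groups by default, ten-year groups from a chosen age when preference for end digit 0 is evident, and an open final group. The function names, arguments and the uniform test counts are hypothetical, and the thresholds that trigger each rule are taken as given from the text rather than computed here.

```python
def build_groups(open_from, ten_year_from=None):
    """Return a list of (start_age, width) age groups; width None marks the open group.

    Five-year groups are used below `ten_year_from` and ten-year groups from
    that age up to `open_from`, where the open interval begins.
    """
    groups = []
    start = 0
    while start < open_from:
        width = 10 if ten_year_from is not None and start >= ten_year_from else 5
        width = min(width, open_from - start)   # do not run past the open group
        groups.append((start, width))
        start += width
    groups.append((open_from, None))
    return groups


def aggregate(counts, groups):
    """Sum single-year counts into the supplied age groups."""
    grouped = []
    for start, width in groups:
        stop = len(counts) if width is None else start + width
        grouped.append(sum(counts[start:stop]))
    return grouped


counts = [1] * 101                                   # one person at each age 0..100
groups = build_groups(open_from=80, ten_year_from=40)
grouped_counts = aggregate(counts, groups)
```

With these settings the structure has eight five-year groups up to age 39, four ten-year groups from 40 to 79, and an open group starting at age 80. Dropping the final element of `groups` would correspond to excluding the open interval for samples with evidence of age overstatement.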
For each combination of country, year, and migration interval in De, total first-level internal migration probabilities (the probability of changing the first-level administrative unit of residence regardless of origin) were estimated using P-TOPALS with a flat standard, quadratic B-splines with knots spaced 2.5 years apart from age 0 to age 180, and a linear penalty (k = 1) with the penalty size chosen by optimising the Bayesian information criterion. For each first-level administrative unit, out-migration was then estimated using the same configuration with the country-specific total out-migration as the standard. Destination-specific migration ratios were then estimated using P-splines with quadratic B-splines with knots spaced 2.5 years apart from age 0 to age 180, a linear penalty, and the penalty size chosen by optimising the corrected Akaike information criterion. In total we estimated 1,205 origin-specific and 20,220 origin-destination-specific schedules of migration probabilities at single-year intervals from age 0 to age 110, covering 80 migration cubes for 33 countries.
As a comparison method we used the hybrid approach of Rogers et al. [51], originally specific to five-year age groups, here generalised to the case of arbitrary age grouping. In this method, the ith sample probability with a closed age interval (bi < ∞) is taken as the value of the estimated probability at the mid point of the age group
(19)
Probabilities at single-year ages between the first and last mid point are found using the xi as knots of a cubic spline. Final estimates at all ages are then found by fitting the spline with the student-peak model migration schedule [54]
(20)
Origin-destination-age-specific migration probabilities were estimated by the hybrid method using the following. Step 1: best-fit parameters were obtained for total first-level migration. Step 2: parameters from Step 1 were then used as starting values for an estimation of the best-fit parameters for origin-age-specific out-migration. Step 3: parameters from Step 2 were used to estimate best-fit parameters for origin-destination-age-specific migration by adjusting the level parameters (a1,a2,a3,a4,a5, and c in Eq (20)), keeping the other parameters fixed at the origin-specific values. If any non-constant component in Eq (20) was not used in a step (for example, by holding its level ai fixed at 0) then it was not used in the subsequent steps.
Figs 1 and 2 respectively illustrate the two cases where migration probabilities are estimated by expanding data that is published in grouped form (Ireland, 2006) or requires grouping because of age heaping (Indonesia, 2010). Evident in both figures are the significant amounts of sample noise both in the probabilities at single year of age (Fig 1, ages 0 to 15) and groups (closed and open).
Fig 1. Sample probabilities are single-year of age (circles) and grouped (horizontal lines). Grey area and vertical lines indicate 95% confidence interval for sample probabilities based on P-TOPALS/P-spline fit. Age is measured at the start of the migration interval. The open age group begins at age 84. Source: Author calculations based on Census data obtained from IPUMS International [64, 65].
Fig 2. Horizontal lines, grouped sample probabilities; vertical lines, 95% confidence interval for sample probabilities based on P-TOPALS/P-spline fit. Age is measured at the start of the migration interval. The open age group begins at age 90. Sample probabilities for the 85–89 and 90+ age groups are zero. Source: Author calculations based on Census data obtained from IPUMS International [64, 65].
Results
We compared the two methods using three metrics: two measures of the level of fitting error between sample and estimated probabilities, and a measure of deviation of the estimated probabilities from a reference schedule. The first goodness-of-fit statistic, which we call the migration fitting error, is the scaled multinomial deviance measuring the aggregate error of estimates of all migration probabilities from a given origin
(21)
In this expression the scaling factor 1/(gd) has been added to control for the effect of the number of observations and destinations on the total deviance and allow comparisons between cases with different age groupings and spatial decompositions. Deviance can be thought of as the generalisation of the sum of squared standardised errors to the case of a general distribution, in this case the multinomial distribution. Fig 3 shows the migration fitting error densities for P-TOPALS/P-spline and the hybrid method calculated using kernel density estimation [73]. We see that, compared to the hybrid method, P-TOPALS/P-spline fitting errors have a much lower mean (0.86 compared to 1.97 for the hybrid method) and smaller standard deviation (0.20 compared to 2.86 for the hybrid method). These results indicate that the P-TOPALS/P-spline approach provides a consistently better fit to destination-specific probabilities.
Migration fitting error (dev) is given by Eq (21). Source: Author calculations based on Census data obtained from IPUMS International [64, 65].
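To make the role of the 1/(gd) scaling concrete, the following is a minimal sketch of a scaled multinomial deviance. It is not a transcription of Eq (21): it uses the textbook form of the multinomial deviance, 2 Σ O log(O/E), and the function name, the layout of the inputs, and the convention that column 0 holds stayers (so that d is the number of columns minus one) are assumptions made for illustration.

```python
import math

def scaled_multinomial_deviance(counts, probs):
    """Scaled multinomial deviance for one origin (illustrative sketch).

    counts[i][k]: sample count in age group i with outcome k.
    probs[i][k]:  fitted probability of outcome k in age group i (rows sum to 1).
    Assumption: column 0 holds stayers, the remaining columns are destinations.
    """
    g = len(counts)            # number of age groups (observations)
    d = len(counts[0]) - 1     # number of destinations (assumed layout)
    total = 0.0
    for row_counts, row_probs in zip(counts, probs):
        n = sum(row_counts)    # exposed population in this age group
        for obs, p in zip(row_counts, row_probs):
            if obs > 0:        # obs * log(obs / exp) tends to 0 as obs -> 0
                total += obs * math.log(obs / (n * p))
    return 2.0 * total / (g * d)

# Hypothetical counts for two age groups and two destinations.
counts = [[900, 60, 40], [800, 120, 80]]
perfect = [[c / sum(row) for c in row] for row in counts]   # fit equals sample
flat = [[0.85, 0.09, 0.06] for _ in counts]                 # age-invariant fit

dev_perfect = scaled_multinomial_deviance(counts, perfect)
dev_flat = scaled_multinomial_deviance(counts, flat)
```

In this toy case a fit that reproduces the sample proportions has a deviance of zero, while the age-invariant fit has a positive deviance. Dividing by g and d is what allows values from origins with different numbers of age groups and destinations to be placed on a common scale.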
For a given origin, a good fit of migration probabilities to each destination does not necessarily imply a good fit of total out-migration. To assess this, we use an additional goodness-of-fit measure, the scaled binomial deviance for total out-migration
(22)
Fig 4 shows the out-migration fitting error densities for P-TOPALS/P-spline and the hybrid methods. We see that, compared to the hybrid method, P-TOPALS/P-spline out-migration fitting errors also have a lower mean (0.69 compared to 9.30 for the hybrid method) and smaller standard deviation (0.31 compared to 34.22 for the hybrid method).
Out-migration fitting error (devm) is given by Eq (22). Note: Inset uses an expanded scale to show the broad distribution of fitting error for the hybrid method. Source: Author calculations based on Census data obtained from IPUMS International [64, 65].
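The out-migration measure can be sketched in the same way by collapsing destinations into a single "moved" outcome. The code below uses the standard binomial deviance and assumes a 1/g scaling by analogy with Eq (21); the exact form of Eq (22) is not reproduced here, and the function names and toy counts are hypothetical.

```python
import math

def _term(obs, exp):
    # Contribution obs * log(obs / exp), with the limit 0 when obs == 0.
    return obs * math.log(obs / exp) if obs > 0 else 0.0

def scaled_binomial_deviance(movers, exposed, p_out):
    """Scaled binomial deviance for total out-migration (illustrative sketch).

    movers[i]:  sample count of out-migrants in age group i.
    exposed[i]: exposed population in age group i.
    p_out[i]:   fitted total out-migration probability in age group i.
    Assumption: the deviance is scaled by the number of age groups g.
    """
    g = len(movers)
    total = 0.0
    for m, n, p in zip(movers, exposed, p_out):
        total += _term(m, n * p) + _term(n - m, n * (1.0 - p))
    return 2.0 * total / g

movers = [100, 200]
exposed = [1000, 1000]

# Fitted out-migration equal to the sample proportions.
devm_exact = scaled_binomial_deviance(movers, exposed, [0.10, 0.20])

# Fitted out-migration obtained by summing destination-specific fits that
# each fall slightly below the sample: the shortfalls accumulate.
devm_low = scaled_binomial_deviance(movers, exposed, [0.085, 0.17])
```

The second call illustrates why a separate measure is useful: small errors of the same sign in each destination-specific probability add up in the total.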
For cases where the number of observations g is of the same order as the number of fitting parameters, it is possible for a method to achieve an accurate fit to sample probabilities with an unrealistic profile. To assess the plausibility of the estimated destination-specific migration probabilities we use the shape deviation metric, equal to the sum of the percentage that each destination-specific schedule’s profile differs from a reference profile (nmref)
(23)
The reference profiles nmref were calculated from the ungrouped dataset Ds as follows. For each of the 2235 combinations of first-level administrative regions and migration intervals, we used the method given in Dyrting [56] to estimate a smooth schedule of implied one-year out-migration probabilities, rescaled to sum to 1 over ages 0 to 110. The one-year reference schedule mref was defined as the average of the scaled implied one-year curves, and multi-year reference schedules nmref calculated from mref using Eq (9). Fig 5 shows the resulting reference curves for the two most frequent intervals, n = 1 and n = 5. We see that both the one-year and the five-year probabilities have a prominent labour peak centred near age 20. The one-year probabilities also display a student peak at age 17, which is also evident in the five-year probabilities as a discontinuous change in the slope [56].
Source: Author calculations based on Census data obtained from IPUMS International [64, 65].
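The construction of the one-year reference schedule described above (rescale each implied one-year curve to sum to 1, then average) can be sketched as follows. The function names and the four-age toy curves are hypothetical; the paper's curves span ages 0 to 110, and the conversion to multi-year schedules via Eq (9) is not shown.

```python
def rescale_to_unit_sum(curve):
    # Divide by the total so that the curve describes shape only, not level.
    total = sum(curve)
    return [value / total for value in curve]

def reference_profile(curves):
    """Pointwise average of curves after each is rescaled to sum to 1."""
    scaled = [rescale_to_unit_sum(curve) for curve in curves]
    n_curves = len(scaled)
    return [sum(values_at_age) / n_curves for values_at_age in zip(*scaled)]

# Two hypothetical implied one-year out-migration curves with different levels.
curves = [
    [0.01, 0.03, 0.04, 0.02],
    [0.02, 0.10, 0.06, 0.02],
]
m_ref = reference_profile(curves)
```

Because every curve is rescaled before averaging, regions with high overall mobility do not dominate the reference shape, and the average itself sums to 1.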
Fig 6 shows the shape deviation densities for P-TOPALS/P-spline and the hybrid methods, and, for reference, the density of the shape deviation for the ungrouped dataset Ds. It shows that P-TOPALS/P-spline deviations are concentrated at lower values (mean 4.81 and standard deviation 3.92) compared to the hybrid method (mean 13.38 and standard deviation 8.63) and have a distribution similar to the ungrouped dataset (mean 4.35 and standard deviation 3.13), showing that the variation in expanded schedules generated by P-TOPALS/P-splines is plausible, and consistent with the range of shapes present in smoothed ungrouped data.
Shape deviation is given by Eq (23). Source: Author calculations based on Census data obtained from IPUMS International [64, 65].
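Since Eq (23) is not reproduced in this text, the sketch below should be read only as one plausible way to measure how far a schedule's profile departs from a reference profile, not as the authors' definition. It assumes that each schedule is rescaled to sum to 1 and compared with the reference using the index of dissimilarity (half the sum of absolute differences, expressed as a percentage), and that the results are summed over destinations. All names and numbers are hypothetical.

```python
def profile(schedule):
    # Rescale a schedule to sum to 1 so that only its age shape remains.
    total = sum(schedule)
    return [value / total for value in schedule]

def percent_difference(schedule, reference):
    """Index of dissimilarity (in percent) between a schedule's profile and a
    reference profile. Assumption: this stands in for the per-destination
    term of the shape deviation; the paper's Eq (23) may differ."""
    shape = profile(schedule)
    return 100.0 * 0.5 * sum(abs(s - r) for s, r in zip(shape, reference))

def shape_deviation(schedules, reference):
    # Sum of the per-destination percentage differences.
    return sum(percent_difference(s, reference) for s in schedules)

reference = [0.1, 0.2, 0.4, 0.3]
schedules = [
    [0.002, 0.004, 0.008, 0.006],   # same shape as the reference, lower level
    [0.004, 0.004, 0.008, 0.004],   # different shape
]
total_deviation = shape_deviation(schedules, reference)
```

The first schedule contributes nothing because it differs from the reference only in level, which is the behaviour one would want from a plausibility measure based on shape.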
To further test the relative performance of P-TOPALS/P-spline and the hybrid method we compared expansions of 2021 Australian census interstate migration data grouped over 5- and 10-year age intervals with data at 1-year age intervals, all of which are available from the Australian Bureau of Statistics [74]. Table 2 gives migration and out-migration goodness-of-fit measures by state, migration interval, and age grouping. It shows that P-TOPALS/P-spline has the smallest migration fitting error in 30 cases and the hybrid method has the smallest error in 2 cases (ACT and South Australia, 1-year migration interval and 10-year age grouping). P-TOPALS/P-spline has the smallest out-migration fitting error in 29 cases and the hybrid method has the smallest error in 3 cases (New South Wales, 1-year migration interval, 5-year and 10-year age groups, and Tasmania, 1-year migration interval, 10-year age group).
We now turn to our HIMD methods protocol, which applies the results above. A HIMD methods protocol for estimating complete schedules from transition-style data will need to include processes for smoothing single-year data and expanding grouped data. Fig 7 gives a summary of our proposed protocol using the P-TOPALS/P-spline method, applied to migration data D as a proof of principle. We used the method given in Dyrting & Taylor [57] to smooth migration ratios for each origin in Ds and combined the results with the smoothed out-migration estimates described above to produce 3,468 origin-specific and 73,067 origin-destination-specific schedules of migration probabilities at single-year intervals from age 0 to age 110. These schedules were combined with the expanded schedules from De to give a set of 172 complete migration cubes for 54 countries. These internal migration probabilities are available for download from an Open Science Framework repository [72].
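The step of combining smoothed migration ratios with smoothed out-migration estimates can be illustrated with a minimal sketch. It assumes that the destination-specific probability at each age is the product of the total out-migration probability and a conditional ratio, and that the ratios sum to 1 across destinations at every age. The function name and toy values are hypothetical, and the sketch ignores the multi-year conversion in Eq (9).

```python
def destination_probabilities(out_migration, ratios, tol=1e-9):
    """Combine total out-migration with conditional destination ratios.

    out_migration[x]: probability of leaving the origin at age x.
    ratios[j][x]:     conditional probability of choosing destination j,
                      given a move, at age x (assumed to sum to 1 over j).
    Returns probs[j][x] = out_migration[x] * ratios[j][x].
    """
    for x in range(len(out_migration)):
        total = sum(ratio_j[x] for ratio_j in ratios)
        if abs(total - 1.0) > tol:
            raise ValueError(f"ratios at age index {x} sum to {total}, not 1")
    return [
        [m * r for m, r in zip(out_migration, ratio_j)]
        for ratio_j in ratios
    ]

# Hypothetical values for three ages and three destinations.
out_migration = [0.05, 0.12, 0.08]
ratios = [
    [0.5, 0.6, 0.7],
    [0.3, 0.3, 0.2],
    [0.2, 0.1, 0.1],
]
probs = destination_probabilities(out_migration, ratios)

# Summing over destinations recovers the total out-migration schedule.
recovered = [sum(column) for column in zip(*probs)]
```

Under this construction the destination-specific schedules are consistent with total out-migration by design, rather than by the accuracy of each separate fit.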
Discussion
We have shown that the methodology for estimating migration probabilities to develop a HIMD will need to account for both ungrouped and grouped migration data. We find that of the 149 census samples available in IPUMS-I, approximately half (76) used grouped ages or required grouping to mitigate age misstatement. We have extended the P-TOPALS/P-spline smoothing method to the problem of expanding grouped data and shown that, as an expansion method, it is both more accurate and better able to produce plausible schedules than an existing approach based on cubic splines and model migration schedules. We illustrated the feasibility of the method by creating complete first-level migration cubes from sample data in IPUMS-I covering 54 countries.
As well as being a useful method for both smoothing and expanding migration data, the P-TOPALS/P-spline method is also a useful tool for harmonising migration data for different transition intervals to a common representation, 1-year implied out-migration probabilities (Eq (9)) and conditional ratios (Eq (15)). We made use of this in the Results section to derive reference schedules. Additional possibilities for expanding this work include extending the calibrated splines (CS) methodology [75, 76] to migration by performing shape-calibration using the HIMD 1-year implied out-migration probabilities, and investigating how implied out-migration changes with interval length for the countries that record changes of address over more than one interval (Botswana, Canada, Greece, Mozambique, Philippines, Senegal, Spain, Trinidad and Tobago) to gain further understanding of the 1-year/5-year problem [36, 37, 77].
An effective protocol for producing migration cubes from census data also has positive implications for the use and dissemination of migration statistics. For example, it will make it easier to specify the migration matrix input to multiregional population projection models, considered as the benchmark for intra-national and national population projections [78]. In addition, national statistics offices regularly publish life tables, and our methodology could be used to produce complete migration probability cubes for them to provide as one element of their census data release program. Furthermore, a challenge for spatial demography is the graphical summary of both age and spatial structure. Figs 8 and 9 reprise Figs 1 and 2 with flow maps produced using FlowMapper.org [79] and highlight the spatial pattern of out-migration and inter-region connectivity but not the age structure. It is our hope that a HIMD will encourage interactions between demographers and cartographers to develop automated processes for representing both the spatial and age patterns of migration [79–81]. The work here provides a stepping-off point for this to occur by providing a harmonised common data source.
Spatial decomposition of Ireland is by IPUMS International harmonised first-level geography. The shapefile can be accessed at https://international.ipums.org. Source: Author calculations based on Census data obtained from IPUMS International [64, 65].
Spatial decomposition of Indonesia is by IPUMS International harmonised first-level geography. Only the top 100 migration ratios are shown. The shapefile can be accessed at https://international.ipums.org. Source: Author calculations based on Census data obtained from IPUMS International [64, 65].
Our finding that P-TOPALS/P-spline out-performs the hybrid method in both accuracy and plausibility is consistent with the strengths of the former method and the limitations of the latter. While P-TOPALS/P-spline treats grouped probabilities exactly (Eq (7)), the hybrid method approximates them by the migration probabilities in the centre of the age group and does not account for the open interval (Eq (19)). P-TOPALS/P-spline takes a top-down approach which ensures that estimates of destination-specific migration are consistent with estimates of out-migration, whereas the hybrid method is bottom-up, estimating destination-specific probabilities independently, so that a small value for dev does not imply a small value for devm. As the exposed population decreases, sample noise increases. In this case, P-TOPALS/P-spline responds by decreasing the effective number of fitting parameters. In contrast, the number of parameters in the hybrid model is fixed, and the parameter values become less constrained as sample noise increases. In some cases this can lead to the generation of implausible schedules using the hybrid method.
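The difference between treating a grouped probability exactly and approximating it by the value at the centre of the age group can be seen in a small numerical sketch. It assumes that the grouped probability is the exposure-weighted mean of the single-year probabilities within the group; Eqs (7), (8) and (19) are not reproduced here, and the values below are hypothetical.

```python
def grouped_probability(single_year_probs, exposures):
    """Exposure-weighted mean of single-year probabilities within one age group.
    Assumption: weights are proportional to the exposure at each age."""
    total_exposure = sum(exposures)
    weighted = sum(n * p for n, p in zip(exposures, single_year_probs))
    return weighted / total_exposure

def midpoint_probability(single_year_probs):
    # Approximation that takes the probability at the central age of the group.
    return single_year_probs[len(single_year_probs) // 2]

# Hypothetical probabilities for ages 15 to 19, rising steeply towards a peak.
probs_15_19 = [0.02, 0.03, 0.08, 0.10, 0.11]
exposures_15_19 = [100, 100, 100, 100, 100]

exact = grouped_probability(probs_15_19, exposures_15_19)   # about 0.068
approx = midpoint_probability(probs_15_19)                  # 0.08 at age 17
```

Where the schedule is strongly curved within a group, as it is near the student and labour peaks, the central value can differ noticeably from the group average, so fitting the central value to the grouped sample probability introduces a bias.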
Despite the better performance of P-TOPALS/P-spline, we acknowledge that the method has drawbacks which might cause researchers to prefer the hybrid method. The algorithms given in the subsections on estimating out-migration and migration ratios are more complex than the process for fitting the hybrid method given in the Methods and Materials section, and we found implementing P-TOPALS/P-spline in software required more work than adapting model migration schedules for grouped data. In our case the method supports a multi-country database which we feel justifies the additional work, but researchers undertaking smaller studies might feel it does not. Once the method was implemented, it was then necessary to decide on the configuration of parameters to apply, such as the choice of standard, the fineness of the knot spacing, the degree of the B-splines, the order of the penalty, and the method for choosing the optimal penalty size. While, for the hybrid method, it is also necessary to decide which parameters to fix and which to fit, users will be more familiar with this choice given model migration schedules have been used for nearly 50 years [6].
The P-TOPALS/P-spline configuration outlined in the Methods and Materials section was applied to all samples in De. There is, therefore, the potential for improvements to be made by designing configurations that are specific to the sample. For example, the choice of a flat standard when estimating the total first-level migration probability will imply a smoothly varying profile across ages where the group interval bi is large. If there are reasons to believe migration probabilities are not smooth across an age group, for example because it includes a student peak, then this can be incorporated by choosing a standard that includes this feature. The size of the penalty that controls the amount of smoothing was determined by the Bayesian information criterion for out-migration and by the Akaike information criterion for migration ratios. If there are reasons to believe sample noise is being under- or over-smoothed, then an alternate criterion can be used, or the penalty set manually. The weights wi,x in Eq (8) depend on the exposure Nx at age x which was estimated by a spline interpolation of the grouped exposures bNi that assumes an exponentially decreasing profile over open intervals. It may be that the user has additional information on the age distribution which can be used in the calculation of Nx by expanding bNi using P-TOPALS for age distributions [82].
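The effect of the choice of information criterion on the selected penalty can be sketched with the standard definitions, AIC = deviance + 2 × ED and BIC = deviance + log(n) × ED, where ED is the effective dimension of the fit and n the number of observations. The candidate table below is hypothetical: in practice each row would come from refitting the model at a given penalty size, and the exact form of the criteria used in the Methods and Materials section may differ in detail.

```python
import math

def aic(deviance, effective_dim):
    return deviance + 2.0 * effective_dim

def bic(deviance, effective_dim, n_obs):
    return deviance + math.log(n_obs) * effective_dim

def select_penalty(candidates, n_obs, criterion="bic"):
    """Return the penalty size with the smallest criterion value.

    candidates: list of (penalty, deviance, effective_dim) tuples, assumed to
    have been computed beforehand by fitting the model at each penalty.
    """
    def score(candidate):
        _, deviance, effective_dim = candidate
        if criterion == "aic":
            return aic(deviance, effective_dim)
        return bic(deviance, effective_dim, n_obs)
    return min(candidates, key=score)[0]

# Hypothetical fits: larger penalties give smoother curves (lower effective
# dimension) at the cost of a higher deviance.
candidates = [
    (0.1, 10.0, 9.0),
    (10.0, 15.0, 5.0),
    (1000.0, 20.0, 3.0),
]
n_obs = 18   # e.g. 18 age groups (assumed)

best_aic = select_penalty(candidates, n_obs, criterion="aic")
best_bic = select_penalty(candidates, n_obs, criterion="bic")
```

With these illustrative numbers AIC selects the intermediate penalty and BIC selects the largest, reflecting the heavier charge that BIC places on each effective parameter once n exceeds about 7 observations.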
Progress has been slow in the development of methods for expanding grouped migration data beyond the spline and MMS approaches first introduced in Rogers et al. [47]. Up to now these two methods have remained the standard, though now combined into a hybrid approach [51]. Consequently, there has been a dearth of comparative studies for alternate methods and countries. This is likely a result of the estimation of migration probabilities requiring the careful handling of multiple factors that, in combination, make the problem formidable. First, the number of schedules to estimate is quadratic in the number of spatial units (d), becoming large for even first-level administrative units, and making manual intervention unfeasible and small-sample noise unavoidable. Secondly, small sample noise cannot be reduced by pooling consecutive years because data is collected infrequently. Thirdly, data is often doubly abridged, spanning multiple years of both time (n > 1) and age (b > 1). Our findings show that it is possible to address these challenges within the P-TOPALS/P-spline framework, adding non-parametric methods to the demographer’s migration toolbox, and illustrating the feasibility of a HIMD dataset which will be a useful testing ground for further advances in estimation methods and will enhance demographers’ and other experts’ access to quality data for modelling human spatial exchanges.
In this article, the population exposed to internal migration is conditional on being in the country at the start and at the end of the time interval. It therefore excludes people who are international migrants (people who are outside the country at either the start or end of the interval) which reduces the potential for direct biases in estimates of internal migration probabilities due to regional variations in international migration rates. This is not to say that international migration does not affect internal migration. International migrants (people born outside the country) will be included in internal migration rates provided they are in the country at the start and at the end of the interval, and to the extent that they tend to prefer large urban centres, they will influence urban-regional migration probabilities [83, 84].
There are a number of limitations of the migration dataset we used in this study that have implications for the utility of our estimated migration cubes. IPUMS-I datasets are census microdata samples and as such will include higher levels of sample noise than the full census microdata. Access to the complete census data would allow more accurate estimates of migration probabilities but would also entail greater costs in time and access fees. Table 1 shows that our dataset includes most countries in the Americas but under-represents Africa, Europe, Asia, and Oceania. Further work needs to be done collecting data for these regions to make the dataset reflect a global view of internal migration. Of the 54 countries, 49 have fewer than 5 census years represented in our dataset, and none are prior to the 1970s. This constrains the level of time series analysis that can be performed. Comparisons between countries are complicated by differences in migration interval n mentioned in the Introduction. Currently our dataset estimates probabilities of migrating between first-level administrative regions. Analysis of migration between second-level regions, or between urban and regional areas, will require data at finer spatial levels.
A global Human Internal Migration Database will provide information on the movement of subnational populations and can therefore have significant ethical and practical implications. Publishing data on small populations raises concerns that information about an individual might be disclosed even though the data is aggregated [85]. P-TOPALS/P-spline is a mapping from aggregate counts of movers and exposed population to probabilities that are smoothed across ages. The amount of smoothing we have applied increases as the exposed population decreases. As a result, the method here is comparable to the types of data modification techniques applied to census and other data by national statistical agencies to prevent identification and disclosure of individuals or groups of individuals who may be vulnerable [40, 86].
Conclusions
We have shown that it is possible to extend both P-TOPALS and P-spline methods for smoothing migration probabilities and ratios [56, 57] to the problem of expanding grouped migration data. We find that the extended method out-performs the hybrid spline-MMS method both in terms of fitting sample data and plausibility of the expanded schedules. Furthermore, we have shown that it is feasible to construct a database of complete internal migration cubes that can form the core of a HIMD, using the same framework both for smoothing data of high quality and for expanding data that has been grouped to mitigate age misstatement. These migration cubes are available for download from an Open Science Framework repository [72].
A limitation of the method developed in this article is that it can only be applied to transition-style data collected from population censuses or surveys. A significant number of countries collect event-style migration data using population registers [38]. Consequently, a direction for further research is to extend the methods here to estimate migration schedules from event-based sample rates.
We have applied the P-TOPALS/P-spline method to estimating migration probabilities between first-level administrative units, but the methods can also be applied at more granular geographies. For example, IPUMS-I provides migration data for second-level geographic subunits for a selection of countries. The strategy we used in the Results section of first estimating aggregate migration, and then using it as a standard for estimating origin-specific out-migration can also be applied in this case. Sample noise would increase relative to the first-level case as a result of the reduction in exposed populations, and the penalty terms in Eqs (11) and (17) will play an increasing role in the expansion of probabilities and ratios.
Although in this study our method was shown to perform well against the hybrid method, we believe that model migration schedules will continue to be a valuable tool for studying migration. However, for them to be useful for analysing large datasets, it will be necessary to solve the problem of automatically finding optimal fitting parameters. It has long been recognised that the MMS parameters can be grouped into families [47, 51, 87], and it might be possible to use the range of profiles spanned by the schedules estimated in this article to specify additional constraints on the parameters that regularise the MMS fitting problem in a way analogous to the shape constraints used in mortality calibrated splines [76].
Supporting information
S1 Appendix. Solving for the B-spline weights.
https://doi.org/10.1371/journal.pone.0315389.s001
(DOCX)
Acknowledgments
The authors wish to acknowledge the statistical offices that provided the underlying data making this research possible: National Institute of Statistics and Censuses (Argentina), National Institute of Statistics (Bolivia), Central Statistics Office (Botswana), Institute of Geography and Statistics (Brazil), Central Bureau of Census and Population Studies (Cameroon), Statistics Canada, National Institute of Statistics (Chile), National Bureau of Statistics (China), National Administrative Department of Statistics (Colombia), National Institute of Statistics and Censuses (Costa Rica), National Statistics Office (Dominican Republic), National Institute of Statistics and Censuses (Ecuador), Bureau of Statistics (Fiji), Ghana Statistical Services, National Statistical Office (Greece), National Institute of Statistics (Guatemala), Institute of Statistics and Informatics (Haiti), National Institute of Statistics (Honduras), Statistics Indonesia, Central Statistical Office (Iraq), Central Statistics Office (Ireland), Central Bureau of Statistics (Israel), National Bureau of Statistics (Kenya), Statistics Bureau (Laos), Department of Statistics (Malaysia), Statistics Mauritius, National Institute of Statistics, Geography, and Informatics (Mexico), National Statistical Office (Mongolia), National Institute of Statistics (Mozambique), Central Bureau of Statistics (Nepal), National Institute of Statistics and Censuses (Nicaragua), Statistics Division (Pakistan), National Statistical Office (Papua New Guinea), General Directorate of Statistics, Surveys, and Censuses (Paraguay), National Institute of Statistics and Informatics (Peru), National Statistics Office (Philippines), Central Statistical Office (Poland), National Institute of Statistics (Portugal), Federal State Statistics Service (Russia), National Agency of Statistics and Demography (Senegal), Statistics Sierra Leone, Statistics South Africa, National Bureau of Statistics (South Sudan), National Institute of Statistics (Spain), Central Bureau of Statistics (Sudan), National Bureau of Statistics (Tanzania), Central Statistical Office (Trinidad and Tobago), Office of National Statistics (United Kingdom), Bureau of the Census (United States), National Institute of Statistics (Uruguay), National Institute of Statistics (Venezuela), General Statistics Office (Vietnam), Central Statistical Office (Zambia), and Central Statistical Office (Zimbabwe).
References
- 1. United Nations Development Programme. Human development report 2009, overcoming barriers: Human mobility and development. New York: UNDP; Palgrave Macmillan; 2009.
- 2. Bell M, Charles-Edwards E. Cross-national comparisons of internal migration: An update on global patterns and trends. New York: UN Department of Economic and Social Affairs, Population Division; 2013. Report No.: 2013/1.
- 3. Rees P, Bell M, Kupiszewski M, Kupiszewska D, Ueffing P, Bernard A, et al. The impact of internal migration on population redistribution: An international comparison. Population, Space and Place. 2016;23: e2036.
- 4. Fenelon A. Geographic divergence in mortality in the United States. Population and Development Review. 2013;39: 611–634. pmid:25067863
- 5. Baffour B, Raymer J. Estimating multiregional survivorship probabilities for sparse data: An application to immigrant populations in Australia, 1981–2011. Demographic Research. 2019;40: 463–502.
- 6. Rogers A, Castro LJ. Model multiregional life tables and stable populations. Laxenburg: International Institute for Applied Systems Analysis; 1976. Report No.: RR-76-9. Available from: https://pure.iiasa.ac.at/id/eprint/535/1/RR-76-009.pdf
- 7. United Nations. Preparing migration data for subnational population projections. New York: United Nations; 1992.
- 8. Smith SK, Tayman J, Swanson DA. A practitioner’s guide to state and local population projections. Dordrecht: Springer; 2013.
- 9. Bell M, Blake M, Boyle P, Duke-Williams O, Rees P, Stillwell J, et al. Cross-National Comparison of Internal Migration: Issues and Measures. Journal of the Royal Statistical Society Series A: Statistics in Society. 2002;165: 435–464.
- 10. Rees P, Bell M, Duke-Williams O, Blake M. Problems and solutions in the measurement of migration intensities: Australia and Britain compared. Population Studies. 2000;54: 207–222.
- 11. Department of Home Affairs. Regional migration [Internet]. Australian Government; 2021. Available from: https://immi.homeaffairs.gov.au/visas/working-in-australia/regional-migration
- 12. Centre for Population. Planning for Australia’s future population [Internet]. Australian Government; 2021. Available from: https://population.gov.au/publications/publications-planning-future
- 13. Ravenstein EG. The laws of migration. Journal of the Statistical Society of London. 1885;48: 167–235.
- 14. Ravenstein EG. The laws of migration. Journal of the Royal Statistical Society. 1889;52: 241–305.
- 15. Stouffer SA. Intervening opportunities: A theory relating mobility and distance. American Sociological Review. 1940;5: 845–867.
- 16. Lee ES. A theory of migration. Demography. 1966;3: 47–57.
- 17. Thomas DS. Research memorandum on migration differentials. New York: Social Science Research Council; 1938.
- 18. Bernard A, Bell M, Charles-Edwards E. Life-course transitions and the age profile of internal migration. Population and Development Review. 2014;40: 213–239.
- 19. Zelinsky W. The hypothesis of the mobility transition. Geographical Review. 1971;61: 219–249.
- 20. Long L. Migration and residential mobility in the United States. New York: Russell Sage Foundation; 1988.
- 21. Long L. Residential mobility differences among developed countries. International Regional Science Review. 1991;14: 133–147. pmid:12284725
- 22. Rogers A, Willekens FJ, editors. Migration and settlement: A multiregional comparative study. Dordrecht: D. Reidel Publishing Company; 1986.
- 23. Rees P, Stillwell J, Convey A, Kupiszewski M, editors. Population migration in the European Union. Chichester; New York; Brisbane; Toronto; Singapore: John Wiley & Sons; 1996.
- 24. Raymer J, Willekens F, editors. International migration in Europe: Data, models and estimates. Chichester: John Wiley & Sons; 2008.
- 25. Bell M, Charles-Edwards E, Ueffing P, Stillwell J, Kupiszewski M, Kupiszewska D. Internal migration and development: Comparing migration intensities around the world. Population and Development Review. 2015;41: 33–58.
- 26. Bell M, Bernard A, Charles-Edwards E, Zhu Y, editors. Internal migration in the countries of Asia: A cross-national comparison. Springer; 2020.
- 27. Rogers A. Introduction to multiregional mathematical demography. New York: John Wiley & Sons; 1975.
- 28. Rogers A. Multiregional demography principles, methods and extensions. Chichester: Wiley; 1995.
- 29. Schoen R. Modeling multigroup populations. New York: Springer; 1988.
- 30. Rees P, Kupiszewski M. Internal migration: What data are available in Europe? Journal of Official Statistics. 1999;15: 551–586. Available from: http://www.scb.se/contentassets/ca21efb41fee47d293bbee5bf7be7fb3/internal-migration-what-data-are-available-in-europe.pdf
- 31. Raymer J, Willekens F, Rogers A. Spatial demography: A unifying core and agenda for further research. Population, Space and Place. 2019;25: e2179.
- 32. Human Fertility Database [Internet]. Max Planck Institute for Demographic Research (Germany) and Vienna Institute of Demography (Austria); 2023 November 15 [cited 2024 April 4]. Available from: https://www.humanfertility.org
- 33. Human Mortality Database [Internet]. Max Planck Institute for Demographic Research (Germany) and University of California, Berkeley (USA) and French Institute for Demographic Studies (France); 2024 March 1 [cited 2024 April 4]. Available from: https://www.mortality.org
- 34. Bell M, Charles-Edwards E, Kupiszewska D, Kupiszewski M, Stillwell J, Zhu Y. Internal migration data around the world: Assessing contemporary practice. Population, Space and Place. 2015;21: 1–17.
- 35. Courgeau D. Migrants et migrations. Population. 1973;28: 95–129. French.
- 36. Rees PH. The measurement of migration, from census data and other sources. Environment and Planning A: Economy and Space. 1977;9: 247–272.
- 37. Kitsul P, Philipov D. The one year/five year migration problem. In: Rogers A, editor. Advances in multiregional mathematical demography. Laxenburg: International Institute for Applied Systems Analysis; 1981. pp. 1–34.
- 38. Bell M, Bernard A, Ueffing P, Charles-Edwards E. The IMAGE repository: A user guide. Queensland Centre for Population Research, The University of Queensland; 2014. Report No.: 2014/01.
- 39. Wilson T. Modelling age patterns of internal migration at the highest ages. Spatial Demography. 2020;8: 175–192.
- 40. Australian Bureau of Statistics. Data confidentiality guide [Internet]. Canberra: ABS; 2021 November 8 [cited 2024 April 21]. Available from: https://www.abs.gov.au/about/data-services/data-confidentiality-guide
- 41. Australian Bureau of Statistics. Table Builder ‐ 2011 Census, internal migration [Internet]. Canberra: ABS; 2021 November 19 [cited 2023 June 3]. Available from: https://www.abs.gov.au/statistics/microdata-tablebuilder/tablebuilder
- 42. Statistics Canada. 2006 census topic-based tabulations, mobility and migration [Internet]. Ottawa: StatsCan; 2008 July 15 [cited 2023 April 21]. Available from: https://www12.statcan.gc.ca/census-recensement/2006/dp-pd/tbt/St-eng.cfm?LANG=E&Temporal=2006&APATH=3&THEME=71&FREE=0&GRP=1
- 43. National Statistics Office. Census of population and housing, 2005, volume 1: population. Malta: NSO; 2007. Available from: https://nso.gov.mt/wp-content/uploads/Census_Vol1.pdf
- 44. Hellenic Statistical Authority. Μετανάστευση / 2011 [Internet]. Piraeus: ELSTAT; 2014 [cited 2020 January 2]. Greek. Available from: https://www.statistics.gr/el/statistics/-/publication/SAM07/2011
- 45. Magalhães M da G. Migrações inter NUTS II e projecções regionais de população residente. Revista de Estudos Demográficos. 2003;34: 61–71. Portuguese. Available from: https://www.ine.pt/xportal/xmain?xpid=INE&xpgid=ine_estudos&ESTUDOSest_boui=106300&ESTUDOSmodo=2
- 46. Korean Statistical Information Service, Statistics Korea. Number of internal migrants by sex and five-year age groups for province [Internet]. Daejeon: KOSIS; 2023 [cited 2023 June 2]. Available from: https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1B26001_A04&conn_path=I2&language=en
- 47. Rogers A, Raquillet R, Castro LJ. Model migration schedules and their applications. Environment and Planning A. 1978;10: 475–502.
- 48. Campbell PR. Population projections for states by age, sex, race, and Hispanic origin: 1995 to 2025. U.S. Bureau of the Census, Population Division; 1996. Report No.: PPL-47. Available from: https://www.census.gov/library/working-papers/1996/demo/ppl-47.html
- 49. Siegel JS, Swanson DA, editors. The methods and materials of demography. 2nd ed. San Diego: Elsevier Academic Press; 2004.
- 50. McNeil DR, Trussell TJ, Turner JC. Spline interpolation of demographic data. Demography. 1977;14: 245–252. pmid:858435
- 51. Rogers A, Little J, Raymer J. The indirect estimation of migration. 1st ed. New York: Springer; 2010.
- 52. Rogers A, Castro LJ. Model migration schedules. Laxenburg: International Institute for Applied Systems Analysis; 1981. Report No.: RR-81-30. Available from: https://pure.iiasa.ac.at/id/eprint/1543/1/RR-81-030.pdf
- 53. Rogers A, Watkins J. General versus elderly interstate migration and population redistribution in the United States. Research on Aging. 1987;9: 483–529. pmid:3438564
- 54. Wilson T. Model migration schedules incorporating student migration peaks. Demographic Research. 2010;23: 191–222.
- 55. Rogers A, Castro LJ. Migration. In: Rogers A, Willekens FJ, editors. Migration and settlement: A multiregional comparative study. Dordrecht: D. Reidel Publishing Company; 1986. pp. 157–208.
- 56. Dyrting S. Smoothing migration intensities with P-TOPALS. Demographic Research. 2020;43: 1607–1650.
- 57. Dyrting S, Taylor A. Smoothing destination-specific migration flows. Annals of Regional Science. 2021;67: 359–383.
- 58. Rogers A, Raymer J, Willekens F. Capturing the age and spatial structures of migration. Environment and Planning A: Economy and Space. 2002;34: 341–359.
- 59. de Boor C. A practical guide to splines. Revised. New York, NY: Springer; 2001.
- 60. Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996;11: 89–121.
- 61. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19: 716–723.
- 62. Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6: 461–464.
- 63. Hurvich CM, Tsai C-L. Regression and time series model selection in small samples. Biometrika. 1989;76: 297–307.
- 64. Minnesota Population Center. Integrated public use microdata series, international: Version 7.3 [Internet]. Minneapolis, MN; 2020 [cited 2023 October 23]. doi: https://doi.org/10.18128/D020.V7.3
- 65. Minnesota Population Center. IPUMS harmonized migration variables [Internet]. 2022 March 15 [cited 2023]. Available from: https://international.ipums.org/international/geo_mig.shtml
- 66. Whipple GC. Vital statistics: An introduction to the science of demography. New York: John Wiley & Sons; 1919.
- 67. United Nations. Manual II: Methods of appraisal of quality of basic data for population estimates. New York: United Nations; 1955.
- 68. United Nations. Demographic yearbook 1973, special topic population census statistics III. New York: United Nations; 1974.
- 69. Riffe T, Aburto J, Alexander M, Fennell S, Kashnitsky I, Pascariu M, et al. DemoTools: An R package of tools for aggregate demographic analysis [software]. 2019. Available from: https://github.com/timriffe/DemoTools/
- 70. Coale AJ, Kisker EE. Mortality crossovers: Reality or bad data? Population Studies. 1986;40: 389–401.
- 71. United Nations. World population prospects, 2022: Population by single age and sex, 1950–2021, medium scenario [Internet]. New York: United Nations; 2022 [cited 2023 September 25]. Available from: https://population.un.org/wpp/Download/Standard/CSV/
- 72. Dyrting S, Taylor A. Data from: Estimating complete migration probabilities from grouped data [Internet]. Northern Institute, Charles Darwin University; OSF; 2024. Available from: https://osf.io/vmrfk
- 73. Jones AT, Nguyen HD, McLachlan GJ. logKDE: Log-transformed kernel density estimation. Journal of Open Source Software. 2018;3: 870.
- 74. TableBuilder [Internet]. Australian Bureau of Statistics; 2021 November 19 [cited 2024 October 22]. Available from: https://www.abs.gov.au/statistics/microdata-tablebuilder/tablebuilder
- 75. Schmertmann C. Calibrated spline estimation of detailed fertility schedules from abridged data. Revista Brasileira de Estudos de População. 2014;31: 291–307.
- 76. Dyrting S, Taylor A. Estimating age-specific mortality using calibrated splines. Population Studies. 2023; 1–18. pmid:37493582
- 77. Dyrting S. A framework for translating between one-year and five-year migration probabilities. ResearchGate [Preprint]; 2018.
- 78. Wilson T, Bell M. Comparative empirical evaluations of internal migration models in subnational population projections. Journal of Population Research. 2004;21: 127–160.
- 79. Koylu C, Tian G, Windsor M. Flowmapper.org: A web-based framework for designing origin–destination flow maps. Journal of Maps. 2023;19: 1–9.
- 80. Jenny B, Stephen DM, Muehlenhaus I, Marston BE, Sharma R, Zhang E, et al. Design principles for origin-destination flow maps. Cartography and Geographic Information Science. 2016;45: 1–14.
- 81. Jenny B, Stephen DM, Muehlenhaus I, Marston BE, Sharma R, Zhang E, et al. Force-directed layout of origin-destination flow maps. International Journal of Geographical Information Science. 2017;31: 1521–1540.
- 82. Dyrting S, Flaxman A, Sharygin E. Reconstruction of age distributions from differentially private census data. Population Research and Policy Review. 2022;41: 2311–2329. pmid:36310654
- 83. King R, Skeldon R. “Mind the gap!” Integrating approaches to internal and international migration. Journal of Ethnic and Migration Studies. 2010;36: 1619–1646.
- 84. Skeldon R. International migration, internal migration, mobility and urbanization: Towards more integrated approaches. Geneva: International Organization for Migration (IOM); 2018. Report No.: 53.
- 85. Dinur I, Nissim K. Revealing information while preserving privacy. Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. New York: Association for Computing Machinery; 2003. pp. 202–210. doi: https://doi.org/10.1145/773153.773173
- 86. United States Census Bureau. Federal statistical research data center disclosure avoidance methods: A handbook for researchers version 4.0. USCB; 2023.
- 87. Ruiz-Santacruz JS. Estimación de calendarios migratorios mediante la simulación de los valores iniciales en las optimizaciones de parámetros de los modelos de migración multi-exponenciales: Una aplicación a la migración internacional intra-latinoamericana. Bellaterra: Centre d’Estudis Demogràfics; 2019. pp. 1–69. Report No.: 463. Spanish. Available from: https://ddd.uab.cat/record/212615