Population estimation beyond counts—Inferring demographic characteristics

doi:10.1371/journal.pone.0266484

Fig 1.

Detailed flowchart of the method and the employed datasets.

Fig 2.

Planning areas (thick lines) including their subzones (thin lines) in Singapore.

The yellow areas are part of the training group, while the turquoise zones are the test areas for estimations. The grey parts of the country are outside the scope of our work because they are not residential or not dominated by HDB housing. Source of the administrative dataset: Urban Redevelopment Authority / data.gov.sg (2014).

Table 1.

An overview of the predictors.

For each subzone, the density of each amenity has been computed.
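
As a minimal sketch of what a per-subzone amenity density could look like, the snippet below divides amenity counts by the subzone's area. The caption does not state the normalisation used, so dividing by area in square kilometres is an assumption; the function name, amenity names, counts, and area are likewise hypothetical and serve only as illustration.

```python
def amenity_density(counts, area_km2):
    """Return amenities per square kilometre for one subzone.

    Assumption: density means count divided by area; the source
    does not specify the normalisation.
    """
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return {amenity: n / area_km2 for amenity, n in counts.items()}


# Hypothetical subzone: amenity names, counts, and area are made up.
densities = amenity_density({"bus_stop": 12, "clinic": 3}, area_km2=1.5)
```

Each resulting density would then act as one predictor column for the subzone.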

Fig 3.

Visualisation of some of the datasets that we have used in our work.

Proportion of age groups by administrative area (from which we calculate the proportion of seniors and the average age, both also plotted) together with the average age of buildings. The plot hints at disparate demographics of neighbourhoods and at an association between the age of buildings and the age of residents, which we attempt to take advantage of in our estimations. Source of the datasets: Singapore Department of Statistics and Housing and Development Board (data.gov.sg).

Fig 4.

Extracts from the datasets that we have used in our work.

(a) aggregated age group distribution for subzones in one of the planning areas in our focus (in our work, we estimate the proportion of the senior group depicted in blue); (b) population counts of subzones are disparate, presenting a suitably diverse dataset for estimations. Source of the datasets: Singapore Department of Statistics and Housing and Development Board (data.gov.sg).

Fig 5.

Observed vs predicted and predicted vs predicted (models) scatterplots for population count, average age, and elderly proportion.

LM and SVM tend to produce very similar predictions (population counts and average age), while RF and LM differ, particularly at lower and higher values (elderly proportion).

Table 2.

Overview of the performance of the different combinations of the developed regression models to estimate population counts and age.

Table 3.

An overview of the predictors and their variable importance from none (o) to high (***).
