Table 1.
Gender counts are listed separately under each feature to give a comprehensive view of the dataset. For example, under the feature Languages, users are categorized into three languages: Estonian, Russian, and English. The number of males and females in each category is also listed.
Fig 1.
Call density with probabilities based on age.
The x-axis represents the user’s age and the y-axis represents the call density for overall, female, and male users in the CDR data. Using the cumulative distribution function of each distribution, we map the tail probability directly to colors. For example, the 25th, 50th, and 75th percentiles for overall users are 44, 52, and 61, respectively. Similarly, these percentiles are 44, 52, and 62 for female users and 44, 51, and 60 for male users.
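The quartiles and the tail-probability mapping described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the sample ages are made up, the function names are hypothetical, and the tail probability is assumed to be 0.5 - |0.5 - F(x)|, where F is the empirical cumulative distribution function.

```python
from statistics import quantiles


def quartiles(ages):
    """Return the 25th, 50th, and 75th percentiles of a list of ages."""
    return quantiles(ages, n=4, method="inclusive")


def ecdf(ages, x):
    """Empirical CDF: fraction of observations less than or equal to x."""
    return sum(a <= x for a in ages) / len(ages)


def tail_probability(ages, x):
    """Assumed definition: distance of F(x) from the nearer tail (0 to 0.5)."""
    return 0.5 - abs(0.5 - ecdf(ages, x))


# Hypothetical ages, for illustration only (not values from the CDR data).
sample_ages = [20, 30, 40, 50, 60]
q25, q50, q75 = quartiles(sample_ages)
```

Values near the median receive a tail probability close to 0.5 and values in either tail approach 0, so a color scale over this quantity highlights the center of each distribution.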
Fig 2.
User network formed using CDR data.
A representative sample of the original network obtained using snowball sampling. In Fig (a), users are color-coded by gender: red nodes are male users and green nodes are female users. Nodes with higher PageRank values are drawn larger than the others. Fig (b) shows the male-only network (modularity = 0.803), where each color-coded group represents a community. Similarly, Fig (c) shows the female-only network (modularity = 0.913), where each color-coded group again represents a community.
Table 2.
Network statistics of users.
Table 3.
Statistical comparison of the male and female networks.
Fig 3.
Difference between the PageRank gender representation ratio and the actual population gender distribution ratio across counties.
A difference below zero (green) indicates that females are the major source of information in that region and are over-represented relative to their share of the population. Similarly, a difference above zero (red) indicates that males are over-represented as primary sources of information in that region relative to their share of the population.
Table 4.
Comparison of population gender distribution and PageRank centrality gender distribution (among top 100).
The difference is calculated by subtracting the population ratio from the PageRank ratio. A negative difference indicates that female representation is higher in that county. Similarly, a positive difference indicates that male representation is higher in that county.
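The sign convention above can be illustrated with a short sketch. It assumes that each "ratio" is the male share (males divided by total), which is consistent with negative values meaning higher female representation. The caption does not state the exact definition, so the function name and inputs are hypothetical.

```python
def representation_difference(top_males, top_total, pop_males, pop_total):
    """PageRank ratio minus population ratio, both taken as male shares (assumption)."""
    pagerank_ratio = top_males / top_total      # male share among top-ranked users
    population_ratio = pop_males / pop_total    # male share in the county population
    return pagerank_ratio - population_ratio


# Hypothetical county: 40 of the top 100 users are male, 470 of 1000 residents are male.
diff = representation_difference(40, 100, 470, 1000)
female_overrepresented = diff < 0
```

Under this assumption, a county where males make up a smaller share of the top 100 than of the population yields a negative difference.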
Fig 4.
Median number of calls for various age groups by gender.
The median numbers of calls for the female age groups (14,24], (24,54], (54,64], and (64,100] are 28, 27, 19, and 12, respectively. Similarly, the medians for the male age groups are 25, 27, 19, and 11, respectively.
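The interval notation (14,24] means that the lower bound is excluded and the upper bound is included. A minimal sketch of binning users into these age groups and taking the median call count per gender and group is shown below. The record layout (gender, age, calls) and the function names are assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import median

AGE_EDGES = [14, 24, 54, 64, 100]  # bins are (14,24], (24,54], (54,64], (64,100]


def age_group(age):
    """Return the right-closed interval label for an age, or None if out of range."""
    i = bisect_left(AGE_EDGES, age) - 1
    if 0 <= i < len(AGE_EDGES) - 1:
        return f"({AGE_EDGES[i]},{AGE_EDGES[i + 1]}]"
    return None


def median_calls_by_group(records):
    """Median calls per (gender, age group); records are (gender, age, calls) tuples."""
    groups = defaultdict(list)
    for gender, age, calls in records:
        label = age_group(age)
        if label is not None:
            groups[(gender, label)].append(calls)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical records, for illustration only.
records = [("F", 24, 30), ("F", 20, 26), ("M", 25, 27), ("M", 14, 99)]
medians = median_calls_by_group(records)
```

Because the bins are right-closed, a 24-year-old falls in (14,24] and a 14-year-old is outside every bin.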
Fig 5.
Coleman’s homophily index (HI) for various age groups.
HI values for the female age groups (14,24], (24,54], (54,64], and (64,100] are -0.09, 0.09, -0.03, and -0.1, respectively. Similarly, HI values for the male age groups are -0.68, 0.04, -0.08, and -0.11, respectively.
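For reference, one common formulation of Coleman’s homophily index compares the observed fraction of same-group ties with the group’s share of the population, normalized so that the index lies in [-1, 1]. The sketch below uses that formulation. The caption does not give the definition used in the study, so treat the formula, the function name, and the inputs as assumptions.

```python
def coleman_homophily_index(same_group_ties, total_ties, group_share):
    """Coleman-style homophily index in [-1, 1] (common formulation, assumed here).

    same_group_ties: ties from group members to members of the same group
    total_ties:      all ties from group members
    group_share:     the group's fraction of the population, strictly between 0 and 1
    """
    if total_ties <= 0 or not 0 < group_share < 1:
        raise ValueError("need total_ties > 0 and 0 < group_share < 1")
    observed = same_group_ties / total_ties
    if observed >= group_share:
        # Homophily: excess over the baseline, scaled by the maximum possible excess.
        return (observed - group_share) / (1 - group_share)
    # Heterophily: shortfall below the baseline, scaled by the maximum possible shortfall.
    return (observed - group_share) / group_share


# Hypothetical group holding half of the population, with 80 of 100 ties in-group.
hi_example = coleman_homophily_index(80, 100, 0.5)
```

A positive value means members connect within their own group more often than the population baseline predicts, a negative value means less often, and zero matches the baseline.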
Fig 6.
Median number of calls for various language-speaking populations by gender.
The median numbers of calls for female speakers of English, Estonian, and Russian are 26, 24, and 25, respectively. Similarly, the medians for male speakers of these languages are 21.5, 22, and 27, respectively.
Fig 7.
Coleman’s homophily index (HI) for various languages.
HI values for male speakers of English, Estonian, and Russian are -0.27, 0.14, and -0.04, respectively. Similarly, HI values for female speakers of these languages are 0.02, 0.1, and 0.14, respectively.
Fig 8.
Call density for counties in Estonia by gender.
The median numbers of calls for males in the counties (from bottom (Harju) to top (Võru)) are 7, 2, 4, 2, 2, 2, 5, 2, 2, 2, 3, 3, 2, 2, and 3, respectively. Similarly, the medians for females in these counties are 7, 4, 7, 2, 2, 2, 3, 3, 2, 2, 5, 4, 2, 3, and 4, respectively.
Fig 9.
Coleman’s homophily index (HI) for various counties.
HI values for males in the counties (from bottom (Harju) to top (Võru)) are -0.27, 0.14, and -0.04, respectively. Similarly, HI values for females are 0.02, 0.1, and 0.14, respectively.
Table 5.
Coleman’s homophily index (HI) for the case study of the prime working-age population.
Table 6.
Representation of gender and age groups in the CDR data compared with the census data.
Table 7.
Representation of gender and languages in the CDR data compared with the census data.
Fig 10.
Comparison of the relationship between gender, age group, and county using (a) CDR records and (b) census data from Statistics Estonia.
In both (a) and (b), the leftmost bars represent gender, the middle bars represent age group, and the rightmost bars represent the counties of Estonia.
Fig 11.
Comparison of the relationship between gender, language, and county using (a) CDR records and (b) census data from Statistics Estonia.
In both (a) and (b), the leftmost bars represent gender, the middle bars represent language, and the rightmost bars represent the counties of Estonia.
Fig 12.
Comparison of the relationship between gender, age group, language, and county using CDR data.
Here, the leftmost bars represent gender, the second bars from the left represent age group, the second bars from the right represent language, and the rightmost bars represent the counties of Estonia.