Social world knowledge: Modeling and applications

doi:10.1371/journal.pone.0283700

Fig 1.

A summary of our social context modeling approach: (a) Given a graph of sampled users and the accounts that they follow, we identify accounts of high in-degree, assuming that they represent entities of general interest; the figure illustrates a popular user as a blue figure and an unpopular user in grey. (b–c) We consider the accounts followed by the same sampled user (in red) to be socially related. (d) We focus on entity co-occurrences within these sets, discarding the sampled users' identities and excluding unpopular accounts from the model.
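The steps in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy `follows` data, and the `min_followers` threshold are assumptions made for illustration, and the sketch only counts pairwise co-occurrences rather than learning embeddings.

```python
from collections import Counter
from itertools import combinations


def entity_sets(follows, min_followers):
    """Keep only popular accounts per user and drop the user identity.

    follows: dict mapping a sampled user to the set of accounts they follow
             (hypothetical input format).
    """
    # (a) In-degree of each account within the sample.
    in_degree = Counter(acc for accounts in follows.values() for acc in accounts)
    popular = {acc for acc, n in in_degree.items() if n >= min_followers}
    # (b-d) One set per sampled user, restricted to popular accounts;
    # the user key itself is discarded.
    return [sorted(accounts & popular) for accounts in follows.values()]


def cooccurrence_counts(sets):
    """Count how often two entities appear in the same followed set."""
    counts = Counter()
    for entities in sets:
        counts.update(combinations(entities, 2))
    return counts


# Toy data, invented for this example.
follows = {
    "u1": {"nasa", "bbc", "friend_a"},
    "u2": {"nasa", "bbc"},
    "u3": {"nasa", "friend_b"},
}
sets = entity_sets(follows, min_followers=2)
pairs = cooccurrence_counts(sets)
print(pairs)
```

In this toy sample, `friend_a` and `friend_b` fall below the threshold and are excluded, so only the `bbc`/`nasa` pair is counted.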

Fig 2.

Account popularity based on our sample of 1.3 million Twitter users and the accounts that they follow: Number of accounts vs. number of their followers (log-log scale, where the number of accounts is illustrated by point size).

There are 90 million unique accounts with at least one follower in the sampled data. A long tail of accounts are followed by up to 100 users (top-left part of the plot). We learn the embeddings of the most popular ∼200K accounts, which have 350 followers or more.

Table 1.

Statistics regarding the number of accounts followed by individual users in our sampled network, considering all accounts, popular accounts (which we treat as entities), and entity accounts for which a corresponding Wikipedia entry was found.

Table 2.

The most similar entities to exemplary query entities, as ranked using cosine similarity by the different entity embedding methods.
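Ranking by cosine similarity, as used for this table, can be sketched as follows. The embedding vectors and entity names are invented placeholders, and `most_similar` is a hypothetical helper, not a function from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, k=3):
    """Return the k entities closest to `query`, most similar first."""
    q = embeddings[query]
    scored = [
        (name, cosine(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Placeholder 2-dimensional embeddings, invented for this example.
embeddings = {
    "entity_a": [1.0, 0.0],
    "entity_b": [0.9, 0.1],
    "entity_c": [0.0, 1.0],
    "entity_d": [-1.0, 0.0],
}
ranked = most_similar("entity_a", embeddings, k=3)
print(ranked)
```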

Table 3.

Results of assessing the political bias of news sources.

The table reports Spearman's correlation between the conservative-to-liberal rankings of selected news accounts generated from the different entity embeddings and the poll-based rankings reported by Pew Research in 2014 and 2020. The number of relevant account embeddings is given in parentheses for each method.
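For reference, Spearman's correlation between two rankings can be computed as in the sketch below. It assumes there are no tied values, so the closed form based on squared rank differences applies; the scores are invented and do not come from the paper or from Pew Research.

```python
def ranks(values):
    """Rank positions (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's rho for paired samples without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented scores for four hypothetical news accounts.
model_scores = [0.1, 0.4, 0.35, 0.8]  # embedding-based polarity score
poll_scores = [1.0, 2.0, 3.0, 4.0]    # poll-based polarity score
rho = spearman_no_ties(model_scores, poll_scores)
print(rho)
```

When ties are possible, a library routine that handles them is the safer choice.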

Fig 3.

Ranking of political polarity based on our embeddings.

Table 4.

Results of assessing the political leaning of news sources as binary polarity.

Prediction accuracy is reported for all the accounts available per method (‘all’), and for the news accounts that have an embedding under every method (‘common’).

Table 5.

Personal trait prediction: Dataset statistics.

Table 6.

Personal trait prediction: Statistics of the number of popular account embeddings associated with each user in the dataset under the different entity embedding methods, and the proportions of users in the dataset that have a limited number of embeddings (fewer than 5 or 10) associated with them under each method.

Table 7.

Personal trait prediction results [ROC AUC].

*The results by Volkova et al. were obtained using an earlier version of the dataset, which was substantially larger and included different tweets, and are therefore not directly comparable.

Table 8.

The top Twitter accounts that are characteristic of different subpopulations, as measured by Pointwise Mutual Information (PMI) on our datasets labeled with personal attributes.
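As a rough illustration of the PMI measure, the sketch below scores each (subpopulation, account) pair as log2 of P(account, group) / (P(account) P(group)), with probabilities estimated as fractions of labeled users. The estimation details, the input format, and the labels are assumptions; the paper's exact procedure may differ.

```python
import math
from collections import Counter


def pmi_scores(users):
    """PMI for each observed (label, account) pair.

    users: list of (label, set_of_followed_accounts) tuples
           (hypothetical input format).
    """
    n = len(users)
    label_counts = Counter(label for label, _ in users)
    account_counts = Counter(acc for _, accounts in users for acc in accounts)
    joint_counts = Counter(
        (label, acc) for label, accounts in users for acc in accounts
    )
    return {
        (label, acc): math.log2(
            (count / n)
            / ((label_counts[label] / n) * (account_counts[acc] / n))
        )
        for (label, acc), count in joint_counts.items()
    }


# Invented labeled users for this example.
users = [
    ("group_x", {"acct_1", "acct_2"}),
    ("group_x", {"acct_1"}),
    ("group_y", {"acct_2"}),
    ("group_y", {"acct_2", "acct_3"}),
]
scores = pmi_scores(users)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

Positive scores mark accounts followed more often within a group than their overall popularity would predict; rare accounts can receive inflated scores, so a minimum-count filter is a common addition.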
