Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election

Social media has emerged as an alternative to opinion polls for collecting public opinion, yet as a passive data source it still poses many challenges, such as lack of structure, quantifiability, and representativeness. Social media data with geotags provide new opportunities to unveil the geographic locations of users expressing their opinions. This paper aims to answer two questions: 1) whether quantifiable measurement of public opinion can be obtained from social media and 2) whether it can produce better or complementary measures compared to opinion polls. This research proposes a novel approach to measure the relative opinion of Twitter users towards public issues in order to accommodate more complex opinion structures and take advantage of the geography pertaining to the public issues. To ensure that this new measure is technically feasible, a modeling framework is developed including building a training dataset by adopting a state-of-the-art approach and devising a new deep learning method called Opinion-Oriented Word Embedding. With a case study of the tweets selected for the 2016 U.S. presidential election, we demonstrate the predictive superiority of our relative opinion approach and we show how it can aid visual analytics and support opinion predictions. Although the relative opinion measure proves to be more robust than polling, our study also suggests that the former can advantageously complement the latter in opinion prediction.


Introduction
Measuring and monitoring public opinion trends from social media has emerged as a potential alternative to opinion polls due to its voluntary nature and penetration to a large number of people [1]. Almost all social media platforms (e.g., Twitter and Facebook) allow users to tag their locations on the posted messages, dubbed Location-based social media (LBSM). Bringing a geographic perspective allows the study of opinion variation across geographic entities (e.g., states) that frame public events, e.g., political elections.
However, as an organic and passive data source, social media data pose several analytical challenges such as how to identify the target information from the unstructured and unprompted data, how to quantify the highly qualitative textual messages, and how to ensure the data can be representative of the broader electorate. Hence, two fundamental concerns need to be grappled with: 1) whether quantifiable measurement of public opinion can be garnered reliably from social media and 2) whether it can produce better or complementary measures compared to opinion polls [2,3].
Also, the practice of opinion polling has intrinsic limitations. All polls measure opinion in an absolute sense, where opinions are classified as one of several predefined and mutually exclusive categories, such as candidates A, B, and C. This practice tends to overlook complex opinion structures that would be embedded in an opinion space comprising every category as its own dimension. With each dimension featuring a gradient of preference level for one category (e.g., a range from anti-A to pro-A), some very complex opinion positions could be triangulated from these dimensions, such as not very anti-A, somewhat pro-B, but no preference for C.
This research aims to advance the measurement of public opinion captured from Twitter posts, by addressing several of the points raised above. First, a relative opinion measure is proposed, based on a concept of relational space framed by the modalities of functional relationships between entities (e.g., individual persons or geographic areas). It enables the construction of a multi-dimensional and continuous representation of opinion space to 1) account for complex opinion structures arising from discrete extremes/categories (e.g. swing states in U.S. presidential election) and 2) encompass sufficient dimensions that individually characterize the opinion space from a certain aspect. Second, relative opinion positions of Twitter users are learned from textual tweets and represented as points in the multidimensional opinion space. A novel deep learning model known as opinion-oriented word embedding is devised to learn vector representations of words from a corpus of textual posts whose opinion indication is clearly captured by a set of selected hash tags. Third, the power of the relative opinion measure is twofold: 1) creating a spatial visualization of the opinion space where users' opinion positions can be aggregated to any level of geography based on their location information; 2) supporting opinion predictions at an aggregate geographic level consistent with the target public event (e.g., the state level for presidential elections) via a linear neighborhood propagation method that combines the relative opinion measure and the opinion polls.
The rest of this paper is structured as follows. The next section reviews the measurement of public opinion with regard to methodological approaches and data sources. The conceptualization and construction of the relative opinion measure is then advanced with a general modeling framework. It is followed by a detailed description and explanation of the data collection and methods supporting the framework. An application to tweets posted during the 2016 U.S. presidential election then demonstrates the spatial visualization of the relative opinion space for visual analytics compared with opinion polls; the following section extends this work to opinion predictions. Finally, conclusions are drawn on the scientific merit of the relative opinion measure and future work is discussed.

Literature review
Social media data as an alternative source for opinion measurement
Public opinion consists of people's beliefs, attitudes and desires on public issues or problems. For governments, political leaders, and policy makers, discerning public opinion is crucial to inform administration, election campaign, and policy-making [4]. Data are traditionally collected by survey or opinion poll. These techniques involve a structured questionnaire, a defined population from which individuals are sampled, and a method of aggregation of individual responses to infer a quantity of interest (as a measure of opinion) for the population, as core components [2,5].
With the proliferation of web and mobile technologies, social media platforms, such as Twitter and Facebook, have permeated large segments of the population worldwide; they let people express their thoughts, feelings, attitudes and opinions, which can be shared and accessed publicly [4]. This presents not only a ubiquitous means to convey individual opinion but also an unprecedented alternative source for public opinion collection. In contrast to surveys, this new form of public opinion data is characterized by unstructured and unprompted opinion expression. In essence, it belongs to a type of organic or passive data that users voluntarily post on social media. As such, social media data can prevent the prompting or framing effects that may exist in surveys when respondents and their responses are oriented and affected by how the questionnaire designers select and frame the topics/issues [2,5,6,7].
Moreover, social media data have unparalleled advantages in temporal and geographic coverages at very fine granularity. Users across countries and from different geographic regions may post on a daily or even hourly basis. Indeed, it is likely to capture people's instantaneous and spontaneous responses to public events and issues and their changes over time, which is impossible for survey data because of the cost and practicality [2,8]. Almost all social media platforms allow users to tag their locations on the posted messages. At the finest level, the exact location such as a pair of geographic coordinates can be reported, although larger geographic regions are more common such as towns, cities or states.
Geographic variation of public opinions plays a critical role in many situations such as electoral-area-based elections (e.g., congressional district and state). Due to the cost of opinion polls, social media data have been proposed to interpolate state-level polls for U.S. presidential elections [9]. The timeliness and geographic reach of social media data reinforce their appeal as opinion polls are facing growing hurdles in reaching and persuading reluctant respondents [2].

Challenges of LBSM data for opinion measurement
Measuring public opinion with LBSM data is still confronting several main challenges. First, given the unstructured and unprompted nature of social media data, how to determine the topics and relevant posts from a huge pool of social media data is a great challenge. It has been argued that simple ad hoc search criteria, such as the mention of candidate names for an election, may cause systematic selection bias. As a result, it may miss those relevant messages without mentioning candidate names or add noise to the data when candidate names happen to be confused with other names [2,10]. Therefore, the selection criteria need to be thought out and the potential selection bias needs to be assessed with the interpretation of results. The difficulty of identifying topics also lies in that topics are changing, related to one another or split into sub-topics as discourse is carried on over time [2]. Efforts have been made to discover related topics or sub-topics either explicitly by using a combination of topic models and sentiment analysis [11] or implicitly by constructing a network of co-occurrent hashtags referring to related topics [3].
Second, quantification of opinion from qualitative and unstructured textual data from social media is not only a technical challenge but also a theoretical one. Simple metrics based on counting tweets or mentions related to certain topics/issues or candidates/parties, though widely applied, have been criticized for their low performances compared to real outcomes or opinion polls [12][13][14][15][16]. As an enhancement to simple counting methods, lexicon-based sentiment analysis has been used in numerous studies to extract positive or negative sentiments from textual messages on certain topics [2,13,15,17]. However, both types of methods are in fact measuring attentions or sentiments rather than opinions (e.g., attitude regarding an argument or preference for a candidate or a party) [3,15,18,19]. Moreover, the lexicon-based approach often exhibits unstable performance on the unstructured, informal, sometimes ironic or sarcastic, language of social media messages due to an ad-hoc dictionary of words with sentiment polarity it relies on [20].
Recent research has built more accurate measurements of opinion by taking a supervised learning approach with either manually created or automatically generated in-domain (instead of ad-hoc dictionary) training sets that identify exact opinion information, such as political support/opposition or agreement/disagreement [3,15,18,21]. Taking a 'bag-of-words' approach in natural language processing (NLP) [22], these studies all assume that every word of a message constitutes a piece of the opinion expressed by the message as a whole, whether this word is directly related to the topic or considered neutral. These stimulating attempts to transform qualitative textual data into quantitative measures of opinion are however very elementary, because the bag-of-words representation (a vector of word appearance) takes no account of word order and hence captures no semantic information at the word level.
Recent advances in deep neural network learning of word representations (or word embeddings) as dense, low-dimensional and real-valued vectors have suggested superior performance compared to bag-of-words-based methods. Neural-network-inspired word embedding has proved effective in a variety of NLP tasks including sentiment analysis [23][24][25][26][27].
Formally, word embedding maps words and phrases to mathematical vectors, creating quantitative representations in a multidimensional space that preserve syntactic, semantic and contextual relationships of words [28]. For instance, the well-known word2vec model [24] can achieve tasks like king - queen = man - woman or infer pairs of present tense-past tense.
However, there is limited research incorporating the semantic-preserving word embedding for opinion measurement (some studies using it for topic detection only, such as [29,30]).
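The vector arithmetic behind the king/queen example can be illustrated with a minimal sketch. The three-dimensional vectors below are hand-made toy values chosen for illustration only, not embeddings learned by word2vec, and the helper names `cosine` and `analogy` are hypothetical.

```python
import numpy as np

# Hand-made toy vectors (NOT learned embeddings); the axes can loosely be
# read as (royalty, maleness, femaleness).
vectors = {
    "king":  np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man":   np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "apple": np.array([0.2, 0.1, 0.1]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vocab):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = vocab[a] - vocab[b] + vocab[c]
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

# king - queen and man - woman share the same offset in this toy space.
same_offset = np.allclose(vectors["king"] - vectors["queen"],
                          vectors["man"] - vectors["woman"])
# Equivalent query form: king - man + woman should land on queen.
best = analogy("king", "man", "woman", vectors)
```

With learned embeddings the equality holds only approximately, which is why the nearest neighbor of the target vector is usually retrieved instead of testing for exact equality.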
Third, there is no formal process for defining a population frame and drawing samples from social media data as in a survey. The representativeness of social media data is questionable, although the decentralized nature of this data source and the diversity of its users may compensate for the potential bias, owing to the large size of social media data [2]. A number of studies have shown that social media users cannot be representative of the national populations in many aspects, such as their geographical distribution, age, gender, race, educational level, political ideology, and interests in topics [31][32][33][34][35][36]. For example, geotagged Twitter users in the U.S. are more likely to be younger, have higher income, live in urbanized areas, and be located in the east or west coastal areas [33]. It was also found that the majority of Twitter users are female, but they are not politically active [34]. Besides, measuring public opinion from social media depends on users who publicly express their opinions. However, these active participants who voluntarily offer opinions may have systematically different opinions on a topic from those who are explicitly asked or choose not to offer their opinions (e.g., shy Trump voter issue); hence the former users' opinions are over-represented in social media. The underlying uncertainties are hard to control, not to mention the problems of bots, spammers and fake user accounts [37]. Further research has been called for to assess the extent of uncertainty involved and how plausibly social media data can be used as a trustworthy source for opinion measurement.
The representativeness issue also plays a critical role in aggregating information from individual social media messages to a measure of public opinion. For example, a small group of users act as opinion leaders and dominate the discussion on social media in terms of the volume of tweets or retweets [38]. User-level effects must be controlled for to avoid the overrepresentation of highly active participants. Furthermore, the geotagging of social media posts may enable the aggregation to a certain geography when electoral district-based opinion measurement is necessary. However, the resolution of the tagged geographic information varies significantly across messages (e.g., tweets), with only a small proportion of them having exact locations [39]. Geotags are volunteered by users and hence selecting only geotagged tweets may introduce a selection bias, which again causes the representativeness problem [40].
This research responds to the second challenge and proposes a novel approach for opinion measurement. Supervised learning ensures the measurement of opinion rather than attention or sentiment. Thanks to semantic-preserving word embedding, it also ensures the capture of opinion information at the finest grain (i.e., word level). Thus, our measure is flexible enough for aggregation at a range of levels, such as message, user, and various levels of geographic granularity. Furthermore, this study partially addresses the third challenge by producing spatial representations of opinion measures at an aggregated level, which permits a straightforward assessment of representativeness of the social media data. In addition, to account for the topic selection issue posed by the first challenge, we employ the topic discovery and opinion identification methods proposed by [3] to build a training set for supervised learning. However, unlike [3], opinion prediction will be conducted at an aggregated level (e.g., electoral district) to better mitigate errors. Evidence has shown that opinion classification error at the individual level (e.g., user) remains high and can propagate with aggregation [21,41]. It is indeed our contention that individual opinion is never devoid of uncertainty and that the user holds full control of the content of a message after its initial post through changes, edits or even withdrawal.

Conceptualization and construction of relative opinion
As an organic source of data for public opinion extraction, social media is characterized by unstructured text, which contrasts with the designed 'question' and 'response' structure of traditional survey data (opinion polls). Opinions may be embedded or even hidden in this unstructured and free-form writing, which is naturally fuzzy, complex, and of high dimensionality [2]. In and of themselves, dimensions in social media text are implicit, hidden and most of the time undeterminable. Hence, the opinions extracted from such free-form discourse inherit these features, which must be properly handled when taking the measurement.
While public opinions revealed from survey data can also be multi-dimensional, the dimensions are encoded explicitly in specific questions. In this sense, the opinion measurement taken by survey data is static, deterministic, and certain and may be called absolute opinion. The conceptualization of the opinion measurement as a classification problem with predefined opinion categories, e.g. the support of candidates or parties [3,11] also permeates existing practices in statistical analysis of mentions and in sentiment analysis for social media data. This particular conceptualization rooted in the absolute opinion paradigm is ill-suited to capture the complexity of opinion structures and the continuity between opinion categories that may exist in the high-dimensionality free-form discourse on social media.
In response to the deficiencies of the absolute opinion approach for the measurement of opinions from social media posts, we hereafter propose a relative opinion conceptualization.
This approach is inspired by the relativist view on space in physics and geography, which posits that space is not merely a neutral container for natural and socioeconomic processes, but that it is in turn defined and (re-)constructed by the relations among things and events that take place and operate within it [42,43]. Specifically, relative opinion measures how dissimilar one's opinion is from that of others. Once the semantic relations for every pair of individual agents' (person, community or state) opinions are captured, they can serve to frame the construction of a relative opinion space. This space entails a multidimensional and continuous representation of opinions that can account for complex opinion structures (e.g. swing states in U.S. presidential election) and sufficient dimensionality of the opinion space.
To construct the relative opinion measure from Twitter data, we further propose a modeling framework (Fig 1). It is our fundamental contention that opinion information can be incorporated into the learning process of generic word embedding to preserve the opinion orientation of words and sentences towards certain topics, such as support for candidates or parties. This framework comprises three components: 1) data collection and training data for opinion identification, 2) generation of opinion-oriented word embedding, and 3) aggregation of word-level opinion embedding to individual and higher levels of geographic units (the state in the case study of this paper). The output is an aggregate relative opinion measure at the state level, which will be used for subsequent visual analytics and predictive analysis in the following case study.

Methods and data
Data collection and training data for opinion identification
We continuously collected tweets using the Twitter Streaming API (about 1% of Twitter posts) from September 1st, 2016 to November 8th, 2016. A total of 2.2 million tweets were collected in the English language from the United States with certain location tags, mentioning the two top presidential candidates from the Republican Party (Donald J. Trump) and the Democratic Party (Hillary Clinton) by using the same queries from [3] with the following keywords: trump OR realdonaldtrump OR donaldtrump and hillary OR clinton OR hillaryclinton. As in [3], we use the name of the Twitter client extracted from each raw tweet to filter out automated tweets from bots. The 90% of collected tweets that originate from official clients are retained. Following the procedure in [3], we utilize hashtags from tweets as the main source of opinion information to build a training set of labelled tweets that indicate clear opinion preferences. This procedure, which includes five steps (Fig 1), is detailed in S1 Appendix. The output of the procedure is a set of labeled tweets in terms of six opinion categories: Pro-Clinton, Anti-Trump, Support-Clinton (with Pro-Clinton and Anti-Trump as the two most common labels), Pro-Trump, Anti-Clinton, and Support-Trump (with Pro-Trump and Anti-Clinton as the two most common labels). This training set contains 238,142 tweets.
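The keyword and client filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the collection code used in the study: tweets are assumed to be dictionaries with `text` and `source` fields, the `source` field is assumed to wrap the client name in an HTML anchor, and the `OFFICIAL_CLIENTS` allow-list is hypothetical because the actual list is not given in this section.

```python
import re

# Keywords quoted in the text for the two candidate queries.
TRUMP_TERMS = ("trump", "realdonaldtrump", "donaldtrump")
CLINTON_TERMS = ("hillary", "clinton", "hillaryclinton")

# Hypothetical allow-list of official clients (assumption for illustration).
OFFICIAL_CLIENTS = {"Twitter for iPhone", "Twitter for Android", "Twitter Web Client"}

def mentions(text, terms):
    """True if any term occurs in the lower-cased text (plain substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)

def client_name(source):
    """Strip any HTML tags wrapping the client name in the raw source field."""
    return re.sub(r"<[^>]+>", "", source).strip()

def keep_tweet(tweet):
    """Keep tweets that mention either candidate and come from an official client."""
    on_topic = (mentions(tweet["text"], TRUMP_TERMS)
                or mentions(tweet["text"], CLINTON_TERMS))
    return on_topic and client_name(tweet["source"]) in OFFICIAL_CLIENTS

sample = [
    {"text": "I'm with #Hillary",
     "source": '<a href="x" rel="nofollow">Twitter for iPhone</a>'},
    {"text": "Trump rally tonight", "source": '<a href="x">SomeBot</a>'},
    {"text": "nice weather today", "source": "Twitter for Android"},
]
kept = [t for t in sample if keep_tweet(t)]
```

A substring match of this kind also accepts unrelated words that contain a keyword, which is one source of the selection noise discussed in the literature review.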

Opinion-oriented word embedding
We develop our opinion-oriented word embedding (OOWE) as an extension of the sentiment-specific word embedding (SSWE) method [25]. SSWE evolves generic word embedding to incorporate sentiment information (e.g., positive/negative emoticons) from tweets into semantics-preserving word embedding. OOWE distinguishes itself from SSWE in two respects (Fig 2): 1) it relies on supervised learning with opinion preference rather than sentiment information; 2) it can accommodate any number of opinion categories rather than two (positive-negative) for sentiments. Specifically, we modify the upper linear layer to include two separate components, opinion and semantic, which capture the opinion preference and semantic context of words, respectively. Given $C$ opinion categories, the output layer outputs a $(C + 1)$-dimensional vector in which one scalar $f_0$ stands for the language model score and the scalars $f_c$ ($c = 1, \dots, C$) stand for opinion scores for all categories. The loss function is specified as a linear combination of two hinge losses: where $t$ and $t^r$ are the original and corrupted ngram inputs, respectively, $\alpha$ is a weighting parameter, $f_p$ is the opinion score for the Positive opinion category while $f_c$ ($c \neq p$) are the opinion scores for the other, Negative opinion categories.
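The loss equation itself did not survive in this copy of the text. The block below gives one plausible form that is consistent with the verbal description (a weighted sum of a semantic hinge loss on the language model score and an opinion hinge loss that ranks the Positive category above the others) and with the SSWE family of losses [25]. It is an assumption, not the authors' exact equation. The symbols are also assumed: $t$ and $t^r$ denote the original and corrupted ngrams, $f_0$ the language model score, $f_c$ the opinion score of category $c$, $p$ the Positive category, and $\alpha$ the weighting parameter.

```latex
\begin{aligned}
\mathrm{loss}(t, t^{r}) &= \alpha\,\mathrm{loss}_{\mathrm{sem}}(t, t^{r}) + (1-\alpha)\,\mathrm{loss}_{\mathrm{op}}(t),\\
\mathrm{loss}_{\mathrm{sem}}(t, t^{r}) &= \max\bigl(0,\; 1 - f_{0}(t) + f_{0}(t^{r})\bigr),\\
\mathrm{loss}_{\mathrm{op}}(t) &= \sum_{c \neq p} \max\bigl(0,\; 1 - f_{p}(t) + f_{c}(t)\bigr).
\end{aligned}
```

Under this form, the semantic term is zero once the original ngram outscores its corrupted version by a margin of 1, and the opinion term is zero once the Positive category outscores every other category by the same margin.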

Fig 2. Neural network structure for opinion-oriented word embedding algorithm
We tokenize each tweet with TwitterNLP [44] and remove the @user mentions and URLs of each tweet, as well as the hashtags in the set of labeled hashtags. We train OOWE by taking the derivative of the loss through back-propagation with respect to the whole set of parameters [28], and use AdaGrad [45] to update the parameters. We empirically set the window size as 3, the embedding length as 50, the length of hidden layer as 20, and the learning rate of AdaGrad as 0.1. After training, the outputs are represented by 50-dimensional numeric vectors whose relative positions and distances between each other reflect their relative opinions towards topics such as support for candidates in elections.
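For readers unfamiliar with AdaGrad, the parameter update can be sketched as below. The constants repeat the settings reported above; the function name `adagrad_step` and the stabilizing constant `eps` are assumptions for illustration, and the sketch is not the training code of the study.

```python
import numpy as np

# Hyperparameters reported in the text.
WINDOW_SIZE = 3       # ngram window
EMBEDDING_DIM = 50    # length of each word vector
HIDDEN_DIM = 20       # length of the hidden layer
LEARNING_RATE = 0.1   # AdaGrad learning rate

def adagrad_step(params, grad, accum, lr=LEARNING_RATE, eps=1e-8):
    """One AdaGrad update: each parameter gets its own effective step size,
    scaled by the root of its accumulated squared gradients."""
    accum = accum + grad ** 2
    params = params - lr * grad / (np.sqrt(accum) + eps)
    return params, accum

# Toy two-parameter example (values are illustrative only).
theta = np.array([1.0, 2.0])
accum = np.zeros_like(theta)
theta, accum = adagrad_step(theta, np.array([0.5, -1.0]), accum)
```

On the first step the update magnitude is close to the learning rate for every parameter regardless of gradient scale; later steps shrink as squared gradients accumulate.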

Aggregation of relative opinion measure
Since we measure relative opinion at the word level, it is possible to aggregate it to any higher level, such as the tweet, the user, and the state. A straightforward way to aggregate the embedding representation of words to a document is to take their centroid (average), which has been a common approach in creating document-level embedding [46]. Relative opinion measures at the user and state levels can be obtained similarly. As suggested in a recent review of ways to measure public opinion with social media data [2], different levels of participation of users in social media, as reflected by the varying number of tweets posted by different users, should be controlled for at the user level. Taking advantage of location tags in social media allows aggregation of opinion measures by administrative and geographic areas, which may be very useful in electoral studies. The limitation of the representativeness of geotagged tweets is well recognized. We assume that users would be willing to reveal coarser (e.g., state or country) rather than finer (e.g., coordinates or city) location information. We observe that state-level location can be inferred for around 90% of the tweets we collected from one of three pieces of information: the tweet location field, a location mentioned in the tweet text, and the user profile location field.

Visual analytics of state-level relative opinion measure
well represented in the data. A small variation of opinion is also found for ME in Fig 5. However, as ME's population is relatively underrepresented in the data (a small circle in Fig 6), the small variation of opinion for ME shown in Fig 5 may
Results show a large variation of opinion for users in Texas (Fig 5) with a reasonable population representation (Fig 6). In Texas, Clinton received three percent more of the votes in 2016 than Obama did in 2012. Texas cities, such as Houston, have been experiencing growth in relatively liberal urban professionals and Hispanic and other immigrants, which makes urban areas swing strongly Democratic [48]. This could explain the large variation of opinion in Texas, in contrast to the previous intuition that Texas has always been a "deep-red" state. AL, MS, and LA on the far extreme of the red side also show large variation (Fig 5), which means that opinion could differ largely across individuals even in "deep-red" states, as Republicans come in many different flavors. However, the former case of Texas, close to the dividing boundary, is much more critical for the election outcome than the latter ones. It is noted that Kansas is placed on the dividing boundary and carries a large opinion variation with relatively poor representativeness. At first glance, this may be a misplacement due to the data issue, as Kansas has always supported the Republican candidates in the past four presidential elections (a deep-red state). By looking into county-level election results (Table 1), we see that the most populous county, Johnson County, had quite close support for both candidates. This is also true for the total support of the five most populous counties. It could be an explanation for the large variation of opinion given the commonly accepted assumption that people in more urbanized areas are more likely to tweet [40]. [50]). This fact, together with the so-called "Shy Trump effect" (Trump supporters were unwilling to reveal their true preference because their support was socially undesirable), may also lead to the underrepresentation of opinion among Trump supporters in social media. In sum, the best practice is to examine the representativeness of opinion measures and the opinion variation together (combining Figs 5 and 6) in order to evaluate the usefulness of the constructed measure and to unveil the underlying opinion patterns.
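The centroid aggregation from words to tweets, users and states, together with the two quantities examined jointly above (opinion variation and representation), can be sketched as follows. This is an illustration under stated assumptions: tweets are given as hypothetical `(user, state, tokens)` triples, opinion variation is taken as the mean distance of users to their state centroid, and representation is proxied by the user count. How the variation and circle sizes in Figs 5 and 6 were actually computed is not stated in this section.

```python
import numpy as np
from collections import defaultdict

def centroid(vectors):
    """Average a list of equal-length vectors."""
    return np.mean(np.asarray(vectors, dtype=float), axis=0)

def aggregate(tweets, word_vectors):
    """tweets: iterable of (user, state, tokens).
    Returns state centroids, per-state opinion spread, and per-state user counts."""
    per_user = defaultdict(list)
    user_state = {}
    for user, state, tokens in tweets:
        known = [word_vectors[w] for w in tokens if w in word_vectors]
        if known:
            per_user[user].append(centroid(known))  # tweet-level centroid
            user_state[user] = state                # last seen state wins (assumption)
    # One vector per user, so prolific users do not dominate their state.
    user_vecs = {u: centroid(v) for u, v in per_user.items()}
    per_state = defaultdict(list)
    for user, vec in user_vecs.items():
        per_state[user_state[user]].append(vec)
    state_vecs = {s: centroid(v) for s, v in per_state.items()}
    # Assumed variation measure: mean distance of users to the state centroid.
    spread = {s: float(np.mean([np.linalg.norm(v - state_vecs[s]) for v in vs]))
              for s, vs in per_state.items()}
    n_users = {s: len(vs) for s, vs in per_state.items()}
    return state_vecs, spread, n_users

# Toy 2-d word vectors and tweets (illustrative values only).
toy_vectors = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
toy_tweets = [("u1", "TX", ["a"]), ("u1", "TX", ["a"]),
              ("u1", "TX", ["a"]), ("u2", "TX", ["b"])]
state_vecs, spread, n_users = aggregate(toy_tweets, toy_vectors)
```

In the toy data, user u1 posts three times and u2 once, yet the Texas centroid is the midpoint of the two user vectors, which is the user-level control recommended in [2].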

Prediction with state-level relative opinion measure
Linear neighborhood propagation
Relative opinion embedding can be used to construct a graph, where each data point (state) is a vertex with a label indicating one of the opinion categories (absolute opinion) and an edge exists between a pair of data points based on a distance metric criterion. Then, predicting the unknown opinion labels (binary opinion in this case) for the entire graph with only a few data points labeled can be formulated as a semi-supervised label propagation problem on the constructed graph of relative opinion [51,52]. In the case of the US presidential election, a few "deep-red" and "deep-blue" states whose voters predominantly choose either the Republican (red) or the Democratic (blue) candidate are usually easy to identify through polling or historical voting, which makes the above prediction problem feasible.
We adopt here a well-established method named Linear Neighborhood Propagation (LNP; [51,52]). As a semi-supervised learning approach, LNP assumes both local and global proximity: 1) points in local neighborhood are likely to share the same label; 2) points on the same structure (such as a cluster or a submanifold) are likely to share the same label [53].
Inspired by nonlinear dimensionality reduction methods that construct a low-dimensional representation of high-dimensional data while preserving the local structure of the data, such as Locally Linear Embedding (LLE, [54]), LNP further assumes that the data points are sampled from an underlying manifold and that each data point and its label can be linearly reconstructed from its neighbors. That is, there exists an adjacency weight matrix $W$ that minimizes the reconstruction error for the data points: where $k$ is the number of neighbors for each data point.
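The reconstruction objective is missing from this copy of the text. For orientation, the block below states the objective as it is usually written for LNP in [51]; it is offered as a reference form, and the notation is assumed rather than taken from the source: $x_i$ is a data point, $\mathcal{N}(x_i)$ is its set of $k$ nearest neighbors, and $w_{ij}$ is the contribution of neighbor $x_j$ to the reconstruction of $x_i$.

```latex
\begin{aligned}
\min_{W}\quad & \varepsilon(W) = \sum_{i} \Bigl\| x_i - \sum_{j:\, x_j \in \mathcal{N}(x_i)} w_{ij}\, x_j \Bigr\|^{2}\\
\text{s.t.}\quad & \sum_{j:\, x_j \in \mathcal{N}(x_i)} w_{ij} = 1, \qquad w_{ij} \ge 0 .
\end{aligned}
```

The sum-to-one constraint makes the weights invariant to translation of the data, and the non-negativity constraint keeps each reconstruction inside the convex hull of the neighbors.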
Because of the local linearity assumption, the manifold can be arbitrarily well-approximated by a sufficiently small neighborhood (a linear subspace or the tangent space) surrounding any data point, which would ideally shrink to zero. The weight matrix to be determined essentially characterizes the linear neighborhood for every data point by specifying the contribution of each neighbor. Because this matrix captures the intrinsic local structures of the manifold, the weights are invariant to linear transformations of the high-dimensionality manifold into a low-dimensionality representation preserving the intrinsic structures of the original manifold. This is the fundamental rationale behind LLE. Along a similar rationale, the label of each data point can be reconstructed by a linear combination of its neighbors' labels:
The algorithm applied in this study (Algorithm 1) is an extended version of the original algorithm proposed in [51]. Research has shown that the Geodesic distance is superior to the default setting of Euclidean distance in LLE, and can eliminate the "short circuit" problem and lead to a more faithful representation of the global structure of the underlying manifold [55]. The Geodesic distance is approximated as the length of the shortest path between a pair of data points in a weighted graph, which can be computed as in [56]. Following [55], this graph is constructed by connecting each data point with a set of neighboring data points based on a typical dissimilarity measure, e.g., the Euclidean distance, as the edge weight. To determine the neighboring data points, a global P is chosen as the minimal value such that all the pairwise geodesic distances are finite [55]. In Step (

Output:
The labels for all data points.
Procedure:
(1) Compute the $k$ nearest neighbors for each data point in $X$ based on a defined distance metric (Euclidean or Geodesic distance); if the Euclidean distance is used, skip (1.5), otherwise perform (1.5).
(1.5) Run the MDS algorithm on the pairwise Geodesic distances to reconstruct a real-valued embedding $X'$ as an unfolding of $X$ and set $X = X'$.
(2) Construct the $k$-nearest-neighbor graph.
(3) Compute the adjacency weight matrix that best reconstructs each data point in $X$ from its $k$ nearest neighbors by minimizing Equation 2 with the constraints.
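The Euclidean branch of the procedure can be sketched end to end as below. Several points are assumptions rather than the authors' implementation: the steps after (3) are not present in this copy, so the propagation uses the standard closed form of LNP from [51]; the weights come from the LLE-style regularized linear system with negative values clipped and rows renormalized, which only approximates the constrained minimization; labels are encoded as +1 and -1 with 0 for unlabeled points; and `alpha = 0.99` and all function names are illustrative choices.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors (Euclidean) of every row of X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def reconstruction_weights(X, neighbors, reg=1e-3):
    """Row-normalized weights that approximately reconstruct each point
    from its neighbors (approximation of the constrained problem)."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = neighbors[i]
        Z = X[idx] - X[i]                        # neighbors centered on x_i
        G = Z @ Z.T                              # local Gram matrix
        G = G + reg * (np.trace(G) + 1e-12) * np.eye(len(idx))
        w = np.linalg.solve(G, np.ones(len(idx)))
        w = np.clip(w, 0.0, None)                # enforce non-negativity
        if w.sum() == 0.0:
            w = np.ones(len(idx))
        W[i, idx] = w / w.sum()                  # enforce sum-to-one
    return W

def propagate(W, y, alpha=0.99):
    """Closed-form label propagation: f = (1 - alpha) (I - alpha W)^{-1} y."""
    n = W.shape[0]
    f = (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * W, y)
    return np.sign(f)

def lnp_predict(X, y, k, alpha=0.99):
    """Steps (1)-(3) followed by propagation; y holds +1, -1, or 0 (unlabeled)."""
    W = reconstruction_weights(X, knn_indices(X, k))
    return propagate(W, y, alpha), W

# Toy data: two well-separated groups with one labeled point each.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
y_toy = np.array([1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0])
labels, W_toy = lnp_predict(X_toy, y_toy, k=2)
```

In the state-level setting, `X` would hold the aggregated relative opinion vectors and the nonzero entries of `y` the few states whose leaning is known from polling or historical voting.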

Comparison of predictions with Euclidean and Geodesic distances
Given the embedding of data points and the initial labels, the only parameter for Algorithm 1 is the number of nearest neighbors $k$. The influence of $k$ on the quality of embedding generated from LLE and its variants has been studied [54,57]. General criteria must be considered for the range of $k$. First, the dimensionality of the output embedding should be strictly less than $k$; second, a large $k$ will violate the assumption of local linearity in the neighborhood for curved data sets and lead to the loss of nonlinearity for the mapping. In the following experiments aimed at comparing Algorithm 1 implemented with Euclidean and Geodesic distances, respectively, sensitivity analysis is conducted on $k$, with the Euclidean distance versus the Geodesic distance, for varying numbers of initial labels. Due to the random assignment of the initial labels, 50 runs for each setting are conducted to examine the stability of performance via a 95% confidence interval (the region surrounding the median). Across Figs 9-12, similar patterns are shown for each distance type. For the Euclidean distance (EUC), the prediction error first decreases as $k$ increases, then reaches its lowest value around $k = 8$ before it resumes increasing for larger values of $k$. After $k = 8$, the general trend of the prediction error is increasing, though it decreases slightly after $k = 20$. On the other hand, the prediction error for the Geodesic distance (GEO) first decreases until it arrives at a local minimum around $k = 5$; after that it bounces back to a local peak around $k = 8$. For $k = 9$ and onward, the prediction error becomes smaller, which presents a general decreasing trend for the entire curve. It is noted that although predictions with EUC seem always better than those with GEO locally around $k = 8$, the latter shows a superior performance globally over a longer range of $k \in [13,25]$.
The superiority achieved with the Geodesic distance should be attributed to Step (1.5) in Algorithm 1, where linearity is enforced by MDS in the reconstructed geometry of the embedding.
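The geodesic variant can be sketched as follows, assuming the usual Isomap-style construction: shortest-path distances over a symmetrized k-nearest-neighbor graph, followed by classical MDS on those distances. Whether Step (1.5) uses exactly this construction is an assumption, and the helper names `geodesic_distances` and `classical_mds` are hypothetical. The demo uses points on a circular arc, not data from this study.

```python
import numpy as np

def geodesic_distances(X, k):
    """Shortest-path distances over a symmetrized k-nearest-neighbor graph."""
    n = X.shape[0]
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    masked = d.copy()
    np.fill_diagonal(masked, np.inf)           # exclude self from neighbor search
    nbrs = np.argsort(masked, axis=1)[:, :k]
    G = np.full((n, n), np.inf)                # inf marks "no direct edge"
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        G[i, nbrs[i]] = d[i, nbrs[i]]
        G[nbrs[i], i] = d[i, nbrs[i]]          # keep the graph undirected
    for m in range(n):                         # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def classical_mds(D, dim):
    """Classical MDS: coordinates whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]         # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Toy curved data: 40 points along three quarters of a unit circle.
t = np.linspace(0.0, 1.5 * np.pi, 40)
X_arc = np.column_stack([np.cos(t), np.sin(t)])
G_arc = geodesic_distances(X_arc, k=4)
Y_arc = classical_mds(G_arc, dim=1)
```

On this arc the straight-line distance between the two endpoints is about 1.41, whereas the graph distance follows the curve and is close to the arc length, which is the behavior the Geodesic distance is meant to capture. The sketch assumes a connected neighbor graph; a disconnected graph would leave infinite entries in `G_arc`.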
Comparing Figs 9-12, which show results for different numbers of initial labels, indicates that, as the number of initial labels increases, the prediction errors for both EUC and GEO generally shift lower and the confidence intervals become narrower. These effects are especially prominent over the range k ∈ [13,25] and mean that predictions with more prior information (a larger number of initial labels) lead to consistently better and more stable performance. Having demonstrated that predictions with GEO consistently outperform those with EUC and achieve global optima over the range k ∈ [13,25] under different parameter settings, we ask whether there always exists an optimal k, or a range in which the optimal values of k reside, and whether such values can be identified before running predictions.

The region surrounding each median is a 95% confidence interval calculated from 50 runs of prediction.

Fig 10. Comparison of the median prediction errors with the Euclidean distance (EUC) and those with the Geodesic distance (GEO) using four initial labels for each category.
The region surrounding each median is a 95% confidence interval calculated from 50 runs of prediction.

Fig 11. Comparison of the median prediction errors with the Euclidean distance (EUC) and those with the Geodesic distance (GEO) using six initial labels for each category.
The region surrounding each median is a 95% confidence interval calculated from 50 runs of prediction.

Fig 12. Comparison of the median prediction errors with the Euclidean distance (EUC) and those with the Geodesic distance (GEO) using eight initial labels for each category.
The region surrounding each median is a 95% confidence interval calculated from 50 runs of prediction.
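Per-k summaries like those plotted in Figs 9-12 could be computed along the following lines. This is a minimal sketch that assumes the 95% region is the 2.5th to 97.5th percentile band of the 50 runs; the exact interval construction is not stated here. The range of k and the error counts are synthetic placeholders generated at random, not results from this study, and `summarize_runs` is a hypothetical helper name.

```python
import numpy as np

def summarize_runs(errors):
    """Median and 2.5-97.5 percentile band per k.

    errors: array of shape (n_runs, n_k), one row per random label assignment.
    """
    median = np.median(errors, axis=0)
    lower, upper = np.percentile(errors, [2.5, 97.5], axis=0)
    return median, lower, upper

# Placeholder inputs: 50 runs for each k in an illustrative range.
k_values = np.arange(2, 26)
rng = np.random.default_rng(0)
errors = rng.integers(0, 10, size=(50, k_values.size))

median, lower, upper = summarize_runs(errors)
```

Plotting `median` against `k_values` with the band between `lower` and `upper` gives the kind of curve and shaded region described in the captions.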

Predictions with the optimal neighborhood sizes
As one of the key parameters in Algorithm 1, the neighborhood size k dramatically affects the quality of prediction, as demonstrated by Figs 9-12. To obtain optimal results, the selection of the optimal k becomes a key issue. However, the two measures used in the previous section only enable comparison between two sets of embeddings based on either the Euclidean or the Geodesic distance; they are not suited to comparison across values of k.
As the embedding is a function of k, an automatic technique called Preservation Neighborhood Error (PNE) was developed for choosing the optimal k by evaluating the quality of the embedding across a range of k [57]. This technique minimizes a cost function that considers both local and global geometry preservation (Equation 6). The cost function is defined over two neighbor sets for each data point, the set of k nearest neighbors found in the original space and the set of k nearest neighbors found in the low-dimensional embedding space, and over the pairwise distances in the original and in the embedding space, respectively. In Equation 6, the first term is the error of misses, indicating the preservation of the local neighborhood, while the second term refers to false positives, which reflect the loss of the global geometry of the manifold [57].
The median PNE measure for the Geodesic distance over a range of k (Fig 13) shows that PNE values become evidently lower after k=10. This indicates that some optimal values of k may exist within the range k ∈ [11,25], especially around k=18 and k=24. This range includes the range [13,25] in which the predictions achieved superior performance in Figs 9-12, although k=18, where PNE reaches its minimum, does not necessarily correspond to the optimal k for the best prediction. Fig 13 also shows generally high stability (small confidence intervals) of the PNE measure over the range [13,25]. PNE is thus able to indicate a rough range in which the optimal neighborhood size may reside, rather than a specific optimal k, which can significantly reduce the number of prediction runs needed in model selection.
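The idea behind this selection step can be illustrated with a simplified neighborhood-preservation score. The sketch below is not Equation 6: it assumes a plain average of squared distance discrepancies over misses (neighbors in the original space) and false positives (neighbors in the embedding that were not neighbors originally), mirroring the two terms described above; see [57] for the actual PNE definition. All function names are hypothetical and the data are random toy points.

```python
import numpy as np

def pairwise(X):
    """Euclidean distance matrix between the rows of X."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

def neighbor_sets(D, k):
    """k nearest neighbors of each point according to distance matrix D."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)                # exclude self
    return np.argsort(D, axis=1)[:, :k]

def preservation_error(X, Y, k):
    """Simplified PNE-style score (an assumption, not Equation 6)."""
    DX, DY = pairwise(X), pairwise(Y)
    NX, NY = neighbor_sets(DX, k), neighbor_sets(DY, k)
    total = 0.0
    for i in range(X.shape[0]):
        orig = NX[i]                           # neighbors in the original space
        extra = np.setdiff1d(NY[i], orig)      # neighbors only in the embedding
        miss = np.mean((DX[i, orig] - DY[i, orig]) ** 2)
        false_pos = np.mean((DX[i, extra] - DY[i, extra]) ** 2) if extra.size else 0.0
        total += miss + false_pos
    return total / (2 * X.shape[0])

# Toy check: a perfect "embedding" (the data itself) versus an unrelated one.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(40, 3))
Y_random = rng.normal(size=(40, 2))
score_perfect = preservation_error(X_toy, X_toy, k=5)
score_random = preservation_error(X_toy, Y_random, k=5)
```

In a model-selection loop, such a score would be evaluated for the embedding obtained at each candidate k, and the values of k with the lowest scores would be shortlisted for prediction runs.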

Comparison of predictions with polling
As polling is still the mainstream method to obtain public opinion, we compare the performance of the prediction enabled by the relative opinion measure against election polls and actual election votes. The polling data are from the pre-election wave of the 2016 Cooperative Congressional Election Survey, conducted statewide from October 4th to November 6th [58,59]. The survey results for Clinton (blue) and Trump (red) are plotted with the relative opinion measure (Fig 14). Two prediction models based on the relative opinion measure are demonstrated here with 8 and 12 predetermined initial labels, respectively (Table 2). Given the range suggested for the optimal neighborhood size k, k=18 is chosen for both predictions with the Geodesic distance.
To verify that k=18 is indeed optimal for predictions, prediction runs for k in the range [2,25] are performed and presented in Fig 15. At k=18, the prediction errors for the models with 8 labels and with 12 labels are 2 and 0, respectively. Figs 16 and 17 plot the prediction results for every state with the relative opinion measure for the two models, respectively. The model with 8 labels produces two errors, namely WI and KS, both of which lie along the opinion dividing boundary. This shows that the relative opinion measure enables high-performing opinion predictions even with very common prior knowledge of the opinions for a small number of states, as the 8 labels in Table 2 are either deep-red or deep-blue states. The cluster of errors for IA, NC, FL, and OH produced by polling has been eliminated, because in the relative opinion space these states are closer to the Support-Trump opinion extreme, where most deep-red states are located. Because the relative opinion measure triangulates every state's opinion position based on its relationship with every other state's position, uncertainty is reduced and a more robust measurement of opinions than polling is produced.
For the model with 12 labels, when labels are given for the states of DE, CT, KS, and WI, the prediction produces zero error, which emphasizes how critical prior knowledge is for states located along and close to the opinion dividing boundary. These states point out the targets for more accurate polling; if such polling is obtained, combining this prior knowledge with the relative opinion measure would lead to a quality of prediction that is beyond the reach of either alone. In other words, opinion polls can complement the relative opinion measure by providing prior knowledge for the initial labels.

Table 2. Settings for initial labels.

Fig 15. Prediction errors for the two models with 8 and 12 initial labels, respectively, using settings in Table 2 and the Geodesic distance.
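The way a handful of initial labels spreads to all other entities can be sketched as follows. This assumes the closed-form solution commonly used for linear neighborhood propagation, F = (1 - alpha)(I - alpha W)^(-1) Y, with a row-stochastic weight matrix W; whether Algorithm 1 uses exactly this form, and which value of alpha it uses, are assumptions. The six-node chain, its weights, and the labels are a toy example, not the states or weights from this study, and `propagate_labels` is a hypothetical name.

```python
import numpy as np

def propagate_labels(W, initial, alpha=0.99):
    """Spread a few known labels over a weighted neighbor graph.

    W:       (n, n) row-stochastic reconstruction weights.
    initial: dict mapping node index -> known label.
    alpha:   share of label mass taken from neighbors (assumed value).
    """
    n = W.shape[0]
    classes = sorted(set(initial.values()))
    Y = np.zeros((n, len(classes)))
    for idx, label in initial.items():
        Y[idx, classes.index(label)] = 1.0     # one-hot rows for labeled nodes
    # Closed form of iterating F <- alpha * W @ F + (1 - alpha) * Y
    F = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * W, Y)
    return [classes[j] for j in F.argmax(axis=1)]

# Toy chain of six nodes; each node is reconstructed from its adjacent nodes.
W_chain = np.zeros((6, 6))
for i in range(6):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < 6]
    W_chain[i, nbrs] = 1.0 / len(nbrs)

predicted = propagate_labels(W_chain, {0: "red", 5: "blue"})
```

With only the two end nodes labeled, each unlabeled node takes the label of the nearer end, which parallels how states near an opinion extreme inherit the label of the deep-red or deep-blue states around them, while states near the dividing boundary are the ones most sensitive to which initial labels are supplied.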

Conclusions and future work
This study proposed to measure relative opinion from LBSM data in response to the challenge of leveraging the rich and unstructured discourse on social media as an alternative source to opinion polls for public opinion measurement (the first question in the introduction). The advantages of the relative opinion measure lie in its theoretical grounding and its methodological suitability to LBSM data. Theoretically, the relative opinion conceptualization compensates for the deficiency of the absolute opinion measure in representing complex opinion structures. Methodologically, the pairwise relationship of opinions characterized by this measure naturally suits the embedding representation of opinion positions that can be learned from high-dimensional textual messages with supervision. To make this quantification technically feasible, a modeling framework was proposed, including building a training dataset by adopting a state-of-the-art approach and developing a supervised learning method, the opinion-oriented word embedding.
To demonstrate the validity of the relative opinion measure, spatial visualizations of the relative opinion space were constructed to aid visual analytics. As an exploratory analysis approach, this visualization facilitates the examination of the uncertainty and representativeness of the measure, the discovery of opinion patterns across geographies, and the correspondence between relative opinion positions and other variables such as opinion polls and real election outcomes, which might lead to the formation of new hypotheses on electoral behavior. Furthermore, the relative opinion measure supports practical opinion predictions at aggregated geographic levels, transforming a continuous representation into a discrete one that is comparable to opinion polls and strongly validated by election outcomes. This is enabled by a linear neighborhood propagation method that incorporates the intrinsic geometry of the relative opinion space, optimal neighborhood sizes, and prior knowledge of opinion preferences for a small number of entities.
In the case study of the 2016 U.S. presidential election, it is demonstrated that the relative opinion measure constructed on Twitter data is more robust than polling data, thanks to its theoretical grounding and the various analytical techniques that exploit the intrinsic properties of LBSM data. However, given their differences in concept, data collection, and methodology, the relative opinion measure cannot and should not replace polling. Instead, the two types of measures and their associated data are complementary in opinion measurement, which our prediction approach has demonstrated to be feasible and promising. This answers the second research question presented in the introduction.
Admittedly, as the present work is an initial study of the relative opinion measure, further investigation is needed. There are several directions for future studies. First, the results reported in this study are based on the 1% sample of tweets for the study period. If this sampling rate can be increased to better represent the population, we should be able to examine the sensitivity of our measure to variation in sample size. Second, with tweets extracted longitudinally, the temporal dynamics of this measure could be investigated to support opinion predictions over time. Third, social network data from Twitter could be incorporated into the relative opinion measure for better opinion measurement.