Twitter use in scientific communication revealed by visualization of information spreading by influencers within half a year after the Fukushima Daiichi nuclear power plant accident

Scientific communication through social media, particularly Twitter, has been gaining importance in recent years. As such, it is critical to understand how information is transmitted and dispersed through outlets such as Twitter, particularly in emergency situations where there is an urgent need to relay scientific information. The purpose of this study is to examine how original tweets and retweets on Twitter were used to diffuse radiation-related information after the Fukushima Daiichi nuclear power plant accident. From the Twitter database, we purchased all tweets (including replies) and retweets related to the Fukushima Daiichi nuclear power plant accident and/or radiation sent from March 2nd, 2011 to September 15th, 2011. This time frame covers the first six months after the East Japan earthquake, which occurred on March 11th, 2011. Using the obtained data, we examined the number of tweets and retweets and found that only a small number of Twitter users were the source of the original posts that were retweeted during the study period. We termed these accounts “influencers”. We identified the top 100 influencers and classified the contents of their tweets into 3 groups by analyzing the document vectors of the text. Then, we examined the number of retweets for each of the 3 groups of influencers, and created a retweet network diagram to assess how the contents of their tweets were being spread. The keyword “radiation” was mentioned in over 24 million tweets and retweets during the study period. Retweets accounted for roughly half (49.7%) of this number, and the top 2% of Twitter accounts, defined as “influencers”, were the source of the original posts that accounted for 80.3% of the total retweets. The majority of the top 100 influencers had individual Twitter accounts bearing real names.
While retweets were intensively diffused within a fixed population, especially within the same groups with similar document vectors, a group of influencers accounted for the majority of retweets one month after the disaster, and the share of each group did not change even after proven scientific information became more available.


Introduction
The term scientific communication is defined as communicating scientific information to non-experts in the general public [1]. In this age of information overload, appropriate scientific communication is critical to raise the scientific literacy of the public, to implement policies based on evidence, and to improve the well-being of citizens [2].
It was common in the past to provide the general public with one-directional information through classical mass media outlets such as newspapers, televisions, and radios [3]. Recently however, social media platforms, such as Twitter and Facebook, have been playing increasingly important roles as media through which to disseminate and receive scientific information [4,5]. In fact, it is estimated that approximately 60% of the general public rely on social media as a source for scientific information [6]. Social media platforms enable real-time communication with rapid propagation over a wide demography [7]. In regard to scientific communication however, there are several drawbacks to using social media. In particular, there is concern about the spread of scientifically inappropriate or inaccurate information through erroneous rumors or hoaxes during times of natural or other disasters [8,9]. For example, there were prior incidences of inappropriate scientific information regarding vaccine efficacy and cancer treatments being disseminated through social media [10]. Therefore, to use social media effectively for scientific communication, it is important to identify how to make good use of the properties of these media in the future.
Twitter is a social media platform where registered users can create posts containing up to 280 characters and attach images. At the time of this study, however, the limit was 140 characters, and even now the expanded limit does not apply to Japanese tweets. Twitter users can follow each other freely and spread information more broadly than on Facebook [11]. On Twitter, the relationship between users who are followers and those who are being followed forms a social network, and retweeting or replying to another user's tweet is the way to distribute and propagate information. Retweeting is the act of spreading information to one's followers by quoting verbatim the tweet of another user [12]. In this way Twitter can be used to diffuse scientific information, and its role has been increasing in recent years due to its real-time nature and the ease of exchanging information with other users.
The advantage of Twitter is that it allows direct communication between people who are otherwise too distant, socially as well as physically, to interact in everyday life. In particular, when a social phenomenon that attracts public attention occurs, such as a disaster, related tweets increase rapidly [13]. As such, Twitter is regarded as a very useful social media tool to obtain necessary information, spread information, and ask for help in case of a disaster [14,15]. Despite Twitter being a platform that plays a key role in the exchange of current information, there are few reports that focus on how scientific information diffuses, and how useful Twitter is for scientific communication, within the first few months after a disaster.
The Great East Japan Earthquake and the subsequent Fukushima Daiichi nuclear accident, which occurred on March 11th, 2011, resulted in radioactive contamination and radiation exposure to the public [16]. Residents in the surrounding areas had long-term radiation exposure and the fear of the contamination spreading resulted in social unrest across Japan [17]. In response to this situation, one-directional and conventional scientific communication regarding radiation was released from various government and private sources [18]. However, in addition to a wide range of perceived problems (such as health, societal, and lifestyle) caused by radiation pollution, there were some stakeholders making scientifically erroneous assertions particularly about low-dose radiation and its health effects. Many conflicting opinions circulated and caused confusion among the general public [19]. As a result, residents were at a loss as to whom to believe, and the public's trust in science itself was lost [20].
Social media, in particular Twitter, was actively used for both direct communication and for transmission and exchange of scientific information at the time of the earthquake [21-24], especially in the affected areas. However, many reports on the subject have only described the phase immediately after the Fukushima nuclear power plant accident including evacuation and logistics [25,26], and there is insufficient information on how Twitter was used for scientific communication of radiation-related issues. Assessing how Twitter was used after the radiation accident is very useful in order to clarify how social media is used in the world of scientific communication.
In the present study, we used Twitter data from up to six months after the accident to examine how Twitter was used, especially with respect to retweets, and how scientific information on radiation spread. This study will provide useful information for scientists to understand the background of distrust in science after the Fukushima Daiichi nuclear power plant accident, and to better understand appropriate methods of disseminating scientific information on social media platforms such as Twitter, should there be a future crisis.

Tweet and retweet data used
The tweets and retweets used in this research were purchased from NTT DATA Corporation. NTT DATA, an IT company, is a member of the NTT (Nippon Telegraph and Telephone Corporation) Group, the largest telecommunications company in Japan. NTT DATA is Twitter's intermediary in Japan authorized to give customers paid access to tweets. For this research, we gave NTT DATA a list of Japanese words, phrases and expressions related to radiation and radioactivity resulting from the Fukushima Daiichi nuclear power plant accident (Table 1). Using this list of keywords, NTT DATA extracted and compiled the contents of tweets and retweets (including replies) written in Japanese that were sent between March 2nd, 2011 and September 15th, 2011 (i.e. the first six months after the Great East Japan Earthquake). These data were purchased and used in our analysis.
All tweets extracted by NTT DATA that had at least one of the keywords and key phrases shown in Table 1 were included in the analysis. Keywords and phrases were chosen to analyze events and facts related to radiation and not the effects of the earthquake and tsunami; they also did not include emotional words that expressed fear, anxiety or anger about the radiation. All members of our research team agreed that the search terms shown in Table 1 accurately represented the scope of the Fukushima nuclear power plant accident and radiation. Because of budget constraints, this research was limited to 50 million tweets; as a result, "nuclear power plant", "Fukushima", and other terms that are related but not critical to our research were not used as independent search terms.
The number of tweets and retweets transmitted during the study period, and the transition of the retweet ratio among all tweets were examined. Retweets were counted by the number of accounts that posted the same tweet; for example, if 10 different accounts each retweeted one tweet posted by user-A, the number of retweets was counted as ten.
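The counting rule described above can be illustrated with a minimal sketch. The record layout (pairs of retweeting account and original tweet ID) and the function name count_retweets are assumptions made for illustration and do not reflect the format of the purchased data; treating repeated retweets by the same account as one is likewise an interpretation of "the number of accounts that posted the same tweet".

```python
from collections import defaultdict


def count_retweets(retweet_records):
    """Count retweets per original tweet as the number of distinct
    accounts that retweeted it (hypothetical record layout)."""
    accounts_by_tweet = defaultdict(set)
    for account, original_tweet_id in retweet_records:
        accounts_by_tweet[original_tweet_id].add(account)
    return {tweet_id: len(accounts) for tweet_id, accounts in accounts_by_tweet.items()}


# Hypothetical example: ten different accounts each retweet one tweet by user-A.
records = [(f"account_{i}", "tweet_by_user_A") for i in range(10)]
counts = count_retweets(records)
print(counts["tweet_by_user_A"])  # 10
```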

Definition of "influencers" and classification of their tweet contents
We found that the majority of the retweets were based on original posts sent out by a few hundred accounts. The top 2% of accounts were the source of original posts that received 80.3% of all retweets during the study period. Thus, the top 100 accounts that were retweeted frequently were identified and defined as "influencers".
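The concentration of retweets on a few source accounts can be computed as a cumulative share. The following is a minimal sketch under the assumption that retweets received are available as a mapping from account name to count; the function names and the toy numbers are hypothetical and are not the study's data.

```python
def top_share(retweets_received, top_n):
    """Fraction of all retweets received by the top_n most-retweeted accounts."""
    total = sum(retweets_received.values())
    if total == 0:
        return 0.0
    ranked = sorted(retweets_received.values(), reverse=True)
    return sum(ranked[:top_n]) / total


def top_accounts(retweets_received, top_n):
    """Account names ranked by retweets received, most retweeted first."""
    ranked = sorted(retweets_received.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _count in ranked[:top_n]]


# Hypothetical toy data: retweets received per source account.
toy = {"a": 70, "b": 20, "c": 5, "d": 3, "e": 2}
print(top_share(toy, 1))     # 0.7
print(top_accounts(toy, 2))  # ['a', 'b']
```

Applied to the full data set, top_accounts with top_n=100 would correspond to the selection of accounts defined here as influencers.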
To classify the contents of their tweets, we calculated document vectors of the tweets for each influencer account. The method is as follows: using the text of all the tweets used in this study and the article text of Japanese Wikipedia as the corpus, each Japanese document was split into space-separated words using the Japanese morphological analysis engine MeCab [27]. For the dictionary, ipadic and mecab-ipadic-neologd were used [28]. In this way we derived 390,681,577 words from 12,219,497 tweets, and 397,785,864 words from 1,072,888 Wikipedia articles. The combined number of unique words was 171,644. Gensim version 2.3, a natural language processing library for Python, was used to execute Doc2Vec [29]. The default parameter settings of Gensim were used for training, with an output vector of 100 dimensions. The Python code can be found at the following URL (https://github.com/likr/twitter-analysis2018/tree/master/scripts).
The k-means method was applied to the document vectors to classify each influencer [30]. Of the top 100 influencer accounts examined during the study period, 99 accounts were still active as of June 2017. Five outlier accounts that did not fall into the same clusters as the other influencer accounts were removed, so 94 accounts were used for the final clustering.
First, five clusters were identified in the k-means method based on the Elbow method [31]. These five clusters were then grouped according to the contents of their tweets regarding radiation and whether the clusters included media accounts or not. Finally, we classified the influencers into three groups, and examined the number and the ratio of retweets over the study period.
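The combination of k-means and the Elbow method can be sketched as follows. This is an illustrative pure-Python version on hypothetical two-dimensional toy points, not the study's code or its 100-dimensional document vectors. The deterministic farthest-first seeding is an assumption of this sketch, chosen so that the result is reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]

# Elbow method: inspect how the within-cluster sum of squares falls as k grows.
inertias = [kmeans(points, k)[2] for k in range(1, 6)]
for k, value in enumerate(inertias, start=1):
    print(k, round(value, 2))
```

On this toy data the within-cluster sum of squares drops steeply up to k = 3 and only slightly afterwards, which is the "elbow" used to choose the number of clusters.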

Visualization of radiation information spreading by influencers
In order to visualize the spread of radiation information by influencers, we built a retweet network centered on influencers. A retweet network is a weighted directed graph in which an edge records that account A has retweeted influencer X n times. We visualized the center of the retweet network using only the top 20 influencers and accounts with more than 5 retweets. The Fast Multipole Multilevel Method (FM3) [32] implemented in the Open Graph Drawing Framework [33] was used to set the coordinate positions of nodes.
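The construction of the weighted directed graph and the filtering of its center can be sketched as below. The input layout, the function names, and the reading of "accounts with more than 5 retweets" as the total number of retweets an account made of the selected influencers are assumptions of this sketch. The FM3 layout step is not reproduced here.

```python
from collections import Counter


def build_retweet_network(retweets, influencers):
    """Weighted directed edges (account -> influencer); the weight is the
    number of times the account retweeted that influencer."""
    influencer_set = set(influencers)
    return Counter(
        (account, source) for account, source in retweets if source in influencer_set
    )


def core_edges(edges, min_total=5):
    """Keep edges of accounts whose total retweets of the selected
    influencers exceed min_total (an assumed reading of the threshold)."""
    totals = Counter()
    for (account, _source), weight in edges.items():
        totals[account] += weight
    return {edge: weight for edge, weight in edges.items() if totals[edge[0]] > min_total}


# Hypothetical toy data: (retweeting account, retweeted source account).
retweets = ([("u1", "X")] * 4 + [("u1", "Y")] * 2
            + [("u2", "X")] * 3 + [("u3", "Z")] * 9)
edges = build_retweet_network(retweets, influencers=["X", "Y"])
core = core_edges(edges)
print(core)  # {('u1', 'X'): 4, ('u1', 'Y'): 2}
```

In this toy example, u3 is dropped because Z is not among the selected influencers, and u2 is dropped because it made only three retweets.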

Protection of personal information accompanying tweet and account data use
The data of this study were received from Twitter, Inc. and their use is in accordance with the company's user agreement for the handling of personal information. Due to contractual agreements with NTT DATA, the purchased data used in this study cannot be shared. However, researchers can purchase these data through Twitter, Inc. or its local intermediary by specifying the same keywords and key phrases listed in Table 1. The extracted tweets and retweets will then be similar to the ones we purchased from NTT DATA. The Twitter accounts which were used in this study can be accessed individually by the public for free at www.twitter.com. All tweets and retweets generated from these accounts can also be viewed, although they will not be limited to those bearing our keywords listed in Table 1.

Overall trends of tweet and retweet
The total number of tweets and retweets that included the keywords listed in Table 1 during the period from March 2nd to September 15th, 2011 was 24,287,299. The number of accounts that sent out tweets or retweets at least once was 1,397,941. Since Japanese is written without spaces between words, a tweet was included in the study if a Japanese word in Table 1 appeared as part of a sentence or ID. On the other hand, an English term such as Sv was matched only when it appeared as an independent word. When "radio-" is searched in Japanese, compound words such as heat radiation, emissivity, radiologist, etc. are included in the search results. Similarly, "contamination" matches related words such as environmental pollution and nuclear pollution. To estimate how many such unrelated tweets were included in the present study, we randomly extracted 1000 tweets and found that 116 appeared to be irrelevant to the present research topic. Based on this sample, we estimate that unrelated tweets accounted for 11.6±2.6% of the total tweets assessed in this study. The number of tweets per account ranged from 0 to 61,037 with a median of 1; the number of retweets sent out per account ranged from 0 to 36,716 with a median of 1. Fig 1 shows the number of tweets and retweets per day during the study period.
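The text does not state how the margin of ±2.6% was derived. As a hedged illustration, the sketch below computes a normal-approximation (Wald) interval for the sampled proportion of 116 out of 1000. Under that assumption, a half-width of about 2.6 percentage points corresponds to z = 2.576 (a 99% level), whereas z = 1.96 (95%) gives about 2.0 percentage points. The function name is hypothetical.

```python
import math


def proportion_interval(successes, n, z):
    """Sample proportion and half-width of a normal-approximation
    (Wald) interval for a binomial proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, half_width


p, hw95 = proportion_interval(116, 1000, 1.96)
_, hw99 = proportion_interval(116, 1000, 2.576)
print(f"{p:.3f} +/- {hw95:.3f} (z = 1.96)")   # 0.116 +/- 0.020 (z = 1.96)
print(f"{p:.3f} +/- {hw99:.3f} (z = 2.576)")  # 0.116 +/- 0.026 (z = 2.576)
```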
The number of tweets and retweets per day that included the keywords shown in Table 1 increased sharply after March 11th, 2011 (= day 0 of the Great East Japan Earthquake). The number remained high in the first month, but it decreased drastically after the second month (around 100,000 cases per day). The maximum was 643,603 on March 15th, 2011 (= day 4), and the minimum was 74,274 on August 15th, 2011 (= day 157). The average number of tweets and retweets per day for each month after the disaster was 241,529, 109,197, 120,720, 93,854, 103,953, and 99,464 (from 11th March to 10th April, from 11th April to 10th May, from 11th May to 10th June, from 11th June to 10th July, from 11th July to 10th August, and from 11th August to 10th September, respectively). Several spikes were observed in the first month; 12th March (= day 1), 15th March (= day 4), 16th March (= day 5), and 23rd March (= day 12) each recorded 400,000 or more (480,573, 643,603, 488,555, and 500,575, respectively). After 1 month (11th April), only 12th April (= day 32) exceeded 200,000 (207,293 tweets and retweets).
During the study period, the total number of retweets was 12.07 million which accounted for 49.7% of all tweets and retweets combined. This retweet ratio remained at around 50% during the study period, but more precisely, it started with a downward trend one month after the disaster and then increased slightly afterwards. The average during the first week of the disaster was 57.3%. The weekly average one month after the earthquake (average for 7 days from 11th April) was 41.4%, and the monthly average during August was 50.3%.

Classification of influencers by Doc2Vec
The document vectors of tweets by influencers were calculated using Doc2Vec. Table 2 shows the result of clustering influencer accounts into 5 groups by k-means method.
In Cluster 1, ten out of 13 individual accounts had real names. Of these ten accounts, four belonged to academia, two to journalists, and one to a bureaucrat. In Cluster 1, tweets tended to describe the effect of radiation rationally, based on facts, and this cluster was defined as group A. A typical tweet is as follows: "In 1974 China conducted atmospheric nuclear tests, and radioactive materials fell in Tokyo with rain. I was a student then, and measured people's hair and clothes with a Geiger counter. The measured values were comparable or larger than those experienced at hospitals in Fukushima. No health problem due to radiation exposure has been reported up to the present for the citizens exposed then in Tokyo."
Cluster 2 had 21 out of 38 individual accounts with real names. Of these, two accounts belonged to academia, four to businessmen, three to journalists, and three to politicians. In Cluster 3, 21 of the 25 individual accounts had real names. Among them, three accounts belonged to academia, five to journalists, and six to politicians. Both Cluster 2 and Cluster 3 had many emotional tweets and criticisms of the government and Tokyo Electric Power Company (TEPCO). Since these two clusters had similar tweet contents regarding radiation, we combined them to create group B. The contents of tweets concerning radiation in group B differed from those in group A. A typical tweet is as follows: "I will repeat it many times! To buy and eat radioactively polluted agricultural and fishery products is showing "support for TEPCO" rather than "assistance for victims"! Why should consumers, at the expense of their own health, help with the damage that should be compensated for by TEPCO? Stop doing this stupid thing now! Do you want to save TEPCO until your children develop thyroid cancer?"
Cluster 5 consisted of 7 news agencies and one individual account of a journalist. Since Clusters 4 and 5 were accounts related to mass media, these two clusters were collectively treated as group C for subsequent counting.
Fig 3 shows (a) the number and (b) the proportion of retweets that each influencer group accounted for out of the total retweets. At the beginning of the disaster, the number of retweets received by group A influencers was almost equal to that of group B, but after one month group B received the majority of retweets, and the situation remained unchanged afterwards. Tweets posted by group C received the lowest number of retweets.
Specific bumps were observed in group B in the middle of May and in the middle of July.
Fig 4 shows the retweet network diagram of radiation information generated by influencers. A node labeled with an account name represents an influencer. The size and color of an influencer's node indicate the total number of retweets and the influencer's group, respectively. Nodes that are not influencers are shown in the color of the group whose messages they retweeted the most. The link density represents the number of retweets. Overall, retweets of group B's posts were dominant. There were many retweet interactions inside each group, whereas the number of retweets between groups was relatively small. Within each group, especially in group B, a tight network was built by the influencers at the hub, who frequently retweeted each other's content. The network diagram is provided at the following URL (https://likr.github.io/twitter-analysis2018/).

Discussion
Scientific communication on SNS (social networking service) has become increasingly important. However, in emergency situations such as a natural disaster where scientific communication is necessary, little is known about how much scientific information is spread and transmitted on Twitter.
Of note, retweets accounted for roughly half of all the radiation-related tweets and retweets posted within half a year after the Fukushima Daiichi nuclear power plant accident. The majority of the original posts that were retweeted were sent out by accounts defined as "influencers". In this study, retweets accounted for 49.7% of all tweets and retweets combined. The top 100 accounts received 31.1% of retweets, and the top 200 accounts received 40.0%. Although future research is necessary regarding the extent to which the number of retweets itself represents the information spreading power of Twitter, our data suggest that the majority of information on Twitter may be supplied by very few sources. These findings are comparable to past research results dealing with tweets not related to radiation, such as hate speech targeting foreigners in Japan [34]. Twitter is a social media platform with a high degree of free interaction between individual accounts, but in terms of information spreading, retweets account for half of the total. Influencers can therefore have a stronger impact on the information transmitted to the general public than interactions between individuals do.
In this study, the majority of influencers (54%) had their personal names attached to their Twitter accounts. News agencies accounted for 15% of influencers, and had a small number of retweets throughout the study period as shown in Fig 3. Some group accounts, which were socially important but not identified by personal names, such as news media and government agencies, did not have a strong influence on information propagation. This result is consistent with a past report showing that the Japanese government's tweets were no longer being retweeted once public concerns and doubts had become too strong [35]. These findings suggest that individual accounts bearing real names had more influence on the spread of radiation-related information than other accounts. Since there were various opinions on radiation, the general public had trouble ascertaining which information was scientifically correct; judging whether the content of a tweet was correct may therefore have depended on whether the sender could be trusted. During events that cause social debate, such as nuclear accidents and radiation exposure, scientists should avoid transmitting scientific information only within the closed society of their affiliated organizations, or publicly but anonymously. Our study has not revealed the mechanism of information spreading by influencers, although understanding it could show how to transmit scientific information effectively. A further study on their tweets, including those with topics other than radiation, would give us some hint for an effective way of transmitting scientific information to the public.
Interestingly, the ratio of tweets sent by influencers stayed fairly constant since the first month. In the early days after the accident, posts made by group A, B and C were all frequently retweeted; however, the number of retweets received by group A and C showed a rapid decrease, and messages by group B received the majority of retweets one month after the disaster. This tendency remained unchanged over the next six months even after credible scientific information became widely available such as actual measured radiation doses in the environment. We did not investigate why the share of group A and C rapidly reduced, and group B maintained its dominance. We did observe however that group B's tweets were more emotional than the other groups and involved many criticisms against the government and TEPCO. Such tweets may be easier to propagate widely through SNS than science-based and less emotional information. Scientists should recognize that such emotional exchanges tend to occupy the majority of posts made on SNS. Further research is necessary to understand how to effectively convey scientific but not emotional information through SNS.
The results in this study suggest that retweets were intensively spread within a fixed population, especially within groups with similar document vectors, while intercommunication among groups with different document vectors was small (see Fig 4). These findings are analogous to the fact that at the time of the United States presidential election, each camp accessed only the information its own camp posted on Twitter [36]. While influencers were eventually classified into three groups in the present study, there seems to be a discontinuity in information spreading in group A and B as can be seen in the network diagram (Fig 4). Although this research did not carefully examine the contents of each tweet, the sentiment of tweets concerning safety and danger of radiation is firmly fixed within each group, and the contents of tweets exchanged within each group were clearly differentiated. In group A, information on radiation was transmitted based on relevant scientific evidence, whereas in group B the majority sent out cautionary messages, over-emphasizing or exaggerating the danger of radiation. Therefore, when members of the general public tried to acquire information on radiation, they may have been exposed only to biased information depending on which group of influencers they were following with their Twitter account. Twitter is an interactive social media platform, but information regarding radiological issues was spread mainly through retweeting influencers' messages; as a result, individuals were found to have received only biased information from a limited number of influencers.
The present study has implications for how governments and international organizations communicate scientific information to the public. As shown in the present study, information on SNS is not limited to that which is scientifically correct. Content that is perceived to be more emotional and eye-catching tends to be propagated more. A lot of information on Twitter is spread by influencers sharing information with each other. For this reason, unifying the information sources and providing information to the public only from specified organizations is not necessarily the optimal method of distributing information to the public. Scientists and stakeholders will have to connect with each other and distribute information cooperatively. In addition, although discussion and public dialogue are important to deepen mutual consent and understanding of controversial issues such as radiation [37,38], attention must be paid to the possibility that a two-way communication tool like Twitter could be used unilaterally by influencers to spread their own agenda.

Conclusion
The results of this study showed that retweets accounted for roughly half of all the tweets related to radiation within half a year after the Fukushima Daiichi nuclear power plant accident. The majority of the retweets were based on original posts sent out by a few hundred accounts defined as "influencers". The majority of influencers had individual accounts with real names. While the ratio of information spreading by influencers was established and fixed in the first month, retweets were intensively spread within a fixed population, especially within groups with similar tweet contents.