Health and kinship matter: Learning about direct-to-consumer genetic testing user experiences via online discussions

Background Millions of people have undergone direct-to-consumer genetic testing (DTC-GT), but little is known about individuals' motivations and experiences (e.g., discussion topics and emotions after obtaining the test results) in engaging with DTC-GT services. Previous studies either involved only a small number of DTC-GT consumers or were based on hypothetical scenarios.

Objective Our study aimed to fill this gap by investigating online discussions about DTC-GT that developed naturally among tens of thousands of social media users.

Methods We focused on the posts that were published in the r/23andme and r/AncestryDNA subreddits, which correspond to the two companies with the largest number of consumers in the DTC-GT market. We applied computational methods to infer and examine the topics discussed, temporal trends in posting rates and themes (e.g., aggregation of related topics), and emotions expressed in these online forums.

Results We collected 157,000 posts published by 16,500 Reddit users between 2013 and 2019. We found that the posting rates increased sharply after popular promotional events (e.g., each Amazon Prime Day and Black Friday) and most posts were inquiries into, or status updates about, testing progress. The inferred themes of Ancestral Origin and Kinship/Feelings were the two most frequently discussed, while discussions about the Health Risks theme focused primarily on submitting DTC-GT raw data to third parties for interpretation. The Kinship/Feelings theme exhibited the largest range of emotional response. A qualitative review of the posts with extreme emotions showed that some people became excited because they found their biological parents or other kin, while others became upset because they unexpectedly found that their parents or other kin were not biologically related to them.

Conclusion This research demonstrates that online social media platforms can serve as a rich resource for characterizing actual DTC-GT experiences. The findings suggest that DTC-GT consumers' purchasing behaviors are associated with societal events and that future investigations should consider how DTC-GT challenges our understanding of kinship structure and function, genomic privacy, and the interpretation of health risks.


Introduction
Traditionally, genetic tests have been ordered and interpreted in clinical or biomedical research settings. However, the past decade has witnessed tremendous growth in direct-to-consumer genetic testing (DTC-GT) [1], from which individuals can learn more about themselves regarding a variety of issues, ranging from ascertaining one's ancestry to assessing the risk of developing various diseases [2,3]. This type of service is now a commodity offered by an ever-growing collection of companies, the two largest of which are AncestryDNA and 23andMe [4], and it has been purchased by more than 25 million people so far [5].
AncestryDNA and 23andMe, as well as many other companies, offer to define from what part(s) of the world one's ancestors originated. Even though the results of such tests can vary across platforms [6,7], this information has allowed people to uncover membership of a particular tribe or community [2], as well as to understand the relationships among racial and ethnic identities and genetic ancestry in a region [8]. These companies also allow consumers to download their uninterpreted or "raw" sequence data. For example, more than one million people have posted their identified gene data on sites such as GEDMatch in order to find more relatives. The sites where people have posted their DTC-GT sequence results have enabled law enforcement to track down suspects in criminal cases, such as the Golden State Killer [9]. These services also pose non-trivial limitations and consequences for individuals, such as revealing unanticipated information about one's familial relationships: either identifying new connections or undermining existing ones [3,10]. Moreover, the same revelations that make it possible to identify relatives and criminal suspects necessarily disclose genomic information about tens of millions of Americans, even if they never underwent DTC genetic testing or consented to sharing information about themselves, creating concerns about privacy intrusion [11].
For years, companies have sought to provide health-related information services, such as genetic risk predispositions, to consumers in conjunction with ancestry and kinship data. This enterprise, however, has been volatile, as illustrated by the numerous companies that have been established and gone defunct [12] and by how the U.S. Food and Drug Administration (FDA) famously stepped in to halt 23andMe's health-related business for a time [13]. Yet the availability of DTC health-related products has generated significant concerns among professionals. While some commentators and clinicians reportedly feel comfortable assisting their patients to interpret these results [14,15], others worry that healthcare providers are ill-prepared or unwilling to assist test recipients who seek advice and medical interventions [16][17][18][19][20] and that these results could increase health care utilization and divert resources from more pressing issues [21][22][23]. Indeed, several studies indicate that some consumers do seek medical advice but they are not always satisfied [24,25]. Other consumers plan to engage in self-directed behavioral change but do not always follow through [26,27].
What is known about consumers' responses to DTC-GT in general, and in regard to health-related results in particular, is incomplete. In a recent systematic literature review of over 150 studies that assessed multiple aspects of these services [28], only nine considered the experience of individuals who actually purchased DTC-GT services [29]. The insights provided by research to date are further limited because they are based on surveys and interviews with consumers [26,30-33], which reveal responses that may be shaped by the investigators' questions as well as hindsight bias rather than eliciting the respondents' unprompted reports of their experiences [34].
Online social platforms provide an opportunity to learn what aspects of DTC-GT are of interest to consumers by examining what they choose to say either on their own or in conversation with others. Indeed, many people use online environments to discuss and share various aspects of their daily lives, including their DTC-GT results. For instance, Twitter users are posting DTC-GT results, typically revealing ancestral background more than other findings (e.g., disease risks) [35]. More recently, a large-scale analysis of Twitter discourse related to DTC-GT indicated that this behavior is often influenced by the news and DTC promotional websites [36]. While many Twitter users disclose their test results, the limited number of characters in a tweet, along with Twitter's design as an all-purpose discussion environment, make it challenging for users to engage in in-depth discussions. Consequently, this medium hinders our ability to gain a deeper understanding of individuals' questions about, as well as experience with, DTC-GT.
Thus, in this study, we investigated online discussions about DTC-GT experiences on Reddit, an online content rating and discussion website that was used by approximately 11% of adults in the U.S. in 2019 [37]. Unlike Twitter, which maintains its content based on a social network, Reddit organizes its content into different forums, called subreddits, based on particular topics (e.g., r/legaladvice) and allows posts to be as long as the authors wish. In each subreddit, users can initiate a new thread by submitting a post, make a comment, or upvote (or downvote) another submission or comment. Due in part to its rich content and user engagement, Reddit has grown in popularity for researchers studying a broad range of health issues, including but not limited to mental health, eating disorders, weight loss, dermatological issues, and opioid abuse [38][39][40][41][42].
For this analysis, we specifically focused on the information contributed to the r/23andme and r/AncestryDNA subreddits. We aimed to characterize what people experienced, discussed, and cared about regarding DTC-GT through naturally unfolding online discussions. Particularly, we examined how these topics changed over time and how they correlated with contemporaneous events, such as holiday promotional events and the FDA's approval for 23andMe to provide consumers with health risk reports, as well as with the inferred emotional state of users. We found that many people purchased DTC-GT when there were promotional events, that results regarding ancestry and kinship elicited the most conversation and affect, and that people sought suggestions on sending raw genetic data to third-party services for health risk interpretation. Our findings suggest that some users appeared not to be ready to deal with unexpected consequences from these tests. To the best of our knowledge, this is the first study examining the impact of DTC-GT using long-form consumer-directed online discussions, and our observations reveal the value that this method adds to prior research in the DTC-GT area.

Data collection
We collected data from the r/23andme and r/AncestryDNA subreddits through the official Reddit Application Programming Interface. We selected these two subreddits because they provide a specific environment that is dedicated to discussions on the services provided by the two DTC-GT companies that cover the majority of consumers in the market. To collect the data, we created a web crawler using the Python programming language (version 3.6) and applied it to obtain all of the unique identifiers (IDs) of submissions that were archived in pushshift.io before March 26, 2019. Based on these IDs, we next applied the PRAW Python software package (version 5.6.0) to collect submissions as well as any updates, along with comments about them. Each post (which could be a submission or a comment) contained the following fields: 1) post ID, 2) author name, 3) creation date, 4) title (if it is a submission), 5) body text, and 6) the post ID to which it replied.
After collecting the data corpus, we combined the text in the title and body of each submission to represent its content. This was done for two reasons. First, the titles provide information on the topic of the post that is supplemental to the body. Second, we observed that some submissions contained only links in the body of the post. We subsequently removed the posts whose content consisted only of a [deleted] or [removed] placeholder, which indicated that the post had been deleted by either the authors or the moderators of Reddit.
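The cleaning step described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the `Post` record, the function names, and the exact placeholder strings are assumptions introduced here, and the filter checks only the body text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder strings marking deleted or removed content.
# The exact strings are an assumption of this sketch.
PLACEHOLDERS = {"[deleted]", "[removed]"}


@dataclass
class Post:
    """Hypothetical record mirroring the six fields listed in the text."""
    post_id: str
    author: str
    created_utc: float
    title: Optional[str]      # None when the post is a comment
    body: str
    parent_id: Optional[str]  # ID of the post being replied to, if any


def post_content(post: Post) -> str:
    """Join title and body for submissions; comments have a body only."""
    parts = [post.title or "", post.body or ""]
    return " ".join(p.strip() for p in parts if p.strip())


def keep_post(post: Post) -> bool:
    """Keep a post unless its body is nothing but a placeholder."""
    return post.body.strip() not in PLACEHOLDERS


def clean_corpus(posts: List[Post]) -> List[str]:
    """Return the content strings of all retained posts."""
    return [post_content(p) for p in posts if keep_post(p)]
```

For example, `clean_corpus` applied to one ordinary submission and one deleted comment would return a single string made of the submission's title followed by its body.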

Topic extraction
We applied latent Dirichlet allocation (LDA) [43], a computational topic modeling technique that is often utilized in natural language processing, implemented in Mallet (version 2.0.8), to identify the general topics that were communicated in the r/23andme and r/AncestryDNA subreddits. Given a predefined number of topics (a required parameter for learning LDA) and a large number of posts, LDA generates two distributions. The first corresponds to the probability that each word is used in a topic. The second corresponds to the probability that each topic is used to describe a post. Since LDA is an unsupervised technique (in that it is not trained on examples of known topics), we relied on the coherence score, as well as heuristics based on visualization of the topics, to determine the number of representative topics [44,45]. The coherence score measures the extent to which two terms in a topic appear together with a high probability in either 1) external documents (e.g., Wikipedia) or 2) the documents that are used for topic modeling. In general, a larger average coherence score across the topics suggests a better model. At the same time, to enhance the interpretability of topics, we aimed for a lower amount of overlap in topics after their projection into a two-dimensional space according to a multidimensional scaling method. This was accomplished by executing the LDA algorithm with the number of topics equal to each integer between 2 and 25. After each execution, we projected all of the resulting topics into a two-dimensional space and performed a manual review to select the number of topics with a large coherence score and minimal overlap. To further enhance interpretability, we replaced each word with its lemma form and retained only nouns, verbs, adjectives, and adverbs using spaCy (version 2.0.18). It should be noted that we relied on this process to determine whether a word should be retained in the LDA algorithm.
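To make the coherence criterion more concrete, the sketch below computes one common variant, the UMass-style coherence, from document co-occurrence counts in the modeling corpus itself. The text does not state which coherence measure was used, so the choice of UMass, the function names, and the toy documents are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set


def umass_coherence(top_words: Sequence[str], docs: List[Set[str]]) -> float:
    """UMass-style coherence of one topic.

    top_words must be ordered from most to least probable in the topic;
    docs is the corpus, each document reduced to its set of words.
    """
    def doc_freq(*words: str) -> int:
        # Number of documents that contain every word in `words`.
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        # i < j, so top_words[i] is the higher-ranked word of the pair.
        denom = doc_freq(top_words[i])
        if denom == 0:
            continue  # word absent from the corpus; skip the pair
        joint = doc_freq(top_words[i], top_words[j])
        score += math.log((joint + 1) / denom)
    return score


def mean_coherence(topics: List[Sequence[str]], docs: List[Set[str]]) -> float:
    """Average coherence across all topics of one fitted model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)


# Toy corpus (invented): two posts about matches, two about kit status.
toy_docs = [
    {"dna", "match", "cousin"},
    {"dna", "match", "relative"},
    {"kit", "ship", "wait"},
    {"kit", "wait", "result"},
]
coherent = umass_coherence(["dna", "match"], toy_docs)
incoherent = umass_coherence(["dna", "kit"], toy_docs)
```

In a model-selection loop of the kind described above, `mean_coherence` would be evaluated once per candidate number of topics (2 through 25), and the candidates with high scores would then be inspected visually for overlap.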

Topic prevalence
We defined topic prevalence as the percentage of posts that discussed a topic within a fixed time period. Specifically, we followed three steps to calculate prevalence. First, after obtaining the topic distribution for each post, we empirically selected a topic probability threshold to ensure that only the topics with a probability above the threshold are acknowledged as being present in a given post. For instance, imagine that we set a threshold of 0.25 in a ten-topic model and a post exhibits a probability of 0.3 for topic T1, 0.3 for topic T2, and 0.05 for each of the remaining eight topics. In this situation, we say that only topics T1 and T2 are present in this post. Second, once the threshold was determined, we created a topic co-occurrence matrix by computing when two topics were present as the primary (with the highest probability) and the secondary (with the second highest probability) topics in a post. The value of each matrix cell represents the percentage of posts that discussed the corresponding primary and/or secondary topics. Third, we aggregated the topics into more general themes based on topic co-occurrence. The identified themes were then subject to a prevalence analysis.
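The first two steps can be illustrated with a short sketch. It is written under several assumptions that are not stated in the text: cells are keyed by ordered (primary, secondary) pairs, a post with a single present topic is counted on the diagonal, and percentages are taken over all posts. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def present_topics(dist: List[float], threshold: float) -> List[int]:
    """Indices of topics whose probability exceeds the threshold,
    ordered from most to least probable (ties keep index order)."""
    ranked = sorted(range(len(dist)), key=lambda t: dist[t], reverse=True)
    return [t for t in ranked if dist[t] > threshold]


def cooccurrence(
    dists: List[List[float]], threshold: float
) -> Dict[Tuple[int, int], float]:
    """Percentage of posts for each (primary, secondary) topic pair.

    A post with only one present topic is counted on the diagonal (t, t).
    Posts with no topic above the threshold contribute to no cell.
    """
    counts: Counter = Counter()
    for dist in dists:
        topics = present_topics(dist, threshold)
        if not topics:
            continue
        primary = topics[0]
        secondary = topics[1] if len(topics) > 1 else primary
        counts[(primary, secondary)] += 1
    total = len(dists)
    return {pair: 100.0 * c / total for pair, c in counts.items()}


# The worked example from the text: ten topics, threshold 0.25.
example = [0.3, 0.3] + [0.05] * 8
present = present_topics(example, 0.25)   # topics T1 and T2 (indices 0, 1)

# A second, invented post dominated by a single topic.
single = [0.9] + [0.1 / 9] * 9
matrix = cooccurrence([example, single], 0.25)
```

With these two posts, half of the corpus falls in the off-diagonal cell for topics 0 and 1 and half on the diagonal cell for topic 0, which is the kind of structure the theme aggregation in the third step would start from.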

Emotions in the themes
After generating themes, which indicated what DTC-GT users experienced, we further compared the emotions in different themes to obtain greater insight into how they felt about their testing experience. Specifically, we applied Linguistic Inquiry and Word Count (LIWC, version 2015) to extract the percentage of words in a post that mention either a positive or negative emotion [46]. LIWC has been widely applied to user-generated data in online environments, including Twitter and Reddit, to perform semantic investigations into discussions about various health- and wellness-related issues such as post-traumatic stress disorder [47], and to characterize conversational patterns [48]. We applied a Wilcoxon signed-rank test and a Mann-Whitney U test to compare the rate at which the emotions were expressed across the themes at a significance level of 0.001.

It should be recognized that r/AncestryDNA was initiated in 2017 while r/23andme was initiated four years earlier in 2013. This is intuitive because 23andMe started its service in 2006 while AncestryDNA ran its first test in 2012. However, it is still surprising that the number of Reddit users in r/AncestryDNA (~2,500) was substantially smaller than that in r/23andme (~15,000) because this is the opposite of their market penetration. According to 2019 estimates, AncestryDNA had approximately 5 million more users than 23andMe [5]. Given that the number of users differed, we inquired if there was a difference in the rate at which Reddit users participated in these subreddits. It was observed that, on average, each user in r/AncestryDNA wrote 7.7 posts compared to 9.2 posts in r/23andme. Yet this difference was not statistically significant under a Wilcoxon rank-sum test (p = 0.07), which suggested that the rate of contribution did not influence the following analysis.
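As a rough illustration of the LIWC-style emotion measure described above (the percentage of words in a post that express positive or negative emotion), consider the sketch below. LIWC's dictionaries are proprietary and much larger; the two tiny word lists, the tokenizer, and the function name are hypothetical stand-ins and do not reproduce LIWC itself.

```python
import re
from typing import Dict

# Tiny, invented word lists for illustration only.
POSITIVE = {"happy", "excited", "love", "thrilled", "glad"}
NEGATIVE = {"upset", "sad", "angry", "shocked", "worried"}


def emotion_percentages(text: str) -> Dict[str, float]:
    """Percentage of words in `text` found in each emotion word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"positive": 0.0, "negative": 0.0}
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {
        "positive": 100.0 * pos / len(words),
        "negative": 100.0 * neg / len(words),
    }


# Invented example post.
scores = emotion_percentages("Excited and happy, but also worried and sad")
```

Per-post scores of this kind, grouped by theme, are what a rank-based test such as the Mann-Whitney U test would then compare.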
Table 1 summarizes the nine topics inferred from r/AncestryDNA and r/23andme through LDA (see S1 Fig).

Third, many users of these two subreddits disclosed their feelings. A post from one user demonstrates this topic with respect to their search experience for kin:

PLOS ONE
Table 2 provides a comparison of the distributions of topics in r/23andme after the second FDA approval and r/AncestryDNA. First, it was found that users in r/23andme were more likely to talk about the Testing Progress and Health Risks topics, while users in r/AncestryDNA were more likely to talk about Kinship. Second, when talking about their ancestry, users in r/23andme were more likely to mention heritage in general, while users in r/AncestryDNA were more likely to mention European heritage. Additionally, users in r/23andme were more likely to mention Sharing Results, Feelings, and Haplogroup topics.

Topic prevalence
We generated the topic co-occurrence matrix (see S2 Fig) by empirically setting the distribution threshold to 0.13 (i.e., topics with a probability below 0.13 were deemed to be insufficiently representative of a post). We selected this threshold based on a requirement that it should be larger than the average topic probability (i.e., 1/9 ≈ 0.11 for a nine-topic model). Based on the matrix, we clustered the topics into six themes: 1) Testing Progress; 2) Ancestral Origin (European Ancestry, General Ancestry); 3) Sharing Results; 4) Health Risks; 5) Haplogroup/Matching (Haplogroup, DNA Matching); and 6) Kinship/Feelings.

Fig 2 illustrates the temporal prevalence of the six themes. In this analysis, we combined both subreddits due to the relatively small number of posts in r/AncestryDNA. There are several findings worth highlighting. First, we observed that the Ancestral Origin and Kinship/Feelings themes increased in prevalence over time, eventually achieving levels that were substantially higher than the average prevalence. Second, the Haplogroup/Matching, Health Risks, and Sharing Results themes were below the average level most of the time. Third, the Testing Progress theme decreased in prevalence, but experienced periodicity that was highly correlated with Black Friday and Amazon Prime Day in 2017 and 2018, as previously noted.

Table 3 compares the two inferred emotions between the themes. The (P)ositive and (N)egative columns show the average percentage of the corresponding emotion words identified by applying LIWC to every post in each theme. ΔN and ΔP refer to a comparison of each theme with the Kinship/Feelings theme for the negative and positive emotions, respectively. P-N refers to the difference between the two emotions for each theme. Here, there are several notable observations. First, the subreddit users expressed positive emotions more often than negative emotions across all the themes (as shown in the P-N column). Second, the Kinship/Feelings theme exhibited the highest negative emotion, which was followed by the Health Risks and Testing Progress themes (as shown in the ΔN column). Third, the Kinship/Feelings theme also exhibited the highest positive emotion, followed by the Haplogroup/Matching and Health Risks themes (as shown in the ΔP column).
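The per-theme emotion columns described for Table 3 can be assembled along the following lines. This is only a sketch: the text does not specify how ΔP and ΔN are computed, so treating them as differences in mean percentage relative to the Kinship/Feelings theme is an assumption, and the input numbers are invented toy values rather than results from the study.

```python
from statistics import mean
from typing import Dict, List, Tuple

# theme -> list of (positive %, negative %) scores, one pair per post
ThemeScores = Dict[str, List[Tuple[float, float]]]


def summarize(scores: ThemeScores, reference: str) -> Dict[str, Dict[str, float]]:
    """Build P, N, P-N, and (assumed) delta columns for each theme.

    dP and dN are computed as theme mean minus reference-theme mean,
    which is an assumption of this sketch.
    """
    means = {
        theme: (mean(p for p, _ in posts), mean(n for _, n in posts))
        for theme, posts in scores.items()
    }
    ref_p, ref_n = means[reference]
    return {
        theme: {"P": p, "N": n, "P-N": p - n, "dP": p - ref_p, "dN": n - ref_n}
        for theme, (p, n) in means.items()
    }


# Invented toy scores, not data from the study.
toy = {
    "Kinship/Feelings": [(4.0, 2.0), (6.0, 2.0)],
    "Testing Progress": [(3.0, 1.0), (3.0, 1.0)],
}
table = summarize(toy, reference="Kinship/Feelings")
```

Under this reading, a negative dP or dN for a theme means that its posts carried a smaller share of emotion words than Kinship/Feelings posts did.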

Principal findings
This investigation of online discussion about DTC-GT in two subreddits yielded several primary findings. First, the topics discussed by the Reddit users align with the services offered by the DTC-GT companies. Of particular note, r/AncestryDNA users were more likely to discuss European ancestry composition as compared with the world more generally, which is likely due to the fact that 296 of its 392 (75.5%) ethnic regions are for people of European heritage, while only 52 of 171 (30.4%) in 23andMe are European [49], differences which themselves are striking.

Second, the observed posting trends in both subreddits clearly reflect the impact of consumer marketing. For example, both DTC-GT and the number of posts published in the subreddits have experienced rapid growth since 2017 [50]. Additionally, it was evident that Reddit users' purchasing behaviors were associated with major promotional events, as illustrated by marked spikes in activity after each Black Friday and Amazon Prime Day, when both companies achieved strong holiday sales on Amazon.com due to price reductions [51]. As one user mentioned in a post:

Third, Ancestral Origin and Kinship/Feelings were the two most frequently discussed themes in these two subreddits, but kinship was a more prominent topic and was accompanied by a wider array of emotions. Some people sought to find new relatives, which is consistent with the growing body of literature that focuses particularly on people who were adopted [52] or who were conceived using gamete donation [53]. The fact that interest in uncovering these relationships is driven in part by a desire for information about health history [54] or even identity is particularly intriguing since, until recently, the legal norm in this country had been to sever these connections [55]. On other occasions, people discovered that their biological connections were not what they expected, an occurrence increasingly documented in the media. For example, a recent cover story published by the American Psychological Association reported that a woman who planned to seek testing with her fiancé before marriage unexpectedly discovered that she had been conceived with donor sperm [56]. A similar story was reported in r/23andme as well:

Indeed, some writers assert that family secrets may be a thing of the past [57]. Yet, DTC companies say little about these potential revelations, at most including language in their terms of service that people may be surprised by the results.
Fourth, discussions about health risks were less common but focused primarily on submitting DTC-GT generated raw data to third parties for interpretation. Given the level of concern among health care providers about DTC health-related results [22,24,58] and the many studies of individuals' views about these tests [59,60], the level of spontaneous discussion of the test results offered by 23andMe was surprisingly low even after 2017 when that company was once again permitted to offer them. This may be attributable to the limited panel of health-related results offered by 23andMe, most of which are relatively uncommon, so that most discussants would not have received a concerning result. Being found to be a carrier of a recessive disorder would be more common, but likely to be less distressing unless the person was actively pursuing childbearing [61]. And of course, some of those who did receive worrisome results may not have chosen to disclose them in online environments.
Many people, however, clearly wanted health-related information, as evidenced by the fact that most of the posts regarding health focused on the possibility of using third-party services to interpret health risks from raw DTC-GT data, which had often been obtained from purchasing a basic ancestry service. This is a path pursued by a growing number of consumers [62]. Yet using third-party services to obtain health information is problematic, even if clinicians are willing to consider them. Third-party interpretation websites often lack adequate informed consent, have questionable clinical validity and utility, and lack medical supervision [63]. Moreover, one study reported that 40% of genetic variations identified from DTC-GT raw data were not confirmed on further analysis in a clinical laboratory [3]. Thus, consumers may receive inaccurate results.

Limitations and future work
Despite the insights gained from this study, there are several limitations that we believe can serve as the basis of future work. First, the population in our study was composed of active users in r/23andme and r/AncestryDNA, which may limit the generalizability of our findings. For example, Reddit users are typically male (69%), between the ages of 18 and 29 (64%), Caucasian (70%), from the United States (58%), and have completed at least some college education [64], which differs from the demographics of 23andMe and AncestryDNA users, who tend to be older and more often female but are still mostly Caucasian and college educated [65,66]. It could be useful to consider users from other online platforms, as well as other strategies to understand the experiences of non-social-media users, to obtain broader insights. Second, we relied on LDA to discover the topics and focused on providing a general picture of discussion in these subreddits. Future investigations can apply alternative advanced topic modeling techniques, such as structural topic models, to directly extract topic prevalence [67]. Third, while LIWC is a linguistic tool designed to support semantic analysis in social media data, there are other tools dedicated to emotion analysis, such as the NRC-lexicon [68] and the EMOTIVE-ontology [69], that could be utilized. Fourth, it would be worthwhile to investigate the extent to which online discussion helps individuals cope with the consequences of undergoing DTC-GT. Finally, it will also be important to monitor whether discussions about using DTC-GT to learn health risks grow as the FDA approves more of these tests, particularly those that purport to assess common disease risk, and as other companies begin to offer them [70,71].

Conclusion
This investigation presented evidence that online social media platforms can serve as a rich resource for characterizing actual DTC-GT experiences, yielding insights that can complement research strategies that rely on elicited responses to surveys and interviews. In particular, for DTC-GT consumers who disclosed their experience in r/23andme and r/AncestryDNA, we observed that their discussion focused on kinship (with both positive and negative consequences), inquiries or updates on testing progress, ancestral origin, and intent to send raw DNA data to third parties for health risk interpretations. The findings suggest that DTC-GT consumers' purchase behaviors are associated with societal events (e.g., holiday promotions) and that future investigations will need to consider how DTC-GT challenges notions of kinship structure and function, genomic privacy, and health risk interpretation.

Supporting information
S1 Fig. Topic visualization. The topic index in each circle corresponds to the order in which the topics are presented in Table 1. (TIF)
S2 Fig. The rate at which topics co-occur in the subreddits. Each cell represents the percentage of posts that mentioned the corresponding combination of topics. Cells along the top-left to bottom-right diagonal correspond to posts that expressed one topic only. The matrix was generated by empirically setting the distribution threshold to 0.13 (i.e., topics with a probability below 0.13 were deemed to be insufficiently representative of a post). (TIF)