Population size and self-reported characteristics and sexual preferences of men-who-have-sex-with-men (MSM) in Germany based on social network data

Background In the absence of detailed information about the population size and behaviour of men-who-have-sex-with-men (MSM), estimating prevalence rates of sexually transmitted infections (STIs) and designing public health interventions is difficult. The aim of the present study is to estimate the lower boundary of age-specific population sizes and to retrieve self-reported information from this population. Methods We used publicly accessible data from a large online dating and social network website for MSM in Germany to retrieve data on the age and regional distribution of profiles. The profiles were also stratified by their information on the preferred position during anal intercourse, safer sex, and sexual identity. Results A total of 464,873 user profiles correspond to an average of 15.2 profiles per 1,000 male inhabitants in Germany, varying between 7.6 and 45.6 across federal states. Although the information on absolute numbers for different age groups is limited by the search engine, age-specific relative frequencies were found to increase from 12.9 profiles per 1,000 male inhabitants among 18 to 20 year olds to 24.6 among 28 to 30 year olds. The data shows age-specific trends for safer sex: with increasing age, a growing share of profiles reported “never” engaging in safer sex or stated that safer sex “needs discussion”. Around one third of profile owners stated that they were versatile with respect to the preferred position in anal intercourse. Each of the other options (“only bottom”, “more bottom”, “only top”, “more top”) was chosen by roughly 10% of profile owners. Conclusions Online social network or dating sites can provide some information about specific populations in the absence of other data sources. 
The presented results are the first to report age-specific rates of MSM per 1,000 male inhabitants in Germany and may be useful to estimate age-specific prevalence or incidence rates as well as to inform health promotion activities and modelling studies for MSM in Germany.

apps have also been used to promote health messages for syphilis screening among MSM [16] and to analyse the risk behaviour associated with the use of online dating among MSM [17,7]. To our knowledge, no study so far has used data from MSM dating platforms to provide estimates for use in evaluating health interventions and in estimating prevalence and incidence.
Using data from a large MSM online dating and social network site, this study has two aims: (1) to estimate the lower boundary of the age-specific population size and the regional distribution of MSM in Germany, and (2) to provide additional information on general demographics, sexual identity, and self-reported sexual preferences relevant for modelling the impact of interventions among MSM.

Data source
We used data from PlanetRomeo, a large online dating and social network for MSM. Among all the websites used for recruiting participants for the EMIS study, PlanetRomeo contributed 83.3% of all German survey participants [18], and it was chosen because it is the most heavily used website for MSM in Germany. According to the web-traffic analytics of similarweb.com, PlanetRomeo ranks 123rd among the most used websites in Germany with approximately 8 million visits. By comparison, www.dbna.de, the website that contributed the second most participants to EMIS, reaches only 270,000 visits and ranks 12,829th. In January 2016, we used the website's search engine (which at the time was accessible to non-registered users) to count and analyse user profiles from Germany. This search function stratifies the profiles of registered users by certain profile characteristics and displays the number of matching profiles together with the corresponding list of user profiles. Additionally, the number of profiles stratified by federal state was stated on the website. We searched for every possible combination of the following terms: federal state, self-reported age (between 18 and 75 years), sexual identity ("gay", "bisexual", "transgender", or "no entry"), preferred position in anal intercourse ("top only", "more top", "versatile", "more bottom", "bottom only", "no" [anal intercourse], or "no entry"), and safer sex ("always", "needs discussion", "never", or "no entry"). A "top" is a person who usually takes the insertive role in anal intercourse, while a "bottom" is a person who prefers the receptive role. "Versatile" refers to an individual who may take either role. PlanetRomeo does not define safer sex and does not give users the possibility to state how they protect themselves and others (e.g., condoms, PrEP). 
No other terms than the ones reported above were available within the search categories.
The PlanetRomeo search engine poses two problems. First, it limits results to 600 hits. When a search returned more than 600 hits, we repeated it with an additional search term, such as "hair colour", to reduce the number of hits. For example, if the search for profiles of "18 to 20" year olds in "North Rhine-Westphalia" indicating to be "top" and "always" practising safer sex returned more than 600 hits, it was split into separate searches that additionally included the hair colour terms "blonde", "black", etc. The corresponding results were combined afterwards, and the number of hits for each search was documented in a dataset. The second problem is that the age groups of the search engine overlap and are therefore not distinct. For example, when searching for 18 to 20 year olds and 20 to 24 year olds, profiles of 20 year olds are counted twice. This poses a minor problem for the analysis of age-specific trends in behavioural data, but a major problem for the estimation of total age-specific numbers of MSM. However, correct age-specific rates of MSM per 1,000 male inhabitants can still be reported: for each age group, we standardized the number of profiles by the general male population of Germany in 2015 according to the German Federal Statistical Office [19] and mirrored the overlap in the denominator.
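The mirrored denominator can be illustrated with a minimal sketch. All population figures and profile counts below are hypothetical placeholders (they are not the Federal Statistical Office data or the collected counts), and the function name `rate_per_1000` is introduced only for illustration. The point is that each overlapping age group is divided by the male population of exactly the same inclusive age range, so the double-counted 20 year olds appear in both numerator and denominator.

```python
# Hypothetical single-year male population counts (placeholders, not official data).
male_population_by_age = {
    18: 400_000, 19: 410_000, 20: 420_000, 21: 430_000,
    22: 440_000, 23: 450_000, 24: 460_000,
}

# Hypothetical profile counts for two overlapping search-engine age groups.
profile_counts = {(18, 20): 15_990, (20, 24): 44_000}


def rate_per_1000(profiles, age_from, age_to, population_by_age):
    """Profiles per 1,000 men; the denominator spans the same inclusive age range."""
    denominator = sum(population_by_age[age] for age in range(age_from, age_to + 1))
    return 1000 * profiles / denominator


# Age 20 is part of both groups, in the numerator and in the denominator alike.
rates = {
    group: rate_per_1000(count, group[0], group[1], male_population_by_age)
    for group, count in profile_counts.items()
}

for group, rate in rates.items():
    print(f"{group[0]} to {group[1]} years: {rate:.1f} profiles per 1,000 men")
```

Because each rate is computed within its own age range, the rates remain interpretable even though the absolute counts of the overlapping groups cannot simply be added up.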
We report the relative frequency of each variable and display cross tables of two queried characteristics where appropriate. To test for non-random group differences, we provide the results of a chi-squared test for these cross tables, with the degrees of freedom (df) and the corresponding p-value. No technical barriers were breached and no user log-in was necessary to retrieve the data at the time of data collection. Due to a re-design of the website, there is no longer public access to the data. The complete dataset can be found in S1 File.
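A chi-squared test on a cross table can be sketched as follows. The paper does not state which software was used, so this is a pure-Python illustration under stated assumptions: the 2x2 counts are hypothetical, the function names are introduced here for illustration, and the p-value helper covers only the df = 1 case, where the upper-tail probability has a closed form via the complementary error function.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a cross table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper-tail p-value of the chi-squared distribution with exactly 1 df."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical cross table: rows = two age groups, columns = two answer categories.
example_table = [[30, 70], [50, 50]]
statistic, df = chi_squared(example_table)
p_value = p_value_df1(statistic)
print(f"chi-squared = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```

For tables with more than one degree of freedom, a statistics library would be needed to obtain the p-value; the statistic and df computed here stay the same.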

Ethics and consent to participate
No individual profiles were accessed for the data retrieval and no individual or personal information was retrieved, as only the number of hits for specific searches was recorded. This is similar to using the number of Google hits, which is often done in journalism to underline the importance of a search term, or to previous research using only counts of online profiles (e.g., Twitter accounts or tweets). We did not obtain permission to use the data from the owners/administrators of the website, as they provided access to the data for the general public. No registration or other form of identification was necessary to access the search engine of the website. A screenshot that shows access to the search engine without being logged in is available upon request. Additionally, at the time of data collection, no terms and conditions link was provided on the website. This can still be seen on the "classic" design version of the website (https://classic.planetromeo.com/) as well as on previous versions of the website stored in the web archive (http://web.archive.org/web/20141031080702/https://www.planetromeo.com/).
As we did not access individual profiles, we did not seek consent to participate from the profile/data owners. Using this procedure, it is also impossible to re-identify any profile owner, as the collected dataset does not contain information on an individual level. Furthermore, when creating a profile on the website, users of PlanetRomeo need to choose whether their profile information is displayed to the public (i.e., unregistered users) or whether they remain anonymous to the public.

Overall PlanetRomeo users
Overall, as of October 2017, the website reported 464,873 registered users, with a regional distribution across the federal states of Germany as displayed in Table 1. The highest proportion of profiles among German men was found in Berlin with 45.9 profiles per 1,000 male inhabitants, followed by the other two federal city states of Hamburg and Bremen with 32.8 and 23.4 profiles per 1,000 male inhabitants, respectively. All other German federal states showed smaller relative numbers, ranging from 7.6 in Brandenburg to 16.3 per 1,000 male inhabitants in Hesse.
Retrieving data via the search engine of the website resulted in a higher number of 540,866 profiles for the age range from 18 to 75 years (as of January 2016). As explained in the methods section, this 1.16-fold overestimation was caused by the overlapping age group definitions in the search engine. The complete distributions of all variables can be found in Tables 2 and 3.
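The stated factor can be recomputed from the two totals given in the text. Note that the two counts refer to different dates (January 2016 and October 2017), so the ratio is only an approximation of the pure overlap effect; the variable names are illustrative.

```python
profiles_from_search = 540_866  # search engine total, ages 18 to 75 (January 2016)
profiles_on_website = 464_873   # registered users stated by the website (October 2017)

overestimation_factor = profiles_from_search / profiles_on_website
print(round(overestimation_factor, 2))
```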
About 8% of all profiles (41,102) had no information on the collected variables besides the mandatory information about age and federal state.
With regard to the age groups, the relative number of profiles per 1,000 male inhabitants started at a level of 12.9 profiles for 18 to 20 year-old men in Germany. The relative density of profiles increased with age to a maximum in the age group of 28 to 30 year-olds (24.6). Beyond 40 years of age, the profile density dropped below 19.2 and decreased to 0.8 profiles per 1,000 men aged 65 to 75 years. Fig 1 shows the distribution of the profiles over the different age groups stratified by federal states. The highest profile density among men was found among 28 to 30 year-olds (64.2) and 38 to 40 year-olds (63.9) in Berlin.

Sexual identity and sexual behaviour
For the optional characteristics sexual identity, "position [in anal intercourse]", and safer sex, the proportion of profiles with missing information was 19.3%, 24.4% and 26.1%, respectively, across all age groups. For all variables, missing values decreased with increasing age. With regard to sexual identity, the proportion of profiles stating no information decreased significantly from 30.4% in the age group 18-20 years (40.2% "gay", 28.6% "bisexual", and 0.8% "transgender") to 8.7% in the oldest age group (df = 1, p-value < 0.001). While the proportions of profiles stating to be "bisexual" or "transgender" remained relatively stable at around 25%, the share of profiles indicating the owner to identify as "gay" increased from 40.2% in the youngest to 64.8% in the oldest age group (26.1% "bisexual" and 0.3% "transgender"; df = 1, p-value < 0.001). Transgender rates varied only little (between 0.10% and 0.13%) between 20 and 34 year-olds (df = 6; p-value = 0.001), but decreased with age for older age groups. Profiles from the federal city states Berlin and Hamburg were significantly more likely to indicate "gay" (63.1% and 57.4%, respectively; df = 2, p-value < 0.001), while "transgender" and missing information about the sexual orientation showed the same relative frequency as in the other federal states. The preferred position in anal intercourse was distributed relatively symmetrically over the top-bottom scale. The largest share of profiles indicated a preference for both roles ("versatile", 32.9%), while "more top" (8.1%) and "more bottom" (10.9%) as well as "top only" (11.1%) and "bottom only" (10.0%) showed rather similar proportions. Overall, 2.7% of all profiles stated no interest in anal intercourse. The "no anal intercourse" statements were highly age-dependent: their relative frequency increased from 0.7% in the age group 18-20 to 11.3% in the age group 65-75 (df = 1; p-value < 0.001). Besides the decrease of profiles with "no information" on their position, a clear age-dependency was found in the "top only" category, which increased from 5.7% in the youngest age group to 13.3% in the age group 53-56 (df = 1; p-value < 0.001). All other categories showed no apparent age-dependency. In addition, no association was found between the preferred position in anal intercourse and federal state or sexual identity, except that profiles indicating a "transgender" identity preferred "more bottom" or "bottom only" positions.
Table 1. Number of user profiles in Germany stratified by federal state of residence as stated on the website (as of October 2017) and as retrieved from the search engine (as of January 2016). Due to limitations of the search engine, age groups were overlapping, and thus the total number of profiles is higher than the (correct) total of profiles as stated on the website.
As can be seen in Fig 2, the age-dependency found for sexual identity and for the preferred position in anal intercourse was also apparent in the profile information about safer sex. At first sight, the share of profiles with missing information again decreased with increasing age of the owner. When stratifying the "no entry" profiles, the proportion of profiles containing neither information about the preferred position in anal intercourse nor information about safer sex remained at around 12% across all age groups ("All optional information missing"). The proportion of profiles without information on safer sex but stating a preferred position in anal intercourse decreased with increasing age, represented by the "No entry on safer sex" category in Fig 3. The proportion of profiles stating to never practice safer sex increased from 0.5% in the age group 22 to 24 years to 1.3% in the age group 60 to 65 years (df = 1; p-value < 0.001). The statement "needs discussion" showed a very similar pattern, with an increase from 11.7% in the age group 22 to 24 years to 23.2% in the age group 60 to 65 years (df = 1; p-value < 0.001). However, in the youngest age groups under 22 years, the proportions of both statements ("never" and "needs discussion") were significantly higher than for the 22 to 24 year olds (df = 1; p-value < 0.001) and significantly lower than for the 65 to 75 year olds (df = 1; p-value < 0.001).
In the combination of the statements on the preferred position in anal intercourse and safer sex, it becomes apparent that the relative frequency of profiles stating to always practice safer sex decreased with a more receptive role in anal intercourse (see Fig 3). In the "bottom only" group, 2.2% stated to never practice safer sex and 22.8% stated "needs discussion", leaving 57.4% stating that they always practice safer sex. "Top only" and "more top" showed the highest share of profiles stating to always practice safer sex (68.4% and 72.6%, respectively). No clear regional trends regarding safer sex were found.
Table 2. Number of user profiles in Germany stratified by age group as retrieved by the search engine. Due to limitations of the search engine, age groups are overlapping and thus the total number of profiles is higher than the (correct) total of profiles.

Discussion
Using data from the largest German online social network and dating site for MSM, a lower bound of 464,873 MSM between the ages of 18 and 75 could be found, corresponding to 1.52% of the respective (18-75) male adult population. The highest proportion was found in the age group of 28 to 30 year-olds, and the number and the proportion of profiles decreased after the age of 46. The age distributions showed slight "bumps" before the ages of 30 and 40 years, which may be due to internet users being deceptive about their age [20]. Our results are comparable to the results of Marcus et al. (2009) [3], which give estimates of the regional distribution of MSM between 20 and 59 years in 2006, ranging between 574,750 and 786,500. With regard to the optional information, more than half of the profile owners indicated to be "gay" and 28.1% to be "bisexual". In this context, it should be stated that "transgender" does not reflect a sexual orientation such as "gay" or "bisexual" but rather a gender identity. The presence of "transgender" as the only other option given by the website might have pushed transgender persons to mis-identify their sexual orientation as "transgender". A higher number and proportion of profiles were found in metropolitan areas, and more users identified as "gay" in those areas. This may be due to the migration of MSM into larger cities. The largest share of profiles (about one third) indicated an equal preference for the insertive and the receptive position in anal intercourse, while around ten percent each preferred a "rather active", a "rather passive", an "only active", or an "only passive" role. Around 60% of profiles indicated practising safer sex, and fewer than one percent stated to never practice safer sex. With age, the share of profiles stating to practice safer sex "never" or only "by discussion" increases.
Table 3. Characteristics of the user community in Germany including sexual identity, preferred position in anal intercourse and declared commitment to safer sex. Due to limitations of the search engine, age groups are overlapping and thus the total number of profiles is higher than the (correct) total of profiles.

There are two issues that need to be considered when interpreting the analyses of the behavioural aspects. Firstly, there is a relatively high number of profiles with no information ("no entry"), which may not be missing at random and is thus a potential source of bias. Secondly, certain terms (e.g., "safer sex") might be defined differently by different users, making a common interpretation difficult. The strong age-dependency could indicate two different phenomena. The higher share of "no information" profiles in the younger age groups may be due to a higher awareness of data privacy issues, leading to users who are unwilling to reveal personal information that is perceived to be highly sensitive. Another reason may be indifference towards certain characteristics at a young age, which might become clearer with increasing age (e.g., sexual identity). The relatively high frequency of profiles in the age group from 18 to 20 indicating to perform safer sex "never" or only according to a negotiated agreement ("needs discussion") might be due to younger MSM not knowing the community's common understanding of safer sex or having a stronger preference for monogamous relationships with seroconcordant partners. The increase of users stating that they practice safer sex "never" or based on a negotiated agreement ("needs discussion") in the older age groups may be attributed to a higher share of HIV-positive men in these age groups who engage in serosorting, meaning that a person prefers to choose sexual partners based on HIV status. However, motivational reasons for engaging in intentional unprotected anal intercourse (bareback sex) are diverse, and barebacking is not limited to HIV-positive MSM. 
Fig 1. Axes: age group (years); PlanetRomeo profile density per 1,000 men. The specific values can be found in S1 Table. https://doi.org/10.1371/journal.pone.0212175.g001
An interview study conducted in five European and North American cities found that for some MSM semen exchange plays a symbolic role, leading to a higher level of connectedness, completion, or naturalness [21]. Apart from such interpersonal processes, medical advances leading to a reduced risk of infection from HIV-positive partners who use combination antiretroviral therapy (cART) and therefore have an undetectable viral load [22,23,24,25] may contribute to condomless anal sex. Other factors may contribute to barebacking [26] and/or negotiated risk [27]; these include the Internet as a facilitating factor of negotiated risk [28], lack of community activism, lower perception of safer sex norms, and intrapersonal factors like sociodemographic characteristics (e.g., low education) [29] and personality traits (e.g., low level of perceived responsibility, desire for sexual pleasure, or sexual sensation seeking) [30]. The difference between "tops" and "bottoms" with regard to their safer sex behaviour may not necessarily indicate that bottoms are more open to condomless anal intercourse; it may just indicate a different capacity for decision-making. Since the top and not the bottom has to use the condom, the bottom necessarily has to discuss condom use with the top if he wants to make sure that condoms are used, while the top can just use a condom without discussing it. However, tops might also need to discuss condom use or safer sex in general.
The presented results have several limitations. It remains unclear what percentage of the MSM population owns a profile on the studied website and whether there are differences in the use of online services between certain groups (e.g., by age). The latest survey among MSM in Germany found that 79% of all survey participants used chat, dating, or contact online services monthly or even several times a week. 
Among participants over 44 years of age, 9% did not use any online service, while 5% to 6% in the younger age groups stated that they never use online services [31]. It is also possible that a single user has several profiles for different regions (e.g., one for his work place and one for his place of residence) or for different purposes (long-term relationship vs. casual dating). In summary, it is not possible to determine the proportion of gay and bisexual men that use PlanetRomeo and to incorporate this in the calculation of the overall number of MSM in Germany. Thus, our estimates can only be interpreted as lower boundaries. Furthermore, the data does not allow identifying profiles that are connected to a certain region only for a short period of time (e.g., vacation, business trip, etc.). These limitations allow for both under- and overestimation of the lower boundary of the number of MSM in Germany and may also be a biasing factor when analysing the regional distribution of MSM. In this context, the low standardised relative frequency in the state of Brandenburg may be the result of residents of this state choosing the more attractive but close-by state of Berlin instead of their actual place of residence. In general, federal city states have a higher density of profiles. Furthermore, as the data is self-reported online, the numbers may differ significantly from actual behaviour and need to be interpreted with caution.
Fig 3. https://doi.org/10.1371/journal.pone.0212175.g003

Conclusions
In the absence of other data sources, social media websites may be one way to retrieve information about lower limits of population sizes and behaviour data of specific populations. The present study provided numbers on the total population size and age-specific rates of MSM in Germany as well as sub-group-specific information on stated behaviour. This data can be used for tailoring health promotion and to inform modelling studies, which otherwise might need to rely strongly on assumptions about the population size and behavioural parameters.

Supporting information
S1 File. Dataset. Dataset containing the collected data from PlanetRomeo.