Real-life helping behaviours in North America: A genome-wide association approach

In humans, prosocial behaviour is essential for social functioning. Twin studies suggest that this distinctly human trait is partly hardwired. In the last decade, research on the genetics of prosocial behaviour has focused on neurotransmitters and neuropeptides, such as oxytocin and dopamine, and their respective pathways. Recent large-scale medical studies targeting the genetic basis of complex diseases such as Alzheimer's disease and schizophrenia also open new directions for behavioural genetics. Based on data from 10,713 participants of the American Health and Retirement Study, we estimated the heritability of helping behaviour, i.e. its total variance explained by 1.2 million single nucleotide polymorphisms, to be 11%. Both standard linear regression models and mixed linear models identified rs11697300, an intergenic variant on chromosome 20, as a candidate variant moderating this particular helping behaviour. We suggest that this so far undescribed region merits further investigation in association with human prosocial behaviour.


Introduction
Prosocial behaviour, that is, voluntary behaviour intended to benefit others [1], is essential for social functioning in humans, who, next to eusocial insects, form the largest cooperative living groups on Earth. Extensive research has focused on individual differences in this multifaceted trait, which covers concepts such as helping, cooperation, altruism, and empathy [2][3][4]. Ever since Hamilton [5], the evolution of social behaviour at the species level has been discussed in terms of genetics. Unsurprisingly, the traditional twin study approach suggests that human prosocial behaviour is partly hardwired. Its heritability is typically estimated at between 10 and 60%, increasing with age and varying with the concept of prosocial behaviour under investigation [6][7][8][9].
At the individual level, however, we are only beginning to understand the genetic influences on human (pro)social behaviour. Research on the regulatory effects of neuropeptides such as oxytocin and vasopressin on social cognition and behaviour [10,11], and the search for their genetic basis, have produced several candidate genes. These include the oxytocin receptor gene (OXTR) and the arginine vasopressin receptor 1A gene (AVPR1A), as well as genes of the dopamine and serotonin pathways involved in receptors (DRD4, 5), degradation (COMT, MAOA), and transport (DAT, SERT). Studies of these candidate genes found associations with social cognitive functioning, complex medical conditions, and social behaviour [12][13][14][15][16][17]. The predominant method for investigating the genetic basis of prosocial behaviour and decision-making is the use of incentivized laboratory experiments derived from experimental economics, which complement behavioural genetics approaches [18][19][20][21][22]. The games commonly employed in behavioural economics experiments (e.g. Dictator Game, Ultimatum Game, Trust Game, Public Goods Game) are easily adaptable and are increasingly combined with brain imaging techniques to generate insights into the neurobiological basis of economic decision-making [23]. Beyond this modularity, the approach gives researchers experimental control: one variable can be varied while all other conditions are kept constant. This both facilitates the interpretation of results and simplifies replication.
Nonetheless, this approach has several drawbacks, whose severity varies with the field of application. The sample size of laboratory-based experiments is often small, limiting the generalizability of the results [24]. The trade-off between internal validity in the laboratory and external validity is a genuine and widely discussed problem [25]. Increasing the sample size requires costly and time-consuming logistics, especially when researchers combine standard games with brain imaging techniques and behavioural genetics approaches. Consequently, such combined studies commonly employ a target gene approach that allows only a small number of genetic variants to be analysed.
Today, the increasing number of predominantly medical studies provides a vast collection of genetic data of large study samples. Their aim is to reveal genetic influences on complex diseases such as Alzheimer's disease, breast cancer, and schizophrenia using genome-wide association approaches [26][27][28].
These studies are often designed as longitudinal studies that follow their participants over long periods of time (Wisconsin Longitudinal Study http://www.ssc.wisc.edu/wlsresearch/, Health and Retirement Study http://hrsonline.isr.umich.edu/index.php, Avon Longitudinal Study of Parents and Children http://www.bristol.ac.uk/alspac/). The study teams also collect comprehensive phenotypic data beyond basic demographic information and medical conditions. These data sets therefore provide an excellent opportunity to investigate genetic influences on 'everyday' prosocial behaviour beyond strictly controlled laboratory-based experiments and on a much larger sample base. At the same time, recent progress in estimating heritability from whole genome sequence data [29] enables heritability research beyond the traditional twin study design.
To date, genome-wide association studies (GWAS) have rarely been used to identify the genomic basis of behavioural traits, apart from research on mental disorders. Although GWAS have historically explained only a small proportion of the variance in many of the complex traits studied, they are well suited to detecting unknown causal variants associated with a trait because, in contrast to candidate gene tests, they are hypothesis-free. They therefore offer the opportunity to gain completely new insights into the genetic basis of behaviour. In addition, large data sets of unrelated individuals allow the genome-wide variance explained to be estimated; because such estimates capture only the variance tagged by common variants, they usually represent underestimates. A typical problem of GWAS is their limited potential to describe biological mechanisms. Gene set analysis addresses this issue: it uses GWAS results, which describe a limited number of SNPs significantly associated with a trait, to estimate associations between the trait and entire gene sets with known biological functions [30]. GWAS results also form the basis for estimating genetic correlations. Investigating such associations between complex traits and diseases is especially relevant for gaining etiological insights into causal relationships [31].
All these points taken together, large study data sets provide a promising basis to explore new directions in behavioural genetics.
The goal of this study is to demonstrate new ways of exploring and investigating the genetic basis of (pro)social behaviour and decision-making using established methods from research on medical and complex diseases. As with complex diseases, the genetic basis of a given human behaviour is complex and heavily dependent on various influencing factors. However, unlike at least some complex diseases, which have specified, measurable symptoms, human prosocial behaviour is much more difficult to measure, quantify, and describe.
This leads to probably the single most important limitation of the study presented here: the phenotypic representation of human prosocial behaviour by self-reported helping behaviour. The amount of time a person spends helping out his or her family, friends, and neighbours without being paid by no means covers the entire spectrum of prosocial behaviour. However, we consider it a valid real-life approximation of a well-defined aspect of prosocial behaviour. Observations of real-life helping behaviour towards friends and family essentially approximate the degree of helpfulness a person exhibits in everyday life. Unlike in standardized laboratory experiments, we can only speculate on the reasons for these observations based on the information at hand (the questionnaire). Generally, helping behaviour towards friends and family may be accounted for by Hamilton's rule of kin selection (family) or by the basic principle of direct reciprocity [32]. The latter has often been targeted in well-constructed laboratory designs using behavioural economic settings in which participants, commonly under cover of anonymity, engage in financially relevant interactions that require decisions under uncertainty. To create an environment that resembles real-life interactions among fellow humans, these interactions are repeated many times, so that reputation and a history of (dis)trust can be established. These studies have taught us about facilitators of and obstacles to the development of pro- and antisocial behaviour.
Using data from the Health and Retirement Study, we are able to go beyond this laboratory setting and actually assess a degree of helpfulness in real life. This comes, of course, at the cost of not being able to reconstruct the motivations underlying these decisions.
The study at hand is limited to investigating a very narrow spectrum of human prosocial behaviour, namely individual differences in helping behaviour towards family and friends. Although it cannot provide answers comparable to those of standardized laboratory studies, its exploratory approach may well point to new directions for investigating human prosocial behaviour.

Results
Based on the University of Michigan's Health and Retirement Study (HRS), an ongoing longitudinal panel study that collects survey data, anthropometric measurements, and physical performance tests and in which more than 10,000 Americans have been genotyped, we used self-reported helping behaviour (SHB) to run a genetic association analysis on 1.2 million SNPs.
Hence, rs11697300 seems to represent a phylogenetically "old" variant in the Hominidae. However, any further evolutionary conclusions drawn from the available information must, for now, remain purely speculative.
We confirmed the robustness of the genetic association results with a linear model including six covariates from principal component analysis (Methods, PLINK). Again, only one locus exceeded genome-wide significance in association with SHB (SNP = rs11697300, P = 2.52 × 10⁻⁹), and again the region around SLC30A9 contained many SNPs approaching genome-wide significance. Table 1 summarizes the top 10 SNPs for both genetic association analyses. S1 Fig shows Manhattan and Q-Q plots for the PLINK results. The LD Score regression intercept, used as an estimate of genomic inflation, was 1.0318 (for comparison, λGC = 1.0466).
Genetic variance estimation followed Yang et al. [33]. Using the GREML-LDMS method on 10,713 unrelated individuals, we estimated that 1,244,134 SNPs (MAF > 5%) explain 11% (standard error (s.e.) = 2.9%) of the variance in self-reported helping behaviour (S2 Table).
Genome-wide Association Studies Identify Genetic Loci Associated With Albuminuria in Diabetes, by Teumer et al. 2016 [36] (P = 0.0313; P = 0.0434). Studies a) and b) are flagged as "Caution" by LDHub because "using this data may yield results outside bounds due to relative low Z score". Nevertheless, there appears to be a genetic correlation between the present GWAS on SHB and GWAS on metabolism and diabetes (Table 2 summarizes the genetic correlations).

Discussion
Prosocial behaviour is a distinctly human trait that is partly influenced by genetic factors [6][7][8]. Our genome-wide association analysis was based on data collected by the Health and Retirement Study covering over 10,000 individuals and more than 1.2 million SNPs.
Our results indicate that one locus, rs11697300, an intergenic variant located between solute carrier family 52 (riboflavin transporter), member 3 (SLC52A3), and scratch family zinc finger 2 (SCRT2) on chromosome 20, is associated with self-reported helping behaviour. To date, no literature is available on the function of this variant or of variants in strong linkage disequilibrium (LD) with rs11697300 (S4 Table, based on data provided by the 1000 Genomes Project [37]; S5 Table, based on the HRS data set, gives P-values and effect sizes for all SNPs in high LD with rs11697300).
For the last decade, research on genetic influences on prosocial behaviour has focused on neuropeptides such as oxytocin and their pathway genes [38,39]. Our results hint at previously undescribed regions of the human genome that may influence human helping behaviour. Note that, although we computed the GWAS with two different methods (GCTA and PLINK), the lack of comparable studies meant that we could not replicate these results in an independent data set to gain further insight into the validity of the results obtained from HRS data. Unfortunately, to our knowledge no study available today would qualify as a replication sample, either in scope or in the range of behaviours investigated. Apart from that, this study is subject to the general limitations common to all GWAS [40]. GWAS mainly report correlations between genetic loci and phenotypes; like other correlational methods, a GWAS cannot prove causality. The genetic correlation and gene set analyses we applied (discussed below) may hint at the underlying biological mechanisms. However, future studies will need to investigate our results at a functional and physiological level, ideally clarifying the pathway from genotype to phenotype. Moreover, because of the LD structure of the genome, GWAS are mainly designed to detect associations with relatively common variants in a population. Importantly, and typically for GWA studies, the SNPs found to be significantly associated with a trait usually explain only a small proportion of its total variance. Accordingly, we applied the method of Yang et al. [29], which estimates the variance of a trait explained by all SNPs of a genome, to calculate the heritability of "helping behaviour" attributable to additive effects.
Owing to the sample size of over 10,000 unrelated individuals, this method yielded a robust estimate of heritability even for a substantially skewed measure of the trait "helping behaviour" (Table II) [41]. Existing studies report heritability estimates for prosocial behaviour between 10 and 60%. Estimates of 10 to 20% were found in a twin study using cooperative behaviour in the trust game as the behavioural measure [8]. An estimate of 61% was found in a twin study by Knafo and Plomin 2006 [7] using parent and teacher ratings on a validated behaviour questionnaire. While measures of single behaviours (such as cooperative behaviour in the trust game) yield lower estimates, measures that combine observations of different behaviours [8] yield higher ones. The SHB measure used in this study, which yielded an estimate of 11%, captures only one dimension of human prosocial behaviour, namely "hours spent helping friends and family", and is therefore more comparable to the former, single-behaviour measures. We expect that additional data on prosocial behaviour will become available in the future and could be integrated into a more comprehensive variable of "prosocial behaviour". A more comprehensive and robust measure might then yield a higher "heritability coefficient" (the total variance explained by genome-wide data).
However, our approach to heritability estimation differs from "classical" twin study designs for calculating the heritability of prosocial behaviour (e.g. [7]): it estimates the variance of the trait explained jointly by all SNPs of the genome, i.e. the heritability of self-reported helping behaviour due to additive effects.
Interestingly, although intuitively no association between SHB and the urinary albumin-to-creatinine ratio (microalbuminuria) would be expected, the genetic correlation between SHB and albuminuria may make sense, as albuminuria is known to be associated with lower cognitive functioning, particularly in elderly individuals [42,43]. If cognition in general is affected, prosocial behaviour might be affected as well. This could work directly via mutagenic or pleiotropic effects, or indirectly via confounding effects of disease. Comparable mechanisms may also hold for the correlation between prosocial behaviour and lipoprotein blood levels, as there appears to be an association between cognition and lipoprotein blood levels [44]. At this stage, however, such explanations for the genetic correlations must remain speculative; future studies far beyond the scope of this paper are needed.
The gene set analysis also found significant associations of our results with several gene sets that make biological sense, including sets involving the dopamine receptor genes (DRD1 to DRD5), OXTR, and AVPR1A, all well known in research on social behaviour. Associations with (associative) learning and (negative) regulation of behaviour in particular appear intuitive and support the GWAS results. However, as a correlative approach, a GWAS cannot translate the vague concept of "genetic influence" into causality and determination. Accordingly, the gene sets found to be associated with the present GWAS results should not be over-interpreted, but they may provide a starting point for future analyses and suggest where to look for causality and determination.
Based on our results, we suggest that i) the potential function of rs11697300 and its surrounding region, as well as the other nearly genome-wide significant SNPs in and around SLC30A9, should be investigated in more detail; ii) rs11697300 and the other nearly genome-wide significant SNPs should be investigated in candidate gene approaches, particularly in studies involving both laboratory-based experiments and "everyday" prosocial behaviour; iii) at the phenotypic level, the correspondence between lab and field data (laboratory-based experiments vs. "everyday" prosocial behaviour) should be investigated in more detail, because this issue is still under debate [25,45]; and iv) as mentioned above, additional GWA studies that sample a more comprehensive variety of "prosocial phenotypes" should be conducted in the future.
In conclusion, this study points towards new possible directions for research in behavioural genetics. We present results suggesting an association between yet undescribed genetic variants and human prosocial behaviour.
We encourage other studies to replicate and expand upon our findings. This would be an important step forward in clarifying the biological function of the detected loci and in supporting the notion that these regions are associated with prosocial behaviour.

Study description
The University of Michigan Health and Retirement Study (HRS) is an ongoing longitudinal panel study designed to monitor changes in labour force participation and health transitions of individuals toward the end of working life and beyond. The current sample consists of 22,037 Americans over age 50. Sampling is based on a national probability sample designed to represent the entire American population. HRS collects survey data (demographic variables, physical and psychological well-being, life and job history, assets and financials, etc.), anthropometric measurements, and physical performance tests (e.g. body height, body weight, blood pressure, grip strength), as well as blood and saliva samples.
The Health and Retirement Study (Project #6192) genetic data is sponsored by the National Institute on Aging (grant numbers U01AG009740, RC2AG036495, and RC4AG039029) and was conducted by the University of Michigan [46]. Collection and production of HRS data comply with the requirements of the University of Michigan's Institutional Review Board (IRB). For a detailed description of the study, see http://hrsonline.isr.umich.edu/index.php. This individual research project was approved by the Ethics Committee of the University of Vienna (Reference number 00077); data use was approved by the National Center for Biotechnology Information Genotypes and Phenotypes Database (NCBI dbGaP) Data Access Request system at the National Institutes of Health (Project ID 6192).

Genotypic data
Based on voluntary participation, genotyping was performed on saliva samples. In total, 12,507 individuals have been genotyped since 2006. Genotyping was performed at the Center for Inherited Disease Research (CIDR) using the Illumina HumanOmni2.5-4v1 array and the calling algorithm GenomeStudio version 2011.2, Genotyping Module 1.9.4, and GenTrain version 1.9. The median call rate is 99.7%, and the error rate estimated from 336 pairs of study sample duplicates is 6 × 10⁻⁵. Further quality control steps were taken by teams at the University of Washington (UWGCC), the Health and Retirement Study investigators' team, and dbGaP. In total, 2,443,179 SNPs were genotyped. After several steps of stringent quality control, 1,244,134 SNPs were left for each participant. Quality control steps included dropping duplicate SNPs and SNPs with a missing call rate ≥ 2%, a Hardy-Weinberg equilibrium (HWE) P-value < 10⁻⁴ in either European or African samples, or a MAF < 0.05. Table 3 presents a detailed QC pipeline summary with the number of SNPs lost at each step (for more details on the quality control process, see http://hrsonline.isr.umich.edu/sitedocs/genetics/HRS_QC_REPORT_MAR2012.pdf).
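The three SNP-level filters listed above (missing call rate, HWE, MAF) can be illustrated with a short Python sketch. It is a minimal, hypothetical re-implementation for a single SNP, not the pipeline run by CIDR, UWGCC, or dbGaP: the `SnpCounts` record and `passes_qc` function are illustrative names, HWE is assessed with a 1-df chi-square goodness-of-fit test (the HRS QC may well have used an exact test instead), and pooling the missing rate and MAF across ancestry groups is our assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one biallelic SNP in one ancestry group (hypothetical format)."""
    hom_ref: int  # AA calls
    het: int      # AB calls
    hom_alt: int  # BB calls
    missing: int  # failed calls


def n_called(c: SnpCounts) -> int:
    return c.hom_ref + c.het + c.hom_alt


def hwe_chisq_p(c: SnpCounts) -> float:
    """Hardy-Weinberg P value from a 1-df chi-square goodness-of-fit test."""
    n = n_called(c)
    p = (2 * c.hom_ref + c.het) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (c.hom_ref, c.het, c.hom_alt)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chisq / 2))


def passes_qc(groups, max_missing=0.02, min_hwe_p=1e-4, min_maf=0.05) -> bool:
    """Return True if a SNP survives the missing-rate, HWE and MAF filters.

    `groups` maps an ancestry label (e.g. "EUR", "AFR") to SnpCounts.
    Missing rate and MAF are pooled across groups (assumption); HWE is
    tested within each group and the SNP fails if any group fails.
    """
    fields = ("hom_ref", "het", "hom_alt", "missing")
    pooled = SnpCounts(*(sum(getattr(g, f) for g in groups.values()) for f in fields))
    n = n_called(pooled)
    if n == 0 or pooled.missing / (n + pooled.missing) >= max_missing:
        return False  # no calls, or missing call rate >= 2%
    if any(n_called(g) > 0 and hwe_chisq_p(g) < min_hwe_p for g in groups.values()):
        return False  # HWE P < 1e-4 in at least one group
    p = (2 * pooled.hom_ref + pooled.het) / (2 * n)
    return min(p, 1.0 - p) >= min_maf  # drop MAF < 0.05


print(passes_qc({"EUR": SnpCounts(250, 500, 250, 0)}))  # True
print(passes_qc({"EUR": SnpCounts(250, 500, 250, 0),
                 "AFR": SnpCounts(100, 800, 100, 0)}))  # False
```

The second call shows the "either European or African samples" rule: a SNP in perfect equilibrium in one group is still dropped when the other group shows a large heterozygote excess.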
After removing 172 related individuals (80 families of two and four families of three individuals) from the initial pool of 12,507 study participants, 12,235 individuals were left in the subject pool. Families were defined as individuals connected by a kinship coefficient (KC) > 0.1. This threshold corresponds to the expected KC of half-siblings minus two standard deviations.

Phenotype
Based on the HRS questions on the time spent helping family, friends, and neighbours without pay, we merged the eight possible combinations of answers into five categories of hours spent helping others: 0, 1 to 50, 51 to 100, 101 to 200, and more than 200. Table 4 summarizes the possible combinations and gives the distribution of participants across the categories.

Genetic association analysis
A total of 10,713 individuals with non-missing answers on SHB were matched to 1,244,134 SNPs. Genetic association analysis was carried out with i) a linear mixed model in which the aggregate effects of SNPs are treated as random via a genetic relatedness matrix (GRM), and ii) a standard linear regression with six principal component analysis eigenvectors as covariates.
GCTA. GCTA (version 1.25) provides options to perform mixed linear model (MLM)-based association analyses [47]. MLM-based association is widely recognized as the method of choice for association mapping when sample structure is present. It constructs a GRM that models the genome-wide sample structure; a random-effects model then estimates the contribution of the GRM to phenotypic variance, and the association statistics are calculated accounting for this variance component [48].
We implemented the GCTA-LOCO approach, which tests markers on a given chromosome using a GRM calculated from the remaining chromosomes. This leave-one-chromosome-out (LOCO) method avoids double-fitting the candidate marker (once as a fixed effect and once within the GRM) and increases the power of the analysis compared with regular MLM approaches as well as linear regression [48,49].
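To make the GRM and the leave-one-chromosome-out idea concrete, the following Python sketch builds a relatedness matrix from 0/1/2 genotype rows using the standardized-genotype formula from the GCTA literature and then assembles one GRM per chromosome from the SNPs on all other chromosomes. It is illustrative only: the function names are hypothetical, the diagonal uses the same formula as the off-diagonal elements (GCTA applies a different estimator to the diagonal), and the REML fitting and association test that GCTA performs on top of these matrices are not shown.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def grm(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """Relatedness matrix from SNP-major rows of 0/1/2 allele counts.

    A[j][k] = (1/M) * sum_i (x_ij - 2p_i)(x_ik - 2p_i) / (2 p_i (1 - p_i)),
    applied to every pair including j == k (GCTA uses a different diagonal).
    """
    usable = []
    for row in genotypes:
        p = sum(row) / (2 * len(row))  # frequency of the counted allele
        if 0 < p < 1:                  # monomorphic SNPs carry no information
            usable.append((row, p))
    if not usable:
        raise ValueError("no polymorphic SNPs")
    n = len(usable[0][0])
    a = [[0.0] * n for _ in range(n)]
    for row, p in usable:
        scale = 2 * p * (1 - p)
        centred = [x - 2 * p for x in row]
        for j in range(n):
            for k in range(n):
                a[j][k] += centred[j] * centred[k] / scale
    m = len(usable)
    return [[v / m for v in r] for r in a]


def loco_grms(genotypes, chromosomes) -> Dict[int, Matrix]:
    """For each chromosome, a GRM built only from SNPs on the other chromosomes."""
    result = {}
    for chrom in sorted(set(chromosomes)):
        others = [g for g, c in zip(genotypes, chromosomes) if c != chrom]
        result[chrom] = grm(others)
    return result
```

When a SNP on chromosome c is tested, the model would use `loco_grms(...)[c]`, so the tested marker never contributes to its own background relatedness.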
PLINK. In PLINK (version 1.07), we used the built-in standard linear regression for quantitative traits [50] to test for associations between genotype and self-reported helping behaviour, including the PCA eigenvectors as covariates to control for population stratification, as recommended by the Health and Retirement Study based on Patterson et al. [51]. PCA results are provided by the Health and Retirement Study. After LD pruning of the set of autosomal SNPs with a missing call rate < 5% and MAF > 5%, and after excluding the regions LCT, HLA, 8p23, and 17q21.31, 154,644 SNPs were selected for PCA. For details, see http://hrsonline.isr.umich.edu/sitedocs/genetics/HRS_QC_REPORT_MAR2012.pdf.
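A per-SNP version of this model, i.e. ordinary least squares of the phenotype on an intercept, the allele dosage, and the PCA eigenvectors, can be sketched in pure Python as below. This is an illustration under stated assumptions rather than PLINK's code: `snp_test` and `solve` are hypothetical helpers, and the P value uses a normal approximation to the t distribution, which is reasonable at n ≈ 10,000 but not exact for small samples.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def snp_test(y: Sequence[float], dosage: Sequence[float], covariates):
    """OLS of y on [1, dosage, covariates...]; returns (beta, se, p) for the SNP."""
    rows = [[1.0, g] + list(c) for g, c in zip(dosage, covariates)]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    # Var(beta_1) = sigma^2 * [(X'X)^-1]_11, taken from column 1 of the inverse
    e1 = [1.0 if i == 1 else 0.0 for i in range(k)]
    se = math.sqrt(sigma2 * solve(xtx, e1)[1])
    t = beta[1] / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal approximation
    return beta[1], se, p
```

In the analysis described above, `covariates` would hold the six eigenvectors for each individual, and `snp_test` would be called once per SNP.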

Genomic inflation
We used the Python tool LDSC to estimate genomic inflation (https://github.com/bulik/ldsc/wiki/Heritability-and-Genetic-Correlation). LDSC quantifies genomic inflation as the proportion of the inflation in the mean χ² that LD Score regression attributes to causes other than polygenic heritability [52]. Unlike λGC, the LD Score regression intercept, used as an estimate of inflation, is not biased by sample size in the presence of polygenicity [53].
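The two inflation measures can be contrasted in a few lines of Python. λGC divides the median observed χ² by the median of a 1-df χ² distribution (about 0.4549), whereas the LDSC intercept comes from regressing χ² on LD score. The sketch below uses a plain unweighted least-squares fit purely for illustration; LDSC itself uses a weighted regression with reference-panel LD scores and sample-size terms, so the helper names and the simplified fit are assumptions, not the tool's API.

```python
import statistics
from typing import Sequence, Tuple

# Median of a chi-square distribution with 1 degree of freedom (approximately)
CHI2_1DF_MEDIAN = 0.4549364


def lambda_gc(chisq: Sequence[float]) -> float:
    """Genomic control factor: observed median chi-square over its null median."""
    return statistics.median(chisq) / CHI2_1DF_MEDIAN


def ld_score_regression(chisq: Sequence[float],
                        ld_scores: Sequence[float]) -> Tuple[float, float]:
    """Unweighted least-squares fit chi2_j ~ intercept + slope * l_j (simplified)."""
    mx = statistics.fmean(ld_scores)
    my = statistics.fmean(chisq)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ld_scores, chisq))
    sxx = sum((x - mx) ** 2 for x in ld_scores)
    slope = sxy / sxx
    return my - slope * mx, slope


def ldsc_ratio(intercept: float, chisq: Sequence[float]) -> float:
    """Share of the mean chi-square inflation attributed to non-polygenic causes."""
    return (intercept - 1) / (statistics.fmean(chisq) - 1)
```

Under pure polygenicity without confounding, χ² rises with LD score while the intercept stays near 1; λGC, which ignores LD, grows with sample size in that situation, which is why the intercept is the preferred estimate here.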

Genetic variance estimation
We estimated genetic variance with GCTA's GREML-LDMS method [33] using whole genome sequence data. Because this method cannot account for variance attributable to extremely rare causal variants or to variants that are not polymorphic in the data set, our estimate of the genetic variance is likely a slight underestimate. The analysis was conducted in four steps using GCTA [47] (steps i, iii, and iv) and the R statistical programming software [54] (step ii): (i) calculation of segment-based LD scores; (ii) SNP stratification based on the LD scores from (i) and on MAF; (iii) calculation of four GRMs from the stratified SNPs, based on the quartiles of the LD score; and (iv) a REML analysis fitting these multiple GRMs jointly [33].
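Step (ii), done in R in the original analysis, can be sketched in Python as follows, assuming each SNP comes with its MAF and the segment-based LD score from step (i). The quartile boundaries are computed over all SNPs with `statistics.quantiles(..., method="inclusive")`, which matches R's default quantile type; this choice, the single default MAF bin (reflecting the MAF > 5% filter), the input layout, and the function name are our assumptions about how the grouping was done. Each resulting group would then feed one GRM in step (iii).

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def stratify_snps(snps: Sequence[Tuple[str, float, float]],
                  maf_edges=(0.05, 0.5)) -> Dict[Tuple[int, int], List[str]]:
    """Group SNPs by MAF bin and by quartile of their segment-based LD score.

    snps: (snp_id, maf, ld_score) tuples (hypothetical input layout).
    Returns {(maf_bin, ld_quartile): [snp_id, ...]} with quartiles numbered 1-4.
    """
    q1, q2, q3 = statistics.quantiles([ld for _, _, ld in snps], n=4,
                                      method="inclusive")
    groups: Dict[Tuple[int, int], List[str]] = {}
    for snp_id, maf, ld in snps:
        maf_bin = next((i for i in range(len(maf_edges) - 1)
                        if maf_edges[i] <= maf <= maf_edges[i + 1]), None)
        if maf_bin is None:
            continue  # outside the analysed MAF range
        quartile = 1 + (ld > q1) + (ld > q2) + (ld > q3)  # booleans count as 0/1
        groups.setdefault((maf_bin, quartile), []).append(snp_id)
    return groups
```

With only one MAF bin, this yields the four LD-score groups, and hence the four GRMs, described above.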

Genetic correlation
We used the online tool LDHub (http://ldsc.broadinstitute.org) to estimate potential genetic correlations between SHB and 177 diseases and traits gathered from publicly available resources and consortia. Estimation is based on the summary-level results of the present GWAS on SHB and the summary results of those 177 GWAS [55]. LDHub is implemented based on Bulik-Sullivan et al. 2015a [52] and Bulik-Sullivan et al. 2015b [31]. The method regresses the product of the two GWAS test statistics for each genetic variant across the genome on that variant's LD score, which measures the variant's ability to tag other variants locally (a detailed explanation can be found in Bulik-Sullivan et al. 2015a [52]).
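For reference, the regressions LDHub relies on can be written in their standard form (Bulik-Sullivan et al. 2015a [52] and 2015b [31]); the notation here is ours. For SNP j with LD score ℓ_j, χ²_j is its association statistic in a single GWAS with sample size N, z_{1j} and z_{2j} are its Z scores in the two GWAS with sample sizes N_1 and N_2, N_s is the number of overlapping samples, ρ the phenotypic correlation among them, M the number of SNPs, a a confounding term, h² (h_1², h_2²) the SNP heritabilities, and ρ_g the genetic covariance:

```latex
\begin{aligned}
\mathbb{E}\left[\chi^2_j\right] &= \frac{N h^2}{M}\,\ell_j + N a + 1,\\
\mathbb{E}\left[z_{1j}\, z_{2j}\right] &= \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho\, N_s}{\sqrt{N_1 N_2}},\\
r_g &= \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}.
\end{aligned}
```

Because sample overlap enters only through the intercept of the cross-trait regression, overlapping cohorts do not by themselves bias the estimated genetic correlation r_g.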

Gene set analysis
We applied the gene set analysis (GSA) approach developed by Nam et al. [30], implemented in the Java application "GSA-SNP" (https://sourceforge.net/projects/gsa-snp/files/?source=navbar), to the present GWAS results (each SNP with its P value from the GWAS). GSA assigns a SNP to a gene if the SNP lies within the gene boundaries extended by a padding region. Genes are grouped into gene sets of known function. We used the "Gene Ontology" gene sets (default) with a padding size of ±20,000 bp and the k-th best P value (k = 2, default). P values were corrected according to Benjamini and Hochberg [56]. The GSA-SNP analysis uses the PAGE method [57]. Details of the method can be found in Nam et al. 2010 [30] and Kim et al. 2005 [57].
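The scoring steps described above can be sketched in Python: a gene score is −log10 of the k-th best SNP P value within the padded gene, a gene set is scored with a PAGE-style Z statistic, and set-level P values are adjusted with the Benjamini-Hochberg procedure. The function names, the input layout, the use of the population SD, and the one-sided normal P value are our assumptions; GSA-SNP's own implementation details may differ.

```python
import math
import statistics
from typing import Dict, Iterable, List, Sequence, Tuple


def gene_scores(snps: Sequence[Tuple[int, float]],
                genes: Dict[str, Tuple[int, int]],
                padding: int = 20_000, k: int = 2) -> Dict[str, float]:
    """-log10 of the k-th best SNP P value inside each gene +/- padding.

    snps: (position, p) pairs on one chromosome; genes: {name: (start, end)}.
    Genes with fewer than k SNPs receive no score.
    """
    scores = {}
    for name, (start, end) in genes.items():
        ps = sorted(p for pos, p in snps if start - padding <= pos <= end + padding)
        if len(ps) >= k:
            scores[name] = -math.log10(ps[k - 1])
    return scores


def page_z(scores: Dict[str, float], gene_set: Iterable[str]) -> float:
    """PAGE-style Z: (set mean - overall mean) * sqrt(m) / overall SD."""
    values = list(scores.values())
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    members = [scores[g] for g in gene_set if g in scores]
    return (statistics.fmean(members) - mu) * math.sqrt(len(members)) / sigma


def upper_tail_p(z: float) -> float:
    """One-sided normal P value for enrichment of high gene scores (assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Using the second-best rather than the best SNP P value (k = 2) makes a gene score less sensitive to a single outlying SNP, at the cost of ignoring genes covered by only one SNP.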