Text S1
Supplementary Text on simulation of GWA data
Data were simulated by randomly selecting one of the 10 phased ENCODE regions (http://www.hapmap.org) and choosing SNPs at random, weighted by minor allele frequency (MAF), so that their density and frequency distribution (mean density 1 SNP per 5.8kb, median distance 3.5kb, mean MAF 0.27, 96% of genome within 10kb of a SNP) were close to those of the Affymetrix 500k array set (mean density 1 SNP per 5.8kb, median distance 2.5kb, mean MAF 0.22, 85% of genome within 10kb of a SNP), although slightly more evenly spaced across the regions. These represented the genotyped SNPs. All SNPs with MAF less than 0.05 were then excluded from both the genotyped and ungenotyped sets, as GWA studies have very little power to detect variants with low MAF.
A non-genotyped SNP was selected at random from the data to be the disease-susceptibility locus. Selection of the disease-susceptibility locus was weighted by frequency to follow Wright's formula [Wright, 1949[31]], as used in [Pritchard, 2001[26]] (Figure S1),
$$f(p) = k\,p^{\beta_S-1}(1-p)^{\beta_N-1}e^{\sigma(1-p)}$$
where $p$ is the allele frequency, $\sigma$ represents selection, and $\beta_S$ and $\beta_N$ are the mutation rates to the disease and non-disease states respectively; $k$ is a normalization constant. Here $\sigma=12$, $\beta_N=0.01$ and $\beta_S=0.1$, 1 or 3. These values give realistic distributions with disease alleles either almost exclusively low frequency ($\beta_S=0.1$), mostly low frequency but some more frequent ($\beta_S=1$) or mainly higher frequency ($\beta_S=3$) (Figure S2).
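As an illustration of the frequency weighting, the following minimal sketch evaluates Wright's formula at the allele frequencies of a set of candidate SNPs and rescales the values so that they sum to one. The function names and the idea of normalising over a finite candidate list (which plays the role of $k$) are assumptions made for this example, not a description of the code actually used.

```python
import math

def wright_density(p, beta_s, beta_n=0.01, sigma=12.0):
    """Unnormalised Wright density at allele frequency p (0 < p < 1)."""
    return (p ** (beta_s - 1.0)
            * (1.0 - p) ** (beta_n - 1.0)
            * math.exp(sigma * (1.0 - p)))

def selection_weights(freqs, beta_s, beta_n=0.01, sigma=12.0):
    """Selection probabilities for candidate SNPs with the given frequencies.

    Dividing by the total stands in for the normalisation constant k
    (an assumption of this sketch).
    """
    raw = [wright_density(p, beta_s, beta_n, sigma) for p in freqs]
    total = sum(raw)
    return [w / total for w in raw]
```

The resulting weights could be passed to, for example, `random.choices(candidates, weights=...)` to draw one disease-susceptibility locus per simulation.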
Case-control samples were generated by sampling entire haplotypes at random, with replacement, based on the allele at the disease locus. Although this could inflate LD between SNPs, the ENCODE samples are large and the analyses depend only on pairwise LD between common markers, so little such inflation is likely. Case-control data were simulated with GRRs of 1.2, 1.5 and 2, a multiplicative mode of inheritance, a population prevalence of disease of 5% and equal numbers of cases and controls (n=1000 or 3000). The Cochran-Armitage trend test was applied to all genotyped SNPs. The observed allele frequencies and the estimated GRRs (assuming a multiplicative mode of inheritance) were recorded for the disease SNP and the most strongly associated genotyped SNP. This procedure was repeated 10,000 times (50,000 times for GRR=1.2, n=1000) and the frequency and effect size distributions were examined. We restricted the results to those that reached a significance of $5\times10^{-7}$, as in the Wellcome Trust study [WTCCC, 2007[6]].
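The single-SNP part of this procedure can be sketched as follows. The example works at the genotype level rather than resampling haplotypes, and it assumes Hardy-Weinberg proportions in the general population, which is not stated above. All function names are hypothetical. The trend test uses additive scores 0, 1, 2 and a one-degree-of-freedom chi-square reference distribution.

```python
import math
import random

def genotype_freqs(p, grr, prevalence=0.05):
    """Genotype frequencies (0, 1, 2 risk alleles) in cases and controls.

    Assumes Hardy-Weinberg proportions in the population and a
    multiplicative model with penetrances f0, f0*grr, f0*grr**2.
    """
    pop = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    f0 = prevalence / (1 - p + p * grr) ** 2  # baseline penetrance
    pen = [f0 * grr ** g for g in range(3)]
    cases = [f * q / prevalence for f, q in zip(pen, pop)]
    controls = [(1 - f) * q / (1 - prevalence) for f, q in zip(pen, pop)]
    return cases, controls

def draw_counts(freqs, n, rng):
    """Sample n genotypes and return counts for 0, 1, 2 risk alleles."""
    counts = [0, 0, 0]
    for g in rng.choices([0, 1, 2], weights=freqs, k=n):
        counts[g] += 1
    return counts

def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (chi-square, p-value), 1 df."""
    totals = [r + s for r, s in zip(cases, controls)]
    n, n_cases = sum(totals), sum(cases)
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * t for x, t in zip(scores, totals))
    sum_x2n = sum(x * x * t for x, t in zip(scores, totals))
    denom = n_cases * (n - n_cases) * (n * sum_x2n - sum_xn ** 2)
    if denom == 0:  # monomorphic SNP or empty group: no evidence of trend
        return 0.0, 1.0
    chi2 = n * (n * sum_xr - n_cases * sum_xn) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# One illustrative replicate: risk-allele frequency 0.2, GRR 1.5, n=1000.
rng = random.Random(1)
case_f, ctrl_f = genotype_freqs(p=0.2, grr=1.5)
chi2, pval = trend_test(draw_counts(case_f, 1000, rng),
                        draw_counts(ctrl_f, 1000, rng))
```

A replicate would count as genome-wide significant in the sense used here when `pval < 5e-7`.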
Correlations were calculated using only those simulations that had reached genome-wide significance ($p<5\times10^{-7}$). The correlation between marker allele frequency and estimated GRR was calculated, but as samples were simulated with fixed GRRs rather than a range of GRRs, the data were combined across GRRs to give one dataset for each sample size and each $\beta_S$. This is equivalent to assuming that GRR is independent of the population model and that GRRs of 1.2, 1.5 and 2 are equally likely to occur.
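The pooling step can be sketched as below, assuming a Pearson correlation (the type of correlation is not specified above) and a hypothetical data layout in which each simulated GRR maps to a list of `(marker_frequency, estimated_grr, p_value)` tuples for one sample size and one $\beta_S$.

```python
import math

ALPHA = 5e-7  # genome-wide significance threshold used in the text

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pooled_frequency_grr_correlation(runs_by_grr, alpha=ALPHA):
    """Pool significant simulations across GRRs, then correlate.

    runs_by_grr: dict mapping simulated GRR -> list of
    (marker_frequency, estimated_grr, p_value) tuples (hypothetical layout).
    """
    pooled = [(freq, est)
              for runs in runs_by_grr.values()
              for freq, est, pv in runs
              if pv < alpha]
    freqs = [freq for freq, _ in pooled]
    ests = [est for _, est in pooled]
    return pearson(freqs, ests)
```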
The correlation between the frequency of the most significant marker and the susceptibility allele was calculated and averaged over the three $\beta_S$ models.
Rather than calculate correlations between the estimated GRR and the true, simulated GRR, we took the absolute difference between the two in each simulation and averaged these differences. Again, this is because samples were simulated with fixed GRRs. Only simulations that had reached $p<5\times10^{-7}$ were included.
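A minimal sketch of this summary, assuming each simulation is stored as a hypothetical `(true_grr, estimated_grr, p_value)` tuple:

```python
def mean_abs_grr_error(runs, alpha=5e-7):
    """Mean |estimated GRR - true GRR| over simulations with p < alpha.

    Returns None when no simulation reaches the threshold.
    """
    diffs = [abs(est - true) for true, est, pv in runs if pv < alpha]
    if not diffs:
        return None
    return sum(diffs) / len(diffs)
```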
The effect of sequencing rather than genotyping was investigated under the same models by assuming that every marker with MAF>0.05 had been genotyped, including the susceptibility locus. Thus we could see whether sequencing improved the discrimination of marker frequency distributions modelled with different values of $\beta_S$. Power was estimated using the simulations for both genotyped and sequenced data.