SUPPLEMENTARY METHODS
Spatial Principal Component Analysis (sPCA)
To investigate the spatial distribution of genetic variability within the Italian Peninsula, we performed a spatial principal component analysis (sPCA) on haplogroup frequencies for both Y-chromosome and mitochondrial DNA data. In classic PCA, eigenvalues are obtained by maximizing the variance of the data. In sPCA, they are obtained by maximizing the product of the variance and the spatial autocorrelation (Moran's I). Spatial information was included in the analysis through a weighting scheme based on a Delaunay connection network [1]. sPCA eigenvalues can be positive or negative, depending on whether Moran's I is positive or negative. The most informative components are those with the largest absolute eigenvalues. Large positive eigenvalues correspond to global structures (cline-like patterns), and large negative eigenvalues correspond to local structures (marked genetic differentiation among neighbours). The presence of global or local structure was further assessed with the global and local random tests implemented in the adegenet package [2]-[4]. Loadings of the most informative components were used to identify the haplogroups that contribute most to the genetic structure of Italian populations.
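The sPCA criterion above can be illustrated with a minimal numpy sketch. This follows the formulation of Jombart et al. [3] and is not adegenet's `spca()` itself. Assume centred frequencies X (sites × haplogroups) and a row-standardised connection matrix L. The components are then the eigenvectors of Xᵀ(L + Lᵀ)X / (2n), and each eigenvalue equals var(Xu) × Moran's I(Xu) for its eigenvector u. A hand-coded chain of neighbours along a hypothetical six-site transect stands in for the Delaunay network. Such a network could instead be derived from site coordinates, for example with `scipy.spatial.Delaunay`. The frequencies are invented for illustration, and the permutation-based global and local tests are not reproduced here.

```python
import numpy as np


def row_standardize(adjacency):
    """Row-standardise a binary connection matrix so each row sums to 1."""
    A = np.asarray(adjacency, dtype=float)
    return A / A.sum(axis=1, keepdims=True)


def morans_i(x, adjacency):
    """Moran's I of x under row-standardised weights: x'Lx / x'x for centred x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    L = row_standardize(adjacency)
    return float(x @ L @ x / (x @ x))


def spca(freqs, adjacency):
    """Eigen-decompose X'(L + L')X / (2n) for centred haplogroup frequencies X.

    Each eigenvalue equals var(Xu) * I(Xu) for its eigenvector u: large
    positive values flag global structure, large negative values local structure.
    """
    X = np.asarray(freqs, dtype=float)
    X = X - X.mean(axis=0)               # centre each haplogroup column
    n = X.shape[0]
    L = row_standardize(adjacency)
    M = X.T @ (L + L.T) @ X / (2 * n)    # symmetric p x p matrix
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]    # most positive first, most negative last
    return eigvals[order], eigvecs[:, order], X


# Hypothetical transect of six sites; neighbours are adjacent sites only.
n_sites = 6
adjacency = np.zeros((n_sites, n_sites))
for i in range(n_sites - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1

# Column 0: a cline; column 1: frequencies alternating between neighbours.
freqs = np.column_stack([
    np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55]),
    np.tile([0.6, 0.4], n_sites // 2),
])

eigvals, loadings, Xc = spca(freqs, adjacency)
print(eigvals)          # first eigenvalue positive (global), last negative (local)
print(loadings[:, 0])   # haplogroup loadings on the global component
```

The cline column drives the positive (global) component and the alternating column drives the negative (local) one. This mirrors how the loadings are used above to single out the haplogroups behind each pattern.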
Discriminant Analysis of Principal Components (DAPC)
The genetic variability of mtDNA and Y-chromosome haplotypes within the main haplogroups was explored by means of a DAPC analysis. The DAPC method [5] aims to describe the diversity between pre-defined groups of observations. Although it was designed to investigate individual genetic data, the method can easily be adapted to the study of haplotypes within haplogroups. As a preliminary step, the data are grouped using k-means, a clustering algorithm that finds a given number of clusters maximizing the variation between groups. The algorithm runs on a Principal Component Analysis (PCA) transformation of the raw data. We retained all principal components in order to preserve all the variation in the original data. The optimal number of clusters is identified by running k-means with increasing values of k (up to a maximum of 20 in our case) and comparing the clustering solutions through the Bayesian Information Criterion (BIC). The 'best' solution corresponds to the lowest BIC. The DAPC procedure proper then consists of two further steps. First, the original data (STR haplotypes) are transformed (centred, in our case) and submitted to a PCA. Second, the retained PCs are passed to a Linear Discriminant Analysis based on the groups identified in the preliminary k-means step. As a result, discriminant functions are constructed as linear combinations of the original variables that have the largest between-group variance and the smallest within-group variance. Membership probabilities are computed from the retained discriminant functions. Concerning the first step, retaining too many PCs with respect to the number of populations can lead to over-fitting of the discriminant functions. In that case, membership probabilities may become drastically inflated for the best-fitting cluster, resulting in apparently perfect discrimination.
As a consequence, we retained as many PCs as needed to represent ~80% of the variation in the original data. The same issue applies to the second step, i.e. the number of retained discriminant functions. In our case, given that the number of investigated clusters was relatively low, all discriminant functions were retained.
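The cluster-number search and the ~80% variance rule can be sketched in numpy as follows. This is a minimal illustration, not adegenet's `find.clusters()` or `dapc()`. Several pieces are assumptions:

- The BIC is taken as n·log(RSS/n) + k·log(n), a common approximation for k-means that should be checked against the adegenet source before relying on it.
- k-means uses k-means++ seeding with a single run per k.
- k-means runs directly on the data. Keeping all PCs is only a rotation of the centred data, so Euclidean distances, and hence the clustering, are unchanged.

The final LDA step is not shown. It could be fitted on the retained PC scores with the k-means labels, for example with scikit-learn's `LinearDiscriminantAnalysis`.

```python
import numpy as np


def pca_variance_ratio(X):
    """Proportion of total variance carried by each PC of the centred data."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)


def n_pcs_for_variance(ratio, target=0.8):
    """Smallest number of PCs whose cumulative variance reaches `target`."""
    k = int(np.searchsorted(np.cumsum(ratio), target)) + 1
    return min(k, len(ratio))


def _kmeans_pp_init(X, k, rng):
    """k-means++ seeding: favour points far from the centres chosen so far."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        total = d2.sum()
        # If every point already coincides with a centre, pick uniformly.
        p = d2 / total if total > 0 else None
        centres.append(X[rng.choice(len(X), p=p)])
    return np.array(centres, dtype=float)


def kmeans(X, k, seed=0, n_iter=100):
    """Lloyd's algorithm; returns labels and the residual sum of squares (RSS)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centres = _kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centres[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.array_equal(new, centres):
            break
        centres = new
    labels = ((X[:, None, :] - centres[None]) ** 2).sum(axis=2).argmin(axis=1)
    rss = float(((X - centres[labels]) ** 2).sum())
    return labels, rss


def find_k(X, max_k=20, seed=0):
    """Run k-means for k = 1..max_k and return the k with the lowest BIC."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    bic = {}
    for k in range(1, min(max_k, n - 1) + 1):
        _, rss = kmeans(X, k, seed=seed)
        # Assumed BIC form; RSS is floored to avoid log(0) for perfect fits.
        bic[k] = n * np.log(max(rss, 1e-12) / n) + k * np.log(n)
    return min(bic, key=bic.get), bic


# Hypothetical toy data: three groups of identical two-locus haplotypes.
X = [[0, 0]] * 10 + [[10, 0]] * 10 + [[0, 10]] * 10
best_k, bic = find_k(X, max_k=10)
n_keep = n_pcs_for_variance(pca_variance_ratio(X), target=0.8)
print(best_k, n_keep)
```

Flooring the RSS matters only for perfectly separated toy data like this. With real haplotypes, the BIC curve is usually inspected for an elbow as well as its minimum.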
Batwing analysis
We established prior distributions covering an expected range congruent with human population history. For the mutation rate prior (muprior), the gamma parameters were set to $k = 1.47$ and $\theta = 2173$ for 25-year generations, where the gamma distribution has the form $f(x;k,\theta)\,dx = \frac{1}{\Gamma(k)}\left(\frac{x}{\theta}\right)^{k-1} e^{-x/\theta}\,d\!\left(\frac{x}{\theta}\right)$. The prior for the ancestral population size was designed to be very flat over the range of likely ancestral values, with $k = 1$ and $\theta = 0.0001$. The population growth rate priors (alphaprior and betaprior) were set to $k = 2$, $\theta = 400$ and $k = 2$, $\theta = 1$, respectively. The number of times parameters were updated between samples was Nbetsamp=10, and the number of times trees were changed before updating parameters was treebetN=20. The number of samples between writes to the output file was picgap=1500000. The total number of samples accumulated in the output file was 3.5 million, of which 1 million were excluded as burn-in.
SNP information was used in the phylogenetic reconstruction but was not considered in the posterior estimates. Chain convergence was evaluated by running three independent chains (started from different seeds). For the parameters of interest, we computed the Gelman-Rubin and Geweke diagnostic statistics [6], [7] with the R package CODA [4], [8].
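The idea behind the Gelman-Rubin diagnostic is to compare between-chain and within-chain variance across independent runs. A minimal pure-Python sketch of the basic potential scale reduction factor (R-hat) for one scalar parameter follows. It assumes the chains are post-burn-in samples of equal length. This is the uncorrected textbook form; CODA's `gelman.diag` additionally applies a degrees-of-freedom correction and reports an upper confidence limit, so its values will differ slightly. The example chains are synthetic, not Batwing output.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor R-hat for one scalar parameter.

    chains: list of m (>= 2) equal-length sequences of post-burn-in samples.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain variance (scaled by n) and mean within-chain variance.
    B = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    W = mean(variance(c) for c in chains)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * W + B / n
    return math.sqrt(var_plus / W)


# Synthetic chains: three runs exploring the same values vs. three runs
# stuck around different means.
base = [(i * 7) % 10 for i in range(100)]
print(gelman_rubin([base, list(base), list(base)]))                 # close to 1
print(gelman_rubin([[x + 10 * j for x in base] for j in range(3)]))  # well above 1
```

Values close to 1 indicate that the three runs sample the same distribution. Values well above 1 indicate that the chains have not yet mixed.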
Jackknife-like procedure for outlier identification
Because SD-based time estimates for DAPC clusters are sensitive to outliers, we designed a jackknife-like procedure to identify them. For each DAPC cluster of N individuals, the variance-based (SD) estimate was recomputed N times on sets of N-1 haplotypes, leaving out one individual at a time from the original data set. If one of the N recomputed estimates differs significantly from the others, an outlier may be present in the original data set. In this case, the best time estimate is the one computed with the "outlier" haplotype excluded. If none of the recomputed estimates differs significantly from the others, the presence of outliers can be ruled out, and we retain the time estimate calculated on the whole original data set. Outlier estimates were identified with Grubbs' test [9] using the R package outliers [10].
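The leave-one-out logic above can be sketched in Python. The sketch uses a two-sided Grubbs' test with the standard critical value based on the t distribution (via `scipy.stats.t.ppf`) rather than the R outliers package. The per-cluster statistic is a hypothetical stand-in for the SD-based estimator: the mean, across loci, of the sample variance of repeat counts. Dividing by a mutation rate to obtain a time would rescale all N values by the same factor. Grubbs' G is scale-invariant, so this rescaling would not change which haplotype is flagged. The repeat counts in the example are invented.

```python
import math
from statistics import mean, stdev, variance

from scipy import stats


def variance_statistic(haplotypes):
    """Hypothetical stand-in: mean across loci of the repeat-count variance."""
    loci = list(zip(*haplotypes))
    return mean(variance(locus) for locus in loci)


def leave_one_out(haplotypes):
    """Recompute the statistic N times, dropping one haplotype each time."""
    return [variance_statistic(haplotypes[:i] + haplotypes[i + 1:])
            for i in range(len(haplotypes))]


def grubbs_outlier(values, alpha=0.05):
    """Two-sided Grubbs' test; returns the index of the outlier or None."""
    n = len(values)
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return None
    idx = max(range(n), key=lambda i: abs(values[i] - mu))
    g = abs(values[idx] - mu) / sd
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t ** 2 / (n - 2 + t ** 2))
    return idx if g > g_crit else None


def jackknife_estimate(haplotypes, alpha=0.05):
    """Return (index of excluded haplotype or None, retained estimate)."""
    loo = leave_one_out(haplotypes)
    idx = grubbs_outlier(loo, alpha)
    if idx is None:
        return None, variance_statistic(haplotypes)  # keep full-data estimate
    return idx, loo[idx]                              # estimate without outlier


# Hypothetical two-locus cluster; the last haplotype is far from the rest.
locus1 = [14, 15, 16, 14, 15, 16, 14, 15, 16, 25]
cluster = [[a, a + 8] for a in locus1]
print(jackknife_estimate(cluster))       # flags index 9
print(jackknife_estimate(cluster[:9]))   # no outlier: full-data estimate
```

Dropping the distant haplotype collapses the variance. This makes its leave-one-out value the one that Grubbs' test singles out, and that value is returned as the cluster estimate.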
References
[1] Brassel KE, Reif D (1979) A procedure to generate Thiessen polygons. Geogr Anal 325:31-36.
[2] Jombart T (2008) adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24:1403-1405.
[3] Jombart T, Devillard S, Dufour AB, Pontier D (2008) Revealing cryptic spatial patterns in genetic variability by a new multivariate method. Heredity 101:92-103.
[4] R Development Core Team (2008) R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. ISBN 3-900051-07-0. URL http://www.R-project.org.
[5] Jombart T, Devillard S, Balloux F (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet 11:94.
[6] Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7:457-472.
[7] Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In: Bayesian Statistics 4. Oxford (UK): Clarendon Press.
[8] Plummer M, Best N, Cowles K, Vines K (2006) CODA: convergence diagnosis and output analysis for MCMC. R News 6:7-11.
[9] Grubbs FE (1950) Sample criteria for testing outlying observations. Ann Math Stat 21:27-58.
[10] Komsta L (2006) Processing data for outliers. R News 6:10-13.