Development of a polygenic risk score to improve screening for fracture risk: A genetic risk prediction study

Background
Since screening programs identify only a small proportion of the population as eligible for an intervention, genomic prediction of heritable risk factors could decrease the number needing to be screened by removing individuals at low genetic risk. We therefore tested whether a polygenic risk score for heel quantitative ultrasound speed of sound (SOS), a heritable risk factor for osteoporotic fracture, can identify low-risk individuals who can safely be excluded from a fracture risk screening program.

Methods and findings
A polygenic risk score for SOS was trained and selected in 2 separate subsets of UK Biobank (comprising 341,449 and 5,335 individuals). The top-performing prediction model was termed "gSOS", and its utility in fracture risk screening was tested in 5 validation cohorts using the National Osteoporosis Guideline Group clinical guidelines (N = 10,522 eligible participants). All individuals were genome-wide genotyped and had measured fracture risk factors. Across the 5 cohorts, the average age ranged from 57 to 75 years, and 54% of studied individuals were women. The main outcomes were the sensitivity and specificity to correctly identify individuals requiring treatment with and without genetic prescreening. The reference standard was a bone mineral density (BMD)-based Fracture Risk Assessment Tool (FRAX) score. The secondary outcomes were the proportions of the screened population requiring clinical-risk-factor-based FRAX (CRF-FRAX) screening and BMD-based FRAX (BMD-FRAX) screening. gSOS was strongly correlated with measured SOS (r² = 23.2%, 95% CI 22.7% to 23.7%). Without genetic prescreening, guideline recommendations achieved a sensitivity and specificity for correct treatment assignment of 99.6% and 97.1%, respectively, in the validation cohorts. However, 81% of the population required CRF-FRAX tests, and 37% required BMD-FRAX tests to achieve this accuracy. Using gSOS in prescreening and limiting further assessment to those with a low gSOS resulted in small changes to the sensitivity and specificity (93.4% and 98.5%, respectively), but the proportions of individuals requiring CRF-FRAX tests and BMD-FRAX tests were reduced by 37% and 41%, respectively. Study limitations include a reliance on cohorts of predominantly European ethnicity and use of a proxy of fracture risk.

Conclusions
Our results suggest that the use of a polygenic risk score in fracture risk screening could decrease the number of individuals requiring screening tests, including BMD measurement, while maintaining a high sensitivity and specificity to identify individuals who should be recommended an intervention.

Author summary
Why was this study done?
• Osteoporosis screening identifies only a small proportion of the screened population to be eligible for intervention.
• The prediction of heritable risk factors using polygenic risk scores could decrease the number of screened individuals by reassuring those with low genetic risk.
• We investigated whether the genetic prediction of heel quantitative ultrasound speed of sound (SOS)-a heritable risk factor for osteoporotic fracture-could be incorporated

Introduction
Screening programs are generally designed to identify a proportion of the screened population whose risk of a clinically relevant outcome is high enough to merit an intervention. However, usually only a small proportion of individuals who undergo screening is found to be at high risk, indicating that much of the screening expenditure is spent on individuals who will not qualify for intervention. Osteoporosis is a common and costly disease that results in an increased predisposition to fractures [1]. Many guidelines [2][3][4][5][6] aimed at the prevention of osteoporosis-related fractures incorporate the Fracture Risk Assessment Tool (FRAX) [7,8], a validated method to risk stratify individuals for treatment by assessing their 10-year probability of major osteoporotic fracture. Guidelines vary widely, but often recommend a staged process where individuals are first assessed with a clinical-risk-factor-based FRAX (CRF-FRAX), and those at increased risk of fracture are then additionally characterized with a more expensive bone mineral density (BMD)-based FRAX (BMD-FRAX) score. Such approaches are usually recommended in the setting of enhanced case-finding strategies, but recently, a large randomized controlled trial (SCOOP) demonstrated the potential benefit of community-based fracture risk assessment in reducing rates of hip fractures in elderly women [9]. This trial used a strategy based on the National Osteoporosis Guideline Group (NOGG) screening strategy [3], which implements fracture risk stratification through the use of FRAX scores. In this trial, the entire screened population underwent FRAX assessment using clinical risk factors, and almost half (49%) had a sufficiently high probability of fracture to warrant further testing using a BMD-FRAX test. Yet, only 14% of the screened population had a resultant probability of fracture high enough to warrant intervention. 
This suggests that a method that improves screening efficiency and decreases the number of persons undergoing risk stratification, particularly BMD-FRAX assessments, would be a welcome addition to the screening strategy. Skeletal measures that predict fracture risk are highly heritable (50%-85%) and include BMD and quantitative ultrasound speed of sound (SOS) measurements, which are highly correlated [10][11][12][13]. Recently, large cohort resources have enabled the genomic prediction of such heritable clinical risk factors from genotypes through polygenic risk scores [14][15][16][17][18][19][20], which capture information from many single nucleotide polymorphisms assayed from genome-wide genotyping. These assays assess common genetic variation at millions of single nucleotide polymorphisms and cost approximately $40 in a research context. However, the clinical utility of such polygenic risk scores is unclear, widespread replication of polygenic risk scores is currently lacking, and it is unknown whether they can aid in screening programs. Defining their clinical relevance may be particularly relevant in a British context, where the National Health Service aims to sequence 5 million individuals within 5 years [21].

PLOS MEDICINE
Very large cohorts are required to train polygenic risk scores, and current cohorts lack sufficient sample size to generate useful BMD polygenic risk scores. However, since BMD is strongly correlated with SOS [22] and SOS has been measured in 341,449 individuals in UK Biobank, we developed a polygenic risk score for SOS termed "gSOS" (for "genetically predicted SOS") that could be used to identify individuals unlikely to have low enough BMD to warrant a clinical intervention. To improve screening efficiency, such individuals could be removed from an osteoporosis screening program prior to measurement of BMD. We then tested the generalizability and potential benefit of incorporating gSOS into the NOGG guidelines using 5 cohorts, comprising 10,522 eligible individuals. Last, we tested if gSOS could decrease the number of people requiring more detailed assessments, such as BMD measurement, while still identifying those who require interventions to decrease their risk of fracture.

Overall study design and cohorts
The purpose of this study was not to predict fractures. Rather, the purpose of this study was to understand if genetic prescreening could reduce the number of screening tests needed to identify individuals at risk of osteoporotic fractures. This study included 3 phases (Fig 1). The first 2 phases were conducted in 2 distinct subsets of the UK Biobank study cohort, and the final phase in a further subset of UK Biobank combined with 4 other cohorts. Characteristics of the cohorts are shown in Table 1, with the cohorts described in detail in Table A in S1 Tables.
The first phase used least absolute shrinkage and selection operator (LASSO) regression [23] to train a set of polygenic risk score models to predict SOS in the UK Biobank Training Set (N = 341,449). In phase 2, the polygenic risk score model explaining the most variance in measured SOS in the UK Biobank Model Selection Set (N = 5,335) was selected and named gSOS. The ability of gSOS to explain variance in measured SOS was then tested in the UK Biobank Test Set (N = 84,768). In phase 3, gSOS was tested for its performance in a screening strategy, based on NOGG guideline thresholds of fracture risk, applied to a population of 10,522 individuals derived from 5 separate cohorts. Inclusion in the screening program required these individuals to be ≥50 years with at least 1 risk factor and available measurement of femoral neck BMD. This population comprised a further distinct subset of the UK Biobank Test Set (N = 2,445), as well as individuals from the Canadian Longitudinal Study on Aging (CLSA) (N = 2,931), the Study of Osteoporotic Fractures (SOF) (N = 2,094), Mr OS US (N = 2,026), and Mr OS Sweden (N = 1,026). Together these 5 cohorts in phase 3 are referred to as the validation cohorts. Next, to test the effect of gSOS on fracture screening by age, we stratified the CLSA cohort by age, dividing the population into 3 age groups: 50-59 years, 60-69 years, and ≥70 years. The CLSA cohort was chosen for this age-stratified analysis, because it was the largest validation cohort and had the widest age range. To assess the performance of gSOS in ancestries other than White British, we tested it in individuals in the UK Biobank Test Set who were eligible for screening and were of non-White British ancestry, as defined by genotypes (see S1 Text for further details of definition of ancestry; Table B in S1 Tables shows the demographic and risk factor characteristics of the sub-population).
This study adheres to the GRIPS statement (see S1 Checklist) and did not have a pre-specified analysis plan [24]. Specific ethics approval was not required for this study.

SOS and BMD measurement
We decided to use polygenic risk scores to predict SOS, rather than BMD, because polygenic risk scores require a large number of individuals with proper phenotyping and genome-wide genotyping. The largest dataset for SOS is approximately 10-fold larger than that for BMD [10,25]. SOS also predicts fracture, with similar performance characteristics compared to BMD, and the 2 measures are correlated (r = 0.4-0.6) [22]. However, since femoral neck BMD is required for FRAX calculations used in screening programs [26], we required that all individuals in the phase 3 analysis have a femoral neck BMD measure available. Details of SOS and BMD measurement are available in S1 Text. All analyses used SOS standardized to a mean of 0 and standard deviation of 1.
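The standardization mentioned above can be illustrated with a minimal sketch. The function name `standardize` and the sample readings are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD; the source does not specify
    return [(v - m) / s for v in values]

# Hypothetical raw SOS readings in m/s, for illustration only.
raw_sos = [1510.0, 1545.0, 1560.0, 1580.0, 1605.0]
z = standardize(raw_sos)
```

Working on this standardized scale is what allows a gSOS threshold such as 0 or 0.5 to be read as a number of standard deviations from the mean.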

Development of machine learning model to predict SOS
Training, model selection, and test datasets. To develop and test gSOS, we followed best practices in clinical prediction to ensure unbiased estimates of model performance by developing the models in datasets distinct from the datasets that were used to test model performance [27]. Participants in the UK Biobank with White British ancestry (see S1 Text), measured SOS, and genotyping information (N = 426,811) were randomly assigned to the UK Biobank Training Set (80% of participants), the UK Biobank Model Selection Set (1.25% of participants), or the UK Biobank Test Set (18.75% of participants) (Fig 1; Table 1). Since BMD was measured in only 4,741 individuals in all of UK Biobank [28], these individuals were assigned to the UK Biobank Test Set to enable them to be used in phase 3 of the study.
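A random three-way assignment of this kind might be sketched as follows. This is an illustration only: the function `assign_sets`, the seed, and the integer IDs are hypothetical, and the sketch ignores the non-random step described above in which individuals with measured BMD were placed in the Test Set.

```python
import random

def assign_sets(participant_ids, seed=0):
    """Randomly partition IDs into training (80%), model selection (1.25%),
    and test (the remainder, 18.75%) sets."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is reproducible
    n_train = round(0.80 * len(ids))
    n_select = round(0.0125 * len(ids))
    return {
        "training": ids[:n_train],
        "model_selection": ids[n_train:n_train + n_select],
        "test": ids[n_train + n_select:],
    }

# Hypothetical cohort of 8,000 participants identified by integers.
sets = assign_sets(range(8000))
```

Because every participant lands in exactly one set, performance measured in the test set is not contaminated by data used for training or model selection.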
Genome-wide association study (GWAS). Using methods from our previous GWAS of estimated BMD in UK Biobank [25], but using a different sample size and SOS as the outcome, we undertook a GWAS for SOS in the UK Biobank Training Set (N = 341,449 individuals with White British ancestry). We tested the additive allelic effects of each of the 13.9 million SNPs passing quality control, separately, on SOS using a linear mixed model to adjust for cryptic relatedness and population stratification [29], as well as adjusting for age, sex, assessment center, and genotyping array (S1 Text). Linkage-disequilibrium-independent associations were obtained using PLINK by clumping SNPs in linkage disequilibrium at r² > 0.05 and selecting the single most significant SNP from within each clumped set. To reduce potential bias due to population stratification, the UK Biobank Training, Model Selection, and Test Sets included only White British participants, while all other cohorts included only people of general European ancestry (as defined in S1 Text). Further, as stated above, the performance of gSOS-based screening was also tested in non-White British participants in UK Biobank.
Polygenic risk scores using LASSO. Using the UK Biobank Training Set, we fitted 6 LASSO models [23] to predict SOS using only SNPs with p-values smaller than a chosen set of thresholds (Table C in S1 Tables). The UK Biobank Model Selection Set was then used to identify the p-value threshold and regularization parameter (λ) that resulted in the lowest root mean square error for the prediction of SOS. This p-value threshold and regularization parameter were then taken forward for testing in the UK Biobank Test Set. Hence, we ensured that the performance of only 1 final polygenic risk score was evaluated in the UK Biobank Test Set. We refer to this final predictor as gSOS, in which SOS is predicted only from genotype.
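The model selection step (choosing the p-value threshold and λ with the lowest root mean square error in a held-out set) could be sketched as below. This is not the authors' pipeline: the SNP identifiers, weights, and phenotype values are hypothetical, and the candidate models are assumed to have been fitted already, so only the selection logic is shown.

```python
import math

def predict(weights, genotypes):
    """Polygenic score: sum of allele dosage times SNP weight."""
    return sum(w * genotypes.get(snp, 0) for snp, w in weights.items())

def rmse(weights, people):
    """Root mean square error of predicted versus measured (standardized) SOS."""
    errors = [(predict(weights, p["genotypes"]) - p["sos"]) ** 2 for p in people]
    return math.sqrt(sum(errors) / len(errors))

def select_model(candidates, selection_set):
    """Return the (p-value threshold, lambda) key with the lowest RMSE."""
    return min(candidates, key=lambda k: rmse(candidates[k], selection_set))

# Hypothetical fitted models: keys are (p-value threshold, lambda),
# values map SNP identifiers to LASSO weights.
candidates = {
    (5e-8, 0.01): {"rs1": 0.30},
    (5e-4, 0.01): {"rs1": 0.25, "rs2": -0.20},
}
# Hypothetical model selection set with allele dosages and measured SOS.
selection_set = [
    {"genotypes": {"rs1": 2, "rs2": 0}, "sos": 0.5},
    {"genotypes": {"rs1": 0, "rs2": 2}, "sos": -0.4},
    {"genotypes": {"rs1": 1, "rs2": 1}, "sos": 0.05},
]
best = select_model(candidates, selection_set)
```

Only the single winning configuration is carried forward, which is what keeps the later test-set estimate free of selection bias.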
Traditional polygenic risk scores. Traditional polygenic risk scores [15] were derived from the GWAS for SOS performed in the UK Biobank Training Set, without the use of LASSO, by including different sets of SNPs, selected by p-value threshold and linkage disequilibrium clumping as described in S1 Text (Table C in S1 Tables).
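For contrast with the LASSO approach, a traditional score of this kind can be sketched as p-value thresholding, greedy clumping, and a weighted sum of dosages. The details are assumptions rather than the procedure in S1 Text: the SNP names, effect sizes, and pairwise r² values are hypothetical, and the physical distance window that real clumping tools also apply is omitted.

```python
def clump(snps, ld_r2, r2_max=0.05):
    """Greedy clumping: walk SNPs from most to least significant and keep a SNP
    only if its r2 with every SNP already kept is at most r2_max."""
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        if all(ld_r2.get(frozenset((snp["id"], k["id"])), 0.0) <= r2_max for k in kept):
            kept.append(snp)
    return kept

def traditional_prs(snps, ld_r2, p_threshold, dosages):
    """Sum of dosage times GWAS effect over clumped SNPs passing the p-value threshold."""
    passing = [s for s in snps if s["p"] < p_threshold]
    return sum(s["beta"] * dosages.get(s["id"], 0) for s in clump(passing, ld_r2))

# Hypothetical GWAS summary statistics.
gwas = [
    {"id": "snpA", "p": 1e-12, "beta": 0.10},
    {"id": "snpB", "p": 1e-9, "beta": 0.08},   # in LD with snpA, so it is dropped
    {"id": "snpC", "p": 1e-6, "beta": -0.05},
]
ld_r2 = {frozenset(("snpA", "snpB")): 0.6}  # unlisted pairs are treated as r2 = 0
person = {"snpA": 2, "snpB": 1, "snpC": 1}  # allele dosages for one individual
score = traditional_prs(gwas, ld_r2, p_threshold=5e-4, dosages=person)
```

Unlike LASSO, the weights here are the marginal GWAS effects, each estimated one SNP at a time rather than jointly.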

Generation of FRAX scores
FRAX risk scores for major osteoporotic fracture (hip, clinical vertebra, proximal humerus, or wrist) can be generated with or without BMD, referred to in this paper as BMD-FRAX and CRF-FRAX, respectively [26]. Therefore CRF-FRAX and BMD-FRAX were calculated for all participants in each validation cohort [26]. FRAX clinical risk factors were assessed at the baseline visit for each cohort and included age, sex, body mass index (BMI), previous fracture, smoking, glucocorticoid use, rheumatoid arthritis, and secondary causes of osteoporosis. Measures of more than 2 daily units of alcohol and parental history of hip fracture were not available in UK Biobank and were set to "no" for this cohort, as suggested by FRAX guidelines. Not all secondary causes of osteoporosis were available for the SOF, Mr OS US, and Mr OS Sweden cohorts, and these variables were also set to "no" for these cohorts, as recommended by FRAX. Age was recorded at baseline visit. Sex was self-reported and verified by genotype. Individuals with discordant sex between self-report and genotype were excluded. CRF-FRAX and BMD-FRAX were calculated for all participants in each of the clinical cohorts, using country-specific FRAX models [26].

Genomic screening in fracture risk screening
In the absence of an international consensus on fracture risk screening [2,4,5,30], we chose to use the assessment and management clinical algorithm developed by NOGG [3], since a screening program similar to the NOGG screening strategy is supported by randomized controlled trial evidence [9]. The NOGG screening strategy uses 10-year absolute probability of fracture as calculated by FRAX and suggests treatment or reassurance based on thresholds of risk, which are age dependent and consider competing risks. The NOGG guidelines (Fig 2) also aim to identify individuals at risk for fracture in a cost-efficient manner by reserving clinical visits and more costly BMD testing for those at intermediate risk, i.e., where the FRAX score lies close to an intervention threshold. This intervention threshold is equivalent to the age-specific FRAX 10-year probability in women with a prior fragility fracture, since nearly all such women would be recommended an intervention [3]. Individuals without any risk factors are excluded from the CRF-FRAX assessment. By applying CRF-FRAX, individuals can be recommended for either an intervention (high risk), a BMD-FRAX assessment (intermediate risk), or reassurance and no further participation in the screening program (low risk). Those having a BMD-FRAX assessment can then be recommended an intervention if their resulting 10-year probability of major osteoporotic fracture exceeds the age-specific threshold, or they can be reassured (see Fig 2).
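The staged logic described above can be sketched as a small decision function. This is an illustration, not the NOGG algorithm itself: the numeric thresholds are hypothetical placeholders (the real thresholds are age dependent), whether comparisons are inclusive is an assumption, and the rule for women with a prior fragility fracture is not modeled.

```python
def nogg_triage(has_risk_factor, crf_frax, bmd_frax, lower, upper, intervention):
    """Return (recommendation, tests_used) for one person under a staged strategy.

    crf_frax and bmd_frax are 10-year fracture probabilities in percent;
    lower, upper, and intervention are hypothetical age-specific thresholds.
    """
    if not has_risk_factor:
        return "reassure", []          # excluded before any FRAX assessment
    tests = ["CRF-FRAX"]
    if crf_frax >= upper:
        return "intervene", tests      # high risk on clinical factors alone
    if crf_frax < lower:
        return "reassure", tests       # low risk, no BMD measurement needed
    tests.append("BMD-FRAX")           # intermediate risk: measure BMD
    if bmd_frax >= intervention:
        return "intervene", tests
    return "reassure", tests

# Hypothetical person at intermediate clinical risk whose BMD-FRAX exceeds the threshold.
example = nogg_triage(True, 12.0, 18.0, lower=8.0, upper=20.0, intervention=15.0)
```

Returning the list of tests alongside the recommendation makes it straightforward to count how many CRF-FRAX and BMD-FRAX assessments a strategy consumes.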
Despite the efficiencies gained by using this stepwise approach [31], false negatives can occur when interventions are not recommended to individuals who have a low CRF-FRAX-based probability and are discharged from subsequent screening but who would have qualified for intervention had they undergone BMD-FRAX. Likewise, false positives can arise when an individual is recommended for an intervention based on the CRF-FRAX score but would not have qualified for an intervention with BMD-FRAX.
To try to reduce the number of individuals undergoing testing, particularly more costly BMD testing, who would subsequently not require intervention, we introduced a gSOS-based screening step in the NOGG algorithm, where individuals were reassured if their gSOS was above a threshold (Fig 3). This is because individuals with a high SOS are likely to have a high BMD and are thus less likely to be recommended for an intervention. The trade-off of this strategy is that it could result in reassurance of individuals who, if their BMD was measured, would have been recommended an intervention. This would result in a decrease in sensitivity to identify individuals requiring an intervention. To calculate the sensitivity and specificity of the gSOS-modified NOGG algorithm, we used BMD-FRAX as a reference standard within the NOGG screening strategy (Fig 4). According to NOGG guidelines, women ≥50 years with a prior fragility fracture are recommended treatment without further FRAX testing. As a result, these individuals were assigned an intervention recommendation when calculating the sensitivity and specificity of correct treatment assignment (Fig 4).
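The trade-off introduced by the gSOS step can be made concrete with a toy calculation of sensitivity and specificity against a reference standard. The four records, the field names `screen_treat` and `reference_treat`, and the threshold of 0 are all hypothetical, chosen only to show one missed case and one avoided false positive.

```python
def sensitivity_specificity(assigned, reference):
    """Sensitivity and specificity of treatment assignment against a reference standard."""
    pairs = list(zip(assigned, reference))
    tp = sum(1 for a, r in pairs if a and r)
    fn = sum(1 for a, r in pairs if not a and r)
    tn = sum(1 for a, r in pairs if not a and not r)
    fp = sum(1 for a, r in pairs if a and not r)
    return tp / (tp + fn), tn / (tn + fp)

def assign_with_prescreen(people, gsos_threshold):
    """Reassure anyone whose gSOS exceeds the threshold; otherwise keep the screening result."""
    return [False if p["gsos"] > gsos_threshold else p["screen_treat"] for p in people]

# Hypothetical individuals: standardized gSOS, the recommendation from screening
# without a genetic step, and the BMD-FRAX-based reference recommendation.
people = [
    {"gsos": -1.2, "screen_treat": True, "reference_treat": True},
    {"gsos": 0.8, "screen_treat": True, "reference_treat": True},    # missed after prescreening
    {"gsos": -0.3, "screen_treat": False, "reference_treat": False},
    {"gsos": 1.5, "screen_treat": True, "reference_treat": False},   # false positive avoided
]
reference = [p["reference_treat"] for p in people]
without_gsos = sensitivity_specificity([p["screen_treat"] for p in people], reference)
with_gsos = sensitivity_specificity(assign_with_prescreen(people, 0.0), reference)
```

In this toy data the prescreening step lowers sensitivity and raises specificity, the same direction of change reported for the validation cohorts, although the magnitudes here carry no meaning.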
Since resources are often expended to measure BMD-FRAX in individuals whose final probability of fracture is too low to warrant intervention, we also estimated the number of CRF-FRAX and BMD-FRAX tests that were performed but led to the individual being reassured without a recommended intervention.
We chose the sex-specific thresholds of gSOS that reduced CRF-FRAX and BMD-FRAX testing but minimized the loss of sensitivity to identify individuals who would be recommended for treatment. This threshold was chosen using data from the UK Biobank Test Set (S4 Fig). The generalizability of the selected gSOS threshold was then tested in the remaining 4 validation cohorts (CLSA, SOF, Mr OS US, and Mr OS Sweden). The number of CRF-FRAX and BMD-FRAX tests performed but not leading to an intervention were counted. These analyses were conducted in each validation cohort, men and women separately, and in all groups combined. We also tested individuals of non-White British ancestry in UK Biobank (N = 350), i.e., the individuals who remain subsequent to filtering out the White British subset and who have available measurements of femoral neck BMD. The characteristics are provided in Table B in S1 Tables.
Table 1 describes the FRAX risk factors for all of the cohorts. There were few clinically relevant differences in any of the osteoporosis-related risk factors in the UK Biobank Training, Model Selection, and Test Sets, as expected, since these sets were generated randomly. As planned, all individuals from UK Biobank with BMD measures were included in the UK Biobank Test Set, to ensure availability of BMD-FRAX scores as the reference standard. There were few differences in demographics or clinical risk factors between individuals with and without BMD measured. The validation cohorts (CLSA, SOF, Mr OS US, and Mr OS Sweden) provided a range of characteristics, allowing for a better assessment of the generalizability of results (Table 1).

GWAS
After quality control (see S1 Text), 13,958,249 SNPs were included in the GWAS. The GWAS in the training set identified 1,404 independent (r² ≤ 0.05) genome-wide significant loci at a p-value threshold of <5 × 10⁻⁸. S1 Fig shows the Manhattan and QQ plots for this GWAS.

Variance explained in SOS in the UK Biobank Model Selection Set
The polygenic risk score models trained with LASSO explained at most 25.0% (95% CI 23.0%-27.0%) of the variance in SOS in the UK Biobank Model Selection Set (Table C in S1 Tables). S2 Fig provides detailed information on the optimal algorithm tuning parameters. None of the traditional polygenic risk scores performed better than the polygenic risk score derived from the LASSO regression. S3 Fig demonstrates that, as expected, the estimated effects of the activated SNPs from the LASSO algorithm were attenuated compared to the effects estimated from the GWAS.

Variance explained in SOS in the UK Biobank Test Set
Age, sex, and BMI explained 4.0% (95% CI 3.7%-4.2%) of the variance in SOS. Adding all available FRAX clinical risk factors increased the variance explained to 5.3% (95% CI 5.0%-5.6%). The polygenic risk score from the UK Biobank Model Selection Set explaining the most variance in measured SOS was designated as "gSOS" and was then tested for its correlation with SOS in the UK Biobank Test Set. This model explained 23.2% (95% CI 22.7%-23.7%) of the variance in measured SOS and included 21,717 SNPs activated from a total of 345,111 SNPs that had p-values for association with SOS of ≤5 × 10⁻⁴ (Table C in S1 Tables; Fig 5).
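For a single predictor such as gSOS, the variance explained equals the squared Pearson correlation between predicted and measured values. The sketch below computes it on hypothetical numbers; it is not the authors' code, and it does not reproduce their confidence intervals or covariate-adjusted models.

```python
from statistics import mean

def variance_explained(predicted, measured):
    """Squared Pearson correlation between predicted and measured values."""
    mp, mm = mean(predicted), mean(measured)
    cov = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_m = sum((m - mm) ** 2 for m in measured)
    return cov ** 2 / (var_p * var_m)

# Hypothetical predicted and measured values for four individuals.
r2 = variance_explained([1, 2, 3, 4], [1, 3, 2, 4])  # 16 / 25 = 0.64
```

On this scale, the reported 23.2% corresponds to a correlation of roughly 0.48 between gSOS and measured SOS.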

Screening by NOGG guidelines in validation cohorts
The validation cohorts comprised 10,522 individuals eligible for fracture risk screening (Table 1). Both the sensitivity and specificity of the NOGG screening strategy to identify individuals at high enough risk to merit an intervention, compared to the reference standard, BMD-FRAX, were high (99.6% and 97.1%, respectively; Fig 6; Table D in S1 Tables). This high sensitivity and specificity required CRF-FRAX tests to be undertaken in 81% of the population eligible for screening, with BMD-FRAX tests subsequently recommended in 37% of the population. In total, 74% of those requiring CRF-FRAX tests were classified for reassurance, i.e., without a recommendation for an intervention. As well, just over one-third of all individuals who received a BMD-FRAX test were classified for reassurance without intervention (Fig 6; Table D in S1 Tables).

Screening incorporating a gSOS-based screening step
Using the UK Biobank Test Set, we selected the threshold of gSOS that would minimize the number of BMD tests done in persons who would ultimately be reassured rather than receiving an intervention, but also would minimize the number of false negatives (S3 Fig). Applying this threshold separately in men and women, we found that a threshold of standardized gSOS set to 0.5 and 0 for men and women, respectively, balanced these goals in the UK Biobank Test Set, and subsequently individuals above these thresholds were excluded from further screening in the validation cohorts, prior to receiving a CRF-FRAX or BMD-FRAX test (Fig 3). The utility of this threshold was then tested in all validation cohorts. We found that applying a gSOS screening step in the validation cohorts resulted in a small decrease in sensitivity to identify eligible participants for therapy, to 93.4%, but that the specificity increased slightly, to 98.5%. However, the proportion of screened individuals requiring CRF-FRAX testing decreased from 81% to 51% (representing a relative decrease of 37%) compared to NOGG-based screening without a gSOS screening step. Additionally, the proportion of screened individuals requiring BMD-FRAX testing decreased from 37% to 22% (representing a relative decrease of 41%) (Fig 6; Table D in S1 Tables).
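The relative decreases quoted above follow from the before and after proportions. A short check, using the rounded percentages from the text (the helper name `relative_decrease` is illustrative, and the underlying unrounded counts may give slightly different values):

```python
def relative_decrease(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before

# Rounded proportions (percent of the screened population) taken from the text.
crf_frax_reduction = relative_decrease(81, 51)  # 30 / 81, about 0.370
bmd_frax_reduction = relative_decrease(37, 22)  # 15 / 37, about 0.405
```

Rounded to whole percentages, these give the reported 37% and 41%.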
The proportion of CRF-FRAX and BMD-FRAX tests that resulted in an individual being excluded from the screening program without a recommendation for an intervention also decreased from 74% to 46% and from 34% to 20%, respectively (Fig 6; Table D in S1 Tables). Cohort-specific results are shown in Tables E-I in S1 Tables.
The positive predictive value for correct treatment assignment in all validation cohorts was 91.8% without a gSOS screening step and increased to 95.4% with the gSOS screening step (Table D in S1 Tables; cohort-level results and subgroup results are available in Tables D-P in S1 Tables).

Women and men separately
The SOF cohort was composed of only women, while Mr OS US and Mr OS Sweden were composed of only men, providing the opportunity to explore performance characteristics by sex. Further, we divided the UK Biobank Test Set and CLSA into sex-specific datasets (Tables J-M in S1 Tables). Amongst 4,859 women who were eligible for screening in the cohorts (SOF, UK Biobank Test Set, and CLSA), the sensitivity and specificity for correct treatment
When applying a gSOS screening step, the sensitivity decreased marginally, to 94.6%, and the specificity increased marginally, to 98.2%. The proportion of the population requiring a CRF-FRAX test reduced from 58% to 27% (representing a relative decrease of 55%), while the proportion requiring a BMD-FRAX test reduced from 43% to 20% (representing a relative decrease of 55%) (Table N in S1 Tables).
Amongst the 5,668 men eligible for screening, the sensitivity and specificity were 96.9% and 98.2%, respectively, using CRF-FRAX alone as the screening step. In order to achieve this performance, 100% of men had a CRF-FRAX test, and 31% required a BMD-FRAX test. The yield of high-risk individuals from these tests was low, such that 94% of men receiving a CRF-FRAX test were reassured, and 29% of those receiving a BMD-FRAX test were reassured (Table O in S1 Tables). Applying a gSOS screening step to these men reduced the sensitivity to 82% while maintaining a similar specificity at 99%. However, the proportion of men requiring a CRF-FRAX test reduced to 72% (representing a relative decrease of 28%), and the proportion undergoing BMD-FRAX reduced to 23% (representing a relative decrease of 25%).

Stratification by age
We next tested the performance of gSOS in different age strata to understand if the screening efficiency improved more for one age group than another. Using the largest cohort, with the largest variation in age (CLSA, N = 6,704), we found that gSOS had the highest performance in individuals aged ≥70 years. Specifically, the sensitivity and specificity to identify individuals who require an intervention remained high, at 99.6% and 94.9%, respectively. The proportion of screened individuals requiring CRF-FRAX testing decreased from 73% to 37% (representing a relative decrease of 49%) compared to the NOGG screening strategy without a gSOS screening step. Additionally, the proportion of screened individuals requiring BMD-FRAX testing decreased from 24% to 12% (representing a relative decrease of 50%) (Table F in S1 Tables). In contrast, in individuals aged 50-59 years, sensitivity reduced to 86%, but specificity was 99.6%. The percent of individuals requiring CRF-FRAX and BMD-FRAX testing reduced by 51% and 50%, respectively. This demonstrates that gSOS pre-screening improves the efficiency of screening, but that the sensitivity to correctly identify individuals requiring therapy is maximized in older age groups.

Non-White British individuals
We then assessed the effect of a gSOS pre-screening in individuals from UK Biobank with dual-energy X-ray absorptiometry BMD measures who were of non-White British ancestry (Table B in S1 Tables). We found that the results were generally consistent with those in individuals of White British ancestry. Specifically, the proportion of screened individuals requiring CRF-FRAX testing decreased from 94% to 48% (representing a relative decrease of 49%) compared to NOGG-based screening without a gSOS screening step. Additionally, the proportion of screened individuals requiring BMD-FRAX testing decreased from 39% to 17% (representing a relative decrease of 57%) (Table P in S1 Tables).
The proportion of CRF-FRAX and BMD-FRAX tests that resulted in an individual being excluded from the screening program without a recommendation for an intervention also decreased from 92% to 47% and from 38% to 16%, respectively (Table P in S1 Tables).

Discussion
By building a polygenic risk score using 341,449 individuals and validating its utility in fracture risk screening in 5 separate cohorts totaling 10,522 individuals, we determined that genomics-enabled fracture risk screening could reduce the proportion of people who require BMD-based testing by 41%, while maintaining a high overall sensitivity and specificity for correct treatment assignment. While these findings are not meant to be prescriptive, they indicate the possible utility of polygenic risk scores in screening programs that are dependent on heritable risk factors.
Fracture risk assessment is expensive, with estimates of approximately US$50,000 per quality-adjusted life year gained [32], but is less expensive, or even cost-saving, using NOGG assessment strategies [33,34], because NOGG decreases the number of individuals who require CRF-FRAX and BMD-FRAX testing. Current guidelines suggest testing a large proportion of the population [2,3,5], yet most screened individuals are not identified as having a clinically actionable level of fracture risk [9,35]. This circumstance provides an opportunity for genetically derived measures of risk to increase cost-efficiencies in healthcare systems where investments have been made in genome-wide genotyping. Already at least 7 large healthcare systems have invested in genome-wide genotyping of a large proportion of their population, among whom electronic health record data are available [36,37]. Since the costs associated with genome-wide genotyping have now dropped below those of several routine clinical tests, the use of polygenic risk scores could be particularly helpful in these environments since a one-time genotyping cost could be used to generate several polygenic risk scores. However, there is a clear need to study the translation of such polygenic risk scores to clinical applications [38], and the work presented here provides one example of how such scores could be translated to the clinic.
Previous attempts to predict osteoporosis from genomic data did not substantially increase discrimination compared to standard clinical measures alone, likely because the GWAS that underpinned these attempts was derived from 32,961 individuals and explained only 5.8% of the variance in BMD [39,40]. The improvement in variance explained in this study was attributable to the increase in sample size afforded by UK Biobank and to the LASSO regression's ability to learn SNP associations with SOS jointly, as opposed to summing over independently measured effects on BMD. Other attempts to predict BMD have been based on several dozen genome-wide significant SNPs [39], whereas our approach used machine learning to jointly consider the effects of 642,127 SNPs (Table C in S1 Tables). LASSO regression has recently been used to predict estimated BMD, but from a GWAS sample size that was one-third of that used here, explaining only 17.2% of the BMD variance, and it was not used in a screening program [14]. Our work has improved the genomic prediction of BMD and demonstrated its potential clinical relevance.
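The contrast between jointly learned and independently measured SNP effects can be illustrated with a toy example. The sketch below is not the pipeline used to train gSOS; the two SNPs, the six genotypes, the phenotype values, and the penalty of 0.01 are all invented for illustration, and the coordinate-descent solver is a minimal textbook version of LASSO. Only the first SNP affects the phenotype, while the second is merely correlated with it.

```python
def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def marginal_effects(columns, y):
    """Slope from regressing the phenotype on each SNP separately."""
    return [
        sum(x * t for x, t in zip(col, y)) / sum(x * x for x in col)
        for col in columns
    ]


def soft_threshold(value, penalty):
    """Shrink a value toward zero by `penalty`, clipping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_effects(columns, y, penalty, sweeps=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + penalty * ||b||_1."""
    n = len(y)
    beta = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual after removing the contribution of every SNP except j.
            partial = [
                y[i] - sum(beta[k] * columns[k][i]
                           for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            scale = sum(x * x for x in col) / n
            beta[j] = soft_threshold(rho, penalty) / scale
    return beta


# Hypothetical genotype dosages (0, 1, or 2 copies of an allele).
causal_snp = [0, 0, 1, 1, 2, 2]
linked_snp = [0, 1, 1, 1, 1, 2]  # correlated with causal_snp, no effect itself
phenotype = [2 * g for g in causal_snp]  # invented noise-free phenotype

columns = [center(causal_snp), center(linked_snp)]
y = center(phenotype)

marginal = marginal_effects(columns, y)
joint = lasso_effects(columns, y, penalty=0.01)
print("marginal:", marginal)
print("joint (LASSO):", joint)
```

In this toy setting, the one-SNP-at-a-time estimates assign an effect of 2.0 to both SNPs, so a score that sums them counts the same signal twice. The joint fit assigns roughly 1.985 to the causal SNP and exactly 0 to the linked one.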
We observed similar predictive performance across all LASSO models in the model selection step (Table C in S1 Tables); therefore, it remains possible that a more parsimonious model containing fewer SNPs would perform as well. As a result, further exploration of these LASSO models is warranted in a future technical study. However, should a more complex model with more SNPs prove to be ideal, the hindrance to clinical translation should be minimal, as the computational burden is limited to the training of the models and does not extend to the prediction of an individual's genetic risk.
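To make the point about prediction cost concrete: once a linear model has been trained, scoring one individual amounts to a single weighted sum over the SNPs retained by the model. The sketch below assumes such a linear form; the SNP identifiers, weights, and dosages are hypothetical placeholders, not values from gSOS.

```python
def polygenic_score(weights, dosages, intercept=0.0):
    """Weighted sum of allele dosages over the SNPs kept by the model.

    weights: SNP identifier -> trained effect size (hypothetical values here)
    dosages: SNP identifier -> allele dosage (0 to 2) for one individual
    """
    return intercept + sum(w * dosages[snp] for snp, w in weights.items())


# Hypothetical trained weights; a real model would hold one entry per SNP.
weights = {"snp_a": 0.12, "snp_b": -0.05, "snp_c": 0.30}

# Hypothetical genotype for one person; SNPs absent from the model are ignored.
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_d": 1}

score = polygenic_score(weights, dosages)
print(round(score, 4))
```

The work per individual grows linearly with the number of SNPs in the model, which is small compared with the cost of fitting the model on hundreds of thousands of training samples.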
The sensitivity and specificity to correctly assign intervention were maximized in individuals ≥70 years of age. This could be clinically relevant because this is the age range for which the SCOOP trial demonstrated that a community-based screening program could be effective in reducing hip fractures [9].
We acknowledge that for many practicing physicians, such as those in the UK, who have access to an automatically generated electronic-health-record-based CRF-FRAX test, the result of interest would be the reduction in BMD-FRAX tests. However, we observed no appreciable difference in the sensitivity and specificity to correctly identify individuals requiring therapy whether the gSOS screening step was placed prior to the CRF-FRAX test or immediately after it. Tables E-O in S1 Tables show the results for a reduction in BMD-FRAX tests by cohort and sex.

Limitations
We have generated a polygenic risk score for SOS, rather than BMD, since there are insufficient data resources to generate such a score for BMD. Nevertheless, the correlation between SOS and BMD has enabled the identification of individuals unlikely to have a BMD low enough to warrant an intervention. Further refinement could improve the efficiencies presented here, including a polygenic risk score for BMD, when sample sizes are large enough to enable this. While nearly all FRAX risk factors were available for study, alcohol intake and parental history of fracture were not available from the UK Biobank cohorts. However, these were available in the other validation cohorts. Secondary causes of osteoporosis were not uniformly available in SOF, Mr OS US, and Mr OS Sweden. Nevertheless, CLSA provided similar results to other cohorts and had all required information. Like participants in most cohort studies, the participants used in these studies are, on average, healthier than the general population [41]. Thus, external validation in a truly population-based study may provide helpful estimates of the real-world performance of genomics-enabled fracture risk screening. While we have tested the utility of gSOS in individuals of non-White British ancestry, the sample size available for study was relatively small, and thus results should be replicated in additional cohorts of different ancestry, underlining the need for large-scale GWAS datasets in individuals of non-European ancestry [42]. We recognize that different approaches could be taken to incorporate polygenic risk scores into fracture risk screening, but here we offer a simple approach that could be readily implemented in a genotyped population with required FRAX risk factors using the NOGG strategy [9].

Conclusions
In summary, we have developed and tested gSOS, a polygenic risk score for SOS, which when introduced into a fracture risk screening program decreased the number of people requiring CRF-FRAX and BMD-FRAX assessments, while still maintaining a high sensitivity and specificity to identify individuals in whom an intervention should be recommended. These findings highlight the role that genetic prediction could play in screening programs that rely upon heritable risk factors.