Pharmacological signatures of the reduced incidence and the progression of cognitive decline in ageing populations suggest the protective role of beneficial polypharmacy

Preventive treatments for dementia are warranted. Here we show that utilization of certain combinations of prescription medications and supplements correlates with reduced rates of cognitive decline. More than 1,900 FDA-approved agents and supplements were collapsed into 53 mechanism-based groups and traced in electronic medical records (EMRs) for >50,000 patients. These mechanistic groups were aligned with the data presented in more than 300 clinical trials, and a regression model was then built to fit the signals from EMRs to clinical trial performance. While the EMR signals of single agents correlated relatively weakly with clinical performance, the signals produced by combinations of active compounds were highly correlated with clinical trial performance (R = 0.93, p = 3.8 x 10^-8). Higher-ranking pharmacological modalities were traced in patient profiles as combinations, producing protective complexity estimates that reflect the degree of exposure to beneficial polypharmacy. For each age stratum, the higher the protective complexity score, the lower the prevalence of dementia, with maximized life-long effects for the compositions with the highest regression score and diversity. The connection was weaker in individuals already diagnosed with cognitive impairment. Confounder analysis confirmed an independent effect of protective complexity in a multivariate context. A sub-cohort with lifelong odds of dementia decreased more than 5-fold was identified; this sub-cohort should be studied in further detail, including in controlled clinical trials. In short, our study systematically explored combinatorial preventive treatment regimens for age-associated multi-morbidity, with an emphasis on neurodegeneration, and provided extensive evidence for their feasibility.


Forming groups based on individual active compounds
All databases mentioned in the Data Sources are relational, with the rows linked to the patient ID and the columns designated for comorbidities, treatments, age, gender, race and lifestyle parameters. In the case of NACC, multiple rows may represent dated visits by the same patient (38,000 patients). All components in the datasets can be clustered using the Pivot Table function of Microsoft Excel, which aggregates numerical and non-numerical identifiers linked to a given identifier in the same row. The groups analysed in Table 1 of the Results were formed by Pivot Table clustering of the names of the dosage forms (FDA-approved pharmaceuticals and OTC supplements) available in NACC in >134,000 visit profiles. All other metrics associated with the patients receiving a given dosage form can also be summarized using the same tool: the count of dementia tags, the count of MCI tags, average MMSE scores, and the counts of other actives and comorbidities. These counts can be related to the number of visits in the cluster, and the result can be related to the same measurement for the control, producing the hazard ratios reported in Table 1 of the Main Manuscript. The versatility of the Pivot Table tool allows labelling of dosage forms as members of the same pharmacological mechanism and further aggregation of individual dosage forms into pooled groups based on mechanistic commonality, using the label for the aggregate as the grouping element. The general flow of the data analysis is presented in Fig 1.

Computing database metrics for aligning with clinical trials
[HRDEM], [HRNORM] and [HRAMRT] are the hazard ratios of dementia, cognitive norm and all-cause mortality, respectively, in groups formed on the basis of receiving a treatment or agent and normalized to the entire database as the control. When data were available, another a-priori parameter [MOD] (modality) was applied, defined as the number of agents/factors/mechanisms simultaneously tested in a clinical trial.
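The grouping and normalization described here can be sketched outside of Excel. The following is a minimal illustration in Python, not the procedure used in the study: the record layout, the field names (`agent`, `DEM`, `NORM`, `AMRT`) and the toy values are assumptions made for the example. Each hazard ratio is taken as the frequency of a tag within a group divided by its frequency in the whole table.

```python
from collections import defaultdict

def hazard_ratios(visits, group_key, tags=("DEM", "NORM", "AMRT")):
    """Frequency of each tag within a group divided by its frequency in the whole table."""
    if not visits:
        raise ValueError("visits must not be empty")
    total = len(visits)
    overall = {t: sum(v[t] for v in visits) / total for t in tags}

    # Equivalent of a pivot table keyed on the grouping column
    groups = defaultdict(list)
    for v in visits:
        groups[v[group_key]].append(v)

    result = {}
    for name, rows in groups.items():
        result[name] = {"N": len(rows)}
        for t in tags:
            freq = sum(r[t] for r in rows) / len(rows)
            # The ratio is undefined when the tag never occurs in the table
            result[name]["HR" + t] = freq / overall[t] if overall[t] else float("nan")
    return result

# Hypothetical toy visit records (not NACC data); tags are 0/1 flags
visits = [
    {"agent": "biotin", "DEM": 0, "NORM": 1, "AMRT": 0},
    {"agent": "biotin", "DEM": 0, "NORM": 1, "AMRT": 0},
    {"agent": "statin", "DEM": 1, "NORM": 0, "AMRT": 1},
    {"agent": "statin", "DEM": 1, "NORM": 0, "AMRT": 0},
]
hr = hazard_ratios(visits, "agent")
```

Pooling by mechanism would follow the same pattern, with a mechanism label used as `group_key` instead of the dosage form name.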

Screening the groups formed by a single factor for non-random origin of the signal
The tags for dementia (DEM), cognitive norm (NORM) and all-cause mortality (AMRT) were assumed not to follow a normal distribution when grouped by actives. Random grouping was imposed by producing random tags and aligning this profile of 134,000 numbers in the 1-300 range with the 134,000 visit profiles in NACC. The same Pivot Table procedure that was used to combine the profiles sharing a common agent was applied to combine the profiles sharing the same random number. In the groups with the same random number, the parameters for dementia, norm and mortality were computed, as was the variation between the groups.

Table 1 (excerpt).
Bronchodilator 0.84 1.02 [24], [25], [36]
Biotin 0.21 0.09 [26], [27]
Adrenaline 0.44 0.25 [28]
Antidiabetic 1.1 1.17 [29]
Chondroitin / glucosamine 0.62 0.5 [30]
Antiglaucoma 0.86 1.13 [31]
Probiotic 0.7 0.5 [32]
Anti-gout 0.97 1.1 [33], [34]
Corticosteroid 0.84 1.06 [35], [37]
Table 1 notes. HR DEM: hazard ratio of all-cause dementia in the category of compounds aggregated by mechanism of action; HR AMRT: hazard ratio of all-cause mortality; HR NORM: hazard ratio of the patients remaining cognitively normal in the aggregate. The hazard ratios were computed by dividing the value observed in the mechanism aggregate by the value observed across the entire database. RCT +, RCT -, RCT 0: numbers of randomized clinical trials (RCTs) directed to the prophylaxis and treatment of neurodegeneration with positive, negative and neutral outcomes, respectively. The trials are attributed to the pharmacological mechanisms or aggregates of the latter. A positive outcome was defined as confirmation of the original hypothesis, a negative outcome as non-confirmation, and a neutral outcome as inconclusive. (RCT+-RCT-)/TOT is the ratio between the positive balance of clinical trials and the total trial outcomes.
Variation was assessed individually for HRDEM, HRNORM and HRAMRT, and for the combination of these metrics in the parameter REG (below), for each random group. These randomly derived standard deviations were adjusted to the size of the groups and served to form Z-scores:

\[ Z = \sqrt{\frac{N}{460}} \cdot \frac{F_G - F_C}{S} \]

where Z is the z-score; \sqrt{N/460} is the adjustment for group sizes N different from the 460 used in the random model calibration; F_G is the frequency of the marker in the group, measured in relative units of hazard ratios; F_C is the frequency of the marker in the total population control in relative units (F_C = 1); S = σ x t is the Studentized variance; σ is the standard deviation produced in the random model; and t is the correction coefficient in Student's distribution, a function of the allowed reliability for the test (α < 0.1) and the degrees of freedom (1). Under these assumptions the correction coefficient is 6.314 for a two-sided test. A Studentized Z-score > 1 in absolute value means a > 90% probability of non-random origin of the effect, irrespective of whether the effects are normally distributed. DEM, NORM, AMRT and REG can be the markers of interest.

Table 2 note. The columns indicate the source, the active agent, the modality of the trial (MOD), denoting how many components were in a cocktail or whether it was a single-substance trial, and the self-assessed incremental success metric: +1 positive, 0 neutral, -1 negative.
https://doi.org/10.1371/journal.pone.0224315.t002
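As a numerical illustration of the Studentized Z-score, the sketch below assumes that the score takes the form Z = sqrt(N/460) x (FG - FC) / (σ x t), which is how the definitions in the text read. The function name and the example inputs are hypothetical; only the constants 460 and 6.314 come from the text.

```python
import math

T_CORRECTION = 6.314      # Student's t for one degree of freedom, as quoted in the text
CALIBRATION_SIZE = 460    # group size used to calibrate the random model

def studentized_z(fg, sigma, n, fc=1.0, t=T_CORRECTION, n_ref=CALIBRATION_SIZE):
    """Assumed form: Z = sqrt(N / 460) * (FG - FC) / (sigma * t)."""
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    s = sigma * t  # Studentized variation S
    return math.sqrt(n / n_ref) * (fg - fc) / s

# Hypothetical group: hazard ratio 0.5, random-model sigma 0.05
z_small = studentized_z(fg=0.5, sigma=0.05, n=460)    # group of calibration size
z_large = studentized_z(fg=0.5, sigma=0.05, n=1840)   # four times larger group
```

Under this form, quadrupling the group size doubles the absolute Z-score, which reflects the usual inverse square-root dependence of random variation on group size.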

Meta-analysis of clinical trials and derivation of a classifier predicting clinical trial performance with a-priori inputs from the databases of electronic medical records
Clinical trials relating to prevention of neurodegeneration were selected for the training set. The trials included prevention of subjective or mild cognitive impairment (MCI), treatment of mild dementia, and treatments of multiple sclerosis, Parkinson's disease, resistant depression and psychoses. The trials directed to treatment of severe dementia were included as well, but known anti-Alzheimer's drugs and anti-psychotics were not considered due to their lack of relevance to prevention. The focus of the effort was the re-purposing of non-dementia directed FDA-approved agents and OTC supplements. The training set included > 250 elementary trials, presented in File C in S1 Data, collected for the period 2007-2017 using PubMed as the search engine.
The PubMed IDs of the trials were tabulated, and the outcomes were classified as a success if the authors stated that the original hypothesis was confirmed (RCT+ = 1), as a failure if the authors stated so (RCT- = 1), and otherwise as neutral (RCT 0 = 1). Multiple components tested in the same trial were classified according to the relevant general mechanisms (NSAID, antihistamine, hormone etc.) and the level of modality was noted (MOD = 1 for a single agent, MOD = 2-4 for two to four agents, MOD = 5 for five or more). These trials were attributed to 53 major groupings of pharmacological mechanisms, defined either in a standard manner or according to the composition details provided in Results. The difference BAL between positive and negative results was normalized to the total number of trials in a given mechanism (TOT). This metric (derived on the clinical trial side) was correlated to REG (derived on the EMR database side), a linear combination of interpretable generic features that do not depend on the specifics of a mechanism, with the coefficients derived by optimizing the Pearson correlation with the outcomes BAL/TOT.

Table 3 notes. The regression T-Stat is defined by the LINEST function of Excel as the ratio of the coefficient to its standard error. P-values of the T-Stats are provided, as well as 95% confidence intervals (CI95) for the coefficients.
B. The results of regression in NSHAP data, 2138 respondents; the outcome is the number of protectants in personal profiles, and the coefficients are determined for the contribution of factors that either parallel or counter the presence of multiple protectants in personal profiles.
C. The analysis analogous to A, conducted in NACC, 38838 patients; the outcome is the fraction of dementia at the end of follow up.
D. The analysis analogous to B, conducted in NACC, 38838 patients; the outcome is the number of protectants in personal profiles.
https://doi.org/10.1371/journal.pone.0224315.t003

Reduced cognitive decline in ageing populations exposed to multiple protective pharmaceutical factors

Confounders were re-measured in both the general and the decedent populations, but confounder normalization was conducted only in the general population, leaving the same metrics to float and accept their final values in the decedents. The columns 1-5% refer to the top 1-5% of the rank by protectants; 51-100% refers to the bottom 51-100% of the rank by protectants.
Columns 2 to 7:
2. 0-5%: outcome and confounder values in the cohort formed by the top 0-5% of the rank in the general population by the number of protectants in the patient's profile.
4. p-values assessing the significance of the differences between the 0-5% and 51-100% cohorts.
5-7. The same as columns 2-4 (which are measured in the living population), but measured for the decedents formed in the general population.
The most important outcomes are marked in bold (H(T), DEM, MMSE).

See Table 5 for details of notations. The difference between the data of Tables 5 and 6 is the composition. The agents tested as protectants included any of: zinc, selenium, chromium picolinate, biotin, herbal supplements, vitamin A and lutein supplement, angiotensin-receptor blockers.
For the addendum part, only the 9203 decedents produced in all NACC versions were included in the analysis. Composition 1 (high complexity, higher REG agents as per Table 1) includes any of: angiotensin receptor blockers, NSAIDs, anti-diabetics, selenium, biotin, chromium picolinate, zinc, Vitamin A + lutein, garlic, red yeast rice, turmeric, cranberry, flax; the rest are any of the prescription medications. Composition 2 (high complexity, lower REG agents as per Table 1)

To test the stability and generalization of the predictor, the training set was permuted by randomly excluding 10 elements on the BAL/TOT side, and the resulting variation in the prediction rule allowed us to compute the variation in the regression coefficients in (2), providing 5-fold cross-validation. Generalization was assessed by the variation in the resulting correlation coefficient between the permuted BAL/TOT and adjusted REG profiles. Generalization was considered acceptable if the excluded elements retained their original rank while being outside of the initial prediction rule. The performance of the predictor was assessed by the Receiver Operating Characteristic (ROC), using the average permuted score as the ranking function and presenting BAL/TOT as either +1 if BAL/TOT > 0 (positive) or -1 if BAL/TOT ≤ 0 (negative). The variation of the point locations due to permutation was reflected in the ROC plot (Fig 2).
The predictor was assessed for signal-to-noise ratio by randomly scrambling the BAL/TOT profile (12 times) and re-correlating each scrambled profile with REG.
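The scrambling test can be illustrated as follows. This is a sketch under stated assumptions rather than the spreadsheet procedure of File C: the profiles are hypothetical toy values, the Pearson correlation is computed directly, and the observed correlation is compared with the scrambled ones through a simple z-score (observed minus mean of the scrambled correlations, divided by their standard deviation). The sign convention of the Z-scores reported in the study may differ.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def scramble_test(reg, bal_tot, n_scrambles=12, seed=0):
    """Correlate REG with randomly shuffled BAL/TOT profiles to estimate the noise level."""
    rng = random.Random(seed)
    observed = pearson(reg, bal_tot)
    null = []
    for _ in range(n_scrambles):
        shuffled = list(bal_tot)
        rng.shuffle(shuffled)
        null.append(pearson(reg, shuffled))
    sd = statistics.stdev(null)
    z = (observed - statistics.mean(null)) / sd if sd > 0 else float("nan")
    return observed, null, z

# Hypothetical toy profiles, one value per mechanism
reg = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, -0.2]
bal_tot = [0.8, 0.5, 0.6, 0.1, 0.3, -0.1, 0.0, -0.4]
observed, null, z = scramble_test(reg, bal_tot)
```

With only 12 scrambles the estimate of the noise level is coarse; a larger number of shuffles would give a smoother reference distribution.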
The predictor was tested on a new set of 70 clinical trials identified by searching a different interface (Google Scholar) and representing the clinical performance of the top octile by REG rank. The agents of interest were identified in the tables of compositions tested in the trials, and the results were digitized identically to the training set. The BAL/TOT ratios for the agents were computed and compared with the profile of ratios for the training set. The ability to predict predominantly successful clinical trials and the ability to extrapolate continuous strong results based on initial promise were explored as validation tests.

Forming groups based on simultaneous presence of multiple agents
With the ability of the regression scores (REG) to predict an increased probability of trial success confirmed, both REG and empirical trial performance were used to select a limited number of agents for the combinational study. The candidates passing this double filter were traced in the individual profiles in NACC, and the counts PR (protectants), or N, were produced. In some patients PR was 0 (no protectants), and in some it was as high as 24 protective molecules consumed by one patient. Using Pivot Tables, visits in NACC were aggregated for each patient ID, producing the data points at the first and at the last visit during follow up. These data points were available for all parameters tracked in this study (cognitive, comorbidity, polypharmacy, protectants; the duration of follow up for each patient is a variable in NACC). In other datasets these metrics were pre-computed by the providers, such as the number of supplements per profile in NSHAP. If a validated agent on the short list was present in the profile during the final visit, PR was incremented by 1. The presence of the next agent on the short list produced the next increment, and so on.
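The counting rule for PR can be sketched as follows. The record layout (patient ID, ISO-formatted visit date, list of agent names), the short list and the toy visits are assumptions made for the illustration; they are not the NACC schema or the validated list used in the study.

```python
def last_visit_profiles(visits):
    """Keep, for each patient ID, the agent list recorded at the latest visit."""
    latest = {}
    for patient_id, visit_date, agents in visits:
        # ISO dates (YYYY-MM-DD) compare correctly as strings
        if patient_id not in latest or visit_date > latest[patient_id][0]:
            latest[patient_id] = (visit_date, agents)
    return {pid: agents for pid, (_, agents) in latest.items()}

def protective_count(agents, short_list):
    """PR: number of distinct short-listed agents present in the profile."""
    present = {a.strip().lower() for a in agents}
    wanted = {s.strip().lower() for s in short_list}
    return len(present & wanted)

# Hypothetical toy data
visits = [
    ("P1", "2010-01-05", ["Zinc", "Aspirin"]),
    ("P1", "2014-03-02", ["Zinc", "Biotin", "Selenium"]),
    ("P2", "2012-07-19", ["Aspirin"]),
]
short_list = ["zinc", "biotin", "selenium", "chromium picolinate"]

profiles = last_visit_profiles(visits)
pr = {pid: protective_count(agents, short_list) for pid, agents in profiles.items()}
```

Each short-listed agent contributes one increment regardless of dose, matching the equally weighted count described in the text.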

Normalization to confounding factors
Multiple regression was chosen as the major normalization method due to the complex interactions between the confounding factors [20,21]. The LINEST function is available in Microsoft Excel [22]. A factor X was concluded to be independent if its regression coefficient, analysed in the context of all known confounders included in the model, was statistically significant (p-value < 0.1) and had the expected direction. The data of Table 3 and Figs 3 and 4 were normalized using multivariate regression.
Because the Bayesian competition hypothesis proposes a reduction of dementia in multimorbid patients (Discussion), it was desirable to individually equalize all confounders between the case and control cohorts. For the given system:

\[ 1 - HR(DEM) = W_1 \, (PR_{case} - PR_{cont}) \]

where 1 − HR(DEM) is the deviation of the hazard ratio of dementia between a given case stratum of the database with the higher PR (PR_case) and the stratum with the lower PR (PR_cont) used as a control, PR is the number of protectants per profile, and W_1 is the non-adjusted regression coefficient. Considering the differences between the confounder levels in the case and control strata expands the regression:

\[ 1 - HR(DEM) = W'_1 \, (PR_{case} - PR_{cont}) + \sum_i W_i \, \Delta X_i \]

where ∑WiΔXi is the contribution of the confounders Xi to the observed outcome and W'1 is the confounder-adjusted value of the primary regression coefficient. A transformation of ranks can be introduced that nullifies the ∑WiΔXi term and each ΔXi individually:

\[ \sum_i W_i \, \Delta X_i = 0, \qquad \Delta X_i = 0 \ \text{for each } i \]

Measuring the modified (PR'case − PR'cont) and the adjusted 1 − HR'(DEM) after the transformation allows identification of the confounder-adjusted regression coefficient W'1, which links the difference in the number of protectants to the decrease in dementia in the case cohort:

\[ 1 - HR'(DEM) = W'_1 \, (PR'_{case} - PR'_{cont}) \]

The rank transformation used in this report defines PR', the adjusted rank of a given patient profile according to the number of protectants, from PR, the initial non-adjusted rank, the confounder values Xi and the variable weight coefficients Vi. These weight coefficients were subjected to an optimization process with the criterion ∑WiΔXi = 0; ΔXi, j, k = 0 as the convergence condition. The coefficients Vi were incrementally varied, and after each variation the database was re-ranked and the convergence criteria were re-tested. Equalization of the readings for all confounders accompanies the results reported in Tables 4-6.

Choice of metrics to report the outcomes in the cohorts
With cohorts of fixed size, rare tag occurrence may produce random fluctuations exceeding the size of the effect intended to be measured. The tags were screened for adequate prevalence in the cohort, so that the effects could be reported with a maximized signal-to-noise ratio.

Cohort size
Here NC is the requested sample size; Z is the z-score of the confidence probability, 1.98 for α < 0.05; e is the acceptable value of the noise-to-signal ratio, assumed to be 0.1; σ is the variation of the tag prevalence between random groups of size M; and M/[Cohort size] is the adjustment of the variation, determined in a one-time calibration for the random group of size M, to the various sizes of the cohorts, see [23]. The values of NC were computed for different possible outcomes with different relative variation σ, and those that satisfied (7) were acceptable.

Estimating hazard function of dying by Gompertz model using EMR records
The hazard function H(T) of dying in the year T+1 of age can be estimated empirically from an expression in which N_d is the number of accrued decedents in the initial elderly population N_c entering the observation at a 100% survival rate, FPB is the group-averaged year of the beginning of follow up, and LS is the group-averaged life-span.

Data analysis flow
The general flow of the data analysis is presented in Fig 1. Dementia prevalence was measured in relatively large groups of patients that took an OTC supplement or an approved pharmaceutical (Methods). These groups were produced by Pivot Table clustering of the patient profiles in the database, using a common dosage form name shared by different patients. The average dementia, norm and mortality in the groups were normalized to the respective averages for the database total, forming hazard ratios. The hazard ratios were validated by clinical trials, studies and animal experiments for a given agent or for related compounds which exert their action by similar mechanisms (Table 1). The agents were ranked based on a linear combination of dementia, residual cognitive norm and mortality computed in the groups, and the rank was demonstrated to correlate with clinical trial success (Tables 1 and 2). Next, the validated higher-ranking compounds (first octile of rank) were traced as combinations of N agents in the personal profiles of the patients, and the outcomes were reported (Tables 3-6, Figs 3 and 4). The role of confounding factors was ruled out and the role of agent diversity as an independent factor was demonstrated (Table 3, Figs 3 and 4). The material of Tables 4-6 deals with permutations of the patient cohorts included in the studies, the reproducibility of the effects in multiple independent databases, attempts to rule out biases, and attempts to show that the effects persist at all ages. Supplemental files are provided for independent review of the original data and of the intermediate processed files reflecting the key elements of the data flow. Reading the text while concurrently reviewing the supplemental data is the shortest path to learning the methodology.
File A in S1 Data (161 MB) presents the sub-files "Grouping by patient ID" and "Grouping by Drug Name", with detailed Pivot Table templates illustrating the transition from the visit organization of the NACC database (multiple rows with the same patient ID but different dates and content) to clusters by agent/factors or to the longitudinal format (one patient ID per row, time-dependent information presented as MIN or MAX indicating the beginning and end of follow up).

Screening of individual pharmacological mechanisms for correlation with dementia rate, preservation of cognitive norm and all-cause mortality
In the NACC dataset, the columns with pharmaceuticals and supplements were identified, the agents were mapped to the patient visits, and exposures on the first and the last visits were computed (File A in S1 Data, examples of using Pivot Tables). The 118,000 visits in the NACC database as of 5/2017 were clustered into cohorts according to the use of > 1,900 dosage forms of FDA-approved pharmaceuticals and over-the-counter supplements (see Methods, File B in S1 Data). For each group, defined by the presence of an individual compound or factor, the fractions of dementia, cognitive norm and all-cause mortality were compared to the corresponding fractions in the entire database, thus producing the following hazard ratios: HR DEM, the hazard ratio of all-cause dementia in the category of compounds aggregated by mechanism of action; HR AMRT, the hazard ratio of all-cause mortality; and HR NORM, the hazard ratio of the patients remaining cognitively normal (File B in S1 Data). The groups with insufficient numbers of patients were pooled based on a common pharmacological mechanism or co-use group (e.g., B-group vitamins). Each mechanistically similar pooled category of compounds (or individual compound) was aligned with matching clinical trials (Table 1).
Additional mechanistic groups not yet tested in RCTs with neurodegeneration prevention endpoints but showing promise based on the HRNORM, HRDEM and HRAMRT signals do require further verification by clinical observations, and, therefore, are listed in Table 1 separately from the main body of compounds. For these compounds, the literature was mined for evidence of their exclusion by prescribing physicians in dementia (barbiturates, baclofen). The data presented in Table 1 suggest the inclusion of biotin, probiotics, bronchodilators, nasal steroids, chondroitin/glucosamine, and anti-gout medicines as candidate compounds to explore for their potentially neuroprotective properties (See Table 1 for references). The data on verified use of these compounds in various clinical trials formed a training set.
The question that we tried to answer was whether the signals produced from epidemiological parameters in the database have any relevance to the reported incremental success of >250 clinical trials addressing neurodegeneration in the period 01-2007 to 10-2017, extracted from PubMed (referred to as the "training set", see Methods, File C in S1 Data). The relevance of the enquiry is driven by a known disconnect between promising epidemiological data and the incremental results in clinical trials of individual agents [15][16]. Table 1 shows the relationship between the database-born metric Regression Score (REG), derived for the individual mechanisms, and BAL/TOT = ((RCT+)-(RCT-))/TOT for clinical trials (Pearson correlation R = 0.35), where BAL is the balance of positive (RCT+) and negative (RCT-) trial outcomes and TOT is the total number of trials for the mechanism of interest. The predictor was tested for stability by randomly excluding 10 (18%) of the correlated pairs from the profile of 53 pairs (details of this test are shown in File C of S1 Data).
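For concreteness, the BAL/TOT metric and its correlation with REG can be computed as in the sketch below. The trial counts and REG values are hypothetical and serve only to show the arithmetic; the function names are not taken from the study.

```python
import math

def bal_tot(positive, negative, neutral):
    """BAL/TOT = (RCT+ - RCT-) / TOT for one mechanism."""
    total = positive + negative + neutral
    if total == 0:
        raise ValueError("no trials for this mechanism")
    return (positive - negative) / total

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical counts (RCT+, RCT-, RCT0) and REG scores for four mechanisms
trial_counts = [(6, 1, 1), (3, 3, 2), (1, 4, 0), (5, 0, 5)]
reg_scores = [0.8, 0.1, -0.5, 0.6]

outcomes = [bal_tot(*counts) for counts in trial_counts]
r = pearson(reg_scores, outcomes)
```

A mechanism with six positive, one negative and one neutral trial scores (6 - 1) / 8 = 0.625, while equal numbers of positive and negative trials give 0 regardless of the number of neutral ones.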
To improve the strength of the signal, we pooled several mechanisms into one test by computing the average BAL/TOT and the average REG for N pharmacological actives. Pooling was conducted in the following manner: Table 1 was ranked by REG, the top 5 agents were averaged on the BAL/TOT and REG sides, and the 5-member window was then moved down the rank one step at a time. As a result, a profile of 50 windows producing 5-member averages was correlated. Combining 5 agents increased the correlation coefficient between database signals and clinical trial validations to 0.72. To assess its non-randomness, the same scrambling procedure was applied, producing the random arrays (see File C in S1 Data, scrambling tests). With a Kolmogorov-Smirnov test p-value of 0.82, a Z-test followed, giving z = -5.63843. The result points to a strengthening of the signal-to-noise ratio at this degree of pooling and to a strengthening of the signal itself. Analogously, pooling 10 actives in a single test led to a correlation coefficient of 0.93 between the averaged predictor in the window of 10 and the averaged outcome in a window of 10 (Z-score against the scrambled set = -5.5, p-value = 3.8x10^-8).
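The sliding-window pooling can be sketched as below. The toy profiles are hypothetical and deliberately constructed so that an alternating disturbance cancels within each window of two; they are not the study data. In general, a window of k moved over n ranked items one step at a time yields n - k + 1 windows.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def window_means(values, k):
    """Averages over a k-member window moved one step at a time."""
    if not 1 <= k <= len(values):
        raise ValueError("window size must be between 1 and len(values)")
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]

def pooled_correlation(reg, bal_tot, k):
    """Rank mechanisms by REG, average both profiles in k-member windows, correlate."""
    order = sorted(range(len(reg)), key=lambda i: reg[i], reverse=True)
    reg_ranked = [reg[i] for i in order]
    bal_ranked = [bal_tot[i] for i in order]
    return pearson(window_means(reg_ranked, k), window_means(bal_ranked, k))

# Hypothetical toy profiles: outcome = predictor plus an alternating disturbance
reg = [float(i) for i in range(10)]
bal_tot = [i + 2.0 * (-1) ** i for i in range(10)]

single = pearson(reg, bal_tot)                # agent-by-agent correlation
pooled = pooled_correlation(reg, bal_tot, 2)  # correlation of 2-member averages
```

In this toy case the agent-by-agent correlation is about 0.79, while the pooled correlation is essentially 1, because averaging neighbouring agents removes the disturbance that is uncorrelated with the predictor. Overlapping windows share members, so the pooled points are not independent observations.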
This result has practical significance. Testing 10 promising agents simultaneously (or any number of database-nominated protective factors that can be provided as a part of a clinical trial) is inherently more predictable and successful than testing a single agent. In a 10-member parcel, the outcome is determined by the predictor to the extent of R^2 = (0.93)^2 = 86%, while 14% of the future observed effect is defined by inherently unknown factors. By contrast, in a single-agent test the outcome is determined by the predictor to the extent of R^2 = (0.35)^2 = 12%, while the remaining 88% is defined by inherent unknowns. With the outcome being trial success, the percentage of stronger biological results must theoretically increase more than linearly with the diversity (complexity) of the factors included in the anti-dementia trial rationale.
Choosing highly scoring agents based on dosage form epidemiology (as a correlate of dementia prevalence) and using combinations of such highly scoring agents leads to a greater success rate in clinical trials and clinical applications. The predictive power of the database-driven metrics was tested for its ability to 1. predict success in trials based on individual mechanisms and combinations not included in the training set; 2. predict the alignment with research literature describing the effects of different pharmaceuticals on short- and long-term cognitive status.
The testing set was assembled by mining controlled clinical trials that included the top 7 agents ranked by REG (Vitamin A-lutein, chromium picolinate, biotin, selenium, zinc, antimigraine, oestrogen). Unlike the training set in Table 1, identified in PubMed between 2007 and 2017 (File C in S1 Data), the testing set was extracted from Google Scholar without time limitations. The results are combined in Table 2. In the testing context, biotin and anti-migraine drugs were the agents with a known high REG and no trials available in the training set.
Several clinical trials were identified for biotin, including those inducing remissions in symptomatic multiple sclerosis (Tourbah et al., 2016, Table 2). Other trials of mostly nutritional multivitamin-multimineral compositions including biotin demonstrate a high rate of success in the incremental definition accepted in this report (in 6 out of 8, BAL/TOT = 0.625). The strong REG signal for anti-migraine medicines was not accompanied by any tests of such agents in controlled trials. Instead, in (Vuralli et al, 2018 in Table 2, [38], [39]), a mechanistic link between migraine and cognitive decline is discussed as under-appreciated. This leads to the idea that anti-migraine medications may contribute to preventing gradual long-term cognitive decline as well, thus producing a validated database signal. The authors were not aware of the supporting literature at the time when the training set was extracted (Table 1). Other prominent BAL/TOT ratios were 0.6 for carotenoids, 1.0 for chromium picolinate, 0.625 for selenium and 0.75 for zinc. For oestrogens, the scoring is complicated by an apparent difference in the performance of transdermal patches, non-progestin oral hormone replacement therapies and oestrogen-progestin combinations. Differentiation of oestrogen treatments into transdermal estradiol (T), generic oral treatments (E) and combinations with progesterone (P) leads to BAL/TOT = 0.84 for transdermal oestrogen (T), BAL/TOT = -0.2 for oral oestrogen (E) and BAL/TOT = 0.09 for oestrogen combined with progestin (P). Pooling all available formulations produces BAL/TOT = 0.06. The highest octile of the training set ranked by REG includes 10 incrementally positive, 3 negative and 3 neutral trials. In the testing set, the respective numbers are 45, 17 and 10. By contrast, the entire training set includes 174 incrementally positive, 104 negative and 45 neutral results, while the 2 bottom quartiles include 60 incrementally positive, 59 negative and 22 neutral results.
These numbers indicate that incorporation of REG ranking in the planning of anti-dementia clinical trials is likely to produce a ~3-fold increase in their success rate. Fig 2 presents formal Receiver Operating Characteristic performance data for the REG classifier, with the margins of variation produced by 5:1 cross-validation, modifying the prediction rule (average of 5 re-training subsets after excluding 10 mechanisms out of the panel of 53 in each case). The permutation-averaged ratio of positives to negatives is > 5 in the highest-ranking octile of the REG score, while being 0.95 for the entire dataset.
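The raw trial counts quoted above can be tabulated to check the arithmetic behind the comparison. The sketch below uses only the counts given in the text; the choice of summary statistics (ratio of positive to negative outcomes, and BAL/TOT) and the variable names are ours, and the permutation-averaged ratios reported for Fig 2 are a separate calculation not reproduced here.

```python
# (positive, negative, neutral) trial outcomes as quoted in the text
counts = {
    "training set, top octile by REG": (10, 3, 3),
    "testing set": (45, 17, 10),
    "entire training set": (174, 104, 45),
    "training set, bottom two quartiles": (60, 59, 22),
}

def summarize(positive, negative, neutral):
    """Return the positive-to-negative ratio and BAL/TOT for one set of trials."""
    total = positive + negative + neutral
    return positive / negative, (positive - negative) / total

summary = {name: summarize(*c) for name, c in counts.items()}

# Fold difference in the positive-to-negative ratio between the top octile
# and the bottom two quartiles of the training set
fold = (summary["training set, top octile by REG"][0]
        / summary["training set, bottom two quartiles"][0])

for name, (ratio, balance) in summary.items():
    print(f"{name}: positives/negatives = {ratio:.2f}, BAL/TOT = {balance:.2f}")
```

On these counts the positive-to-negative ratio is about 3.3 in the top octile of the training set and about 1.0 in its bottom two quartiles, a difference of roughly 3-fold.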
With the meaning of the REG rank established as an a-priori predictor of success rate, a total of 1962 dosage forms available in NACC were screened by computing REG and Studentized Z-scores for each group (File C in S1 Data). Relative errors for the REG components (NORM, DEM, AMRT) were assessed based on a random model and adjusted to the drug group sizes after computing a propagated error when combining the components in REG, also incorporating the REG variation between prediction sub-rules. Out of 1962 dosage forms, 1033 produced groups of suitable size. Enrichment between the best octile by Z-score and the rest of the agents with the available metric produced the following leaders: vitamins D (enrichment 7, 6 df (dosage forms)), vitamins B (4.2, 8 df), vasodilators (14, 2 df), metabolic supplements (3.5, 18 df), probiotics (2.8, 7 df), antivirals (4, 13 df), stimulants (2.7, 19 df), ophthalmic cellulose (7, 2 df), omega-3 (4. [42], antiplatelet [43,44], vasodilators [43,45], antimigraine [38,39], antivirals [46], probiotics [47], magnesium [48,49], transdermal oestrogen (Table 2) and possibly metformin [40,41,50,51] are the agents of interest to perhaps combine with the multivitamin-multimineral formulas of Table 2. Conversely, the signals for corticosteroids [52,53], anti-cancer therapies [54,55], anti-acid drugs [56,57], infections underlying the use of antibiotics and antifungals [58], the link between dementia and respiratory deficiency [59], and the link between dementia and cardiac arrhythmia [44] mostly align with outside evidence of involvement in neurodegeneration. The effects of statins are likely due to the underlying conditions (cardiovascular disease, metabolic syndrome) outweighing the effects of the medications, not unlike the insulin case described above. The association of anti-acids may need to be deconvoluted further into the effects of soluble aluminium and of proton-pump inhibitors [57].
Overall, the profile of REG distribution coefficients between the top and the remaining octiles of REG rank for 1033 individual dosage forms agrees with literature and the observed coefficients can be rationalized through the balance between the effects of therapy and that of underlying cause on the incidence of cognitive decline.
Reviewing the clinical trial space, literature and enrichments by REG, we noted that three larger groups of active compounds can be formed:
1. Cerebrovascular modulators: antihypertensives, vasodilators, PDE5 inhibitors, antiplatelet and antimigraine agents.

Metabolic stimulators, coenzymes, antioxidants, vitamins
The high correlation between averaged REG for groups > 5 and clinical trial success provides a rationale for combining these large domains in higher-order compositions, and for combining those with the components of lifestyle [60]. Thus, we tracked complex combinations (5-20 elements) of the factors analogous to the best performing ones in the clinical trial space in the personal profiles of patients across multiple databases.

Combinatorial intervention as efficient approach to delay cognitive decline: Analysis of confounders
For each patient, we counted the total, equally weighted number of factors in the highest ranking quartile by regression score REG to generate a novel index that we termed "protective complexity", which, in a sense, reflects the summarized effort in staving off cognitive decline. This index can include pharmacological, behavioural, comorbidity or even social factors, as long as they are pre-validated by controlled studies or at least by mechanism-based outside evidence. In this report, we limit our analysis to pharmacological factors only.
Protective complexity indices PR were computed for each patient, followed by re-ranking of the entire cohort by PR score, producing divisions with higher or lower protective scores. Files D-F in S1 Data illustrate the protective complexity indices and present the overall layout of the computational experiments reported below.
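The counting and re-ranking steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the group names, the patient profiles and the helper names `protective_complexity` and `rank_cohort` are hypothetical, and the set of top-quartile groups is assumed to have been selected by REG beforehand.

```python
from typing import Dict, Iterable, List, Set, Tuple


def protective_complexity(profile: Iterable[str], top_groups: Set[str]) -> int:
    """Count distinct top-quartile mechanism groups in one profile (equal weights)."""
    return len(set(profile) & top_groups)


def rank_cohort(
    profiles: Dict[str, Iterable[str]], top_groups: Set[str]
) -> List[Tuple[str, int]]:
    """Score every patient and re-rank the cohort, highest PR first."""
    scores = {pid: protective_complexity(p, top_groups) for pid, p in profiles.items()}
    # Ties are broken by patient id so the ordering is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical inputs, for illustration only.
TOP_GROUPS = {"vitamin_d", "omega_3", "magnesium", "arb"}
PROFILES = {
    "p1": ["vitamin_d", "omega_3", "statin"],
    "p2": ["statin"],
    "p3": ["arb", "magnesium", "vitamin_d", "vitamin_d"],  # duplicates count once
}

RANKED = rank_cohort(PROFILES, TOP_GROUPS)
print(RANKED)
```

Counting distinct groups rather than dosage forms reflects the equal weighting stated in the text: a second product from the same mechanism group does not raise the score.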
Before a detailed study of the cohorts with a high protective complexity index, we addressed the bigger picture of potential confounding of the effects by other factors. Table 3 and Figs 3 and 4 illustrate the factors contributing to cognitive state at the end of follow up as well as potential interferents overlapping with the effects of interest. The problem of deconvoluting these effects against the background of numerous covariates was addressed by multiple regression analysis. The confounders and the target effect were simultaneously included in the combined model and the statistical weights of the regression coefficients (t-STAT, p-value) were reported. When the outcome (Y) in the regression was cognitive status, positive coefficients (and t-STAT values) reflect the factors that facilitate cognitive decline. The opposite is true for negative coefficients. When the outcome was protective complexity, positive coefficients for the confounders indicate co-enrichment of these covariates in the highly protected cohorts, while negative coefficients mean mutual exclusion.
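To make the reading of coefficient signs and t-STAT values concrete, the following is a minimal ordinary least squares sketch in plain Python. It is an illustration under stated assumptions, not the authors' model: the two covariates (age in decades and PR), the synthetic outcome values and the helper names `invert` and `ols_t_stats` are all hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_stats(
    rows: Sequence[Sequence[float]], y: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fit y on the covariates plus an intercept; return (coefficients, t-statistics)."""
    x = [[1.0] + list(r) for r in rows]  # intercept is coefficient 0
    n, k = len(x), len(x[0])
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(x[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t_stats = [beta[a] / sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, t_stats


# Synthetic, hypothetical data: (age in decades, protective complexity PR).
ROWS = [(7.0, 0), (7.0, 4), (7.5, 1), (7.5, 5), (8.0, 0), (8.0, 3), (8.5, 2), (8.5, 6)]
NOISE = [0.05, -0.05, -0.05, 0.05, 0.05, -0.05, -0.05, 0.05]
Y = [1.0 + 0.5 * age - 0.3 * pr + e for (age, pr), e in zip(ROWS, NOISE)]

BETA, T_STATS = ols_t_stats(ROWS, Y)
print(BETA, T_STATS)
```

With cognitive decline as the outcome, the positive age coefficient marks a factor that facilitates decline and the negative PR coefficient marks a protective one, which is the sign convention used in the text.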
Review of Table 3 (A and C) shows that pre-existing cognitive decline, age, length of follow up and prescription polypharmacy make positive contributions to the final dementia rate in the NACC and NSHAP datasets. By contrast, higher education, presence of osteoarthritis, multiple protectants, positive emotionality, good physical and mental health at baseline, and social alcohol consumption correlate with delayed cognitive decline (in NSHAP). Review of Table 3 (B and D) shows that protective complexity covaries positively with length of follow up, higher cognitive score at baseline, prescription polypharmacy, osteoarthritis, age, female gender, positive emotionality and higher education. Positive emotionality, longer follow ups and higher cognitive score at baseline can themselves be outcomes of protective complexity, but we treated these parameters as confounders and included them in the multivariate background. Despite this inclusion, protective complexity produced statistically significant contributions in both NSHAP and NACC in the simultaneous presence of all expected confounding factors.
The combination of confounding factors forms a background model, predicting cognitive decline rates under the assumption that the factor of interest (not included in the model) is comparably distributed between the groups, producing negligible influence on the result. The predicted and observed values of cognitive decline were equal within confidence intervals when testing the background model (Figs 3E, 4A and 4B).
This agreement with the prediction means that for groups as large as 5-10% of the total dataset, consideration of known confounders suffices for assessing the baseline correlating with the factor of interest, even if for individual profiles the regression model and the observation deviate significantly (R^2 = 0.41 for NACC and 0.07 for NSHAP). The plots comparing the predicted baseline cognitive decline and the observed decline are in Figs 3D and 4C. In the cohorts with the highest diversity of protectants, the observed rate of cognitive decline was 3-fold lower than that predicted by the confounding background. These results were outside the 95% confidence intervals even for the smaller cohorts with the top level of protectant exposure.
Interestingly, the groups with the highest cognitive score at baseline had the largest intake of the protectants, possibly as a component of lifestyle. Arguably, lifestyle itself may be a key contributor to higher baseline cognitive scores, especially in the highly educated group and in multimorbid osteoarthritic patients, in whom pain motivates the search for relief through a plethora of alternative medicine and supplementation over extended periods of time. This element of reverse causation was resolved conservatively by attributing baseline cognitive score to confounding factors and not to outcomes. If baseline cognitive score were treated as an outcome together with the final score, the background model would have shown 5-fold differences vs. the observation. Thus, the confounder-corrected magnitude of the effect lies in a broad range between a 3-fold and a 5-fold reduction as compared to the confounder-only background model.
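One way to check whether a background prediction falls outside the 95% confidence interval of an observed cohort rate is sketched below. The Wilson score interval is an assumption made for this illustration, since the text does not state which interval was used, and the counts and predicted rate are hypothetical.

```python
from math import sqrt
from typing import Dict, Tuple


def wilson_interval(events: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1.0 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def compare_to_background(events: int, n: int, predicted_rate: float) -> Dict[str, object]:
    """Compare an observed cohort rate with the confounder-only prediction."""
    if events <= 0 or n <= 0:
        raise ValueError("need at least one event and one patient")
    low, high = wilson_interval(events, n)
    observed = events / n
    return {
        "observed": observed,
        "fold_reduction": predicted_rate / observed,
        "ci95": (low, high),
        "prediction_outside_ci95": not (low <= predicted_rate <= high),
    }


# Hypothetical cohort: 20 declines among 200 patients, background model predicts 30%.
RESULT = compare_to_background(events=20, n=200, predicted_rate=0.30)
print(RESULT)
```

In this made-up example the observed rate is 3-fold below the prediction and the prediction lies outside the interval, which is the pattern the text reports for the most protected cohorts.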

Combinatorial effects survive variations in the methods of data collection
True biological effects should manifest despite permutations in the method of data collection, as do all objective phenomena detected with high signal-to-noise ratios. Table 4 presents the dementia rate as a function of time in cohorts with variable degrees of consumption of protective compounds, as measured by PR.
In individual profiles collected in the National Social Life, Health and Aging Project (NSHAP), vitamins, minerals, nutraceuticals, herbals, aspirin and hormones (androgens and oestrogens) were traced as combinations. The main dataset of NSHAP, which is a self-reporting survey, is significantly depleted in dementia tags for a dataset of that size. Cognitive test results are available in NSHAP WAVE1 (2006) and WAVE2 (2011), which allowed us to extract the number of errors for each component of a test, including orientation in time and number sequences, in a longitudinal fashion. In the National Ambulatory Medical Care Survey (NAMCS), the set of profiled protective agents included angiotensin receptor blockers, bronchodilators, omega-3, vitamins D and C, multivitamins, aspirin, ibuprofen and celecoxib, as well as mineral supplements of calcium, magnesium and lithium. Table 4 shows a strong negative correlation between the dementia fraction and the number of protective compounds reported in personal profiles in comparable age brackets. The results for dementia are paralleled by the results for stroke, the latter being a precursor and correlate of dementia. Notably, in groups with a higher number of protective compounds (PR), the deferral of dementia onset was matched by a proportional decrease in all-cause mortality. For each dataset, the dementia rates were converted into hazard ratios measured as [most protected]/[least protected]. The Pearson correlations between the number of protective mechanisms (protective complexity) present in the person's profile and the hazard ratios for dementia and for all-cause mortality were -0.91 (P < 5.6x10^-5) and -0.92 (P < 1.1x10^-4), respectively (see File E in S1 Data, negative correlation between cognitive decline outcomes and protective complexity). The observed increases in lifespan (LS) are consistent with blocking of dementia, one of several major fatal conditions.
In the cohorts ranked according to the number of protective compounds (PR), a 4-fold decrease in dementia burden between the top 2% and the bottom 49% of the 72-73 years old stratum, and a 2.5-fold decrease in the 78-79 years old stratum, were detected. The size of this effect decreases at ages > 90, to approximately 1.4-fold (NACC2 data).
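The conversion of stratum rates into hazard ratios and their correlation with protective complexity can be sketched as follows. The complexity levels and dementia rates below are hypothetical placeholders rather than values from Table 4 or File E, and `hazard_ratio` and `pearson` are illustrative helper names.

```python
from math import sqrt
from typing import List, Sequence


def hazard_ratio(rate_protected: float, rate_least_protected: float) -> float:
    """Ratio [more protected]/[least protected] of dementia rates."""
    if rate_least_protected <= 0:
        raise ValueError("reference rate must be positive")
    return rate_protected / rate_least_protected


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical strata: number of protective mechanisms and dementia rate in each.
COMPLEXITY = [1, 2, 3, 5, 8]
RATES = [0.20, 0.17, 0.12, 0.09, 0.05]
REFERENCE_RATE = 0.22  # hypothetical least protected cohort

HAZARD_RATIOS: List[float] = [hazard_ratio(r, REFERENCE_RATE) for r in RATES]
R = pearson(COMPLEXITY, HAZARD_RATIOS)
print(round(R, 3))
```

A coefficient near -1 means that the hazard ratio falls almost linearly as more protective mechanisms appear in a profile.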
In both the NSHAP and NACC datasets, the rates of cognitive decline accumulation were dependent on the number of protective mechanisms (PR) covered per patient. In the NAMCS dataset which allowed extraction of the combinations of supplements, angiotensin receptor blockers, bronchodilators and COX2 inhibitors, trends in dementia reduction between more and less protected cohorts were stronger than in the NSHAP dataset, where only supplements, salicylates and hormones were profiled.
To minimize the possibility that the trends presented in Table 4 are due to poor balancing of confounders between the cohorts, the distributions of potential confounding factors between the Top 5% and Bottom 49% were equalized (Methods) for NACC. The values are provided in the Table 4 addendum. In Tables 5 and 6 below, the equalization of confounders between the cohorts produced by the same equalization protocol was confirmed statistically by the p-value of a T-test.
The results observed for protective complexity are applicable to a broad spectrum of neurological diseases. Specifically, 3-fold differences were observed for frontotemporal dementia and Parkinson's disease tags, 2.5-fold for seizures, and approximately 1.5-fold for vitamin B12 deficiency and chronic alcohol dependence; all these parameters behaved as a function of protective complexity (PR). It is worthwhile to note, however, that the protective effects correlating with high PR were not limited to neurodegeneration. In the NSHAP and NAMCS data, the hazard ratio for chronic renal disease in the most protected/least protected cohorts was 0.3-0.5 after normalization by the total comorbidity loads. We concluded that the effects described in this report are stable across the methods of producing a dataset and the methods of confounder normalization.

Combinatorial effects are more effective when applied in cognitively normal state than in mild cognitive impairment (MCI)
To ensure that baseline dementia tags are excluded from the groups of patients consuming certain groups of active compounds, all NACC records labelled with any dementia-related tags, including mild cognitive impairment (MCI) or pre-MCI, were removed, and the rest of the patient profiles (N = 13,355) were analysed longitudinally by accounting only for emergent dementia-related tags. Fig 5A presents the kinetics of the conversion from a cognitively normal state to dementia in cognitively normal cohorts stratified by the number of potentially protective compounds consumed. Separately, the analysis was performed in the cohort diagnosed with MCI at inception (N = 7,350). Fig 5B presents the kinetics of the conversion to dementia in the records of the MCI group.
The circles indicate the dementia levels accumulated at the end of follow-up, and the triangles indicate time-averaged values of dementia accumulation.

$$\mathrm{DEM}_{av} = \mathrm{DEM}_{E} \times \frac{T_E - T_D}{T_E - T_B}$$
where $\mathrm{DEM}_{av}$ is the time-averaged dementia; $\mathrm{DEM}_{E}$ is the final accrued dementia value at the end of follow-up; $(T_E - T_D)$ is the interval of time between the beginning of clinical dementia at $T_D$ and the end of follow-up $T_E$; and $(T_E - T_B)$ is the interval of time between the beginning $T_B$ and the end $T_E$ of follow-up.
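As a worked illustration of the time-averaging formula, the short sketch below evaluates it for one made-up cohort. The function name and the input values are hypothetical; only the formula itself comes from the text.

```python
def time_averaged_dementia(dem_end: float, t_begin: float, t_onset: float, t_end: float) -> float:
    """DEM_av = DEM_E * (T_E - T_D) / (T_E - T_B).

    dem_end: accrued dementia fraction at the end of follow-up (DEM_E)
    t_begin: start of follow-up (T_B)
    t_onset: beginning of clinical dementia (T_D)
    t_end:   end of follow-up (T_E)
    """
    if not t_begin <= t_onset <= t_end or t_end == t_begin:
        raise ValueError("expected T_B <= T_D <= T_E and a non-zero follow-up interval")
    return dem_end * (t_end - t_onset) / (t_end - t_begin)


# Hypothetical cohort: 20% dementia at the end of a 3.5-year follow-up,
# with clinical dementia beginning halfway through (1.75 years).
DEM_AV = time_averaged_dementia(dem_end=0.20, t_begin=0.0, t_onset=1.75, t_end=3.5)
print(DEM_AV)
```

For a fixed final value, a later onset gives a smaller time-averaged value, so the gap between the two measures carries information about when dementia began.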
In Fig 5A, the follow-up interval is 3.5 years; in Fig 5B, the follow-up interval is 2.5 years. The cohorts are normalized by age, gender, education, comorbidity, and unrelated polypharmacy.
The results shown in Fig 5A and 5B indicate that in groups exposed to higher protective complexity the rates of transition to dementia slow down, and dementia onset is later in the more protected stratum. This conclusion follows from a comparison of the final and time-averaged dementia rates observed in the originally cognitively normal population. On the other hand, the rates of conversion to dementia among patients with pre-existing MCI were higher than those in cognitively normal participants, in agreement with previous reports [54], and were less amenable to delay by the combinations of protectants.

The effects of protective complexity impact all stages of aging and are lifelong
In the next step, we tested whether the protective effects are observed in decedents (Files G, H, I, J in S1 Data). By using the fraction of deaths with a dementia diagnosis in the total mortality, we minimize the biases associated with the different positions of patient sub-sets along the aging trajectory, with some groups closer to terminal decline and some healthier. To minimize possible population heterogeneity bias even further, we restricted the analysis to a single newer release of NACC (09-2018, Version 3).
Comparison of the data in Tables 4 and 5 points to qualitative reproducibility of all effects despite the use of different NACC versions and years of release. In the presence of multiple protectants (PR > 5), the accrued dementia, rate of dementia accrual, rate of MMSE decline and mortality hazard function are reduced, while the length of follow up and life expectancy are increased, with significant p-values vs the low protectant control. In addition to Table 4, Tables 5 and 6 demonstrate that the effects described in this article are not transient but persist through the entire lifespan, as visible in the diminished dementia rate in decedents. None of the major confounders, such as age, education, gender ratio, arthritis, diabetes, cardiovascular events, sum of comorbidities and polypharmacy, differed statistically to an extent that would explain the observation. Both the equalization and the multiple regression approaches to confounder neutralization produce comparable results.
Angiotensin-receptor blockers (ARBs) demonstrated high enrichment by REG in the leading rank octile. The consistent interest in these agents as anti-dementia protectants [61] motivated us to include ARBs in the compositions together with the [Zn, Se, Cr, biotin, vitamin-A/lutein, herbals] sub-set (leading octile by REG), to amplify the number of mechanistic domains putatively countering dementia. All of these agents were considered with equal weight and traced in the patient's personal profile, producing the count of different species (protective complexity). The profiles were re-ranked by protective complexity and the results of the study are presented in Table 6. Table 6 demonstrates a deeper negative correlation with dementia rate (0.04 at the age 70 and 0.14 at the age 75 in a more protected group vs. 0.21 and 0.31 for the same 5-year difference in the control in a living population; 0.066 at the age 74 and 0.266 at 81.5 in a more protected group vs. 0.51 at the age 75 and 0.76 at 80.6 in a control for the decedent category). The T-test statistic is provided in Table 6. This reduction in dementia frequency is observed in different permutations of the exposures in decedents (File G in S1 Data) and follows the high REG for the group with [Zn, Se, Cr, biotin, vitamin-A/lutein, herbals], also confirmed by observed performance (BAL/TOT).
The addendum of Table 6 presents the test of multicomponent compositions in 9203 decedents for all versions of NACC together (Files I, J in S1 Data). We tested the decedent subsets receiving compositions of comparable complexity, with comparable proportions of prescription and OTC components and close levels of confounders, but with different average REG and Z-score of the component groups. If non-compliance of cognitively impaired patients is the driving factor for the exclusion of pharmaceuticals from personal profile data, this setting provides a test of the hypothesis of retrospective bias as a source of the observed dementia declines in EMR. Both higher and lower REG groups maintain complex scheduling with similar proportions of OTC components, and the group with the higher average REG/Z-score is expected to demonstrate greater life-long dementia reduction.
Both compositions 1 (14 complexity elements in the regimen) and 2 (13 complexity elements in the regimen) require comparable scheduling, so disruption of such scheduling by cognitive decline is unlikely to be the cause of the observation. High exposure to the protective compounds was measured within 1.5-2 years before dying, with the MMSE cognitive scores off the maximum value. The background model in Figs 3D and 4C shows that the drift in the confounder composition of Table 6 as a function of the protective complexity gradient can lead to substantial differences in the confounder-induced baseline, but not enough to explain the entire effect. The pattern of this dataset is more consistent with deferral of dementia onset and progression as a function of REG/Z-score.

Discussion
To identify potential therapeutics suitable for the prevention of dementia or the slowing down of age-associated cognitive impairment, we have conducted a meta-analysis of longitudinal patient cohorts with records presented in three datasets: NAMCS, NSHAP and NACC, totalling about 50000 patients. The results point to an inverse relation between the number of top-ranking active agents present in a patient's or respondent's profile and the fraction of dementia in such groups. The uncovered relationships are stable, as they reproduce in partial subsets of the same database, across age ranges and across different and independently processed databases (Tables 3-6). Our findings support the observations made in the reports of Bredesen, who reached conclusions similar to ours [62]. The number of agents in a profile was also inversely related to the rate of mortality accrual and positively correlated with lifespan (Tables 3-6, Supplemental materials). Confounder analysis by multiple regression and factor-by-factor equalization of major confounders confirmed the independence of the confounder-adjusted effects attributable to the diversity of protective compounds.
Before proceeding further, we should note that electronic medical record (EMR) studies are correlative in nature. While causal inference is made from experimental evidence, EMRs should be utilized as a guide for efficient design of this experimentation. Likewise, we should treat putatively beneficial combinations of protective compounds as a "composite marker" or a "signature" of a neuroprotection phenotype, not as a proven mechanistic mediator of neuroprotection as such. Utilizing databases as the sole source of data may generate systematic reverse-causation biases, for example, due to a tendency of cognitively impaired individuals to neglect a healthy lifestyle or to fail to adhere to pharmaceuticals or supplements. While we rigorously tried to address this problem (Figs 3 and 4, S1 Data, Tables 3-6), complete negation of this bias is difficult to achieve. Even when controlling for known confounders and sources of possible errors, a bias due to yet unknown confounders may still be present. For example, certain comorbidities evidenced by additional polypharmacy may compete with dementia as a potential cause of death, leading to a decrease in dementia rates. Indeed, given [q'] as an inherent probability of dementia, the probability [q] of death from non-dementia causes decreases the effective dementia probability to q" = (1-q) x [q'], irrespective of mechanistic interaction between the diseases. Accumulation of competing co-morbidities correlates with accumulation of complexity in the patient's profile. We attempted to take this bias into account by equalizing the number of comorbidities as well as prescription polypharmacy through factor-by-factor confounder equalization. We report that the observed effects persisted in factor-by-factor normalized datasets as well (Table 4).
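The competing-cause relation q" = (1-q) x [q'] can be illustrated numerically. The probabilities below are hypothetical and the function name is an illustrative choice; the sketch only shows how a growing burden of competing causes lowers the apparent dementia probability without any mechanistic protection.

```python
from typing import List


def effective_dementia_probability(q_inherent: float, q_competing: float) -> float:
    """Apparent dementia probability q'' = (1 - q) * q'.

    q_inherent:  inherent probability of dementia (q')
    q_competing: probability of non-dementia causes (q)
    """
    for name, value in (("q_inherent", q_inherent), ("q_competing", q_competing)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return (1.0 - q_competing) * q_inherent


# Hypothetical inherent probability of 0.30 under increasing competing burden.
COMPETING = [0.0, 0.2, 0.4]
EFFECTIVE: List[float] = [effective_dementia_probability(0.30, q) for q in COMPETING]
print(EFFECTIVE)
```

In this made-up example, a competing-cause probability of 0.4 lowers the apparent dementia probability from 0.30 to 0.18, which is why comorbidity load and prescription polypharmacy were equalized between cohorts.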
In view of these limitations, we explored evidence collected in controlled clinical trials by decreasing the threshold of what is considered "a success" in such trials. Following minimal definition such as "confirmation of original hypothesis", many clinical trials yield positive results by showing incremental improvement of cognitive scores or delay of cognitive score decline or delay of other symptoms of neurodegeneration. If epidemiological signals in EMR databases arise due to protective agents as such, and not by underlying Bayesian competitive factors, EMR-based findings should correlate to the incremental success of the treatments with same agents detected in controlled trials.
Indeed, higher ranking candidates in the NACC database demonstrated a non-random correlation between measured quantitative signals and the success of clinical trials exploring these agents. This correlation reaches > 0.9 for larger groups of agents (10 or more factors per profile), and such groups are readily available in the general population, producing an element of evidence that dementia can be controlled by such approaches (in both the age of onset and the extent of prevalence). These conclusions were produced based on the analysis of > 50000 patients, > 1900 dosage forms and > 300 clinical trials linked in a single alignment presented earlier (Tables 1 and 2). The mechanisms at the highest tier of database signals in Table 1 and Files A-C in S1 Data (Mg+2, Zn+2, Se, biotin, chromium picolinate, vitamin A-lutein) are traditionally prescribed as life-extending or as preventing retinal degeneration, and substantial evidence of mechanistic involvement or clinical efficiency is available for each, including successful controlled clinical trials or successful treatment of organic neurological conditions in humans, based on the incremental definition of "success" ([63][64][65][66][67][68][69][70][71][72], Tables 1 and 2).
Perhaps the strongest argument in favour of the feasibility of database-driven anti-dementia therapy design is the existence of the MEND protocol, tested in 100 patients as of 2018 [62], up from the original 10 first reported in 2014 [73]. In this approach, the factors identified in databases and literature as dementia modifying are all combined in a single multi-modal intervention, tailored to an individual phenotype and the list of necessary pharmaceuticals. MEND was found to reverse cognitive decline progressing up to the MCI stage [73] and to increase cognitive scores by 5-7 units or more. A significant delay of cognitive decline was observed in the multidomain lifestyle FINGER trial (a 2-year multidomain intervention including diet, exercise, cognitive training and vascular risk monitoring [74]), conducted by an independent group, and in a magnesium trial [75]. The results align with large-scale earlier observations of reversing cognitive declines [76]. Theoretically, the results of [72][73][74][75] can be explained by a combination of high random MMSE variation in year-to-year tests and multiple comparison contexts, when multiple research groups stage Phase I anti-dementia trials worldwide. But our data in > 50000 patients indicate a strong negative correlation between dementia prevalence and complex combinations of factors analogous to those in [72][73][74][75]. This argues against insufficient trial sizes in [69][70][71][72][73][74][75] as the reason for the observations; in addition, the results can be pooled as produced by similar multi-component methodologies. Specifically, MEND involves multiple subgroups recruited in the study over multiple years. The main table in [73] illustrates a consistent record in the same direction of improvement of both cognitive score and imaging data. This consistency over 4 years (2014-2018) is difficult to explain away by study size alone, especially in the context of our findings.
Perhaps in our data the elements of MEND arise non-intentionally due to antihypertensive, vasodilator, antiplatelet, antidiabetic, antihistamine and NSAID polypharmacy combined with the use of supplements and covariate components of a healthy lifestyle. At a minimum, the lifelong, non-transient > 3-fold decreases of dementia risk suggested by our study deserve surveys of smaller subsets of patients identified by protective complexity signatures. These surveys (clinician assessment, psychometric analysis) could identify additional correlates of benign polypharmacy that may lead to deliberate application of these factors in a controlled therapeutic setting. The possibilities of compounded placebo effects and the Hawthorne effect in the treated vs control populations need to be revisited as well [77].
Regardless of the actual mechanism, our study indicates that, in elderly populations, multi-modal supplementation engaging multiple neuroprotective pathways correlates with sizeable delays in cognitive impairment and conversion to dementia. Importantly, these effects are "dose-dependent" in the sense that higher degrees of protection are provided by a larger functional variety of the consumed compounds rather than by an increase in the utilized amounts of any single compound (Tables 1-6, Figs 3-5). Notably, in individuals already staged as having MCI, engaging protective complexity regimens (or lifestyle correlates of such regimens) appears to be less efficient in preventing further sliding towards dementia than in cognitively intact cohorts (Fig 5). These results are consistent with the relative lack of clinical success in trials conducted in individuals already experiencing significant cognitive impairment [78,79].
The magnitude of the maximal observed effects identified by the signatures of this report is significant and can be tracked in Table 6, second part. Among the individuals who became decedents within the 3 to 7-year observation range, the most protected stratum demonstrated a 9.7% presence of dementia at 79 years of age (FPB, baseline), whereas in the least protected stratum (control) the percentage was 63% in 76 years old patients (baseline). After reaching 85 years (FPE), the most protected patients demonstrated 31% of dementia within 1-2 years before dying. By contrast, in the control group, 76% of the patients at the average age of 78.7 years (FPE) were diagnosed with dementia within 1-2 years before dying. Assuming a doubling of dementia prevalence every 5 years in this range [3], the age-adjusted hazard ratio becomes ~0.1 at the age of 76 and ~0.2 at the age of 80. Minimization of dementia prevalence is consistent with a > 6-year increase in lifespan.
The present analysis does not include detailed investigations of other factors which were curiously influential in decreasing dementia prevalence. Decreases in the odds of developing dementia in the presence of osteoarthritis are analysed in detail elsewhere [80]. It is likely that an investigation of the full space of the factors defining the risk of dementia will lead to other significant findings. For example, an unexpected reduction of dementia rates was reported in the study of anti-viral combinations for controlling Herpes Zoster; these findings were made by tracing prescription signatures in more than 28,000 patients in Taiwan [46].
To summarize, our large-scale retrospective study of more than 50,000 patients extracted from 3 independent EMR sources (NACC, NSHAP, NAMCS) explored more than 1,900 OTC supplements and FDA-approved dosage forms and their combinations for their potential to prevent dementia. The strongest negative correlations with diagnosed dementia and/or dementia progression rate were identified for herbals, Zn, Se, Mg, biotin, vitamin A, lutein and chromium picolinate. In the context of EMR, the supplements are often reported in combinations. These combinations correlate with cognitive outcomes much better than any single supplement under otherwise equal conditions (Table 1 and narrative analysis). The results of our study may aid the design and the interpretation of the data collected in the frame of clinical trials. In short, our study presents the results of a systematic exploration of combinatorial preventive treatment regimens for age-associated multi-morbidity, with an emphasis on neurodegeneration, and provides compelling evidence for their feasibility.
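The age adjustment behind the ~0.1 and ~0.2 figures can be reproduced approximately with the arithmetic below. This is a reconstruction under assumptions: each prevalence is shifted to a common reference age using the stated 5-year doubling, and the ratio of the shifted prevalences is taken. The text does not spell out the exact procedure, and the function names are hypothetical.

```python
from typing import Tuple


def adjust_prevalence(prevalence: float, from_age: float, to_age: float,
                      doubling_years: float = 5.0) -> float:
    """Shift a prevalence to another age, assuming it doubles every `doubling_years`."""
    return prevalence * 2.0 ** ((to_age - from_age) / doubling_years)


def age_adjusted_ratio(protected: Tuple[float, float], control: Tuple[float, float],
                       reference_age: float) -> float:
    """Ratio protected/control after moving both (prevalence, age) pairs to one age."""
    p = adjust_prevalence(protected[0], protected[1], reference_age)
    c = adjust_prevalence(control[0], control[1], reference_age)
    return p / c


# Values quoted in the text: (dementia prevalence, average age).
BASELINE_RATIO = age_adjusted_ratio(protected=(0.097, 79.0), control=(0.63, 76.0),
                                    reference_age=76.0)
FINAL_RATIO = age_adjusted_ratio(protected=(0.31, 85.0), control=(0.76, 78.7),
                                 reference_age=80.0)
print(round(BASELINE_RATIO, 2), round(FINAL_RATIO, 2))
```

Under these assumptions the two ratios come out near 0.10 and 0.17, in line with the rounded values quoted above. Strictly, this sketch computes a ratio of age-shifted prevalences, so it approximates the reported hazard ratio rather than estimating one from survival data.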
Supporting information S1