Figures
Abstract
Aging is a complex process with poorly understood genetic mechanisms. Recent studies have sought to classify genes as pro-longevity or anti-longevity using a variety of machine learning algorithms. However, it is not clear which types of features are best for optimizing classification performance and which algorithms are best suited to this task. Further, performance assessments based on held-out test data are lacking. We systematically compare five popular classification algorithms using gene ontology and gene expression datasets as features to predict the pro-longevity versus anti-longevity status of genes for two model organisms (C. elegans and S. cerevisiae) using the GenAge database as ground truth. We find that elastic net penalized logistic regression performs particularly well at this task. Using elastic net, we make novel predictions of pro- and anti-longevity genes that are not currently in the GenAge database.
Author summary
Aging is a complex process with poorly understood genetic mechanisms. Recent studies have sought to classify genes as pro-longevity or anti-longevity using a variety of machine learning algorithms. However, it is not clear which types of features are best for optimizing classification performance and which algorithms are best suited to this task. Further, performance assessments based on held-out test data are lacking. We systematically compare five popular classification algorithms using gene ontology and gene expression datasets as features to predict the pro-longevity versus anti-longevity status of genes for two model organisms (nematode worms and yeast) using the GenAge database as ground truth. We find that elastic net penalized logistic regression performs particularly well at this task. Using elastic net, we make novel predictions of pro- and anti-longevity genes that are not currently in the GenAge database.
Citation: Townes FW, Carr K, Miller JW (2020) Identifying longevity associated genes by integrating gene expression and curated annotations. PLoS Comput Biol 16(11): e1008429. https://doi.org/10.1371/journal.pcbi.1008429
Editor: Inna Lavrik, OvGU; Medical Faculty, GERMANY
Received: May 1, 2020; Accepted: October 9, 2020; Published: November 30, 2020
Copyright: © 2020 Townes et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data are available from https://github.com/willtownes/longevity-paper. A permanent archive (with DOI) is at https://doi.org/10.5281/zenodo.4007001.
Funding: FWT was supported by National Institute of Health (https://www.nih.gov/) grant T32CA009337. KC was supported by the Vasilios Stavros Lagakos Fellowship. JWM gratefully acknowledges support from the Harvard Data Science Initiative Competitive Research Fund (https://datascience.harvard.edu/) and the McLennan Family Fund (https://www.hsph.harvard.edu/research-strategy-and-development/funding-announcements/mclennan-family-fund-deans-challenge-grant-program-2/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Identifying the genetic and molecular basis of aging is a longstanding goal in medical science [1, 2]. Advances in aging research have uncovered several common denominators of aging that are conserved across a wide range of organisms [3], and several drugs have been identified that have remarkable pro-longevity effects in model organisms [4]. However, much remains unknown about the biology of aging.
Many studies have investigated whether individual genes are pro-longevity or anti-longevity on a case-by-case basis [5]. Typically, an intervention such as a knockout/knockdown or overexpression is applied to a small number of genes in a model organism such as nematode worm (Caenorhabditis elegans) or yeast (Saccharomyces cerevisiae) followed by quantification of lifespan. A gene is considered pro-longevity if its expression is directly related to lifespan—for instance, if overexpression increases lifespan or underexpression decreases lifespan [6]. Conversely, a gene is considered anti-longevity if its expression is inversely related to lifespan. Meanwhile, many genes do not fall clearly into either category, for instance, a gene might have no discernable effect on lifespan. The GenAge database [6] contains a catalogue of putative pro- and anti-longevity genes based on current evidence.
Pro/anti-longevity genes can be identified by intervening on individual genes, but this is slow and expensive. Alternatively, a common technique is to randomly knock out or disrupt many genes in a population of organisms, screen for the longest living individuals, and then determine which genes were disrupted in these individuals. This screening technique can rapidly identify anti-longevity genes, but systematically identifying pro-longevity genes is less straightforward. Indeed, among the small number of genes annotated as having some impact on longevity in worms and yeast, there are considerably more anti-longevity genes than pro-longevity genes.
To prioritize which genes to investigate and speed up the discovery process, recent studies have sought to computationally predict the effect of gene interventions on aging, using annotations like Gene Ontology (GO) terms [7] as predictors. A survey of such efforts is provided by Fabris et al [8]. However, these recent studies suffer from several limitations. First, annotations like GO may be biased by the scope of the existing literature [9]. Second, it is difficult to compare results across studies since there is a lack of consistency in the choice of algorithms, feature sets, and predictive target/outcome. Finally, most recent studies do not report predictive performance on a held-out test dataset, leading to possible overestimation of performance.
We address these gaps by systematically assessing the performance of five popular machine learning algorithms on the task of predicting the pro- versus anti-longevity status of genes in S. cerevisiae and C. elegans. We use a consistent outcome in all comparisons based on GenAge annotations [6]. We compare the efficacy of GO terms versus gene expression profiles as feature sets for prediction. Further, we predict possible pro/anti-longevity genes that are not currently annotated in GenAge to suggest directions for future experimental studies.
Results
Data sources and algorithms
We compare the performance of five machine learning classification algorithms: elastic net penalized logistic regression (pglm) [10], support vector machine with radial basis function (svm) [11], gradient boosted trees (xgb) [12], naive Bayes (nb) [13], and k-nearest neighbors (knn) [14].
We define the outcome (that is, the target of prediction) to be the pro- versus anti-longevity annotation of individual genes from GenAge. After data cleaning, we identified 398 yeast genes and 848 worm genes with unambiguous annotations. Of these, the majority were labeled as anti-longevity (347 for yeast and 565 for worm). For validation and comparison, in yeast, we also consider replicative lifespan (RLS) outcome data for a comprehensive set of 4,698 single-gene deletions [15]; we refer to this as the McCormick dataset. In yeast, it is more common to use replicative lifespan rather than chronological lifespan to study aging.
As features for prediction, we consider using GO terms [7] and ARCHS4 gene expression profiles [16] for both yeast and worm. For yeast only, we also consider using the Deleteome dataset [17], which contains gene expression profiles for nearly 1500 single-gene deletions. For worm only, we also consider using the Worm Cell Atlas dataset [18], which contains gene expression profiles for around 50,000 cells. We write GXP to signify Deleteome and Worm Cell Atlas for yeast and worm, respectively. Altogether, we compare the performance of five feature sets for each species: (1) ARCHS4 alone, (2) GO alone, (3) GXP alone (Deleteome for yeast, Worm Cell Atlas for worm), (4) GO combined with ARCHS4, and (5) GO combined with GXP. Normalization, filtering, and other preprocessing steps are described in the Methods section.
To predict whether a particular gene g is pro- or anti-longevity, we construct features in the following manner. Each GO term is considered a separate binary feature taking a value of one if gene g is annotated to the term and zero otherwise. For the ARCHS4, Deleteome, and Worm Cell Atlas data each experimental condition (e.g., a perturbation or tissue sample) is considered a feature and its value is given by the expression of gene g under that condition. Note that this is the transpose of how gene expression data are usually investigated. However, by treating experimental conditions as features and genes as observations, this allows us to exploit arbitrary gene expression data for gene g, not just data from when g is perturbed.
Comparative performance of algorithms and feature sets
To assess predictive performance, we use the following cross-validation scheme. For each of the two species, we split the GenAge-annotated genes into five cross-validation folds, and then for each combination of fold, algorithm, and feature set, we compute the area under the receiver-operator curve (AUC). Thus, in total, we compute 2 × 5 × 5 × 5 = 250 AUC values, 50 for each algorithm (S1 and S2 Figs).
To summarize the relative performance of the five algorithms, Fig 1 shows how frequently algorithm a has higher AUC than algorithm b for each pair a, b. More precisely, for each pair of algorithms, Fig 1 shows the fraction of times algorithm a has higher AUC than algorithm b across the 50 combinations of species, fold, and feature set. The pglm and svm algorithms consistently outperform the others in terms of AUC. The ranking of algorithms is unchanged when compared using only yeast data. Using only worm data, svm slightly outperforms pglm (0.52 instead of 0.46 in Fig 1), and knn slightly outperforms nb (0.56 instead of 0.34 in Fig 1).
Numeric values indicate the fraction of times the row algorithm has higher classification performance than the column algorithm. pglm: elastic net penalized logistic regression, svm: support vector machine with radial basis function, xgb: gradient boosted trees, nb: naive Bayes, knn: k-nearest neighbors.
To compare the relative performance of the five different feature sets, Fig 2 shows boxplots of the AUC values over the five cross-validation folds, stratified by species, algorithm, and feature set. For visual clarity, here we only show the results for pglm and svm (the two best algorithms); see S2 Fig for the other algorithms. Generally speaking, using GO terms yields better predictions than gene expression features alone (ARCHS4 or GXP). However, combining GO with gene expression (GO+ARCHS4 or GO+GXP) tends to increase AUC performance compared to GO alone.
pglm: elastic net penalized logistic regression, svm: support vector machine with radial basis function. An AUC value of 1 indicates perfect classification, whereas an AUC of 0.5 signifies performance no better than random.
Comparing gene expression feature sets, the ARCHS4 features give better performance than GXP (Worm Cell Atlas) for worms, but for yeast, GXP (Deleteome) is superior to ARCHS4. This could be simply due to the fact that the number of features in the worm ARCHS4 data is much larger than in the Worm Cell Atlas data. Alternatively, it could be due to the greater variation in experimental conditions across Deleteome features (which covers a comprehensive set of gene knockouts) compared to Worm Cell Atlas features (which consists of expression profiles of different cell types in normal worms).
Overall, for worms, pglm with GO+ARCHS4 features yields the best performance, whereas for yeast, pglm with GO+GXP is best (Fig 3).
Each curve represents predictive performance on the held-out data from a single cross validation fold. The diagonal gray dotted line indicates the theoretical performance of an untrained random classifier as a baseline.
Novel predictions of pro/anti-longevity genes
Given the encouraging performance of pglm for predicting pro/anti-longevity genes in GenAge, we applied the algorithm to make novel predictions of pro/anti-longevity genes in C. elegans (worm) and S. cerevisiae (yeast). To do this, for each species separately, we retrained a pglm model on the full GenAge database, using the combined GO terms plus ARCHS4 gene expression as features (see the Methods section for details on hyperparameter selection). Although for yeast the GO+GXP (Deleteome) features had slightly higher median predictive performance than GO+ARCHS4, we used the latter instead to maintain consistency across the two species. We then used the trained model to generate a predictive score for the pro/anti-longevity effect of each gene not in the GenAge database. Specifically, the predictive score is defined to be the probability that the gene is pro-longevity under the trained model. A score close to 1 indicates that the gene is predicted to be pro-longevity, whereas a score close to 0 indicates that the gene is predicted to be anti-longevity. An intermediate score indicates a gene with unclear pro- or anti-longevity status. Table 1 shows the unannotated genes with the highest confidence levels of being pro- and anti-longevity for worm and yeast, respectively. These genes do not significantly overlap with predictions from the pglm model trained using only GO terms as features (S2–S5 Tables, S3 Fig), suggesting that these predictions are not simply recapitulating the known biology represented in the GO terms. Complete lists of predictions for all genes are provided in S1 and S2 Data.
To assess the accuracy of the predictions, we looked at the literature to see if there is experimental evidence of pro/anti-longevity effects for these genes. Based on the existing experimental evidence, we find that the model predictions are remarkably good. It turns out that—even though they are not in GenAge yet—there is experimental evidence for the pro/anti-longevity status of most of the predicted genes.
Predicted pro-longevity worm genes.
For many of the predicted pro-longevity genes in Table 1, there already exists direct experimental evidence of pro-longevity status. Note that this evidence was not used in making the predictions, implying that the model is producing reliable out-of-sample predictions. We discuss what is known about the top 10 predicted pro-longevity genes: CLEC-196, F44E5.4, CEH-13, LPR-3, HIL-7, W04A8.4, TTH-1, GST-1, F44E5.5, and F20C5.6.
F44E5.4 and F44E5.5 encode members of the hsp70 family of heat shock proteins. The heat shock response is well-known to have strong pro-longevity effects in C. elegans. Indeed, knocking in extra copies of hsp70 extends lifespan [19] and knocking down hsp70 via RNAi decreases lifespan and leads to rapid aging phenotypes [20]. GST-1 (Glutathione S-transferase P) is also involved in stress response—particularly, immune response—and GSTs are well-known to be pro-longevity. Overexpression (underexpression) of GSTs has been found to increase (decrease, respectively) lifespan and stress resistance [21, 22]. W04A8.4 is an uncharacterized protein that is involved in the pro-longevity effect of metformin on C. elegans [23]; specifically, knockdown of W04A8.4 leads to metformin resistance. This is intriguing, since metformin treatment has been shown to promote health and extend lifespan in many organisms. Homeobox protein CEH-13 exhibits pro-longevity characteristics based on experimental evidence—specifically, a ceh-13 mutant strain has decreased lifespan compared to wildtype controls [24]. LPR-3 (LiPocalin-Related protein) is known to be involved in nematode worm locomotion, and appears to mediate the longevity-inducing effect of daf-7 mutation [25]; additionally, expression of lpr-3 is increased in worms fed with rBm αTX14, an α-neurotoxin that increases worm lifespan [26].
For the remainder of the genes in Table 1, there is suggestive experimental evidence of pro-longevity status based on associations. C-type Lectin clec-196 expression increases and lifespan increases when hsb-1 is knocked out [27]. Also, clec-196 is directly adjacent to hsp-1 on chromosome IV, suggesting possible co-involvement, and hsp-1 (heat shock protein) is well-known to be pro-longevity. HIL-7 (Histone H1 Like) gene expression may be associated with Ethosuximide treatment, a drug that increases worm lifespan and affects DAF-16/FOXO target gene expression [28]. TTH-1 (Thymosin beta) is significantly increased in daf-2 mutants, which are very long-lived, suggesting possible pro-longevity status by association [29]. F20C5.6 is affected by the well-known longevity genes clk-1 and sir-2.1, as well as by treatment with 1-methylnicotinamide and rotenone, which are well-known for increasing worm lifespan.
This validating evidence from the literature indicates that the model predictions are surprisingly accurate. The predicted pro-longevity genes CLEC-196, HIL-7, TTH-1, and F20C5.6 are candidates for further experimental exploration.
Predicted anti-longevity worm genes.
Similarly to the predicted pro-longevity genes, there exists experimental evidence of anti-longevity status of most of the predicted anti-longevity genes in Table 1. We discuss what is known about the top 10 predicted anti-longevity genes: MSP-59, Y59E9AR.7, RPL-39, MSP-57, MSP-81, MSP-113, MSP-19, NLP-27, and RPL-11.1.
Major sperm proteins appear to be anti-longevity based on the experimental evidence. A mutation reducing sperm production leads to significantly increased lifespan [30]. Additionally, the expression of sperm-related genes—especially major sperm protein (MSP) genes—is decreased in adult daf-2 mutants, providing further support for an anti-longevity role of MSP genes [31].
RSP-39 and RPL-11.1 are 60S ribosomal proteins. RNAi knockdown of genes encoding ribosomal proteins consistently increases lifespan in C. elegans, both in the case of 40S and 60S ribosomal proteins [32]. This supports the predicted anti-longevity status.
NLP-27 (Neuropeptide-Like Protein) is the only other predicted anti-longevity gene in the top 10 list. Expression of nlp-27, along with other nlp genes, is increased in long-lived daf-2 mutants. Further, nlp-27 expression is reduced in a short-lived mir-71 deletion strain. This indirect evidence by association suggests a possible pro-longevity role of NLP-27—which would contradict the predicted anti-longevity—but direct over/under-expression of nlp-27 would be needed to establish its pro/anti-longevity status.
Predicted pro-longevity yeast genes.
Table 1 lists the top 10 predicted pro-longevity yeast genes. Several of these predictions are borne out by direct experimental evidence via single-gene deletions—specifically, Marek & Korona [33] found that deletion of ACS1, ETR1, UBI4, and POR1 leads to decreased lifespan. Marek & Korona did not find a significant pro- or anti-longevity effect for UBC5, HSP12, or SBA1, and they do not report results for the remainder of the top 10 genes. However, UBC5 is a strong pro-longevity candidate, since it is involved in cellular stress response and mediates selective degradation of short-lived and abnormal proteins [34]. HSP12 (heat shock protein) is required for the lifespan-extending effect of dietary restriction in yeast [35], validating the pro-longevity prediction. SBA1 is also a strong pro-longevity candidate, as a chaperone-binding protein that is involved in heat shock response and is required for telomere length maintenance [36, 37]. PRE3 and PRE7 are part of the proteasome, and it is known that increased proteasome capacity extends lifespan [38], providing indirect validation of their predicted pro-longevity status. PDI1 is a downstream target of the unfolded protein response (UPR), which is well-known to be pro-longevity [39].
Predicted anti-longevity yeast genes.
Table 1 lists the top 10 predicted anti-longevity yeast genes. As in worms, depletion of ribosomes increases lifespan [40], validating the predictions of the ribosome-biogenesis proteins RPS30B, TMA23, RPS29B, and RLP24 as anti-longevity. HOR7 is reported to influence lifespan, but the direction of the effect may be context-dependent: HOR7 deletion increases lifespan [15], whereas Schleit et al [41] find that HOR7 deletion decreases lifespan under dietary restricted conditions.
For URA3, COX9, TOM7, MFA1, and TAR1, we do not find pre-existing corroboration of the predicted anti-longevity status in the literature. TOM7 deletion has been reported to decrease chronological lifespan [42], and it does not appear to have a strong effect on replicative lifespan [33]. TOM7 is part of the translocase of the outer mitochondrial membrane (TOM) complex, and the mitochondrial membrane is well-known to be important in yeast longevity [43]. Marek & Korona [33] report a pro-longevity effect for COX9, contrary to the model prediction. (Except for COX9, the results of Marek & Korona are inconclusive for all of the genes in Table 1). Further investigation of URA3, COX9, TOM7, MFA1, and TAR1 might be interesting to pursue.
Validation on a secondary dataset
To further evaluate the predictive accuracy of the trained pglm model, we compare the model predictions to actual lifespan measurements from a non-GenAge validation dataset. For this purpose, we use the McCormick et al [15] dataset of replicative lifespan for a comprehensive set of 4,698 single-gene deletions in yeast. Since the McCormick dataset contains lifespan measurements for deletions of many genes that do not appear in GenAge, in principle it should be well-suited as a secondary validation dataset. Using the pglm model trained on the full GenAge database for yeast with the GO+ARCHS4 feature set as predictors, we made predictions of the longevity effect of all 4,698 genes in the McCormick dataset.
First, as a sanity check, we observe that among genes in GenAge, the predicted probability of a gene being pro-longevity is clearly inversely related to the change in lifespan after deletion (Fig 4, left panel). This is not surprising since it simply means that the GenAge annotations are roughly consistent with the McCormick data, and the model was able to fit the GenAge-based training data. More interestingly, we see that the model is able to predict which genes have a larger or a smaller effect on lifespan (Fig 4, left panel). For instance, among pro-longevity genes, the genes with predicted probability near 1 do indeed tend to lead to a larger decrease in lifespan. Meanwhile, among anti-longevity genes, the genes with predicted probability near 0 do indeed tend to lead to a larger increase in lifespan. Since the training data contain no information about the magnitude of the effect on lifespan, this indicates that the model is not simply recapitulating the training data, but is indeed making generalizable predictions.
Probabilities are from the pglm classifier trained on the full GenAge dataset. Solid curve is a nonparametric smoother.
Next, we compare the model predictions to the lifespan data for genes outside the GenAge database. Fig 4 (right panel) shows the change in lifespan versus the predicted probability of a gene being pro-longevity, for genes in the McCormick dataset that are not in GenAge. A downward trend in this plot would indicate concordance between model predictions and the validation data. There is an extremely slight but not convincing downward trend; thus, while suggestive, this does not provide a compelling out-of-sample validation of the model predictions. Note that the pglm classifier trained on GenAge has a strong bias toward predicting genes to be anti-longevity; see Fig 4 (right panel) and S4 Fig. This bias is due to class imbalance in the training data, since the majority of genes annotated in GenAge are anti-longevity. This is common when the training data are imbalanced, and can easily be addressed by selecting the classification threshold to yield appropriately balanced predictions.
The lack of concordance between the out-of-sample model predictions and the McCormick lifespan data may be attributable to the fact that for many genes, the McCormick data are not in agreement with the GenAge annotations of pro/anti-longevity. Specifically, many putatively pro-longevity genes led to large increases in lifespan when deleted, and many putatively anti-longevity genes led to large decreases in lifespan when deleted (Fig 4, left panel). It is not clear whether this discrepancy is primarily due to limitations of the GenAge database (e.g., bias and relatively small sample size) or limitations of the McCormick assay. Focusing on the latter possibility, recent studies have identified mechanisms by which disruption of a gene through knockout can activate compensatory mechanisms leading to a dramatically different phenotype than disruption of the same gene through knockdown, which reduces but does not eliminate expression [44]. If deletion of a single gene activates similar compensatory mechanisms in yeast, then this could explain the lack of concordance, since it would imply that the change in lifespan under a single-gene deletion is not necessarily related to that gene’s pro/anti-longevity status. A comprehensive assay of knockdowns (rather than deletions or knockouts) would shed light on this intriguing question. The discrepancy between GenAge and McCormick could also partially be due to the fact that GenAge includes results for both replicative and chronological lifespan. However, this does not fully explain the discrepancy since many of the most discordant genes were annotated as affecting replicative lifespan in GenAge.
Functional interpretation of model predictions
To interpret the biological basis for the model predictions in terms of functional categories, for each species we retrained the pglm model on the full GenAge dataset using only GO terms as features. We extracted the 20 most influential GO terms from the trained model by ranking the regression coefficients from largest to smallest in absolute value (Table 2). Note that in this model, the coefficient is equal to the log-odds ratio (logOR) of a gene being pro-longevity when it is annotated to a GO term versus when it is not annotated to that GO term. If a GO term has a positive logOR value, then genes annotated with that GO term are more likely to be pro-longevity under the model. Conversely, a negative logOR indicates that genes annotated with that GO term are more likely to be anti-longevity.
logOR: log-odds ratio. OR: odds ratio. Positive logOR indicates a gene annotated to that GO term is more likely to be pro-longevity. BP: biological process, CC: cellular component, MF: molecular function.
Top GO terms for worm.
The current literature supports a strong longevity effect for many of the top categories in Table 2. Translation inhibition is known to increase lifespan [32], so a large negative coefficient for the translation and ribosome GO terms makes sense. Protein homeostasis is known to be key to longevity [45], so it makes sense that the model has positive coefficients for protein transport, endoplasmic reticulum membrane, and endoplasmic reticulum. Ubiquitin-mediated proteolysis is known to be important for promoting longevity, implying that a positive coefficient for ubiquitin-dependent protein catabolic process makes sense. Heat shock response is known to extend lifespan, and indeed, the model has a positive coefficient for response to heat. Activation of the mitochondrial unfolded protein response is known to promote longevity [46], so a positive coefficient for protein import into mitochondrial matrix makes sense. Mitochondria are known to be important for longevity [47], so a large coefficient for mitochondria makes sense; further, inhibition of mitochondrial respiration is known to extend lifespan [48], so a negative sign for the coefficient could make sense. Similarly, the importance of DNA repair makes sense, and surprisingly, in some cases, DNA repair gene knockdown increases lifespan, possibly due to compensatory biological mechanisms [49]; thus, a negative coefficient is, in fact, consistent with the literature.
Top GO terms for yeast.
For yeast, Table 2 shows the top longevity-related GO terms in the model. The importance of these terms is consistent with the current literature, but the appropriate sign of the coefficient is not always clear, since the genes annotated to each GO term may have contradictory pro/anti-longevity effects and further, there may be compensatory relationships between terms due to correlated predictors.
Replicative cell aging, apoptotic process, and cell cycle obviously make sense as related to yeast aging and longevity. Mitochondrial membrane maintenance is known to be important in yeast longevity [43], and other membranes (e.g., the vacuole membrane) may also be important [50]; thus, large coefficients for mitochondrion, integral component of mitochondrial outer membrane, mitochondrial intermembrane space, membrane, membrane fraction, and transmembrane transport are consistent with the literature. Depletion of ribosomes is known to increase lifespan [40], so a negative coefficient for chromatin silencing at rDNA is appropriate. Telomeres are known to be important in yeast longevity [51, 52], so a large coefficient for telomere maintenance makes sense. Longevity effects of cellular response to oxidative stress are corroborated in the literature [53]. Finally, a negative coefficient for zinc ion binding is consistent with experimental evidence that zinc limitation extends chronological lifespan [54].
Pathway enrichment analysis of model predictions
To further interpret the model predictions in terms of known biology, we performed pathway enrichment analysis. First, we took the list of non-GenAge genes that were predicted to be pro-longevity and tested for enrichment of KEGG pathways using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8 [55, 56]. Adjusting for multiple testing using the Benjamini–Hochberg correction, we found that the “Proteasome” pathway was significantly enriched (corrected p-value 0.0031). The KEGG pathway diagram in S5 Fig (used with permission from Kanehisa Laboratories [57]) indicates that several of the predicted pro-longevity genes are in the 20S proteasome core particle, particularly in β subunits. This is intriguing, since the proteasome is a protein complex that breaks down unneeded or damaged proteins by proteolysis, and the β subunits play a central role in this process [58]. Sustained proteasome activity appears to be associated with longevity based on studies of long-lived humans and rodents, and directly elevating proteasome activity increases longevity in yeast [38].
We performed the same enrichment analysis using the top predicted anti-longevity genes for yeast, and separately, the pro- and anti-longevity genes for worm. In each case, we capped the number of genes at 100. S6 Table shows the top KEGG pathway hit in each case. Notably, in both yeast and worm, the “Ribosome” pathway was highly significantly enriched with predicted anti-longevity genes (corrected p-values 7.1 × 10−15 and 8.1 × 10−27, respectively). These results are consistent with known aging biology, and since these genes are not currently in GenAge, the model predictions may offer new avenues of research.
Discussion
Limitations
A limitation of our models is that the pro/anti-longevity status of a gene is predicted based on how similar its GO terms and/or gene expression pattern are to genes with known pro- or anti-longevity status. This similarity does not necessarily imply that manipulation of these genes will have the predicted effect on lifespan, and further, the predictions are limited by the accuracy of the input data. This is illustrated by SIR2 and DNL4, the top two hits in S4 Table for yeast when using the GO-only model. Both SIR2 and DNL4 are annotated with the “replicative cell aging” GO term, which is strongly indicative of pro-longevity status in this model, as indicated by the odds ratio of 5.8 in Table 2. Experimental evidence is consistent with the SIR2 prediction, but not the DNL4 prediction [59]. This appears to be due to the interesting fact that although DNL4 is required for DNA repair by nonhomologous end joining (NHEJ), apparently NHEJ does not affect replicative aging in yeast [59]. Thus, in the case of DNL4, the discrepancy between prediction and experiment may be viewed as an inadequacy of this particular GO term annotation.
Another limitation is that although S. cerevisiae (yeast) can be haploid or diploid, our models are not ploidy-specific since much of the data we use (GenAge, GO terms, and gene expression) are not annotated in a way that indicates whether they pertain to haploid or diploid. Significant differences have been observed between haploid and diploid yeast aging [59, 60], making it difficult to know whether results for one would extend to the other. That said, overall we would expect the set of genes that are strongly involved in longevity to be similar for haploid and diploid, although the magnitude (and possibly the direction) of the pro/anti-longevity effect may vary.
Similarly, our models do not distinguish between chronological lifespan and replicative lifespan in yeast. In future work, it would be interesting to analyze chronological lifespan separately from replicative lifespan since there may be major differences.
By a fortunate coincidence, the best performing algorithm, pglm (GLM-Net), enabled us to perform functional interpretation of the results by simply considering the largest regression coefficients. In future studies on alternative datasets, higher predictive performance might be obtained with other algorithms lacking easily understandable coefficients. Under that scenario, we would recommend researchers consider alternative feature importance metrics such as those provided by the caret R package [61].
Conclusions and future directions
We systematically compared the performance of popular machine learning algorithms in classifying genes as pro- or anti-longevity using the GenAge database and combinations of gene expression and gene ontology (GO) feature sets. We identified elastic net penalized logistic regression (pglm) as the most effective classifier and made predictions for unannotated genes. We offer our predictive probability scores as one possible tool to prioritize future experimental studies which can validate individual genes as pro-longevity mechanistically. Our approach of combining feature sets to improve predictive performance is generalizable in principle to a wider variety of model organisms as more annotations and datasets become available over time.
We encourage other computational researchers to use metrics such as area under receiver-operator curve (AUC) on held-out data from standard databases such as GenAge to assess classification performance and facilitate comparisons across studies. We suggest that future comprehensive longevity assays consider using knockdowns instead of deletions and knockouts, due to the existence of compensatory mechanisms that are known to mitigate the effects of knockouts [44]; this may improve the concordance between predictions and experimental evidence. Additionally, there appears to be a need for increased focus on pro-longevity genes as opposed to anti-longevity genes, since pro-longevity genes are much less common in the GenAge database.
In addition to genetic variation, environmental factors such as exposure to drugs or other chemical compounds are known to influence longevity [4, 62, 63]. Future studies may benefit from our computational framework in this context, for example, by using outcome variables from the DrugAge database [64] to train classifiers or regression models. However, a key challenge will be to identify suitable covariates analogous to gene expression or GO terms. One intriguing possibility would be to convert the molecular structure of each drug into a vector of continuous features [65].
Finally, it is clear that genes act in networks rather than individually—for instance, top-down analysis has identified the nutrient sensing pathway, the mitochondrial effector pathway, and the proteostasis pathway as collectively regulating single-cell longevity [66]. Thus, network-based approaches are likely to yield further insights into the molecular mechanisms of aging. In particular, while we have considered only single-gene manipulations, it would be valuable to be able to predict the effect of multiple simultaneous interventions. This is very challenging in general, but it might be possible to exploit special structure in the mechanisms of aging—for instance, recent papers have argued that aging may be governed by a single global state variable, based on the finding that many diverse interventions lead to a temporal scaling of survival curves in C. elegans and S. cerevisiae [67, 68].
Methods
Acquisition and preprocessing of datasets
Binary pro/anti-longevity annotations were accessed from the GenAge model organisms database build 19 [6], available at http://genomics.senescence.info/genes. We used the subset of genes for yeast and worm, and we excluded ambiguous annotations (e.g., if GenAge lists two studies for a gene, one finding it to be pro-longevity and the other finding it to be anti-longevity). GO annotations for all genes were downloaded from the BioMart ENSEMBL database (release 93, July 2018) using the biomaRt package in Bioconductor (version 3.7). For both species, gene expression data in the form of RNA-Seq read counts were obtained from the ARCHS4 database version 1 [16], currently available at https://amp.pharm.mssm.edu/archs4/archs4zoo.html. For yeast only, we acquired the Deleteome gene expression microarray dataset [17], currently available at http://deleteome.holstegelab.nl (no version available but last updated May 2014). For worm only, we obtained gene expression data from the single-cell RNA-Seq Worm Cell Atlas [18], currently available at http://atlas.gs.washington.edu/worm-rna (no version available but last updated August 2017). We reduced the dimensionality of the Worm Cell Atlas data by summing the unique molecular identifier (UMI) counts across all cells within the same tissue, so that each feature is a “pseudobulk” tissue rather than a single cell.
Replicative lifespans (RLS) for 4,698 single-gene deletion yeast strains were obtained from McCormick et al [15] in June 2017. Perturbation genotypes with percent_change greater than 30 and set_lifespan_count less than or equal to 5 were excluded based on the authors’ recommendations. We merged results for the same genotype across replicate experiments in the following way. The outcome for each genotype in a single replicate was quantified as the mean of RLS in the perturbation group minus the mean of RLS in the control group. To obtain a single value for the genotype across all replicates, we then computed a weighted average of the outcome values from each replicate, where the weights corresponded to the sample sizes in each group. This ensured that replicates with more observations contributed more to the final value. We refer to this as the McCormick dataset.
Data normalization and quality control
All gene expression measurements were normalized to account for sample-specific biases. Specifically, the Deleteome data were already normalized, the ARCHS4 read counts were converted to transcripts-per-million (TPM), and the Worm Cell Atlas UMIs were converted to counts-per-million (CPM). The normalized counts were then log transformed with a pseudocount of one. For Deleteome, genes that were variable in controls and non-responsive mutants were excluded, since these data were likely to contain mostly noise. For each species, we used the subset of genes with no missing values across all feature types (GO features and the two sources of gene expression features), resulting in 703 worm genes (246 pro-longevity, 457 anti-longevity) and 368 yeast genes (46 pro-longevity, 322 anti-longevity). Features with no variation across the included genes were discarded. For yeast, the number of retained features was 3268, 700, and 1390 for ARCHS4, Deleteome, and GO terms, respectively. For worms, the number of features was 2935, 270, and 2051 for ARCHS4, Worm Cell Atlas, and GO terms, respectively. All gene expression features were centered and scaled to have mean zero and standard deviation 0.5 as suggested by [69], while binary features (GO) were not centered and scaled. The five sets of features considered for each species were (1) ARCHS4 alone, (2) GO alone, (3) GXP alone (Deleteome for yeast, Worm Cell Atlas for worm), (4) GO combined with ARCHS4, and (5) GO combined with GXP.
Comparison of predictive performance by algorithm and feature set
To assess predictive performance of different combinations of feature sets, each dataset (consisting of the binary GenAge outcome for a single species matched with one of the five feature sets) was split into 5 external cross-validation (CV) folds. Within each fold, machine learning classifiers were fit to the training data using the caret package version 6.0 [61] in the R programming environment (version 3.5). The same partitioning of the data was preserved across algorithm runs to ensure identical training and test conditions. The algorithms used were k-nearest neighbors (knn, R package kknn version 1.3.1), naive Bayes (nb, R package naivebayes forked at version 0.9.2 and modified for numerical stability, https://github.com/willtownes/naivebayes), gradient boosted trees (xgb, R package xgboost version 0.8.0), support vector machine with radial basis function (svm, R package kernlab version 0.9), and logistic regression with elastic net penalty (pglm, R package glmnet version 2.0). Hyperparameters (S1 Table) were selected by grid search using repeated 10-fold internal CV with two repeats within each training fold using the Kappa criterion. Note that this means each algorithm could potentially use different hyperparameter values across the five external CV folds. For all algorithms except naive Bayes, the grid consisted of default caret values. For naive Bayes, the Laplace correction was set to zero, kernel smoothing was always used, and the adjustment to the probabilities was chosen between 0.5 and 1.0. Additionally, for naive Bayes only, many features with near-zero variance caused numerical instabilities and were excluded. Having chosen a final set of hyperparameters for each training fold, the predicted probabilities were computed for the held-out test data and the area under the receiver-operator curve (AUC) was computed to quantify prediction performance (discrimination). An AUC value of 1 indicates perfect classification performance, whereas an AUC of 0.5 signifies performance no better than random, or simply always predicting the majority class.
Model fitting for novel predictions and validation
For the results in sections ‘Novel predictions of pro/anti-longevity genes’ and ‘Validation on a secondary dataset’, the best-performing algorithm (pglm) was retrained on all of the GenAge data for each species with the combined GO plus ARCHS4 feature set. The hyperparameter grid was expanded to 21 alpha values (evenly spaced between zero and one, inclusive), and 97 automatically selected lambda values using five-fold CV. For worm, the optimal alpha was 0.05 (close to an L2 ridge penalty). For yeast, the optimal alpha was 0.5 (an even mix between ridge and the L1 lasso penalty). Using the optimal hyperparameters, predictive probabilities were computed for all genes.
Model fitting for functional interpretations
For the results in section ‘Functional interpretation of model predictions’, for each species the pglm algorithm was retrained on the full GenAge dataset using GO features only. This choice of feature set was used to enable interpretation of regression coefficients. Here, the hyperparameter grid was the same 21 alpha values and 97 automatically selected lambda values with five-fold CV. The optimal alpha values were 0.15 for worm and 0.10 for yeast (both closer to ridge than lasso).
Supporting information
S1 Fig. Comparison of predictive performance of machine learning algorithms on classifying genes as pro- or anti-longevity.
pglm: elastic net penalized logistic regression, svm: support vector machine with radial basis function, xgb: gradient boosted trees, nb: naive Bayes, knn: k-nearest neighbors, gxp: gene expression, ROC: receiver-operator curve.
https://doi.org/10.1371/journal.pcbi.1008429.s001
(EPS)
S2 Fig. Comparison of predictive performance of different feature sets on classifying genes as pro- or anti-longevity.
pglm: elastic net penalized logistic regression, svm: support vector machine with radial basis function, xgb: gradient boosted trees, nb: naive Bayes, knn: k-nearest neighbors, gxp: gene expression, ROC: receiver-operator curve.
https://doi.org/10.1371/journal.pcbi.1008429.s002
(EPS)
S3 Fig. Predictive probabilities of genes being pro-longevity under GO+ARCHS4 feature set versus GO only feature set.
https://doi.org/10.1371/journal.pcbi.1008429.s003
(EPS)
S4 Fig. Distribution of predictive probabilities after training elastic net penalized logistic regression (pglm) on the full GenAge dataset with GO terms and ARCHS4 gene expression as features.
https://doi.org/10.1371/journal.pcbi.1008429.s004
(EPS)
S5 Fig. Proteasome KEGG pathway schematic [57].
Subunits containing predicted pro-longevity genes are indicated with red stars. (Copyright of Kanehisa Laboratories, used with permission.).
https://doi.org/10.1371/journal.pcbi.1008429.s005
(TIFF)
S1 Table. Hyperparameters used.
pglm: elastic net penalized logistic regression, svm: support vector machine with radial basis function, xgb: gradient boosted trees, nb: naive Bayes, knn: k-nearest neighbors.
https://doi.org/10.1371/journal.pcbi.1008429.s006
(CSV)
S2 Table. Top worm pro-longevity genes not in GenAge predicted using only GO terms as features.
https://doi.org/10.1371/journal.pcbi.1008429.s007
(CSV)
S3 Table. Top worm anti-longevity genes not in GenAge predicted using only GO terms as features.
https://doi.org/10.1371/journal.pcbi.1008429.s008
(CSV)
S4 Table. Top yeast pro-longevity genes not in GenAge predicted using only GO terms as features.
https://doi.org/10.1371/journal.pcbi.1008429.s009
(CSV)
S5 Table. Top yeast anti-longevity genes not in GenAge predicted using only GO terms as features.
https://doi.org/10.1371/journal.pcbi.1008429.s010
(CSV)
S6 Table. Top KEGG pathway hits from DAVID enrichment analysis.
https://doi.org/10.1371/journal.pcbi.1008429.s011
(CSV)
S1 Data. All genes pglm predictions for yeast with GO+ARCHS4 features.
https://doi.org/10.1371/journal.pcbi.1008429.s012
(XLSX)
S2 Data. All genes pglm predictions for worm with GO+ARCHS4 features.
https://doi.org/10.1371/journal.pcbi.1008429.s013
(XLSX)
Acknowledgments
The authors thank Sheila Gaynor for advice on initial data cleaning, and Will Mair for helpful suggestions on biological interpretations. The original authors of McCormick et al [15] and Kemmeren et al [17] provided helpful advice in understanding their data, for which the authors are grateful.
References
- 1. Johnson TE, Lithgow GJ. The Search for the Genetic Basis of Aging: The Identification of Gerontogenes in the Nematode Caenorhabditis Elegans. Journal of the American Geriatrics Society. 1992;40(9):936–945. pmid:1355097
- 2. Remolina SC, Chang PL, Leips J, Nuzhdin SV, Hughes KA. Genomic Basis of Aging and Life-History Evolution in Drosophila Melanogaster. Evolution. 2012;66(11):3390–3403. pmid:23106705
- 3. López-Otín C, Blasco MA, Partridge L, Serrano M, Kroemer G. The Hallmarks of Aging. Cell. 2013;153(6):1194–1217. pmid:23746838
- 4. Partridge L, Fuentealba M, Kennedy BK. The Quest to Slow Ageing through Drug Discovery. Nature Reviews Drug Discovery. 2020;19(8):513–532. pmid:32467649
- 5. Ailion M, Inoue T, Weaver CI, Holdcraft RW, Thomas JH. Neurosecretory Control of Aging in Caenorhabditis Elegans. Proceedings of the National Academy of Sciences of the United States of America. 1999;96(13):7394–7397. pmid:10377425
- 6. Tacutu R, Thornton D, Johnson E, Budovsky A, Barardo D, Craig T, et al. Human Ageing Genomic Resources: New and Updated Databases. Nucleic Acids Research. 2018;46(D1):D1083–D1090. pmid:29121237
- 7. Gene Ontology Consortium. The Gene Ontology Resource: 20 Years and Still GOing Strong. Nucleic Acids Research. 2019;47(D1):D330–D338. pmid:30395331
- 8. Fabris F, de Magalhães JP, Freitas AA. A Review of Supervised Machine Learning Applied to Ageing Research. Biogerontology. 2017;18(2):171–188. pmid:28265788
- 9. Haynes WA, Tomczak A, Khatri P. Gene Annotation Bias Impedes Biomedical Research. Scientific Reports. 2018;8(1):1–7. pmid:29358745
- 10. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software. 2010;33(1):1–22. pmid:20808728
- 11. Karatzoglou A, Smola A, Hornik K, Zeileis A. Kernlab—An S4 Package for Kernel Methods in R. Journal of Statistical Software. 2004;11(9):1–20.
- 12.
Chen T, He T, Benesty M, Khotilovich V, Tang Y, Cho H, et al. Xgboost: Extreme Gradient Boosting; 2019.
- 13.
Majka M. Naivebayes: High Performance Implementation of the Naive Bayes Algorithm; 2019.
- 14.
Schliep K, Hechenbichler K. Kknn: Weighted k-Nearest Neighbors; 2016.
- 15. McCormick MA, Delaney JR, Tsuchiya M, Tsuchiyama S, Shemorry A, Sim S, et al. A Comprehensive Analysis of Replicative Lifespan in 4,698 Single-Gene Deletion Strains Uncovers Conserved Mechanisms of Aging. Cell Metabolism. 2015;22(5):895–906. pmid:26456335
- 16. Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, et al. Massive Mining of Publicly Available RNA-Seq Data from Human and Mouse. Nature Communications. 2018;9(1):1366. pmid:29636450
- 17. Kemmeren P, Sameith K, van de Pasch LAL, Benschop JJ, Lenstra TL, Margaritis T, et al. Large-Scale Genetic Perturbations Reveal Regulatory Networks and an Abundance of Gene-Specific Repressors. Cell. 2014;157(3):740–752. pmid:24766815
- 18. Cao J, Packer JS, Ramani V, Cusanovich DA, Huynh C, Daza R, et al. Comprehensive Single-Cell Transcriptional Profiling of a Multicellular Organism. Science. 2017;357(6352):661–667. pmid:28818938
- 19. Yokoyama K, Fukumoto K, Murakami T, Harada Si, Hosono R, Wadhwa R, et al. Extended Longevity of Caenorhabditis Elegans by Knocking in Extra Copies of hsp70F, a Homolog of Mot-2 (Mortalin)/Mthsp70/Grp75. FEBS Letters. 2002;516(1-3):53–57. pmid:11959102
- 20. Kimura K, Tanaka N, Nakamura N, Takano S, Ohkuma S. Knockdown of Mitochondrial Heat Shock Protein 70 Promotes Progeria-like Phenotypes in Caenorhabditis Elegans. Journal of Biological Chemistry. 2007;282(8):5910–5918. pmid:17189267
- 21. Ayyadevara S, Engle MR, Singh SP, Dandapat A, Lichti CF, Beneš H, et al. Lifespan and Stress Resistance of Caenorhabditis Elegans Are Increased by Expression of Glutathione Transferases Capable of Metabolizing the Lipid Peroxidation Product 4-Hydroxynonenal. Aging cell. 2005;4(5):257–271. pmid:16164425
- 22. Ayyadevara S, Dandapat A, Singh SP, Siegel ER, Reis RJS, Zimniak L, et al. Life Span and Stress Resistance of Caenorhabditis Elegans Are Differentially Affected by Glutathione Transferases Metabolizing 4-Hydroxynon-2-Enal. Mechanisms of Ageing and Development. 2007;128(2):196–205. pmid:17157356
- 23. Wu L, Zhou B, Oshiro-Rapley N, Li M, Paulo JA, Webster CM, et al. An Ancient, Unified Mechanism for Metformin Growth Inhibition in C. Elegans and Cancer. Cell. 2016;167(7):1705–1718. pmid:27984722
- 24. Tihanyi B, Vellai T, Regős Á, Ari E, Müller F, Takács-Vellai K. The C. Elegans Hox Gene Ceh-13 Regulates Cell Migration and Fusion in a Non-Colinear Way. Implications for the Early Evolution of Hox Clusters. BMC Developmental Biology. 2010;10(1):78. pmid:20667114
- 25. Hyun M, Kim J, Dumur C, Schroeder FC, You YJ. BLIMP-1/BLMP-1 and Metastasis-Associated Protein Regulate Stress Resistant Development in Caenorhabditis Elegans. Genetics. 2016;203(4):1721–1732. pmid:27334271
- 26. Chen L, Zhang J, Xu J, Wan L, Teng K, Xiang J, et al. rBmαTX14 Increases the Life Span and Promotes the Locomotion of Caenorhabditis Elegans. PLOS One. 2016;11(9):e0161847. pmid:27611314
- 27. Sural S, Lu TC, Jung SA, Hsu AL. HSB-1 Inhibition and HSF-1 Overexpression Trigger Overlapping Transcriptional Changes to Promote Longevity in Caenorhabditis Elegans. G3: Genes, Genomes, Genetics. 2019;9(5):1679–1692. pmid:30894454
- 28. Chen X, McCue HV, Wong SQ, Kashyap SS, Kraemer BC, Barclay JW, et al. Ethosuximide Ameliorates Neurodegenerative Disease Phenotypes by Modulating DAF-16/FOXO Target Gene Expression. Molecular Neurodegeneration. 2015;10(1):51. pmid:26419537
- 29. Narayan V, Ly T, Pourkarimi E, Murillo AB, Gartner A, Lamond AI, et al. Deep Proteome Analysis Identifies Age-Related Processes in C. Elegans. Cell Systems. 2016;3(2):144–159. pmid:27453442
- 30. Van Voorhies WA. Production of Sperm Reduces Nematode Lifespan. Nature. 1992;360(6403):456. pmid:1448167
- 31. Halaschek-Wiener J, Khattra JS, McKay S, Pouzyrev A, Stott JM, Yang GS, et al. Analysis of Long-Lived C. Elegans Daf-2 Mutants Using Serial Analysis of Gene Expression. Genome Research. 2005;15(5):603–615. pmid:15837805
- 32. Hansen M, Taubert S, Crawford D, Libina N, Lee SJ, Kenyon C. Lifespan Extension by Conditions That Inhibit Translation in Caenorhabditis Elegans. Aging Cell. 2007;6(1):95–110. pmid:17266679
- 33. Marek A, Korona R. Restricted Pleiotropy Facilitates Mutational Erosion of Major Life-History Traits. Evolution. 2013;67(11):3077–3086. pmid:24151994
- 34. Seufert W, Jentsch S. Ubiquitin-Conjugating Enzymes UBC4 and UBC5 Mediate Selective Degradation of Short-Lived and Abnormal Proteins. The EMBO Journal. 1990;9(2):543–550. pmid:2154373
- 35. Herbert AP, Riesen M, Bloxam L, Kosmidou E, Wareing BM, Johnson JR, et al. NMR Structure of Hsp12, a Protein Induced by and Required for Dietary Restriction-Induced Lifespan Extension in Yeast. PLOS One. 2012;7(7):e41975. pmid:22848679
- 36. Fang Y, Fliss AE, Rao J, Caplan AJ. SBA1 Encodes a Yeast Hsp90 Cochaperone That Is Homologous to Vertebrate P23 Proteins. Molecular and Cellular Biology. 1998;18(7):3727–3734. pmid:9632755
- 37. Toogun OA, Zeiger W, Freeman BC. The P23 Molecular Chaperone Promotes Functional Telomerase Complexes through DNA Dissociation. Proceedings of the National Academy of Sciences. 2007;104(14):5765–5770. pmid:17389357
- 38. Kruegel U, Robison B, Dange T, Kahlert G, Delaney JR, Kotireddy S, et al. Elevated Proteasome Capacity Extends Replicative Lifespan in Saccharomyces Cerevisiae. PLOS Genetics. 2011;7(9):e1002253. pmid:21931558
- 39. Patil C, Walter P. Intracellular Signaling from the Endoplasmic Reticulum to the Nucleus: The Unfolded Protein Response in Yeast and Mammals. Current Opinion in Cell Biology. 2001;13(3):349–355. pmid:11343907
- 40. Steffen KK, MacKay VL, Kerr EO, Tsuchiya M, Hu D, Fox LA, et al. Yeast Life Span Extension by Depletion of 60s Ribosomal Subunits Is Mediated by Gcn4. Cell. 2008;133(2):292–302. pmid:18423200
- 41. Schleit J, Johnson SC, Bennett CF, Simko M, Trongtham N, Castanza A, et al. Molecular Mechanisms Underlying Genotype-Dependent Responses to Dietary Restriction. Aging Cell. 2013;12(6):1050–1061. pmid:23837470
- 42. Garay E, Campos SE, de la Cruz JG, Gaspar AP, Jinich A, DeLuna A. High-Resolution Profiling of Stationary-Phase Survival Reveals Yeast Longevity Factors and Their Genetic Interactions. PLOS Genetics. 2014;10(2):e1004168. pmid:24586198
- 43. Jazwinski SM. Yeast Longevity and Aging—the Mitochondrial Connection. Mechanisms of Ageing and Development. 2005;126(2):243–248. pmid:15621203
- 44. Wilkinson MF. Genetic Paradox Explained by Nonsense. Nature. 2019;568(7751):179. pmid:30962551
- 45. Sampaio-Marques B, Ludovico P. Linking Cellular Proteostasis to Yeast Longevity. FEMS Yeast Research. 2018;18(5):foy043. pmid:29800380
- 46. Durieux J, Wolff S, Dillin A. The Cell-Non-Autonomous Nature of Electron Transport Chain-Mediated Longevity. Cell. 2011;144(1):79–91. pmid:21215371
- 47. Sun N, Youle RJ, Finkel T. The Mitochondrial Basis of Aging. Molecular Cell. 2016;61(5):654–666. pmid:26942670
- 48. Hwang AB, Jeong DE, Lee SJ. Mitochondria and Organismal Longevity. Current Genomics. 2012;13(7):519–532.
- 49. Lans H, Lindvall J, Thijssen K, Karambelas A, Cupac D, Fensgård Ø, et al. DNA Damage Leads to Progressive Replicative Decline but Extends the Life Span of Long-Lived Mutant Animals. Cell Death and Differentiation. 2013;20(12):1709. pmid:24013725
- 50. Carmona-Gutierrez D, Hughes AL, Madeo F, Ruckenstuhl C. The Crucial Impact of Lysosomes in Aging and Longevity. Ageing Research Reviews. 2016;32:2–12. pmid:27125853
- 51. Liu J, Wang L, Wang Z, Liu JP. Roles of Telomere Biology in Cell Senescence, Replicative and Chronological Ageing. Cells. 2019;8(1):54. pmid:30650660
- 52. Austriaco NR, Guarente LP. Changes of Telomere Length Cause Reciprocal Changes in the Lifespan of Mother Cells in Saccharomyces Cerevisiae. Proceedings of the National Academy of Sciences. 1997;94(18):9768–9772. pmid:9275199
- 53. Postnikoff SD, Johnson JE, Tyler JK. The Integrated Stress Response in Budding Yeast Lifespan Extension. Microbial Cell. 2017;4(11):368. pmid:29167799
- 54. Shimasaki T, Ohtsuka H, Naito C, Azuma K, Tenno T, Hiroaki H, et al. Ecl1 Is a Zinc-Binding Protein Involved in the Zinc-Limitation-Dependent Extension of Chronological Life Span in Fission Yeast. Molecular Genetics and Genomics. 2017;292(2):475–481. pmid:28160081
- 55. Huang DW, Sherman BT, Lempicki RA. Systematic and Integrative Analysis of Large Gene Lists Using DAVID Bioinformatics Resources. Nature Protocols. 2009;4(1):44.
- 56. Huang DW, Sherman BT, Lempicki RA. Bioinformatics Enrichment Tools: Paths toward the Comprehensive Functional Analysis of Large Gene Lists. Nucleic Acids Research. 2009;37(1):1–13.
- 57. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research. 2000;28(1):27–30. pmid:10592173
- 58. Finley D, Ulrich HD, Sommer T, Kaiser P. The Ubiquitin–Proteasome System of Saccharomyces Cerevisiae. Genetics. 2012;192(2):319–360. pmid:23028185
- 59. Kaeberlein M, McVey M, Guarente L. The SIR2/3/4 Complex and SIR2 Alone Promote Longevity in Saccharomyces Cerevisiae by Two Different Mechanisms. Genes & Development. 1999;13(19):2570–2580. pmid:10521401
- 60. Sarnoski EA, Song R, Ertekin E, Koonce N, Acar M. Fundamental Characteristics of Single-Cell Aging in Diploid Yeast. iScience. 2018;7:96–109. pmid:30267689
- 61.
Kuhn M, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, et al. Caret: Classification and Regression Training; 2019.
- 62. Petrascheck M, Ye X, Buck LB. A High-Throughput Screen for Chemicals That Increase the Lifespan of Caenorhabditis Elegans. Annals of the New York Academy of Sciences. 2009;1170(1):698–701. pmid:19686215
- 63. Sarnoski EA, Liu P, Acar M. A High-Throughput Screen for Yeast Replicative Lifespan Identifies Lifespan-Extending Compounds. Cell Reports. 2017;21(9):2639–2646. pmid:29186697
- 64. Barardo D, Thornton D, Thoppil H, Walsh M, Sharifi S, Ferreira S, et al. The DrugAge Database of Aging-Related Drugs. Aging Cell. 2017;16(3):594–597. pmid:28299908
- 65. Gómez-Bombarelli R, Wei JN, Duvenaud D, Hernández-Lobato JM, Sánchez-Lengeling B, Sheberla D, et al. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Central Science. 2018;4(2):268–276. pmid:29532027
- 66. Song R, Sarnoski EA, Acar M. The Systems Biology of Single-Cell Aging. iScience. 2018;7:154–169. pmid:30267677
- 67. Stroustrup N, Anthony WE, Nash ZM, Gowda V, Gomez A, López-Moyado IF, et al. The Temporal Scaling of Caenorhabditis Elegans Ageing. Nature. 2016;530(7588):103–107. pmid:26814965
- 68. Liu P, Acar M. The Generational Scalability of Single-Cell Replicative Aging. Science Advances. 2018;4(1):eaao4666. pmid:29399632
- 69. Gelman A. Scaling Regression Inputs by Dividing by Two Standard Deviations. Statistics in Medicine. 2008;27(15):2865–2873. pmid:17960576