Figure 1.
Data characteristics and basic comorbidity statistics.
A. Age distribution for the study population. B. Demographic breakdown of the study population. C. Prevalence distribution for all diseases, measured using ICD9 codes at the 5-digit level. D. Distribution of the relative risk (RR) between all disease pairs. E. Distribution of the φ-correlation between all disease pairs. F. Scatter plot of the φ-correlation against the relative risk of disease pairs.
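The two comorbidity measures in panels D to F can be illustrated with a minimal sketch. The caption does not define them, so the formulas below are an assumption: they are the commonly used co-occurrence forms, with `n` the number of patients, `p_i` and `p_j` the number of patients with each disease, and `c_ij` the number with both. The function names are hypothetical.

```python
import math

def relative_risk(c_ij: int, p_i: int, p_j: int, n: int) -> float:
    """Observed co-occurrence divided by the co-occurrence expected
    if the two diseases were independent (assumed definition)."""
    return (c_ij * n) / (p_i * p_j)

def phi_correlation(c_ij: int, p_i: int, p_j: int, n: int) -> float:
    """Pearson correlation for two binary variables, written in terms
    of counts (assumed definition)."""
    numerator = c_ij * n - p_i * p_j
    denominator = math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))
    return numerator / denominator

# Illustrative counts only; these are not values from the study.
rr = relative_risk(c_ij=20, p_i=100, p_j=50, n=1000)
phi = phi_correlation(c_ij=20, p_i=100, p_j=50, n=1000)
```

Under these definitions, independent diseases give RR = 1 and φ = 0, which is a convenient sanity check on any implementation.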
Figure 2.
Phenotypic Disease Networks (PDNs).
Nodes are diseases; links are correlations. Node color identifies the ICD9 category; node size is proportional to disease prevalence. Link color indicates correlation strength. A. PDN constructed using RR. Only statistically significant links with RRij>20 are shown. B. PDN built using φ-correlation. Here all statistically significant links where φ>0.06 are shown.
Figure 3.
The Phenotypic Disease Network and disease dynamics.
A. Schematic representation of the three dynamical questions explored here. B. Average φ-correlation between diseases diagnosed in the first two and last two visits for the 946,580 patients with 4 visits (green) and when we consider a randomized set of diseases for the first two visits (red). C. Same as B but for the RR-PDN. D. Ratio between the average φ-correlation among diagnoses received by a patient in their first two and last two visits relative to the control case. E. Same as D but for the RR-PDN. F. Gender and race differences. The subset of Fig 2B containing all diseases connected to hypertension and ischemic heart disease is shown. Blue links indicate comorbidities that are strongest among black males, whereas red links indicate comorbidities that are strongest among white males (see legend).
Figure 4.
Disease connectivity and lethality.
A. Scatter plot between the connectivity of a disease measured in the φ-PDN and the percent of patients that died 8 years after this disease was first observed in our data set. B. Same as A for the RR-PDN. C. Percent of patients that died 8 years after this disease was first observed in our data set as a function of disease prevalence. D. Same as A showing only neoplasms. E. Same as B showing only neoplasms. F. Same as A showing only mental disorders. G. Same as B showing only mental disorders.
Figure 5.
Connectivity lethality control.
A. Histogram of the number of visits for each patient for whom the year of death is known. B. Histogram of the number of diagnoses assigned to each patient for whom the year of death is known. C. Correlation between the average connectivity of the diagnoses assigned to a patient and the number of years survived after the last diagnosis was recorded, for groups of patients with the same number of hospital visits. D. Correlation between the average connectivity of the diagnoses assigned to a patient and the number of years survived after the last diagnosis was recorded, for groups of patients with the same total number of diagnoses assigned. Error margins in C and D represent 95% confidence intervals.
Figure 6.
Directionality of disease progression.
A. Distribution of λ1→2. B. Disease precedence Λi as a function of disease prevalence Pi. The inset shows the same plot after removing the trend from disease precedence (Λi* = Λi + 496.08 log10(Pi) - 2446.2). C. Disease connectivity calculated from the φ-PDN as a function of Λi*. The green line shows the best fit for the 518 diseases with a prevalence larger than 1/500 (green circles), while the red line shows the best fit for the 463 diseases at the center of the cloud (red points). The correlation coefficient is represented by r and its associated p-value by p. D. Percentage of patients that died 2 and 8 years after being diagnosed with a disease with a given detrended precedence Λi*. The green lines show the best fit for all 518 diseases (green circles), while the red lines show the fit for the 434 (top panel) and 465 (bottom panel) diseases in the bulk of the cloud.
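The detrending step in panel B can be written out as a short sketch. The coefficients are taken directly from the caption. The function name is hypothetical, and the caption does not state the units of Pi, so the prevalence argument is assumed to be in whatever units the original fit used.

```python
import math

# Coefficients as given in the caption for the detrended precedence.
SLOPE = 496.08
OFFSET = 2446.2

def detrended_precedence(precedence: float, prevalence: float) -> float:
    """Remove the log-prevalence trend from disease precedence.

    `prevalence` must be positive; its units are an assumption, since
    the caption does not specify them.
    """
    return precedence + SLOPE * math.log10(prevalence) - OFFSET

# Illustrative inputs only; these are not values from the study.
example = detrended_precedence(precedence=2446.2, prevalence=1.0)
```

A disease lying exactly on the fitted trend has a detrended precedence of zero, so positive and negative values of Λi* indicate diseases that sit above or below the trend for their prevalence.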