Age density patterns in patients medical conditions: A clustering approach

doi:10.1371/journal.pcbi.1006115

Fig 1.

(A) The distribution of age in patients. (B) The cumulative distribution function of the ICD-10 codes by the number of observations in the data.

Fig 2.

Network representation of the ICD-10 codes grouped into their 22 category chapters.

The weight of each link represents the number of co-occurrences in the patients' records, and the size of each node represents the frequency of that chapter.
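As a rough illustration of how such link weights and node sizes could be tallied, the sketch below counts chapter frequencies and pairwise co-occurrences from per-patient records. The function name `chapter_cooccurrence`, the record layout (one set of chapter labels per patient), and the toy data are assumptions made for this example, not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations


def chapter_cooccurrence(records):
    """Count chapter frequencies (node sizes) and pairwise co-occurrences (link weights).

    `records` is a hypothetical layout: one iterable of chapter labels per patient.
    """
    node_freq = Counter()
    edge_weight = Counter()
    for chapters in records:
        unique = sorted(set(chapters))  # each chapter counted once per patient
        node_freq.update(unique)
        edge_weight.update(combinations(unique, 2))  # unordered pairs, sorted order
    return node_freq, edge_weight


# Toy data: three patients with chapter labels written as Roman numerals.
records = [{"I", "X"}, {"I", "X", "XI"}, {"XI"}]
node_freq, edge_weight = chapter_cooccurrence(records)
```

Sorting each patient's chapters before pairing keeps every link under a single key, so `("I", "X")` and `("X", "I")` are never counted separately.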

Fig 3.

Age distribution for patients with Chickenpox (A) and Glaucoma (B).

Fig 4.

Lines in gray represent the cumulative distribution of P(age | patients ∈ c), and lines in red are the cluster averages, shown for illustration.

The clusters of ICD-10 codes given by the hierarchical agglomerative clustering (HAC) are labeled from A to F. Cluster A groups ICD-10 codes concentrated in infants and children. Cluster B groups diseases whose density is close to uniform, with relatively more concentration in the teenage years and early adulthood. Cluster C has the narrowest age concentration, in the thirties. Cluster D groups codes that are distributed uniformly across all ages. Cluster E groups codes for ages over 60. Cluster F groups ICD-10 codes in patients over 70.
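The grouping described above can be sketched as follows: build an empirical cumulative age distribution for each code, then apply hierarchical agglomerative clustering to those curves. This is a minimal sketch under stated assumptions. The Ward linkage, the integer age grid of 0 to 100, the helper names `age_cdf` and `cluster_codes`, and the synthetic placeholder codes are all choices made for this example and are not confirmed by the captions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def age_cdf(ages, max_age=100):
    """Empirical CDF of age evaluated on the integer grid 0..max_age."""
    counts = np.bincount(np.clip(ages, 0, max_age), minlength=max_age + 1)
    return np.cumsum(counts) / counts.sum()


def cluster_codes(ages_by_code, n_clusters=6):
    """Cluster codes by the shape of their cumulative age distribution."""
    codes = sorted(ages_by_code)
    cdfs = np.vstack([age_cdf(np.asarray(ages_by_code[c])) for c in codes])
    tree = linkage(cdfs, method="ward")  # assumption: Ward linkage, Euclidean distance
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(codes, labels))


# Synthetic placeholder codes (not real ICD-10 codes) with integer patient ages.
rng = np.random.default_rng(0)
ages_by_code = {
    "child_1": rng.integers(0, 10, 200),
    "child_2": rng.integers(0, 12, 200),
    "elder_1": rng.integers(65, 95, 200),
    "elder_2": rng.integers(70, 100, 200),
}
labels = cluster_codes(ages_by_code, n_clusters=2)
```

On this toy input the two childhood codes fall into one cluster and the two late-life codes into the other, mirroring the contrast between clusters A and F at a much smaller scale.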

Fig 5.

Hierarchical clustering shown to a depth of six in the dendrogram tree; branches deeper than six are represented by the ICD-10 code that is most common in that branch.

The frequency of each ICD-10 code is given in parentheses as a percentage of the total patient population. The letter assignments correspond to the clusters discussed in Fig 4.

Fig 6.

Patient characteristics per cluster.

(A) Sex distribution. (B) Age distribution. (C) Probability of association between our identified clusters and the ICD-10 category chapters (1 − p-value). The letters correspond to the clusters discussed in Fig 4.

Fig 7.

The comorbidity network of the two thousand highest values of relative risk (i.e., comorbidity).

Nodes in the network are ICD-10 codes, and edges represent the relative risk between disease codes. For purposes of visualization, only the edges with the two thousand highest relative risk values are displayed. Edges in network (A) show intra-cluster comorbidities, and edges in network (B) show inter-cluster comorbidities.
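To make the edge construction concrete, the sketch below computes a pairwise relative risk and splits the strongest edges into intra-cluster and inter-cluster sets. The caption does not give a formula, so the definition used here, RR = C_ij · N / (P_i · P_j), with C_ij the number of patients carrying both codes, P_i and P_j the number carrying each code, and N the number of patients, is an assumption taken from common practice in comorbidity-network analyses. The function names, the `cluster_of` mapping, and the toy records are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def relative_risk(records):
    """Pairwise relative risk under the assumed definition C_ij * N / (P_i * P_j)."""
    n = len(records)
    prevalence = Counter()
    joint = Counter()
    for codes in records:
        unique = sorted(set(codes))
        prevalence.update(unique)
        joint.update(combinations(unique, 2))
    return {
        pair: count * n / (prevalence[pair[0]] * prevalence[pair[1]])
        for pair, count in joint.items()
    }


def split_edges(rr, cluster_of, top_k=2000):
    """Keep the top_k edges by relative risk, then split them by cluster membership."""
    top = sorted(rr.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    intra = [(p, v) for p, v in top if cluster_of[p[0]] == cluster_of[p[1]]]
    inter = [(p, v) for p, v in top if cluster_of[p[0]] != cluster_of[p[1]]]
    return intra, inter


# Toy records with placeholder codes and a placeholder cluster assignment.
records = [{"a", "b"}, {"a", "b"}, {"c"}, {"a", "c"}]
cluster_of = {"a": "A", "b": "A", "c": "F"}
rr = relative_risk(records)
intra, inter = split_edges(rr, cluster_of)
```

Under this definition a value above 1 means the two codes appear together more often than independence would predict, which is why ranking edges by relative risk highlights the strongest comorbidities.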

Fig 8.

The distribution of relative risk for inter- versus intra-cluster edges.

In gray is the distribution of relative risk of inter-cluster edges. In red are the distributions of relative risk for intra-cluster edges for the respective cluster.
