Fig 1.
The first two principal components of the PCA based on the RPKMs of the coding genes.
The areas are the convex hulls of the conditions. The largest point of each color marks the center of the corresponding hull. A, B, and D show the same PCA with the same coordinates: D shows all conditions except OA, whereas A and B each show only three conditions for a better overview. C is a PCA that includes OA; four conditions are shown to depict the variability of OA. Number of samples: 10 arthralgia, 57 earlyRA, 95 establishedRA, 27 normal, 22 OA, 19 RAtripleDMARD and 6 UA.
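As an illustration of the hull and center construction described in this caption, the following is a minimal sketch, not the authors' plotting code. It assumes that each sample is already reduced to its (PC1, PC2) score pair and that the "center" is the arithmetic mean of a condition's points; the caption does not state how the center was computed. The function names and the toy coordinates are hypothetical.

```python
def cross(o, a, b):
    """z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def center(points):
    """Assumed definition of the center: mean of all points of a condition."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Hypothetical (PC1, PC2) scores for one condition, for illustration only.
toy_scores = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
hull = convex_hull(toy_scores)
mid = center(toy_scores)
print(hull, mid)
```

Applied per condition, the hull vertices give the outline of the area and the center gives the position of the enlarged point.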
Fig 2.
The first two principal components of the PCA considering RPKMs of the coding genes.
The areas are the convex hulls of the conditions. The largest point of each color marks the center of the corresponding hull. Only conditions with more than ten samples available for both male and female individuals are shown. Number of samples: 33 earlyRAF, 24 earlyRAM, 73 establishedRAF, 22 establishedRAM, 13 NormalF, 14 NormalM.
Fig 3.
GO (BP) term enrichments of DEGs in earlyRA, arthralgia, OA and undifferentiated arthritis.
The base state is normal. Each term shows the DEG enrichment for the four conditions in the quadrants of its circle, according to the legend at the bottom right. Red indicates an enrichment in the upregulated DEGs, blue indicates an enrichment in the downregulated DEGs, and gray indicates no enrichment. The node size represents the number of genes in the annotation for that term. The edge thickness represents the degree of overlap between the gene sets of two terms. The terms in a larger font are a somewhat arbitrary selection of terms that are meaningful specifically for earlyRA. See Methods and the main text for filtering and discussion. Number of samples: 10 arthralgia, 57 earlyRA, 22 OA and 6 UA.
Fig 4.
Log2 fold-changes of gene expression between men and women in early RA, established RA and the normal condition.
Only genes that are significantly differentially expressed between men and women are shown. (A) The sex ratio of gene expression in early RA and in the normal condition. The size and shape of a point show the significance of the difference between men and women: large circles are genes significantly differentially expressed between the sexes in both early RA and the normal condition; these genes are also labelled. Small squares mean a significant difference between men and women only in early RA; small diamonds mean a significant difference only between healthy men and women. The color represents the significance of the difference in expression between normal and early RA. (B) The sex ratio of gene expression in established RA and early RA. No gene is both significantly differentially expressed between men and women in established RA and early RA and differentially expressed between established RA and early RA. Genes are named if one of the sex ratios of gene expression is significant: squares mean a significant difference between men and women only in established RA; small diamonds mean a significant difference between men and women only in early RA.
Fig 5.
Average fold-changes and counts of different biotypes of genes.
The labels on the x-axis denote the change in that condition relative to normal. Only significant changes are considered. The biotype labels are those defined by Ensembl. Missing bars mean that there was no significant change in any gene of this biotype. (A) The log10 count of the significantly differentially expressed genes by biotype. Each column consists of two bars: the bar from 0 toward the positive side gives the (log10) number of significantly higher expressed genes, and the bar from 0 toward the negative side gives the number of significantly lower expressed genes. (B) The average log2 fold-change of the significantly differentially expressed genes. The average log2 fold-change roughly corresponds to the difference between the counts of the significantly higher and lower expressed genes. Number of samples: 10 arthralgia, 57 earlyRA, 95 establishedRA, 27 normal, 22 OA, 19 RAtripleDMARD and 6 UA.
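The quantities plotted in panels A and B can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input is assumed to be a list of (biotype, log2 fold-change) pairs that has already been filtered to significant genes, the down-regulated count is given a negative sign to mirror the downward bar, and the function name and toy values are hypothetical.

```python
import math
from collections import defaultdict


def summarize_by_biotype(degs):
    """Summarize significant DEGs per biotype.

    degs: iterable of (biotype, log2_fold_change), significant genes only.
    Returns, per biotype, the up/down counts, their signed log10 values
    (panel A) and the mean log2 fold-change (panel B).
    """
    groups = defaultdict(list)
    for biotype, lfc in degs:
        groups[biotype].append(lfc)

    summary = {}
    for biotype, lfcs in groups.items():
        n_up = sum(1 for v in lfcs if v > 0)
        n_down = sum(1 for v in lfcs if v < 0)
        summary[biotype] = {
            "n_up": n_up,
            "n_down": n_down,
            # None stands for a missing bar (no gene in that direction).
            "log10_up": math.log10(n_up) if n_up else None,
            "log10_down": -math.log10(n_down) if n_down else None,
            "mean_log2fc": sum(lfcs) / len(lfcs),
        }
    return summary


# Hypothetical input: 100 up-regulated and 10 down-regulated genes.
toy_degs = [("protein_coding", 1.0)] * 100 + [("protein_coding", -1.0)] * 10
toy_summary = summarize_by_biotype(toy_degs)
print(toy_summary["protein_coding"])
```

In this toy input the up-regulated genes outnumber the down-regulated ones, so the mean log2 fold-change is positive, which is the correspondence the caption describes. Note that a single gene in one direction yields log10(1) = 0, which is indistinguishable from a missing bar on such a plot.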
Fig 6.
Classification model for distinguishing between early RA and normal based on RPKMs.
The model consists of a single rule: LXN > 3.8 AND CXCL8 > 0.04 -> early RA. It achieves an accuracy of 92% in 10-fold cross-validation (p-value 2.02 × 10^−13). Variables for model generation were pre-selected based on the intersection of single-variable comparisons. This pre-selection weakens the cross-validation because it was not part of it. This model is intended only as the simplest possible distinction between early RA and normal based on gene expression. In total there are 84 samples. The RPKM values are cut at 10 and at 1, respectively.
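The single rule in this caption can be written as a short function. The sketch below uses the two thresholds stated in the caption; the assumption that every sample not matched by the rule is labelled normal, the function names, and the toy samples are illustrative and not taken from the source.

```python
LXN_THRESHOLD = 3.8     # RPKM threshold for LXN, from the caption
CXCL8_THRESHOLD = 0.04  # RPKM threshold for CXCL8, from the caption


def classify(lxn_rpkm, cxcl8_rpkm):
    """Apply the rule LXN > 3.8 AND CXCL8 > 0.04 -> early RA.

    Assumption: samples that do not match the rule are labelled normal.
    """
    if lxn_rpkm > LXN_THRESHOLD and cxcl8_rpkm > CXCL8_THRESHOLD:
        return "earlyRA"
    return "normal"


def accuracy(samples):
    """Fraction of (lxn, cxcl8, true_label) samples classified correctly."""
    correct = sum(
        1 for lxn, cxcl8, label in samples if classify(lxn, cxcl8) == label
    )
    return correct / len(samples)


# Hypothetical RPKM values, for illustration only.
toy_samples = [
    (5.2, 0.30, "earlyRA"),
    (4.1, 0.02, "normal"),   # LXN above threshold, CXCL8 below
    (2.0, 0.50, "normal"),   # CXCL8 above threshold, LXN below
    (3.9, 0.05, "normal"),   # matches the rule, so it is misclassified
]
print(accuracy(toy_samples))
```

Evaluating the rule on the same data that was used to choose the variables would overstate its performance, which is the limitation the caption notes for the pre-selection step.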
Fig 7.
Classification model for distinguishing between early RA and conditions that are neither healthy nor RA, based on RPKMs.
'Not RA' as present in the data comprises OA, arthralgia and undifferentiated arthritis, abbreviated 'notRAarth'. The model consists of four rules, three of which are shown here graphically. Panels A and B depict the first rule; B contains only the cases that lie above the threshold in A. The red shaded area in B shows the cases matched by the first rule, all of which are earlyRA. Panel C shows the second and third rules, where the model output is earlyRA for the red shaded area and notRAarth for the blue shaded area. The complete model is shown as text in D. The threshold values are rounded to one decimal place. The model achieves an accuracy of 86% in 10-fold cross-validation (p-value 1.7 × 10^−12). Variables for model generation were pre-selected based on the intersection of single-variable comparisons. This pre-selection weakens the cross-validation because it was not part of it. This model is intended only as the simplest possible distinction between early RA and other conditions based on gene expression. In total there were 95 samples as input data.