Dissecting the relationships of IgG subclasses and complements in membranous lupus nephritis and idiopathic membranous nephropathy

Membranous lupus nephritis (MLN) and idiopathic membranous nephropathy (IMN) are kidney diseases with similar morphology but distinct etiologies, both producing glomeruli with immune deposits. Immunoglobulins and complements, the main components of the deposits, can be detected by immunofluorescence (IF) microscopy. Previous studies characterized the components of the immune deposits individually, but not the interactions between them. To study these relationships we analyzed an IF profile of IgG subclasses and complements (IgG1, IgG2, IgG3, IgG4, C3, C1q, and C4) in 53 biopsy-confirmed cases of MLN and 95 of IMN, mainly using information theory and Bayesian networks. We identified significant entropy differences between MLN and IMN for all markers except C3 and IgG1, but mutual information (a measure of mutual dependence) was not significantly different for any pair of markers. The entropy differences between MLN and IMN, therefore, were not attributable to the mutual information. These findings suggest that disease type directly and/or indirectly influences the glomerular deposits of most of the IgG subclasses and complements, and that the interactions between any pair of the markers are similar in the two diseases. A Markov chain of IgG subclasses was derived from the mutual information between each pair of IgG subclasses. Finally we developed an integrated disease model, consistent with the previous findings, describing the glomerular immune deposits of the IgG subclasses and complements based on a Bayesian network using the Markov chain of IgG subclasses as a seed. The relationships between the markers were effectively explored by information theory and Bayesian networks. Although deposits of IgG subclasses and complements depended on both disease type and the other markers, the interactions between the markers appear conserved, independent of the disease type.
The disease model provided an integrated and intuitive representation of the relationships of the IgG subclasses and complements in MLN and IMN.


Introduction
Membranous lupus nephritis (MLN) and idiopathic membranous nephropathy (IMN) are morphologically similar renal diseases exhibiting subepithelial immune deposits mainly composed of immune complexes and/or complements with minimal inflammatory reactions [1][2][3][4][5]. Though morphologically similar, the diseases are etiologically distinct and should be differentiated because of their clinical importance. MLN is one of the complications of systemic lupus erythematosus (SLE), whose main autoantigens include dsDNA, histones, and ribonucleoproteins [6]. In IMN, the M-type phospholipase A2 receptor is believed to be the target antigen [7,8]. The deposition of immune complexes formed against these antigens in the subepithelial portion of the glomerulus is a key pathologic feature of these diseases and the cause of the main clinical symptoms such as proteinuria and nephrotic syndrome.
Because IgG is a key component of these immune deposits, special attention has been given to it. According to the early results, IgG1, IgG2, and IgG3 tend to be highly expressed in MLN, and IgG1 and IgG4 in IMN [9][10][11][12][13]. Though a tight association between IgG subclasses and disease entities was revealed in these studies, only the behavior of individual markers was compared and the interactions between IgG subclasses in MLN and IMN were not considered. The glomerular deposits of a marker may depend not only on disease type, but also on the other markers.
We previously attempted to replicate the experimental findings and to develop more advanced data analysis procedures [12]. Specifically we used heatmap visualization with hierarchical clustering to reveal differential patterns of IgG subclass deposits between these diseases, and then developed predictive models using decision trees to estimate how much they improved diagnostic accuracy compared to naïve analysis. While these research goals were achieved, the interactions between IgG subclasses were still not clearly revealed.
According to another study performed by Huang et al., IgG1 was predominant in early IMN unlike late IMN where IgG4 was known to be predominant [14]. In data analysis, they introduced the concept of predominance and codominance to select the most representative markers. Although this study provided new insights on the role of IgG subclasses in IMN and more advanced data analysis procedures beyond individual marker analysis were developed, again the detailed interactions between IgG subclasses were not revealed.
Information theory provides a set of metrics useful in exploring the relationships between multiple random variables. These metrics can be directly applied to IgG subclasses to describe their behavior in MLN and IMN. Entropy (Shannon entropy) is a measure of the uncertainty of a random variable having a specific probability distribution [15], and mutual information is a measure of the mutual dependence of two random variables [16]. By measuring entropy and mutual information we can estimate how much information a variable contains and how much of it is shared with another variable. These properties of entropy and mutual information are very useful for global evaluation of the relationships between multiple biomarkers and constructing a more complete disease model.
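As an illustration of these two quantities, the following minimal Python sketch estimates Shannon entropy and mutual information from paired observations of dichotomized markers. It is not the code used in this study; the function names and the example readings are hypothetical, and the estimates are simple plug-in (empirical frequency) estimates.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical dichotomized IF readings (1 = positive, 0 = negative);
# these values are made up for illustration and are not study data.
igg1 = [1, 1, 1, 0, 1, 0, 1, 1]
igg4 = [1, 0, 1, 0, 1, 0, 0, 1]

print("H(IgG1) =", entropy(igg1))
print("I(IgG1; IgG4) =", mutual_information(igg1, igg4))
```

With small samples, plug-in estimates of this kind are biased, which is one reason to reduce the number of categories per marker before estimating them.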
Another approach to evaluating the relationships between random variables is to construct a Bayesian network. A Bayesian network is a graphical model that represents probabilistic relationships between random variables using a directed acyclic graph (DAG) [17]. The structure of the network can be obtained directly from data, but it is very sensitive to noise, thus requiring large amounts of data. Therefore it is desirable to identify the probable structure of a network before learning the structure from data, using prior knowledge or information obtained from other sources.
In this study, we investigated the relationships including interactions of IgG subclasses and/or complements using information theory approaches, and attempted to build integrated disease models for MLN and IMN using Bayesian networks. In addition to IgG subclasses, complements were analyzed together because complements are important immune mediators activated by immunoglobulins and easy to identify by IF microscopy. When developing the structure of the Bayesian network, selection criteria were applied using the information on IgG subclasses and complements in MLN and IMN obtained from information theory-based exploration of the data.

Case selection and data collection
In our department IF staining for IgG subclasses, C3, C1q, and C4 has been routinely conducted in cases of MLN and IMN. The present data were collected from the database of pathology reports from 2004 to 2015. The cases of MLN were limited to those where the histologic findings were compatible with class V lupus nephritis according to the International Society of Nephrology/Renal Pathology Society 2003 classification of LN [18] and antinuclear antibodies were detected at the time of onset or during the progress of the disease. The cases of IMN were limited to those where the histologic findings fitted the diagnostic criteria for IMN and no secondary etiologies were found at the onset or during the progression of the disease. Among 64 cases of MLN and 156 cases of IMN found by database query, we selected 148 cases, 53 of MLN and 95 of IMN, based on the diagnostic criteria, sample quality, and availability of data. In the departmental protocol for renal biopsy, IF intensities for these markers are semiquantitatively evaluated in five renal compartments (glomerular capillary wall, mesangium, tubule, interstitium, and blood vessel) on a scale of 0 to 3. Only the data for the glomerular capillary wall and mesangium were used in our analysis because these data were considered most informative. To obtain more robust estimates of the information theory metrics and of the Bayesian network structure, IF intensities were dichotomized into positive if the raw IF intensity was 1, 2, or 3 and negative if it was 0. We therefore used either raw or dichotomized IF intensities depending on the type of analysis (PCA: raw data; analyses using information theory and Bayesian networks: dichotomized data). To evaluate inter-observer variability, two pathologists (MP & WN) independently undertook IF scoring for 14 feasible cases; the chance-corrected agreement (Cohen's kappa) was 0.70 and 0.77 for raw and dichotomized IF intensities, respectively.
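The dichotomization rule and the agreement statistic described above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes the unweighted form of Cohen's kappa, since the text does not state whether a weighted variant was used for the raw 0 to 3 scores.

```python
from collections import Counter


def dichotomize(score):
    """Map a raw IF intensity (0-3) to 1 (positive) or 0 (negative)."""
    return 1 if score >= 1 else 0


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumes the raters do not both use a single identical category for
    every item (otherwise expected agreement is 1 and kappa is undefined).
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical raw scores from two observers (not study data).
observer_1 = [0, 1, 2, 3, 0, 2, 1, 0]
observer_2 = [0, 1, 3, 3, 0, 2, 0, 0]

print("kappa (raw):", cohens_kappa(observer_1, observer_2))
print("kappa (dichotomized):",
      cohens_kappa([dichotomize(s) for s in observer_1],
                   [dichotomize(s) for s in observer_2]))
```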
This study was carried out in accordance with the code of ethics of the World Medical Association (Declaration of Helsinki), and was approved by the Institutional Review Board (IRB) of Hanyang University Hospital.

Analysis of global trends of IgG subclasses and complements in MLN and IMN
Before conducting the main analyses, we used principal component analysis (PCA) to identify global data trends because our data were of high dimensionality, and the pattern of high dimensional data is usually explored by PCA with visualization [19]. Specifically, data for 14 variables (7 IF markers × 2 compartments) were mapped into a space with new orthogonal axes. This space was constructed by a linear transformation of the data chosen so that the first axes explain as much of the variability of the data as possible. Using PCA and visualization, we attempted to see whether MLN and IMN were clearly distinct and whether there were multiple clusters indicating multiple disease subtypes.
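To make the idea of a variance-maximizing axis concrete, the following Python sketch computes only the first principal component of a small data matrix by power iteration on the sample covariance matrix. It is a simplified illustration under stated assumptions, not the procedure used in this study: the function name, the starting vector, and the toy data are hypothetical, and a full PCA would also extract the remaining orthogonal axes.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of equal-length numeric lists (cases x variables).
    Uses power iteration on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    v = [float(j + 1) for j in range(d)]  # arbitrary non-uniform start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data variance")
        v = [x / norm for x in w]

    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data lying on the line x = y (not study data).
direction, scores = first_principal_component([[0, 0], [1, 1], [2, 2]])
print(direction, scores)
```

Plotting the scores of each case on the first two such axes, colored by diagnosis, gives the kind of visualization described above.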

Exploring the relationships between each pair of the IgG subclasses and complements in MLN and IMN using information theory
We explored the relationships between any biomarkers that could be represented as random variables in MLN and IMN using Shannon entropy and mutual information instead of Pearson's correlation because all of our raw data were ordinal. The biomarkers included IF intensities of IgG1, IgG2, IgG3, IgG4, C3, C1q, and C4 and the disease type (whether a sample corresponded to MLN or IMN). By using entropy and mutual information, the dependence and independence of multiple random variables can be easily assessed.
As the first step in the analysis using information theory we constructed information diagrams for various queries concerning IgG subclasses, complements, and disease type (Fig 1). An information diagram is an integrated and intuitive representation of the relationships between multiple random variables in a Venn diagram-like structure which is suitable for visualizing entropy, mutual information, and other information. An information diagram can give the answers to various queries such as how much of a variable is uncertain, whether two variables are independent, and how much one can know the value of a given variable using the values of the other variables.
As a second step, we compared the differences in entropy and mutual information between MLN and IMN. By identifying significant differences in entropy between MLN and IMN, we can estimate which markers are differently expressed in MLN and IMN. Likewise, if the mutual information between two markers does not differ significantly between the diseases, it may imply that the interaction between the two is conserved, independent of the disease type. We used permutation tests to examine statistical significance, first obtaining the null distribution by permuting the disease state labels (MLN or IMN) and then computing the statistics.
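The permutation procedure can be sketched in Python as follows for the entropy difference of a single marker. This is an illustrative sketch, not the code used in this study: the function names, the number of permutations, the two-sided form of the test, and the example data are all assumptions made for the illustration.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def entropy_difference(values, labels):
    """H(marker | MLN cases) - H(marker | IMN cases)."""
    mln = [v for v, lab in zip(values, labels) if lab == "MLN"]
    imn = [v for v, lab in zip(values, labels) if lab == "IMN"]
    return entropy(mln) - entropy(imn)


def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """Two-sided p-value obtained by shuffling the disease labels."""
    rng = random.Random(seed)
    observed = abs(entropy_difference(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # group sizes are preserved
        if abs(entropy_difference(values, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: the marker is always positive in MLN, mixed in IMN.
labels = ["MLN"] * 20 + ["IMN"] * 20
marker = [1] * 20 + [0, 1] * 10
print(permutation_p_value(marker, labels))
```

The same scheme applies to mutual information by replacing the statistic with the difference in mutual information of a marker pair between the two groups.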

Disease model construction based on a Bayesian network
To provide an integrated view of the relationships of IgG subclasses and complements in MLN and IMN, we constructed a disease model based on a Bayesian network, which is a graphical model consisting of IgG subclasses, complements, and the disease type. The structure of a Bayesian network can be directly derived from the data, and this process is called structural learning. The results of the entropy and mutual information analysis were used as prior knowledge during the structural learning. During this process important relationships are selected and less important ones are filtered out. The Tabu learning algorithm was selected for the structural learning because it is fast, robust, and widely used [20]. For the actual implementation, we used the R bnlearn package because the Tabu algorithm was already implemented in this package [21].

Distinct IgG subclass and complement profiles in MLN and IMN
We investigated the global trends of IgG subclasses and complement profiles in MLN and IMN using principal component analysis (PCA) (Fig 2). MLN and IMN were differentially distributed in the PCA-transformed space although there was a small area of mixed MLN and IMN cases. This result implies that IgG subclasses and complement profiles are distinctly different in MLN and IMN (Fig 2(a)). We looked for possible subclusters, which suggest the presence of subtypes of a disease, but failed to identify any distinct subclusters in MLN or IMN.
We also conducted PCAs of IgG subclass profiles only, and complement profiles only (Fig 2(b) & 2(c), respectively). PCA of the IgG subclass profiles revealed a slightly less distinct distribution of MLN and IMN cases than the total profile. PCA of the complement profiles yielded rather haphazardly arranged data points in terms of disease state. These findings suggest that IgG subclass profiles and complement profiles provide complementary information about disease type.
Although a differential distribution of MLN and IMN was identified by PCA, it was difficult to identify the specific roles of a marker or a marker set using PCA alone. These data were therefore subsequently analyzed using information theory and a Bayesian network.

Analysis using information theoretical measures has suggested that the interactions between any pair of IgG subclasses and/or complements may not be different in MLN and IMN
To more accurately define the features of IgG subclasses and complements in MLN and IMN, we computed various metrics of information theory that can be used to quantify unique and/or shared portions of information concerning multiple random variables. These quantities can yield insights into the relationships between IgG subclasses, complements, and disease type.
We constructed information diagrams for 1) IgG subclass, complement, and disease type, 2) IgG subclasses, and 3) complements (Fig 1). The information diagram for IgG subclass, complement, and disease type showed that about 90% of disease type information was shared with either IgG subclasses or complements, and that IgG subclasses and complements shared > 45% of their own information (Fig 1(a)). The IgG subclasses shared more information with disease type than the complements did. These findings indicate that in most cases disease type could be inferred from the IgG subclasses and complements, and that the former are more informative about disease type than the latter. These findings could also be related to the results of the PCAs. The information diagram for the IgG subclasses showed that IgG4 appeared to be relatively independent of the other three IgG subclasses, which is consistent with the known uniqueness of IgG4 in many diseases (Fig 1(b)) [22][23][24][25][26]. The information diagram for complements showed that C1q and C3 have more information than C4, and that C4 is relatively independent of C1q and C3.
To identify the differences in behavior of IgG subclasses, complements, and their interactions in MLN and IMN, we investigated how entropy and mutual information changed depending on the disease type. All the entropies of the IgG subclasses or complements except IgG4 decreased when one changed from MLN to IMN (Fig 3(a)) while mutual information only changed minimally (Fig 3(b)). We investigated whether these changes were statistically significant by permutation testing. Interestingly, the mutual information between any pair of IgG subclasses or complements did not differ significantly between MLN and IMN (S1 Table) whereas all the entropies of the IgG subclasses and complements were significantly different except those of IgG1 and C3 (Table 1). To confirm that the similar values of mutual information in MLN and IMN were due to similar probability distributions, we compared the joint probability distributions of each pair of the markers in MLN with those in IMN by visual inspection. All the joint distributions appeared similar. Differential expression of most of the IgG subclasses and complements in MLN vs. IMN is well known, but the interactions between these markers may not differ in the two diseases. It should be noted that the entropy differences between MLN and IMN appeared not to be attributable to the mutual information. These findings suggest that the disease type influences the glomerular deposition of most of the markers directly and/or indirectly via the other markers, but that their interactions are similar in the two disease conditions, suggesting that IgG subclasses, complements, and the interactions between them would be elements of a subsystem of the immune system in MLN and IMN. These findings prompted us to construct a single disease model applicable to both MLN and IMN using IgG subclasses and complements, rather than separate disease models for MLN and IMN.
The mutual information between IgG1, IgG2, IgG3, and IgG4 enabled us to construct a Markov chain, IgG3→IgG2→IgG1→IgG4 (Fig 4). According to the data-processing inequality, if there is a Markov chain X→Y→Z, then I(X;Y) ≥ I(X;Z) [27]. If one assumes that X, Y, Z form a single-chain Markov chain in all possible ways, the reverse is also true. Based on the assumption that IgG1, IgG2, IgG3, IgG4 form a single-chain Markov chain, the only structure satisfying the conditions for the mutual information was IgG3→IgG2→IgG1→IgG4. This assumption was justifiable in the initial stage of model construction, since selecting a simple structure is the better option, and it can be changed to a more complex one if severe bias is found.
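The search for an ordering compatible with the data-processing inequality can be sketched in Python as follows. The sketch is illustrative only: the function name is hypothetical and the mutual information values are invented for the example, not taken from Fig 4. Because mutual information is symmetric, the check cannot distinguish an ordering from its reverse, so each ordering and its mirror image are reported once.

```python
from itertools import combinations, permutations


def consistent_chains(mi):
    """Orderings in which every outer pair of any triple has the lowest MI.

    mi: dict mapping frozenset({a, b}) -> estimated mutual information.
    For a chain ... X ... Y ... Z ..., the data-processing inequality
    requires I(X;Y) >= I(X;Z) and I(Y;Z) >= I(X;Z).
    """
    names = sorted({name for pair in mi for name in pair})
    found = []
    for order in permutations(names):
        if order[0] > order[-1]:
            continue  # skip the mirror image of an ordering already checked
        ok = all(
            mi[frozenset((order[i], order[j]))] >= mi[frozenset((order[i], order[k]))]
            and mi[frozenset((order[j], order[k]))] >= mi[frozenset((order[i], order[k]))]
            for i, j, k in combinations(range(len(order)), 3)
        )
        if ok:
            found.append(order)
    return found


# Hypothetical pairwise mutual information values in bits (not study data).
mi = {
    frozenset(("IgG3", "IgG2")): 0.30,
    frozenset(("IgG2", "IgG1")): 0.25,
    frozenset(("IgG1", "IgG4")): 0.10,
    frozenset(("IgG3", "IgG1")): 0.12,
    frozenset(("IgG2", "IgG4")): 0.05,
    frozenset(("IgG3", "IgG4")): 0.02,
}
print(consistent_chains(mi))
```

With four variables there are only 12 orderings up to reversal, so exhaustive enumeration is sufficient.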

Integrated disease model of MLN and IMN representing the interactions among IgG subclasses and complements
To concisely represent glomerular deposits of IgG subclasses and complements in MLN and IMN, we developed a disease model based on a Bayesian network (Fig 5). The results of the analyses using information theory guided the structure of the network. First, the robustness of the interactions between any pair of markers when switching from MLN to IMN suggested an integrated model rather than separate models for MLN and IMN. Second, the inferred Markov chain (IgG3→IgG2→IgG1→IgG4) was used as a seed in constructing the network.
The network consisted of 8 nodes representing IgG subclasses, complements, and disease type (MLN or IMN), and 11 directed edges representing cause and effect relationships between the nodes, with five edges between the markers and six edges between disease type and markers (Fig 5). A part of the network contained the Markov chain (IgG3→IgG2→IgG1→IgG4) used as a seed. All the markers were directly influenced by the disease type except C3, whose entropy was not significantly different in MLN and IMN. The glomerular deposits of IgG1, IgG2, IgG4, and C1q were determined by two factors, the disease type and the value of the preceding marker. C4 did not have any relation with the other markers and was only influenced by the disease type. Interestingly, there were no edges from complements to IgG subclasses, and IgG1 directly influenced C3. These findings partly justify the model because complements are directly activated by immunoglobulins and we could not find any reports that complements directly activate immunoglobulins. The network also illustrated the strengths of the cause and effect relationships in terms of conditional probabilities. We could easily confirm from the heatmaps that IgG2, IgG3, and C4 are rarely expressed in IMN and IgG4 in MLN (Fig 5). Our model thus describes all the important relationships between the markers effectively in one diagram.

Discussion
The core components of both MLN and IMN are subepithelial immune deposits, but the chemical composition of the deposits tends to be different [9][10][11]. However, because the composition of the immune deposits in MLN and IMN is diverse and complex, previous research focusing on the behavior of individual markers was insufficient to model the global nature of the deposits. As we believed that the interactions between the markers would have a considerable effect on the IgG subclass and complement profiles in MLN and IMN, we focused on quantifying them. We found that the interactions were similar in MLN and IMN. This finding is significant in itself and also provided an important basis for an integrated disease model. The integrated disease model was consistent with the previously known behavior of IgG subclasses and complements in MLN and IMN, and provided a concise description of the IgG subclass and complement profiles in MLN and IMN.
Information theory made key contributions to quantifying the interactions between the markers in this research. Although Pearson's correlation is the metric of choice in evaluating the interactions between two random variables in most nephrology research, it is insensitive to noncontinuous variables, only captures positive or negative relationships, and is sensitive to noise in the data [28]. Because the variables representing glomerular immune deposits were ordinal, mutual information was a far better choice than Pearson's correlation. Mutual information is easily integrated with the other metrics of information theory and all the metrics of information theory can be represented as an information diagram providing clues about relationships, including the dependencies and independencies among multiple variables. It was particularly useful when testing whether a similar kind of interaction between markers exists in MLN and IMN, because the interactions can be assessed from the joint distributions, and mutual information is a quantity totally dependent on the joint distributions. The results of analyses using the metrics of information theory were significant by themselves and also guided the direction of subsequent analyses, particularly of the structure of the Bayesian network. These kinds of analytic schemes are applicable in other research if multiple random variables are thought to interact with each other, and most of the random variables are categorical or ordinal.
Using a Bayesian network, we were able to concisely represent the complex behaviors of the IgG subclasses and complements in MLN and IMN, thereby providing an integrated knowledge resource. The resulting model displayed all the important information regarding IgG subclasses, complements, MLN, and IMN in one diagram, and was consistent with the previous findings about the markers [9][10][11][12][13][14]. The existence of an edge starting from an IgG subclass and ending at a complement, and the absence of edges starting from a complement and ending at an IgG subclass, may also validate the model because immunoglobulins activate complements by physical contact, and not vice versa [29,30]. It was also easily visualized from the heatmaps that IgG2, IgG3, and C4 are rarely expressed in IMN and IgG4 in MLN. These findings suggest that the disease model can serve as an alternative representation of previously known findings and that new hypotheses can be generated from it.
One thing to be cautious about when interpreting the Bayesian network is that the edges of the network do not indicate the existence of physical interaction between connected nodes but rather a flow of information. For example, it would not be reasonable to assume that IgG3 physically activates IgG2; instead the molecules responsible for activating IgG2 would be other immune mediators such as cytokines. This network, therefore, could be extended by incorporating experimental data on such immune mediators without greatly changing the structure of the network. It is also worth noting that the current network is the most informative one available until such additional data have been acquired.
The disease model represented as a Bayesian network can also be used in clinical decision making where differentiation between MLN and IMN is required based on the IgG subclass and complement profile. When the diagnosis is ambiguous and IF intensities of IgG subclasses and complements are given, we can obtain the conditional probabilities of MLN and IMN using the inference rule of the network, and the one with the higher conditional probability would be chosen as the most probable diagnosis. The diagnostic accuracy of the network can be compared with that of a similar predictive model we constructed using a decision tree [12]. Inference of the diagnosis is also feasible even when the IgG subclass or complement profile has some missing values. The disease model is thus useful in many ways, such as representing complex relationships and inferring the diagnosis, and its uses are not confined to those described here.
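The inference step can be illustrated with the following Python sketch, which computes the posterior probability of MLN by exhaustive enumeration over a small fragment of a network and sums over any marker that is missing. It is not the model in Fig 5: the three-node fragment, the function names, and all conditional probabilities are hypothetical, apart from the prior, which is set close to the proportion of MLN cases in our series (53 of 148).

```python
from itertools import product

# node -> (parents, {tuple of parent values: P(node = 1)})
# "MLN" = 1 means MLN, 0 means IMN. Probabilities are invented for
# illustration, except the prior (about 53/148).
NETWORK = {
    "MLN": ((), {(): 0.36}),
    "IgG3": (("MLN",), {(1,): 0.80, (0,): 0.20}),
    "IgG2": (("MLN", "IgG3"),
             {(1, 1): 0.90, (1, 0): 0.50, (0, 1): 0.30, (0, 0): 0.05}),
}


def joint_probability(network, assignment):
    """Probability of one complete 0/1 assignment to all nodes."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        p_one = cpt[tuple(assignment[q] for q in parents)]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p


def posterior(network, target, evidence):
    """P(target = 1 | evidence); unobserved nodes are summed out."""
    hidden = [n for n in network if n != target and n not in evidence]
    totals = {0: 0.0, 1: 0.0}
    for t in (0, 1):
        for values in product((0, 1), repeat=len(hidden)):
            assignment = {**evidence, target: t, **dict(zip(hidden, values))}
            totals[t] += joint_probability(network, assignment)
    return totals[1] / (totals[0] + totals[1])


# IgG3 positive, IgG2 not recorded (missing value).
print(posterior(NETWORK, "MLN", {"IgG3": 1}))
# Both markers positive.
print(posterior(NETWORK, "MLN", {"IgG3": 1, "IgG2": 1}))
```

Exhaustive enumeration is practical here because a network with eight binary nodes has only 256 complete assignments.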
Although the interactions between the markers in MLN and IMN were quantified using mutual information, the detailed mechanisms of the interactions cannot be specified by mutual information alone. For example, one cannot conclude whether IgG3 activates or inactivates IgG2 using only mutual information. Further studies are needed to specify such mechanisms. However, the model would still be useful for designing experiments to specify these mechanisms because less important interactions have presumably already been filtered out. Without such guidance, the experimental cost would be too high. It would also be useful in exploring the behaviors of immune mediators other than IgG subclasses or complements. Let us suppose that substances A and B are highly secreted dominant immune mediators in MLN and IMN, respectively. If experiments showed that A activated both IgG2 and IgG3, the possible behaviors of substance B would be quite limited because the interactions between any pair of IgG subclasses and complements should be similar in MLN and IMN. Such constraints would be helpful in designing experiments on substance B.
The kinetics of immune deposits in the glomeruli in MLN and IMN are also important for understanding the roles of IgG subclasses and complements in these diseases. In particular, IgG subclass switching is a crucial event determining the composition of the immune deposits [14]. The identity of immune mediators and transcription factors involved in class switching would partly account for the particular profiles of the immune deposits in MLN and IMN [31][32][33][34]. Furthermore, temporal associations between these factors need to be considered to formulate a complete model. In connection with class switching, it is interesting that the Markov chain IgG3→IgG2→IgG1→IgG4 generated using mutual information has a sequence similar to that of the IgG subclass genes in the genome, IgG3→IgG1→IgG2→IgG4 [35]. This suggests that the immune deposits evaluated at the time of diagnosis might be footprints of multiple rounds of subclass switching, an idea that can be tested by simulation studies. Additional simulation studies could generate the most likely time-dependent subclass switching patterns fitting the IgG subclass and complement profiles.
One of the limitations of our study is that the raw data were obtained from pathology reports without re-reviewing all the cases. However, studies of this kind using electronic health records for medical research are increasing in number [36]. If adequate quality control procedures have been regularly conducted, substantial bias relative to traditional well-controlled studies would not be expected. In future studies, we could explore these issues by investigating the consistency of the disease model after adding random noise.
In conclusion, we have shown using information theory that the interactions between any pair of markers among the IgG subclasses and complements are similar in MLN and IMN, and have developed a disease model of MLN and IMN dealing with the glomerular deposits of IgG subclasses and complements. It should also be noted that our disease model effectively visualized many of the known or latent findings on the IgG subclasses and complements in MLN and IMN in one diagram. It would be interesting to investigate whether our model is valid only for MLN and IMN or also for other glomerular diseases. Although further studies are necessary to augment this model by incorporating immune mediators, class switching events, and temporal factors, it provides a foundation for constructing more advanced ones.
Supporting information S1