Multiple-to-multiple path analysis model

Yujie Du; Junli Du; Xi Liu; Zhifa Yuan

doi:10.1371/journal.pone.0247722

Abstract

The one-to-multiple path analysis model describes how multiple independent variables regulate one dependent variable by dividing the correlation coefficient and the determination coefficient. How can more complex regulation mechanisms, from multiple independent variables to multiple dependent variables, be analysed? By analogy, and based on multiple-to-multiple linear regression analysis, this paper proposes a multiple-to-multiple path analysis model, which reveals more complex regulation mechanisms among multiple independent variables and multiple dependent variables by dividing the generalized determination coefficient. Unlike the one-to-multiple model, the multiple-to-multiple path analysis model generates three additional types of paths because the correlation among the dependent variables is considered. The decision coefficient of each independent variable is then constructed for the system of dependent variables, and its hypothesis testing statistic is given. Finally, an example on wheat breeding rules in arid areas demonstrates that multiple-to-multiple path analysis, which takes more correlation information into account, gives better results.

Citation: Du Y, Du J, Liu X, Yuan Z (2021) Multiple-to-multiple path analysis model. PLoS ONE 16(3): e0247722. https://doi.org/10.1371/journal.pone.0247722

Editor: Mohammadreza Hadizadeh, Central State University & Ohio University, UNITED STATES

Received: November 25, 2020; Accepted: February 11, 2021; Published: March 4, 2021

Copyright: © 2021 Du et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data underlying the results presented in the study are available from: Zhang Z, Wang D. Wheat drought-resistant ecological breeding. Xi’an: Shaanxi People’s Education Press; 1992. China.

Funding: This work was financially supported by Chinese Universities Scientific Fund (Grant Nos. 2452015082 and Z1090219004). The funders had no role in decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Regression analysis, one of the most widely used statistical methodologies, focuses on the relations between dependent variables and independent variables. However, regression analysis pays little attention to the correlation mechanisms that may exist among the independent variables [1]. In 1918–1921, the geneticist Sewall Wright addressed this issue by developing the path analysis method [2, 3]. Wright’s path analysis emphasizes decomposing the correlation and the total determination in terms of model parameters, and drawing the path diagram. The path diagram is a pictorial representation of a system of simultaneous equations; it presents the assumed relationships more clearly than the equations do [4]. The decomposition distinguishes three types of effects: direct, indirect and total effects, which leads to a more comprehensive understanding of the relations between variables. Usually, the indirect effects of a variable are mediated by at least one intervening variable [4]. In fact, the decomposed indirect effects quantify the regulation between correlated variables. This quantitative expression of the regulatory mechanism makes the analysis more thorough and clear. Path analysis was therefore later applied in many research fields, such as behavioural science, social science, economics, biology, agriculture and medical science [5–18]. Its use appears to be growing.

In terms of methodology, path analysis was generalized to structural equation models (SEMs) by incorporating the principle of factor analysis, and was used to analyse the relations between multivariate blocks of data [19, 20]. For the specific path analysis model with no latent variables, comprising one dependent variable (the result) and multiple independent variables (the causes), the decision coefficient was constructed from the decomposition of the total determination coefficient [21]. This specific model, which has the nature of standard multiple linear regression, is called the one-to-multiple path analysis model here. The decision coefficient of each independent variable equals the sum of its direct determination and its correlation (indirect) determination with the other independent variables. It expresses the magnitude and direction of each independent variable’s influence on the variation of the dependent variable. Furthermore, the independent variables can be ranked by importance according to their decision coefficients, which is why the coefficient is useful for decision making. Subsequently, a statistical test of the decision coefficient was proposed [22]. The decision coefficient thus improves the one-to-multiple path analysis model to a certain extent. Later, the one-to-multiple path analysis model was applied to research on the lint yield of upland cotton and on KEGG gene pathway regulation mechanisms [23–25].

However, causal systems that include multiple independent variables (as “causes”) and multiple dependent variables (as “results”) are often encountered in practical research. For instance, different pathways in KEGG contain the same genes, which demonstrates that the same genes can lead to different gene functions. Here, multiple identical genes and multiple different gene functions constitute a multiple-to-multiple system. Analysing the regulatory relationship between genes and gene functions helps in modifying gene structure. Similarly, in breeding, multiple biological traits and multiple yield indicators also constitute a multiple-to-multiple system. Determining the importance of multiple biological traits to multiple yield indicators helps to improve the yield and quality of crops. Assume that such a causal system does not contain latent variables. Then the one-to-multiple path analysis model can be used to analyse the importance of each independent variable to one dependent variable at a time, and the regulation among the independent variables. Unfortunately, the results of several separate one-to-multiple path analyses are often contradictory, which confuses decision makers. A more suitable model is therefore needed to provide clearer decision-making suggestions in such complex systems.

In this paper, we propose the multiple-to-multiple path analysis model, based on multiple-to-multiple linear regression analysis, for systems with multiple independent variables, multiple dependent variables and no latent variables. Building on the one-to-multiple path analysis model, this model considers the correlation among multiple dependent variables caused by their common independent variables. Three additional types of paths are generated besides the two types of paths in the one-to-multiple path analysis model. The decomposition of the generalized determination coefficient shows the regulation mechanisms among the independent and dependent variables along these five types of paths, and the decision coefficient of each independent variable is used to judge its importance for the whole system of dependent variables. Finally, the effectiveness of the model is verified with an example on wheat breeding rules in arid areas.

2 Method

2.1 Equations and models

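The multiple-to-multiple linear regression model is the basis of multiple-to-multiple path analysis, so it is introduced first. Let the dependent variables of the linear regression be Y = (Y₁,Y₂,⋯,Y_p)^T and the independent variables be X = (X₁,X₂,⋯,X_m)^T. Suppose the joint distribution of X and Y is:

$$\begin{bmatrix} X \\ Y \end{bmatrix} \sim N_{m+p}\left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{bmatrix} \right), \quad \Sigma_{xy} \neq 0 \tag{1}$$

When both X and Y have been standardized, the joint distribution becomes

$$\begin{bmatrix} X \\ Y \end{bmatrix} \sim N_{m+p}(0, \rho), \quad \rho = \begin{bmatrix} \rho_x & \rho_{xy} \\ \rho_{yx} & \rho_y \end{bmatrix} \tag{2}$$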

Here, ρ_x, ρ_y and ρ_xy are the correlation matrices of X, of Y, and between X and Y, respectively, and ρ is the correlation matrix of [X^T,Y^T]^T. Under the above assumption, the standardized multiple-to-multiple linear regression model is:

$$Y_i = \beta^{*T} x_i + \varepsilon_i, \quad i = 1,2,\cdots,n \tag{3}$$

In (3), Y_i = [Y_i1,Y_i2,⋯,Y_ip]^T, x_i = [x_i1,x_i2,⋯,x_im]^T, β* is the m×p matrix of regression parameters, and n is the number of observations. The regression residual ε~N_p(0,∑_e) is assumed to be independent of X.

2.2 Regression hypothesis testing

Path analysis can only be carried out when the standardized regression equation is significant. Therefore, we need to perform the following four types of hypothesis tests for regression analysis before path analysis.

2.2.1 Hypothesis testing of generalized complex correlation coefficient r_xy.

In multiple-to-multiple standardized linear regression, where the joint distribution of X and Y is given by formula (2), the generalized determination coefficient is defined as [26]:

$$R^2 = 1 - v_{xy}, \qquad v_{xy} = \frac{|R|}{|R_{xx}|\,|R_{yy}|} = |I_p - B| = \prod_{i}(1-\lambda_i) \tag{4}$$

In Eq (4), v_xy is the likelihood ratio statistic for testing the independence of X and Y; R is the sample correlation matrix of [X^T,Y^T]^T, with blocks R_xx, R_xy and R_yy; U = R_yx R_xx⁻¹ R_xy is the regression sum-of-squares matrix; λ_i are the non-zero characteristic roots of B = R_yy⁻¹U; and r_xy = √R² is the generalized complex correlation coefficient of X and Y. The null hypothesis for r_xy is H₀:∑_xy = 0. When p>2 and m>2, Bartlett’s approximate chi-square test can be used:

$$\chi^2 = -\left[n-1-\tfrac{1}{2}(m+p+1)\right]\ln v_{xy} \sim \chi^2(mp) \tag{5}$$
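As a rough numerical illustration of this test, the following sketch evaluates Bartlett's approximation in its usual textbook form, χ² = −[n − 1 − (m + p + 1)/2]·ln v_xy with m·p degrees of freedom. That form, and the function name `bartlett_chi2`, are assumptions made for this sketch. With the values reported in Section 3.2 (n = 105, m = p = 3, v_xy = 0.2987) it gives a statistic very close to the 121.4357 stated there.

```python
import math


def bartlett_chi2(v_xy, n, m, p):
    """Bartlett's chi-square approximation for the likelihood ratio statistic.

    Assumed form: chi2 = -[n - 1 - (m + p + 1) / 2] * ln(v_xy),
    with m * p degrees of freedom. Returns (chi2, degrees_of_freedom).
    """
    if not 0.0 < v_xy <= 1.0:
        raise ValueError("v_xy must lie in (0, 1]")
    chi2 = -(n - 1 - (m + p + 1) / 2.0) * math.log(v_xy)
    return chi2, m * p


# Values taken from Section 3.2 of the text.
chi2, df = bartlett_chi2(v_xy=0.2987, n=105, m=3, p=3)
r2 = 1.0 - 0.2987  # generalized determination coefficient R^2 = 1 - v_xy
print(f"chi2 = {chi2:.4f} on {df} degrees of freedom, R^2 = {r2:.4f}")
```

The statistic would then be compared with a chi-square critical value on m·p degrees of freedom; the critical value itself is not computed here.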

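2.2.2 Hypothesis testing of the regression equation for each Y_α.

The null hypothesis is H₀: β*_α = 0, where β*_α is the α-th column of β*, and the corresponding F test statistic is:

$$F_\alpha = \frac{R_\alpha^2/m}{(1-R_\alpha^2)/(n-m-1)} \sim F(m,\, n-m-1) \tag{6}$$

where R²_α is the determination coefficient of X to Y_α.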

2.2.3 Hypothesis testing of components in .

The invalid hypothesis is and the t test statistic is (7)

In (7), , is the α-th element on the main diagonal of . c_jj is the j-th element on the main diagonal of .

2.2.4 Hypothesis testing of [27].

The invalid hypothesis is . The F test statistic is: (8)

In (8), . After the above four hypothesis tests, if the standardized multiple linear regression equation is significant, it is meaningful to perform path analysis.

2.3 Path analysis of Y_α on X

The first step of multiple-to-multiple path analysis is to conduct a one-to-multiple path analysis for each dependent variable on all independent variables, based on the established multiple-to-multiple linear regression equation. The correlation coefficients between each dependent variable Y_α (α = 1,⋯,p) and the independent variables X = (X₁,X₂,⋯,X_m)^T, and the corresponding determination coefficient, are divided exactly as in the previous one-to-multiple path analysis model, on the basis of the standardized linear regression equation [25]. The decision coefficient is then constructed using the existing method [21]. According to the theory of multiple linear regression analysis, the system of normal equations R_xx b* = R_xy for the least squares estimate of β* can be rewritten, for each dependent variable, as R_xx b*_α = R_{xy_α}, so

$$r_{x_j y_\alpha} = \sum_{k=1}^{m} r_{jk}\, b^*_{k\alpha}, \quad j = 1,\cdots,m;\ \alpha = 1,\cdots,p \tag{9}$$

In Eq (9), b*_α = (b*_1α,⋯,b*_mα)^T and R_{xy_α} = (r_{x_1 y_α},⋯,r_{x_m y_α})^T. The specific path diagram of the one-to-multiple path analysis model is shown in Fig 1.

Fig 1. One-to-multiple path analysis diagram.

https://doi.org/10.1371/journal.pone.0247722.g001

2.3.1 The division and path of the correlation coefficient r_{x_j y_α}.

$$r_{x_j y_\alpha} = b^*_{j\alpha} + \sum_{k \neq j} r_{jk}\, b^*_{k\alpha} \tag{10}$$

The correlation coefficient is thus divided into m terms of two types. The term b*_jα is formed by the path y_α←x_j and is called the direct effect of x_j on y_α. Each term r_jk b*_kα is formed by the path x_j↔x_k→y_α; it is the effect of x_j on y_α through its correlation with x_k and is called an indirect effect. Its magnitude is obtained by multiplying the path coefficient b*_kα by the correlation coefficient r_jk, and there are m−1 such terms. Finally, r_{x_j y_α} is the total effect of x_j on y_α, which is the sum of the direct effect and all the indirect effects.
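The division into direct and indirect effects can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `decompose_correlation` and the numerical inputs are hypothetical, and the path coefficients are simply taken as given instead of being estimated from data.

```python
def decompose_correlation(r_xx, b):
    """Split each r(x_j, y) into one direct effect and m-1 indirect effects.

    r_xx : m x m correlation matrix of the independent variables
    b    : standardized regression (path) coefficients of y on x_1..x_m
    """
    m = len(b)
    effects = []
    for j in range(m):
        direct = b[j]                                  # path y <- x_j
        indirect = {k: r_xx[j][k] * b[k]               # path x_j <-> x_k -> y
                    for k in range(m) if k != j}
        total = direct + sum(indirect.values())        # implied r(x_j, y)
        effects.append({"direct": direct, "indirect": indirect, "total": total})
    return effects


# Hypothetical inputs for illustration only (not the wheat data).
r_xx = [[1.0, 0.5, 0.2],
        [0.5, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
b = [0.6, -0.4, 0.3]

effects = decompose_correlation(r_xx, b)
for j, e in enumerate(effects, start=1):
    print(f"x{j}: direct={e['direct']:+.3f}, "
          f"indirect={sum(e['indirect'].values()):+.3f}, "
          f"total={e['total']:+.3f}")
```

In this made-up example the second variable has a clearly negative direct effect (−0.4) but a total effect near zero, because its indirect effects through the other two variables almost cancel it.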

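2.3.2 The division and path of the determination coefficient R²_α.

$$R_\alpha^2 = \sum_{j=1}^{m} b^{*2}_{j\alpha} + 2\sum_{j<k} b^*_{j\alpha}\, r_{jk}\, b^*_{k\alpha} = \sum_{j} R_{j(\alpha)} + \sum_{j<k} R_{jk(\alpha)} \tag{11}$$

In (11), R²_α is the total coefficient of determination of X for Y_α. R_j(α) = b*_jα² corresponds to the path y_α←x_j→y_α and is called the direct determination coefficient of x_j to y_α. R_jk(α) = 2b*_jα r_jk b*_kα corresponds to the path y_α←x_j↔x_k→y_α and is called the correlation determination coefficient of x_j through its correlation with x_k (k≠j) to y_α.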

2.3.3 The decision coefficient R_α(j) and hypothesis test [22].

The comprehensive determining ability of x_j on y_α can be represented by the decision coefficient, which is based on the division of R²_α. Its specific expression and hypothesis test are: (12)

The definition indicates that R_α(j) equals the sum of the direct determination coefficient of x_j and the correlation determination coefficients that involve x_j. In other words, the decision coefficient is the sum of all determination coefficients related to x_j. The decision coefficient is used to identify the main decision variables and the restrictive variables affecting Y_α.
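The verbal definition above can be turned into a short calculation. The sketch below assumes that the direct determination of x_j is b_j² and that each correlation determination involving x_j is 2·b_j·r_jk·b_k; under that assumption the sum simplifies to 2·b_j·r(x_j, y) − b_j². The function name and inputs are hypothetical, and the hypothesis test of the coefficient is not reproduced.

```python
def decision_coefficients(r_xx, b):
    """Decision coefficient of each x_j for one dependent variable.

    Assumed form: R(j) = b_j**2 + sum over k != j of 2 * b_j * r_jk * b_k,
    i.e. the direct determination plus every correlation determination
    that involves x_j.
    """
    m = len(b)
    result = []
    for j in range(m):
        direct = b[j] ** 2
        correlated = sum(2.0 * b[j] * r_xx[j][k] * b[k]
                         for k in range(m) if k != j)
        result.append(direct + correlated)
    return result


# Hypothetical inputs for illustration only (not the wheat data).
r_xx = [[1.0, 0.5, 0.2],
        [0.5, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
b = [0.6, -0.4, 0.3]

coeffs = decision_coefficients(r_xx, b)
# Cross-check against the closed form 2 * b_j * r(x_j, y) - b_j**2.
r_xy = [sum(r_xx[j][k] * b[k] for k in range(3)) for j in range(3)]
closed_form = [2.0 * b[j] * r_xy[j] - b[j] ** 2 for j in range(3)]
print([round(c, 4) for c in coeffs])
print([round(c, 4) for c in closed_form])
```

A positive value marks a variable that promotes the dependent variable overall, and a negative value marks a restrictive one. The decision coefficients do not sum to R², because every correlation term is counted once for each of the two variables it involves.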

2.4 Multiple-to-multiple path analysis central theorem

The second step is to conduct the multiple-to-multiple path analysis. Its innovation is that the correlation among the components of Y caused by their common cause X is considered, which generates three additional types of paths. For ease of illustration, Fig 2 shows the multiple-to-multiple path analysis diagram for p = 3 and m = 3; the theoretical analysis, however, holds for m independent variables and p dependent variables.

Fig 2. Multiple-to-multiple path analysis diagram.

https://doi.org/10.1371/journal.pone.0247722.g002

Compared with the one-to-multiple path analysis model, the multiple-to-multiple path analysis model considers the correlation among different dependent variables. Accordingly, the central theorem of multiple-to-multiple path analysis is proposed. Based on model (3), the models for two different dependent variables Y_α and Y_t are:

$$Y_\alpha = \sum_{j=1}^{m} \beta^*_{j\alpha} x_j + \varepsilon_\alpha, \qquad Y_t = \sum_{k=1}^{m} \beta^*_{kt} x_k + \varepsilon_t \tag{13}$$

In (13), ε_α and ε_t are independent of each other and have nothing to do with the value of X. Since Y_α, Y_t and X have been standardized, the correlation coefficients of Y_α and Y_t, and the corresponding path theoretically is: (14) Considering the sample case, Eq (14) is: (15)

Among them, j = 1,2,⋯,m; k = 1,2,⋯,m; and α = 1,2,⋯,p; t = 1,2,⋯,p. Eq (14) and Eq (15) are called the central theorem of multiple-to-multiple path analysis.

The central theorem shows that the correlation coefficient between y_α and y_t equals the sum of m² composite path coefficients. Of these, the direct path y_α←x_j→y_t contributes m items. Because of the correlation among independent variables, x_j↔x_k (k≠j), two types of indirect paths are formed: y_α←x_j↔x_k→y_t and y_α←x_k↔x_j→y_t. There are m(m−1)/2 pairs x_j↔x_k (k≠j), so the total number of composite paths is m + 2·m(m−1)/2 = m². The central theorem also shows that three additional types of paths are generated when the correlation between different dependent variables y_α and y_t, caused by the common X, is considered. Together with the two types of paths in one-to-multiple path analysis, there are therefore five types of paths in multiple-to-multiple path analysis.
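To make the counting concrete, the sketch below enumerates the m² composite paths for one pair of dependent variables. It assumes the sample form described in the text, in which the part of the correlation between y_α and y_t attributable to the common X is the double sum of b_jα·r_jk·b_kt over j and k. The function name and all numbers are hypothetical.

```python
def path_correlation(r_xx, b_alpha, b_t):
    """Correlation between y_alpha and y_t induced by the common X.

    Returns (direct, indirect, n_paths):
      direct   sums the m paths       y_alpha <- x_j -> y_t
      indirect sums the m(m-1) paths  y_alpha <- x_j <-> x_k -> y_t, j != k
    """
    m = len(b_alpha)
    direct = sum(b_alpha[j] * b_t[j] for j in range(m))
    indirect = sum(b_alpha[j] * r_xx[j][k] * b_t[k]
                   for j in range(m) for k in range(m) if j != k)
    n_paths = m + m * (m - 1)          # equals m ** 2
    return direct, indirect, n_paths


# Hypothetical inputs for illustration only (not the wheat data).
r_xx = [[1.0, 0.5, 0.2],
        [0.5, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
b_alpha = [0.6, -0.4, 0.3]   # path coefficients of y_alpha on x1..x3
b_t = [0.1, 0.5, -0.2]       # path coefficients of y_t on x1..x3

direct, indirect, n_paths = path_correlation(r_xx, b_alpha, b_t)
print(f"direct={direct:+.3f}, indirect={indirect:+.3f}, "
      f"total={direct + indirect:+.3f}, paths={n_paths}")
```

The total is the quadratic form b_α^T R_xx b_t, so the same function applied with b_t = b_α returns the determination coefficient of that dependent variable.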

In fact, the correlation coefficient in the multiple-to-multiple path analysis central theorem is theoretically the regression square sum matrix U in multiple-to-multiple standardized linear regression. Under the least squares estimation, U can be expressed as follows [16]: (16)

In (16), is the correlation coefficient between y_α and y_t caused by the common cause X. Here, U is the determination coefficient matrix of X to Y. is the coefficient of determination of X to Y_α. is the complex correlation coefficient of X to Y_α. And in statistics, is the correlation coefficient of Y_α and Y_t, and has nothing to do with X in the calculation. is the determining part of Y_α and Y_t to due to the common cause X.

2.5 The division of R²≈tr(B) and its corresponding path

The generalized determination coefficient has been defined using formula (4) before, which was used to reflect the comprehensive determination of all independent variables to all dependent variables [26]. Because the non-zero eigenvalue of B is small and , the result of is small enough to make R²≈tr(B). In fact, tr(B) is the overestimation of R² here. According to R²≈tr(B), the generalized determination coefficient R² was divided as follows: (17)
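The relation between R² and tr(B) can be checked numerically. The sketch below rests on assumptions that the extracted text does not state explicitly: that B = R_yy⁻¹ R_yx R_xx⁻¹ R_xy and that v_xy = |I − B|, so that R² = 1 − v_xy. The helper names and the correlation blocks are hypothetical, and the small linear-algebra routines are included only to keep the example self-contained.

```python
def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def inverse(a):
    """Inverse by Gauss-Jordan elimination (assumes a is non-singular)."""
    n = len(a)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def generalized_r2(r_xx, r_xy, r_yy):
    """Return (R^2, tr(B)) under the assumptions named in the lead-in."""
    r_yx = [list(col) for col in zip(*r_xy)]
    u = matmul(matmul(r_yx, inverse(r_xx)), r_xy)   # regression sum of squares
    b_mat = matmul(inverse(r_yy), u)
    p = len(r_yy)
    i_minus_b = [[float(i == j) - b_mat[i][j] for j in range(p)] for i in range(p)]
    return 1.0 - det(i_minus_b), sum(b_mat[i][i] for i in range(p))


# Hypothetical correlation blocks for m = 2, p = 2 (not the wheat data).
r_xx = [[1.0, 0.3], [0.3, 1.0]]
r_yy = [[1.0, 0.2], [0.2, 1.0]]
r_xy = [[0.4, 0.1], [0.2, 0.3]]

r2, trace_b = generalized_r2(r_xx, r_xy, r_yy)
print(f"R^2 = {r2:.4f}, tr(B) = {trace_b:.4f}")
```

Under these assumptions tr(B) is never smaller than R², and the gap widens as the characteristic roots of B grow, which matches the remark that tr(B) overestimates R².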

In (17), θ_αt is the (α,t) element of the matrix R_yy⁻¹. R_j(α) is the direct determination coefficient of x_j on y_α; its path is y_α←x_j→y_α, with j = 1,2,⋯,m and α = 1,2,⋯,p. R_jk(α) is the indirect determination coefficient of x_j and x_k on y_α; its path is y_α←x_j↔x_k→y_α, and the pair jk takes m(m−1)/2 values. R_j(αt) is the direct determination coefficient of x_j on y_α and y_t, which arises from the correlation of y_α and y_t due to the common cause x_j; its path is y_α←x_j→y_t, with j = 1,2,⋯,m, and the pair αt takes p(p−1)/2 values. R_jk(αt) is the indirect determination coefficient of x_j and x_k on y_α and y_t; its path is y_α←x_j↔x_k→y_t, where the pair αt (α<t) takes p(p−1)/2 values and the pair jk (j≠k) takes m(m−1)/2 values. R_kj(αt) is the indirect determination coefficient of x_j and x_k on y_t and y_α; its path is y_α←x_k↔x_j→y_t, again with p(p−1)/2 values of αt (α<t) and m(m−1)/2 values of kj (j≠k). Therefore, the total number of items in the division is pm + pm(m−1)/2 + mp(p−1)/2 + 2·[p(p−1)/2]·[m(m−1)/2] = pm(pm+1)/2.

Formula (17) demonstrates that the generalized determination coefficient R² was divided successfully along the five types of paths stated in multiple-to-multiple path analysis central theorem. The specific path vector structure is: (18)

2.6 The generalized decision coefficient R_y(j)

2.6.1 The definition of R_y(j).

In order to describe the comprehensive decision-making ability of x_j to Y, the generalized decision coefficient R_y(j) was defined as follows: (19)

The generalized decision coefficient is thus the sum of the products of the terms R_j(α), R_jk(α), R_j(αt) and R_jk(αt)+R_kj(αt) that involve x_j in the division with the corresponding elements θ_αt, on the basis of R²≈tr(B). In (19), R_y(j) is divided into two parts, R_y(j)I and R_y(j)II. R_y(j)I is the part determined by x_j and x_j↔x_k for each Y_α. R_y(j)II is the part determined by x_j and x_j↔x_k for Y_α and Y_t (α≠t) due to the common X. In short, the generalized decision coefficient includes not only the direct determination of x_j on Y_α and on the pair Y_α, Y_t (α≠t), but also the indirect determination through x_j↔x_k (k≠j) on Y_α and on the pair Y_α, Y_t (α≠t). In particular, the indirect determination considers the correlation among the independent variables and the correlation among the dependent variables at the same time. Therefore, the decision coefficient R_y(j) can be used to express the comprehensive decision ability of x_j on Y.

2.6.2 The hypothesis testing of R_y(j).

The invalid hypothesis is H₀:E(R_y(j)) = 0 and the corresponding t test statistic is: (20) In (20), .

3 Application

3.1 Datasets

To demonstrate the effectiveness of multiple-to-multiple path analysis, wheat data from arid areas, collected to explore breeding rules, were selected. The data comprise thirty-five varieties and were obtained in a completely randomized block trial with three replications per sample [28]. For the multiple-to-multiple path analysis, three indexes closely related to wheat yield were selected as dependent variables: panicles per plant (y₁), grain number per panicle (y₂) and 1000-grain weight (y₃); three other indexes were selected as independent variables: biomass per plant (x₁), single stem grass weight (x₂) and economic coefficient (x₃). Here, the economic coefficient is the ratio of the economic yield to the biological yield of wheat.

3.2 Calculation and results

Firstly, the phenotypic correlation matrix of the sample was calculated and expressed as Eq (21). The number of observations for each variable is n = 105.

(21)

Then, a multiple-to-multiple standardized linear regression equation was established and the corresponding parameters were calculated; the results are as follows: (22) Next, the four types of hypothesis tests were conducted on the established standardized multiple linear regression model:

1. The hypothesis testing of generalized complex correlation coefficient r_xy.
The likelihood ratio statistic of X and Y is v_xy = 0.2987, so χ² = 121.4357**>χ²(3×3), and R² = 1−v_xy = 0.7013. The results showed that the linear regression of Y on X was extremely significant.
2. The hypothesis testing of the regression equations.
The values of the F test statistics are F₁ = 37.9188**, F₂ = 25.394**, F₃ = 14.408**, respectively. They were all greater than F_0.01(3,101) = 4.007, which showed that each standardized regression equation was extremely significant.
3. The hypothesis testing of the components of the regression coefficients.
The results of the hypothesis testing of these components are listed in Table 1.

Table 1. t test statistic values of the regression coefficients.

https://doi.org/10.1371/journal.pone.0247722.t001

Among them, all were extremely significant except x₁ for y₂ and y₃, which was not significant.

4. The hypothesis testing of

The results are F₁ = 8.670**, F₂ = 25.394**, F₃ = 10.384**, and the test results were all extremely significant.

Apart from x₁ being non-significant for y₂ and y₃, the above test results showed that the established multiple-to-multiple standardized linear regression equations were extremely significant. Path analysis and decision analysis can therefore be performed.

Secondly, the one-to-multiple path analysis of each y_α on X was conducted according to the theory above (Method, Section 2.3). The detailed division results of the correlation coefficients and the determination coefficients are listed in Table 2 and Table 3. The decision analysis was also conducted, and its results are also listed in Table 3.

Table 2. The division results of the correlation coefficients between x_j and y_α.

https://doi.org/10.1371/journal.pone.0247722.t002

Table 3. The division results of the determination coefficients R²_α.

https://doi.org/10.1371/journal.pone.0247722.t003

The t test statistic values from the hypothesis testing of the decision coefficients are listed in Table 4.

Table 4. t test statistic values of the decision coefficients R_α(j).

https://doi.org/10.1371/journal.pone.0247722.t004

In the one-to-multiple path analysis, the division of the correlation coefficients showed that the total effects of biomass per plant (x₁), single stem grass weight (x₂) and economic coefficient (x₃) are all positive and are the largest for panicles per plant (y₁), grain number per panicle (y₂) and 1000-grain weight (y₃), respectively. The direct effect of x₁ on y₁ is positive and the largest, while its indirect effect is negative and the smallest. In contrast, the direct effects of x₂ on y₂ and of x₃ on y₃ are not the largest, but their total effects become the largest through the correlation regulation of the indirect effects. The division of the determination coefficients and the decision coefficients showed that for y₁, x₁ is a very significant restrictive factor; for y₂, x₂ is a very significant positive factor and x₁ is a significant positive factor; for y₃, x₃ is a significant positive factor. These results mean that single stem grass weight (x₂) and economic coefficient (x₃) need to be increased in order to increase grain number per panicle (y₂) and 1000-grain weight (y₃), but panicles per plant (y₁) will then decrease because of the negative correlation of x₂ and x₃ with y₁. Meanwhile, biomass per plant (x₁) should be decreased in order to increase panicles per plant (y₁), but grain number per panicle (y₂) will then decrease. Such contradictory decision-making results of different independent variables (x_i) for different dependent variables (y_i) often confuse breeders.

Therefore, after the one-to-multiple path analysis, the multiple-to-multiple path analysis was carried out, taking into account the correlation between the dependent variables. According to formulas (17)–(19), the generalized determination coefficient R² was divided, and the results are listed in Table 5.

Table 5. The division results of the generalized determination coefficient along the three additional types of paths.

https://doi.org/10.1371/journal.pone.0247722.t005

The specific calculation of path vector structure is as follows: (23)

From the previous calculation, tr(B) = 0.8671. The above division of the generalized determination coefficient is reasonable according to R²≈tr(B). The decision analysis of the model was then carried out. The decision coefficient of each independent variable for Y = (y₁,y₂,y₃)^T was calculated as follows: (24)

Similarly, R_y(2) = −0.1157 and R_y(3) = 0.0906*. The t test of R_y(j) was then conducted, giving t₁ = −4.3943**, t₂ = 0.9293 and t₃ = 2.0785*. Note that the determination coefficients of x_j and x_j↔x_k for each y_α have already been calculated by the one-to-multiple path analysis model (Table 3). Comparing Table 3 with Table 5 shows that the regulation of Y by x_j changes greatly when the correlation among the dependent variables is considered.

Firstly, the direct and indirect regulation of Y by x_j and x_j↔x_k is strongly affected by the correlation among Y due to the common X. As shown in Table 3, the direct determinations of x₂ on y₁ and on y₂ were both positive (y₁←x₂→y₁, y₂←x₂→y₂). In Table 5, however, the direct determination of x₂ on y₁ and y₂ is negative (R₁₂₍₂₎ = −0.6636, y₁←x₂→y₂). This change is due to the large negative correlation between y₁ and y₂. Similarly, the direct determination of x₂ on y₂ and y₃ (R₂₃₍₂₎ = 0.3741, y₂←x₂→y₃) differs from the previous determination coefficients, but here the change is small and both are positive. This shows that the small correlation between y₂ and y₃ has little influence on the direct determination of x₂ on y₂ and y₃. The direct determination of x₃ on y₂ and y₃ behaves in the same way. The indirect determination through the correlation x_j↔x_k also changes considerably once the correlation among Y is considered. For example, the indirect determinations of x₁↔x₂ on y₁ and y₃ are 0.5768 (y₁←x₁↔x₂→y₃) and 0.3253 (y₁←x₂↔x₁→y₃), whereas the original indirect determinations of x₁↔x₂ on y₁ and on y₃ were −1.023 (y₁←x₁↔x₂→y₁) and 0.1833 (y₃←x₁↔x₂→y₃), respectively. Evidently, the strong negative correlation between y₁ and y₃ changes the indirect regulation. These large changes show the importance of considering the correlation among Y. Similar changes occur in the indirect determinations of x₁↔x₂ on y₁ and y₂ (y₁←x₁↔x₂→y₂, y₁←x₂↔x₁→y₂) and of x₂↔x₃ on y₁ and y₃ (y₁←x₂↔x₃→y₃, y₁←x₃↔x₂→y₃).

Secondly, the decision coefficients show that x₁ is a very significant restrictive decision factor for Y = (y₁,y₂,y₃)^T (R_y(1) = −0.3856**), even though x₁ is a significant positive decision factor for y₂ (R_y2(1) = 0.0420*) and is not significant for y₃ (R_y3(1) = −0.0584). This phenomenon appears to be caused by the very significant negative decision-making effect of x₁ on y₁ and the strong negative correlations between y₁ and y₂ and between y₁ and y₃. For x₂, the decision coefficient gives no basis for a decision (R_y(2) = −0.1157). x₃ becomes a significant positive decision factor for Y, although in the one-to-multiple path analysis it is significant only for y₃ and not for y₁ or y₂. Clearly, the correlation among Y due to the common X makes a big difference to the decision making. The results indicate that, in wheat breeding, the economic coefficient (x₃) should be increased, the biomass per plant (x₁) should be appropriately reduced and the single stem grass weight (x₂) should remain unchanged. These results agree with the existing literature [28]. In short, considering the correlation among Y greatly changes the direct determination, the indirect determination and the decision analysis results of x_j for Y, and the greater the correlation among Y, the greater its impact on the regulation.

4 Discussion

In this article, the multiple-to-multiple path analysis model was proposed on the basis of multivariate linear regression analysis; it can be regarded as a generalization of the one-to-multiple path analysis model, which is based on univariate linear regression analysis. The innovation of this model is the central theorem of multiple-to-multiple path analysis: the correlation among Y caused by the common X is considered in the analysis of a system with multiple independent variables and multiple dependent variables. As Fig 2 shows, three additional types of paths (y_α←x_j→y_t, y_α←x_j↔x_k→y_t, y_α←x_k↔x_j→y_t) are generated in the multiple-to-multiple path analysis model besides the two types of paths (y_α←x_j→y_α, y_α←x_j↔x_k→y_α) of one-to-multiple path analysis. Along these five types of paths, the generalized determination coefficient R² is divided into direct and indirect determinations according to R²≈tr(B). This division clearly shows the complex regulatory mechanisms among the variables. Furthermore, the generalized decision coefficient R_y(j) is constructed by combining all the items related to x_j, and expresses the comprehensive decision-making ability of x_j on Y = (Y₁,Y₂,⋯,Y_p)^T. In fact, the direct and indirect determinations are all products of the corresponding path coefficients. This quantitative expression of the regulation among variables helps decision makers to form more reasonable and better-optimized suggestions for the target variables, as the analysis of the wheat data from arid areas confirms. It is worth mentioning that the path analysis of any closed system can be carried out according to the central theorem of multiple-to-multiple path analysis.

However, the application of the multiple-to-multiple path analysis model still has some limitations. Firstly, the model is only applicable to causal relationship analysis among correlated dependent and independent variables. Secondly, the difference between the generalized determination coefficient R² and tr(B) is relatively large when the correlation among variables is very strong in multiple-to-multiple linear regression analysis, that is, when the correlation coefficients in the correlation matrix are close to 1. In that case, the division of R² based on R²≈tr(B) differs considerably from the actual result, and other division methods need to be considered.

5 Conclusion

In the multiple-to-multiple path analysis model, the correlation among dependent variables caused by common independent variables is considered in addition to the correlation among independent variables. Taking more correlation information into account makes the results more practical and instructive.

References

1. Naes T, Romano R, Tomic O, et al. Sequential and orthogonalized PLS (SO-PLS) regression for path analysis: Order of blocks and relations between effects. J Chemom. 2020; e3243.
2. Wright S. On the nature of size factors. Genetics. 1918; 3(4): 367–374. pmid:17245910
3. Wright S. Correlation and causation. J Agric Res. 1921; 20: 557–585.
4. Bollen KA. Structural Equations with Latent Variables. NY: Wiley. 1989.
5. Duncan OD. Path analysis: sociological examples. Am J Sociol. 1966; 72(1): 1–16.
6. Finney JM. Indirect effects in path analysis. Socio Meth Res. 1972; 1(2): 175–186.
7. Greene VL. An algorithm for total and indirect causal effects. Polit Anal. 1977; 369–381.
8. Berg PVD, Arentze T, Timmermans H. A path analysis of social networks, telecommunication and social activity-travel patterns. Transp Res Part C. Emerg Technol. 2013; 26: 256–268.
9. Kang DH. An path analysis of the elderly’s deprivation experience on the thinking of suicide. J Soc Sci. 2019; 58: 197–245.
10. Hwang HJ, Chun HY, Ok KH. The path analysis of parental divorce on children’s emotional and behavioural problems: Through child-rearing behaviours and children’s self-esteem. J Korean Home Econ Assoc. 2010; 48(7): 99–110.
11. Diao ZJ, Chen B. Correlation and path coefficient analysis between thermal extraction yield and coal properties. Energy Sources Part A. Recovery Util Environ Eff. 2016; 38(22): 3412–3416.
12. Cankaya S, Abaci SH. Path analysis for determination of relationships between some body measurements and live weight of German fawn x hair crossbred kids. Kafkas Univ Vet Fak Derg. 2012; 18 (5): 769–773.
13. Norris D, Brown D, Moela AK, et al. Path coefficient and path analysis of body weight and biometric traits in indigenous goats. Indian J Anim Res. 2015; 49 (5): 573–578.
14. Marjanović-Jeromela A, Marinković R, Mijić A, et al. Correlation and path analysis of quantitative traits in winter rapeseed (Brassica napus L.). Agric Conspec Sci. 2008; 73: 13–18.
15. Barbosa RP, Alcântara-Neto F, Gravina LM, et al. Early selection of sugarcane using path analysis. Genet Mol Res. 2017; 16(1): gmr16019038. pmid:28198498
16. Grace JB, Pugesek BH. On the use of path analysis and related procedures for the investigation of ecological problems. Am Nat. 1998; 152 (1):151–159. pmid:18811408
17. Kunanitthaworn N, Wongpakaran T, Wongpakaran N, et al. Factors associated with motivation in medical education: a path analysis. BMC Med Educ. 2018; 18: 140. pmid:29914462
18. Costello RM. Premorbid social competence construct generalizability across ethnic groups: Path analyses with two premorbid social competence components. J Consult Clin Psychol. 1978; 46(5): 1164–1165. pmid:701557
19. Jöreskog KG. Structural analysis of covariance and correlation matrices. Psychometrika. 1978; 43(4): 443–477.
20. Graff J, Schmidt P. A general model for decomposition of effects. North–Holl Publ Co. 1982; 131–148. Netherlands.
21. Yuan ZF, Zhou JY, Guo MC, et al. Decision coefficients-decision indicators in path analysis. J Northwest A&F Univ (Nat Sci Ed). 2001; 29(5): 131–133. China.
22. Xie XL, Yuan ZF. Statistical test of decision coefficient and its application in breeding. J Northwest A&F Univ (Nat Sci Ed). 2013; 41(3): 111–114. China.
23. Mei Y, Guo W, Fan S, et al. Analysis of decision-making coefficients of the lint yield of upland cotton (Gossypium hirsutum L.). Euphytica. 2014; 196(1): 95–104.
24. Du JL, Li ML, Yuan ZF, et al. A decision analysis model for KEGG pathway analysis. BMC Bioinform. 2016; 17(1): 407. pmid:27716040
25. Du JL, Yuan ZF, Ma ZW, et al. KEGG-PATH: Kyoto encyclopedia of genes and genomes-based pathway analysis using a path analysis model. Mol Biosyst. 2014; 10(9): 2441–2447. pmid:24994036
26. Xie XL, Du JL, Xie XZ, et al. Generalized complex correlation coefficient and its application in wheat breeding. J Triticeae Crop. 2017; 37(1): 87–93. China.
27. Duleba AJ, Olive DL. Regression analysis and multivariate analysis. Semin Reprod Endocrinol. 1996; 14(2): 139–153. pmid:8796937
28. Zhang Z, Wang D. Wheat drought-resistant ecological breeding. Xi’an: Shaanxi People’s Education Press; 1992. China.

