A parametric framework for multidimensional linear measurement error regression

The ordinary linear regression method is limited to bivariate data because it is based on the Cartesian representation y = f(x). Using the chain rule, we transform the method to the parametric representation (x(t), y(t)) and obtain a linear regression framework in which the weighted average is used as a parameter for a multivariate linear relation for a set of linearly related variable vectors (LRVVs). We confirm the proposed approach with a Monte Carlo simulation, in which the minimum coefficient of variation for error (CVE) provides the optimal weights for forming a weighted average of LRVVs. Then, we describe a parametric linear regression (PLR) algorithm in which the Moore-Penrose pseudoinverse is used to estimate measurement error regression (MER) parameters individually for the given variable vectors. We demonstrate that MER parameters from the PLR and nonlinear ODRPACK methods are quite similar for a wide range of reliability ratios, but ODRPACK is formulated only for bivariate data. We identify scale-invariant quantities for the PLR and weighted orthogonal regression (WOR) methods and their correspondences with the partitioned residual effects between the variable vectors. Thus, the specification of an error model for the data is essential for MER, and we discuss the use of Monte Carlo methods for estimating the distributions and confidence intervals for the MER slope and correlation coefficient. We distinguish between the elementary covariance for the y = f(x) representation and the covariance vector for the (x(t), y(t)) representation. We also discuss the multivariate generalization of the Pearson correlation as the contraction between Cartesian polyad alignment tensors for the LRVVs and the weighted average. Finally, we demonstrate the use of multidimensional PLR in estimating the MER parameters for replicate RNA-Seq data, and of quadratic regression in estimating the parameters of the conical dispersion of read count data about the MER line.
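A minimal NumPy sketch of the parametric idea summarized above, under stated assumptions: the columns of X are the LRVVs, the parameter t is their weighted average (equal weights unless others are supplied; the CVE-based choice of optimal weights is not reproduced here), each variable vector is fitted as x_j(t) = a_j + b_j t with the Moore-Penrose pseudoinverse, and the slope between two variables follows from the chain rule as dx_k/dx_j = (dx_k/dt)/(dx_j/dt) = b_k/b_j. The function names are hypothetical, and this is an illustration rather than the author's PLR implementation.

```python
import numpy as np

def parametric_linear_regression(X, weights=None):
    """Fit every column of X as a linear function of a weighted-average parameter t.

    X       : (n, m) array; each column is one linearly related variable vector.
    weights : optional length-m weights for the average (equal weights by default).
    Returns (a, b, t) with X[:, j] ~ a[j] + b[j] * t in the least-squares sense.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                        # normalize so t is a true weighted average
    t = X @ w                              # parameter t: weighted average of the columns
    A = np.column_stack([np.ones(n), t])   # design matrix for x_j(t) = a_j + b_j * t
    coef = np.linalg.pinv(A) @ X           # pseudoinverse fit of all columns, shape (2, m)
    return coef[0], coef[1], t

def chain_rule_slope(b, j, k):
    """Slope of variable k with respect to variable j: (dx_k/dt) / (dx_j/dt)."""
    return b[k] / b[j]

# Usage with simulated data: measurement error in both variables, true slope 2.
rng = np.random.default_rng(1)
x_true = np.linspace(0.0, 10.0, 200)
X_noisy = np.column_stack([
    x_true + rng.normal(scale=0.5, size=200),
    2.0 * x_true + 1.0 + rng.normal(scale=0.5, size=200),
])
a, b, t = parametric_linear_regression(X_noisy)
print("slope dy/dx:", chain_rule_slope(b, 0, 1))
```

Because t is itself a weighted average of the columns and the fit is linear in the response, the fitted coefficients satisfy Σ w_j b_j = 1 and Σ w_j a_j = 0, which is a convenient sanity check on any implementation of this kind.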

The Introduction already states that the objective of this work is the investigation of linear "measurement error regression (MER) methods for multivariate data" (line 26). In this work, I show that fitting a multidimensional line to LRVV data is an important pedagogical problem for establishing the principles of multivariate data analysis. This includes a more general formulation of slope, covariance, and correlation for the parametric representation.

• The OLR method requires that data for the x variable be error free, a condition that is rarely satisfied by experimental data. Therefore, the OLR algorithm is applicable only to a limited class of regression analysis problems.
Consequently, a statistics student might conclude that the textbook treatment of linear regression is incomplete. The parametric linear regression (PLR) framework is more general because it takes into account the experimental error in all variables and includes OLR as a special case.
• Line 342. Remark on Pearson correlation tensors and two-way, three-way, ..., m-way weighted averages, etc.
• Line 423. Section 1.7 in the original manuscript has been moved to the 'Data analysis and results' section.
• The Discussion has been revised.

Response to reviewers
My responses to the reviewers' comments are highlighted in blue.

Reviewer #1
In this paper, the author proposed "The chain rule, measurement error regression and RNA-Seq analysis".
2) Justify the novelty of the proposed approach?
In this work, I discuss the fact that statistical measures of linear dependence, including covariance, correlation coefficient, and regression slope, are subject to the chain rule. This is a novel idea with broad implications for multivariate statistics and data science. The paper already includes references to many important MER papers that serve as a starting point for assessing the 'novelty' of this work. The statistical quantities and expressions that have not been previously reported in the scientific literature include Eqs 6, 7, 13, 16, 17, 22, 23, 25, 27, 32, 33, 35, 37, and 43.

3) Proofread the article once again?
Done.
4) The experimental validation of the proposed approach is quite confusing justify it.
• 'Experimental validation' is not required for the OLR and Moore-Penrose pseudoinverse algorithms because they are constructed from algebraic least-squares principles (Noble 1977; Keener 1988; Boyd 2018).
• PLR is constructed algebraically by extending the standard least-squares regression framework, using the chain rule to transform to the parametric representation. Its validity is therefore determined a priori by whether the PLR framework is consistent with the underlying least-squares theory for overdetermined systems of equations, so the PLR algorithm does not require an 'experimental validation'.
• However, our discussion of PLR includes graphs that serve as visual illustrations of statistical concepts to make the paper more accessible. The
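The least-squares basis invoked in these bullets can be checked numerically. The following NumPy sketch is illustrative only and not part of the manuscript; the random test system is an arbitrary assumption. It shows that for an overdetermined system with full column rank, the Moore-Penrose solution pinv(A) b agrees with NumPy's SVD-based least-squares solver and with the normal equations, and that its residual is orthogonal to the columns of A.

```python
import numpy as np

def least_squares_three_ways(A, b):
    """Solve the overdetermined system A x ~ b in three equivalent ways."""
    x_pinv = np.linalg.pinv(A) @ b                    # Moore-Penrose pseudoinverse
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based least-squares solver
    x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations (needs full column rank)
    return x_pinv, x_lstsq, x_normal

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))   # 50 equations, 3 unknowns
b = rng.normal(size=50)
x_pinv, x_lstsq, x_normal = least_squares_three_ways(A, b)

# The least-squares residual is orthogonal to the column space of A.
residual = b - A @ x_pinv
print(np.allclose(x_pinv, x_lstsq), np.allclose(x_pinv, x_normal))
print(np.abs(A.T @ residual).max())
```

The normal-equations route is shown only for comparison; the pseudoinverse and `lstsq` are numerically preferable when A is ill-conditioned.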

Reviewer #2
The manuscript reading looks more like a technical report (heavy on the methodological description).
See my response to question #0.1-4 above. This paper describes a novel PLR methodology that has broad applications in data analytics. The mathematical emphasis is intentional because the goal is to identify principles for multivariate data analysis.
To attract strong interest, it is highly recommended to revise the manuscript to focus on the major objects and findings in a concise manner.
The major 'objects and findings' are already summarized in the Introduction; see my response to question #0.1-1 above. See lines 26, 51, 75, 164, and 202 for statements about the objectives of this work.
I have also some comments which are given below.
1. Title: Consider modifying it to be more specific. The wording of this current title ("The chain rule, measurement error regression and RNA-Seq analysis") is somewhat imprecise.
The paper has a new title.
2. In page no 3 and 4 author mentioned seven novel contributions of his work but they are not clearly established in the whole paper.
3. Recommendation to revise the "data analysis and results" part in details.
The "data analysis and results" section has been revised.
4. The discussion part (page no. 23 and 24) does not clearly state the point of view of this paper. The revision is needed for this part.
The Discussion has been revised.
5. I recommend that the author should revise the Fig. 1, Fig. 5
This 'response to reviewers' document will be published as-is if the paper is accepted for publication.