A flexible method for aggregation of prior statistical findings

Rapid growth in scientific output requires methods for quantitative synthesis of prior research, yet current meta-analysis methods limit aggregation to studies with similar designs. Here we describe and validate Generalized Model Aggregation (GMA), which allows researchers to combine prior estimated models of a phenomenon into a quantitative meta-model, while imposing few restrictions on the structure of prior models or on the meta-model. In an empirical validation, building on 27 published equations from 16 studies, GMA provides a predictive equation for Basal Metabolic Rate that outperforms existing models, identifies novel nonlinearities, and estimates biases in various measurement methods. Additional numerical examples demonstrate the ability of GMA to obtain unbiased estimates from potentially mis-specified prior studies. Thus, in various domains, GMA can leverage previous findings to compare alternative theories, advance new models, and assess the reliability of prior studies, extending the meta-analysis toolbox to many new problems.


General overview
The codes in this document were developed in MATLAB (a proprietary, matrix-based language, available at http://www.mathworks.com). They are organized in 14 '.m' files, which are included in a zipped file (supplementary S3 File) attached to this electronic companion.

Organization of the codes
The 14 '.m' files are divided into two general groups: user-specified and GMA-specific files. The user-specified files should be filled out with the information for your aggregation project; the GMA-specific files remain the same. Table 2 and Table 3 present the user-specified and GMA-specific files, respectively, and provide general information about their actions, inputs, and outputs.

Example embedded in the current codes
The current codes implement Scenario 1 in the paper, in which the coefficients of the three explanatory variables and the standard deviation of the error term in the "true" data generating process are all assumed to be one. Regression results from three "prior" studies, all assumed to be linear regressions, are considered (see the three User_model files). Each imitated prior study uses a constant and two of the three explanatory variables. Hence, the constant, the two coefficients of the explanatory variables, and the standard deviation of the error term provide four empirical signatures for each model, for a total of 3*4=12 signatures in this example.
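The construction of the 12 signatures can be sketched in a few lines of MATLAB. This is only an illustration, not the contents of the User_model files: it assumes a linear data generating process with no constant, independent standard-normal explanatory variables, and a standard-normal error term, and the names n, X, pairs, sd_res, and signatures are hypothetical.

```matlab
% Sketch of the embedded example under the assumptions stated above.
rng(1);                                % fix the seed so the run is repeatable
n = 5000;                              % hypothetical sample size
X = randn(n, 3);                       % assumed standard-normal explanatory variables
y = X * ones(3, 1) + randn(n, 1);      % unit coefficients, unit error std

pairs = [1 2; 1 3; 2 3];               % each imitated prior study omits one variable
signatures = zeros(4, size(pairs, 1));
for m = 1:size(pairs, 1)
    Z = [ones(n, 1), X(:, pairs(m, :))];             % constant + two regressors
    b = Z \ y;                                       % ordinary least squares
    res = y - Z * b;                                 % residuals
    sd_res = sqrt(sum(res.^2) / (n - size(Z, 2)));   % residual standard deviation
    signatures(:, m) = [b; sd_res];                  % four signatures per model
end
signatures = signatures(:);            % stack into one 12-by-1 vector
disp(numel(signatures));
```

Under these assumptions the omitted variable is absorbed into the residual, so each estimated residual standard deviation should be close to sqrt(2) rather than 1, while the two retained coefficients stay close to 1.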

How to modify the codes for your research project
To modify the codes, you should complete the user-specified files. The major work here is to complete the files for two categories in Table 2: 'Replication of prior models' and 'Generating simulated data for explanatory and response variables'. See the paper for more information.

II. Notations in MATLAB codes
Opt_method: Optimization method, e.g., global search (GS) or multi-start (MS) search. See MATLAB help for more information.
Opt_MS_inst: The number of instances of start points, used 'only' if multi-start (MS) optimization search is selected.
Opt_n: The number of iterations to estimate W and then run the optimization with the new W.
Opt_Tolrnc: Tolerance (threshold), a stopping criterion for the optimization solver.
Opt_ub: Upper bound for Beta, if needed, used by the optimization solver. A lower bound for Beta can likewise be supplied to the solver, if needed (e.g., for a parameter that is always positive, such as a standard deviation).
Q: Covariance matrix of the estimated Beta.
r: The number of iterations used only for bootstrapping; if no bootstrap is used, r should be 1. See the paper for more information on bootstrapping.

III. MATLAB functions
The files in Table 2 should be completed by the user based on the aggregation research project. The files in Table 3, however, are GMA-specific and remain the same for any project.

GMA_ObjFn.m
Requests the simulated signatures for the current Beta and W, and then passes the value of the objective function to GMA_Optimization.m.
Inputs: Beta, W. Output: ObjFnVal.

GMA_W_star.m
Calculates the covariance matrix of the simulated signatures (SS) and the weighting matrix (W), which is the inverse of that covariance matrix.
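As a rough illustration of the two files above, the sketch below computes a weighting matrix as the inverse of the covariance of simulated signatures and then evaluates a quadratic-form distance between simulated and empirical signatures. The quadratic form is an assumed, commonly used minimum-distance objective and is not necessarily the one implemented in GMA_ObjFn.m. The names draw, sig, simulate_signatures, ES, S, n, C, and d, and all numerical values, are hypothetical.

```matlab
% Hypothetical stand-in for the user-specified simulation: one replication
% returns two signatures (sample mean and sample std) for a given Beta.
rng(7);
draw = @(Beta, n) Beta(1) + Beta(2) * randn(n, 1);
sig = @(x) [mean(x); std(x)];
simulate_signatures = @(Beta, n) sig(draw(Beta, n));

Beta = [2; 1.5];                  % current parameter guess (illustrative)
S = 200;                          % number of simulated replications
n = 500;                          % observations per replication
SS = zeros(S, 2);                 % one row of signatures per replication
for i = 1:S
    SS(i, :) = simulate_signatures(Beta, n)';
end

C = cov(SS);                      % covariance matrix of the simulated signatures
W = inv(C);                       % weighting matrix: inverse of that covariance

ES = [2; 1.5];                    % hypothetical empirical signatures
d = mean(SS, 1)' - ES;            % gap between simulated and empirical signatures
ObjFnVal = d' * W * d;            % assumed quadratic-form objective
disp(ObjFnVal);
```

Weighting by the inverse covariance gives less influence to signatures that the simulation reproduces noisily, which is the usual motivation for this kind of weighting.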

Estimation of the variations of the estimated Beta
GMA_Var_EstimatedVar.m
Estimates the covariance matrix of Beta (Q). Calls GMA_Var_Delta.m to estimate the numerical derivative (i.e., the changes in simulated signatures with respect to changes in Beta), and then estimates Q.

GMA_Var_Delta.m
Estimates the sensitivity of the simulated signatures to perturbations in Beta.

Inputs: B, ConfLvl, Std_Beta. Outputs: ConfDown, ConfUp.
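The sensitivity calculation described for GMA_Var_Delta.m can be illustrated with a central finite difference. This is a minimal sketch under stated assumptions, not the contents of that file: sig_fn is a hypothetical deterministic signature map chosen so that the result can be checked by hand, and the step size h is an assumed value.

```matlab
% Hypothetical deterministic map from Beta to three signatures.
sig_fn = @(Beta) [Beta(1) + Beta(2); Beta(1) * Beta(2); Beta(2)^2];

Beta = [1; 2];                    % point at which the sensitivity is evaluated
h = 1e-6;                         % assumed perturbation size
k = numel(sig_fn(Beta));          % number of signatures
Delta = zeros(k, numel(Beta));    % k-by-p matrix of numerical derivatives
for j = 1:numel(Beta)
    step = zeros(size(Beta));
    step(j) = h;                  % perturb one element of Beta at a time
    Delta(:, j) = (sig_fn(Beta + step) - sig_fn(Beta - step)) / (2 * h);
end
disp(Delta);
```

For this map the analytic derivative at Beta = [1; 2] is [1 1; 2 1; 0 4], which the finite difference should reproduce closely. When the signatures come from a stochastic simulation, the same random seed would need to be reused for the perturbed and unperturbed runs, otherwise simulation noise can dominate the difference.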
For more information about the aggregation example presented in the codes, see Section 'Example embedded in the current codes'.
Codes are published in Microsoft Word format via the MATLAB® R2015b publisher.

Error term in the meta model
rng(s+r*1000) % to control the random generation process (i.e., fix the noise seed
              % based on s (number of simulations) and r (number of replications for bootstrapping))
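The effect of this seeding rule can be checked with a short sketch. It treats s and r as integer counters, which is an assumption about how the codes use them, and the values and the variables a, b, and c are hypothetical.

```matlab
% Sketch: the same (s, r) pair reproduces the same noise; a different pair does not.
s = 3;                            % illustrative simulation counter
r = 2;                            % illustrative bootstrap replication counter

rng(s + r * 1000);
a = randn(1, 3);

rng(s + r * 1000);                % same pair, same seed
b = randn(1, 3);

rng((s + 1) + r * 1000);          % next simulation, different seed
c = randn(1, 3);

disp(isequal(a, b));
disp(isequal(a, c));
```

Note that s + r*1000 maps distinct (s, r) pairs to distinct seeds only while s stays below 1000; beyond that, two different pairs can share a seed.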