Statistical Inference for Valued-Edge Networks: The Generalized Exponential Random Graph Model

Across the sciences, the statistical analysis of networks is central to the production of knowledge on relational phenomena. Because of their ability to model the structural generation of networks based on both endogenous and exogenous factors, exponential random graph models are a ubiquitous means of analysis. However, they are limited by an inability to model networks with valued edges. We address this problem by introducing a class of generalized exponential random graph models capable of modeling networks whose edges have continuous values (bounded or unbounded), thus greatly expanding the scope of networks applied researchers can subject to statistical analysis.


Introduction
The need to analyze networks statistically transcends disciplines that have occasion to study the relationships between units. Applications in the medical sciences [1][2][3], physics [4][5][6][7][8], computer science [9,10], mathematics [11][12][13], the social sciences [14][15][16], and other fields examine networks that vary in size and density, over time, and have edges with values that vary from binary ties, to counts, to bounded continuous and unbounded continuous edges. An important method for statistical inference on networks is the exponential random graph model (ERGM) [17][18][19], which estimates the probability of an observed network conditional on a vector of network statistics that capture the generative structures in the network. Yet the ERGM has a major limitation: it is only defined for networks with binary ties [20,21], thus excluding a wide range of networks with valued edges (e.g., genetic networks [22] and correlation networks [23]). We develop a class of generalized ERGMs (GERGMs) for inference on networks with continuous edge values, thus lifting the restriction of this methodology to a, possibly small, subset of networks. The form of our generalized model is similar to the ERGM in that it can be flexibly specified to cover a broad range of generative features, and our model can be estimated efficiently with a Gibbs sampler. The strengths and limitations of the ERGM are apparent from its functional form. Let Y be the n-vertex network (adjacency matrix) of interest with m edges (m~n(n{1) if Y is directed and n(n{1)=2 if it is undirected). Y ij is the edge from i to j. An ERGM of the network Y is specified as: where h is a parameter vector, h(Y ) is a vector of statistics computed on the network, and the object of inference is the probability of the observed network among all possible permutations of the network given the network statistics. The h(Y ) term is what gives the ERGM much of its power: this vector can contain statistics to capture the interdependence structure of connectivity in the network -statistics can be included to capture reciprocity, transitivity, cyclicality, and a wide variety of other endogenous structures -as well as the effects of exogenous covariates [24]. The challenges for modeling networks with valued edges are apparent from the specification in equation 1. The flexibility of the ERG distribution comes from the lack of constraints in specifying h; the only constraint is that h is finite when evaluated on any binary network. This assures that the denominator is a convergent sum, and therefore represents a proper normalizing constant for the distribution of networks. However, this convergence is not assured whenever h is finite if the support of Y is infinite, as it is with any network with continuous-valued edges. The model we derive retains the flexibility of h within a framework that assures a proper probability distribution for Y when Y has continuous edges.

Methods
The major strength of the ERGM is that the vector of statistics on the network, h, can be specified to represent many forms of dependence among the elements of Y , including transitivity (i.e., clustering), popularity, and reciprocity. Because these same dependence features characterize valued networks [20,21] and can be of theoretical import [15], we seek a generalization of the ERGM that maintains the flexibility of the set of network statistics, h, while moving away from the limitations inherent in the denominator of the ERGM. We see the analytic challenge of defining an ERGM-like model for valued networks as a three-part problem: deriving a distributional family that is (1) guaranteed to have a convergent normalizing constant, (2) incorporates dependence functions into the distribution as flexibly as does the ERGM, and (3) is easily adapted to accommodate a variety of edge types (e.g., bounded, unbounded, strictly non-negative). In this section, we introduce a method of constructing joint continuous distributions on networks that permit the representation of dependence features among the elements of Y through a set of statistics on the network, h(Y ). This generalized exponential random graph model (GERGM) can be used when edges are continuous and unbounded, bounded from above, bounded from below, or bounded above and below; thus greatly increasing the scope of networks it can analyze compared to the ERGM.

The Generalized ERGM (GERGM)
There are two specification steps in our approach to generalized ERGMs (GERGMs): first, we specify a tractable joint distribution that captures the dependencies of interest on a restricted network, X , and then we transform X onto the support of Y . In so doing, we produce a probability model for Y . To illustrate these steps, begin with consideration of the restricted valued network X , which has the same vertices as Y , but edge values that are continuous and bounded between zero and one (X [½0,1 m ).
Our first specification step involves defining a set of network statistics, h, to capture endogenous effects and exogenous covariates, and defining a probability distribution for the restricted valued network X . We define a probability distribution for X by adapting the ERGM formula presented in equation 1 to address a ½0,1 bounded network and assure a convergent sum in the denominator: In equation 2, h[R p remains the parameter vector and h: ½0,1 m ?R p , is formulated to represent joint features of Y in the distribution of X . The statistics h are guaranteed to be finite on ½0,1 m and each h i ( : ) is a statistic that captures the generative structure of the network by summing over subgraph products such that for every i, L 2 h(X ) . This is a flexible specification because many dependence relationships can be captured by summing products over subgraphs of the network, particularly when the edges are in the unit interval [21]. For instance, networks generated by a highly reciprocal process are likely to exhibit high values of P ivj X ij X ji , and those in which connections gravitate toward high-degree vertices exhibit high values of P i P j,k=i X ji X ki (i.e., ''two-stars,'' [25]).
An important property of the distribution we have specified for the restricted valued network, X , is that when there are no dependencies in the network, f X is an appropriate model for independent uniform random variables. That is to say, if we have correctly specified the set of network statistics and h~0, then X has no dependencies. Since f X is the joint distribution of the quantiles of Y , and a joint uniform distribution is the joint distribution of the quantiles of independent random variables [26], h~0 implies independence among the edges in Y . This is convenient because it implies that there need not be any dependencies in the network to use the GERGM.
In our second specification step, we transform the restricted valued network X onto the support of the network of interest Y . We do so by applying parameterized, one-to-one, monotone increasing transformations, which we denote G {1 ( : ), to the m edges of the restricted network. Specifically, we specify Y as where l ij parameterizes the transformation to capture marginal features of Y ij . Equation 3 shows that we can define each edge, ij, in the network of interest (Y ) as a parameterized transformation of the same ij edge in the restricted network X . An interesting case of transforming X is when the edges of Y are bounded from below at a and above at b. In this case, the transformation Y ijã zX ij (b{a) is a natural choice. This illustrates that the GERGM can be used to model networks of correlation coefficients, which have been of great interest recently [27][28][29].
Given this transformation of the restricted network, we derive a specification for the GERGM that allows us to keep the basic structure and strength of the ERGM: the h vector is now specified on a transformation of the network rather than the network in its observed form, but it maintains all the flexibility that makes the ERGM powerful. Because dG {1 (X ij ,l i )=dX ij w0, the properties of multivariate transformations [30] imply that the distribution of Y is f Y (Y ,h,L)~f X (G(Y ,L),h)jJj, where the Jacobian matrix, J, is the matrix of first partial derivatives. Since J is a diagonal matrix, we may write the GERGM as where the model parameters h and the transformation parameters L must both be estimated. An elegant feature of this formulation is that it may be specified to reduce to well known regression models for independent data when the network is free of dependencies. Specifically, we may specify g as a probability density function (i.e., G is a CDF, and G {1 an inverse CDF) parameterized to match the support of Y and capture features of Y such as location, scale, and dependence on covariates. When g is specified as such, the distribution for Y contains many common models for independent and identically distributed variables as special cases when h~0. For instance, if g is a Gaussian PDF with constant variance and the mean dependent on a vector of covariates, the model reduces to that assumed in linear regression. This is a useful feature of the model because researchers may doubt the role of network dependencies in their data, but be uncomfortable applying a model that assumes no dependencies and is incapable of modeling them (e.g., regression). In such a case, the researcher may apply a GERGM and, if there are no dependencies, the parameters h that capture network dependencies will be zero and the parameters returned for exogenous covariates will be identical to those a regression would have produced.
A further feature of the GERGM for researchers unsure of whether to include some subset of their effects, be they endogenous dependencies or exogenous covariates, is that the GERGM allows hypothesis tests for block restrictions. As such, a researcher may apply tests, such as the likelihood ratio or Wald tests, to test the assumption that the edges of Y are independent conditional upon L.
The specification of dependencies in a quantile network is standard across different edge-types, because the support of the joint quantiles is always a unit hypercube. However, the specification of g will vary substantially based upon the marginal characteristics of Y . A few general features to consider when selecting g are (1) the support of Y , (2) the notable characteristics of the moments of Y , and (3) the dependence of Y upon covariate information. It is advisable to select g such that the support of g is equal to the possible values that could be observed for Y . For instance, if the edge values are strictly positive (e.g., monetary exchange), a Weibull distribution would be a feasible choice. Once a class of g's with appropriate support is identified, it is then important to consider other relevant marginal features of Y -such as skewness, kurtosis, or multimodality -and be sure to choose a g that is flexible enough to represent those marginal features. Lastly, it might be the case that marginal characteristics of Y vary based on some covariate information. It is important to parameterize g such that these dependencies can be accurately represented. One beneficial feature of our two-stage derivation of the GERGM is that the extensive literature on fitting flexible parametric models to independent observations can inform choices for g (e.g., [31]).
It is also important to note that inferences about network dependencies will depend upon the specification of g. The network dependencies are estimated on the joint quantiles with respect to g. Thus, changing g alters the joint quantiles of Y with respect to g and effectively changes the network within which the dependencies are estimated. In this sense, we do not expect that inferences with respect to h will be robust to substantially different choices of g. It is therefore important to consider and compare feasible alternatives for g. Typically, evaluating the robustness of a particular model to alternative specifications of g will not be especially difficult because nested alternatives can be compared using Wald tests on the parameter restrictions. Simulation based model-fit metrics, such as those computed in our application below, could also be used to compare alternative formulations of g. An important topic for future research would address model comparison and selection within the GERGM framework.
Interpretation of the GERGM coefficients is relatively straight forward and we give an extensive example when we present our application. We note here however that, when g is a PDF, X is the random variable drawn from the joint distribution of the quantiles of Y . Therefore, the vectors h and h characterize the dependencies among the quantiles of Y . In this way, our method closely resembles the process of constructing joint distributions with copula functions [26]. To illustrate the process of specifying a GERGM, it is useful to consider a generic small-scale model. A simple example of deriving a joint distribution through the combination of h and g is illustrated in Figure 1, which presents the distributions of X and Y for a directed network with two vertices exhibiting a high degree of reciprocity.

Alternative Formulations
Our approach to the generalized ERGM is not the only means by which the ERGM can be extended to model valued-edge networks, though we believe it is a particularly flexible one. Krivitsky [32] has proposed an alternative framework for such an extension, which takes a substantially different approach to the problem than we do. As noted above, one of the major challenges to deriving an ERGM for a network with infinite support is that of assuring that the sum or integral over the probability mass or density function is convergent. We assure this by defining the exponential family graphical model on the restricted quantile network. This permits free reign in the specification of dependence functions h. The only requirement is that the functions be finitevalued. The approach to assuring a convergent sum/integral, and thus a proper probability distribution, taken by Krivitsky [32] is more flexible than ours, yet imposes more constraints on the definition of h. The extension of the ERGM proposed by Krivitsky [32] is given by where g maps h to canonical parameters and R is a 'reference measure' that assures ð For a given reference measure, h must be carefully specified so as to be dominated by R.
It is not apparent that either approach is globally preferable. Our approach permits substantially greater flexibility in specifying h, since there is no need to check for convergence given a particular specification of h. However, we restrict the specification of dependence to occur within the joint quantile network. Indeed, we view the necessity that the dependencies be estimated in the joint quantile network as the primary limitation of our formulation of the GERGM. The class of models proposed by Krivitsky [32], in contrast, permits dependence to be represented in the natural support of Y . However, our framework offers a more direct relationship between the GERGM and common independence models than that proposed by Krivitsky [32]. For instance, in the Poisson ERGM proposed by Krivitsky [32], independence among the edges in the network does not assure that the edges are marginally Poisson distributed. In our formulation of the GERGM, when the edges are independent, the model is guaranteed to reduce to the marginal model used to specify g. Ultimately however, which model is more appropriate will depend on the particular application.

Estimation
Estimation of the parameters in the model is a non-trivial task. The greatest challenge in estimating h and L in equation 4 is that the integral in the denominator is typically intractable. Because of the polynomial structure of h, and the fact that the variables of integration are bounded, we know that the integral is both positive and finite, meaning f Y is a proper joint distribution. However, inference requires the approximation of the denominator. We develop a Markov chain Monte Carlo maximum likelihood estimation (MCMC-MLE) [33] method for estimating the parameters.
In order to approximate the denominator in equation 4, we sample from f X using a Gibbs Sampler. To do so, we require the conditional distribution of X ij jX {ij . To simplify the notation, let We may then draw from the conditional distribution in equation 6 using the inverse CDF method. If u is a uniform (0,1) random variable, then When h 0 Lh(X ) LX ij~0 the conditional density given in equation 6 is undefined. However, in this case, each point in the unit interval is equally likely and the conditional distribution of X ij is uniform (0,1). In order to estimate h and L, we maximize ln f Y ½ : Our algorithm iteratively proceeds by maximum likelihood estimation of Ljh and MCMC-MLE of hjL until convergence. We derive an approximation to the asymptotic variancecovariance matrix by the inverse of the negative Hessian matrix at the last iteration. Consider first the maximum likelihood estimation of Ljh. Because C(h) does not depend on L, maximum likelihood estimation of Ljh reduces to a function easy to maximize using a hill-climbing algorithm.
The estimation of hjL is more involved. LetX X~G(Y ,L L) be the estimate of the restricted (quantile) network given the current estimate of the transformation parameters. The second term in equation 8 does not depend on h, so to estimate hjL we find which requires an approximation of C(h). We approximate C(h) using MCMC-MLE; an iterative method itself. Let h ½i{1 be the previous estimate of h, andX X be a sample of n networks drawn from f X (X ,h ½i{1 ). Then, an approximation to C(h) is given by This requires a starting value for h. In simulation experiments, we have found the pseudolikelihood estimate (arg max h P ij ln f c X (X ij jh) Â Ã ) to be effective in providing starting values for h (i.e., h ½0 ). Pseudocode for the algorithm is given in Figure 2.

Challenges in Estimation and Specification
The joint distribution f X in equation 2 is a linear exponential family multivariate distribution in that ln f X is proportional to a linear combination of the parameters h and sufficient statistics h [34]. Focusing specifically on ERGMs, there is a burgeoning literature on obstacles to specification and approximate maximum likelihood estimation with multivariate discrete exponential family distributions [35][36][37]. There are two related problems that have motivated this literature: (1) the existence and uniqueness of of the MCMC-MLE, and (2) the degeneracy of the ERG distribution. To estimate the model by MCMC-MLE, we maximize the approximate likelihood function with respect to h, such that a sample of networksX X is used to approximate the likelihood function. The sample is drawn from a distribution parameterized with the same network statistics h and a previous estimate or starting value for the parameter h 0 . The performance of this optimization method depends heavily upon the sampleX X, and thus upon h 0 . Specifically, a value of h that maximizes the approximated likelihood exists and is unique if and only if the values of the network statistics computed on the observed network (i.e., X X ) are within the p-dimensional convex hull of the network statistics computed on the sample of networks. In application, this requires thatX X be drawn from a distribution that generates networks similar to X X . Heuristically, we would expect that setting h ½0 close to the true maximizer of the likelihood function would be sufficient. However, this is not the case, which brings us to the second challenge.
The problem of degeneracy in discrete exponential families adds substantial complication to the specification, estimation and simulation of ERG distributions. Discrete ERG distributions that are degenerate tend, in Markov Chain simulation, toward either the completely full graph in which all edges are at their maximum value or the completely empty graph in which all edges are at their minimum value [36]. This means that either extremely dense or extremely sparse networks have high probability in a degenerate ERG distribution. This creates two complications in application.
First, degenerate ERGMs are poor models for most empirically observed networks, meaning that it is generally unacceptable to arrive at a degenerate ERGM in training a model for an observed network [36]. Second, degeneracy of the approximating distribution in the iterations of MCMC-MLE can cause the convex hull of the statistics computed on the sample of approximating networks to be far from the statistics computed on the observed network, causing the algorithm to break down [36]. Adding to the challenges posed by degeneracy, for a given model and network size, there may be only a very small and nonlinear region in the parameter space that leads to non-degenerate ERG distributions [37], which complicates the selection of starting values and the iterative search of the parameter space.
There are two complimentary approaches to combating the problem of degeneracy in ERGMs: using specifications that are less prone to degeneracy and checking a given estimated model for degeneracy. First, the degree to which a particular ERGM is prone to degeneracy depends substantially on the specification of the model [37]. Classic ERGM specifications used counts of subgraphs that measure local dependence structures as network statistics (h). For example, to measure transitivity (i.e., whether a friend of a friend is a friend), classically specified ERGMs used counts of the number of triangles in the network. Classically specified ERGMs are known as Markov Graphs [38]. To minimize degeneracy problems, Snijders, Pattison, Robins and Handcock [39] proposed a set of specifications of the ERGM that are substantially less prone to degeneracy than Markov Graphs. This is a useful approach to the problem because use of these specifications reduces the probability that model selection/ specification will be complicated by degeneracy.
Second, one can directly check whether a given ERGM is degenerate. This is accomplished in a straightforward manner by simulating a large number of networks using MCMC and checking whether (a) the simulated network statistics are similar to the observed values and (b) whether the Markov Chain is tending toward the full or empty graph [40]. This is a powerful approach to diagnosing degeneracy because it can be applied to any ERGM specification. Indeed, regardless of the specification used, it is important to diagnose whether an estimated model is degenerate because even degeneracy-resistant specifications do not guarantee non-degeneracy.
Because the GERGM is based on a continuous exponential family and is applicable to a wide array of edge types, it is not clear that the statistics proposed by Snijders, Pattison, Robins and Handcock [39] can be easily adapted to the GERGM framework. Thus, though outside of the scope of the current research, future work should focus on developing specifications of the GERGM that are resistant to degeneracy.
Fortunately, however, it is straightforward to apply the same MCMC methods used in estimating the model to determine whether a particular GERGM is degenerate. We take a twopronged approach to checking for degeneracy. First, we check whether the average edge value in the simulated networks is closer to zero or one than to the mean of the network used to estimate the model. This can be accomplished through the use of trace plots (a line-plot of connecting mean edge values over many iterations of the chain) and/or running mean plots (a plot to examine the stability of the mean edge value over a large number of iterations of the chain); though trace plots may be better suited to this purpose than running mean plots because they show every mean value. Second, once we are satisfied that the means in the simulations are far from degenerate values, we use standard MCMC diagnostic tools to test for non-convergence of the Markov chain. The Geweke and Gelman-Rubin diagnostics lend themselves particularly well to this purpose. As with all convergence diagnostics, the Geweke and Gelman-Rubin tests are tests of non-convergence that assume the convergence of the chain as the null hypothesis; accordingly satisfying these diagnostics does not assure convergence, but provides the best indication of convergence possible given that analytical proofs of convergence are not possible. The Geweke diagnostic [41] is a time-series diagnostic based on a comparison of two non-overlapping windows of the Markov chain, one earlier in the series and one later. The Geweke diagnostic is specified as where w 1 and w 2 are non-overlapping subsets of the Markov chain of length n 1 and n 2 respectively, the g g() function is typically the mean, and s 1 (0) and s 2 (0) are the symmetric spectral density functions [42]. Because the Geweke diagnostic follows a standard normal distribution, one typically takes values greater in absolute value than 2 to indicate non-convergence.
The Gelman-Rubin diagnostic [43] examines the convergence of multiple Markov chains begun from several overdispersed starting points by estimating the factor by which the distribution of parameter w, at any point in the Markov chain, is expected to shrink under continued simulation. For mw2 Markov chains of length n, the within and between chain variances are respectively where w w (j) indicates the mean for the jth chain, and w w w w indicates the grand mean [42]. The total variance may then be calculated as d v(w) v(w)~(1{1=n)W z(1=n)B and the shrink factor is computed where values departing significantly from 1 indicate nonconvergence [42,44].
If we can satisfy ourselves that the running mean of network edge values is non-degenerate and that the Markov chains have converged, we will have satisfied the strongest possible criteria for claiming non-degeneracy of the GERGM model.

Results
We illustrate important features of the GERGM and demonstrate its efficacy by applying it to a real network: the network of domestic migration in the United States. Our aim in this application is primarily pedagogical, and so we devote more attention to the choices made as part of the modeling process and alternative ways to interpret our results than is typical of applications whose primary purpose is substantive discovery.
Interstate migration flows in the U.S., the flow of citizens from one state to another, do much to shape the demographic, political, and economic makeup of the country. Migration flows have implications for local financial markets [45] and are an important determinant of stress on public infrastructure [46]. What is more, consumer-voters are thought to relocate to states that better match their preferences [47] and, perhaps as an effect, migration can shape the political climates of the states [48]. Migration flows naturally form a directed and valued network because each state (vertex) sends a certain number of its citizens to every other state (outbound edges), and receives a certain number of citizens from every other state (inbound edges). Despite some recent interest in modeling migration as a network phenomenon [49][50][51], there is little work in this area and the literatures in policy/political science and demography have not been well integrated. Our aim is to demonstrate the GERGM on interstate migration flows while incorporating factors from both literatures.
In contrast to previous studies, we focus on the change in the directional interstate migration flow from one year to the next. Migration flows are fairly persistent over time, and the ability to predict this year's flow based on the previous year's may mask an important type of predictive deficiency in a statistical model. Substantial change in the migration in and out of a state are of interest because they can cause disruptions to local economies and exert unexpected stresses on infrastructure. Specifically, we model the change in interstate migration flows from 2006 to 2007, in the 50 states, Washington D.C., and Puerto Rico. The edge from state i to state j is the difference between the number of people who migrated from i to j in 2007 and the number who migrated from i to j in 2006. These data allow us to consider the GERGM in the context of a valued network requiring transformation away from the restricted valued network onto a continuous unbounded support with exogenous covariates and endogenous parameters, thus making full use of the GERGM's flexibility.
To gain intuition about the network under consideration, we present the largest increasing and decreasing edges and vertices in Figure 3.
There are three broad choices we face in specifying the model for the network of migration change: the selection of the distribution family for g, the covariates that condition the location of g, and the statistics that comprise h.
With respect to the distribution of g, one distinct feature of the data that we need to accommodate is the thickness of the tails. The empirical kurtosis of the edges is 637, compared to the normal distribution's kurtosis of 3. As such, we use the location-scale Cauchy distribution [52]. The PDF of the Cauchy is where m[R is the location parameter (i.e., the median), and sw0 is the scale parameter. The location parameter for the edge from i to j depends on a vector of covariates z ij via regression parameters b, such that m ij~b 0 z ij . Under the restriction that there are no dependencies in the network (i.e. h~0), our model of change in migration flows reduces to the Cauchy regression model (CRM) [52]. Thus, we denote the model without network effects by CRM. We draw directly from the literature on interstate migration in selecting the covariates. Specifically, we include the covariates that [49] finds to be statistically significant determinants of migration flows. These include the population, unemployment rate, percapita income, and average January temperature of both the sending and receiving states. Since we are modeling change in and not the level of migration, each covariate is included as the change in the respective covariate value from 2005 to 2006. For instance Unemployment Sender (ij) is the difference between state i's unemployment rate in 2006 and state i's unemployment rate in 2005.
We complete our specification by considering which endogenous dependence terms to include in the model. We include five terms to capture the endogenous generative structure of the network. The first endogenous effect we include is transitive triads, which will account for any unmodeled clustering in the network (e.g., migration in clusters of agricultural or coastal states). The transitive triads term is defined as where the six additive terms capture every possible combination of directed edges between three vertices: i, j, and k. The second dependence term is reciprocity, which will account for any tendency towards dyadic exchange of migration flows (i.e., states trading migrants at similar levels). The reciprocity term is specified as which captures the tendency of i?j and j?i edges to co-occur. The third term we include is cyclic triads, which will model the tendency towards generalized reciprocity in the network -the degree to which total flows to and from a state are correlated [53]. This term is specified as  and captures reciprocal effects that flow through a third state. The last two terms are closely related: in-two-stars and out-two-stars. These terms account for any unmodeled features of states that motivate flows to and from states respectively. The terms are specified as and capture the tendency for other states, j and k, to send migrants to state i, and for state i to send migrants to j and k respectively. The substantive interpretations of these statistics are illustrated in Figure 4. The plots present relevant quantities, computed on networks simulated using the network statistics discussed above, plotted against values of the parameter for the respective statistic. Quantities are derived as the average over 1,000 simulated networks. The g in this artificial example is a standard normal PDF, but any appropriate PDF could be used. All of the network statistics specified on X result in properties of Y that reflect the Figure 6. MCMC-based Degeneracy Diagnostics. Plots depict diagnostics for the GERGM results reported in Figure 5. Diagnostics are computed on three Markov Chains of 500,000 networks each, constructed via 500,000 iterations of a Gibbs sampler in which a complete network is drawn in each iteration. Each chain is started at a network with highly dispersed start values drawn from a U-shaped distribution on the unit interval, followed by a burn-in of 10,000 iterations. Panels (a.1)-(a.3) give the trace plots of the chains by iteration. The dark gray lines track the mean edge value and the light gray lines track the 95% confidence interval around the mean. Panel (b) gives the histogram of the Gelman-Rubin diagnostic of whether the three chains converged to the same stationary distribution, over all 2,550 directed edges in the migration network. Panels (c.1)-(c.3) give normal quantile plots, which compare the distribution of the Geweke time serial convergence diagnostic over the edges within each chain to the null standard normal distribution (i.e., the distribution implied by the null hypothesis of a chain in convergence). Note: the R package coda [57] was used to compute the Geweke and Gelman-Rubin diagnostics. doi:10.1371/journal.pone.0030136.g006 respective dependency. As the reciprocity parameter increases, the correlation between the values of Y in a dyad increase. As the in two-star parameter increases, the variance in in-degree increases. Also, when the transitivity parameter is positive, the expected value of the third edge in a transitive triad increases with the values of the other two edges in the triangle. It is important to note that these are not the only conceivable measures of their respective network dependence properties. For example, see [54] and [55] for alternative measures of transitivity in valued networks. We utilize these measures because they are consistent with the product specification used in the ERGM framework, but other network statistics can be easily incorporated into the GERGM. Figure 5 shows the estimates from our GERGM as well as estimates from the CRM. As we consider the results, it is important to assess whether the estimated GERGM is degenerate. Our GERGM shows no indication of degeneracy. We simulate networks from the GERGM via three independent Markov chains of 500,000 iterations, using a Gibbs sampler that draws a conditional edge for each directed pair of vertices in each iteration, using the conditional distribution in equation 6. Our approach includes much more simulation within each iteration, as compared to the standard Metropolis-Hastings approach to simulating from ERGM, in which one edge is re-drawn in each iteration [35]. We see, in Figure 6, that (a) the mean edge value is far from zero or one, and varies around the mean of the observed network, and (b) there is no evidence of non-convergence given by the Geweke and Gelman-Rubin convergence diagnostics. Under the null hypothesis of convergence (i.e., no difference in the means at the beginning of the chain and the end of the chain), the Geweke diagnostic has a standard normal distribution [41]. The normal quantile plots in panels (c.1)-(c.3) of figure 6 show that the Geweke statistics computed on our Markov chains are distributed very close to a standard normal, which is consistent with the null hypothesis of convergence. Also, none of the Gelman-Rubin diagnostic statistics, depicted in panel (b), are at or above 1.1 -the level typically taken to indicate non-convergence across multiple chains [56].
A Wald test suggests the restriction of the dependence terms to zero, a restriction the regression model must make because it cannot accommodate dependence terms, is inappropriate and that the GERGM provides a better fit to the data (Wald statistic~119.19 on 5 degrees of freedom, statistically significant at the 0.001 level). The statistically significant effects for the dependence parameters indicate that (a) there are clustering effects in the network, (b) migration to states repels further migration, and (c) increases in migration flows from a state are not offset by increases in flows to that state. We also find a decrease in the number of people leaving warm states, a decrease in migration to states that experienced a substantial increase in population in the previous year, and evidence of an increase in migration away from states experiencing increases in unemployment.
The superior performance of the GERGM relative to the Cauchy regression is further depicted in Figure 7, which gives the predicted and observed network-level reciprocity and cycling measures from the GERGM and CRM. This figure shows that the regression does not adequately fit the dependencies (e.g. the lack of reciprocity) in the migration network. For example, it is theoretically expected that a network of change in migration would exhibit anti-reciprocity and anti-cycling. If a locale is experiencing a spike in migration to other places, that is likely indicative of some undesirable feature of said locale. This antireciprocal feature of the migration network cannot be integrated into the conventional regression modeling framework. Figure 7 serves as an additional test of the appropriateness of the independent regression model. If the CRM were the appropriate specification, the joint quantiles would be jointly uniform and these dependence statistics computed on the latent network would be predicted by the CRM. The GERGM accurately captures these features of the latent quantile network -with the observed value falling in the inter-quartile range of the values simulated from the GERGM.
This application shows the inability of the regression framework to model the sort of dependencies that we observe in real networks and the utility of having an inferential network model capable of accommodating networks with valued edges. In this case, we used our GERGM to produce insights into the migratory dynamics of the United States that could not have been produced otherwise.

Discussion
The GERGM greatly expands the scope of networks that can be modeled within the ERGM framework. This is an important contribution for several reasons.
First, many networks have valued edges. We have examined one such network above, interstate migration in the U.S., but many others exist. For instance, the ij th edge in the cosponsorship network in the U.S. Congress measures the number of bills Sponsored by j that are cosponsored by i [15] in the two year period of the respective Congress. In previous research, [15] this network has been dichotomized to model with the ERGM. In a substantively much different application, [29] apply the ERGM to model a network created by dichotomizing pairwise correlations among the activity levels of 90 regions in the human brain. The direct analysis of a network of pairwise correlations could be conducted with the GERGM, without losing any information about the magnitude of the correlation, by using the simple transformation (i.e., G {1 ) Y ij~2 X ij {1.
Second, our method allows a researcher, who is not necessarily substantively interested in the interdependencies in the network, to test the restriction that the dependence parameters are equal to zero, meaning that interdependencies in the network do not matter. Such tests may be conducted using simple and well known methods such as the likelihood ratio test and Wald test.
Third, many common models for independent data (i.e. regression models typically estimated by least squares and/or maximum likelihood) are nested within the GERGM. Thus, if the endogenous structure of the network does not exert an effect, the researcher is returned a model with results identical to those they would have obtained using a regression. This is convenient not only because those independence models are familiar to political scientists, but because researchers may be dubious about the role of endogenous effects, but not want to risk model misspecification by ignoring them.
Lastly, and probably most importantly, the GERGM expands the set of substantive theories that researchers are able to evaluate empirically. For example, in our application, we gained insight into migration processes that would not have been possible absent the GERGM technology. This not only offers the opportunity to make progress on existing debates in the literature, but presents new theoretical horizons for scholars using relational data.