Constructing a Watts-Strogatz network from a small-world network with symmetric degree distribution

Though the small-world phenomenon is widespread in many real networks, it is still challenging to replicate a large network at full scale for further study of its structure and dynamics when sufficient data are not readily available. We propose a method to construct a Watts-Strogatz network using a sample from a small-world network with symmetric degree distribution. Our method yields an estimated degree distribution which fits closely with that of a Watts-Strogatz network and leads to accurate estimates of network metrics such as clustering coefficient and degree of separation. We observe that the accuracy of our method increases as network size increases.


Introduction
Since the term "small world" was first coined in Milgram's pioneering experiment [1], Watts and Strogatz [2] have proposed the most compelling analytical framework demonstrating the small-world phenomenon prevalent in a range of social, information, technological, and biological networks. A small world consists of many local clusters, but all members are connected over short distances via a few better-connected members. These conditions for a small world to emerge are minimal and many real networks have shown small-world properties [3][4][5]. However, it is not straightforward to quantify "small-world-ness." A quantitative model measures the equivalence between a network and a unique Watts-Strogatz (WS) model [6]. For a real network with high equivalence, the corresponding WS model can be generated to explore the structure and dynamics.
Even when such equivalence is confirmed, it is not viable to study a large real network when sufficient data on the population are not available. For example, modern social networks are very large, and it takes significant resources and time to collect the data of a network and find its key parameters. Web-based experiments accommodating a large number of participants are more difficult to control in some respects than those conducted in physical laboratories [7]. When such real experiments are impractical, an artificially structured network can be studied instead [8]. However, a large real network with strong small-world properties cannot be replicated as the corresponding WS model unless its parameters are estimated from a sample.
A WS model [2] is characterized by n = number of nodes, K = number of neighbors a node has on its right side in the regular lattice before rewiring, and p = "rewiring" probability with which the right end of an arc incident to a node is rewired uniformly at random to another node. The size of the population under study is represented by n, which is known a priori or can be estimated [9,10] in many cases. Since the mean node degree is 2K and the total number of arcs remains invariant after rewiring, half the sample mean of node degrees is an estimator for K. Among the three parameters, p is the most challenging to estimate. Motivated by this immediate need, we formulate a method to estimate p, leading to an estimated degree distribution which fits closely with that of the corresponding WS model. These three parameters (n, K, p) indeed suffice to characterize a WS network: across many WS networks generated under the same values of (n, K, p), we observe that variations in network metrics such as clustering coefficient (CC) and degree of separation (DS) (defined as characteristic path length in [2]) are very small.
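To make the roles of (n, K, p) concrete, the following is a minimal sketch of a WS generator in plain Python: a ring lattice in which each node is joined to its K right-side neighbors, followed by rewiring of the right end of each arc with probability p. The function name watts_strogatz, the adjacency-set representation, and the rule of rejecting self-loops and duplicate arcs are assumptions made for this illustration and are not taken from the authors' implementation.

```python
import random

def watts_strogatz(n, K, p, seed=None):
    """Return a list of neighbor sets for a WS network (assumes n > 2K)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]

    # Regular ring lattice: node i is joined to its K right-side neighbors,
    # so every node starts with degree 2K.
    for i in range(n):
        for d in range(1, K + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)

    # Rewire the right end of each lattice arc {i, j} with probability p.
    for d in range(1, K + 1):
        for i in range(n):
            j = (i + d) % n
            # Skip the arc if node i is already adjacent to every other node.
            if rng.random() < p and len(adj[i]) < n - 1:
                k = rng.randrange(n)
                # Assumption: reject self-loops and duplicate arcs.
                while k == i or k in adj[i]:
                    k = rng.randrange(n)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(k)
                adj[k].add(i)
    return adj

# Example with the parameters of the network in Fig 1.
adj = watts_strogatz(n=1000, K=8, p=0.04, seed=0)
degrees = [len(neighbors) for neighbors in adj]
K_from_degrees = sum(degrees) / (2 * len(degrees))
```

Each rewiring removes one arc and adds one, so the number of arcs stays at nK and half the mean degree of the full network recovers K.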
A direct question from this motivation, then, is how many arcs are incident to node i ∈ N after rewiring, where N is the set of nodes in a network. We start by deriving the degree distribution of a network, represented by the probability P(δ_i = m) that node i has a degree of m, where P is a probability mass function and δ_i = the degree of node i (or the number of arcs incident to node i) after rewiring. A previous derivation of P(δ_i = m) in [11] is based on the assumption that δ_i ≥ K after rewiring, which might not be the case for some WS networks we have generated. Nonetheless, this assumption allows a simpler formulation. We thus propose a new formulation of P(δ_i = m) which is closer to the exact value in a WS network.

Results and discussion
In the regular lattice of a WS network before rewiring, a node i ∈ N has degree 2K, with K arcs incident to its right neighbors and K to its left ones. Let N_i be the set of nodes connected to node i before or during rewiring (whereas δ_i is the degree of node i after rewiring). Then, before rewiring, |N_i| = 2K. Note that N_i does not include node i. Under the assumption that |N_i| = 2K, node i loses one degree after the following sequence of events takes place for a node j ∈ N_i.
1. Arc {i, j} is chosen for rewiring with probability p.
2. One end of arc {i, j} (attached to node i) is chosen with probability 1/2.
3. The chosen end is rewired, with probability (n − 1 − 2K)/n, to a node which is neither node j nor one of the nodes in N_i.
Consequently, the probability that the degree of a node decreases by 1 is α = (1/2) p (n − 1 − 2K)/n. We admit that |N_i| = 2K might not be the case during rewiring and that our formulation of P(δ_i = m) is an approximation. The small-world property (high local clustering and short paths) emerges for a small rewiring probability p ranging from 0.001 to 0.1 in Fig 2 in [2]. For a small p, e.g., p = 0.01, about 1% of the arcs are rewired. Accordingly, most nodes would have |N_i| = 2K during rewiring and this assumption is not significantly limiting. As shown in the examples we have generated, our approximation still results in small errors.
On the other hand, node i gains one degree after the steps below if arc {j, k}, among the (n − 2)K arcs not initially incident to node i, is detached and an end of arc {j, k} is rewired to node i chosen randomly. If i = j or i = k, arc {j, k} is attached back and not rewired.
4. Arc {j, k} is detached for rewiring with probability p.
5. Node i is chosen with probability 1/n and an end of arc {j, k} is rewired to node i.
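Thus, the probability that the degree of a node increases by 1 is β = p/n. We rewrite the degree after rewiring as δ_i = 2K − X_i + Y_i, where X_i and Y_i are binomial random variables representing the number of degrees lost and the number of degrees gained at node i, respectively. Then we have

$$P(X_i = x) = \binom{2K}{x} \alpha^{x} (1-\alpha)^{2K-x} \tag{1}$$

and

$$P(Y_i = y) = \binom{(n-2)K}{y} \beta^{y} (1-\beta)^{(n-2)K-y}. \tag{2}$$

Since δ_i = m requires Y_i = m − 2K + x when X_i = x,

$$P(\delta_i = m) = \sum_{x} P(X_i = x)\, P(Y_i = m-2K+x \mid X_i = x), \tag{3}$$

where the range of x depends on m. For 0 ≤ m ≤ 2K, 2K − m ≤ x ≤ 2K, since node i must lose at least 2K − m degrees. For 2K < m ≤ n − 1 − 2K, 0 ≤ x ≤ 2K. In this case, node i can lose all 2K degrees as long as it can gain from the other n − 1 − 2K nodes and can lose none since 2K < m. For n − 1 − 2K < m ≤ n − 1, 0 ≤ x ≤ n − 1 − m. Since m is larger than in the previous case, node i loses no more than n − 1 − m degrees and can lose none.

For the conditional probability in Eq (3), we assume a large n (e.g., n ≫ K), consistent with the large networks we are mainly interested in, so that the degrees gained can be treated as independent of the degrees lost:

$$P(Y_i = m-2K+x \mid X_i = x) \approx P(Y_i = m-2K+x). \tag{4}$$

From Eqs (1), (2) and (4), we have

$$P(\delta_i = m) = \sum_{x=\max(0,\,2K-m)}^{\min(2K,\,n-1-m)} \binom{2K}{x} \alpha^{x} (1-\alpha)^{2K-x} \binom{(n-2)K}{m-2K+x} \beta^{m-2K+x} (1-\beta)^{(n-2)K-(m-2K+x)}. \tag{5}$$

The probability mass function of the binomial distribution with probability β in Eq (5) can be approximated by that of a Poisson distribution with rate λ = (n − 2)Kβ = (n − 2)Kp/n for a large n and a small β, which is the case in small-world networks. Since α = (1/2) p (n − 1 − 2K)/n → p/2 and λ = ((n − 2)/n) Kp → Kp for a large n, Eq (5) can be written as

$$P(\delta_i = m) = \sum_{x=\max(0,\,2K-m)}^{\min(2K,\,n-1-m)} \binom{2K}{x} \left(\frac{p}{2}\right)^{x} \left(1-\frac{p}{2}\right)^{2K-x} \frac{(Kp)^{m-2K+x}\, e^{-Kp}}{(m-2K+x)!}. \tag{6}$$

Then the mean estimated from Eq (6) is $\bar{m} = \sum_{m=0}^{n} m P(m)$ and the standard deviation is $\sigma = \sqrt{\sum_{m=0}^{n} (m - \bar{m})^2 P(m)}$. From now on, we use P(m) and P(δ_i = m) interchangeably.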
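The large-n form of the degree distribution, with binomial losses at rate p/2 and Poisson gains at rate Kp (the limits α → p/2 and λ → Kp stated above), can be evaluated numerically. The sketch below is one possible way to do so in plain Python. The function names degree_pmf and degree_moments are hypothetical, the summation bounds max(0, 2K − m) ≤ x ≤ min(2K, n − 1 − m) are an assumption derived from requiring a non-negative number of degrees gained and a degree of at most n − 1, and the code assumes 0 < p ≤ 1.

```python
import math

def degree_pmf(m, n, K, p):
    """Approximate P(delta_i = m) for large n (assumes 0 < p <= 1)."""
    a = p / 2.0      # limiting probability of losing one degree
    lam = K * p      # limiting Poisson rate of degrees gained
    total = 0.0
    for x in range(max(0, 2 * K - m), min(2 * K, n - 1 - m) + 1):
        y = m - 2 * K + x  # degrees gained when x degrees are lost
        # Work in log space so large factorials do not overflow.
        log_loss = (math.lgamma(2 * K + 1) - math.lgamma(x + 1)
                    - math.lgamma(2 * K - x + 1)
                    + x * math.log(a) + (2 * K - x) * math.log(1.0 - a))
        log_gain = -lam + y * math.log(lam) - math.lgamma(y + 1)
        total += math.exp(log_loss + log_gain)
    return total

def degree_moments(n, K, p):
    """Mean and standard deviation of the approximate degree distribution."""
    probs = [degree_pmf(m, n, K, p) for m in range(n + 1)]
    mean = sum(m * q for m, q in enumerate(probs))
    var = sum((m - mean) ** 2 * q for m, q in enumerate(probs))
    return mean, math.sqrt(var)

# Example with the parameters of the network in Fig 1.
mean_degree, sd_degree = degree_moments(n=1000, K=8, p=0.04)
```

Under this approximation the expected losses 2K(p/2) and the expected gains Kp cancel, so the mean stays at 2K, while the variance is the sum of the binomial and Poisson variances, Kp(1 − p/2) + Kp.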

Fig 1 includes an actual WS network generated with parameters (n = 1,000, K = 8, p = 0.04). As shown in Fig 2, the degree distribution of the generated WS network is symmetric and closely estimated by Eq (6). Due to the symmetry of degrees in WS networks, our framework is intended for networks with symmetric degree distribution.
The example WS network in Fig 1 is not the only one whose node degrees are close to those estimated by Eq (6). We now demonstrate the statistical fit via the 8 parameter tuples formed from n = 5,000, 10,000, K = 50, 75 and p = 0.01, 0.05. For each tuple of (n, K, p), 100 WS networks were generated and their node degrees were recorded for two tests. First, a chi-square test was performed for each of the 800 WS networks between the actual node degrees and the estimated values (nP(δ_i = m)) given by Eq (6). None of the 800 tests was significant at the significance level of α = 5%, and these results corroborate our observation that Eq (6) closely estimates the degree distribution of a WS network. Thus, given that the estimates of n, K and p are accurate, the resulting estimates of CC and DS would also be accurate. It is promising to use our method to estimate K and p from a sample and then evaluate network metrics such as CC and DS of the corresponding WS network.
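A chi-square comparison between observed degree counts and the expected counts nP(δ_i = m) can be sketched as follows. The text does not state how sparse degree classes were handled or how the degrees of freedom were set, so the pooling rule (merge adjacent classes until the expected count reaches 5) and the choice of degrees of freedom (number of pooled classes minus 1) are assumptions of this sketch, as is the function name chi_square_statistic.

```python
def chi_square_statistic(observed, expected, min_expected=5.0):
    """Pearson chi-square statistic and degrees of freedom after pooling.

    observed[m] is the number of nodes with degree m and expected[m] is the
    model count for degree m. Adjacent classes are pooled until each pooled
    class has an expected count of at least min_expected (an assumed rule).
    """
    bins = []
    obs_acc = exp_acc = 0.0
    for obs, exp in zip(observed, expected):
        obs_acc += obs
        exp_acc += exp
        if exp_acc >= min_expected:  # class is large enough, close it
            bins.append((obs_acc, exp_acc))
            obs_acc = exp_acc = 0.0
    if obs_acc > 0 or exp_acc > 0:   # fold the leftover tail into the last class
        if bins:
            last_obs, last_exp = bins.pop()
            bins.append((last_obs + obs_acc, last_exp + exp_acc))
        else:
            bins.append((obs_acc, exp_acc))
    stat = sum((o - e) ** 2 / e for o, e in bins if e > 0)
    dof = len(bins) - 1              # assumption: no parameters fitted to the data
    return stat, dof

# Toy counts for illustration only (not data from the paper).
stat, dof = chi_square_statistic([12, 18, 30], [10, 20, 30])
```

The statistic would then be compared with the chi-square critical value for dof degrees of freedom at the 5% level; a value below the critical value means the fit is not rejected.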
Given that n is known or estimated, we propose an algorithm below to find estimates $\hat{K}$ and $\hat{p}$ for the population values of K and p, respectively. Let S = {(i, δ_i); i = 1, . . ., s} be a set of s individuals sampled from a WS network, where individual i has a degree of δ_i. Since the total number of arcs remains the same after rewiring, K is estimated from the sample mean degree as $\hat{K} = (\sum_{i=1}^{s} \delta_i)/(2s)$, and the sample standard deviation of the sampled degrees is denoted by $\hat{\sigma}$. Then we perform a search for $\hat{p}$ until the standard deviation $\sigma(\hat{p}) = \sqrt{\sum_{m=0}^{n} (m - \bar{m})^2 P(m)}$, where $\bar{m} = \sum_{m=0}^{n} m P(m)$, calculated from Eq (6) gets close enough to $\hat{\sigma}$. Our algorithm is based on the key observation that, as the rewiring probability increases in the WS procedure, the variation of degrees also increases in the resulting network. Thus, given $\sigma(\hat{p})$ from Eq (6), we find $\hat{p}$ in a "reverse" manner.
Algorithm 1
while $\sigma(\hat{p}) \notin [\hat{\sigma} - \epsilon, \hat{\sigma} + \epsilon]$ do
    If $\sigma(\hat{p}) > \hat{\sigma}$, let $p_u = \hat{p}$. Else, let $p_l = \hat{p}$.
    Let $\hat{p} = (p_l + p_u)/2$ and use Eq (6) to calculate $\bar{m}$ and $\sigma(\hat{p})$.
end while

From each network, s = 100 nodes were randomly sampled along with their degrees to calculate $\hat{p}$ from Algorithm 1. The closer the labels are to the diagonal, the more accurately the estimated values match the actual ones. In Fig 4, we normalized all estimated values between 0 and 1 to plot them together. Closer matches between the estimated and actual values are observed for larger networks with n = 20,000. The variations around the diagonal seem to be consistent with those in Fig 3, but are exacerbated slightly due to the use of $\hat{p}$ as an estimate of p.
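The bisection search of Algorithm 1 can be sketched in plain Python as below. Several details are assumptions of this sketch rather than statements from the text: the initial bounds p_l = 0 and p_u = 1, the tolerance eps, the iteration cap max_iter, the rounding of the estimate of K to an integer, and the s − 1 denominator in the sample standard deviation. The names model_std and estimate_K_p are hypothetical. The model standard deviation uses the large-n degree distribution with binomial losses at rate p/2 and Poisson gains at rate Kp.

```python
import math

def model_std(n, K, p):
    """Standard deviation of the large-n degree distribution (0 < p <= 1)."""
    a, lam = p / 2.0, K * p
    probs = []
    for m in range(n + 1):
        total = 0.0
        for x in range(max(0, 2 * K - m), min(2 * K, n - 1 - m) + 1):
            y = m - 2 * K + x  # degrees gained when x degrees are lost
            log_term = (math.lgamma(2 * K + 1) - math.lgamma(x + 1)
                        - math.lgamma(2 * K - x + 1)
                        + x * math.log(a) + (2 * K - x) * math.log(1.0 - a)
                        - lam + y * math.log(lam) - math.lgamma(y + 1))
            total += math.exp(log_term)
        probs.append(total)
    mean = sum(m * q for m, q in enumerate(probs))
    return math.sqrt(sum((m - mean) ** 2 * q for m, q in enumerate(probs)))

def estimate_K_p(degrees, n, eps=1e-3, p_lo=0.0, p_hi=1.0, max_iter=60):
    """Estimate (K, p) from sampled node degrees by bisection on p."""
    s = len(degrees)
    mean = sum(degrees) / s
    K_hat = round(mean / 2)  # assumption: K is rounded to an integer
    sd_hat = math.sqrt(sum((d - mean) ** 2 for d in degrees) / (s - 1))
    p_hat = (p_lo + p_hi) / 2
    for _ in range(max_iter):
        sd = model_std(n, K_hat, p_hat)
        if abs(sd - sd_hat) <= eps:   # close enough to the sample value
            break
        if sd > sd_hat:
            p_hi = p_hat              # model varies too much: lower p
        else:
            p_lo = p_hat              # model varies too little: raise p
        p_hat = (p_lo + p_hi) / 2
    return K_hat, p_hat

# Toy sample for illustration only (not data from the paper).
sample_degrees = [19] * 50 + [21] * 50
K_hat, p_hat = estimate_K_p(sample_degrees, n=400)
```

Bisection is applicable because the model standard deviation increases monotonically with p on (0, 1], which matches the key observation that degree variation grows with the rewiring probability.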
In Fig 5, we measure the accuracy of $\hat{p}$ calculated by Algorithm 1. For each combination of n = 10,000, 20,000, K = 80 and s/n (percentage of nodes sampled) = 1%, 3%, 5%, we generated 30 WS networks with a randomly chosen p ∈ [0.05, 0.2] and calculated $\hat{p}$ from Algorithm 1. Then, for each combination, we calculated the sample mean of the 30 ratios $\hat{p}/p$. A ratio of 1 represents an exact match between $\hat{p}$ and p. As the sample size increases from 1% to 5% of nodes, the mean ratio $\hat{p}/p$ approaches 1. Also, as in Fig 4, higher accuracy is observed for larger networks with n = 20,000. For each percentage of nodes sampled (s/n = 1%, 3%, 5%), the confidence interval for n = 20,000 is narrower than that for n = 10,000, while both confidence intervals contain $\hat{p}/p = 1$. Table 1 summarizes the sample means and 95% confidence intervals (in the format of sample mean ± margin of error) in Fig 5. A margin of error is calculated as $t_{0.025}(\sigma_r/\sqrt{30})$, where σ_r is the sample standard deviation of the 30 ratios. Again, for each percentage of nodes sampled (s/n = 1%, 3%, 5%), the margins of error for n = 20,000 are smaller, resulting in narrower confidence intervals. Thus, our method is more accurate for larger networks (e.g., large-scale social networks).
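The margin of error described above, the t critical value times the sample standard deviation of the ratios divided by the square root of their number, can be computed as in the sketch below. The function name mean_with_margin is hypothetical, the critical value 2.045 is the standard two-sided 95% value of Student's t with 29 degrees of freedom (appropriate for 30 ratios), and the ratios in the example are toy values for illustration, not results from the paper.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 29 degrees of freedom.
T_0025_DF29 = 2.045

def mean_with_margin(ratios, t_crit=T_0025_DF29):
    """Return (sample mean, margin of error) for a list of ratios.

    The default t_crit is only appropriate for 30 ratios; pass another
    critical value for a different number of ratios.
    """
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)  # sample standard deviation (n - 1 denominator)
    margin = t_crit * sd / math.sqrt(len(ratios))
    return mean, margin

# Toy ratios for illustration only (not data from the paper).
toy_ratios = [0.9, 1.1] * 15
mean_ratio, margin = mean_with_margin(toy_ratios)
interval = (mean_ratio - margin, mean_ratio + margin)
```

An interval that contains 1 indicates that the estimated rewiring probability is consistent, on average, with the true one.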

Conclusion
We have presented a method to construct a Watts-Strogatz network using a sample from a small-world network with symmetric degree distribution. Our method yields an estimated degree distribution which fits closely with that of a WS network and allows the population to be characterized with accurate estimates of network metrics such as clustering coefficient and degree of separation. This is particularly useful when sufficient information on the population is not available due to limited resources and time. As observed, our method is more accurate for larger networks.
An obvious limitation of our method is its reliance on a symmetric degree distribution, and we admit that many real networks have skewed degree distributions. Applications of our method are also limited to networks revealing strong small-world properties which can be well represented by a WS model, since our method is formulated based on the WS rewiring procedure. For a real network with either a non-symmetric degree distribution or weak small-world properties, we still hope that our method serves as a building block for potential revisions or extensions.
Replicating a large network from a sample allows further experiments on a generated network for more insights on its structure and dynamics. This would be feasible if some fundamental properties (e.g., small-world properties) of the network are identified and formulated in an analytical model (e.g., the WS model). Then, key parameters of the model can be estimated and a full-scale network can be constructed.