Structural Properties and Complexity of a New Network Class: Collatz Step Graphs

In this paper, we introduce a biologically inspired model to generate complex networks. In contrast to many other construction procedures for growing networks introduced so far, our method generates networks from one-dimensional symbol sequences that are related to the so-called Collatz problem from number theory. The major purpose of the present paper is, first, to derive a symbol sequence from the Collatz problem, which we call the step sequence, and to investigate its structural properties. Second, we introduce a construction procedure for growing networks that is based on these step sequences. Third, we investigate the structural properties of this new network class, including the finite scaling and asymptotic behavior of their complexity, average shortest path lengths and clustering coefficients. Interestingly, in contrast to many other network models, including the small-world network of Watts & Strogatz, we find that the resulting Collatz step (CS) graphs become 'smaller' with increasing size.


Introduction
The analysis of networks is a thriving field that spans many disciplines, ranging from biology, mathematics and statistics to the social sciences [1][2][3][4][5][6]. Starting with the study of random networks [7], the interest of the community has shifted in recent years to so-called complex networks [8,9]. In contrast to random networks, which show a Poisson distribution of node degrees, many complex networks exhibit a power-law degree distribution [8,10]. The attention paid to complex networks can be at least partly attributed to the fact that they appear to be omnipresent in nature. This makes such networks interesting not only from a theoretical but also from a practical point of view [11][12][13].
The major purpose of the present paper is threefold. First, we derive a symbol sequence from the Collatz problem, which we call a step sequence. The Collatz problem [14,15] stems from number theory and refers to the conjecture that, starting from any natural number, the iterative application of a certain mathematical function always leads to the number 1, possibly via intermediate natural numbers. Hence, the Collatz problem leads to the generation of one-dimensional symbol sequences of natural numbers that all end at 1. Second, we introduce a new construction procedure for growing networks that is based on the step sequences from the Collatz problem. For this reason we call the resulting networks CS (Collatz step) graphs. This simultaneously defines a new class of complex networks. Third, we investigate the structural complexity and the scaling behavior of step sequences and CS graphs, including an estimate of the asymptotic complexity of CS graphs.
In contrast to many other generation procedures for growing networks introduced so far [2,16-18], our method constructs networks from one-dimensional symbol sequences that are related to the Collatz problem [14,15]. We would like to emphasize that we are not the first to map one-dimensional objects to networks. For instance, this has been done for time series data in [19][20][21][22][23][24], and prime-number-related networks have been constructed in [25,26]. An even older example of such a construction principle can be found in [15], where a so-called Collatz graph is constructed. In this paper, we build on the work in [15], but construct networks from a different type of sequence that can be derived from the Collatz problem. Interestingly, we will show that Collatz graphs and CS graphs are entirely different, and we will argue that this is related to the differences in the underlying sequences, in particular to the difference in their complexity.
In this context it is interesting to note that the conjecture stated by the Collatz problem is, to date, mathematically unproven. However, it has been verified numerically for natural numbers up to $5.7646 \times 10^{18}$ [27]. The fact that such an intricate problem has so far resisted a proof suggests that it is capable of generating symbol sequences that are truly complex.
The motivation for our network model is based on the working mechanism of a biological cell. In a cell, a linear symbol sequence, the DNA, is transcribed into mRNAs, which are then translated into proteins that form protein interaction, signaling or other types of gene networks [28][29][30]. That means our construction procedure, which may appear unconventional at first compared to well-known mathematical mechanisms for generating growing networks [17,18], is in fact employed by nature. In addition, we note that protein interaction networks and signaling networks can be considered complex not only because they exhibit a power-law degree distribution, but also because these networks form an integral part of the functioning of living cells.
Further, studies have shown that the DNA sequence itself can also be considered a complex symbol sequence [31,32]. This suggests that the complexity of the DNA sequence carries over to the complexity of gene networks. In this context it appears plausible to assume that not every arbitrary DNA sequence leads to complex (gene) networks, but that the DNA sequence needs to be complex itself. From an abstract point of view, we base our network model on these observations by exploiting the mechanism of its biological counterpart. Specifically, we use symbol sequences that are related to the Collatz problem [14,15] as the starting point for our model.
This paper is organized as follows. In the next section, we introduce all mathematical definitions and preliminaries we need for our analysis. Further, we define step sequences, CS graphs and our procedure to generate growing networks. In the Results section, we present our analysis of step sequences and CS graphs, studying their complexity and scaling behavior. Further, we provide estimates for the asymptotic complexity, average shortest path lengths and clustering coefficients of CS graphs. The paper ends with a discussion and conclusions.

Methods
In this section we introduce the basic definitions and notations we need to introduce our network model. This includes a brief description of the Collatz problem as far as it is necessary for our analysis.

Basic Definitions: Collatz Problem, Sequence and Graph
A Collatz sequence is defined for every natural number $n \in \mathbb{N}$ by the iterative application of the following mapping:

$$T(n) = \begin{cases} 1 & \text{if } n = 1,\\ 3n+1 & \text{if } n \text{ odd and } n > 1,\\ n/2 & \text{if } n \text{ even.} \end{cases} \qquad (1)$$

For example, for $n = 3$ we obtain the sequence $C_3 = 3 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1$, which is called the Collatz sequence for $n = 3$. That means the iterative application of Eqn. 1 maps the natural number $n = 3$ to 1 after $t = 7$ steps, i.e., $T^7(3) = (T \circ T \circ T \circ T \circ T \circ T \circ T)(3) = 1$. Further examples for the first 20 natural numbers can be found in Fig. 1. Any further application of Eqn. 1 cannot lead to other results because 1 is a fixed point of the mapping. If one considers the natural numbers as states of the transformation, the above sequence can be visualized in the state space by consecutive mappings between adjacent states. A visualization of the state space for the first 20 Collatz sequences is shown in Fig. 1. Because the state space is discrete, it can be represented as a network. This network has been termed the Collatz graph [15].
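The mapping T of Eqn. 1, the Collatz sequence $C_n$ and the edges of the Collatz graph can be sketched in a few lines of Python. This is a minimal illustration under our own naming, not the authors' R implementation. The function names `collatz_map`, `collatz_sequence` and `collatz_graph_edges` are hypothetical.

```python
def collatz_map(n: int) -> int:
    """The mapping T of Eqn. 1; 1 is treated as a fixed point."""
    if n < 1:
        raise ValueError("T is only defined for natural numbers n >= 1")
    if n == 1:
        return 1
    if n % 2 == 1:
        return 3 * n + 1
    return n // 2


def collatz_sequence(n: int) -> list:
    """Iterate T from n until the fixed point 1 is reached (the sequence C_n)."""
    seq = [n]
    while seq[-1] != 1:
        seq.append(collatz_map(seq[-1]))
    return seq


def collatz_graph_edges(max_n: int) -> set:
    """Directed edges (a, T(a)) of the state space spanned by C_1, ..., C_max_n."""
    edges = set()
    for n in range(1, max_n + 1):
        seq = collatz_sequence(n)
        edges.update(zip(seq, seq[1:]))  # consecutive states are linked
    return edges


print(collatz_sequence(3))  # [3, 10, 5, 16, 8, 4, 2, 1]
```

The number of states in the union $|C_1 \cup \dots \cup C_{20}|$ is then simply the number of distinct values that occur in these sequences.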
We would like to note that the number of elements in the state space of a single sequence corresponds to the number of natural numbers that are traversed from the initial natural number n to 1. However, we would like to emphasize that these elements do not necessarily correspond to the consecutive natural numbers 1, 2, 3, ..., n. If the state space of a set of sequences, e.g., $\{C_1, \dots, C_{20}\}$, is considered, then the number of elements in this state space is the size of the union of the elements of the individual sequences, i.e., $|C_1 \cup \dots \cup C_{20}|$. It is interesting to note that two types of states can be naturally distinguished from each other. The first type consists of the natural numbers n that were used as initial values to generate a Collatz sequence $C_n$, whereas the second type consists of states with values $> n$. To visualize this, Fig. 1 shows states of the first type in gray and states of the second type in orange.
Based on the mapping in Eqn. 1, Lothar Collatz conjectured in 1937 that every natural number $n \in \mathbb{N}$ is eventually mapped to 1 by its iterative application. This is also known as the 3n+1 conjecture or the Syracuse problem [33,34]. To date, this conjecture remains mathematically unproven; however, it has been verified numerically up to $5.7646 \times 10^{18}$ [27]. In this paper we are not concerned with a proof of this conjecture, but with the properties of the step sequence that can be derived from the Collatz problem, as discussed in the next section.

Step Sequence and CS Graph
In addition to the state space generated by the application of T(n), there are other symbol sequences one can obtain that are also based on T(n). In the following, we derive such a symbol sequence.
To define our new symbol sequence properly, we need to introduce two functions, l and H. The first function, $l: \mathbb{N} \to \mathbb{N}_0$, maps a natural number n to the number of iteration steps t it takes the mapping T to map n to 1, i.e., $l(n) = \min\{t \geq 0 : T^t(n) = 1\}$; in particular, $l(1) = 0$ and $l(3) = 7$. For this reason, we call l the step function.
Based on T and l, we define a second mapping $H: \mathbb{N} \to \mathbb{N}^{n-1}$ by

$$H(n; T) = \big(l(2), \dots, l(n-1), l(n)\big) \quad \text{for } n > 1.$$

That means H is a vector-valued function whose components are given by $H_i(n) = l(i+1)$ for $i \in \{1, \dots, n-1\}$. The function H(n; T) generates symbol sequences of length n-1 whose elements assume values in $\mathbb{N}$. For example, H(5; T) generates the sequence (1, 7, 2, 5). Further examples can be found in Fig. 2. We call a symbol sequence generated by H(n; T) a step sequence because the value of each component i of this sequence corresponds to the number of times the mapping T needs to be applied to map i+1 to 1. In the following, we write H(n; T) briefly as H(n).
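The step function l and the step sequence H(n) can be sketched as follows. This sketch assumes a plain dictionary cache (memoization), so that trajectories that have already been resolved are not walked again; the names `step_function` and `step_sequence` are hypothetical and do not refer to the authors' R package.

```python
# Cache of already resolved step counts; l(1) = 0 because 1 is the fixed point of T.
_STEP_CACHE = {1: 0}


def step_function(n: int) -> int:
    """l(n): number of applications of T (Eqn. 1) needed to map n to 1."""
    if n < 1:
        raise ValueError("l is only defined for natural numbers n >= 1")
    path, m = [], n
    # Walk the Collatz trajectory until a value with a known step count is reached.
    while m not in _STEP_CACHE:
        path.append(m)
        m = 3 * m + 1 if m % 2 == 1 else m // 2
    steps = _STEP_CACHE[m]
    # Unwind backwards: each earlier value needs one more step than its successor.
    for value in reversed(path):
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


def step_sequence(n: int) -> list:
    """H(n) = (l(2), ..., l(n)), a sequence of length n - 1 (requires n > 1)."""
    if n < 2:
        raise ValueError("H(n) is defined for n > 1")
    return [step_function(m) for m in range(2, n + 1)]


print(step_sequence(5))  # [1, 7, 2, 5]
```

The cache stores every intermediate value visited, which trades memory for speed; this is a design choice of the sketch, not something the paper prescribes.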
A step sequence H(n) has a natural representation in the form of a network. More precisely, let the distinct elements of a step sequence correspond to the vertices V of a network, and let the edges E be defined by the consecutive subsequences of length 2 (i.e., pairs of adjacent elements) of the step sequence.

Definition: CS Graph
We define a Collatz step graph, briefly called CS graph and denoted by $G_{CS}(n)$, for a step sequence H(n) in the following way:

$$V = \{H_i(n) : i \in \{1, \dots, n-1\}\},$$
$$E = \{(m, k) : m = H_i(n) \text{ and } k = H_{i+1}(n) \text{ for some } i \in \{1, \dots, n-2\}\},$$
$$w_{mk} = \big|\{i \in \{1, \dots, n-2\} : H_i(n) = m \text{ and } H_{i+1}(n) = k\}\big|.$$

The networks defined in this way are directed and weighted, and the edge weights assume values in $\mathbb{N}_0$, where $w_{mk}$ corresponds to the number of times state k follows state m in the step sequence H(n). On a mathematical note, because a CS graph is constructed from a step sequence H(n), the resulting network is indexed by n. That means for each $n > 1$ one obtains a CS graph $G_{CS}(n)$.
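The definition of $G_{CS}(n)$ translates directly into counting adjacent pairs of the step sequence. The sketch below stores only edges with $w_{mk} > 0$ in a `collections.Counter`; the helper names (`step_function`, `step_sequence`, `cs_graph`) are hypothetical.

```python
from collections import Counter

_STEP_CACHE = {1: 0}  # l(1) = 0: 1 is the fixed point of T


def step_function(n):
    """l(n): number of applications of T needed to map n to 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    path, m = [], n
    while m not in _STEP_CACHE:  # walk until a known step count is reached
        path.append(m)
        m = 3 * m + 1 if m % 2 else m // 2
    steps = _STEP_CACHE[m]
    for value in reversed(path):  # each earlier value needs one more step
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


def step_sequence(n):
    """H(n) = (l(2), ..., l(n))."""
    return [step_function(m) for m in range(2, n + 1)]


def cs_graph(n):
    """Return (V, w) for G_CS(n); w[(m, k)] counts how often k follows m in H(n)."""
    h = step_sequence(n)
    vertices = set(h)
    weights = Counter(zip(h, h[1:]))  # only edges with w_mk > 0 are stored
    return vertices, weights


vertices, weights = cs_graph(20)
print(len(vertices), len(weights))  # 15 18
```

For n = 20, the 19 elements of H(20) take 15 distinct values and produce 18 transitions, all of which happen to be distinct edges of weight 1.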
In Fig. 2 we visualize $G_{CS}(n = 20)$ for the step sequence

$$H(n = 20) = (1, 7, 2, 5, 8, 16, 3, 19, 6, 14, 9, 9, 17, 17, 4, 12, 20, 20, 7), \qquad (7)$$

shown at the bottom of Fig. 2. Because the vertices in this network correspond to the number of steps needed to map a natural number to 1, rather than to the natural numbers n themselves or to the intermediate numbers as in the Collatz sequence, we emphasize this difference in the meaning of the elements of a CS graph by a different node color compared to the Collatz graph in Fig. 1. In Fig. 9 we show more complex CS graphs for $n = 10^3$ (left) and $n = 10^4$ (right). These networks consist of 141 and 287 nodes and of 356 and 3201 edges, respectively. Descriptively, a CS graph can be constructed by traversing a step sequence from its first to its last element, whose values correspond to the vertices of the network, and by connecting consecutive elements with an edge. If an element of the step sequence has already appeared at an earlier step, no new vertex is added, only an edge to the existing vertex.

Growing Network Model
Using the definitions from the previous section, one can alternatively formulate a growing network model for CS graphs, called GMCS. That means this model grows a CS graph $G_{CS}(n)$ as a function of the natural number n. Algorithm 1 describes this model formally. We formulate the GMCS in terms of the step sequence H; however, we would like to note that an equivalent formulation can …
Interestingly, in contrast to many other models for growing networks, e.g., random networks or scale-free networks [2,16-18], the construction principle of CS graphs is different: a one-dimensional symbol sequence, given by H(n), determines the growth of the network. In the Results section, we investigate structural aspects of both the step sequence and the resulting CS graphs. Another difference between our model and, e.g., [2,16-18] is that for a fixed n one always obtains exactly the same graph $G_{CS}(n)$. This is due to the fact that the underlying step function is deterministic. However, we will demonstrate that the generated networks nevertheless exhibit a remarkable complexity.
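The growth principle can be sketched as an object that adds one natural number at a time: increasing n by one appends the element $l(n)$ to the step sequence, adds it as a vertex if it is new, and adds (or reinforces) one edge from the previous element. This is a hypothetical sketch of the idea behind GMCS, not a transcription of Algorithm 1 or of the authors' R package; the class `GrowingCSGraph` is our own naming.

```python
from collections import Counter

_STEP_CACHE = {1: 0}  # l(1) = 0: 1 is the fixed point of T


def step_function(n):
    """l(n): number of applications of T needed to map n to 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    path, m = [], n
    while m not in _STEP_CACHE:
        path.append(m)
        m = 3 * m + 1 if m % 2 else m // 2
    steps = _STEP_CACHE[m]
    for value in reversed(path):
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


class GrowingCSGraph:
    """Grow G_CS(n) one natural number at a time (hypothetical sketch)."""

    def __init__(self):
        self.n = 1                # H(n) needs n > 1, so the graph starts empty
        self.vertices = set()
        self.weights = Counter()  # (m, k) -> w_mk
        self._last = None         # most recently appended element of H(n)

    def grow(self):
        """Increase n by one: append l(n) to the step sequence."""
        self.n += 1
        value = step_function(self.n)
        if self._last is not None:
            self.weights[(self._last, value)] += 1  # repeated transitions raise the weight
        self.vertices.add(value)  # no new vertex if the value already occurred
        self._last = value

    def grow_to(self, n):
        while self.n < n:
            self.grow()
        return self


g = GrowingCSGraph().grow_to(20)
print(len(g.vertices), sum(g.weights.values()))  # 15 18
```

Because l is deterministic, growing to n always yields the same graph as building $G_{CS}(n)$ from H(n) in one pass.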
Availability: An R implementation of GMCS is available from The Comprehensive R Archive Network (CRAN; http://cran.r-project.org/).

Results
In the following, we study, first, characteristics of the step sequence and its scaling behavior. Then we investigate structural features of CS graphs, their scaling behavior and the complexity of these networks.

Properties and Scaling of the Step Sequence
Structural Properties of Collatz Step Graphs PLOS ONE | www.plosone.org
To quantify the behavior of the step sequence H, we first calculate the autocorrelation function R for a step sequence of length $n = 10^7$. Fig. 3 A shows R in a double logarithmic plot. One can see that up to a lag of 10000 there is still a relatively high correlation (higher than expected for a random sequence) between the sequence and its shifted copy, indicating the presence of a long-term memory. The presence of a long-term memory in a symbol sequence is usually regarded as an indicator of the complexity of the sequence [35,36]. We quantify this observation by a linear regression of the logarithm of the autocorrelation function on the logarithm of the lag,

$$\log R(\text{lag}) = c_1 \log(\text{lag}) + c_0,$$

with $c_1 = -0.327$ and $c_0 = -0.327$. The scale-free (power-law) nature of the autocorrelation function R provides quantitative evidence for the complex nature of the step sequence H.
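The autocorrelation analysis can be sketched as follows. The paper does not state which estimator of R or which logarithm base was used. This sketch assumes the standard normalized sample autocorrelation and base-10 logarithms (the base only changes the intercept $c_0$, not the slope $c_1$). The function names are hypothetical, and pure-Python loops are only practical for short sequences, not for $n = 10^7$.

```python
import math

_STEP_CACHE = {1: 0}  # l(1) = 0: 1 is the fixed point of T


def step_function(n):
    """l(n): number of applications of T needed to map n to 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    path, m = [], n
    while m not in _STEP_CACHE:
        path.append(m)
        m = 3 * m + 1 if m % 2 else m // 2
    steps = _STEP_CACHE[m]
    for value in reversed(path):
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


def autocorrelation(x, max_lag):
    """Normalized sample autocorrelation R(k) for k = 0, ..., max_lag, so R(0) = 1."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    variance = sum(c * c for c in centered)
    return [sum(centered[i] * centered[i + k] for i in range(n - k)) / variance
            for k in range(max_lag + 1)]


def loglog_fit(lags, values):
    """Least-squares line log10(R) = c1 * log10(lag) + c0 over lags >= 1 with R > 0."""
    pts = [(math.log10(k), math.log10(r)) for k, r in zip(lags, values) if k >= 1 and r > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points with positive lag and R")
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    sxy = sum((px - mx) * (py - my) for px, py in pts)
    c1 = sxy / sxx
    return c1, my - c1 * mx


h = [step_function(m) for m in range(2, 2001)]  # short step sequence H(2000)
r = autocorrelation(h, 100)
c1, c0 = loglog_fit(range(len(r)), r)
```

Lags at which the sample autocorrelation is not positive are skipped, because their logarithm is undefined.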
Another indicator of the complexity of H can be revealed with the help of two histograms obtained for the even and odd elements of the step sequence. More precisely, for an even natural number n we define two subsequences of H(n), $H_e$ and $H_o$, consisting of its even and odd elements, respectively. The histograms for $H_e$ and $H_o$ are shown in Fig. 3 B. Both histograms can be clearly distinguished from each other, indicating subtle differences between the odd and even elements of H. A possible explanation for this effect is the asymmetric construction of the Collatz sequence based on T(n), given in Eqn. 1, because this mapping treats even and odd values of n differently. However, this behavior is not trivial, because the asymmetry between even and odd sequence elements is not present in the autocorrelation function R if the sequences $H_e$ and $H_o$ are used for its calculation (results not shown).
The last property of H we study is the scaling behavior of the mean number of steps, $\bar{t}(n; T)$, it takes the mapping T to converge, i.e., to reach its fixed point. More precisely, we define

$$\bar{t}(n; T) = \frac{1}{n-1} \sum_{i=1}^{n-1} H_i(n)$$

and perform a linear regression of $\bar{t}$ on the logarithm of the length n of the step sequence, $\bar{t} \approx c \log(n)$. This gives a scaling factor of c = 23.8 with a p-value below $10^{-9}$. It follows that the scaling of $\bar{t}$ is well approximated by a logarithmic growth in n, which means that even for natural numbers as large as $n = 10^6$, the mapping T converges on average in only about $\bar{t}(n; T) = 150$ iteration steps.
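The mean step count and the scaling factor c can be sketched as below. Two assumptions are made: the model $\bar{t} \approx c \log(n)$ is fitted without an intercept (a least-squares regression through the origin, as the formula is written), and the logarithm is taken to base 10, which is the reading under which c = 23.8 is consistent with roughly 150 steps at $n = 10^6$. No p-value is computed here, and the helper names are hypothetical.

```python
import math

_STEP_CACHE = {1: 0}  # l(1) = 0: 1 is the fixed point of T


def step_function(n):
    """l(n): number of applications of T needed to map n to 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    path, m = [], n
    while m not in _STEP_CACHE:
        path.append(m)
        m = 3 * m + 1 if m % 2 else m // 2
    steps = _STEP_CACHE[m]
    for value in reversed(path):
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


def mean_steps(n):
    """t_bar(n; T): arithmetic mean of H(n) = (l(2), ..., l(n))."""
    values = [step_function(m) for m in range(2, n + 1)]
    return sum(values) / len(values)


def fit_log_scaling(ns):
    """Least-squares c for t_bar(n) ~ c * log10(n), i.e. a regression through the origin."""
    xs = [math.log10(n) for n in ns]
    ys = [mean_steps(n) for n in ns]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


c = fit_log_scaling([10**3, 10**4, 10**5])
print(round(c, 1))
```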

Complexity and Scaling of CS Graphs
Now we turn to the structural properties of CS graphs. We start by studying the scaling of the edge weights. The results for three CS graphs obtained for $n \in \{10^4, 10^5, 10^6\}$ (green, red, blue) are shown in the double logarithmic plot in Fig. 4. More precisely, this figure shows the frequency distribution of the edge weights of the CS graphs. Here, the edge weights correspond to the number of times the edges have been traversed when walking the step sequence from its first to its last element, as defined in Eqn. 6. All three networks asymptotically follow a power law with nearly the same exponent of -0.92. Further, increasing n affects only the range of the power-law behavior, not its exponent.
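The frequency distribution of edge weights, and a log-log slope for it, can be sketched as follows. The paper reports the exponent of the asymptotic tail; for brevity, this sketch fits ordinary least squares over all observed weights. Restricting the fit to the tail would be a straightforward change. The helper names are hypothetical.

```python
import math
from collections import Counter

_STEP_CACHE = {1: 0}  # l(1) = 0: 1 is the fixed point of T


def step_function(n):
    """l(n): number of applications of T needed to map n to 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    path, m = [], n
    while m not in _STEP_CACHE:
        path.append(m)
        m = 3 * m + 1 if m % 2 else m // 2
    steps = _STEP_CACHE[m]
    for value in reversed(path):
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


def edge_weight_distribution(n):
    """Map each edge weight w to the number of edges of G_CS(n) carrying that weight."""
    h = [step_function(m) for m in range(2, n + 1)]
    weights = Counter(zip(h, h[1:]))
    return Counter(weights.values())


def power_law_exponent(distribution):
    """Slope of log10(frequency) against log10(weight) by ordinary least squares."""
    pts = [(math.log10(w), math.log10(f)) for w, f in distribution.items()]
    if len(pts) < 2:
        raise ValueError("need at least two distinct weights")
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    return sum((px - mx) * (py - my) for px, py in pts) / sxx


dist = edge_weight_distribution(10**4)
print(power_law_exponent(dist))
```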
Next, we study the finite and asymptotic structural complexity of CS graphs. Many measures have been suggested over the past decades to quantify the complexity of networks [37][38][39][40][41][42][43]. However, there is currently no generally accepted gold standard comparable to the Kolmogorov complexity for symbol sequences [44][45][46]. For our specific context, the construction of CS graphs from the symbol sequence H, a recently introduced measure termed edge reduction [47] seems most appropriate. Edge reduction is based on so-called power graphs $G_P$, which are obtained from an underlying network G by grouping nodes and edges that are similar to each other. For example, if a vertex is connected in a star-like fashion to a group of other vertices, then this group of nodes appears as one node in the power graph, see Fig. 5 A. Another example is a bipartite connection between two groups of nodes, as shown on the right of Fig. 5 A.
The measure itself is defined by

$$e_r = \frac{\#E(G) - \#E(G_P)}{\#E(G)}.$$

Here $\#E(G)$ corresponds to the number of edges in network G and $\#E(G_P)$ is the number of edges in the power graph $G_P$. Hence, $e_r$ measures the relative reduction in the number of edges in the power graph compared to G. It is important to note that power graphs form a lossless representation of the original network G, which reduces the network complexity by explicitly representing recurring network substructures. The underlying construction principle of edge reduction is reminiscent of self-similarity, observed, e.g., in fractal structures [48,49]. For this reason we consider edge reduction an intuitively plausible quantification of the structural complexity of networks.
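The definition of $e_r$ can be illustrated on the two motifs of Fig. 5 A without computing a full power graph decomposition. Under the usual power-graph convention, a star with k leaves compresses to a single power edge from the center to a power node holding all leaves, and a complete bipartite graph $K_{a,b}$ compresses to a single power edge between two power nodes. The function names are hypothetical, and finding a power graph for an arbitrary network is outside the scope of this sketch.

```python
def edge_reduction(num_edges, num_power_edges):
    """e_r = (#E(G) - #E(G_P)) / #E(G)."""
    if num_edges <= 0:
        raise ValueError("G must have at least one edge")
    if not 0 <= num_power_edges <= num_edges:
        raise ValueError("a power graph cannot have more edges than G")
    return (num_edges - num_power_edges) / num_edges


def star_edge_reduction(k):
    """Star: k edges -> one power edge from the center to the power node of all leaves."""
    return edge_reduction(k, 1)


def biclique_edge_reduction(a, b):
    """K_{a,b}: a * b edges -> one power edge between two power nodes."""
    return edge_reduction(a * b, 1)


print(star_edge_reduction(5))  # 0.8
print(biclique_edge_reduction(3, 4))
```

The more recurring substructure a network contains, the closer $e_r$ gets to 1, whereas a network without any compressible motif keeps $e_r = 0$.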
To study the edge reduction of CS graphs numerically, we draw a random sample of 500 natural numbers $\{n_i\}_{i=1}^{500}$ from the interval $[10^5, 10^7]$ and construct their corresponding CS graphs $\{G_{CS}(n_i)\}_{i=1}^{500}$. Then we determine the edge reduction of each CS graph, $\{e_r(n_i)\}_{i=1}^{500}$.
Because the values of $e_r$ are restricted to the closed interval [0, 1] and increase with n, the edge reduction cannot grow without bound but needs to converge asymptotically for large values of n. For this reason we fit a logistic function [50,51],

$$e_r(n) = \frac{a}{1 + \exp\big(b - c \log(n)\big)}, \qquad (15)$$

to these values, modeling the growth of $e_r(n)$ with respect to n. The results are shown in Fig. 5 B. The parameters obtained from a nonlinear least-squares fit are shown in Table 1. All parameters are highly significant, as indicated by very low p-values. We use this result to predict the asymptotic edge reduction for $n \to \infty$. From Eqn. 15 we obtain, for $c > 0$,

$$e_r^\infty = \lim_{n \to \infty} e_r(n) = a. \qquad (16)$$
That means the limiting edge reduction of a CS graph is $e_r^\infty = 0.8362 \pm 0.0052$.
To contrast this result with random and highly structured (simple) networks, we also calculate the edge reduction for 500 random networks and 500 trees. The results are shown in Fig. 6 A and B. The edge reduction values obtained for random networks are much smaller, and those for trees much larger, than those for CS graphs. These results are intuitively plausible because random objects are difficult to compress, whereas simple objects compress easily. Also, it is generally observed that complex objects fall between random and simple objects [52], as is the case for CS graphs.
The second note we would like to make relates to the time scale it takes random networks and trees to reach a steady-state value. As one can see from Fig. 6 A and B, random networks and trees reach, at about 5000 nodes, values that fluctuate around their corresponding mean values, indicated as black lines. The CS graphs, in contrast, still show a tendency to increase their $e_r$ values for step sequences of length up to $10^7$. However, we would like to point out that the x-axes relate to different variables: whereas in Fig. 6 A and B the x-axes correspond to the number of nodes in a network, in Fig. 5 B the x-axis indexes the length of the step sequence. To relate both scales to each other for a fair comparison, we first estimate the length of the step sequence for which the edge reduction of CS graphs is close to its limiting value $e_r^\infty = 0.8362$. Using $e'_r = a - s_a = 0.831$ as such an approximation, because this value is within one standard deviation of the estimated limiting value, and inverting Eqn. 15, we obtain

$$\log(n') = \frac{b - \ln\left(\frac{a}{e'_r} - 1\right)}{c}, \qquad (17)$$

which corresponds to a step sequence of length $n' = 10^{11.882}$.
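The logistic model of Eqn. 15, its limit and its inversion can be sketched as below. The fitted values of b and c from Table 1 are not reproduced in this text, so the sketch uses the reported limit a = 0.8362 together with illustrative placeholder values for b and c (hypothetical, not the published fit), and it assumes that "log" in Eqn. 15 means $\log_{10}$.

```python
import math


def logistic(n, a, b, c):
    """Eqn. 15, assuming log = log10: e_r(n) = a / (1 + exp(b - c * log10(n)))."""
    return a / (1.0 + math.exp(b - c * math.log10(n)))


def inverse_logistic(e, a, b, c):
    """Solve Eqn. 15 for n, i.e. Eqn. 17; valid for 0 < e < a and c != 0."""
    if not 0.0 < e < a:
        raise ValueError("target must lie strictly between 0 and the limit a")
    return 10.0 ** ((b - math.log(a / e - 1.0)) / c)


# a is the reported limit; b and c are illustrative placeholders, not fitted values.
A, B, C = 0.8362, 5.0, 1.0
n_prime = inverse_logistic(A - 0.0052, A, B, C)  # e'_r = a - s_a = 0.831
print(math.log10(n_prime))
```

For c > 0 the exponential term vanishes as n grows, so `logistic` approaches a, matching Eqn. 16.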
Then, we perform a linear regression of the number of nodes N of CS graphs on the length n of the step sequence on a double logarithmic scale (see Fig. 5 C). The result is given by

$$\log_{10}(N) = c_1 \log_{10}(n) + c_0, \qquad (18)$$

with $c_1 = 0.154$ and $c_0 = 1.721$. From this, we estimate the number of nodes in a CS graph that results from a step sequence of length $n = 10^{11.882}$, as predicted above. Using Eqn. 18, $0.154 \times 11.882 + 1.721 \approx 3.5508$, which gives $N = 10^{3.5508} \approx 3555$ nodes. Interestingly, this number is slightly smaller than for the random networks and trees. However, the numbers of nodes of all three network types (random networks, trees and CS graphs) are of the same order of magnitude.
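The node-count estimate is a direct evaluation of Eqn. 18 with the reported coefficients. The following sketch (function name hypothetical) reproduces the arithmetic.

```python
def predicted_nodes(log10_n, c1=0.154, c0=1.721):
    """Eqn. 18 solved for N: log10(N) = c1 * log10(n) + c0, so N = 10**(c1 * log10(n) + c0)."""
    return 10.0 ** (c1 * log10_n + c0)


print(round(predicted_nodes(11.882)))  # 3555
```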
The next question we address relates to the connectivity structure of CS graphs. It is a common misconception that complex networks must show a power-law behavior in their degree distributions [8,10]. In Fig. 6 C and D we show the in- and out-degree distributions of a CS graph obtained for $n = 10^7$. As one can see, these distributions are not scale-free, although there is a very narrow region toward high degrees which might develop into a power law for increasing network size N. Hence, although many types of complex networks exhibit a scale-free degree distribution, this is not a necessary condition for a complex network. Another example are the well-known small-world networks [5,53], which also do not have a power-law degree distribution.
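The in- and out-degree histograms of a CS graph can be sketched as follows. The paper does not specify whether degrees are weighted. This sketch assumes unweighted degrees, i.e., the number of distinct predecessors and successors, with a self-loop counting once toward both degrees of its vertex. The helper names are hypothetical.

```python
from collections import Counter

_STEP_CACHE = {1: 0}  # l(1) = 0: 1 is the fixed point of T


def step_function(n):
    """l(n): number of applications of T needed to map n to 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    path, m = [], n
    while m not in _STEP_CACHE:
        path.append(m)
        m = 3 * m + 1 if m % 2 else m // 2
    steps = _STEP_CACHE[m]
    for value in reversed(path):
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


def degree_histograms(n):
    """Return (in_hist, out_hist): degree -> number of vertices of G_CS(n), weights ignored."""
    h = [step_function(m) for m in range(2, n + 1)]
    edges = set(zip(h, h[1:]))  # distinct directed edges
    out_deg = Counter(m for m, _ in edges)
    in_deg = Counter(k for _, k in edges)
    vertices = set(h)
    in_hist = Counter(in_deg[v] for v in vertices)  # Counter yields 0 for missing keys
    out_hist = Counter(out_deg[v] for v in vertices)
    return in_hist, out_hist


in_hist, out_hist = degree_histograms(10**5)
print(sorted(in_hist.items())[:5])
```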
Another interesting property of CS graphs is their structural connectivity pattern. In Fig. 7 we show different transformations of the adjacency matrix W of the CS graph $G_{CS}(n = 5 \times 10^4)$. Fig. 7 A shows the adjacency matrix W itself. Because a CS graph is a weighted network, we use different colors to emphasize different weights. Fig. 7 B shows a binary transformation $g_b$ of W, which maps non-zero elements to one and leaves zero elements unchanged, i.e.,

$$W_b = g_b(W), \qquad (W_b)_{mk} = \begin{cases} 1 & \text{if } w_{mk} > 0,\\ 0 & \text{otherwise.} \end{cases} \qquad (19)$$

The resulting network $W_b$ looks similar to W, which can be attributed to the fact that most edge weights of W are of comparable size. This is also supported by the exponential distribution of the in- and out-degrees, shown in Fig. 6.
In Fig. 7 C we emphasize the non-symmetric elements of W. That means we apply a transformation $g_s$ that treats the components $w_{mk}$ and $w_{km}$ as similar when they are equal and retains only the non-symmetric elements, $W' = g_s(W)$ (Eqn. 20). Then, we transform W' into a binary matrix $W'_b = g_b(W') = g_b(g_s(W))$, as described above. Fig. 7 C shows $W'_b$. If we consider a more general transformation than in Eqn. 20, which treats the components of W as similar when they are both larger than zero, but not necessarily equal, we obtain a matrix W''. Interestingly, the binary matrix $W''_b = g_b(W'')$ is also not a zero matrix, as can be seen in Fig. 7 D. This establishes that CS graphs are pseudo-symmetric in the sense that, strictly speaking, they are not symmetric but nevertheless appear to be symmetric, as can be seen in Fig. 7. Overall, our results suggest that the general connectivity structure of CS graphs is intricate and robust against different transformations.
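The matrix transformations can be sketched on a sparse representation of W (a dictionary of its non-zero entries). The form of $g_s$ used here, which drops $w_{mk}$ whenever $w_{mk} = w_{km}$, is our reading of the text and an assumption. The "more general" variant drops an entry whenever both $w_{mk}$ and $w_{km}$ are positive. All function names are hypothetical.

```python
from collections import Counter

_STEP_CACHE = {1: 0}  # l(1) = 0: 1 is the fixed point of T


def step_function(n):
    """l(n): number of applications of T needed to map n to 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    path, m = [], n
    while m not in _STEP_CACHE:
        path.append(m)
        m = 3 * m + 1 if m % 2 else m // 2
    steps = _STEP_CACHE[m]
    for value in reversed(path):
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


def cs_weights(n):
    """Sparse weighted adjacency W of G_CS(n): {(m, k): w_mk} for w_mk > 0."""
    h = [step_function(m) for m in range(2, n + 1)]
    return dict(Counter(zip(h, h[1:])))


def g_b(w):
    """Binary transformation (Eqn. 19): every non-zero entry becomes 1."""
    return {e: 1 for e, x in w.items() if x != 0}


def g_s(w):
    """Assumed form of Eqn. 20: keep w_mk only where w_mk != w_km (self-loops always drop)."""
    return {(m, k): x for (m, k), x in w.items() if x != w.get((k, m), 0)}


def g_general(w):
    """W'': treat w_mk and w_km as similar whenever both are > 0, and drop those entries."""
    return {(m, k): x for (m, k), x in w.items() if not (x > 0 and w.get((k, m), 0) > 0)}


W = cs_weights(5 * 10**4)
print(len(g_b(W)), len(g_b(g_s(W))), len(g_b(g_general(W))))
```

A non-empty result for `g_b(g_general(W))` is what the text refers to as the matrix $W''_b$ not being a zero matrix.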

Average Shortest Path Lengths and Clustering Coefficients
Next, we investigate the scaling of the average shortest path length L of CS graphs. Here, the average shortest path length is defined as the average over the shortest paths between all pairs of nodes in a network. We perform this analysis for undirected and directed CS graphs, where the undirected CS graphs are obtained from the directed ones by neglecting the directionality of the edges.
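The average shortest path length can be sketched with a breadth-first search from every node, treating the CS graph as unweighted. Two assumptions are made: self-loops are dropped (they never shorten a path), and in the directed case pairs (u, v) with v unreachable from u are excluded from the average. The paper does not state how unreachable pairs were handled. The helper names are hypothetical.

```python
from collections import deque

_STEP_CACHE = {1: 0}  # l(1) = 0: 1 is the fixed point of T


def step_function(n):
    """l(n): number of applications of T needed to map n to 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    path, m = [], n
    while m not in _STEP_CACHE:
        path.append(m)
        m = 3 * m + 1 if m % 2 else m // 2
    steps = _STEP_CACHE[m]
    for value in reversed(path):
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


def cs_adjacency(n, directed=True):
    """Unweighted adjacency sets of G_CS(n) without self-loops."""
    h = [step_function(m) for m in range(2, n + 1)]
    adj = {v: set() for v in h}
    for m, k in zip(h, h[1:]):
        if m != k:
            adj[m].add(k)
            if not directed:
                adj[k].add(m)  # undirected: neglect the edge direction
    return adj


def average_shortest_path_length(adj):
    """Mean BFS distance over ordered pairs (u, v), u != v, with v reachable from u."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # the source contributes distance 0
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


for n in (10**3, 10**4):
    print(n, average_shortest_path_length(cs_adjacency(n, directed=False)))
```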
Our results are shown in Fig. 8 A and C. Because the average shortest path lengths decrease for both the undirected and the directed CS graphs, we fit a nonlinear (logistic) decay function (Eqn. 22) to determine their finite scaling and asymptotic behavior. In Eqn. 22, L(n) denotes the average shortest path length as a function of the length n of the step sequence. Table 2 gives the fitted parameter values for Eqn. 22. We would like to emphasize …
It is remarkable that L(n) decreases for larger values of n and, hence, for larger CS graphs. This is quite different from the behavior of random, scale-free and small-world networks, because in all these models the path length increases with the size of the network. More precisely, the average shortest path length of these three network models scales as [8,18]

$$L \sim \ln(N) \qquad \text{random network} \qquad (24)$$
$$L \sim \frac{\ln(N)}{\ln(\ln(N))} \qquad \text{scale-free network} \qquad (25)$$
$$L \sim \ln(N) \qquad \text{small-world network} \qquad (26)$$

An explanation for this behavior of CS graphs can be given with the help of the distribution of shortest path lengths, shown in Fig. 8 B and D. In these figures, we show, for the undirected and the directed networks, four histograms, each for a different length n of the step sequence, as indicated by the colors of the histograms, which correspond to the vertical lines in Fig. 8 A and C. From the histograms one can see that the diameter of the CS graphs, i.e., the maximal length of all shortest paths, decreases slightly from n = 5000 to $n = 5 \times 10^5$. Further, the absolute number of shortest paths increases strongly, as can be seen from the increasing values on the y-axes. Taken together, this results in an overall decrease in the average shortest path length.
In Fig. 9 we show a visualization of two undirected CS graphs, which illustrates this behavior further. For clarity of presentation, we removed self-loops. The left panel shows the CS graph $G_{CS}(n = 10^3)$, with 141 nodes and 356 edges, generated from a step sequence of length $n = 10^3$, whereas the right panel shows the CS graph $G_{CS}(n = 10^4)$, with 287 nodes and 3201 edges, generated with $n = 10^4$. It is interesting to observe that both network structures resemble a torus, forming a kind of ring. For increasing n the torus becomes denser, in the sense that there are more nodes and edges around the ring; however, the overall structure is conserved. Each panel of Fig. 9 also includes, in green, a shortest path of maximal length, corresponding to the diameter of the network. The diameter of $G_{CS}(n = 10^3)$ is 10 and the diameter of $G_{CS}(n = 10^4)$ is 8. Considering Fig. 9 together with Figs. 8 B and D, the histograms can be interpreted easily. First, the maximal shortest path length in Figs. 8 B and D corresponds to the diameter of the corresponding network, which decreases slightly. Second, the increasing number of shortest paths from n = 5000 to $n = 5 \times 10^5$ (see the scale of the y-axes) is caused by the increasing density of nodes on the tori. Third, averaging over all shortest path lengths leads to decreasing path lengths from small to large values of n. This is due not only to the decreasing diameter of the CS graphs but also to the strongly increasing number of short paths (for instance, shortest paths of length 2).
The CS graphs shown in Fig. 9 are undirected. However, for directed CS graphs we obtain qualitatively similar results, as can be seen from Figs. 8 C and D. The only quantitative difference between undirected and directed CS graphs is that the observed (average) path lengths are larger for directed CS graphs.
Finally, we study the clustering coefficients of CS graphs and investigate whether these networks exhibit small-world characteristics. The (global) clustering coefficient C, also called transitivity [4], is defined as

$$C = \frac{6 \times \text{number of triangles}}{\text{number of paths of length two}}. \qquad (27)$$

In Fig. 10, we show for 23 different CS graphs, generated from n = 100 to $n = 5 \times 10^6$, their average shortest path lengths as a function of their clustering coefficients (in blue). For reference, we also include random (red), small-world (green), scale-free (purple) and biological (green) networks in this figure. The random networks have been generated with an Erdős-Rényi model [2]. More precisely, for each CS graph we generate one Erdős-Rényi random network with the same number of nodes and the same number of edges. The small-world networks have been generated with the algorithm of Watts and Strogatz with a rewiring probability of 0.01 and three neighbors in a one-dimensional model [5]. The scale-free networks have been generated with the preferential attachment algorithm of [16] and randomly selected exponents between 1.5 and 2.5. The sizes of the small-world and scale-free networks also correspond to the sizes of the CS graphs. The three biological networks correspond to a metabolic network, a PPI network from yeast and a neural network from C. elegans [9,54-56]. One can see that with increasing length of the step sequence, the clustering coefficients of the CS graphs increase while, simultaneously, their average shortest path lengths decrease slightly. Overall, we find that the clustering coefficients of CS graphs are generally larger than those of random and biological networks, and comparable with those of small-world networks. This suggests that CS graphs could be considered small-world networks, because they have a high clustering coefficient and a small average path length.
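Eqn. 27 can be sketched on a simple undirected graph. The sketch assumes that directions, weights and self-loops are dropped before the clustering coefficient is computed. Paths of length two are counted as ordered pairs of distinct neighbors of a center vertex, $k(k-1)$ per vertex, which is the counting under which a triangle contributes exactly 6 such paths and a complete triangle gives C = 1. The helper names are hypothetical.

```python
from itertools import combinations

_STEP_CACHE = {1: 0}  # l(1) = 0: 1 is the fixed point of T


def step_function(n):
    """l(n): number of applications of T needed to map n to 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    path, m = [], n
    while m not in _STEP_CACHE:
        path.append(m)
        m = 3 * m + 1 if m % 2 else m // 2
    steps = _STEP_CACHE[m]
    for value in reversed(path):
        steps += 1
        _STEP_CACHE[value] = steps
    return _STEP_CACHE[n]


def undirected_cs_graph(n):
    """Simple undirected version of G_CS(n): directions, weights and self-loops dropped."""
    h = [step_function(m) for m in range(2, n + 1)]
    adj = {v: set() for v in h}
    for m, k in zip(h, h[1:]):
        if m != k:
            adj[m].add(k)
            adj[k].add(m)
    return adj


def global_clustering(adj):
    """Eqn. 27: C = 6 * (#triangles) / (#paths of length two)."""
    corner_hits = 0
    for u, nbrs in adj.items():
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                corner_hits += 1      # triangle seen from corner u
    triangles = corner_hits // 3      # every triangle is seen from its 3 corners
    paths2 = sum(len(nbrs) * (len(nbrs) - 1) for nbrs in adj.values())
    return 6 * triangles / paths2 if paths2 else 0.0


print(global_clustering(undirected_cs_graph(10**4)))
```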
We included the biological networks in this figure to show that CS graphs are also remarkably different from natural networks, such as biological networks, which represent complex mixtures of scale-free, random and small-world networks. Further, although the three biological networks consist of up to thousands of nodes, and not hundreds as all other networks, this imbalance does not lead to an increasing similarity to CS graphs. On the contrary, the larger the CS graphs, the more different they become.
By fitting a logistic growth function as above (see Eqn. 15), we estimate the limiting value of the clustering coefficient of CS graphs, $C^\infty = \lim_{n \to \infty} C(n)$.
The blue cross in Fig. 10 indicates the limiting clustering coefficient and average shortest path length (see Eqn. 23) for CS graphs.

Discussion
The present paper introduced and studied a novel network class. More precisely, we first defined and derived a one-dimensional symbol sequence from the Collatz problem [14,15], which we called the step sequence. From investigating its structural properties, we found that a step sequence exhibits complex behavior due to the presence of a long-term memory. Second, based on step sequences, we introduced a construction procedure for growing networks and called the resulting networks CS graphs. Due to its explicit connection to one-dimensional symbol sequences, our construction procedure is distinct from many other well-known growing network models [2,16-18]. Third, we investigated the finite scaling and the asymptotic behavior of structural properties of CS graphs. More specifically, we found that although CS graphs do not exhibit a scale-free degree distribution, their edge weights follow a power law. Moreover, we demonstrated that the edge reduction values of CS graphs, which provide a measure of the structural complexity of networks, lie between the values observed for random (random networks) and simple (trees) structures. This holds for finite values of n as well as asymptotically and, hence, provides evidence that CS graphs possess a complex network structure. This is also supported by our finding of the pseudo-symmetric appearance of CS graphs. It is well known that complex objects fall between random (chaotic) and simple (ordered) structures, and this has been observed for a multitude of different systems, e.g., for cellular automata or Boolean networks [57][58][59].
In addition, we found that the finite scaling of the average shortest path lengths of CS graphs can be approximated by a logistic decay function. This results in the curious behavior that growing CS graphs become 'smaller' with respect to their average shortest path length. Interestingly, despite the apparent similarity of CS graphs and small-world networks generated with the Watts & Strogatz model [5] (see Fig. 9), this characteristic distinguishes them from each other.
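The average shortest path length discussed here can be computed with one breadth-first search per node. The sketch below assumes an unweighted graph stored as an adjacency dictionary (a hypothetical input format), treats every edge as one hop regardless of its weight, and averages only over pairs of nodes that are reachable from each other. Tracking this value for CS graphs of growing n and fitting a logistic decay would then expose the shrinking behavior described above.

```python
from collections import deque


def average_shortest_path_length(adj):
    """Mean hop distance over all ordered pairs of distinct, reachable nodes.

    adj maps each node to an iterable of its neighbours. One breadth-first
    search is run from every node; edge weights are ignored.
    """
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # distance to source itself is 0
        pairs += len(dist) - 1       # exclude the source from the pair count
    return total / pairs if pairs else 0.0
```

For a path of four nodes the function returns 20/12, for a star with three leaves 1.5, and for a complete graph 1.0.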
We would like to re-emphasize that the CS graphs investigated in this paper are constructed from one-dimensional symbol sequences (Collatz step sequences) generated by the iterative mapping of natural numbers, governed by Eqn. 1, starting from a natural number n. For this reason, we consider it natural to present our results about the structural properties of CS graphs with respect to n. Usually, however, network properties are studied as a function of network size, i.e., the number of nodes N. First, we note that there is a simple scaling between n and N, shown in Fig. 5, which allows the results to be converted. Second, all statements in this paper are independent of particular values of n and, hence, of N (N follows from n, not vice versa). This also includes the asymptotic results. Third, other network models are not constructed from one-dimensional symbol sequences [2,16-18]; hence, their results cannot be presented as a function of such an n. Lastly, we present the network properties of CS graphs as a function of n to constantly remind the reader of the notably distinct origin of these networks.
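To make the relation between n and N more concrete, here is a minimal sketch. It rests on assumptions that this section does not confirm: that Eqn. 1 is the standard Collatz map (m/2 for even m, 3m + 1 for odd m), that the step sequence lists the number of iterations needed to reach 1 for the starting values 1, 2, ..., n, and that a CS graph links consecutive symbols with weights counting how often each pair occurs. All function names are hypothetical, and self-loops are dropped purely for simplicity. The precise definitions given earlier in the paper take precedence.

```python
from collections import Counter


def collatz_steps(m):
    """Iterations of the (assumed) standard Collatz map needed to reach 1 from m."""
    if m < 1:
        raise ValueError("m must be a natural number >= 1")
    steps = 0
    while m != 1:
        m = m // 2 if m % 2 == 0 else 3 * m + 1
        steps += 1
    return steps


def step_sequence(n):
    """Hypothetical step sequence: Collatz step counts of 1, 2, ..., n."""
    return [collatz_steps(m) for m in range(1, n + 1)]


def sequence_graph(seq):
    """Hypothetical weighted, undirected graph built from a symbol sequence.

    Each distinct symbol becomes a node; each pair of consecutive, unequal
    symbols adds 1 to the weight of the edge between them. Consecutive equal
    symbols (self-loops) are skipped in this sketch.
    """
    weights = Counter()
    for a, b in zip(seq, seq[1:]):
        if a != b:
            weights[(min(a, b), max(a, b))] += 1
    nodes = set(seq)
    return nodes, weights
```

For n = 15 the sketch yields the sequence 0, 1, 7, 2, 5, 8, 16, 3, 19, 6, 14, 9, 9, 17, 17 and hence N = 13 nodes, because 12 and 13, as well as 14 and 15, need the same number of steps. Under these assumptions N is fully determined by n.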
Over the last decades, many definitions of the complexity of one-dimensional symbol sequences have been suggested [44,48,60-67]. However, none of these methods can be considered a gold standard for all types of applications. It is therefore not surprising that the quantification of the complexity of networks, which are arguably more intricate objects than one-dimensional symbol sequences, is even less well developed [40]. For this reason, we did not attempt to compare different network complexity measures with each other in order to identify the 'best' one, but pragmatically selected the edge reduction [47] as a feasible measure to study structural characteristics of networks quantitatively. It would, however, be interesting for a future study to investigate the transition of complexity from the sequence level, on which our construction procedure of CS graphs is based, to the network level. Since complexity is present on both the sequence (step sequence) and the network (CS graph) level, our construction procedure seems to conserve complexity; a quantification of this observation could be instructive.
Another interesting aspect of the present paper that deserves more attention in future studies is the general building principle of our growing network model. To our knowledge, this procedure has not yet been systematically studied, even though it forms a mechanism distinct from many other growing network models, e.g., random or scale-free networks [2,16-18]. This would also be interesting from a biological point of view, because our growing network model has been inspired by the biological processes of gene regulation leading to the formation of different types of gene networks [29,30,68]. Hence, from a biological perspective, such a growing network model appears very natural.