The Metabolism and Growth of Web Forums

We view web forums as virtual living organisms feeding on users' clicks and investigate how they grow at the expense of clickstreams. We find that $PV_t$ (the number of page views in a given time period) and $UV_t$ (the number of unique visitors in the time period) of the studied forums satisfy the law of allometric growth, i.e., $PV_t \sim UV_t^{\theta}$. We construct clickstream networks and explain the observed temporal dynamics of networks by the interactions between nodes. We describe the transportation of clickstreams using the function $D_i \sim T_i^{\gamma}$, in which $T_i$ is the total amount of clickstreams passing through node $i$ and $D_i$ is the amount of clickstreams dissipated from $i$ to the environment. It turns out that $\gamma$, an indicator for the efficiency of network dissipation, not only negatively correlates with $\theta$, but also sets the bounds for $\theta$. In particular, when and when . Our findings have practical consequences. For example, $\theta$ can be used as a measure of the “stickiness” of forums, which quantifies the stable ability of forums to keep users “locked in”. Meanwhile, the correlation between $\gamma$ and $\theta$ provides a method to predict the long-term “stickiness” of forums from the clickstream data of a short time period. Finally, we discuss a random walk model that replicates both the allometric growth and the dissipation function $D_i \sim T_i^{\gamma}$.

ogy allows us to explain the dynamics of networks by the behavior of threads. In particular, we describe the clickstream dissipation on threads using the function $D_i \sim T_i^{\gamma}$, in which $T_i$ is the clickstreams to node $i$ and $D_i$ is the clickstream dissipated from $i$. It turns out that $\gamma$, an indicator for dissipation efficiency, is negatively correlated with $\theta$ and $1/\gamma$ sets the lower boundary for $\theta$. Our findings have practical consequences. For example, $\theta$ can be used as a measure of the "stickiness" of forums, because it quantifies the stable ability of forums to convert $UV$ into $PV$, i.e., to keep users "locked in" the forum. Meanwhile, the correlation between $\gamma$ and $\theta$ provides a convenient method to evaluate the "stickiness" of forums. Finally, we discuss an optimized "body mass" of forums at around $10^5$ that minimizes $\gamma$ and maximizes $\theta$.

1 Introduction
A Web forum is an online discussion site that allows its members to exchange opinions by posting and replying to threads. Although the forum is one of the oldest Internet services, its user-generated-content nature allows it to remain one of the most frequently used services in the era of Web 2.0 [1,2].
The importance of Web forums has motivated many studies on users' interactions on this platform, such as detecting online opinion leaders [3], analyzing political debates [4], or identifying interest groups [5,6]. But these studies usually focus on posting behavior rather than browsing behavior. A major reason is that browsing records are generally not publicly accessible. However, as a large proportion of forum users are "silent" users who only read threads and never comment [7,8], the analysis of forum usage based on posting dynamics has strong limitations.
In this paper, we get access to the historical data of Baidu Tieba, a very large Chinese Web forum system, and investigate the browsing activities on 30,000 forums. The size (average daily page views) of the studied forums varies from hundreds to millions. Different from previous studies that try to understand how users use forums, we propose to study how forums "consume" users.
Specifically, we view forums as virtual living organisms that grow at the expense of users' attention. From this perspective, we can talk about the "metabolism" of forums, which describes how the attention of users is "absorbed" and "dissipated" by forums. Inspired by the metabolic theory of ecology [9,10,11], we observe the relation between the number of page views ($PV_t$) and the number of unique visitors ($UV_t$) in the growth of forums, which are understood as the "body mass" and "energy consumption" of these virtual organisms, respectively. It turns out that the vast majority of the studied forums satisfy the allometric growth pattern $PV_t \sim UV_t^{\theta}$. In other words, the growth exponent $\theta = d\log(PV_t)/d\log(UV_t)$ remains unchanged over time. We suggest that $\theta$ can be used to measure the "stickiness" of forums, as an alternative to the average surfing length $L_t = PV_t/UV_t$ [12]. Both $\theta$ and $L_t$ reflect the ability of forums to keep users "locked in", but the former is constant over time, whereas the latter is not.
To probe into the origins of the observed scaling relationship between $PV$ and $UV$, we compare the flow of collective attention on forums with the flow of energy in food webs and construct clickstream networks. In these clickstream networks, the nodes are threads and the edges are users' switches between threads. We propose the conservation of attention (clickstreams) as a principle that constrains the evolution of clickstream networks. According to this principle, the two quantities of interest, $PV$ and $UV$, can be defined both on the network level and on the node level. On the network level, $PV$ is the total weight of edges and $UV$ is the dissipated clickstreams of the entire network; on the node level, $PV$ is the sum of the clickstreams to node $i$ ($T_i$) over all nodes and $UV$ is the sum of the dissipated clickstreams from $i$ ($D_i$) over all nodes. The equivalence of these two definitions allows us to explain the dynamics of the network by the behavior of nodes. In particular, we describe the dissipation of clickstreams on nodes (threads) using the scaling function $D_i \sim T_i^{\gamma}$ [13,14]. It turns out that $\gamma$, a quantity reflecting the dissipation efficiency of networks, is negatively correlated with $\theta$. A naive analysis shows that $1/\gamma$ sets the lower bound for $\theta$ when $\gamma > 1$. At the end of our study, we discuss an optimal "body mass" of forums, at around $10^5$ daily page views, that minimizes $\gamma$ and maximizes $\theta$.
The findings of the current study not only confirm the connection between growth and topology in complex systems [9,15,16,17], but also have practical implications. For example, the observed universal relationship between "body mass" and "energy consumption" will help webmasters to benchmark and monitor the growth of online communities. Meanwhile, the technique of predicting the long-term behavior of forums by analyzing random snapshots of clickstream networks may contribute to many areas of Web development, such as click prediction [18] and interest-group recommendation [19]. In particular, $\theta$, which describes the "stickiness" of forums, can be integrated as a novel feature into the recommendation of interest groups [19]. Last but not least, we suggest that the presented clickstream network analysis provides a very general framework for studying users' browsing behavior in various online systems. To apply this analysis to other types of online social systems, one simply replaces the nodes (which are threads in the currently studied networks) with other information resources, such as news, tags, or videos.
2 Related work

2.1 The flow of collective attention online
At every moment, a large number of users are "hopping" between information resources by clicking web pages sequentially, generating a large amount of attention flow both within and across websites [20]. Cooley and other scholars use the term "clickstream" to describe this flow of attention and suggest that the analysis of clickstream data reveals the hidden patterns of Web usage [21].
Since Huberman et al. [22], individual clickstreams have been extensively studied, including the correlation between surfing length and duration [23], the effect of surfing length on users' log-off probability [24], and the distribution of surfing lengths in social networks [7]. A study that is particularly relevant to the current work is [12], which proposes to use the average surfing length $L$ to describe website "stickiness", that is, the ability of a site to keep visitors "locked in". The common limitation of these studies is that they focus on independent, individual behavior, whereas how users interact with each other in collective surfing remains unknown.
With the development of network science, there is a rising trend to integrate clickstream studies and network theories into clickstream network analysis. In clickstream networks, the nodes are information resources and the edges are the clickstreams, i.e., the collective navigation of users between resources [25]. Different from individual clickstreams, clickstream networks include the rich interactions between users via information resources, and thus provide novel interpretations of some extensively studied online phenomena [26]. For example, the surge and decay of news in the public domain is usually viewed as a consequence of information diffusion among users [27]. But from the perspective of clickstream networks, it can also be understood in the "reversed way", as the transmission of users' attention between news items [28]. As a general framework, the clickstream network has been applied to model various online activities, such as paper reading [25], photo tagging [29], and video watching [30].

2.2 The metabolic theory and the allometric growth of websites
The metabolic theory of ecology [9,10,11] suggests that the metabolic rate, the rate at which living organisms take up, transform, and expend energy and materials, is the most fundamental biological rate in ecology [10]. As a result, from individual biomass production to population growth, many observed patterns can be described by scaling functions with exponents that are multiples of 1/4. For example, Kleiber's law, which is the core of the metabolic theory, posits that for the majority of mammals, energy consumption scales to the 3/4 power of body mass [9]. Since its proposal, the metabolic theory has been extended greatly. For example, Garlaschelli et al. applied it to study food webs [17,11] and Bettencourt et al. used it to explain the scaling of cities [31]. In these studies, a frequently used term is "allometric growth". It refers to the power-law relationships between two variables in the growth of complex systems, whose exponents can be either larger or smaller than 1. When the exponent is larger than 1, the relationship is called "superlinear scaling"; otherwise, it is called "sublinear scaling" [31].
In fact, temporal power-law relationships are also widely observed in the virtual world, even though scholars do not use the term "allometric growth". For example, Cattuto et al. find that in the development of online tagging systems, the total number of tags scales to the size of the tag vocabulary with an exponent approaching 1.4 [32]. Tessone et al. find that in social programming projects, the number of library citation relationships scales to the number of libraries with an exponent between 1.25 and 2 [33]. Our previous studies discovered a scaling relationship between the active population and the generated activities with an exponent between 1.18 and 1.5 [34]. Similar patterns are also observed in other online collective behaviors such as game playing [35] and email sending/receiving [36]. However, although these studies have investigated various temporal scaling relationships, research examining the online version of Kleiber's law, that is, the power-law relationship between the "body mass" and "energy consumption" of websites, is still lacking. This is the major concern of the current study. We believe that the online version of Kleiber's law, once confirmed, will motivate many studies that apply the metabolic theory to explain the various behaviors of websites, such as info-mass production (we use this term to refer to the increase of user-generated content, which can be viewed as an online counterpart of biomass production), ontogenetic growth, survival, and mortality.

3.1 The "body mass" and "energy consumption" of forums
If we understand online communities as virtual living organisms that feed on users' attention, a particularly interesting question is: what are the counterparts of "body mass" and "energy consumption" for these virtual entities? Reference [16] provides a flow network model to explain Kleiber's law by arguing that living organisms are, by their very nature, flow systems that transport water and nutrients. According to this model, "body mass" is the total amount of flow circulating within the system and "energy consumption" is the amount of flow a system receives from the environment or dissipates to the environment (the two should be equal). Applying this model to clickstream networks, those who are familiar with Internet studies will immediately find that these are also the definitions of "PV" and "UV" of websites. Therefore, the online version of Kleiber's law, if it exists, predicts that
$$PV_t = A\,UV_t^{\theta}, \qquad (1)$$
in which $A$ is a normalization coefficient. We argue that the exponent $\theta$ in Eq. 1 not only shapes the growth dynamics of forums, but also provides a measure of the "stickiness" of forums as an alternative to the average surfing length $L$ suggested in [12]. Using $\theta$ as an indicator, we can easily separate "sticky" forums from "non-sticky" forums. In particular, if $\theta > 1$, we derive that $L_t = PV_t/UV_t \sim PV_t^{1-1/\theta}$ with $1 - 1/\theta > 0$. This means that the average surfing length of users increases monotonically with forum size (or "body mass"). In other words, users are more and more likely to be "locked in" as the forum grows. This is what we expect to see from a "sticky" forum. On the contrary, if $\theta < 1$, users on average navigate fewer threads as the size of the forum increases, which is the property of a "non-sticky" forum. An extra benefit of using $\theta$ as the indicator is that $\theta = d\log(PV_t)/d\log(UV_t)$ is constant over time, whereas $L_t = PV_t/UV_t$ is not. Therefore, $\theta$ quantifies the "stickiness" of forums as a stable, long-term property.
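The step from Eq. 1 to the scaling of the average surfing length can be written out explicitly. The following is a short derivation, assuming that Eq. 1 holds exactly with constant $A > 0$ and $\theta > 0$:

```latex
\begin{align*}
UV_t &= \left(\frac{PV_t}{A}\right)^{1/\theta}, \\
L_t  &= \frac{PV_t}{UV_t} = A^{1/\theta}\, PV_t^{\,1 - 1/\theta}.
\end{align*}
```

The exponent $1 - 1/\theta$ is positive exactly when $\theta > 1$, so under this assumption $L_t$ grows with $PV_t$ for $\theta > 1$ and shrinks for $\theta < 1$.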

3.2 The flow network expression of forums
In the preceding sections, we have discussed forums as online flow systems. But in order to develop a framework that allows for quantitative analysis, we need a more explicit expression of the flow systems. This is where the clickstream network comes in. Figure 1 shows the clickstream network constructed from a demo dataset, which contains the browsing histories of nine users. Each row corresponds to a single browsing activity. The first column denotes the cookies of users and the second column denotes the numeric IDs of visited threads. There are no duplicate records in the dataset. The example clickstream network is constructed as follows. For each record in the dataset, say, [a, 0], if the next record has the same cookie a, e.g., [a, 1], we add a clickstream from 0 to 1; otherwise, we create a clickstream from 0 to the artificially added node "sink". After all records are converted into clickstreams, we add a "source" node to balance the network such that the in-flow (weighted in-degree) is equal to the out-flow (weighted out-degree) over all nodes [37]. This balancing process demonstrates a very important principle, the conservation of attention (clickstreams). This principle requires that in-flow must equal out-flow both on the node level and on the network level. According to this principle, attention cannot appear or disappear out of nowhere. All the flow of attention circulating in networks comes from, and will eventually return to, the "environment" (the offline world or other forums).
On each node $i$, we define $T_i$ as the clickstreams to $i$ and $D_i$ as the clickstreams dissipated by $i$. Based on the principle of attention conservation, $PV$ and $UV$ can be defined either on the node level or on the network level. On the node level, they are the sums of $T_i$ and $D_i$ over all nodes in the network, respectively. On the network level, $PV$ is equal to the total weight of edges (before network balancing) and $UV$ is equal to the dissipation of the network, i.e., the in-flow of "sink" or the out-flow of "source" (which are equal to each other after network balancing). The equivalence of the two definitions is non-trivial, because it allows us to explain the dynamics of the network by the behavior of nodes, as shown in the next section.
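The construction and the two equivalent definitions of $PV$ and $UV$ can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the six-record dataset are hypothetical (it is not the Figure 1 dataset), and the records are assumed to be ordered so that each user's visits are adjacent.

```python
from collections import defaultdict

def build_clickstream_network(records):
    """Build a balanced clickstream network from (cookie, thread_id) records.

    Assumes each user's consecutive visits are adjacent in `records`.
    Returns a dict mapping (from_node, to_node) -> clickstream weight.
    """
    edges = defaultdict(int)
    for idx, (cookie, thread) in enumerate(records):
        nxt = records[idx + 1] if idx + 1 < len(records) else None
        if nxt is not None and nxt[0] == cookie:
            edges[(thread, nxt[1])] += 1   # same user moves on to another thread
        else:
            edges[(thread, "sink")] += 1   # user leaves the forum
    # Balance the network: "source" supplies the in-flow each node is missing.
    inflow, outflow = defaultdict(int), defaultdict(int)
    for (u, v), w in edges.items():
        outflow[u] += w
        inflow[v] += w
    for node in list(outflow):
        deficit = outflow[node] - inflow[node]
        if deficit > 0:
            edges[("source", node)] += deficit
    return dict(edges)

def node_flows(edges):
    """Return (T, D): clickstreams to each thread and clickstreams it dissipates."""
    T, D = defaultdict(int), defaultdict(int)
    for (u, v), w in edges.items():
        if v == "sink":
            D[u] += w
        else:
            T[v] += w
    return dict(T), dict(D)

# Hypothetical demo records: (cookie, thread ID).
records = [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 2), ("c", 1)]
edges = build_clickstream_network(records)
T, D = node_flows(edges)

PV_node, UV_node = sum(T.values()), sum(D.values())
PV_net = sum(w for (u, v), w in edges.items() if u != "source")  # weights before balancing
UV_net = sum(w for (u, v), w in edges.items() if u == "source")  # out-flow of "source"
print(PV_node, PV_net, UV_node, UV_net)
```

In this toy example the node-level and network-level values coincide, with $PV$ equal to the number of records and $UV$ equal to the number of users.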
At this point, we should pay attention to a major difference between biological/ecological and virtual flow systems. In biological and ecological flow networks, there is generally a "root" node that is responsible for getting flow from the environment and supplying it to the rest of the nodes, such as the roots of trees (which absorb water from the earth) or the producers in food webs (which get energy from the Sun). As a consequence, the non-root nodes only dissipate flow to the environment and do not receive flow directly from it. However, things are different in clickstream networks. There is no such "root" node. In fact, every node may receive flow directly from the environment. This is because the Web is, at least theoretically, a "flat" world in the sense that users may enter it from any web page. As a result, we can calculate on each node both the in-flow from "source" (denoted $I_i$) and the out-flow to "sink" ($D_i$). To remain consistent with other flow network models [9,16], we define $UV$ as the sum of $D_i$ rather than the sum of $I_i$. In the last section we present a figure to briefly introduce the relationship between $I_i$ and $D_i$.

3.3 The connection between growth and dissipation
In the previous section, we discussed the definitions of the dissipated clickstream $D_i$ and the pass-through clickstream $T_i$. The dissipation law in ecology [13] predicts that
$$D_i = B\,T_i^{\gamma}, \qquad (2)$$
in which $B$ is a normalization coefficient and $\gamma$ is an exponent that reflects the efficiency of network dissipation. In Figure 2 we present three example flow networks to explain why $\gamma$ is a measure of dissipation efficiency and how it is related to $\theta$ in Eq. 1.
First of all, let us consider two extreme topologies, the star-like (Figure 2a) and the chain-like (Figure 2b and 2c). In the star-like topology, all threads (nodes) receive clickstreams directly from the "environment" and dissipate all clickstreams immediately; in the chain-like topology, all threads receive clickstreams sequentially from one another and dissipate a portion of the received clickstreams. For the convenience of comparison, we fix the $UV$ of all three clickstream networks to be the same, i.e., 10. However, we find that the resulting $PV$ is larger in the chain-like networks (10+3+1.5+1+0.9 = 16.4 in (b) and 10+9+6+3+0.9 = 28.9 in (c)) than in the star-like network (10 in (a), where every clickstream is dissipated immediately and thus $PV = UV$). This is because by transporting clickstreams between threads instead of dissipating them immediately, the network increases its storage capacity of clickstreams, i.e., the "body mass". To understand this phenomenon, one can consider how a clown juggles balls. A clown can barely hold more than two balls if he just grasps them in his hands, but he can easily keep many balls in circulation by throwing them up and passing them from one hand to the other as they fall. In exactly the same way, clickstream transportation increases the total amount of clickstreams "held" by a network.
So how is the dissipation efficiency $\gamma$ related to transportation? We find that a smaller $\gamma$ "delays" the dissipation of clickstreams and thus increases the amount of clickstreams transported in the network. This finding is demonstrated by the comparison between Figure 2b and 2c. To facilitate the comparison, we define the log-out probability of users as
$$P_i = D_i/T_i = B\,T_i^{\gamma-1}.$$
Therefore, in networks with $\gamma > 1$, $P_i$ increases with the clickstreams to nodes; otherwise it decreases with the clickstreams. We calculate that $P_{ib}$ = {70%, 50%, 30%, 10%} from node A to D in Figure 2b and $P_{ic}$ = {10%, 30%, 50%, 70%} in Figure 2c (for the convenience of the comparison, we ignore the behavior of node E, whose clickstreams are very small compared to other nodes). As the pass-through clickstreams decrease monotonically from A to D, it is easy to derive that $\gamma_b > 1 > \gamma_c$. As we have calculated that $UV_b = UV_c$ and $PV_b < PV_c$, we have $\theta_b < \theta_c$. From this simple deduction, we conjecture that $\gamma$ and $\theta$ are negatively correlated. In fact, it is reasonable to expect this negative correlation in clickstream networks of different topologies, because a lower $\gamma$ always forces large nodes to transport clickstreams to other nodes instead of dissipating them to the environment. This process increases the pass-through flow of the downstream nodes of these large nodes, and also of the downstream nodes of those downstream nodes, and so on. The process continues, retaining more and more flow within the network, until the remaining flow is eventually dissipated to the environment at the boundary of the diffusion area. Such a flow diffusion process will increase the "stickiness" of a flow network no matter what shape it has.
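The numbers quoted for the two chain-like examples can be reproduced with a small sketch. It assumes, as a reading of the description of Figure 2b and 2c, that each node in the chain forwards to the next node whatever it does not dissipate and that the last node dissipates everything it receives; the function name is illustrative only.

```python
def chain_dissipation(T):
    """Dissipation D_i on a chain: what node i does not pass on to node i+1.
    The last node dissipates all the clickstreams it receives."""
    return [t - nxt for t, nxt in zip(T, T[1:])] + [T[-1]]

# Pass-through clickstreams T_i for nodes A..E, as quoted in the text.
chains = {"b": [10, 3, 1.5, 1, 0.9], "c": [10, 9, 6, 3, 0.9]}

summary = {}
for name, T in chains.items():
    D = chain_dissipation(T)
    P = [d / t for d, t in zip(D, T)]   # log-out probability P_i = D_i / T_i
    summary[name] = {"PV": sum(T), "UV": sum(D), "P": P[:4]}  # node E ignored
    print(name, "PV =", round(sum(T), 2), "UV =", round(sum(D), 2),
          "P(A..D) =", [round(p, 2) for p in P[:4]])

# P_i falls along chain (b) as T_i falls, i.e. P_i increases with T_i (gamma > 1);
# it rises along chain (c), i.e. P_i decreases with T_i (gamma < 1).
P_b, P_c = summary["b"]["P"], summary["c"]["P"]
b_increases_with_T = all(x > y for x, y in zip(P_b, P_b[1:]))
c_decreases_with_T = all(x < y for x, y in zip(P_c, P_c[1:]))
```

Both chains dissipate the same total ($UV = 10$) while chain (c) holds the larger $PV$, which is the comparison the argument relies on.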

4 Results
In the data analysis, we use the log files of Baidu Tieba to construct hourly clickstream networks. Among the millions of forums in the entire Baidu Tieba system, we select the top 30,000 forums, whose sizes (average daily page views over two months) vary from hundreds to millions. For each forum, we construct 1,440 successive hourly clickstream networks using the historical browsing data of two months (from Feb. 27, 2013 to Apr. 27, 2013). We calculate $UV_t$ and $PV_t$ of these networks to derive $\theta$. In analyzing the dissipation behavior of nodes, we randomly select a day (Apr. 24, 2013) and construct 24 successive hourly networks. In fact, to estimate $\gamma$, we need just one hourly clickstream network. The reason for including 24 networks is to test whether the estimation of $\gamma$ is robust over time. In analyzing the relationship between $\gamma$ and $\theta$, we use the average value of $\gamma$ over the 24 hours. In the analysis of the scaling relationships expressed in Eq. 1 and Eq. 2, we always use ordinary least squares regression in log-log plots to estimate the scaling exponent [10].
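The estimation step can be sketched as follows: a power law $y = c\,x^k$ becomes a straight line in log-log coordinates, so the exponent is the ordinary least squares slope there. The function name and the synthetic series below are hypothetical and only illustrate the procedure; they are not Tieba data.

```python
import math

def fit_power_law(x, y):
    """Fit y = c * x**k by ordinary least squares on (log x, log y).
    Returns (k, c, r_squared)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    k = sxy / sxx
    intercept = my - k * mx
    ss_res = sum((b - (intercept + k * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - my) ** 2 for b in ly)
    return k, math.exp(intercept), 1 - ss_res / ss_tot

# Synthetic hourly series for one forum, generated from PV = 0.5 * UV**1.06.
uv = [120, 200, 350, 500, 800, 1300]
pv = [0.5 * u ** 1.06 for u in uv]
theta, A, r2 = fit_power_law(uv, pv)
print(round(theta, 3), round(A, 3), round(r2, 3))
```

The same function applied to per-thread pairs $(T_i, D_i)$ from one hourly network would give an estimate of $\gamma$ and $B$ in Eq. 2.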

4.1 The allometric growth of forums
Figure 3a shows how Eq. 1 shapes the growth dynamics of three different forums during the studied period. We find that this strong regularity holds for most of the forums: more than 86% of forums have an $R^2 > 0.8$ in the fitting of Eq. 1. This finding supports the assumption that all real-world flow systems are governed by the same underlying mechanics and thus exhibit similar regularities [38]. In fact, we have extended this assumption to include both real-world and virtual flow systems. It is inspiring to find that human attention, after being quantified as clickstreams, conforms to the physical laws observed in natural systems.
In Kleiber's law, "body mass" scales to "energy consumption" with an exponent of 4/3 ≈ 1.33 [9]. But the exponent observed in our data is generally smaller than this value. The mean value of $\theta$ is 1.06 and the standard deviation (SD) is 0.10 (Figure 3b). As shown in Figure 3b, the distribution is slightly asymmetric; it is skewed toward the right-hand side of the point (x=1, y=0) on the x axis. In particular, 82% of the forums have $\theta > 1$. These results suggest that most forums are "sticky", in the sense that users are more likely to be attracted to and retained in a forum as its size grows. However, by comparing $\theta$ between virtual and real flow systems, we see that the stickiness of forums to users' attention is less than that of biological organisms to water and nutrients: the latter are able to retain the flow in the system for a longer time before dissipating it to the environment. How can forums learn from biological organisms? This is an interesting topic beyond the scope of the current research, but worth looking into in future studies.

4.2 The scaling of clickstream dissipation
We find that the law of dissipation (Eq. 2) is a robust pattern that holds for most of the studied forums: more than 98% of forums have an $R^2 > 0.8$ in the fitting of Eq. 2. Meanwhile, the value of $\gamma$ estimated from hourly networks is stable over time (the SD of $\gamma$ over 24 hours is 0.14). The mean and SD of the $\gamma$ distribution are 0.93 and 0.08, respectively. In contrast to the $\theta$ distribution (Figure 3b), the distribution of $\gamma$ is skewed toward the left-hand side of the point (x=1, y=0) on the x axis: 82% of forums have $\gamma < 1$. According to the preceding discussion, this means that most of the studied forums have a low dissipation efficiency, i.e., the log-out probability of users decreases with the clickstreams to threads.
These findings help us understand the usage pattern of Tieba forums. In browsing threads, users are more likely to log out from non-popular threads than from popular threads. Various factors may contribute to this phenomenon, but we conjecture that the reverse-time displaying order of threads, together with the "bumping" mechanism, is probably one of the major reasons. All the forums in the Tieba system sort threads in reverse-time order (by the time of the latest comment) and display them in sequential pages. "Bump" describes an action (e.g., posting) taken by a user such that a particular thread is returned to the top of the thread list. Some users may even post a message containing only the term "bump" to show that they are bumping to make sure that more users will see the thread. Popular threads benefit greatly from "bumping" and are always displayed on the first page. As these forums are all interest groups with specific topics, such as "the Fans of Lady Gaga" or "Everything about Star Wars", visitors generally share common interests. Instead of reading selectively (which is more common on platforms such as news aggregators), they usually browse the threads in the default displaying order, from the top to the bottom and from the first page to the following pages. As a result, by the time users get tired, they have read the most popular threads. That is why we observe that users are more likely to stop browsing at non-popular threads.

In the heat map, a warmer color means a denser distribution of data points. In plotting the "binned" data, we calculate the averages of the x and y values in intervals uniformly selected from the x range. This technique is frequently used to reduce noise in data [39].
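The binning procedure mentioned in the caption can be sketched as below. The sketch assumes equal-width intervals over the x range, which is one reading of "uniformly selected"; the function name and the sample points are hypothetical.

```python
def binned_means(x, y, n_bins):
    """Average x and y inside n_bins equal-width intervals spanning the x range.
    Empty intervals are skipped."""
    lo, hi = min(x), max(x)
    if hi == lo:                       # degenerate range: a single bin
        return [(lo, sum(y) / len(y))]
    width = (hi - lo) / n_bins
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), n_bins - 1)   # maximum goes to the last bin
        sums[b][0] += xi
        sums[b][1] += yi
        sums[b][2] += 1
    return [(sx / c, sy / c) for sx, sy, c in sums if c > 0]

# Hypothetical scatter of five points reduced to two binned points.
x = [0.0, 0.1, 0.2, 0.9, 1.0]
y = [1, 2, 3, 10, 20]
binned = binned_means(x, y, n_bins=2)
print(binned)
```

Each binned point replaces all raw points in its interval by their mean position, which smooths out scatter before a regression line is fitted.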

4.3 The negative correlation between γ and θ
We find that the exponent of the dissipation law, $\gamma$, has a negative correlation with $\theta$, as expected. To understand this finding more deeply, we conduct a simple mathematical analysis. As introduced in Section 3.2, $UV$ is equal to the sum of $D_i$ and $PV$ is equal to the sum of $T_i$. Thus, using Eq. 2 and Eq. 1, we have:
$$UV = \sum_i D_i = B \sum_i T_i^{\gamma}, \qquad (3)$$
$$PV = \sum_i T_i = A\,UV^{\theta}. \qquad (4)$$
Joining Eq. 3 and Eq. 4, we have:
$$\Big(\sum_i T_i\Big)^{1/\theta} = A^{1/\theta} B \sum_i T_i^{\gamma}. \qquad (5)$$
In the data analysis, we find that $0 < A < 1$, $0 < B < 1$ and $\theta > 0$, thus $0 < A^{1/\theta} B < 1$. Using this condition, Eq. 5 gives $(\sum_i T_i)^{1/\theta} < \sum_i T_i^{\gamma}$. When $\gamma > 1$, we also have $\sum_i T_i^{\gamma} \le (\sum_i T_i)^{\gamma}$, so that
$$\theta > 1/\gamma, \qquad (6)$$
and this derivation of Eq. 6 holds only when $\gamma > 1$. In other words, if $\gamma > 1$, $1/\gamma$ is the lower bound of $\theta$. For the cases where $\gamma < 1$, unfortunately we cannot give further analytical conclusions. We only know that when $\gamma$ is smaller than, but close to, 1, $\theta$ approaches $1/\gamma$. To validate our derivations, we plot $1/\gamma$ as a black line (which is separated into two parts, the dashed one and the solid one, by the point [1,1]) in Figure 5a. We find that most of the binned data points are located above the boundary denoted by the solid line, supporting our derivation. By analyzing the binned data and the original data, we find that the estimation of $\gamma$ is reliable in the range [0.8, 1.1], which corresponds to values of $\theta$ in [1.0, 1.2]. Beyond this range, the estimations of the two parameters vary sharply due to a lack of data.
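The lower bound can be checked numerically on a toy example. The sketch below is only an illustration under strong assumptions: it treats a single network as if $PV = A\,UV^{\theta}$ held exactly, solves for the implied $\theta$, and uses hypothetical values for $T_i$, $A$ and $B$ chosen so that $UV > 1$. It is not the temporal estimate of $\theta$ used in the data analysis.

```python
import math

def implied_theta(T, A, B, gamma):
    """Exponent theta solving PV = A * UV**theta for one network, where
    PV = sum(T_i) and UV = sum(D_i) with D_i = B * T_i**gamma."""
    PV = sum(T)
    UV = sum(B * t ** gamma for t in T)
    return math.log(PV / A) / math.log(UV)

T = [1, 2, 3, 5, 8, 13, 21, 34]   # hypothetical pass-through clickstreams
A, B = 0.8, 0.5                   # hypothetical coefficients with 0 < A, B < 1

results = []
for gamma in (1.05, 1.2, 1.5):
    # For gamma > 1 the power function is superadditive on positive numbers:
    # sum(T_i**gamma) <= (sum T_i)**gamma.
    superadditive = sum(t ** gamma for t in T) <= sum(T) ** gamma
    theta = implied_theta(T, A, B, gamma)
    results.append((gamma, theta, superadditive))
    print(gamma, round(1 / gamma, 3), round(theta, 3), theta > 1 / gamma)
```

For each $\gamma > 1$ tried here the implied $\theta$ stays above $1/\gamma$; for $\gamma < 1$ the superadditivity inequality reverses and the argument gives no bound.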
To summarize, the reverse-time displaying order of threads, combined with the "bumping" mechanism, seems to decrease the dissipation efficiency $\gamma$ and thus increase the "stickiness" $\theta$ of forums. Is this the reason why the reverse-time displaying order is so frequently used in forums and other online communities? This is an interesting question worth further exploration.
At this stage, a natural question to ask is whether $\gamma$ and $\theta$ are affected by the forum size. To answer this question, we plot these two quantities against forum size, as shown in Figure 5b. We find that when the forum size is approximately $10^5$, $\gamma$ reaches its minimum value and $\theta$ reaches its maximum value.

Figure 6: The linear relationship between $\gamma_D$ and $\gamma_I$. We plot both the "binned" data (blue triangles) and the original data (heat map). In the heat map, a warmer color means a denser distribution of data points. The slope of the regression line fitted to the binned data is 0.46.
5 Discussion and Conclusion

5.1 The asymmetry between imported and dissipated clickstreams
As discussed in Section 3.2, we can calculate on each node both the in-flow from "source" ($I_i$) and the out-flow to "sink" ($D_i$) in a clickstream network. The parameter $\gamma$ in Eq. 2 is actually $\gamma_D$. By replacing $D_i$ in Eq. 2 with $I_i$, we can estimate the value of $\gamma_I$. $\gamma_D$ describes the dissipation behavior of nodes and $\gamma_I$ describes the flow-importing behavior of nodes. In most real-world flow networks, there is only one "root" node that takes the responsibility of importing flow. But in clickstream networks this job is distributed among many nodes. Figure 6 shows that $\gamma_D$ changes linearly with $\gamma_I$, with a coefficient of 0.46. Both parameters are smaller than 1, which means that in the network, large nodes tend to derive flow from, and return flow to, small nodes instead of exchanging flow with the environment. This finding confirms our assumption of the "chain-like" topology of the studied clickstream networks.

5.2 The displaying order of threads and the stickiness of forums
As mentioned in Section 4.2, the displaying order is a key factor that affects the log-out probability of users as well as the stickiness of forums. To understand the effect of different displaying schemes, one can consider the following metaphor.
Webmasters and users are like two players in a game of poker. The rule is that the two players each draw a fixed number of cards from randomly shuffled decks, and in each round each presents one card to compare by rank. If the webmaster's card is lower than the user's, the webmaster loses and the user quits the game. The goal of the webmaster, therefore, is to adopt a playing strategy that lets him play more rounds. In this metaphor, the webmaster's strategy represents the displaying mechanism of threads, the user's strategy represents his preference, and the rule that the webmaster's card must not be lower in rank for the game to continue corresponds to the user's navigation decision: in each step of the navigation, if the visited thread provides a utility higher than the expectation of the user, he will continue the navigation; otherwise, he is very likely to log out from the forum.
Traditionally, the webmaster's strategy is fixed, whereas the user's strategy varies across individuals. There are basically three strategies: 1) to present cards randomly; 2) to present cards in increasing order; and 3) to present cards in decreasing order. As the strategy of users varies, we assume that the average user takes the random strategy. In that case the last strategy, that is, to present cards in decreasing order, maximizes the number of successive winning rounds and should be preferred by the webmaster. In other words, the preferable displaying strategy of forums is to show threads in order of decreasing popularity ($PV$).
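The claim about the decreasing order can be checked exactly on a small example. The sketch below makes a simplifying assumption that is not in the text: in every round the user's card is an independent, uniformly random rank from 1 to 13, so a webmaster card of rank $c$ survives the round with probability $c/13$. The hand and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

RANKS = 13  # assumption: the user's card each round is uniform on ranks 1..13

def expected_rounds(order):
    """Expected number of rounds the webmaster survives when presenting cards
    in the given order. A round is survived if the webmaster's card is not
    lower than the user's card."""
    total, survive = Fraction(0), Fraction(1)
    for card in order:
        survive *= Fraction(card, RANKS)   # P(user's card <= webmaster's card)
        total += survive                   # P(surviving through this round)
    return total

hand = [2, 5, 7, 11, 13]                   # hypothetical webmaster hand
orders = list(permutations(hand))
decreasing = expected_rounds(sorted(hand, reverse=True))
increasing = expected_rounds(sorted(hand))
best = max(expected_rounds(p) for p in orders)
average = sum(expected_rounds(p) for p in orders) / len(orders)
print(float(increasing), float(average), float(decreasing))
```

Under this assumption the decreasing order attains the maximum over all orderings of the hand, the increasing order does worst of the three strategies, and a randomly chosen order falls in between.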
The strategy used by the Tieba system is the combination of the reverse-time displaying order and the "bumping" mechanism. This appears to be different from the decreasing-popularity strategy, but as discussed, the two lead to a similar result. Now we have a more intuitive understanding of how the displaying mechanism of Tieba forums leads users to log out at non-popular threads (whose utility is smaller than the expected utility of users) and thus increases the "stickiness" of forums. Please note that in the above metaphor we assume that the webmaster's strategy is fixed. The introduction of other mechanisms, such as personalized recommendation [40], will dramatically change the rules of the game. In fact, personalized recommendation is like a cheating method that lets the webmaster know the cards selected by the users in advance and present cards accordingly.

Summary
Websites, by their very nature, are consumers of collective attention and producers of information [41]. We suggest that the comparison of websites with living organisms is not just a qualitative metaphor, but also provides quantitative insights into the understanding of website usage. In this study, we find substantial evidence that websites conform to the same constraints that also shape the evolution of natural systems.
In particular, we discover the online version of Kleiber's law, that is, the scaling relationship between UV_t and PV_t in the temporal evolution of forums. Further, we show that the allometric exponent θ, which is an indicator of the "stickiness" of forums in attracting users, is determined by the metabolism of clickstream networks. The lower the dissipation efficiency γ, the larger θ. Interestingly, there seems to be an optimal forum scale of around 10^5 daily PVs that minimizes γ and maximizes θ.
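As an illustration of how an exponent such as θ can be estimated, the sketch below fits a power law by ordinary least squares on log-transformed data. It reads the scaling relationship as PV_t ∼ UV_t^θ; the function name `fit_power_law_exponent` and the synthetic hourly data are hypothetical and serve only to show the procedure, not to reproduce the reported values.

```python
import math
import random


def fit_power_law_exponent(x, y):
    """Fit y ~ c * x**exponent by least squares on (log x, log y).

    Returns (exponent, c, r_squared). All inputs must be positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    exponent = sxy / sxx
    intercept = mean_y - exponent * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return exponent, math.exp(intercept), r_squared


# Synthetic hourly series (illustrative values only, not data from the study).
rng = random.Random(42)
true_theta = 1.2
uv = [rng.randint(50, 5000) for _ in range(1440)]
pv = [3.0 * u ** true_theta * math.exp(rng.gauss(0.0, 0.1)) for u in uv]

theta, prefactor, r2 = fit_power_law_exponent(uv, pv)
print("estimated theta:", round(theta, 3), "R^2:", round(r2, 3))
```

The same routine could be applied to pairs (T_i, D_i) to estimate γ, assuming threads with zero dissipation are excluded before taking logarithms.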
As suggested by Bettencourt et al. [31], allometric growth is a very general relationship between variables in the evolution of complex systems. In particular, they show that cities are extensions of biological entities, in the sense that they satisfy the same allometric functions [31,10].
Our study extends their findings from offline social systems to online social systems. We agree with Bettencourt et al. that the scaling relationship need not be restricted to body size and energy consumption, but is also applicable to other variables. For example, the recently found "densification" pattern in the growth of online networks [36], together with the scaling relationships discussed in [32,33,34,35,36], are probably different expressions of the "allometric growth" of online flow networks.
Our findings are relevant to Web development in many respects. In particular, the present method of predicting the long-term trend of clicks in an online community by analyzing user behavior within a short time period is useful in click prediction and other areas of computational advertising [42]. To evaluate the "stickiness" θ of forums, instead of monitoring PV and UV for months, one simply needs to trace the log-out probability P_i of users on a random sample of threads within an hour and examine the correlation between P_i and the clickstreams to the thread, T_i. The direction of this correlation reflects the sticky or non-sticky property (negative for sticky and positive for non-sticky), and the magnitude of the correlation indicates the degree of stickiness or non-stickiness. Another application of θ is to integrate it as a novel feature into the recommendation of interest groups [19].
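The short-period check described above can be sketched as follows. This is an illustrative reading rather than the exact procedure of the study: we assume that the log-out probability on a thread is P_i = D_i / T_i, and we correlate P_i with log T_i (rather than raw T_i) because clickstream counts are typically heavy-tailed. The function names and the synthetic thread data are hypothetical.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def stickiness_signal(clicks_in, clicks_dissipated):
    """Correlation between log T_i and the assumed log-out probability D_i / T_i.

    A negative value suggests a sticky forum, a positive value a non-sticky one.
    """
    logout_prob = [d / t for t, d in zip(clicks_in, clicks_dissipated)]
    log_t = [math.log(t) for t in clicks_in]
    return pearson(log_t, logout_prob)


# Synthetic threads (illustrative only): T_i = 1..200, with D_i ~ T_i**gamma.
T = list(range(1, 201))
D_sticky = [t ** 0.8 for t in T]            # gamma < 1: popular threads lose fewer users
D_non_sticky = [0.5 * t ** 1.1 for t in T]  # gamma > 1: popular threads lose more users

print("sticky forum signal:", round(stickiness_signal(T, D_sticky), 3))
print("non-sticky forum signal:", round(stickiness_signal(T, D_non_sticky), 3))
```

Under the assumption P_i = D_i / T_i, the dissipation law D_i ∼ T_i^γ implies P_i ∼ T_i^(γ−1), so γ < 1 produces the negative correlation associated with sticky forums.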

Figure 1: An example dataset and the clickstream network constructed from this dataset. In (b) the weights on edges are always 1 unless otherwise specified by the number shown in gray near the edge.

Figure 2: Three example clickstream networks of different topologies. (a) A star-like network in which the dissipation probability equals 100%. (b) A chain-like network in which the dissipation probability increases with the clickstreams to nodes. (c) A chain-like network in which the dissipation probability decreases with the clickstreams to nodes.

Figure 3: (a) The scaling relationships between UV_t and PV_t across three forums over 1,440 hours. Each data point corresponds to a pair of UV_t and PV_t. Data points of different forums are shown in different colors. The values of θ are 1.15 (blue points), 1.21 (green points), and 1.29 (brown points) for the three forums, respectively. (b) The distribution of θ across the 29,993 forums (the data of the remaining 7 forums are insufficient to support a valid fit). The mean value is 1.06 and the standard deviation (SD) is 0.10. (c) The distribution of R^2 in fitting θ. The mean value is 0.89 and the SD is 0.10.

Figure 4: (a) The scaling relationships between T_i and D_i across forums in three hourly networks. These three forums are the same as those presented in Figure 3a, and the color scheme of the data points is the same as in Figure 3a. The values of γ are 0.96 (blue triangles), 0.90 (green squares), and 0.80 (brown circles) for the three forums, respectively. (b) The distribution of the value of γ averaged over 24 hours across the 6,877 forums. The mean value of the distribution is 0.93 and the SD is 0.08. (c) The distribution of the SD of γ over 24 hours (purple bars) and the averaged R^2 in fitting γ (orange bars). The mean and SD of the two distributions are 0.14 and 0.09, and 0.92 and 0.05, respectively.

Figure 5: The negative correlation between γ and θ (a) and the change of γ and θ with forum size (b). In (a) we plot both the "binned" data (blue triangles) and the original data (heat map). In the heat map, a warmer color means that the data points are more densely distributed. In plotting the "binned" data, we calculated the averages of the x and y values in intervals selected uniformly from the x range. This technique is frequently used to reduce the noise in data [39].