Can we ‘feel’ the temperature of knowledge? Modelling scientific popularity dynamics via thermodynamics

Just like everything in nature, scientific topics flourish and perish. While the existing literature captures an article’s life-cycle well via citation patterns, little is known about how scientific popularity and impact evolve for a specific topic. It would be most intuitive if we could ‘feel’ a topic’s activity just as we perceive the weather by temperature. Here, we conceive knowledge temperature to quantify a topic’s overall popularity and impact through citation network dynamics. Knowledge temperature includes 2 parts. One part depicts lasting impact by assessing knowledge accumulation with an analogy between topic evolution and isobaric expansion. The other part gauges temporal changes in knowledge structure, an embodiment of short-term popularity, through the rate of entropy change with internal energy, 2 thermodynamic variables approximated via node degree and edge number. Our analysis of representative topics with sizes ranging from 1000 to over 30000 articles reveals that the key to flourishing is a topic’s ability to accumulate useful information for future knowledge generation. Topics experience temperature surges in particular when their knowledge structure is altered by influential articles. The spike is especially obvious when a single non-trivial novel research focus appears or when sub-topics merge in the topic structure. Overall, knowledge temperature manifests topics’ distinct evolutionary cycles.

major breakthroughs in existing fields. Their immense contribution and inspiration to subsequent researches has made them each a leader in their field of research. To this end, we refer to these papers as pioneering works and define a scientific topic led by each to be a citation network that consists of the pioneering work, child papers, which are all the articles that directly cite the pioneering work, and all the citations among them. We visualize our scientific topics with a graph that we call galaxy map. Galaxy map not only highlights the most influential child papers along with the pioneering work, but also does a preliminary clustering within the topic ( Fig. 1(a,c,e)). We find that while some pioneering works still have an overwhelming impact in the scientific topics they founded, quite a few have several child papers who have established an authority comparable or even greater than themselves. Furthermore, in some of our examples, these prominent child papers seem to have transformed the original topic into multiple new topics ( Fig. 1(e)). Much as galaxy map gives a nice overview of scientific topic's current status, the temporal evolution of scientific topic needs to be further depicted. With this regard, we go beyond the galaxy map representation and dig deeper into the topic citation network for a more intuitive perception of topic's flourishing dynamics.
Since we interpret scientific topics through their citation pattern, topic evolution is reflected by the development of topic citation network. Complicated academic citation networks are springing up all across the science community as a result of the explosive research activity growth, both in and across disciplines, and the prevalence of larger teams 18,19 . The representation and characterization of complex network has attracted a huge amount of efforts, among which an appeal to statistical thermodynamics stands out as a principled school of thought 20 . Some studies at the beginning of this century reveal the intimate connections between thermodynamic quantities and complex network dynamics 21 . Recently, more literature has succeeded in characterizing natural networks 22 , neuron networks 23 and biological networks 24 through thermodynamic approaches. In particular, thermodynamic temperature is able to capture critical events in evolving networks 25 . These prior works inspired us in that heat corresponds with popularity and moreover, temperature quantifies partly our body feelings of weather. It would be most direct and intuitive if we could 'feel' topic vigor in the same way as we perceive the weather. Motivated by this thought, we try to depict the flourishing and perishing of scientific topics by measuring their knowledge temperature, a quantity designed to portrait topic impact and popularity evolution by leveraging the rich structural information hidden in citation networks.
Knowledge temperature depends on 3 factors: the evolution of topic size, the evolution of topic knowledge quantity and the advancement of knowledge structure. As knowledge is a sublimation of information and duplicated information is no longer valuable to knowledge generation, measuring knowledge quantity boils down to evaluating the volume of non-overlapped, or useful information. The latter, however, can be estimated by examining paper similarity, which essentially involves determining citation significance. As for knowledge structure, it is also closely related to the question whether a citation is important for an article. Therefore, in order to address the key issue in knowledge temperature conception: citation importance judgement, we extracted skeleton tree for each topic ( Fig. 1(b,d,f)). Skeleton tree provides a more lucid topic representation than galaxy map and accentuates the most essential idea inheritance within the topic by preserving the most valuable citation for every child paper. In particular, we are able to answer 2 fundamental questions by tracing down a path in skeleton tree: from what thought an idea is greatly inspired and what new idea it has directly inspired. From another perspective, skeleton tree demonstrates certain clustering effect in its leaves as it puts intimately related articles together. We employed graph embedding techniques to extract topic skeleton tree. We first measured the importance of every citation in the topic based on structural information and then simplified topic citation network in 2 steps: firstly, remove the loops in the citation network and secondly, leave out relatively unimportant citations while ensuring the global connectivity ( Fig. 2(a)). Because the extraction process involves a thorough investigation into citation network structure, topic skeleton tree serves as an indispensable tool for our knowledge temperature design and for the heat distribution visualization within the topic.
We evaluated topic knowledge temperature from 2 aspects: topic growth and recent structural change in topic knowledge. Our core idea is to make an analogy between the topic citation network $G^t$ and an ideal gas. At timestamp $t$, we define the topic knowledge temperature $T^t$ from two components: $T^t_{growth}$, which measures knowledge increment, and $T^t_{structure}$, which estimates the magnitude of changes in knowledge structure between 2 consecutive timestamps.
We initialized $T^t_{growth}$ by combining 2 expressions of an ideal gas's internal energy and updated $T^t_{growth}$ via the ideal gas state equation, $PV = nRT$, under the assumption that $G^t$'s expansion is an isobaric process. With pressure $P$ being invariant and $R$ being constant, the variation of $T^t_{growth}$ is governed by the dynamics of topic mass $n^t$ and topic volume $V^t$. From a macroscopic view of information and knowledge, $n^t$ measures the total amount of overlapped information whereas $V^t$ represents the total amount of information. A simple qualitative analysis shows that $T^t_{growth}$ increases when topics succeed in accumulating distinct, or useful, information, the knowledge source for the future. Intuitively, promising topics are able to attract a steady or even growing inflow of new information. On the contrary, faltering topics consume more useful information than they receive and their potential eventually drops. A rising $T^t_{growth}$ indicates an increasingly solid and rich knowledge base and thus reflects a topic's growing impact. Furthermore, an accelerating increase in $T^t_{growth}$ suggests a topic's greater capability in collecting useful information and thus its faster gain in fame.
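The isobaric update can be illustrated with a minimal sketch. With $P$ and $R$ fixed, $PV = nRT$ gives $T \propto V/n$, so the temperature at a new timestamp follows from the previous one through the ratio of $V/n$ values. The function name and the numbers below are hypothetical and only illustrate the proportionality; how $n^t$ and $V^t$ are measured from the citation network is not shown here.

```python
def update_growth_temperature(T_prev, V_prev, n_prev, V_curr, n_curr):
    """Isobaric ideal-gas update: with P and R constant, T is proportional to V / n."""
    if min(V_prev, n_prev, n_curr) <= 0:
        raise ValueError("previous volume and both masses must be positive")
    return T_prev * (V_curr / n_curr) / (V_prev / n_prev)


# Hypothetical numbers: total information grows faster than overlapped information.
heating = update_growth_temperature(1.0, V_prev=100.0, n_prev=40.0,
                                    V_curr=150.0, n_curr=50.0)

# Hypothetical numbers: overlap grows faster than total information.
cooling = update_growth_temperature(heating, V_prev=150.0, n_prev=50.0,
                                    V_curr=160.0, n_curr=80.0)
```

In this toy run the first update raises the temperature (the ratio $V/n$ goes from 2.5 to 3.0) and the second lowers it (from 3.0 to 2.0), matching the qualitative analysis above.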
Inspired by the temperature design in prior work 25,26 , we computed $T^t_{structure}$ between every two adjacent timestamps by making an analogy between $G^t$'s evolution and an isochoric process. The analogy is legitimate as long as the node number is fixed, which unfortunately does not hold for $G^t$. To solve this issue, we designed a graph shrinking algorithm that transforms the newcomers arriving between timestamps $t-1$ and $t$ into virtual citations among nodes in $G^{t-1}$ (Fig. 2(b)). We defined $T^t_{structure}$ as the average structural change brought by a node in $G^t$, computed from $S^{t-1}$ and $S^t$, the von Neumann entropies 27 of $G^{t-1}$ and of the weighted reduced graph of $G^t$, and from $U^{t-1}$ and $U^t$, their internal energies. We approximated von Neumann entropy by node degree and set internal energy to be the number of edges for simplicity. Unlike $T^t_{growth}$, which focuses more on continual knowledge increment, $T^t_{structure}$ is designed to capture recent critical events and hence assesses a topic's short-term popularity.
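The two thermodynamic variables can be illustrated on a toy graph. The sketch below is not the paper's exact formula: it assumes an undirected, unweighted graph and uses a common degree-based quadratic approximation of von Neumann entropy, $S \approx 1 - 1/|V| - |V|^{-2}\sum_{(u,v)\in E} 1/(d_u d_v)$, with internal energy equal to the edge count. The function names are hypothetical, and the finite-difference ratio $\Delta S / \Delta U$ only illustrates "the rate of entropy change with internal energy" on a fixed node set.

```python
from collections import Counter


def approx_von_neumann_entropy(num_nodes, edges):
    """Degree-based quadratic approximation for an undirected simple graph (assumption)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    coupling = sum(1.0 / (degree[u] * degree[v]) for u, v in edges)
    return 1.0 - 1.0 / num_nodes - coupling / num_nodes ** 2


def internal_energy(edges):
    """Internal energy is set to the number of edges."""
    return len(edges)


def entropy_energy_rate(num_nodes, edges_prev, edges_curr):
    """Finite-difference rate of entropy change with internal energy, same node set."""
    dU = internal_energy(edges_curr) - internal_energy(edges_prev)
    if dU == 0:
        raise ValueError("edge count did not change between the two snapshots")
    dS = (approx_von_neumann_entropy(num_nodes, edges_curr)
          - approx_von_neumann_entropy(num_nodes, edges_prev))
    return dS / dU


# Toy snapshots on 3 nodes: a path, then a triangle after one (virtual) edge is added.
path = [(0, 1), (1, 2)]
triangle = path + [(0, 2)]
rate = entropy_energy_rate(3, path, triangle)
```

Keeping the node set identical in both snapshots mirrors the role of graph shrinking, which is what makes the isochoric analogy applicable.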
Among all the topics, we identified 16 representative topics to conduct our knowledge temperature experiment. Their pioneering works were published between 1959 and 2014 and their research interests fall in domains including machine learning, wireless networks, graph theory, biology and physics. These topics have sizes ranging from over 1000 articles and approximately 5000 citations to more than 31000 articles and nearly 200 thousand citations. We find that the temporal evolution of $T^t$ depicts topic flourishing well, with $T^t_{growth}$ quantifying knowledge accumulation and $T^t_{structure}$ reflecting shifts in knowledge structure. $T^t_{growth}$ varies smoothly and determines the overall trend of $T^t$ (Fig. 3(a)). A big rise in $T^t_{growth}$ most often corresponds with a significant increase in topic size. Typically, during such periods, some child papers started to gain popularity and collect a non-trivial number of citations within the topic. They helped the pioneering work maintain the topic's visibility 9,28 . Their attractiveness to new ideas, added to that of the pioneering work, contributed to the enrichment of the topic's knowledge pool ( Fig. 3(b)). A direct and visible consequence of this phenomenon is a fortification of the existing knowledge structure, sometimes accompanied by a mild extension (Fig. 3(c-e)). Nonetheless, an ever-growing topic scale is not a guarantee of thriving periods. For instance, $T^t_{growth}$ of the topic led by 'Critical Power for Asymptotic Connectivity in Wireless Networks' has been decreasing since 2011 despite continuous size growth. This corresponds to the fact that almost all of the influential child papers within the topic were published no later than 2005. The subsequent lack of new, promising ideas and of remarkable extensions to existing research makes the topic lose the community's attention and results in the topic's demise.
As for the topic led by 'A unified architecture for natural language processing: deep neural networks with multitask learning', its decline in $T^t_{growth}$ since 2015 is somewhat atypical. The decrease is owing to the emergence of popular child papers published between 2013 and 2014 that largely surpass their parent. Child papers 'Efficient Estimation of Word Representations in Vector Space', 'Distributed Representations of Words and Phrases and their Compositionality' and 'Glove: Global Vectors for Word Representation' have each attracted around 600 citations within the topic, while their total citations have all surpassed 8000, far more than their antecedent, whose citation count still remains below 3000. Their achievements have been so great that they have become the authorities in the domain. Consequently, they have won over the attention of subsequent studies, which in turn affects the knowledge accumulation of the topic created by their parent paper. We observe that articles published after 2016 in the topic have not had a comparable development. This partly confirms the shadowing effect caused by the prominent child papers mentioned above.
$T^t_{structure}$, unlike $T^t_{growth}$, can vary greatly over time. It usually accounts for the important fluctuations of $T^t$ (Fig. 4(a,b)). A high $T^t_{structure}$ usually marks one of the following 2 events: the formation of sub-topics and the fusion of sub-topics. The first event is a consequence of the arrival of rising stars in the topic. These articles, later proven influential to the topic's evolution, either introduce multiple research directions or contribute to the flourishing of a single novel research focus. The second event takes place when subsequent literature unites the research of prior works. More specifically, the sub-topic merge occurs when unusual citations appear in which an old article cites a young one and the young article is crucial to the topic's development ( Fig. 4(b,d,f)).
Both the emergence of a single non-trivial research focus and the sub-topic merge can cause an obvious spike in $T^t_{structure}$. For instance, the topic led by 'Neural Networks for Pattern Recognition' had a sudden $T^t_{structure}$ increment when child paper 'A Tutorial on Support Vector Machines for Pattern Recognition' established a third sub-topic direction. In the topic led by 'On random graphs, I', prominent child paper 'On the evolution of random graphs' fused prior works' ideas and changed the topic landscape. However, the heat brought by such critical events is ephemeral. In the long run, their impact on a topic's life-cycle is eventually reflected by the knowledge accumulation process, which is quantified by $T^t_{growth}$. We note that influential child papers play an important role in both components of $T^t$ and thus are crucial to a topic's thriving. However, the duration between their publication and their visible contribution varies a lot 29 .
Besides knowledge temperature, we can also feel topic vigor by examining its skeleton tree. In fact, the evolution of knowledge temperature is consistent with the development of the skeleton tree. A topic's skeleton tree thrives when the topic gains popularity and fame. In times when $T^t_{growth}$ rises, the skeleton tree grows increasingly sturdy as newly published papers enrich existing research branches (Fig. 3(c-e)). During periods when $T^t_{structure}$ soars, topics usually form a new research focus thanks to some prominent child papers. The trend is visualized by the emergence of new non-trivial clusters or branches. Sometimes, recently developed research directions prove to be a big success and start to challenge topic authorities by attracting the attention of most new articles. In such cases, the skeleton tree also manifests a gravity shift, with new branches and clusters developing much faster than the previously dominating ones ( Fig. 4(a,c,e)). Finally, if the rise of $T^t_{structure}$ is due to a sub-topic merge, separated parts of the skeleton tree are connected together by a young article which later proved crucial to the topic's development ( Fig. 4(b,d,f)). When a topic loses its appeal, its skeleton tree stagnates, just like its knowledge temperature ( Fig. 3(f,g)).
We observe a rich variation in the dynamics of $T^t$, as each topic exhibits a unique development pattern. We identify 4 distinct topic life-cycles: rising topic, rise-then-fall topic, awakened topic and rise-and-fall-cycle topic. Rising topics demonstrate an overall steady and lasting increase in $T^t$. Child papers that enjoy popularity within the topic arrive rather intermittently, which ensures to some extent a stable knowledge increment. Rise-then-fall topics reach their peak at some point and then go downhill owing to the lack of new development of existing ideas, the absence of a new study focus or the shadowing by their outstanding child papers. In addition, their expansion pace slows down during the cooling phase. Awakened topics can have a mild development for as long as 20 years before experiencing a surge in influence. Their sudden flourishing is largely due to scientific communities' recent frenzy in certain domains, such as artificial intelligence. Rise-and-fall-cycle topics manifest a more complicated $T^t$ pattern. However, their rises and falls also match the global background, such as the introduction of the Internet, the boom of artificial intelligence and the prevalence of online social networks (detailed discussion in Supplementary Information sections S3.1-S3.4).
How is heat distributed within a topic? To answer this question, we interpreted $T^t$ as the average temperature of $G^t$ and computed a knowledge temperature for every article based on $T^t$. Node knowledge temperature gauges a work's relative popularity and impact within the topic at a certain moment. At each timestamp $t$, we fixed the hottest and coldest works and then employed the heat equation to propagate the heat across $G^t$. For a node $u$, its temperature change $dT_u/di$ is governed by the heat equation, in which $A^t_{vu}$ is the thermal conductivity between node $v$ and node $u$ (we omit the superscript $t$ of node temperature). We set the pioneering article to be the hottest node (knowledge temperature = 1) and all the underdeveloped papers to be the coldest nodes (knowledge temperature = 0). We modelled heat propagation via idea inheritance and via youngsters' contribution to knowledge renaissance by forward and backward iterations of the heat equation, respectively. The number of iterations $i$ depends on the average number of hops between 2 randomly selected nodes. Finally, we performed a scaling by $T^t$: node $u$'s knowledge temperature at timestamp $t$, $T^t_u$, is obtained from $T^t_{u,std}$, $u$'s temperature derived from the heat equation, and $T^t_{std}$, the average temperature derived from the heat equation.
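A minimal sketch of this kind of heat propagation follows. It rests on several assumptions not taken from the paper: uniform thermal conductivity, an explicit update in which each free node moves toward the average of its neighbours, a single undirected pass rather than separate forward and backward iterations, and a final rescaling $T_u = T^t \cdot T_{u,std} / T_{std}$ so that the average node temperature equals $T^t$. All names and the toy chain are hypothetical.

```python
def diffuse_heat(neighbors, hot, cold, steps, rate=0.5):
    """Explicit heat-equation iterations; hot and cold nodes are held fixed."""
    temp = {u: 0.0 for u in neighbors}
    for u in hot:
        temp[u] = 1.0
    fixed = set(hot) | set(cold)
    for _ in range(steps):
        new_temp = dict(temp)
        for u, nbrs in neighbors.items():
            if u in fixed or not nbrs:
                continue
            # Uniform conductivity (assumption): mean temperature gap to neighbours.
            flow = sum(temp[v] - temp[u] for v in nbrs) / len(nbrs)
            new_temp[u] = temp[u] + rate * flow
        temp = new_temp
    return temp


def rescale_to_topic(temp, topic_temperature):
    """Scale node temperatures so that their mean equals the topic temperature."""
    mean = sum(temp.values()) / len(temp)
    return {u: topic_temperature * t / mean for u, t in temp.items()}


# Toy chain: pioneering work P, two child papers, one underdeveloped (cold) paper.
chain = {"P": ["a"], "a": ["P", "b"], "b": ["a", "c"], "c": ["b"]}
std = diffuse_heat(chain, hot=["P"], cold=["c"], steps=3)
node_temperature = rescale_to_topic(std, topic_temperature=2.0)
```

With `rate` between 0 and 1 each update is a convex combination of a node's temperature and its neighbours' average, so values stay within [0, 1] before rescaling.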
We visualized node knowledge temperature by skeleton tree. If we let alone the coldest papers, we observe a ubiquitous phenomenon: the closer an article is to the pioneering work, the hotter it tends to be. Node knowledge temperature decreases along paths in skeleton tree (Fig. 4(c-f)). Although pioneering work is the only known hottest node, we identify other heat sources, the majority of which are the centers of non-trivial clusters. Most heat sources happen to be among the most-cited child papers within a topic. They possess primarily intrinsic value. Their own research content contributes a lot to topic's survival and flourishing. Another type of heat source are articles situated between clusters. Such papers may not have made astonishing discoveries nor have attracted many followers, but it is their studies that have inspired some influential subsequent work. Their value lies essentially in the enlightenment.
In an effort to better understand general heat distribution within topics, our preliminary observation prompted us to study the relation between node knowledge temperature and article age, as papers located in skeleton tree cores are parents or ancestors to papers on the periphery. We find that regardless of research themes, older papers indeed tend to have higher knowledge temperatures (Fig. 5). Older papers take advantage of a longer time span and tend to better diffuse their ideas thanks to their numerous followers, a tendency in line with our intuition. Since we assume pioneering works possess the "hottest" knowledge, the gradual temperature decline well illustrates that idea inheritance and innovation are taking place simultaneously in every scientific topic. However, we observe a drop in average node knowledge temperature among the oldest papers in half of the topics. 2 phenomena can explain the anomaly. Some topics contain a tiny fraction of atypical citations where younger articles are cited by older papers or papers published at approximately the same time. When the younger articles happen to be pioneering works, the oldest papers are no longer the topic founders. They usually have inspired few or even no child papers in the topics. Consequently, they are among the coldest nodes. In rare cases, these papers inspired a certain quantity of works. But they remain "cold" owing to their relatively different research focus with that of the pioneering works even though they are connected to the latter. Their citations are more like peer bonds rather than a symbol of inspiration and idea inheritance. Such is the case for the pioneering work 'Particle swarm optimization' and its peer and popular child paper 'A new optimizer using particle swarm theory'.
Even if we leave aside the cold old articles, the heat distribution is not that simple and monotonous. We observe in most topics that parent papers are not always hotter than their descendants. According to our design, node knowledge temperature is affected by 2 factors: the heat level of its own research content and the promotion gained from its descendants. Therefore, a colder parent or ancestor is due either to its less prevalent ideas or to a poor general performance of its children. This phenomenon implies that an important status within the topic does not necessarily bring much fame.
We further compared node knowledge temperature with in-topic citation count, a traditional article-level impact metric, to get a better understanding of their similarities and differences (Fig. S49). We find a weak positive correlation between the two quantities among the best-cited papers in topics. In particular, we highlighted the most-cited child papers together with pioneering works on current skeleton trees. Most of them have a knowledge temperature above average, as they are represented as yellow, orange or red nodes (see, for example, the current skeleton trees in Supplementary Information Fig. S3, S9, S15, S35). However, there are exceptions. For instance, in the topic led by 'Particle swarm optimization', popular child paper 'A new optimizer using particle swarm theory' (NOPST) is among the coldest despite being the most influential child paper in terms of citation count (Fig. S35). NOPST was published in the same year as the pioneering work and it only cited the pioneering work. Its low temperature is due to its research focus, which differs somewhat from that of the pioneering work, and to an overall low heat level of its children. The latter is to some extent also a consequence of the former, as the pioneering work has the most prevalent idea. The focus difference is also reflected by their separation in the skeleton tree.
We also tracked the knowledge temperature evolution of relatively popular child papers within a topic and we find a similar phenomenon already observed at topic-level. While an article's own knowledge largely determines its heat level, child papers sometimes play a perceptible role in boosting or maintaining its popularity and impact. For example, in the topic led by paper 'Bose-Einstein condensation in a gas of sodium atoms', article 'Bose-Einstein condensation of exciton polaritons' has kept being hotter since its publication despite a global cooling since 2013 thanks to an above-average active development (Supplementary Information S3.2.7). Our finding is consistent with the research which demonstrates that papers need new citations to keep their visibility 28 . Besides, in some topics, especially the one led by 'Collective dynamics of 'small-world' networks', we frequently find that popular child papers were published in renowned journals such as Nature and Science (Supplementary Information section 3.4). Our observation accords with research which suggests a positive association between journal prestige and article high impact 30 .
Nonetheless, we find that several scientific topics are intimately connected. Some pioneering works occupy a primordial position in other topics' skeleton trees. Furthermore, these closely related topics manifest similar knowledge temperature dynamics. However, such similarity does not correspond very well with idea inheritance and development in some cases. For instance, paper 'The capacity of wireless networks' (CWN) is the most successful child paper of the pioneering work 'Critical Power for Asymptotic Connectivity in Wireless Networks'. It plays a crucial role in the topic's prosperity (Fig. S12) by jointly inspiring one third of the topic members, most of which were published during the flourishing period. Besides, CWN surpassed and took over from its predecessor as the new authority in their domain in just a few years. Yet, according to their topic knowledge temperatures, it is the topic led by CWN that went downhill first. For this reason, we wanted to design a mechanism that allows us to better capture the interactions among closely connected topics. Following our skeleton tree notion, we were inspired by the nutrition transfer among real trees in a forest 31 . We hence treated scientific topics as trees and conceived a forest helping mechanism where thriving topics transfuse a small fraction of vigor to their dying siblings. The amount of shared energy depends on both the ages and the size of the topic group. When we compare topic knowledge temperatures before and after forest helping, we find that our helping mechanism mildly regulates the temperatures, as if it took into account the "background popularity", that is, the average popularity of a bigger research topic to which the group belongs. Overall, forest helping slightly reduces the fluctuation of $T^t$ (Fig. S50).
In summary, we report a thermodynamic approach to depict the rise and fall of scientific topics. We design knowledge temperature, an intuitive and quantitative metric to evaluate a topic's overall popularity and impact dynamics by fully leveraging the scale and structure dynamics of its citation network through the skeleton tree. A continuous stream of useful information is the key to topics' prosperity in the long run, to which the arrival of eminent child papers contributes a lot. In the short term, critical events such as the merging and emergence of new sub-topics also boost a topic's vigor. In addition, we examine the heat diffusion within topics and discover that older articles generally have better chances to diffuse their ideas and thus enjoy a higher popularity within the topic. However, exceptions exist widely, suggesting that the positive correlation between heat level and an article's age and impact remains weak. Finally, we design a forest helping mechanism to better depict the idea inheritance and development among intimately associated topics. Although knowledge temperature cannot directly be used as a scientific impact metric, our study suggests a new possibility to quantify research impact in a most intuitive way.
Data are available at https://github.com/drlisette/knowledge-temperature. Other related, relevant data are available from the corresponding author upon reasonable request.

Q.L. processed the topic data and optimized the skeleton tree algorithm. X.W. gave invaluable comments for paper writing.

Competing interests
The author(s) declare no competing interests.

Figure 1: In galaxy map: Node size and title size are proportional to total citation count. Only the most-cited papers are labelled with titles. The node colour of the pioneering work is red. The node colours of the other articles are determined by their positions under the ForceAtlas layout algorithm. Nodes in the same cluster take the same colour (yellow, green, blue or pink). In topic skeleton tree: Node size (except pioneering work) is proportional to structure entropy. The pioneering work node is twice the maximum size of the child paper nodes. Node colour is the same as in galaxy map. Only the pioneering work is labelled by its title.

Figure 2: The red node labelled "P" represents the pioneering work. Green nodes are child papers. A directed edge from A to B represents "B cites A". (a) Skeleton tree extraction. From left to middle: loop cutting. Child papers c3 and c4 cite each other. We remove one of the two citations to get a tree structure. From middle to right: tree pruning. We remove redundant citations for every child paper so that it only keeps the most meaningful citation. (b) Graph shrinking for $T^t_{structure}$ computation. The graph shrinking process transforms the newly arrived articles into virtual citations among existing papers. For example, child paper c3 arrives between timestamps $t-1$ and $t$ and cites all papers in the topic. Its citations suggest that c1 and c2, disconnected in $G^{t-1}$, have certain connections in their research content. We remove c3 and add one or two virtual citations between c1 and c2 according to the general rule where the younger virtually cites the older. If c1 and c2 were published in the same year, they virtually cite each other in the shrunk counterpart of $G^t$.

Figure 3: Knowledge temperature (especially $T^t$ and $T^t_{growth}$) and skeleton tree evolution of the topic led by 'A unified architecture for natural language processing: deep neural networks with multitask learning'. Nodes in skeleton tree are coloured according to their knowledge temperature, with red being the hottest, yellow being the average level and blue the coldest within the topic. Node size (except pioneering work) is proportional to (re-scaled) structure entropy. The pioneering work node is twice the maximum size of the child paper nodes. (a) Knowledge temperature evolution.

Figure 4: Knowledge temperature (especially $T^t$ and $T^t_{structure}$) and skeleton tree evolution of topics led by 'The capacity of wireless networks' (CWN) and 'On random graphs, I' (RG). Nodes in skeleton tree are coloured according to their knowledge temperature, with red being the hottest and blue the coldest within the topic. Node size (except pioneering work) is proportional to (re-scaled) structure entropy. The pioneering work node is twice the maximum size of the child paper nodes.

S1 Data Description
We collected topic citation relations from academic databases including DBLP, arXiv, Elsevier and Springer. Each topic is led by an article that has had a profound influence in certain domains. We refer to these papers as pioneering papers or leading papers. A scientific topic includes a pioneering paper, all the articles that directly cite it and all the citations among them. We chose 16 topics from our dataset to conduct the knowledge temperature experiment. Pioneering paper information is listed in Table S1 and topic size is listed in Table S2. Topics are ordered by publishing year.
Among 16 topics, we identify 3 topic groups, each containing 2 or 3 topics:

S2 Model
Our core idea is to treat the citation network $G^t = (V^t, E^t)$ as a thermodynamic system, more specifically, an ideal gas. $G^t$ is a directed graph whose nodes consist of a pioneering paper and all the articles that directly cite it and whose edges are the citations among them. We denote its adjacency matrix by $A^t$.
As knowledge temperature relies on some quantities defined in skeleton tree extraction and knowledge entropy computation, we organise our model description in the following order: we first present the construction of the skeleton tree, then we define knowledge entropy. Next, we unfold our topic knowledge temperature design and finally we elaborate on node knowledge temperature.

S2.1 Topic Skeleton Tree
Skeleton tree illustrates the knowledge structure of a topic. Its evolution reveals a topic's development pattern. The extraction of skeleton tree is essentially a process to reduce a graph to a tree. We denote $G^t$'s skeleton tree by $Tree^t = (V^t_T, E^t_T)$. For notation simplicity, we omit superscript $t$ for variables that appear in the rest of this subsection. There are altogether 3 steps in $Tree^t$'s construction:
1. We perform node embedding and compute distance matrix EmbedDist that shows the node pair-wise distance in embedding space.
2. We derive matrix DiffIdx based on EmbedDist to measure the difference between every node pair. Vector ReductionIdx, a node score which serves to judge the citation importance, is computed afterwards. We rely on ReductionIdx to prune $G^t$ in the following step.
3. We reduce $G^t$ to $Tree^t$ by removing less important references while ensuring the overall connectivity. The significance of a citation is determined by the similarity of 2 papers, which is assessed through their reduction indices. The process involves loop cutting and tree pruning. In $Tree^t$, every node except the root, which is exactly the pioneering node, has at most one citation.
We start by slightly modifying the adjacency matrix $A$ by adding a self-loop to the pioneering work. This is for the convenience of spectral decomposition. Then, we compute the out-degree matrix $D$ and the normalized Laplacian matrix $L$, where $D$ is a diagonal matrix whose diagonal entries equal the out-degree, or practically speaking the in-topic citation count, of each node. We next perform a full spectral decomposition of $L$. The eigenvectors are our node embeddings and EmbedDist is a distance matrix with entries $EmbedDist_{u,v} = \lVert eigenvector_u - eigenvector_v \rVert_2$.
Now we proceed to compute the difference matrix DiffIdx. For a node pair $(u, v)$, we define their difference index $DiffIdx_{u,v}$ as:

where $v_{parent}$ are the predecessors of $v$ and $d_{u,v_{parent}}$ is the length of the shortest weighted path between $u$ and $v_{parent}$:

MaxDist is the biggest distance between two connected nodes and avgStep is the average hop number of all shortest paths between any two reachable nodes. DiffIdx gauges the difference between $u$ and $v$ by involving the works that inspire $v$. If $u$ and $v_{parent}$ are reachable from each other, it suggests that there is some degree of similarity in their ideas or research topics, and thus we represent their distance by the shortest path's weight. Otherwise, we model their correlation by a long imaginary path of avgStep hops with step length MaxDist. Therefore, the greater $DiffIdx_{u,v}$ is, the more different $u$ and $v$ are.
For a node $u$, its reduction index $ReductionIdx_u$ is defined as the sum of its difference indices:

The vector ReductionIdx helps to determine the importance of citations. A citation between two articles with similar reduction indices is considered more valuable than one between two papers with different reduction indices.
We are now ready to extract the topic skeleton tree. The first step is to find and cut loops in $G^t$. We cut a loop by removing the least important edge (the one whose extremities have the most different reduction indices). Nonetheless, we try to ensure that the edge we cut is not the last citation left for some node, so as to preserve overall connectivity as much as possible. After loop cutting, we obtain a tree. The second step is to remove redundant citations in the tree. Recall that we only keep one citation for every node except the root in $Tree^t$. Fig. 2(a) illustrates the whole process with a toy example.
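The edge-selection rule used during loop cutting can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes DiffIdx is already available as a square matrix, that "the sum of its difference indices" is a row sum, and that an edge `(u, v)` means "`v` cites `u`". The names `reduction_index`, `edge_to_cut` and `remaining_citations` are hypothetical.

```python
import numpy as np


def reduction_index(diff_idx: np.ndarray) -> np.ndarray:
    """ReductionIdx_u as the sum of u's difference indices (row sum assumed)."""
    return diff_idx.sum(axis=1)


def edge_to_cut(loop_edges, reduction_idx, remaining_citations):
    """Pick the least important edge of one loop.

    loop_edges: list of (u, v) pairs, read as "v cites u" (assumption).
    remaining_citations: dict mapping a citing node to the number of
        references it still has in the graph.
    The least important edge is the one whose end points have the most
    different reduction indices. Edges that are the last citation of
    their citing node are avoided whenever another candidate exists.
    """
    safe = [(u, v) for (u, v) in loop_edges if remaining_citations[v] > 1]
    candidates = safe if safe else list(loop_edges)
    return max(
        candidates,
        key=lambda e: abs(reduction_idx[e[0]] - reduction_idx[e[1]]),
    )


# Toy example with three papers forming one loop.
diff_idx = np.array([[0.0, 1.0, 4.0],
                     [1.0, 0.0, 2.0],
                     [4.0, 2.0, 0.0]])
r_idx = reduction_index(diff_idx)
loop = [(0, 1), (1, 2), (0, 2)]
cut = edge_to_cut(loop, r_idx, {1: 1, 2: 2})
```

In the toy example, paper 1 has a single remaining reference, so the edge `(0, 1)` is protected and the choice is made among the other two edges.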

S2.2 Structure Entropy
We adopt structure entropy [32] to determine the node size in the skeleton tree visualisation. Structure entropy measures the uncertainty of the tree structure if node $u$ is absent. Consequently, it makes sense to evaluate the importance of a paper to knowledge passing within the topic by structure entropy. For a node $u$ other than the root, its structure entropy $S^t_u$ is defined as:

Here the cut size of the sub-tree $Tree^t_u$ rooted at $u$ is the sum of the degrees, in $Tree^t$, of the nodes of $Tree^t_u$. $E^t_T$ is the edge set of the skeleton tree. $V^t_{T,u}$ is the number of nodes $Tree^t_u$ contains (the sum of out-degrees of $Tree^t_u$) and $V^t_{T,u_{parent}}$ the number of nodes $Tree^t_{u_{parent}}$ has.

The term before the log measures the importance of $Tree^t_u$ to the whole skeleton tree, and the log part describes the uncertainty of $Tree^t_u$ with respect to its parent sub-tree.
The structure entropy of the entire topic, $S^t$, is defined as the sum of the node structure entropies:

$$S^t = \sum_{u \neq \text{root}} S^t_u$$

S2.3 Topic Knowledge Temperature
Topic knowledge temperature $T^t$ is defined as:

where $T^t_{growth}$ measures knowledge increment and $T^t_{structure}$ estimates the degree of the latest structural changes in the topic's knowledge framework.

S2.3.1 $T^t_{growth}$
We initialise $T^t_{growth}$ by combining the 2 expressions of an ideal gas's internal energy $U$:

where $S$ is the entropy, $n$ the amount of substance (number of moles), $V$ the volume, $R$ the ideal gas constant, $c$ the heat capacity and $k$ an adjusting coefficient.

As a result, $T^0_{growth}$ is written as:

where $S^0$ is the initial structure entropy of the topic, $n^0$ the initial topic mass, $V^0$ the initial topic volume, $k$ a coefficient to be determined, and $R$ and $c$ two constants.

Next, we model $G^t$'s evolution as an isobaric process of an ideal gas. Hence, according to the ideal gas state equation $PV = nRT$, by fixing the pressure $P$, $T^t_{growth}$ is updated by the following expression:

We set the topic volume $V^t$ to be the node number, $V^t = |V^t|$, and the topic mass $n^t$ as $n^t = |V^t| - UsefulInfo^t$. The topic structure entropy $S^t$ is the one derived in the previous subsection.
$UsefulInfo^t$ is based on DiffIdx from the skeleton tree extraction:

Nevertheless, we would like to finish this part with a qualitative analysis of the dynamics of $T^t_{growth}$ from a macroscopic view of information and knowledge. Knowledge originates from information, but information and knowledge have different characteristics. Information is only valuable once. Duplicate information does not create any additional value and thus cannot be used to create knowledge. Knowledge is an understanding and a refinement of information. It is always valuable. Normally speaking, we cannot have too much knowledge.

Bearing the interplay of knowledge and information in mind, we are now ready to interpret the symbolic meaning of volume $V^t$ and mass $n^t$. $V^t$ represents the total amount of information possessed by a topic at timestamp $t$. UsefulInfo signifies the amount of useful information, and thus $n^t$ symbolises the total amount of overlapped, or used, information. We assume that each paper carries one unit of information. Yet we derive useful information edge by edge. This is because in a skeleton tree, all articles except the pioneering paper have only one citation, and if article $u$ and its 'parent' ('child') article have drastically different DiffIdx values, they are likely to have distinct research contents. In this case, therefore, even if one of them has completely overlapped content with some other article(s), we can still roughly determine one unit of new information.
From the update rule of $T^t_{growth}$, we distinguish 3 cases (suppose $G^t$ always expands, so that $V^t$ always increases):

1. $T^t_{growth}$ will not change if $V^t$ and $n^t$ have identical increase rates during the last period.
2. $T^t_{growth}$ will decrease if $n^t$ increases faster than $V^t$ over the last period.
3. $T^t_{growth}$ will increase if $V^t$ increases faster than $n^t$ over the last period.
$T^t_{growth}$ goes up when the quantity of total information grows faster than the amount of duplicate information. Since $V^t - n^t = UsefulInfo^t$, $T^t_{growth}$ rises when there is an accelerated increase in useful information. The more abundant the useful information is, the more likely a topic is to create new knowledge in the future and the greater its potential. Otherwise, the topic "consumes" information faster than it accumulates information capital. If the tendency continues, it will have less information reserve for knowledge generation in the future. Its growth potential declines and eventually it 'dies'. Therefore, $T^t_{growth}$ reflects both how smoothly the knowledge accumulation goes and how promising the topic is at timestamp $t$. As knowledge enrichment eventually brings about scientific impact, $T^t_{growth}$ illustrates the long-term cumulative impact of a topic.
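The three cases can be checked numerically with a small sketch. It is not taken from the source: it assumes the update follows directly from $PV = nRT$ at constant $P$ and $R$, so that $V/(nT)$ stays fixed between two timestamps and $T^t = T^{t-1} \cdot \frac{V^t}{V^{t-1}} \cdot \frac{n^{t-1}}{n^t}$. The function name `update_t_growth` is hypothetical.

```python
def update_t_growth(t_prev, v_prev, n_prev, v_now, n_now):
    """Isobaric ideal-gas update (assumed form): V / (n * T) is constant.

    t_prev: growth temperature at the previous timestamp
    v_*:    topic volume (node number) at the two timestamps
    n_*:    topic mass (node number minus useful information)
    """
    if min(v_prev, n_prev, v_now, n_now) <= 0:
        raise ValueError("volumes and masses must be positive")
    return t_prev * (v_now / v_prev) * (n_prev / n_now)


# Case 1: V and n both double, temperature is unchanged.
same = update_t_growth(10.0, 100, 80, 200, 160)
# Case 2: n grows faster than V, temperature drops.
cooler = update_t_growth(10.0, 100, 80, 200, 180)
# Case 3: V grows faster than n, temperature rises.
hotter = update_t_growth(10.0, 100, 80, 200, 150)
```

Under this reading the temperature is proportional to $V^t / n^t$, which is why it rises exactly when useful information $V^t - n^t$ takes a growing share of the topic.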

S2.3.2 $T^t_{structure}$
For a thermodynamic system with freedom to vary its volume, temperature and pressure, the variation in internal energy $dU$ is given by $dU = T\,dS - P\,dV + m\,dn$, where $T$ is the temperature, $P$ the pressure, $dV$ the volume change, $m$ the particle mass and $dn$ the change in the number of particles [26]. The temperature $T$ of an evolving network with a fixed node number can be derived as $T = \frac{dU}{dS}$ [25]. It has been shown that with appropriate thermodynamic representations and some approximations, this relation is able to detect the critical events in a dynamic network [25,26].

Inspired by the above literature, we define $T^t_{structure}$ as:

where $S^{t-1}$, $S^t$ are the von Neumann entropies of $G^{t-1}$ and $\tilde{G}^t$, and $U^{t-1}$, $U^t$ their internal energies. $\tilde{G}^t$ is a weighted reduced graph of $G^t$. It has all the nodes and edges of $G^{t-1}$. Besides, $\tilde{G}^t$ contains virtual citations deduced from the new nodes arriving between timestamp $t-1$ and timestamp $t$. Intuitively, $T^t_{structure}$ can be interpreted as the average structural change brought by an article in $G^t$.
The transformation from $G^t$ to $\tilde{G}^t$ boils down to 2 tasks: remove new nodes and add virtual citations when possible. The edge weight of a real citation is 1. For every new node $x$, we distinguish 2 cases:

• If $x$ has only 1 parent node $p_x$, then remove $x$. If $x$ has child node(s) $c_x$, connect it (them) to $x$'s unique parent node and set the edge weight $A_{p_x c_x} = \frac{1}{2} A_{x c_x}$. Intuitively, since $x$ only cites 1 paper, its arrival cannot give us extra information about whether any node pair in $G^{t-1}$ without a citation between them shares some research content.

• If $x$ has multiple parent nodes, find all its "youngest" ancestor nodes in $G^{t-1}$. If a parent node $p_x$ is in $G^{t-1}$, then $p_x$ is already a "youngest" ancestor node. Otherwise, iteratively find $p_x$'s predecessors until they are in $G^{t-1}$. Denote $x$'s youngest ancestor nodes in $G^{t-1}$ by $(a_1, a_2, ..., a_m)$. Next, for each ancestor pair $(a_i, a_j)$ between which there is no edge in $G^{t-1}$, add a virtual link according to their publishing years $y_i$, $y_j$ (with $A$ the real-time adjacency matrix and $m$ the total number of $x$'s youngest ancestor nodes):

- If $y_i < y_j$, add a directed weighted edge from $a_i$ to $a_j$ of weight $\frac{2 \sum_{p_x} A_{p_x x}}{m(m-1)}$. The new edge means "$a_j$ virtually cites $a_i$".
- If $y_i > y_j$, add a directed weighted edge from $a_j$ to $a_i$ of weight $\frac{2 \sum_{p_x} A_{p_x x}}{m(m-1)}$. The new edge means "$a_i$ virtually cites $a_j$".
- If $y_i = y_j$, add a bidirectional weighted edge between $a_i$ and $a_j$ of weight $\frac{\sum_{p_x} A_{p_x x}}{m(m-1)}$. The new edge means "$a_j$ and $a_i$ virtually cite each other".

In case of a duplicate virtual link, we discard it. In other words, we always keep the first virtual link added between a node pair. Remove $x$ after adding all possible virtual links. Intuitively, since $x$ cites several papers, we can guess that these papers are somehow loosely connected to one another even if there are no direct citations among them. That is why we add virtual citations of weight less than 1.
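The multi-parent case can be sketched for a single new node $x$ as follows. This is an illustration under assumptions: the weight is read as $2 \sum_{p_x} A_{p_x x} / (m(m-1))$ for a directed link and half of that per direction for same-year pairs, edges are stored as `(cited, citing)` tuples, and the set `existing_edges` is expected to contain both the real edges of $G^{t-1}$ and any virtual links added earlier, so that duplicates are skipped. The function name `virtual_links` and all argument names are hypothetical.

```python
from itertools import combinations


def virtual_links(ancestors, years, parent_weight_sum, existing_edges):
    """Virtual citations induced by one new node x with several parents.

    ancestors:         x's youngest ancestor nodes in the previous graph
    years:             dict node -> publishing year
    parent_weight_sum: sum of the weights of the edges from x's parents to x
    existing_edges:    set of (cited, citing) pairs already present
    Returns a dict {(cited, citing): weight}.
    """
    m = len(ancestors)
    links = {}
    if m < 2:
        return links
    base = parent_weight_sum / (m * (m - 1))
    for a_i, a_j in combinations(ancestors, 2):
        if (a_i, a_j) in existing_edges or (a_j, a_i) in existing_edges:
            continue  # an edge already links this pair
        if years[a_i] < years[a_j]:
            links[(a_i, a_j)] = 2 * base   # a_j virtually cites a_i
        elif years[a_i] > years[a_j]:
            links[(a_j, a_i)] = 2 * base   # a_i virtually cites a_j
        else:
            links[(a_i, a_j)] = base       # same year: one link
            links[(a_j, a_i)] = base       # in each direction
    return links


years = {"a": 2000, "b": 2005, "c": 2005, "d": 2010}
links = virtual_links(["a", "b", "c", "d"], years, 3.0, {("a", "b")})
```

Since $m$ ancestors form $m(m-1)/2$ pairs, this weighting spreads the total weight of $x$'s incoming citations evenly over the pairs when none of them is already connected.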
We set $U^{t-1}$, $U^t$ to be the sum of edge weights. As an authentic citation has a weight of 1, $U^{t-1}$ reduces to the number of edges, $U^{t-1} = |E^{t-1}|$. Therefore, if we denote by $\tilde{A}^t$ and $\tilde{E}^t$ the adjacency matrix and the edge set of $\tilde{G}^t$ respectively,

$$U^t = \sum_{(u,v) \in \tilde{E}^t} \tilde{A}^t_{uv}$$

We approximate $S^{t-1}$ and $S^t$ by node degree. The von Neumann entropy of a directed graph is the sum of the von Neumann entropies of its strongly connected (SC) components [27]:

Now assume strong connectivity; we extend the entropy computation for an unweighted directed graph [25,27] to that for a weighted directed graph $G = (V, E)$. First define some notations:

Adjacency matrix $A$:

In-degree and out-degree of node $u$, $d^{in}_u$ and $d^{out}_u$:

Transition matrix $P$:

We denote by $\lambda_s$ the normalized Laplacian eigenvalues and by $\phi$ the unique left eigenvector of the transition matrix $P$.

The von Neumann entropy of $G$ is the Shannon entropy associated with the normalized Laplacian eigenvalues. By adopting the quadratic approximation to the Shannon entropy (i.e. $-x \ln x \approx x(1 - x)$), we have [27]

Now we expand the equation $\mathrm{tr}(L^2) = |V| + \frac{1}{2}\left(\mathrm{tr}(P^2) + \mathrm{tr}(P \Phi^{-1} P^T \Phi)\right)$ [27] for $G$:

Combining the simplifications, we have an approximation of $G$'s entropy:

Finally, we obtain $S^{t-1}$ and $S^t$:
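The quadratic approximation $-x \ln x \approx x(1 - x)$ can be illustrated on a generic probability vector. The sketch below is only a numerical illustration and does not reproduce the graph-specific formulas; the function names are hypothetical. Because $-\ln x \ge 1 - x$ for $x > 0$, the quadratic form never exceeds the exact Shannon entropy.

```python
import math


def shannon_entropy(p):
    """Exact Shannon entropy -sum(x ln x), skipping zero entries."""
    return -sum(x * math.log(x) for x in p if x > 0)


def quadratic_entropy(p):
    """Quadratic approximation sum(x (1 - x)) of the Shannon entropy."""
    return sum(x * (1 - x) for x in p)


p = [0.5, 0.25, 0.25]
exact = shannon_entropy(p)
approx = quadratic_entropy(p)
```

The appeal of the quadratic form is that it only involves sums of squares, which is what allows the entropy to be expressed through traces of matrix powers rather than through a full eigendecomposition.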

S2.4 Node Knowledge Temperature
We employ the heat equation to compute node knowledge temperature. For a node $u$, its temperature change $\frac{dT^t_u}{dt}$ is:

where $A^t_{iu}$ is defined as:

DiffIdx is defined previously in the subsection on the topic skeleton tree. $A_{iu}$ is the thermal conductivity between node $i$ and node $u$.

Before the heat diffusion, we need to fix the temperature of certain nodes and to specify the number of iterations of the heat equation. We assume that the pioneering work is the hottest and all the inactive papers are the coldest. An article $u$ is considered inactive if either of the following criteria is met:

1. $u$ does not have any citation until timestamp $t$.
2. $u$ joins the topic before timestamp $t - 1$ and does not have any new citations between timestamp $t - 1$ and timestamp $t$.
We first diffuse heat backward by transposing the adjacency matrix $A$ for 1 iteration, then forward for avgStep iterations. avgStep, defined during skeleton tree extraction, can be interpreted as the average number of hops between 2 random nodes in $G^t$. Backward propagation models the popularity gain of an idea thanks to the newcomers, and forward propagation models the heat diffusion due to the inheritance of topic knowledge.
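The two inactivity criteria used to fix the coldest nodes can be written as a small predicate. This is a sketch under assumptions: "citation" is read as an in-topic citation received by $u$, timestamps are integers such as years, and "between $t - 1$ and $t$" is taken as the half-open interval $(t-1, t]$. The function name `is_inactive` and its arguments are hypothetical.

```python
def is_inactive(join_time, citation_times, t):
    """Return True if article u counts as inactive at timestamp t.

    join_time:      timestamp at which u joined the topic
    citation_times: timestamps of the in-topic citations u received
    t:              current timestamp
    """
    received = [c for c in citation_times if c <= t]
    # Criterion 1: no citation at all up to timestamp t.
    if not received:
        return True
    # Criterion 2: joined before t - 1 and nothing new in (t - 1, t].
    if join_time < t - 1:
        return not any(t - 1 < c <= t for c in received)
    return False


never_cited = is_inactive(2010, [], 2020)
gone_quiet = is_inactive(2010, [2012, 2015], 2020)
still_cited = is_inactive(2010, [2012, 2020], 2020)
newcomer = is_inactive(2019, [2019], 2020)
```

A recently joined article that has already been cited, such as the last case, is not treated as inactive, since criterion 2 only applies to articles that joined before $t - 1$.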
We obtain node knowledge temperatures ranging from 0 to 1 after applying the heat equation. The last step is to scale node knowledge temperature by topic knowledge temperature. Denoting by $T^t_{u,std}$ and $T^t_u$ node $u$'s temperatures before and after the scaling, and by $T^t_{std}$ the average node knowledge temperature before the scaling, we have

S2.5 Forest Helping
Forest helping is designed for a group of similar topics. Through this mechanism, thriving topics "transfuse" a small part of their energy to other stagnant sister topics. The helping does not change the total energy of the topic group:

where $K$ is the number of topics in a group and $T^t_{j,forest}$ the average temperature of topic $j$ after the helping.
If all topics in the group are hotter than last period, no helping takes place. Else, all of the topics with a rising knowledge temperature help the rest.
We model the probability that "a thriving topic is willing to help others" as following a beta distribution $B(1, \sum_{j=1}^{K} a_j)$, $a_j$ being the age of topic $j$. The beta distribution varies from 0 to 1, which correspond to the options "not help" and "help with all I have". We assume a prosperous topic will give an amount of energy equal to the expectation of the distribution. Hence, at time $t$, the energy that a topic gives away is proportional to its own knowledge temperature and inversely proportional to the ages of the entire group:

The energy received by each topic in need of help is proportional to its node number. Therefore, they have an identical increase in their knowledge temperatures:

As topics mature, their initially close connection in thoughts will wear off over time. Consequently, the amount of energy transmitted through forest helping will decrease.
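One possible reading of the mechanism is sketched below. It rests on several assumptions that are not confirmed by the text: a topic's energy is taken as node number times knowledge temperature, each rising topic gives away the fraction $\frac{1}{1 + \sum_j a_j}$ of its energy (the mean of $B(1, \sum_j a_j)$), and the pooled energy is split among the other topics in proportion to their node numbers. The function name `forest_helping` and its arguments are hypothetical.

```python
def forest_helping(temps_prev, temps_now, sizes, ages):
    """Redistribute energy from rising topics to the rest of the group.

    temps_prev, temps_now: topic temperatures at t - 1 and t
    sizes:                 node number of each topic
    ages:                  age of each topic
    Returns the temperatures after helping.
    """
    k = len(temps_now)
    rising = [j for j in range(k) if temps_now[j] > temps_prev[j]]
    needy = [j for j in range(k) if temps_now[j] <= temps_prev[j]]
    if not rising or not needy:
        return list(temps_now)          # nobody can help or needs help
    share = 1.0 / (1.0 + sum(ages))     # mean of Beta(1, sum of ages)
    out = list(temps_now)
    pool = 0.0
    for j in rising:
        given = share * sizes[j] * temps_now[j]   # assumed energy = size * T
        out[j] -= given / sizes[j]
        pool += given
    total_needy_size = sum(sizes[j] for j in needy)
    for j in needy:
        # Energy share proportional to size, hence equal temperature gain.
        out[j] += pool / total_needy_size
    return out


after = forest_helping([10.0, 10.0, 10.0], [12.0, 9.0, 8.0],
                       [100, 50, 150], [4, 5, 10])
```

In this toy group only the first topic is rising. It gives away one twentieth of its energy, and the two cooling topics gain the same amount of temperature while the size-weighted total stays unchanged.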

S3 Experiments
We first present our results and analysis for individual topics, then discuss the forest helping results for topic groups. Note that most of the data for 2020 only cover the first 2 months, therefore the latest temperature is not definitive. The data in the tables are rounded to 3 decimal places. We set the two constants in the calculation of $T^t_{growth}$ to $R = 8$, $c = 1$. For topics with more than 5000 articles, the coefficient $k = 10$ in the computation of $T^0_{growth}$. Otherwise, $k = 100$.
In this section, we refer to "popular child papers" as the child papers with high in-topic citations unless explicitly specified. Child papers with titles in the topic's current skeleton tree are the ones with the highest in-topic citations, whereas the highlighted child papers in galaxy maps are the ones that have won the most total citations.
Based on the evolution of knowledge temperature, we classify topics into 4 categories: rising topic, rise-and-fall topic, awakened topic and rise-fall-cycle topic. Among 16 topics, 9 follow a rise-then-fall pattern, with their knowledge temperature reaching a record high shortly after birth. 3 topics have been almost always on the rise until today. 2 topics have waited a long time before being recognised and having a surge in knowledge temperature. We refer to them as awakened topics. The rest exhibit a periodic knowledge temperature variation characterised by multiple up-down cycles.

S3.1.1 Regulatory T Cells: Mechanisms of Differentiation and Function
The topic has been thriving ever since its birth in 2012 (Fig. S1). It has a very stable annual growth of $T^t$ and $T^t_{growth}$, which corresponds with its seemingly uniform publishing rhythm: an annual publication count always over 10% of the total size between 2013 and 2019. In addition, popular child papers came at a steady speed between 2012 and 2015. They have helped maintain a stable knowledge accumulation.

$T^t_{structure}$ remains tiny, suggesting that this topic has a gradual knowledge structure progression and has not experienced a sudden short-term impact gain. Indeed, although we observe constant visible development in the skeleton tree, we do not see any disruptive changes in the overall structure (Fig. S2). Under the leadership of several popular child papers, the topic has succeeded in developing some sub-directions, as is reflected by the fact that multiple non-trivial branches have been gradually growing out of the central cluster led by the pioneering work. Yet so far the pioneering paper remains the absolute topic center. Moreover, tiny twigs are forming around the center at a seemingly uniform speed, which may be a good sign for more novel research focuses. The vigor of the skeleton tree shows again the topic's slowly yet firmly rising popularity and impact.

Figure S1. Regulatory T cells: topic statistics and knowledge temperature evolution

Now we closely examine its latest skeleton tree (Fig. S3). Almost all the hottest articles surround the pioneering paper, and node knowledge temperature decreases globally as the articles are located farther away from the pioneering paper. Note that the blue nodes that surround the pioneering work are articles with little development within the topic. If we set aside these coldest papers, the heat distribution fits the general rules "the older the hotter" (Fig. 5(a)) and "the more influential the hotter" (Fig. S49(a)). Nonetheless, there are exceptions. Age and citations do not guarantee heat-level. For example, popular child paper 'Transcription factor Foxp3 and its protein partners form a complex regulatory network' is colder than some of its child papers in the research branch it leads. The intrinsic difference of their research ideas, which is partly reflected by the average heat-level of their citations, causes the temperature difference. Besides, we also identify some young and hot articles. For example, 2 papers published in 2017, 'TNFR2: A Novel Target for Cancer Immunotherapy' (TNFR2) and 'Crosstalk between Regulatory T Cells and Tumor-Associated Dendritic Cells Negates Anti-tumor Immunity in Pancreatic Cancer', and 1 paper published in Nature Immunology in 2018, 'c-Maf controls immune responses by regulating disease-specific gene networks and repressing IL-2 in CD4 + T cells', all have a knowledge temperature above average. All of them have already inspired several works. Their popularity not only manifests the boosting effect of new articles on the original work, but also shows the lasting activity of this topic. Overall, these atypical examples suggest that the positive correlation between node knowledge temperature and age or pure impact in terms of citation statistics is weak.
In particular, we find the knowledge temperature evolution of paper 'Basic principles of tumor-associated regulatory [...] mildly climbed up since 2016, which is in accordance with topic knowledge temperature dynamics. The arrival of its promising child, TNFR2, has helped keep BPTRT's heat-level through TNFR2's own development. This example well illustrates a child article's role in maintaining its parent paper's popularity and impact.

Table S3. Regulatory T Cells: Clustering effect example. First line is the parent paper and the rest children.
We observe in addition certain clustering effect in the skeleton tree. For example, almost all direct children of paper 'Pregnancy imprints regulatory memory that sustains anergy to fetal antigen' have similar research themes as itself (Table S3). This confirms the effectiveness of our skeleton tree extraction algorithm.

S3.1.2 Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
As is shown by the basic statistics and $T^t$, the topic is keeping its popularity and steadily gaining impact (Fig. S4). Its popular child papers came at a steady speed between 2015 and 2017. Apart from enriching the topic knowledge pool with their own ideas, they also attracted new researchers' attention and thus have helped maintain a stable knowledge accumulation. The topic has been accelerating its expansion since 2017. It witnessed the biggest annual publication count in 2019. Yet as most child papers published no earlier than 2018 have had little development, the publication surge did not result in a significant rise in $T^t$.

$T^t_{structure}$ remains tiny compared to $T^t_{growth}$, suggesting that the topic has a gradual knowledge structure progression and has not experienced a sudden short-term impact gain. Indeed, although its skeleton tree has constant visible development (Fig. S5), so far no child paper is able to defy the absolute authority of the pioneering paper, the center of the biggest cluster. Several popular child papers have each led a research sub-field in the topic, as is depicted by the small bundles extending from the central cluster. In particular, popular child paper 'LSTM: A Search Space Odyssey' in 2017 has inspired 2 schools of thought. The maturation of these newly emerged research directions accounts for a higher $T^t_{structure}$ in the first years of the topic. Overall, we observe a universal non-trivial growth in the skeleton tree. The vigor of the skeleton tree shows again the slowly yet firmly increasing popularity and impact of this topic.

Now we closely examine its latest skeleton tree (Fig. S6). The decrease in node knowledge temperature from the root, the pioneering work, to the leaves is obvious, which accords with the general rule "the older the hotter" (Fig. 5(b)). Note that the blue nodes that surround the pioneering work and popular child papers are articles with little development within the topic. In particular, the heat distribution is rather concentrated in old papers. This phenomenon is in line with our above observation that young child papers have little authority in the topic. The limited heat diffusion is also why most popular child papers have a node knowledge temperature no greater than average. This topic is quite young. It needs more time to fully explore the potential of new ideas and to trigger a thorough heat diffusion in its range.

In particular, we find the knowledge temperature evolution of the second most-cited paper, 'An Empirical Exploration of Recurrent Network Architectures', published in 2015 at the International Conference on Machine Learning, very interesting (Fig. S6). This article became much hotter from 2015 to 2016 thanks to its numerous child papers. However, its temperature dropped by half, from 182.578 to 89.19, the next year upon the arrival of the third most-cited paper 'LSTM: A Search Space Odyssey', the leader of the right major branch in the skeleton tree (Fig. S5(b,c)). Since then, its temperature has been slightly decreasing to around 80 in 2020. The sudden drop is a vivid illustration of the rivalry within the topic.

Figure S3. Regulatory T Cells: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 55 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.

Table S4. GRU: Clustering effect example. First line is the parent paper and the rest children.
We observe in addition certain clustering effect in the skeleton tree. For example, almost all direct children of paper 'Machine Health Monitoring Using Local Feature-Based Gated Recurrent Unit Networks' study the industrial applications of gated recurrent unit network (Table S4). This illustrates the effectiveness of our skeleton tree extraction algorithm.

S3.1.3 Neural networks for pattern recognition
The topic gained popularity and impact steadily in its first 10 years, as is shown by its increasing size and $T^t$ (Fig. S7). During this period, influential child papers within the topic, namely 'Pattern Recognition and Neural Networks' (PRNN) published in 1996 and 'A Tutorial on Support Vector Machines for Pattern Recognition' (SVMPR) published in 1998, shaped the skeleton tree together with the pioneering work. Their enrichment of the topic knowledge structure accounts for a slightly higher $T^t_{structure}$ back then, which is manifested by the formation of 2 clusters in the skeleton tree (Fig. S8). Yet the pioneering work is still the absolute authority in the topic. In particular, the cluster at the top is led by PRNN and the small top-left cluster surrounds SVMPR (Fig. S9). Meanwhile, their arrival pushed up $T^t_{growth}$, as they also enlarged the knowledge base together with the descendants they share with the pioneering work. Afterwards, despite a constant increase in total size, the topic's $T^t$ increment has slowed down. The popular child papers coming after 2000, namely 'Boosting the differences: [...] much as their antecedent (Fig. S8). As a result, the topic has been accumulating its knowledge and popularity much more slowly than before. Nonetheless, globally speaking, this is a rising topic.

Now we closely examine the interior of this topic. 20 years of development allows a full exploration of the mainstream ideas and a thorough heat diffusion within the topic (Fig. S8). Today, the most popular child papers all have a node knowledge temperature above average (Fig. S9) and they serve as heat sources together with the pioneering work. As the articles are located farther away from them, node knowledge temperature decreases globally. Node knowledge temperature also drops evenly with article age (Fig. 5(c)). The drastic heat-level drop at the greatest ages is due to the fact that the topic contains several articles published earlier than the pioneering work and these articles have few followers. Besides, the blue nodes that surround the pioneering work and the most popular child papers are papers with few or no in-topic citations. However, even if we set aside these oldest articles and the aforementioned papers with little subsequent development, the general rule "the older the hotter" is not robust. DM is coloured orange while DGC and FSC are coloured orange-red. This is due to the intrinsic difference of their content, which is reflected by their distinct citations. This example also suggests that the general rule "the more influential the hotter" is weak (Fig. S49(c)).

We observe in addition a certain clustering effect in the skeleton tree. For example, all child papers of 'Selection of input parameters to model direct solar irradiance by using artificial neural networks' study the topic's application in energy radiation (Table S5). This confirms the effectiveness of our skeleton tree extraction algorithm.

Table S5. Pattern recognition: Clustering effect example. First line is the parent paper and the rest children.

Figure S9. Pattern recognition: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 230 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 5 times.

S3.2.1 Critical Power for Asymptotic Connectivity in Wireless Networks
As is shown by the basic statistics and $T^t_{growth}$, the topic reached its peak around 2011 (Fig. S10). The decline in scale growth and $T^t_{growth}$ is obvious afterwards. The majority of popular child papers were published no later than 2004. They pushed up $T^t_{growth}$ with their new ideas and contributed to the flourishing before 2010. In particular, popular child papers 'The capacity of wireless networks' published in 2000 and 'The number of neighbors needed for connectivity of wireless networks' published in 2004 each lead a non-trivial research sub-direction, shown as clusters in the skeleton tree (Fig. S12). Their substantial extension to the topic knowledge structure is additionally illustrated by a high $T^t_{structure}$ in the early days. However, the glory did not last for long. After 2010, the continuous lack of young influential child papers gradually resulted in a decreasing topic visibility and thus a shrinking inflow of useful information, its knowledge source. The trend is also reflected in the stagnation of the skeleton tree. While we are still able to detect some development on the periphery of all 3 clusters from 2007 to 2011, the skeleton tree seems to take a definitive form after 2011. The snapshots look almost identical (Fig. S11). Consequently, both $T^t_{growth}$ and $T^t_{structure}$ have plunged. After 10 years of golden age, the topic is now perishing.

Now we closely examine the heat distribution within the topic (Fig. S12). We observe a quick heat diffusion during the flourishing period (Fig. S11(b,c)). Now heat diffusion is complete, as popular child papers all have a knowledge temperature above average and the child papers published during the golden period are relatively hot in general (Fig. 5(d)). An obvious exception lies in the oldest child papers. Their low average temperature is because they were published at the same time as or earlier than the pioneering work and they have few or no followers. Besides the pioneering work, popular child paper 'The capacity of wireless networks' is also a heat source within the topic. As articles are located farther away from them, they gradually cool down. The blue nodes that surround the pioneering work and the popular child paper 'The capacity of wireless networks' in central clusters are papers with few or no in-topic followers. However, the general rules "the older the hotter" and "the more influential the hotter" (Fig. S49(d)) are not robust. For instance, paper 'New perspective on sampling-based motion planning via random geometric graphs' (SBMP) published in 2018 is hotter than its parent, 'CONNECTIVITY OF SOFT RANDOM GEOMETRIC GRAPHS' (CSRG), an article published in 2016. SBMP has an average knowledge temperature while CSRG has a temperature below average. This can be mainly attributed to their different research focus, which is reflected by their distinct citations and their citations' average heat-level. Another reason may be that even though CSRG has had a much better development, the dozen articles it has inspired have gained little popularity and impact, thus they do not help boost CSRG's status.

We find article 'Power Control in Ad-Hoc Networks: Theory, Architecture, Algorithm and Implementation of the COMPOW Protocol' particularly interesting. It is not a cluster center, nor does it have many articles around, yet it has a big structure entropy and one of the highest knowledge temperatures. We think this is due to its strategic position, right between the 2 clusters respectively led by 'The capacity of wireless networks' and 'The number of neighbors needed for connectivity of wireless networks'. The article itself may not have a big impact, but it has inspired a handful of influential literature. Its value lies in enlightenment.

Table S6. Critical Power: Clustering effect example. First line is the parent paper and the rest children.
We observe in addition certain clustering effect in the skeleton tree (Table S6). For example, almost all child papers of 'CONNECTIVITY OF SOFT RANDOM GEOMETRIC GRAPHS' have similar research themes as itself. This confirms the effectiveness of our skeleton tree extraction algorithm.

S3.2.2 The capacity of wireless networks
As is shown by T t , the topic reached its peak at some time around 2007 (Fig. S13). The batch of popular child papers arriving between 2001 and 2004, namely 'Capacity of Ad hoc wireless networks', 'Mobility increases the capacity of ad-hoc wireless networks', 'A network information theory for wireless communication: scaling laws and optimal operation' and 'Impact of interference on multi-hop wireless network performance', largely enriched the topic knowledge base by inspiring several research sub-fields, as is reflected by the significant structure advancement in skeleton tree from 2003 to 2007 (Fig. S14). As a result, we observe a soar both in T t growth and T t structure . Popular child papers continued to come until 2007. But the younger ones did not cause a stir as much. Only 1 of them has made visible contribution to knowledge structure evolution: 'Closing the Gap in the Capacity of Wireless Networks Via Percolation Theory' published in 2007 opened up a new research focus and led to the end division of a major branch in the skeleton tree by 2011. The decreasing exposure gained by its child papers and a decelerating evolution in knowledge pattern caused T t structure to drop after 2007. But the residual attractiveness continued to draw a abundant quantity of "new blood" and ensured the rise in T t growth for a while longer. After 2011, despite a continuous size expansion and a steady knowledge accumulation, the topic has been gradually phased out due to an overall mediocre development of child papers published after 2009. The wear-off of the community's focus is illustrated by an immediate drop in T t structure in 2015, which also accounts for the down trend of T t . Correspondingly, we observe fewer remarkable changes in skeleton tree during this period. While the cooling-down is mainly due to attention loss before 2015, recent temperature drop is caused by knowledge supply shortage. 
The focus loss has eventually resulted in diminishing publications and affected its long-term knowledge accumulation. To sum up, after around 10 years of glory, the topic is now going downhill. Now we probe into the topic and closely examine the heat distribution in its latest skeleton tree (Fig. S15). After 20 years of development, the heat diffusion is nearly complete, as popular child papers all have a knowledge temperature above average and the child papers published in the first 10 years are relatively hot in general (Fig. 5(e)). The popular child papers and the pioneering work are the multiple heat sources within the topic. If we set aside the blue nodes surrounding the pioneering work and popular child papers, which are papers with few or no in-topic citations, it is clear that node knowledge temperature decreases globally as the articles are located farther away from them. However, there are exceptions to the general rules "the more influential the hotter" (Fig. S49(e)) and "the older the hotter". For example, paper 'Mobility increases the capacity of ad-hoc wireless networks' (MAWN) published in 2001, which is at the junction between the central cluster and a principal branch, is slightly colder than 2 of its children: 'Design challenges for energy-constrained ad hoc wireless networks' (DCAWN) published in 2002 and 'Unreliable sensor grids: coverage, connectivity and diameter' (USG) published in 2003. MAWN is coloured orange while DCAWN and USG are coloured orange-red and red respectively. The main reason for this uncommon phenomenon is their different research focus, which is reflected by their distinct citations and their citations' average heat-level. Another reason may be that even though MAWN has inspired many more child papers, few of its numerous followers have so far achieved remarkable development, hence their limited boosting effect.
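The exception described above (a parent paper that is colder than its own children in the skeleton tree) can be looked for mechanically. The following is only a minimal sketch: the function name `hotter_children`, the dictionary-based tree representation, the link from MAWN to the pioneering work and, above all, the temperature values are hypothetical and chosen solely to mirror the colour ordering reported in the text (MAWN orange, DCAWN orange-red, USG red). They are not values from our data.

```python
from typing import Dict, List, Tuple


def hotter_children(parent_of: Dict[str, str],
                    temperature: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return sorted (parent, child) pairs where the child is strictly hotter."""
    pairs = []
    for child, parent in parent_of.items():
        if temperature[child] > temperature[parent]:
            pairs.append((parent, child))
    return sorted(pairs)


# Hypothetical skeleton-tree fragment: child -> parent (assumed structure).
parent_of = {"MAWN": "pioneer", "DCAWN": "MAWN", "USG": "MAWN"}
# Hypothetical temperatures (illustrative only, not measured values).
temperature = {"pioneer": 9.0, "MAWN": 6.0, "DCAWN": 7.0, "USG": 8.0}

violations = hotter_children(parent_of, temperature)
print(violations)
```

Each returned pair is a candidate counter example to "the older the hotter" along one skeleton-tree edge. Whether it is meaningful still has to be judged from the papers' citation patterns, as discussed above.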
We observe in addition certain clustering effect in the skeleton tree (Table S7).
Figure S12. Critical Power: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 100 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.
Capacity Wireless Network: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 500 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.

S3.2.3 Efficient Estimation of Word Representations in Vector Space
The popularity and impact gain in the first years is mainly due to a fast accumulation of useful information. By the end of 2013, 2 influential child papers, 'Linguistic Regularities in Continuous Space Word Representations' (LRCSWR) and 'Distributed Representations of Words and Phrases and their Compositionality' (DRWPC), had formed the fundamentals of the topic knowledge structure. LRCSWR is the red node in the middle of the then skeleton tree and its child, DRWPC, is represented by the yellow-green node above it (Fig. S17(a)). During the next 2 years, the topic expanded quickly thanks to the substantial development of all 3 papers. DRWPC emerged as the second topic center following the pioneering work (Fig. S17(b)). In addition, DRWPC helped extend the topic knowledge structure by inspiring a new research direction. This research branch later proved to be a novel research focus. Starting from 2016, owing to a multidimensional development, the topic has been maintaining a knowledge reserve corresponding to its size, which is reflected by its steady T t growth (Fig. S16). More importantly, the research branch that emerged by the end of 2015 has developed into 2 new non-trivial research directions due to the popularity rise of 2 child papers published in 2014: 'Glove: Global Vectors for Word Representation' (Glove) and 'Distributed Representations of Sentences and Documents' (DRSD). They brought new knowledge, attracted the latest research attention, and catalysed an accelerated evolution of the topic knowledge structure, which is captured by a rising T t structure. This year, there has not been any significant new trend so far. Therefore, the topic cools down a bit due to a drop in T t structure. Unless the topic succeeds in "breeding" some new focus or achieving some breakthrough in existing sub-topics in the near future, it will start to go downhill after 6 years of thriving.
Figure S16. Efficient word representation: topic statistics and knowledge temperature evolution.
Now we probe into the topic and closely examine the heat distribution in its latest skeleton tree (Fig. S18). The topic's fast development is accompanied by a continuous heat diffusion. The older popular child papers have been the hottest since 2015 and the younger ones, namely DRSD and Glove, have recently evolved into the topic's new heat sources. It is clear that node knowledge temperature decreases globally as the articles are located farther away from them. This phenomenon fits the general rules "the older the hotter" (Fig. 5(f)) and "the more influential the hotter" (Fig. S49(f)). Note that the blue nodes that surround the pioneering work and popular child papers in central parts are papers with few or no in-topic citations.
We observe in addition certain clustering effect in the skeleton tree (Table S8).
Table S8. Efficient word representation: Clustering effect example. First line is the parent paper and the rest children.

S3.2.4 Coverage problems in wireless ad-hoc sensor networks
This topic reached its peak around 2010 thanks to a surge in  (Fig. S20(b)). Nonetheless, along with the multidimensional flourishing, the knowledge structure started its gravity redistribution due to the maturation of the research sub-directions. This silent transformation is captured by the high T t structure around 2010. The aforementioned popular child papers as well as their inspirations for future works also make great contributions to the knowledge accumulation. They helped push up T t growth until 2010. Afterwards, the topic experienced first an absence of promising child papers and then a decline in useful information supply due to its decelerated expansion. Consequently, T t growth has stagnated. The skeleton tree has unsurprisingly lost its vigor during this period (Fig. S20 (c,d)). To sum up, this topic, after a rapid development in its early days, demonstrates now a decreasing activity and a diminishing popularity and impact.
The topic's skeleton tree is a bit special in that it is composed of 2 parts. The separation is due to the isolation of LAWAN from the pioneering work. LAWAN cites both the pioneering work and 'Dynamic fine-grained localization in Ad-Hoc networks of sensors' (DLANS). Because of the closer relation between LAWAN and DLANS, its connection to the pioneering work is cut off in skeleton tree extraction. A similar reason caused the separation of DLANS and the pioneering work. LAWAN, along with several intimately related papers, is thus completely separated from the pioneering work. They form a mini bundle beside the central cluster in the 2004 skeleton tree. Shortly after, the arrival of a popular child paper, CPWS, largely developed this tiny bundle and turned it into the big aggregation under the central cluster (Fig. S20). Now we closely examine the heat distribution within the topic (Fig. S21). After 19 years of development, the heat diffusion is nearly complete, as most popular child papers have a knowledge temperature above average and the child papers published during the flourishing period are relatively hot in general (Fig. 5(g)). Half of the most popular child papers serve as heat sources and node knowledge temperature decreases globally as the articles are located farther away from them. This corresponds with the general rule "the older the hotter". Yet as several papers published at the same time as the pioneering work either have had little development or have not been cited by any recent works, they are the coldest and thus bring down the average knowledge temperature of the oldest articles. In addition, the blue nodes that surround the pioneering work and popular child papers are papers with few or no in-topic followers. However, we still find exceptions even if we set aside the oldest papers.  yet the MMEPA is a green node. This is mainly due to their relatively different research focus, as most of their in-topic citations do not overlap with one another.
Another reason may be that even though MMEPA has inspired many more child papers, few of them have achieved remarkable development, hence their limited boosting effect. In addition, this counter example also suggests that the general rule "the more influential the hotter" is very weak in this topic (Fig. S49(g)).

S3.2.5 A neural probabilistic language model
Unlike many topics that welcome the majority of their popular child papers shortly after their birth, this topic waited for a long time. Most of its prominent child papers came between 2010 and 2014. Their arrival opened up new research sub-fields (Fig. S24) and infused much vigor and new knowledge into the topic, which strongly boosted T t growth between 2011 and 2015 (Fig. S22). Although the topic continued to grow fast after 2015, few child papers stood out and none has created a new research focus so far. As a result, the knowledge accumulation process is affected by the overall quality slump and the topic started to cool down owing to the lack of new outstanding ideas. In terms of knowledge structure evolution, the topic manifests a smooth and steady progress (Fig. S23). Since the arrival of popular child papers is quite evenly spread between 2010 and 2014, their contribution to the thriving is reflected more as knowledge and impact accumulation than as a short-term popularity gain. To conclude, after a recent boom thanks to its popular child papers, the topic is now going downhill.
The skeleton tree is a bit special because it is made up of 2 parts. This is due to the separation of paper 'Connectionist language modeling for large vocabulary continuous speech recognition' (CLM) from the pioneering work, the only citation CLM has within the topic. In fact, CLM was published a bit earlier than the pioneering work, therefore its relation with the pioneering work may not be tight. This results in the edge cutting during skeleton tree extraction. CLM later inspired 'Efficient training of large neural networks for language modeling', whose work turned out to have a greater influence on the aforementioned popular child papers than that of the pioneering work. That is why the skeleton tree finally takes a separated form. Now we closely examine the current heat distribution with its latest skeleton tree (Fig. S24). The pioneering work remains the only heat source in the topic and almost all of the most popular child papers have a knowledge temperature below average. Although they have indeed vitalized the topic, more importantly they themselves have proposed novel ideas that made them overshadow the pioneering work and become the new authorities in the domain (Fig. S24 galaxy map). Their relatively loose connection to the core topic idea has resulted in their low knowledge temperature. Their "coolness" is also the reason that the cluster they are in is much colder than the one led by the pioneering work. Overall, we observe the general rule "the older the hotter" (Fig. 5(h)). The blue nodes that surround the pioneering work and popular child papers are papers with few or no in-topic citations. The decrease in node knowledge temperature is clear as we walk down the paths in the skeleton tree. However, there are exceptions. Hit paper 'A unified architecture for natural language processing: deep neural networks with multitask learning' (UANLP) published in 2008 is colder than, for instance, its well-developed children 'Large Scale Distributed Deep Networks' published in 2012 and 'Parsing Natural Scenes and Natural Language with Recursive Neural Networks' published in 2011. These 2 child papers are represented as orange nodes yet UANLP is a yellow node. Their temperature difference lies mainly in their research focus, as reflected by their citation patterns. Although these 2 child papers both have a few followers in the latest skeleton tree, they are still less popular than their parent in terms of idea diffusion. This counter example also illustrates that the general rule "the more influential the hotter" is very weak in the topic (Fig. S49(h)).
Table S9. Neural language model: Clustering effect example. First line is the parent paper and the rest children.
title | year
Detecting geo-relation phrases from web texts for triplet extraction of geographic knowledge: a context-enhanced method | 2019
We observe in addition certain clustering effect in the skeleton tree (Table S9). For example, all child papers of 'Road2Vec: Measuring Traffic Interactions in Urban Road System from Massive Travel Routes' have a research interest related to geographic relation. This confirms the effectiveness of our skeleton tree extraction algorithm. In addition, this small bundle is very young, hence its research interest may be among the latest trends.
Neural language model: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 600 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.

S3.2.6 A unified architecture for natural language processing: deep neural networks with multitask learning
As is shown by T t growth and T t , the topic continuously gained fame between 2009 and 2015 (Fig. S25). Almost all of its most influential child papers were published during this period. After that, despite a steady size growth, the topic has gradually cooled down. This is because the majority of prominent child papers, namely 'Efficient Estimation of Word Representations in Vector Space' (EEWRVS), 'Distributed Representations of Words and Phrases and their Compositionality' (DRWPC) and 'Word Representations: A Simple and General Method for Semi-Supervised Learning' (WRSSL), were published no later than 2013. They brought large amounts of new knowledge and, more importantly, attracted much immediate attention after their publication. By the end of 2015, these child papers, having collected a fair share of in-topic citations, had already become crucial members of the topic. Together with the pioneering work, they shaped topic knowledge (Fig. S26(d)). Child papers published no earlier than 2016 enriched the ideas proposed by the aforementioned popular child papers (Fig. S26(e,f)). Very few have had a significant subsequent development even though the topic has succeeded in attracting a stable stream of recent attention. Therefore, the enrichment of knowledge base has slowed down and thus the knowledge temperature has slightly dropped. To sum up, the topic demonstrates a rise-then-fall dynamics.
The skeleton tree of this topic manifests a gradual structural advancement in line with a constantly small T t structure (Fig. S26). Its popular child papers have unanimously dedicated themselves to one single research sub-direction, which is portrayed by the steadily-growing big branch (Fig. S27).
Now we closely examine the internal heat distribution and its latest skeleton tree (Fig. S27). The pioneering work is the only heat source. Interestingly, half of the most popular child papers have a knowledge temperature below average. In fact, they all cited another popular child paper, WRSSL. In terms of idea inheritance, they are less close to the pioneering work than WRSSL. Their bigger portion of original ideas has caused their relatively low knowledge temperature. We see a clear node knowledge temperature decline from the root to the leaves. This corresponds with the general rule "the older the hotter" (Fig. 5(i)). As the topic contains 2 articles published earlier than the pioneering work and they have few in-topic citations, the average node knowledge temperature for the oldest papers is not maximal. In addition, the blue nodes that surround the pioneering work and the most popular child papers are papers with few or no in-topic citations. However, even if we set aside the oldest papers and the aforementioned coldest papers, the general rule is violated. Hit paper 'Learning Deep Architectures for AI' (LDAAI) published in 2009 is colder than, for instance, its child papers '3D Convolutional Neural Networks for Human Action Recognition' published in 2013 and 'Learning structured embeddings of knowledge bases' published in 2011. These 2 child papers are represented as orange nodes yet LDAAI is coloured yellow. This is mainly due to their relatively different research focus, as their in-topic citations do not overlap with one another. Similarly, popular child paper EEWRVS is slightly colder than its descendant, DRWPC. These counter examples also illustrate that the general rule "the more influential the hotter" is very weak in this topic (Fig. S49(i)).
Table S10. A unified architecture for NLP: Clustering effect example. First line is the parent paper and the rest children.
We observe in addition certain clustering effect in the skeleton tree (Table S10). For example, all child papers of 'Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks' have a research interest towards accelerator. This confirms the effectiveness of our skeleton tree extraction algorithm.

S3.2.7 Bose-Einstein condensation in a gas of sodium atoms
Founded in 1995, this topic thrived for some 20 years before starting to stagnate in 2013 (Fig. S28). The relay among these popular child papers maintained the topic's flourishing for 20 years. In addition, the topic was most prolific between 2010 and 2012, with the annual publication number exceeding 5% of the current topic size in each of these years. The increasing inflow of knowledge, together with the exposure brought by the aforementioned popular child papers, contributed to a slightly bigger climb in T t and T t growth between 2011 and 2013. After that, the topic has not so far welcomed any superstars that have incited remarkable development. Yet it still has a rather stable knowledge accumulation judging from basic statistics. Hence overall T t growth ceased to go up and so did T t.
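The "most prolific" criterion used above (annual publication number exceeding 5% of the current topic size) amounts to a simple threshold test on yearly counts. The sketch below illustrates it under stated assumptions: the function name `prolific_years` is hypothetical, the topic size is approximated by the sum of the yearly counts, and the counts themselves are invented for illustration rather than taken from the topic statistics.

```python
from typing import Dict, List


def prolific_years(papers_per_year: Dict[int, int],
                   threshold: float = 0.05) -> List[int]:
    """Years whose publication count exceeds `threshold` times the topic size."""
    # Assumption: topic size is approximated by the total of the yearly counts.
    topic_size = sum(papers_per_year.values())
    return sorted(year for year, count in papers_per_year.items()
                  if count > threshold * topic_size)


# Invented counts: a flat baseline with a bump between 2010 and 2012.
counts = {year: 30 for year in range(1995, 2021)}
counts.update({2010: 60, 2011: 70, 2012: 65})

print(prolific_years(counts))
```

With these invented counts the topic size is 885, so the threshold is 44.25 papers per year and only the three bump years pass it.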
T t structure is higher in the early days, which corresponds with a multi-dimensional growth in the skeleton tree thanks to influential child papers published around 2000 (Fig. S29). After 2013, the skeleton tree has fixed its structure. We observe few visible changes in the skeleton tree, namely some development in the research direction jointly led by popular child papers BECEP and BECPOM and a new small research branch deriving from the school of thought led by child paper 'Second-Order Corrections to Mean Field Evolution of Weakly Interacting Bosons. I.' published in 2010 and its rather successful descendant 'Derivation of the Cubic NLS and Gross-Pitaevskii Hierarchy from Manybody Dynamics in d = 3 Based on Spacetime Norms' published in 2014. Now we closely examine its internal heat distribution together with its latest skeleton tree (Fig. S30). After more than 20 years of development, the heat has fully propagated to recent research directions led by popular child papers. Popular child papers are among the hottest articles and the child papers published during the flourishing period are relatively hot in general (Fig. 5(j)). The knowledge temperature decrease from cores to ends is clear. This corresponds with the general rule "the older the hotter". The blue nodes that surround the pioneering work and popular child papers in main clusters are papers with few or no in-topic citations. However, there are exceptions. Paper 'A gapless theory of Bose-Einstein condensation in dilute gases at finite temperature' published in 2000 is colder than its child paper 'Theory of the weakly interacting Bose gas' (TWIBS) published in 2004. TWIBS is also slightly colder than its direct child paper in the current skeleton tree, 'Weakly-Interacting Bosons in a Trap within Approximate Second Quantization Approach' (WIBTASQ) published in 2007. This is mainly due to their relatively different research focus, as most of their in-topic citations do not overlap with one another.
As WIBTASQ is the least developed among the three in terms of citations, this counter example also illustrates that the general rule "the more influential the hotter" is weak (Fig. S49(j)).
Figure S30. Bose-Einstein condensation: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 150 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.
Table S11. Bose-Einstein condensation: Clustering effect example. First line is the parent paper and the rest children.
title | year
Comparative analysis of electric field influence on the quantum wells with different boundary conditions: II. Thermodynamic properties | 2015
Theory of the Robin quantum wall in a linear potential. II. Thermodynamic properties | 2016
Comparative analysis of electric field influence on the quantum wells with different boundary conditions: I. Energy spectrum, quantum information entropy and polarization | 2015
Thermodynamic Properties of the 1D Robin Quantum Well | 2018
We find the knowledge temperature evolution of child paper BECEP particularly interesting. Despite topic's stagnation starting from around 2013 and 2014, its knowledge temperature has been constantly on the rise since its publication, from 60.4 in 2006 to 83.5 in 2020. Its rising temperature demonstrates its above-average recent development compared to the entire topic.
We observe in addition certain clustering effect in the skeleton tree (Table S11). For example, all child papers of 'Comparative analysis of electric field influence on the quantum wells with different boundary conditions: II. Thermodynamic properties' have a research interest towards thermodynamics. This confirms the effectiveness of our skeleton tree extraction algorithm.

S3.3.1 Long short-term memory
After a boom right after its birth, the topic hibernated for as long as 10 years before an explosive growth. As is shown by the basic statistics, the topic's expansion in the first 15 years is much slower than in recent years. Apart from the difference in publication quantity, we also observe an obvious discrepancy in articles' contribution to the topic's flourishing. Few child papers turned out to be popular among topic members. Child paper 'Learning to Forget: Continual Prediction with LSTM' (LFCP) published in 2000 is the only superstar the topic had for a long time. It successfully extended the pioneering work's idea and founded a new research focus, represented by the branch pointing to the bottom-left in the skeleton tree (Fig. S32(b,c,d)). Although the research branch seemed small by 2001, it was already significant compared to the then topic size. The evolution in knowledge structure led to a high T t structure. The remaining popular child papers, namely 2 published in 2003, 'Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets' and 'Learning precise timing with lstm recurrent networks', arrived later and unanimously focused on LFCP's idea. Together they contributed to the maturation of this new sub-field and partly maintained the heat-level of the entire topic. The situation changed after 2010. The artificial intelligence frenzy pulled the topic under the spotlight. Thanks to the favorable background, the topic welcomed numerous popular child papers between 2013 and 2016, for instance, 'Sequence to Sequence Learning with Neural Networks' (S2SNN), 'Neural Machine Translation by Jointly Learning to Align and Translate' (NMTAT) and 'Deep Residual Learning for Image Recognition' (DRLIR). While inheriting the essence of LFCP, they brought along a considerable amount of new knowledge, introduced new sub-topics and produced the renaissance of this old topic (Fig. S33, Fig. S32(d,e,f)).
Consequently, we see a slightly higher T t structure around 2015 owing to the knowledge structure enrichment and a soar in T t starting from 2017. The long interval between the birth and the peak of impact and popularity makes us define this research field as an awakened topic.
There is a tiny cluster isolated from the majority of the skeleton tree (Fig. S33, in the top-middle of the current skeleton tree). This is because the topic contains several child papers published at the same time as or even a bit earlier than the pioneering work. Comparatively speaking, their work is not very intimately related to that of the pioneering article. Therefore, together with some of their closest descendants, they were disconnected from the pioneering work during the skeleton tree construction. Now we examine the heat distribution within the topic (Fig. S33). The pioneering work remains the only heat source so far. Although this topic has a long history, its flourishing took place only a few years ago. It needs more time for a thorough heat diffusion within the topic. That is why most popular child papers have a node knowledge temperature around or a bit above average. At present, most of the hottest articles are located around the pioneering work in the central cluster. The knowledge temperature decline from the core to the ends is obvious. This corresponds with the general rule "the older the hotter" (Fig. 5(k)). Note that the blue nodes surrounding the pioneering work and popular child papers in non-trivial clusters are papers with few or no in-topic citations. The low average temperature for the oldest papers is due to their loose connection to the topic majority, as they were published no later than the pioneering work and have had few child papers within the topic. However, even if we set aside these papers, age is no guarantee of a bigger impact and popularity. For instance, 2 popular child papers of LFCP are slightly hotter than LFCP itself.   Networks' published in 2011 is also slightly colder than its child, 'Understanding the exploding gradient problem', which was published in 2012. Their temperature difference is mainly owing to their research focus, as is reflected by their distinct citation patterns.
These counter examples also illustrate that the general rule "the more influential the hotter" is weak (Fig. S49(k)).
We find the knowledge temperature evolution of LFCP particularly interesting. Its knowledge temperature dropped from 6.53 to 5.08 between 2001 and 2005. The decrease rate is greater than that of the topic knowledge temperature. This is because its followers had little development, thus overall the bundle led by LFCP had a slower development than the entire topic. Its temperature has been on the rise since 2007. In particular, the increase has greatly accelerated from 2015. We attribute its  (Fig. S33). Their instantaneous popularity has brought LFCP back to scientists' attention. Recall that these papers also contributed a lot to the knowledge temperature leap of the entire topic starting from 2017.
We observe in addition certain clustering effect in the skeleton tree. For example, almost all child papers of 'Developing a Long Short-Term Memory (LSTM) based model for predicting water table depth in agricultural areas' deal with earth science and agriculture and 'Visual Reasoning with a General Conditioning Layer' leads a handful of articles specialising in visual reasoning (Table S12). We also identify some bundles dealing with energy forecast and financial trading. All these observations confirm the effectiveness of our skeleton tree extraction algorithm. Moreover, these aforementioned bundles were born no earlier than 2018, thus they are also good illustrations of some latest research hotspots in the topic.

S3.3.2 Particle swarm optimization
Figure S33. Long short-term memory: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 1000 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 3 times.
The topic gained popularity and expanded its impact steadily from its birth until around 2004, largely under the joint efforts of the pioneering work and several well-developed child papers published before 2000, namely 'A modified particle swarm optimizer', 'Empirical study of particle swarm optimization', and 'Parameter Selection in Particle Swarm Optimization'. It is also these prominent child papers within the topic that lay the foundation of the skeleton tree (Fig. S35). Another 2 influential younger child papers, 'Comparing inertia weights and constriction factors in particle swarm optimization' published in 2000 and 'The particle swarm - explosion, stability, and convergence in a multidimensional complex space' published in 2002, opened up a smaller sub-topic, which is visualized as the smaller major arm that extends from the central cluster. Their arrival ensured the topic's thriving in its first 10 years, which is reflected by a rising T t growth and a relatively high T t structure during that period. In comparison, nothing remarkable happened in the following 5 years. Papers published during this period simply extended the established sub-topics. As a result, T t and its components stagnated (Fig. S34). Next, the machine learning wave revitalized the topic. Starting from somewhere between 2010 and 2013, novel research focuses have been derived from the older sub-topics and some of them have already had a certain development (Fig. S36(e,f)). This phenomenon is illustrated by the increasingly rich end structure of the skeleton tree. In addition, the annual publication number reached a record high in 2014. This trend resulted in a surge of T t shortly after. As the tendency is cooling down now, so is the topic. Overall, this is a topic awakened by the AI boom.
There is a small cold cluster detached from the topic majority (Fig. S36, in the top-right of (f)). This cluster is led by popular child paper 'A new optimizer using particle swarm theory', published in the same year as the pioneering work. Thus the two papers probably have a different focus even though they bear resemblance in their ideas. Their divergences cause their separation in the skeleton tree and their distinct knowledge temperatures. The separated skeleton tree also accords with the topic's galaxy map representation, where it seems to be split into 2 parties (Fig. S35).
Now we closely examine the internal heat distribution together with its latest skeleton tree (Fig. S35). After 25 years of development, the heat has already fully diffused to the entire topic, as most popular child papers that founded recent research focuses have a knowledge temperature above average. They are the topic's heat sources. It is clear that node knowledge temperature decreases globally as the articles are located farther away from the multiple research centers. This fits the general rule "the older the hotter" (Fig. 5(l)). Note that the colder average knowledge temperature among the oldest articles is caused by the "cold" popular child paper mentioned in the previous paragraph and the relatively independent research branch it leads. This child paper is also responsible for the drastic average temperature plunge in most-cited papers (Fig. S49(l)). Besides, the blue nodes that surround the pioneering work and popular child papers in non-trivial clusters are papers with few or no in-topic citations. However, the general rule is violated even if we do not consider this "cold" research branch. For example, 'Path planning for mobile robot using the particle swarm optimization with mutation operator' is slightly colder than its child paper 'Classic and Heuristic Approaches in Robot Motion Planning A Chronological Review'. The former is coloured yellow-orange and the latter orange. Their temperature difference is mainly due to their different research focus, which is reflected by their distinct citations. Similarly, paper 'Using neighbourhoods with the guaranteed convergence PSO' is also colder than its child paper 'A guaranteed convergence dynamic double particle swarm optimizer'. The former is coloured orange and the latter orange-red. These counter examples illustrate that the general rule "the older the hotter" is not robust.
Figure S35. Particle swarm optim: Galaxy map and current skeleton tree. Papers with more than 1700 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 5 times.
Table S13. Particle swarm optim: Clustering effect example. First line is the parent paper and the rest children.
We also observe a certain clustering effect in the skeleton tree. For example, almost all child papers of 'A self-generating fuzzy system with ant and particle swarm cooperative optimization' deal with fuzzy rules (Table S13). This confirms the effectiveness of our skeleton tree extraction algorithm.

S3.4.1 On random graphs, I
As is shown by $T^t$ and $T^t_{growth}$, the impact and popularity evolution of this topic is somewhat complicated (Fig. S37). The publication of the popular child paper 'On the evolution of random graphs' (OERG) in 1984 brought the first boom in the 1980s. This article combined its ancestors' ideas and fused the previously separated parts of the skeleton tree, owing to an atypical citation from an older article, 'On the existence of a factor of degree one of a connected random graph' (Fig. S38(b,c)). This merge is the first significant evolution in knowledge structure and thus led to a spike in $T^t_{structure}$. Afterwards, the topic went relatively silent in the 1990s before a group of popular child papers arrived between 2001 and 2003. Among these articles, 'Random graphs with arbitrary degree distributions and their applications', published in 2001, non-trivially furthered the study of OERG and introduced a new research focus into the topic, as is illustrated by the emergence of a third cluster in the skeleton tree (Fig. S38(f,g)). Its followers and popular child papers, 'Evolution of networks' published in 2002 and 'The Structure and Function of Complex Networks' published in 2003, extended its idea and created several new research sub-fields. That is why we observe some splits derived from the young cluster (Fig. S38(g)). They attracted a lot of attention in a short time, and the topic has witnessed an accelerated expansion since around 2000. Combined with their contribution to the topic's knowledge pattern, this brought the topic another boom around 2010. Later, the topic remained active thanks to several promising young papers, including 'Measurement and analysis of online social networks' published in 2007, 'Community detection in graphs' published in 2010 and 'Catastrophic cascade of failures in interdependent networks' published in 2010.
Although they opened up several new research orientations, there has not been substantial subsequent development, and the branches led by them remain small in comparison to the principal clusters (Fig. S38(h,f)). Consequently, they have mostly helped maintain the topic's visibility and its stable impact.
Now we closely examine the internal heat distribution together with its latest skeleton tree (Fig. S39). The topic has a long development history. In each period, new research focuses emerged (Fig. S38; each line shows one period). Today, we see 3 major research focuses, and their founders are all heat sources. As articles are located farther away from the pioneering paper or the sub-topic centers, their node knowledge temperature decreases globally. The blue nodes that surround the pioneering work and popular child papers in the main clusters are papers with few or no in-topic citations. Generally speaking, older papers are hotter than younger ones (Fig. 5(m)). In comparison with other scientific topics, knowledge temperature fluctuates more among the "middle-aged" papers. This phenomenon is in line with the ups and downs the topic experienced during their publication period. Besides, we also observe a general rule "the more influential the hotter" in the topic (Fig. S49(m)), as the most-cited child papers are among the hottest articles. However, this rule is only robust for the most eminent child papers.
We also observe a certain clustering effect in the skeleton tree (Table S14). For example, all child papers of 'False Beliefs in Unreliable Knowledge Networks' probe into knowledge networks. This confirms the effectiveness of our skeleton tree extraction algorithm. Moreover, this small group emerged in 2017, suggesting that its research focus, knowledge networks, may be one of the latest hotspots within the topic.

S3.4.2 Collective dynamics of 'small-world' networks
As is shown by $T^t$, although the topic is heating up thanks to a robust knowledge accumulation, it has experienced multiple ups and downs during the past 20 years due to short-term popularity fluctuations (Fig. S40). This topic has welcomed 2 waves of popular child papers, the first coming between its birth [...] (Fig. S41(c), S42). Their substantial contribution to knowledge quantity and diversity led to a fast rise in both $T^t_{growth}$ and $T^t_{structure}$. As a result, the topic reached its first peak around 2007. In the following years, the short-term exposure increase brought by these eminent child papers gradually wore off and few child papers emerged as rising stars. Topic development during this period was primarily a fortification of the existing knowledge architecture. That is why the topic cooled down slightly between 2007 and 2010 despite robust topic expansion and ongoing accumulation of useful information. It was also during this down period that the younger popular child papers were published. Some of them, including 'Complex brain networks: graph theoretical analysis of structural and functional systems' published in 2009 and 'Complex network measures of brain connectivity: Uses and interpretations' published in 2010, introduced new research sub-fields closely related to the idea of the pioneering work. They both formed a non-trivial branch extending directly out of the central cluster (Fig. S41(e,f)). Others continued to enrich the existing research fields created by former eminent child papers. For example, 'Emergence of Scaling in Random Networks' (ESRN) demonstrated an exceptional capability to attract substantially more subsequent works even 10 years after its publication, thanks to the explosive growth of social networks.
The new knowledge extension and the lasting refinement of the entire knowledge framework are portrayed by a flourishing topic skeleton tree with multidimensional development and a steadily rising $T^t$ until 2016, the year in which the topic hit its second peak. While the first golden age is essentially owing to rapid internal growth, the second streak is largely propelled by favorable social trends, especially the prevalence of online social networks and the popularization of brain science and neuroscience. Recently, the short-term focus benefit has been dying out and no remarkable progress has matured enough to cause a stir. Thus the topic is now seeing a small slip.
Now we closely examine the internal heat distribution together with its latest skeleton tree (Fig. S42). All popular child papers have a knowledge temperature above average. This shows that the heat diffusion within the topic is complete after over 20 years of development. Most research focuses derived from the original ideas of the pioneering work have undergone substantial development. Together they make up the majority of heat sources within the topic. Besides, we also spot a few atypical heat sources. They are articles that connect non-trivial research directions in the skeleton tree. For example, paper 'Combatting maelstroms in networks of communicating agents' published in 1999 connects the entire left research branch and the central cluster led by the pioneering work. It does not have any direct followers in the skeleton tree, but it is the hottest node, and its large structure entropy suggests that it is important to the entire knowledge framework. Its value lies exclusively in the inspiration it provides. As articles are located farther away from these heat sources, their node knowledge temperature decreases. This accords with the general rule "the older the hotter" (Fig. 5(n)). Note that the average temperature for the oldest papers is not the highest.
This is due to the presence of 3 "cold" articles published in the same year as the pioneering work. They either hardly inspired any subsequent works or failed to attract the attention of recent research. Besides, the blue nodes that surround the pioneering work and the most popular child papers in principal clusters in the current skeleton tree are papers with little or no in-topic development. However, the general rule is violated even if we set aside the oldest articles. For example, paper ESRN is slightly colder than its child papers, 'The large-scale organization of metabolic networks' published in 2000 in Nature and 'Classes of small-world networks' published in 2000. Both are coloured red while ESRN is coloured orange-red. The temperature difference is mainly due to their different research focuses, as is reflected by their distinct citations. This counterexample also illustrates that the general rule "the more influential the hotter" is weak (Fig. S49(n)). Last but not least, we find that most articles published in top journals such as Science and Nature have high knowledge temperatures and numerous citations. This accords with the prior study which points out the boosting effect of renowned journals on articles [30].
Table S15. small-world: Clustering effect example. First line is the parent paper and the rest children.
We also observe a certain clustering effect in the skeleton tree (Table S15). This confirms the effectiveness of our skeleton tree extraction algorithm. Moreover, these newly formed small groups are very young, suggesting that their research focuses may be among the latest hotspots within the topic.

S3.4.3 Latent Dirichlet allocation
As is shown by $T^t$, the impact and popularity of the topic fluctuate over time. After reaching its first peak around 2010, this field cooled down for a while before it became trendy again around 2019 (Fig. S43). In the long run, the topic has an increasing impact. The rise-and-fall pattern is largely due to short-term popularity fluctuations, as is demonstrated by the variation of $T^t_{structure}$. In its first 10 years, the topic developed 3 principal research sub-fields, as is illustrated by the skeleton tree (Fig. S44(a,b,c)). The advancement is largely owing to the arrival of several influential child papers within the topic around 2005 and 2006: 'A Bayesian hierarchical model for learning natural scene categories', 'Hierarchical Dirichlet Processes' and 'Dynamic topic models' (Fig. S45). They increased the exposure of this topic, facilitated a rapid knowledge accumulation and greatly enriched the knowledge structure. Consequently, the topic had its first golden period. Afterwards, the sweeping trend of machine learning helped the topic gain more attention and fame. A new wave of popular papers that joined between 2009 and 2012 gradually manifested their attractiveness, namely 'Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora', 'Reading Tea Leaves: How Humans Interpret Topic Models' and 'Probabilistic topic models'. They extended the former research focuses and provided inspiration for novel, promising ideas. This is captured by the increasingly complex major branches in the skeleton tree (Fig. S44(e,f)). In particular, this wave immediately brought a large amount of attention to the topic and created a second period of glory.
Now we closely examine the internal heat distribution together with its latest skeleton tree (Fig. S45). After over 20 years of development, the original and recent research ideas have all been richly developed. The heat has therefore diffused to every corner of the skeleton tree with the help of popular child papers. Apart from multiple heat sources in the core of research branches, we also identify some of the hottest articles between principal clusters. For example, paper 'Variational extensions to EM and multinomial PCA' published in 2002 connects the entire right branch and the central cluster. It does not have many direct followers within the topic, but it is the hottest node and it has a large structure entropy due to its knowledge-bridging value. As articles are located farther away from these "hit" papers, their node knowledge temperature decreases. This accords with the general rule "the older the hotter" (Fig. 5(o)). The blue nodes that surround the pioneering work and popular child papers in central parts are papers with few or no in-topic followers. However, there are exceptions. Paper 'You Are What You Tweet: Analyzing Twitter for Public Health' (YWTPH) published in 2011 is colder than its child papers, 'Using Twitter for breast cancer prevention: an analysis of breast cancer awareness month' published in 2013 and 'Global Disease Monitoring and Forecasting with Wikipedia' published in 2014. The latter two are coloured orange-red while YWTPH is coloured yellow-green. Their temperature difference lies primarily in their different research focuses, reflected by their distinct in-topic citations. This counterexample also suggests that another general rule, "the more influential the hotter", is not robust (Fig. S49(o)).
Figure S42. small-world: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 2000 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 6 times.
Figure S43. LDA: topic statistics and knowledge temperature evolution
We also observe a certain clustering effect in the skeleton tree (Table S16). This confirms the effectiveness of our skeleton tree extraction algorithm. Moreover, these mini-groups are very young, suggesting that their research focuses may be among the latest hotspots within the topic.

S3.4.4 A fundamental relation between supermassive black holes and their host galaxies
The knowledge temperature evolution of this topic is quite unique. Not only does $T^t$ manifest multiple local peaks, one every 6 years, but more importantly it is $T^t_{structure}$ that dominates the ups and downs of $T^t$ (Fig. S46). As for $T^t_{growth}$, its increase in the early days is due to the continual arrival of popular child papers within the topic until 2006. They brought a steady inflow of new knowledge that enriched the topic content. Almost none of the popular papers published after 2008 has so far achieved a comparable development.
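The recurrence of local peaks described above can be checked directly on a yearly temperature series. The sketch below lists the years in which a series is strictly higher than both of its neighbours. The helper name `local_peaks` and the values are made up for illustration and are not the data behind Fig. S46.

```python
from typing import List, Sequence


def local_peaks(years: Sequence[int], values: Sequence[float]) -> List[int]:
    """Return the years whose value is strictly greater than both neighbours."""
    peaks = []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            peaks.append(years[i])
    return peaks


# Hypothetical yearly temperatures for 2001..2013 (illustrative only).
years = list(range(2001, 2014))
values = [1.0, 1.4, 1.2, 1.1, 1.0, 1.3, 1.6, 1.9, 1.5, 1.2, 1.1, 1.4, 1.3]

peaks = local_peaks(years, values)
# Spacing between consecutive peaks, in years.
gaps = [later - earlier for earlier, later in zip(peaks, peaks[1:])]

print(peaks)  # [2002, 2008, 2012]
print(gaps)   # [6, 4]
```

The first and last years are never reported as peaks because they have only one neighbour, so a series that is still rising in its final year needs separate handling.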
The skeleton tree of this topic is also very special in that there are far fewer child papers surrounding the pioneering work, [...]
Table S16. Clustering effect example. First line is the parent paper and the rest children.
Figure S45. LDA: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 700 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 5 times.
Figure S48. BLACK HOLES: Galaxy map, current skeleton tree and its regional zoom. Papers with more than 340 in-topic citations are labelled by title in the skeleton tree. Except the pioneering work, corresponding nodes' size is amplified by 5 times.
Figure S49. Relation between article in-topic citation and knowledge temperature. Grey dotted horizontal line marks the topic knowledge temperature in 2020. Articles with no citation and the pioneering work are excluded.

S3.5 Topic Group
A topic group is an ensemble of several closely related topics. During a certain period, topics in a group can manifest distinct popularity and impact changes. Some may prosper while others stagnate or go downhill. When this is the case, our forest helping mechanism allows thriving topics to donate a small fraction of their vigor to their dying siblings. The heat exchange among topic group members thereby takes "background popularity and impact" into account. After forest helping, the knowledge temperatures of closely related topics evolve more similarly and correspond better to idea inheritance and development.
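The heat exchange can be pictured with a toy redistribution rule. The sketch below is only an assumption made for illustration: a topic counts as thriving if its temperature rose since the previous time step, each thriving topic gives up a fraction `alpha` of its gain, and the pooled heat is split equally among declining topics. The function name `forest_help`, the parameter `alpha` and the numbers are hypothetical; the exact donation rule of the forest helping mechanism is not specified in this section.

```python
from typing import Dict


def forest_help(
    prev: Dict[str, float], curr: Dict[str, float], alpha: float = 0.1
) -> Dict[str, float]:
    """Move a fraction alpha of each rising topic's gain to the falling topics.

    prev and curr map topic names to temperatures at two consecutive steps.
    The total temperature of the group is left unchanged (assumed rule).
    """
    gains = {topic: curr[topic] - prev[topic] for topic in curr}
    donors = [topic for topic, gain in gains.items() if gain > 0]
    receivers = [topic for topic, gain in gains.items() if gain < 0]

    adjusted = dict(curr)
    if not donors or not receivers:
        return adjusted  # nothing to exchange

    pool = 0.0
    for topic in donors:
        gift = alpha * gains[topic]
        adjusted[topic] -= gift
        pool += gift

    share = pool / len(receivers)
    for topic in receivers:
        adjusted[topic] += share
    return adjusted


# Toy group: one thriving topic and one declining sibling (made-up values).
prev = {"big": 100.0, "small": 80.0}
curr = {"big": 120.0, "small": 70.0}
adjusted = forest_help(prev, curr, alpha=0.1)
print(adjusted)
```

In this toy case the thriving topic gives up one tenth of its 20-point gain, so the two adjusted curves end up slightly closer while their sum stays the same.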

S3.5.1 wireless network group
The skeleton tree of the topic led by 'Critical Power for Asymptotic Connectivity in Wireless Networks' (CPACWN) reveals an indisputably intimate relation between itself and the topic led by 'The capacity of wireless networks' (CWN) (Fig. S12). Being the most prominent child paper of CPACWN, CWN substantially extended CPACWN's ideas and founded a new research focus. Its crucial role in the topic's prosperity is also reflected by its high popularity and influence within the topic: it jointly inspired one third of the topic members, most of which were published during the flourishing period. The similar knowledge temperature evolution of the two topics also confirms their closeness. During forest helping, CPACWN's topic donated some of its heat to CWN's topic in the early days. This behavior models the promotion effect brought by CPACWN's increasing impact and popularity. However, this did not help CWN's topic much because it was already much bigger. After the adjustment, their knowledge temperature evolution is more similar than before. Both topics were hottest in 2007 and 2008 (Fig. S50). This corresponds better with their individual development and inherent connection. In fact, CWN achieved such a huge success that it took over from its predecessor as the new authority in their domain in just a few years. The dominating size of CWN's topic clearly makes it a better representative of background popularity and impact, which usually has a big influence on similar smaller topics. Therefore, the destiny of CPACWN's topic is to some extent determined by the development of CWN's topic.
The rise and fall of CWN's topic is thus an indicator of the flourishing of CPACWN's topic.
Figure S50. wireless network group: knowledge temperature evolution before and after forest helping

S3.5.2 RNN gated unit group
'Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling' (GRU) introduced a new research focus and made a non-trivial contribution to the recent thriving of the topic led by 'Long short-term memory' (LSTM) (Fig. S33). In fact, nearly half of the papers that cite GRU also cite LSTM. Over the past 3 years, LSTM's topic has undergone substantial development and gained fast-growing impact and popularity thanks to a large number of new publications. In comparison, GRU's topic has shown signs of stagnation shortly after its initial glory. Today, the phenomenal size of LSTM's topic substantiates LSTM's claim to authority in the domain. As a result, the prosperity of LSTM's topic is a good representative of background popularity and impact, which usually has a big influence on similar smaller topics. While GRU helped with the flourishing of LSTM's topic in its early days, it is now the turn of LSTM's topic to help maintain the heat level of GRU's topic (Fig. S51). A soaring background popularity and impact is favorable for the future development of GRU's topic, at least in the short term. For this topic group, forest helping resembles a mechanism observed in nature: a mother tree shares nutrients with its child trees so as to give them a better chance of survival.
Figure S51. RNN gated unit group: knowledge temperature evolution before and after forest helping
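The overlap statement above (nearly half of the papers that cite GRU also cite LSTM) is a simple set ratio. A minimal sketch follows, assuming that the citing papers of each pioneering work are available as sets of identifiers. The function name `co_citation_share` and the identifiers are hypothetical.

```python
from typing import Set


def co_citation_share(citers_a: Set[str], citers_b: Set[str]) -> float:
    """Fraction of the papers citing A that also cite B (0.0 if A has no citers)."""
    if not citers_a:
        return 0.0
    return len(citers_a & citers_b) / len(citers_a)


# Made-up paper identifiers, for illustration only.
gru_citers = {"p1", "p2", "p3", "p4"}
lstm_citers = {"p2", "p4", "p5"}

share = co_citation_share(gru_citers, lstm_citers)
print(share)  # 0.5
```

The ratio is not symmetric: swapping the two arguments gives the fraction of LSTM's citers that also cite GRU, which is much smaller when LSTM's topic is far larger.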

S3.5.3 word embedding group
'Efficient Estimation of Word Representations in Vector Space' (EEWRVS) is the most influential child paper in both of the topics led respectively by 'A neural probabilistic language model' (NPLM) and 'A unified architecture for natural language processing: deep neural networks with multitask learning' (UANLP). Furthermore, EEWRVS's topic is more than twice the size of NPLM's and UANLP's. EEWRVS has outperformed its parents and has established authority in this research field. The considerable size of EEWRVS's topic makes it a good representation of background popularity and impact, which has an influence on smaller topics within the research field. Owing to its close relationship with NPLM's topic and UANLP's topic, the booming of EEWRVS's topic more or less increases their visibility and attracts research attention. Through forest helping, the "energy" from EEWRVS's topic slows down the perishing of NPLM's topic and UANLP's topic (Fig. S52). The heat exchange models the boosting effect of the background, the bigger research field to which all 3 belong.
Figure S52. word embedding group: knowledge temperature evolution before and after forest helping