Emergent invariance and scaling properties in the collective return dynamics of a stock market

A key metric to determine the performance of a stock in a market is its return over different investment horizons (τ). Several works have observed heavy-tailed behavior in the distributions of returns in different markets, which are observable indicators of underlying complex dynamics. Such prior works study return distributions that are marginalized across the individual stocks in the market, and do not track statistics about the joint distributions of returns conditioned on different stocks, which would be useful for optimizing inter-stock asset allocation strategies. As a step towards this goal, we study emergent phenomena in the distributions of returns as captured by their pairwise correlations. In particular, we consider the pairwise (between stocks i, j) partial correlations of returns with respect to the market mode, ci,j(τ), (thus, correcting for the baseline return behavior of the market), over different time horizons (τ), and discover two novel emergent phenomena: (i) the standardized distributions of the ci,j(τ)’s are observed to be invariant of τ ranging from from 1000min (2.5 days) to 30000min (2.5 months); (ii) the scaling of the standard deviation of ci,j(τ)’s with τ admits good fits to simple model classes such as a power-law τ−λ or stretched exponential function e-τβ (λ, β > 0). Moreover, the parameters governing these fits provide a summary view of market health: for instance, in years marked by unprecedented financial crises—for example 2008 and 2020—values of λ (scaling exponent) are substantially lower. Finally, we demonstrate that the observed emergent behavior cannot be adequately supported by existing generative frameworks such as single- and multi-factor models. We introduce a promising agent-based Vicsek model that closes this gap.


Background
Stock prices demonstrate considerable volatility, a result of several confounding factors such as traders' collaborative and competitive decision-making to buy, hold or sell, differing appetites for risk, and various time horizons for expected returns on investment [1][2][3][4][5][6][7].A considerable body of literature has focused on identifying patterns in price fluctuations and on developing succinct dynamical models that display similar characteristics as a real market.Financial experts have, for instance, frequently observed seasonality patterns in individual stock prices and the fractal nature of price fluctuations [8,9].In contrast, macroscopic patterns that concern the ensemble of stocks would emerge because of correlated dynamics in investment decisions, and would reflect inter-stock asset allocation strategies used by investors.The abundant literature in swarming and "econophysics" [4,10] provide a framework for both numerical analysis of such joint price data and for formulating theoretical generative models.January  Conventionally, investors and economists compute the return [11]: the return over an investment horizon of τ is defined as the equivalent compounded interest rate if one bought a stock at time t and sold it at time t + τ , and estimated as r(t; τ ) ∶= ln(p(t + τ )) − ln(p(t)), where p(t) and p(t + τ ) are the stock prices at time t and t + τ , respectively.Return sequences are a stationary measure (a parameter constant over the interval τ ) of price change characterized by several statistical properties observed in empirical evidence across markets (often referred to as stylized facts [12]).
Prior research has explored the statistical properties of these returns as a means to characterize the dynamics of the market.Some [11,13] have identified linear relationships in the log-log scale on the distributions of the returns.Plerou et al. [14] discover power law fits [15,16] on (i) the cumulative distribution function (CDF) of the return distribution and (ii) the standard deviation of the return distribution as a function of market capitalization.Similarly, Müller et al. [17] demonstrated evidence of a scaling regime governing the mean of returns with respect to the return horizon (for τ ≤ 20 seconds).Return distributions and their properties for longer horizon τ beyond high frequency trading time scales have not been studied.Moreover, such prior works only study return distributions that are marginalized across the individual stocks in the market.For example, in [11,13], they fix a τ and then compute r i (τ ) for all stocks i in a market and then estimate the distribution of this set of r(τ )'s; thus marginalizing over all the stocks, and the Complementary CDFs (CCDFs) for our dataset is presented in Fig. 15 of the supplementary material and for large enough τ the tails can be fit with that of power-law distributions.Similarly, in [14] they compute the standard deviation, σ i (τ ), of returns for a fixed τ and for all stocks with market capitalization S i and then they show that log(σ i (τ ) scales linearly with log(S i ); again, the returns are marginalized over all the stocks with the same capitalization.None of these works track statistics about the joint distributions of returns conditioned on different stocks, which would be useful for optimizing inter-stock asset allocation strategies.
To address such issues, other methods have attempted to model the inter-stock return correlations in order to compare stocks' relative performances over time [4].These correlations are particularly useful to construct graphical models of the market, in this case, a fully connected network, where the stocks are nodes and the pair-wise correlations correspond to the inter-stock edge weights.Structures are distilled from within the network representation by adopting various graph theory algorithms [18][19][20][21].For example, one can compute the Minimum Spanning Tree (MST), which, for certain return horizons, exhibit a local aggregation of communities of stocks (nodes), such that each community is shared by stocks belonging to market sector [11,22].Recent studies have attempted to refine this method of identifying clusters by calculating the normalized partial correlations in relation to the market mode [23].With fixed τ > τ 0where the MST structures are obtained -these partial correlation scores have been observed to contain enough inter-stock information to facilitate agglomerative clustering of the Korean Stock Exchange (KOSPI) that aligned with GICS sectors, while network modeling of the partial correlations computed using daily returns have helped uncover specific stocks that are influential in driving the return patterns in a subset of high-capitalization NYSE stocks [24].

Our Contributions
Such MST-based studies (see Fig. 8) provide only a limited visual representation of the underlying return correlation distribution and its dependence on τ .We extend such MST-based analysis of the correlation statistics to the study of its density function, governing the return correlations.In contrast to existing work discussed earlier, we are interested in: (a) the distributions of market-mode adjusted partial correlations computed at both short (1 minute) and long investment horizons (50000 minutes); and (b) the distribution properties as a function of the investment horizon τ .As our first main contribution, we find that for a significant range of τ (varying from approximately 2.5 days to around 2.5 months), the standardized distributions -scaled by the standard deviation σ(τ ) and zero-shifted by mean µ(τ )-of the market-mode adjusted partial return correlations are invariant of τ .
The above distribution invariance results suggest that both the standard deviation σ(τ ) and the mean µ(τ ) of the pairwise partial correlations would be a function of return horizon τ in the regime where such invariance is observed.We find that µ(τ ) has no significant scaling behavior with respect to τ (see Fig. 16 in the Supplementary Material).As our second main contribution, we find that the standard deviations of the partial return correlations do indeed scale as a function of τ in the distribution invariance regime, and demonstrate convincing fits via either a power law or a stretched exponential function.The critical model parameters -the scaling exponent in the case of the power law (λ), and the stretching parameter (β) in the stretched exponential function -appear to be rich indicators of macroeconomic volatility patterns.The distribution invariance as well as the scaling of σ(τ ) are observed to hold for 1000min ≤ τ ≤ 30000min.Evidence spans 17 years of real S&P500 stock price data, sampled every minute.Data can be accessed for research purpose at the Wharton Research Data Services (WRDS)1 and the code repository2 is linked.
Finally, we explore if these numerically observed emergent behavior properties can be replicated by an agent-based generative model.We first reexamine the single-and multi-factor generative models, popular generative frameworks used to model consensus behavior in financial markets [25].These models for the most part fail to replicate the above-mentioned emergent trends -the invariant standardized histograms and the power-law/stretched-exponential fits with respect to τ : (a) The single-factor model fails to reproduce the vine MST structure; (b) Multi-factor models, while generating the vine, fail to produce both the distribution-invariance and the scaling phenomena.As our third main contribution, we introduce an alternate framework, a modified Vicsek model -commonly used to describe the dynamics of active matter -that proves to be a much more promising candidate for reproducing the empirical evidence.In these approaches, the stock market would be modeled as a closed environment, where individual stocks behave as agents in a vector space that influence each other.At each time-step, the position of an agent corresponds to a particular stock's instantaneous market behavior.Agents that exhibit correlated behavior over multiple time steps cluster together as swarms.

Correlations of returns and partial correlation with respect to market mode
Suppose a market has N stocks; in the S&P500, N ≈ 500.Let us denote, by p i (t), the price of stock i at time t for i = 1, 2, . . ., N and t ini ≤ t ≤ t fin .Typical macroeconomic market analyses such as Year-over-Year (YoY) gain, annualized returns and GDP growth, cap T int ∶= t fin − t ini to 1 year from January 1st to December 31 to avoid seasonality patterns and resulting artifacts in the correlations.We similarly consider each calendar year separately and the evidence of scaling is thus presented individually for each of the 17 years (2004 − 2020).
We sample each of the stock prices at a granularity of 1−minute.Let the price sequence of a stock i be p i .For T int = 1 year, there are ∼ 98000 values per sequence.We compute the effective return or interest rate r i (t; τ ) of the stock i at time t over a time horizon of τ (τ ≤ T int ), a preferred first-order metric for investing than the absolute price.An investment in the i th stock at time t (say, a sum of mp i (t) by purchasing m units) when compounded continuously at the given rate would yield the same amount as that which would be obtained by selling the stock at time (t + τ ) (i.e., mp i (t + τ )).

Quantitatively, mp
= mp i (t)e ri(t;τ ) .Thus, we get: for t ini ≤ t ≤ t fin − τ .Therefore, an investment horizon of τ yields a return sequence of length, T int − τ + 1.Note that for the longest considered τ = 30000 min, the return sequence for each stock still contains a significant number of return values (∼ 68000).
After computing the return sequences of all stocks, we can find the market-mode return sequence as: We denote the time average of r i (⋅; τ ) and r i (⋅; τ )r j (⋅; τ ) for i, j = 0, 1, 2, . . ., N by Now we are ready to define the conventional correlation.Between any pair of stocks i, j = 1, 2, . . ., N : where Note: In an optimized market, the cross-correlation between r i (t) and r j (t + ∆) for non-zero ∆ can be written by replacing the right-hand side of Eq. ( 4) by: ∫ However, these correlation should equal 0; otherwise investors would use one return series to predict another stock's return (recall Stylized Fact I [12]).
Next we introduce the concept of partial correlation between stocks i and j with respect to the market mode [26]: let ri (t; τ ) be the residuals while predicting r i (t; τ ) with respect to r 0 (t; τ ) using a linear fit.Then the correlation between these residuals associated to stocks i and j is the partial correlation and is given by, where ρ i,j (τ ) is the (conventional) correlation between stocks i and j, and ρ i,0 , ρ j,0 are correlations of returns of stocks i and j with respect to the market return.

Distributions and invariance of partial correlations
Let p τ (x) be the probability density function of c i,j (τ ) empirically estimated as: where δ D (⋅) is the Dirac delta function.We observe that as to be expected, the functional form of p τ (⋅) is τ -dependent (see Fig. 1).However, the standardized distributions -the distribution when c i,j (τ ) are mean-shifted and scaled by standard deviation, are invariant over a significant regime of τ ; i.e. τ max ≥ τ ≥ τ min .This indicates that the scaling factor -in this case, the standard deviation -scales with τ .

Scaling phenomena during distribution invariance
Let σ(τ ) be the standard deviation and m(τ ) the mean.In this work, we use the inverse of the standard deviation -rather than the standard deviation -as an interpretable measure of precision, b(τ ) = 1 σ(τ ) .We will demonstrate that during the regime where the standardized distributions are invariant (τ > τ 0 ), the dependence between τ and the precision b(τ ) is well-explained by simple models with few and interpretable parameters such as the power law, and the stretched exponential function: Aside from convincing model fits, the critical model parameters -the scaling exponent λ (in the case of the power law) and the stretching parameter β (in the case of the stretched exponential) -once trained, emerge as candidates for macroeconomic indicators of market volatility.Indeed, we suspect that any other simple model class that can convincingly fit and validate the (near-linear in log-log scale) dependence of b(τ ) with respect to τ should similarly express the market characteristics within its model parameters.

Empirical Results: Emergent phenomena in real-world data
Recall the dataset descriptors: T int = 1 year; the sampling time interval of stock prices is 1 minute; there are ∼ 251 business days in a year when the stock market is open from 9 ∶ 30 to 16 ∶ 00 ET; for every day the market is open, each stock has ∼ 390 prices.For T int = 1 year, each stock has a ∼ 98000-length price series; the price series is arranged such that the closing price at 16 ∶ 00 ET on the current market day immediately precedes the opening price at 9 ∶ 30 ET on the following market day.While volatility in after-hours trading may result in drastic price fluctuations at particular indices in the series, an increasing τ has a smoothing effect on these spikes, and we believe that the return correlation PDFs are not significantly affected by these gaps.Results presented below are replicated for a shorter T int = 3 months (see S4 Appendix).The PDF defined in Eq. ( 10) is visualized for 6 years -1 year per row: (a) Left: τ = 1 min to 1000 min and (b) Right: τ = 1000 min to 30000 min.For τ > τ 0 , the normalized PDFs are invariant to τ (see Fig. 2).

Functional form of the standardized partial correlation PDF is invariant during a finite τ regime
We provide qualitative and quantitative evidence of the invariance of the functional form of the standardized distribution for a finite regime 30000min > τ > 1000min.First, in Fig. 2, we plot the standardized histograms for 6 years (remaining years can be verified using the attached codebase), by superimposing the functions pτ (⋅) across different τ .Quantitative evidence is provided next: • Pairwise KL divergence between standardized partial correlation PDFs: For each year from 2004 to 2020, we compute the KL divergence (KLD) between pτ1 (⋅) and pτ2 (⋅), the standardized partial correlation PDFs computed with τ 1 and τ 2 respectively.We would like to show that inside the regime where functional invariance was visually observed (1000min ≤ τ ≤ 30000min), D KL (p τ1 ||p τ2 ) for any pair {τ 1 , τ 2 } is small compared to the KLD computed between a pair of standardized PDFs for which τ is outside the scaling region.The pairwise KL divergence between the standardized partial correlation PDFs across 6 evaluated years are presented in Fig. 3.The dark square block in the bottom right of each heatmap implies that the KL divergence between any pair of standardized distributions sampled from the region of τ specifying the functional invariance is low.In order to compute the KL divergence in a consistent and comparable fashion, each standardized PDF is re-sampled (N = 10000) using Gaussian smoothing, N (0, 0.2).
• Probing the onset of the function invariance using Gaussian Mixture Models and Kurtosis: We fit a Gaussian Mixture Model (GMM) (2-mode) on pτ (⋅) and probe the weights of the two components across τ .We expect to see a transition as the function invariance sets in.As shown in Fig. 4, initially, one mode is dominant, and as τ > 1000, the weights of the two modes become comparable.Such transitions are observed with 3, 4, 5-mode fits as well.
• Kurtosis of the density function with respect to τ : In Fig. 5, we demonstrate the same transition (from τ < 1000min to τ > 1000min) by plotting the kurtosis of pτ (⋅) with respect to τ .When the invariance property takes effect, the kurtosis values suggest a corresponding transition from a leptokurtic (> 3) to platykurtic (< 3) regime.

The scaling behavior and its emergent properties
Motivated by the observed function invariance in the standardized distributions of the partial return correlations, in Fig. 6 we plot b(τ ) -the precision (defined in Materials and Methods) -as a function of τ for each year, 2004 to 2020, to find evidence of a scaling phenomenon.Within the τ range where the invariance is identified (τ = 1000 min to 30000 min) -the regime highlighted by light cyan, we observe a near-linear relationship between ln τ and ln b suggesting a Stretched Exponential or Power law fit 3 .We now evaluate these fits using Model Architecture Search (MAS).

Model Architecture Search
We

Generative Models
We have observed so far that real S&P500 data demonstrates a functional invariance in the standardized distributions of the partial return correlations and an associated linear dependence of the precision with respect to τ .Economists attempt to construct generative models to explain these results in order to better characterize the consensus-forming taking place in the stock market.A starting point -as noted in the Introduction -is the correlation graph of inter-connected stocks, which reveals emergent stock communities for a return horizon τ > τ 0 corresponding to industry sectors.These are shown in Fig. 8.Many generative models -such as the single-and multi-factor models -have been proposed to explain these interactions by quantifying the pair-wise inter-stock interactions.We show that these do not replicate the observed functional invariance and/or scaling behavior and propose a suitable replacement -a modified Vicsek modelthat is more promising.

Single Factor Model
The conventional single-factor model [28] uses only the fluctuations of the market mode and individual stock prices to model the correlations of return, i.e., where r 0 (t) represents the market mode describing the overall fluctuation of the financial market.In Eq. ( 14), ξ i (t) is the part not included in the market mode.In the one-factor model, ξ i (t) is a zero mean Gaussian distributed time series with and is independent of each other and r 0 (t).In fact, we can derive the values of the correlation coefficients in the one-factor model: ρ ij = ρ i0 ρ j0 , and the residuals {c i,j (τ )} N i,j=1 = 0 in Eq. ( 9).The standardized distribution pτ (⋅) in Eq. ( 11) reduces to the delta function (µ = 0, b(τ ) → ∞) violating the structure of the empirically observed correlation distribution.

Multi-Factor Model
The MSTs of the pairwise stock correlations clearly show clustering of stocks belonging to the same sector, and one can formulate a multi-factor model [29][30][31] wherein we Fig 9 .MSTs of the multi-factor models: The sector for each stock (node) is indicated by its color.We varied η: (left) η = 0.1, (center) η = 10.0, and (right) η = 1000.0.We set the number of steps 10000 and µ γ = 1.0.Observe for small η, the sectors are separated into distinct groups in the MST -similar to when τ is large.As η increases, the groups lose identity and merge; i.e the sector information is devalued: a similar effect to when τ is small.Precision in the edge weights is set to 1.
supply additional parameters that correspond to the individual sectors.Since the computational models we are considering directly output returns (not the prices), one needs to introduce an additional parameter to simulate the effect of time scale τ : by varying this parameter, one can control whether the market mode dominates -drowning out the effect of the sectors (as observed for small τ in real data)-or is suppressed, allowing the sector correlations to emerge (as observed for large τ ).
Here, N (µ, σ 2 ) is the Gaussian perturbation function and γi,k is non-zero when stock i belongs to sector k and 0 otherwise.Additionally, the variance of market and sector returns are set to 0.05 × ∆t, and 0.10 × ∆t respectively.Note that increasing η corresponds to larger perturbations of ξi (t) in successive time steps.Thus it plays the role of 1/τ : a large η implies market-mode dominance and small η, sector-mode dominance (see Figs. 9 and 10).
We performed the numerical simulation of the multi-factor model with K = 2 sectors.We set N = 500 and the number of stocks in each sector as 250.We swept η in Eq. ( 19) across multiple values.The other parameters are set as follows: µ γ = 1.0, αi ∼ N (0.0, 1.0), βi ∼ N (0.0, 1.0), and γi,k ∼ N (µ γ , 1.0) if stock i belongs to sector k and otherwise zero.In Fig. 9, we show the MSTs of the multi-factor models for η = 0.1, 10.0, 1000.0 and µ γ = 1.0.Observe that for small η, the stocks per sector belong in separate communities in the MST.As η increases, the communities collapse.
In Fig. 10, we plot the PDF p τ (⋅) in Eq. ( 10) of the multi-factor model and standardized PDF pτ (⋅) in Eq. (11).The following dynamics are observed: (a) For small η -may correspond to large τ -two peaks originate from two sector modes and one peak originates from the market mode; (b) for moderate η, the market mode dissipates and the two sector modes dominate the distribution; and (c) for large η, the  (11).We set the number of steps 10000 and µ γ = 1.0.return correlation distribution becomes random due to large perturbations.The multi-factor model explicitly uses sector affiliation as a parameter resulting in multi-modal correlation distributions.This multi-modal structure of pτ (⋅) results in the precision of return correlations b(τ ) not scaling with τ .

Modified Vicsek model
The Vicsek model is a generative model that can display some of the salient group characteristics of swarming behavior, as observed in the motion patterns of flocks of birds and swarms of fish.Compared to the multi-factor model where group assignments are provided in advance, such assignments emerge naturally in the Viscek model: each particle in the swarm is influenced by other particles that are within a neighborhood.Based on such local-only interactions, long distance order emerges and groups of particles cluster together in their dynamical behavior, akin to sectors emerging in stock markets.
Our model uses the standard setup [32] with the following modifications: (a) Consistent with the factor model setup, the predicted variable is the return r i (t); (b) Particles (individual stocks) move in R 1 (rather than in R 2 ); a stock's offset from 0 is the return value.The proximity of one particle i to another j at time t is the absolute value of the difference of the returns |r i (t) − r j (t)| rather than the typical cosine distance metric used in R 2 ; (c) Time steps are discretized rather than continuous.The update step is: where and N i,δ is the number of elements j that satisfy |r i (t) − r j (t)| < δ.An extended derivation of the Vicsek update step is presented in S5 Appendix.
Evaluating the Vicsek model under different parameter settings of δ, η: We next discuss how parameters δ (radius of influence), and η (standard deviation of noise) -individually and collaboratively -can play roles analogous to the return horizon parameter τ in the empirical stock price data.In particular, we analyze the dependence of the distributions of correlation of returns defined in Eq. ( 20) on δ and η.The number of steps was set to 10000, and δ was fixed at 0.10.For η = 100.0, the vine structure is apparent and we used these vines to define the analogs of sectors in stocks.In particular, we performed a community finding on MST [33] corresponding to the vine structure, and identified 11 communities corresponding to the number of GICS sectors.These communities indeed constitute individual vines, as shown by colored nodes in the right-most figure.Next we tracked the associated stocks as η decreased based on the fixed δ condition.Notably, as η decreases, the sectors collapse due to the fixed neighborhood of δ, which encourages more particles (stocks) to interact with one another.This increased interaction arises as the particles experience less perturbation from Ξ, leading to homogeneous behavior and radial MSTs.
• Role of δ: The parameter δ plays a crucial role in determining the extent to which particles in the model are influenced by their neighbors.We anticipate that very large values of δ lead to substantial inter-particle influence, producing highly correlated return sequences, while very small δ values result in independent particle behavior with little correlation.Thus, we would expect small values of δ to lead to very small return correlations -akin to short return horizons τ -and as δ is increased we expect pockets of correlated returns, just as sectors emerge in the empirical data with increasing τ .Indeed, as shown in Fig. 12, for intermediate values of δ the precision follows a near-linear dependence in the log-log scale with respect to δ -a scaling phenomenon.
• Role of η: Injected noise adds randomness to the trajectories of particles and together with the radius of influence determined by δ, the noise level η facilitates the formation of distinct pockets.In the absence of this noise and with a sufficiently wide radius of influence, particles tend to merge into a unified group, exhibiting strong correlations with each other.Visual evidence illustrating this effect can be seen in the MST structure in Fig. 11(a).As illustrated in Fig. 11(b) and 11(c), increasing the noise factor results in the formation of communities.Of course if η is increased further, the vine structure will disintegrate.
Indeed, δ and η behave as duals of one another while influencing the distribution of the return correlations: For example, increasing the noise in particle trajectory (η) has a similar effect to decreasing each particle's radius of influence (δ).Given these constraints, we look to discover a scaling effect with respect to δ, η and functional invariance of pδ (⋅), pη (⋅) for intermediate δ, η values.For simulations, we set N = 500, α i = γ i = 1.0 and β i = 0.05 for i = 1, 2, . . ., N , and ∆t = 1.0.
• Functional form of the correlation PDFs: In Fig. 13, we present the standardized correlation PDF p (⋅) (⋅) for various η values (on the left) and δ values (on the right) (compare with the empirical result in Fig. 2).Notably, within a finite range of η and δ, we observe that the functional form shows invariance properties similar to those observed in the empirical data.11) of the modified Vicsek model.Note that the subscript is changed from τ to η.We set δ = 1.0 and the number of steps 10000.(b) Standardized PDF pδ (x) in Eq. (11) of the modified Vicsek model.Note that the subscript is changed from τ to δ.We set η = 10.0 and the number of steps 10000.
• Scaling behavior with respect to η and δ: In Fig. 12, we plot the relationship between the precision and each of the parameters η (left) and δ (right) keeping the other fixed (please refer to Fig. 6 for a comparison).We observe the scaling phenomenon for intermediate values of η and δ.While at the extremes, particle trajectories are either completely uncorrelated (high η, low δ) or globally correlated (low η, high δ), the range in between facilitates particles to be locally correlated (akin to sectorssee Fig. 11 (right)).

Concluding Remarks
In this paper, we first observe that the standardized distributions of the partial correlation of returns reaches an invariance for a finite range of τ .Second, within this τ regime, we demonstrate a scaling phenomenon governing the precision of the raw distributions, b(τ ), with respect to τ -the investment horizon.We additionally review existing stochastic and generative factor models to show that they fail to model these observed emergent phenomena and propose a modified Vicsek-inspired framework that is a more promising candidate.12).Error bars are computed using 4-fold cross-validation while estimating the linear fit.In the blue highlighted regions, anomalies are observed where λ deviates from the linear fit.
to 2020 on real stock price data sampled every minute of trading hours.
The compelling presence of such a scaling phenomenon warrants investigating the role of the model parameters that are crucial to the fit that explains the dependence of b(τ ) on τ .Specifically, in the case of a Power Law fit, λ appears as a macro-economic indicator of market health.A similar analysis on the Stretched Exponential fit -also a good fit of the {τ, b(τ )} data in Fig. 7 -shows a similar effect with respect to the β parameter (see Supplementary Information).
Fig. 14 plots the scaling exponent across the 17 years of evaluation.The figure shows that λ's exhibit inter-annual variations, portraying a distinct linear decline from past to present, characterized by intriguing anomalies (highlighted in cyan).We seek to make sense of this trend and interpret its significance.Setup: Recall that the standard deviation σ of the return correlations is proportional to τ λ .To quantify the change in standard deviation as we transition from a short-term (τ I ) to a long-term (τ F ) investment horizon, we introduce a novel metric defined as follows: where σ(τ F ) and σ(τ I ) represent the standard deviations corresponding to the long-term (τ F ) and short-term (τ I ) investment horizons, respectively.This measure captures the fractional increase in the standard deviation from the short-term to the long-term.A large R σ indicates that the standard deviation of the return correlations in the short-term are much smaller than in the long-term.A small R σ suggests that the short-and long-term investment horizons look statistically similar.
Using the scaling law, we get: where τ I τ F ∈ (0, 1).Referring back to Fig. 14, we observe empirically that λ ∈ (0, 1).Therefore, R λ is an increasing function in λ.Given that we have noticed a consistent decrease in the value of λ over the years, our focus now shifts to understanding the implications of a corresponding declining trend in R λ across years: • Market Maturity: We first consider the y-intercepts depicted in Fig. 6.Specifically, the values of σ(τ I ) (which is 1/b(τ I )) demonstrate a consistent and gradual increase from past to present, while σ(τ F ) remains relatively constant across the same period.Consequently, the decreasing trend in R λ suggests that σ(τ I ) gets closer to σ(τ F ) every successive year.In more general terms, the communities of stocks observed over longer investment horizon in earlier years appear in shorter time horizons in January 11, 2024 16/26 later ones, a sign that investors are becoming increasingly efficient and adept at identifying stock return patterns -a sign of market maturity.
• Global Financial Crises: We now consider the cyan-colored windows in Fig. 14 corresponding to two recent global crises -subprime mortgage crisis in 2008 and the COVID-19 pandemic in 2020.In these cases, λ dips significantly below the linear fit.
As markets stabilized post the 2008 crisis, the λ values rebound to the linear trend.Since our data stops at 2020, it remains to be seen whether a similar rebound will take effect.
In summary, the discovery of such scaling phenomena and its associated summary statistics in the partial correlations of stock price returns adds to a growing body of work in macro-economic modeling.By extending the qualitative observations of the variations in MST structure to the correlations at large in a quantifiable manner, we demonstrate one robust path to probe market health based on collective dynamics.
In Fig. 16, we demonstrate that the mean value of the partial correlations -as opposed to the standard deviation -of the returns does not scale as a function of τ .12),( 13)).Similar to the effect of λ in the case of the Power Law, in this section, we evaluate whether the stretching parameter β in the Stretched Exponential fit can act as a market indicator.
To get a better sense of the behavior of the stretched exponential, in Fig. 17, we plot the b(τ ) − τ dependence in linear-log scale across 4 years: 2004, 2008, 2012, 2020.Both an exponential fit -β = 1, linear in the linear-log scale -and the best stretched-exponential (β < 1) fits are plotted.As corroborated by the MAS experiments (see Figs. 7,21), the stretched exponential is a significantly better fit.
β appears to be a parameter that changes year-to-year.For year 2020, the exponential and stretched exponential fits are more similar to one another (β → 1) than in 2004.This motivates investigating whether there is a general trend in the β values, similar to the λ's presented in Fig. 14.In Fig. 18, we show a similar plot.
First we explain the increasing trend: As β increases (tends to 1), the corresponding Stretched Exponential model fit flattens such that the short-term (small τ ) and long-term (large τ ) investment horizons are similarly effective.This can be confirmed by the y-intercepts in Fig. 17.
Note that this interpretation is similar to when λ is small in the Power Law fit.In fact, there are also corresponding anomalies (marked in cyan) for the years when the market was volatile -2008, 2020.Observe that for these years, the β values significantly overshoot the linear trend (beyond an error margin; 4-fold cross validation).

June 30)
In this section, we demonstrate that (a) the functional form of the standardized PDFs pτ (x) is stable for some τ > τ 0 when T int = 3 months between Apr. 1 and June 30 (Q2), and (b) the scaling phenomenon is once again observed between b(τ ) and τ .Recall that T int is the integration window that is used to compute the correlation in Eq. ( 4).We accordingly decrease the maximum investment horizon from 50000 minutes to 10000 minutes since τ < T int .Below we provide 3 figures corresponding to (a) the functional invariance (Fig. 19), (b) the scaling phenomenon (Fig. 20), and (c) model architecture search (Fig. 21).Note their similarity to those presented in the main text.

S5 Appendix. Deriving the Modified Vicsek model
We first review the Vicsek model [32].Let us consider a two-dimensional system of N kinetic particles.We denote, by ⃗ x i (t) and ⃗ v i (t), the position and velocity of particle i at time t.The position of i is updated by The amplitude of the velocity of each particle is constant: and where N i,δ is the number of elements j that satisfy ∥⃗ x i (t) − ⃗ x j (t)∥ F < δ.Instead of the uniform distribution, we can use the normal distribution for Ξ i (t): ).

Fig 2 .
Fig 2. Qualitatively demonstrating the stability of the functional form for increasing τ : Standardized PDF pτ (⋅) in Eq. (11) visualized for 6 years -1 year per row: (a) Left: τ = 1 min to 1000 min and (b) Right: τ = 1000 min to 30000 min.As τ exceeds 1000 min, the shape of pτ (⋅) takes a more stable form.A similar analysis with T int = 3 months is presented in the Appendix Fig. 19.

Fig 3 .
Fig 3. Pairwise KL divergence between standardized partial correlation PDFs for 6 years of analysis: Dark blocks along the major diagonal (circumscribed in blue) indicate that when the invariance is visually observed (1000 ≤ τ ≤ 30000), the pairwise KL divergence is low.

Fig 4 .Fig 5 .
Fig 4.  Capturing the onset of function invariance by fitting 2-mode GMM onto the standardized partial correlation PDFs: Across 6 years of analysis, we plot the weights of the 2 GMM components after fitting to pτ (⋅) for different τ .Mode 1 corresponds to the mode with the lower standard deviation.As τ is increases, the second mode starts contributing significantly to the fit signaling the onset (shaded cyan).

Fig 6 .
Fig 6.Dependence of the scaling factor b(τ ) on τ : (a) from 2004 to 2011; and (b) from 2012 to 2020.From τ = 1000 min to 30000 min (the regime highlighted by light cyan), observe the near-linear relationship between ln τ and ln b.A similar visualization with T int = 3 months is presented in the Appendix Fig. 20.

Fig 7 .
Fig 7. Model Architecture Search -Training and Validation MSE: The MSE (CI = σ between folds) is reported as a measure-of-fit of every model for each year for {τ, b(τ )} samples within the identified τ regime.Observe that the Power Law and the Stretched Exponential fits consistently reports lower validation MSE.Error bars are computed across 4 folds of cross-validation.Polynomial models demonstrate clear signs of overfitting while the exponential model (β = 1) is only slightly worse than the best fits.A similar analysis with T int = 3 months is presented in the Appendix Fig. 21.

Fig 8 .
Fig 8. MSTs with (left) τ = 1 min and (right) τ = 1000 min in 2004: The starting and end dates are Jan 1st and Dec 31st, respectively.Each vertex is colored depending on its GICS sector.Edge weights are computed as: d i,j (τ ) ∶= √ 2(1 − ρ i,j (τ )).Observe that for larger values of τ , stocks belonging to the same sector cluster together.

Fig 11 .
Fig 11.Minimum Spanning Trees (MSTs) obtained from the modified Vicsek generative model with varying η values: We generated MSTs using the modified Vicsek generative model with different values of η: (a) η = 0.010, (b) η = 1.0,and(c) η = 100.0.The number of steps was set to 10000, and δ was fixed at 0.10.For η = 100.0, the vine structure is apparent and we used these vines to define the analogs of sectors in stocks.In particular, we performed a community finding on MST[33] corresponding to the vine structure, and identified 11 communities corresponding to the number of GICS sectors.These communities indeed constitute individual vines, as shown by colored nodes in the right-most figure.Next we tracked the associated stocks as η decreased based on the fixed δ condition.Notably, as η decreases, the sectors collapse due to the fixed neighborhood of δ, which encourages more particles (stocks) to interact with one another.This increased interaction arises as the particles experience less perturbation from Ξ, leading to homogeneous behavior and radial MSTs.

Fig 12 . 6 :Fig 13 .
Fig 12. Visualizating the dependency of b(⋅) w.r.t(a) η and (b) δ: Observe a scaling phenomenon of b(⋅) with respect to η and δ as the empirically observed trend in Fig. 6: Number of steps = 10000.The highlighted region (in which) denotes the near-linear fit between b(⋅) and η, δ.For different parameter settings, the near-linear fit in the log-log plot is visualized in cyan.

Fig 14 .
Fig 14.Scaling exponent with respect to year: Values of λ from 2004 to 2020 with interval T int = 1 year.The definition of λ is given in Eq.(12).Error bars are computed using 4-fold cross-validation while estimating the linear fit.In the blue highlighted regions, anomalies are observed where λ deviates from the linear fit.

Fig 15 .
Fig 15.Empirically estimated CCDFs of returns (+ve values only): The CCDF of returns computed for different return horizons τ and years are presented.In accordance to prior work, power laws are observed for a finite range of τ (marked in cyan).

Fig 16 .
Fig 16.Expected value of the partial correlations as a function of τ and across years: The mean value of c ij as a function of τ is presented.While the standard deviation shows scaling properties, we do not observe a similar relationship governing the means.

Fig 17 .
Fig 17.Plotting the dependence of b(τ ) with respect to τ as a stretched exponential: Four log-linear plots (-X-) representing the relationship between b(τ ) and τ are presented for years 2004, 2008, 2012, 2020.The stretched exponential (-, green) and exponential (grey) fits are shown.Notice that for later years, the stretched exponential fit and the exponential fit are closer to each other.

Fig 18 .
Fig 18.  Stretching parameter β with respect to year: Values of β from 2004 to 2020 with interval T int = 1 year are plotted above along with a linear trendline fit.Error bars are generated using 4-fold cross-validation (see Fig.14).Years marked with anomalies are identified in the cyan-shaded regions.

Fig 20 .
Fig 20.T int = 3 months -The power law (a) from 2004 to 2011; and (b) from 2012 to 2020 From τ = 1000 min to 10000 min (the region highlighted by light cyan), observe the near-linear relationship between ln τ and ln b similar to the case with T int = 1 year (see Fig. 6 in the main text).

Fig 21 .
Fig 21.T int = 3 months -Model Architecture Search: Training and Validation MSE The MSE (CI = σ) is reported as a measure-of-fit of every model for each year.Observe that the Power Law and the Stretched Exponential fits consistently reports lower validation MSE barring 2014.Error bars are computed across 4 folds of cross-validation.Polynomial models demonstrate clear signs of overfitting while the exponential model is only slightly worse in fit compared to the power law.These results are similar to the case with T int = 1 year (see Fig. 7).