Learning zero-cost portfolio selection with pattern matching

We replicate and extend the adversarial expert based learning approach of Györfi et al. to the situation of zero-cost portfolio selection implemented with a quadratic approximation derived from the mutual fund separation theorems. The algorithm is applied to daily sampled sequential Open-High-Low-Close data and sequential intraday 5-minute bar-data from the Johannesburg Stock Exchange (JSE). Statistical tests of the algorithms are considered. The algorithms are directly compared to standard NYSE test cases from prior literature. The learning algorithm is used to select parameters for experts generated by pattern matching past dynamics using a simple nearest-neighbour search algorithm. It is shown that there is a speed advantage associated with using an analytic solution of the mutual fund separation theorems. We argue that the strategies are on the boundary of profitability when considered in the context of their application to intraday quantitative trading, but demonstrate that patterns in financial time-series on the JSE could be systematically exploited in collective and that they are persistent in the data investigated. We do not suggest that the strategies can be profitably implemented, but argue that these types of patterns may exist for either structural or implementation-cost reasons.


Introduction
Sequential investment strategies aim to facilitate portfolio control decisions by collecting information from past behaviour and states of the market and using this information to deploy capital across a selection of assets in a manner that can generate consistent wealth maximization over the long-term [14,17,22].
The intention of the paper is not to find a profitable trading strategy for quantitative trading but to show that such strategies exist by providing a simple, transparent and easily recoverable example in the domain of unleveraged zero-cost portfolio selection for statistical arbitrage.
Here we make no specific assumptions relating to the nature of price processes for the sake of the algorithms; however, the approach is broadly based on prior mathematical analysis that uses assumptions of stationarity and ergodicity of the price increments in order to allow the study of asymptotic growth rates. In particular, this ensures that such growth rates have well-defined maxima once full knowledge of the distribution and its process has been achieved [14,17,18,22].
We investigate the idea that by using pattern matching algorithms (where the patterns are unspecified) combined with learning algorithms, based on some purpose, such as wealth maximisation irrespective of risk [13,14], we can: 1. beat a cash portfolio in the context of a self-funding strategy, a zero-cost portfolio strategy, and 2. beat the best stock in the market [16].
Email address: tim.gebbie@wits.ac.za (Tim Gebbie)
The latter has been shown to be the case in prior literature, by investigating daily sampled stock data from the NYSE for long-only (fully invested) portfolio strategies [14,17,15,16,20]. Here we consider both of these cases: zero-cost, and fully invested strategies, in the context of the South African stock market, the Johannesburg Stock Exchange (JSE), and do so for both daily sampled data and intraday data.
The approach here should not be confused with questioning the value of technical analysis, where pre-specified patterns, in the form of some sort of library or set of rules, are used to try to generate systematic wealth [11]. We are considering the problem of probing phenomenology aimed at understanding financial markets as a complex adaptive system [1,3]. More specifically, we are considering the modelling of time-series arising from complex adaptive systems, something more closely aligned with the context of nonlinear dynamical systems thinking [4]. The question is one of finding evidence of structure, as opposed to randomness, in financial time-series data, beyond evidence of long-term memory or typical stylised facts [2]. We argue that we are not trying to show that specific patterns exist, or that such patterns are predictable, but rather that the interaction of a purposeful agent with a stock market using pattern-matching can generate wealth that would not be expected from a typical null-hypothesis of geometric Brownian motion.
We are specifically not looking for statistically preserved properties of time-series, in the sense of time-series models, but are rather looking for evidence of statistically repeating structures in time-series, without an a-priori ability to know the form that the structure will take, perhaps because of the nonlinearity of the system in question [8,10].
We are seeking indirect evidence of structure by showing that a purposeful agent can learn to make investment decisions [3], in a positivist manner, by looking for a-priori unspecified and unknown patterns in the data that can be purposefully exploited, sequentially and systematically, to generate wealth in excess of that expected by randomness and the related normative perspectives of the functioning of financial markets. This is not in itself new; there is a rich literature on attempts at probing the predictability of this or that financial time-series. What can be considered controversial is the view that fairly naive computational learning agents can generate wealth within the system without special insights or understanding of the system itself 1.
By extracting positive growth rates in excess of the performance of the best stock, using unleveraged combinations of underlying stocks over long periods of time, we build the case that there are indeed patterns, or some sort of structure, that almost-repeat through time in a manner such that their occurrence can be treated as exploitable information in collective. This has been shown to be the case for long-only portfolios [14,17,15,16,20]. We show this for self-funding strategies; zero-cost portfolios.
To achieve this we construct sequential investment strategies based on pattern matching and demonstrate that these strategies can generate positive growth rates in excess of the best stocks in the investment universe, and substantial positive growth rates for zero-cost strategies in excess of that expected from investment in cash or risk-free assets.
We do not address the question of whether it is risk that the investor is being compensated for, or even whether the strategies we are isolating are in fact statistical arbitrages, in the sense that the strategy's long-term volatility tends to zero in conjunction with an always positive probability of positive performance at zero initial cost [12].
The appearance of patterns and organisation is a fundamental property of complex adaptive systems [4]. Looking directly for pockets of predictability in complex dynamical systems [5] as an approximation to modelling complex adaptive systems [4] is notoriously difficult given the intricacies of noise and nonlinearity [6,7]. Coupling purpose, via a learning criterion (here wealth maximisation irrespective of risk), to the selection for patterns, in order to achieve the stated purpose, is the approach promoted here.
1 This view benefited from conversations with D Hendricks and D Wilcox
It is in this sense that we build a framework that extracts pockets of predictability, if they exist, via pattern searching, ideally in an online manner, in order to increase our agent's wealth irrespective of risk, but specifically in the situation where the form of the patterns is always unknown, changing and dynamic, yet represented in the collective past histories of the system components.
In Section 2 we present the agent-based learning algorithm as an extension of prior work [13,14,17,18] and [20,22]. The contributions here are: (i) the algorithm is explicitly re-written in online form in order to make near-real-time applications tractable, (ii) the algorithms are modified for application to the zero-cost portfolio selection problem using the mutual fund separation theorems [25,24], and (iii) the algorithms are explicitly tested using synthetic data, real daily data from both the NYSE and JSE, and JSE intraday 5-minute bar-data.
Section 3 describes the approach we have adopted for the generation of experts or agents modified for use in zero-cost portfolio strategies. The algorithm parameters are not tuned prior to use but are left to the online-learning algorithm to select.
In Section 3.6, we consider strategies that target predictable patterns using a simple modified version of the nearest-neighbour pattern-matching strategy developed by [22].
As in the case of the learning algorithm, the agent-generation algorithms have been modified in principle: (i) to support offline and online algorithm use, (ii) they are explicitly framed for use with zero-cost portfolio selection problems, and (iii) portfolio optimizations have been replaced with analytic quadratic approximations in order to improve execution times.
In order to have true online pattern matching the algorithms would have to be replaced with either look-up-tables built offline, or a hybrid method that combines offline building of the history of the agents' performance with an almost online method that updates that cached history of agents' performance across parameters as the data arrives sequentially in real-time.
Section 4 provides an overview of the data used in the various numerical experiments.
The data is sequential and uniformly sampled, taking the form of open-high-low-close (OHLC) data; this is described in Section 4.1. The use of open, high, low and close data combinations for the daily data testing can be carried over for intraday studies, and the use of close prices is a special case.
The synthetic data is described in Section 4.2 along with the algorithm testing strategy. Briefly, a simple Kolmogorov-Smirnov test is adopted to assess algorithm behaviour across 4 test cases: 2. SDC2: log-normal random data where all assets have the same positive mean, so that basic learning is not possible for zero-cost portfolios (portfolios that have long and short positions that sum to zero), 3. SDC3: log-normal random data with varying positive means, and 4. SDC4: log-normal data with both positive and negative means with the same fixed variance.
The synthetic data is used to understand and prove the behaviour of the zero-cost portfolio strategy (which we will call active portfolios) and the fully-invested portfolio strategy (which we will call absolute portfolios).
The four real-world data sets are described in Section 4.3: 1. the standard daily sampled test-data set for the NYSE [14,17,15,16,20], 2. the extended merged NYSE data set, 3. the daily sampled JSE data set, and 4. the intraday JSE 5-minute bar-data. A general overview of the implementation of the numerical experiments is addressed in Section 5.
Section 6 describes the results and analysis of the results, first for the synthetic data in Section 6.1 and then for the real-world data, in Sections 6.2, 6.3, 6.4, and 6.5, respectively for the four real-world case studies: NYSE, extended merged NYSE, daily sampled JSE and intraday JSE.

An online-learning algorithm for portfolio selection
The application is for a set of stocks ordered in time where each agent will consider different combinations of stocks for each time-period based on features and strategy parameters. These different agents compete in an adversarial manner in competition for capital allocations [13,17,18,22]. Here agents with poor performance will have incremental capital allocations reduced and agents with robust performance will have incremental increases in capital allocation. Better performing agents will over time have their relative contribution to the aggregate portfolio increased so that their decisions are preferentially selected for trade at the onset of each trading or investment period based on information available at the end of the prior trading period.
The online learning algorithm takes as inputs a set of agent controls and performances. These are enumerated over features (here price-relatives) and the free-parameters of the temporally ordered objects (here stocks).
The key feature used will be price relatives, which are defined for the m-th object at time t in terms of the object's price $p_{m,t}$ as

$x_{m,t} = p_{m,t} / p_{m,t-1}$.

In vector notation we will write this equivalently as $x_t$ where the m-th component is $x_{m,t}$. The controls that represent the agents are the portfolio weights by which each agent's decision will contribute to the final aggregate decision at a particular time.
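As a minimal illustration (not the authors' implementation), the price-relative features can be computed from a price matrix; the prices below are hypothetical:

```python
import numpy as np

# Hypothetical price matrix: rows are time steps t, columns are objects m.
prices = np.array([
    [100.0, 50.0, 20.0],
    [101.0, 49.0, 20.5],
    [102.0, 48.5, 20.4],
])

# Price relatives x_{m,t} = p_{m,t} / p_{m,t-1}, one row per period.
price_relatives = prices[1:] / prices[:-1]
```

Each row of `price_relatives` is then one realisation of the feature vector $x_t$.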
Agent performance is represented by factor (agent) mimicking portfolios that are formed from the portfolio controls at each time period. The controls are estimated and implemented at the beginning of each period. The relative changes in asset performance will then modify the relative weights of the assets over the investment period, and the performance of a given agent is then determined at the end of the investment period. This performance is determined by the controls: the collection of objects the agent is holding, their weights, and the performance of those objects as determined by the price relatives.
Agents do not have to hold the same number of objects. Agents can hold all or small groups of objects; they can short-sell objects and hold long positions in objects 2. The collection of objects a particular agent holds will be called the agent's object cluster.
The parameters that denote agents are typically a parameter that is an index of the cluster of objects an agent has decided to use, and the algorithm-specific parameters: typically a data window parameter k determining how much past data to include, a parameter more specific to a given algorithm if it is required, such as a partition parameter ℓ, and a forecast horizon dependent parameter, τ.
Any four useful parameters can be used in the learning algorithm that was implemented in this paper. The number of agents is then a function of these four free-parameters. The learning algorithm will then carry out the weighted averaging process based on agent past performance over the agents enumerated by these four parameters.
The parameters are denoted τ, w, k and ℓ respectively. We reserved parameters k and ℓ for algorithm-specific parameters - this is done in order to try to align with their usage in the prior literature [20]. There are at most W values of w, K values of k, L values of ℓ, and τ n values for the horizon parameter τ.
The default value of the horizon parameter is 1: τ = 1. For simplicity and computational speed the results presented in this paper have used the default value 3. The choice of these parameters will determine the number of agents in the system. The number of agents is denoted by n, where the total number of agents will then be $N = \tau_n W K L$.
The n-th agent is represented by a tuple containing the controls at a given time and its performance (H nm,t , S n,t ).This tuple will usually be represented in vector notation as (H n,t , S n,t ) where the object index m is suppressed.
For discrete values of sequential time running from t = 1 until some maximal time T, the agent controls H are then a collection of T time-ordered (N, M)-dimensional matrices that are represented as multi-dimensional double precision matrices in the software.
The value of the n-th agents controls for the m-th object at time t is H nm,t for discrete values of time.The performance of the agents is represented as a (N, T )-dimensional matrix where the n-th agent has its performance over the t-th time interval as S n,t .
There are at most M objects, so m can take on values on the integer interval [1, M] that enumerate the objects. The number of objects remains static for a given agent even though an agent may hold a zero position in a particular object.
From the perspective of the learning algorithm the mechanism of agent generation is not important, it is required that all N agents are correctly enumerated at each time increment.At the beginning of each time increment the controls determined at the end of the previous time increment are implemented and then held to the end of the time period at which time the agent performance is determined and the agent controls are then adjusted using the learning algorithm.
The learning algorithm updates the agent mixture control q n,t which is a measure of how much a given agent will contribute to the aggregate portfolio.The q variables control the relative mixture of agents through time as they compete based on their past performance.The mixture controls cannot in general be thought of as probabilities, which makes their use and notation different to some of the prior literature [20].

Online-learning algorithm
The learning algorithm is inspired by the universal portfolio approach developed by [17,18] and refined by [22]. The learning agent can be thought of as a multi-manager, using asset management language, where the multi-manager is selecting underlying strategies from a collection of portfolios H n,t and then aggregating, using some selection method, to a single portfolio b t that is implemented at each investment or trading period t.
The basic learning algorithm was incrementally implemented online, but offline it can be easily parallelized across agents. The learning algorithm has five key steps: 1. Update the portfolio wealth: The portfolio controls $b_{m,t}$ for the m-th asset are used to update the portfolio wealth for the t-th period:

$S_t = S_{t-1} \, (b_t \cdot x_t) = S_{t-1} \sum_m b_{m,t} x_{m,t}$.

Here the price relatives for the t-th period and m-th asset, $x_{m,t}$, are combined with the portfolio controls for the period just ending to compute the realised portfolio returns for this period, period t. The portfolio controls were computed at the end of the prior period and implemented at the beginning of the current period. The relative amounts of each object in the portfolio will have changed by the relative price changes, assuming no cash-flows into or out of the portfolio during this investment period.
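The wealth update in Step 1 can be sketched for the fully-invested case as follows (the function name and the numeric values are illustrative, not from the paper):

```python
import numpy as np

def update_portfolio_wealth(S_prev, b, x):
    """One step of the wealth recursion S_t = S_{t-1} * (b_t . x_t).

    S_prev : wealth at the end of the previous period,
    b      : portfolio controls b_{m,t} set at the start of period t,
    x      : realised price relatives x_{m,t} over period t.
    """
    return S_prev * float(np.dot(b, x))

# Equal-weight fully-invested portfolio over three assets (illustrative values).
S1 = update_portfolio_wealth(1.0, np.array([1/3, 1/3, 1/3]),
                             np.array([1.02, 0.99, 1.01]))
```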

2. Update agent wealth:
The agent controls $H_{nm,t}$ were determined at the end of time-period t − 1 for time period t by some agent generating algorithm for N agents and M objects about which the agents make expert capital allocation decisions. At the end of the t-th time period the performance of each agent, $S_{n,t}$, can be computed from the change in the price relatives $x_{m,t}$ for each of the M objects in the investment universe considered, using the prices at the start, $p_{m,t-1}$, and the end of the t-th time increment, $p_{m,t}$, with the agent controls:
$S_{n,t} = S_{n,t-1} \, \Delta S_{n,t}$.
3. Update agent mixtures: We considered three different agent mixture update rules: 1.) the universally consistent choice, 2.) an exponential gradient choice [23], and 3.) an exponentially weighted moving average. We generically refer to these online updates as rule g. In practice one would select one of the three update rules once, either for the duration of offline training (if one seeks to initialise the algorithm prior to deployment) or for use online during the system implementation in real-time. For the numerical experiments presented here we adopted the universally consistent approach inspired by [18,22] as this demonstrates the principle. The accumulated agent wealth is used as the update feature for the next unrealised increment, with some normalisation, so that the agent mixture control for the n-th agent for the next time increment, t + 1, is proportional to the measure of wealth:

$q_{n,t+1} \propto S_{n,t}$.

The alternative choices include the Exponential Gradient (EG) 4 approach of [23] or an Exponentially Weighted Moving Average (EWMA) 5 based learning strategy. We adopt the simplest update rule for the mixture of controls; it should be noted that there can be practical advantages to using more adaptive methods such as EG and EWMA learning, where the learning rates can be used as additional parameters to be learnt using a thick modelling framework [15].
4. Re-normalise agent mixtures: If the agent mixture is to be considered a positive probability then we require that $\sum_n q_n = 1$ and that all $q_n \geq 0$. This is the case of fully-invested agents where no shorting is allowed. We will call these types of agents absolute agents. For agents that we will consider active, the leverage is set to unity for zero-cost portfolios: (1.) $\sum_n q_n = 0$ and (2.) $\nu = \sum_n |q_n| = 1$. Here the mixture controls allow for shorting of one agent against another and the portfolio becomes self-funding. The mixture controls can no longer be thought of as positive probabilities.
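The two re-normalisation conventions can be sketched as below; the demeaning used to impose the zero-sum constraint in the active case is an assumption made for illustration, not necessarily the authors' exact procedure:

```python
import numpy as np

def renormalise_mixtures(q, active):
    """Re-normalise agent mixture controls q_n.

    Absolute (fully-invested) agents: q_n >= 0 and sum_n q_n = 1.
    Active (zero-cost) agents: sum_n q_n = 0 and leverage nu = sum_n |q_n| = 1.
    """
    q = np.asarray(q, dtype=float)
    if active:
        q = q - q.mean()          # impose sum_n q_n = 0 (one simple choice)
        nu = np.abs(q).sum()      # leverage
        return q / nu if nu > 0 else q
    q = np.clip(q, 0.0, None)     # no shorting of agents
    return q / q.sum()

q_active = renormalise_mixtures([0.5, 0.2, 0.3], active=True)
q_abs = renormalise_mixtures([2.0, 1.0, 1.0], active=False)
```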
The leverage is normalised in order to ensure consistency between the learning algorithms and agent generating algorithms.

5. Update portfolio controls:
The portfolio controls $b_{m,t}$ are updated at the end of time period t for time period t + 1 using the agent mixture controls $q_{n,t+1}$ from the updated learning algorithm and the agent controls $H_{nm,t+1}$ from the agent generating algorithms, using information from time period t and averaging over all n agents:

$b_{m,t+1} = \sum_n q_{n,t+1} H_{nm,t+1}$.
4 Exponential Gradient (EG) based learning: $q_{n,t+1} = q_{n,t} e^{\eta S_{n,t}} / \sum_n q_{n,t} e^{\eta S_{n,t}}$. 5 Exponentially Weighted Moving Average (EWMA) based learning.
The strategy is to implement the portfolio controls, wait until the end of the increment, measure the features, update the agents, and then re-apply the learning algorithm to compute the agent mixtures and portfolio controls for the next time increment. In the algorithm listing: Step 3 updates the agent mixture for rule g, $q_{n,t+1} = g(q_{n,t}, S_{n,t})$; Step 4 re-normalises the controls, $b_{n,t+1} = \frac{1}{\nu} b_{n,t+1}$, and the mixtures, $q_{n,t+1} = \frac{1}{\nu} q_{n,t+1}$; the algorithm returns $(b_{t+1}, S_{n,t}, S_t, q_{n,t+1})$.
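One increment of the five-step loop, for the absolute (fully-invested) case with the simple wealth-proportional rule g, can be sketched as follows (the function name and array shapes are illustrative assumptions):

```python
import numpy as np

def online_learning_step(S_agents, H_curr, H_next, x):
    """One increment of the online learning loop (fully-invested sketch).

    S_agents : (N,) accumulated agent wealth at the end of period t-1
    H_curr   : (N, M) agent controls implemented over period t
    H_next   : (N, M) agent controls for period t+1
    x        : (M,) realised price relatives for period t
    """
    # Steps 1-2: update agent wealth, S_{n,t} = S_{n,t-1} * (H_{n,t} . x_t).
    S_agents = S_agents * (H_curr @ x)
    # Step 3: rule g, mixtures proportional to accumulated agent wealth.
    q = S_agents / S_agents.sum()
    # Step 4 is trivial here since the absolute-agent q is already a
    # probability; the active case would impose sum q = 0 and sum |q| = 1.
    # Step 5: aggregate controls, b_{m,t+1} = sum_n q_{n,t+1} H_{nm,t+1}.
    b_next = q @ H_next
    return b_next, q, S_agents

# Two agents, two objects: one agent fully long each object this period.
b_next, q, S = online_learning_step(
    np.ones(2),
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.5, 0.5], [0.5, 0.5]]),
    np.array([1.1, 0.9]),
)
```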

Agent generating algorithms
The purpose of the agent generating algorithms is to sequentially generate the agent controls $H_{nm,t}$ for the n-th agent for the m-th object for implementation at the start of the t-th time period. These will be denoted in vector notation as $H_{n,t}$.
We initially considered three different agent-generating algorithms over which the thick modelling was carried out in order to learn the various algorithms' free-parameters: 1.) a pattern-matching algorithm [20], 2.) a contrarian mean-variance portfolio algorithm we called anti-BCRP (as it trades against the Best Constant Rebalanced Portfolio for a given k-tuple of data) 6, and 3.) the ANTICOR algorithm [16]. The various free-parameters of these algorithms, such as the window sizes k and partitions ℓ, were then used to enumerate the agents that would compete for capital allocations in the learning algorithm. We adopted the pattern-matching approach [20] for the numerical experiments in this paper as we found a performance advantage in looking for more general patterns rather than merely targeting mean-reversion effects; more importantly, the pattern-matching algorithms are more generic as they do not require any a-priori choices for the structures that are learnt for. This was considered to be more faithful to the intent of the paper - where we seek to show that unspecified patterns can be learnt for in a manner that can both beat the best single stock in a universe of stocks and can beat a cash portfolio in a self-funding strategy.

Comments on Notation
The feature realisations at time t for the m-th object, x m,t , are also denoted in vector notation as x t .The agent controls and the feature time-series are the key inputs in the online-learning algorithm to determine the agent mixtures q n,t through time.The online learning algorithm is path-dependent and as such both a function of the history of agent controls as well as the feature time-series history.
Following prior work we denote random feature variables as X and their realisations as x [20,22], for some vector-valued stationary and ergodic process $\{X_t\}_{-\infty}^{+\infty}$ with realisations denoted as $x_1, x_2, \ldots, x_t$ and their corresponding random variables as $X_1, X_2, \ldots, X_t$. However, we will refine the notation further in order to more effectively enumerate the agents for our specific implementation.
The strategies are based on constructing a k-tuple of the selected feature for m objects. We will denote the agent-tuple by $x^k_{w,t}$ and the k-tuple as $x^t_{t-k}$. The k-tuple is a slice of data of length k back from the current time t, of width m, enumerating all the objects. We will modify the k-tuple notation to $\{x^t_{t-k}\}_{s(n),\ell}$ to denote a k-tuple taken from an ℓ-partition of the data for a given cluster of objects w = s(n). Here s is the cluster index of the n-th agent. We are suppressing the m index and using vector notation to write the k-tuple as x. The agent-tuple will be unique to the n-th agent, where n is the unique agent index enumerating a particular combination of k, ℓ and w.
A k-tuple is used to determine agent controls $H_{n,t}$. The initial features used are historical price sequences, which are assumed to be realisations x from some random process X. The pattern-matching algorithm will then refine the k-tuple to groups of nearest-neighbours that are expected to better predict future outcomes than merely the last price change or price change sequence. This is done by comparing the current realisation $x^t_{t-k}$ with the past. In this way, given a set of parameters enumerating the n-th agent, we will select the required tuple from the existing data realisations, depending on the algorithm parameters, using some selection function f, where the m-th component of the k-tuple is $x_{mn,t}$. The historically selected outcomes are then used directly, via the k past realisations of performance of each object for a given partition, by finding the mean-variance wealth minimizing portfolio (in order to be contrarian), either fully-invested or zero-cost, and using the resulting portfolio weights for the agents with the specific window and partition parameters: $H_{n,t+1} = H_{n,t+1}(\gamma, -\mu(x_{n,t}), \Sigma(x_{n,t}))$, comparing with Eqns. (25) and (26).
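A minimal sketch of the nearest-neighbour search over past k-tuples, assuming a Euclidean distance on flattened windows and ignoring the partition and cluster parameters (the function name is introduced here for illustration):

```python
import numpy as np

def nearest_neighbour_matches(X, k, n_matches):
    """Find past k-tuples most similar to the most recent k-tuple.

    X is a (T, M) matrix of price relatives. The current pattern is the
    last k rows; we scan all earlier windows of length k, rank them by
    Euclidean distance, and return the indices of the periods that FOLLOW
    the closest matches, whose realised outcomes can then feed the
    mean-variance estimates mu and Sigma.
    """
    T = X.shape[0]
    pattern = X[T - k:].ravel()
    dists = []
    for s in range(T - k):  # window X[s:s+k] is followed by period s+k
        dists.append((np.linalg.norm(X[s:s + k].ravel() - pattern), s + k))
    dists.sort()
    return [idx for _, idx in dists[:n_matches]]

# Toy data with an exactly repeating pattern [2, 3] for a single object.
X_toy = np.array([[1.0], [2.0], [3.0], [1.0], [2.0], [3.0]])
matches = nearest_neighbour_matches(X_toy, k=2, n_matches=1)
```

Here the exact match to the current pattern ([2, 3] at rows 1-2) points at period 3 as the historically observed successor.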

The Log-optimal strategy
The log-optimal strategy under the assumptions of stationarity and ergodicity has been shown to be the best possible choice of strategy over the long term [14].This type of analysis has been extended to the semi-log-optimal case [20] where weakened conditions have been derived.
The surprising result is that even with this weaker formulation the loss of optimality is such that log-optimality has, for all practical purposes, equivalent performance to portfolios selected using semi-log-optimality [20].This provides an argument for the use of competing sequences of mean-variance portfolios in the framework of agent-based competition for capital.
With an initial investment wealth of $S_0$, using a sequence of portfolio controls $B = \{b_i\}_{i=1}^{t-1}$ from time i = 1 until the current time t, the portfolio wealth for a fully-invested portfolio is [20]

$S_t(B) = S_0 \prod_{i=1}^{t-1} (b_i \cdot x_i)$.

This gives an average portfolio growth rate

$W_t(B) = \frac{1}{t} \log \frac{S_t(B)}{S_0} = \frac{1}{t} \sum_{i=1}^{t-1} \log (b_i \cdot x_i)$.

Here one is aiming to maximize the overall wealth through the incremental selection of the sequence of fully-invested portfolio controls B.
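Numerically, the wealth product and the average log-growth rate are two views of the same quantity; a short check with illustrative data:

```python
import numpy as np

# Three periods of price relatives for two assets (illustrative values).
x = np.array([[1.02, 0.99], [0.98, 1.03], [1.01, 1.00]])
# Constant equal-weight, fully-invested controls b_i.
b = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])

period_returns = (b * x).sum(axis=1)          # b_i . x_i for each period
growth_rate = np.log(period_returns).mean()   # average log-growth rate
final_wealth = np.prod(period_returns)        # S_t / S_0
```

By construction `final_wealth` equals `exp(t * growth_rate)` with t the number of periods.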

Universally consistent strategies
The fundamental result of universal log-optimality is that no investment strategy can have a faster average rate of growth than that arising from the log-optimal portfolio [13,14,17,18]. However, full knowledge of the distribution of the process is required. Strategies achieving an equivalent growth rate without knowing the distribution are called universally consistent [14,20] strategies.
In principle one could, via simulation, enumerate all the possible controls and find via brute-force the set of controls that solve the log-optimal portfolio selection problem. This is ambitious given current technology constraints, since the opportunity set of stocks is typically large and the data representing the features even larger - particularly for intraday quantitative trading problems.
In the idealized situation we would define some simplex Λ with a prior distribution µ on the simplex, such that some expert or agent b is a given realisation from this distribution of portfolios. We would then directly evaluate the µ-weighted fully-invested universal portfolio at time t [18,19],

$b_t = \frac{\int_\Lambda b \, S_{t-1}(b) \, d\mu(b)}{\int_\Lambda S_{t-1}(b) \, d\mu(b)}$,

where $\int_\Lambda d\mu(b) = 1$ and the portfolio value $S_t$ at time t is

$S_t = \int_\Lambda S_t(b) \, d\mu(b)$.

Here the portfolio is fully-invested such that $b \mathbf{1}^T = 1$ for unit vector $\mathbf{1}$.
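The µ-weighted universal portfolio can be approximated by Monte Carlo: sample constant portfolios b from a uniform (Dirichlet) prior on the simplex, weight each by its accumulated wealth $S_{t-1}(b)$, and average. A sketch under these assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Price relatives for three past periods, two assets (illustrative values).
x = np.array([[1.05, 0.97], [0.96, 1.04], [1.02, 1.01]])

# Draw constant-rebalanced portfolios b from the uniform prior on the simplex.
B = rng.dirichlet(np.ones(2), size=5000)

# Accumulated wealth S(b) = prod_i (b . x_i) for each sampled portfolio.
wealth = np.prod(B @ x.T, axis=1)

# Wealth-weighted average approximates the universal portfolio integral.
b_universal = (wealth[:, None] * B).sum(axis=0) / wealth.sum()
```

The result remains on the simplex (non-negative, summing to one), as the integral form requires.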
Although we seek strategies that are universally consistent with respect to the class of stationary and ergodic processes, a pragmatic approach is required given both the unrealistic distributional assumptions and the curse of dimensionality we face in enumerating the control space 7.
The strategy is to reduce the problem by finding a more informed subset of controls that can be used to approximate the required sequence of portfolio controls that represent a universally consistent strategy. In addition to reducing the set of applicable controls, one also aims to streamline the evaluation of these controls and their adaptation through time; this can be achieved by reducing the log-optimality criterion to semi-log-optimality.

Semi-log optimality
We choose to focus on the first two moments of the price relative distributions: the mean and covariance. This improves the execution speed of the algorithms (see Figure 13) but with some loss in long-term optimality [30,20], and as such a deviation from the universally consistent strategies.
First, we have reduced the opportunity space in the simplex of all possible portfolios in order to make the problem of finding a portfolio that is optimal over the entire feature space computationally tractable; this is achieved by using agent-generating algorithms and learning over the free-parameters of those agent-generating algorithms.
Second, we replaced the optimization with a quadratic approximation that gives us analytic solutions to replace optimizations that we would otherwise have to solve numerically. In addition to a performance advantage, the quadratic approximation also provides a straight-forward method for considering both fully-invested and zero-cost portfolios in a single framework.
Streamlining the algorithms for performance was approached in two steps, first, to separate the problem into that of an online-learning algorithm and the agent generating algorithms, then, second, to reduce the log-optimality criterion to semi-log-optimality.
The semi-log-optimal portfolio selection takes on the form

$b_t = \arg\max_{b} \mathbb{E}\left[ h(b \cdot X_t) \right]$,

where $h(z) = (z - 1) - \frac{1}{2}(z - 1)^2$ from the second order Taylor expansion of log(z) at z = 1.
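The quality of the quadratic approximation $h(z) \approx \log(z)$ for price relatives near unity, the regime relevant for daily and intraday data, can be checked directly:

```python
import numpy as np

# Second-order Taylor expansion of log(z) about z = 1:
#   h(z) = (z - 1) - 0.5 * (z - 1)^2.
# The approximation is tight for z near 1, i.e. for the small per-period
# price relatives typical of daily or intraday sampling.
z = np.array([0.95, 0.99, 1.00, 1.01, 1.05])
h = (z - 1.0) - 0.5 * (z - 1.0) ** 2
err = np.abs(np.log(z) - h)
```

Over a ±5% move the approximation error is below $10^{-4}$, supporting the semi-log-optimal reduction.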
A related approach was taken in [19] where they derived an analytic approximation for an efficient universal portfolio.Our simplified mean-variance approach was motivated by their development of an analytic algorithm, the difference here is that we want an algorithm that is online, analytic, explicitly includes zero-cost portfolios, and allows for the restriction of the solution space using some agent generating algorithm directly at each step rather than via side-information.

Active fund separation problem
The determination of the optimal portfolio is sequentially implemented using the exact solution to the quadratic approximation to log-optimality by solving the active fund selection problem. The active fund selection problem is a special case of the mutual fund selection problem [25,24]. This will give an analytic approximation that can cater both for long-only fully-invested agents (absolute agents) and for leverage-one 8 zero-cost portfolios (active agents).
We therefore consider the semi-log-optimal portfolio optimization problem [25,26,24] for return expectation vector µ and asset return covariance matrix Σ with a portfolio control vector ω in terms of the risk aversion parameter γ. The conjugate transpose of a vector is denoted as $(\bullet)^T$. Over a single investment period the control problem is:

$\max_{\omega} \left\{ \omega^T \mu - \frac{\gamma}{2} \omega^T \Sigma \omega \right\}$.    (16)

Here we have changed notation to denote the portfolio controls as ω in order to avoid confusion with the portfolio strategy controls b that are the result of the online-learning algorithm, which aims to approximate the semi-log-optimal portfolio selection strategy for aggregate portfolio controls $b_t$ for time increment t.
Here the portfolio controls ω are used to generate the agents that populate the agent control set H_{n,t}. It is the agent control set that is then used to generate the semi-log-optimal portfolio choice at each time t: b_t.
Eqn. (16) can be rewritten as the mutual-fund Lagrangian and solved using elementary Kuhn-Tucker methods. Two equations are found in terms of the optimal solution for the portfolio control, ω*; the first gives the quadratic optimal risk-return pay-off,

    ω* = (1/γ) Σ⁻¹ (µ − λ1),    (18)

and the second, the fully-invested portfolio investment constraint,

    1^T ω* = 1.    (19)

The Lagrange multiplier is determined by substituting Eqn. (18) into Eqn. (19) to find:

    λ = (1^T Σ⁻¹ µ − γ) / (1^T Σ⁻¹ 1).    (20)

This is then used to eliminate the Lagrange multiplier from Eqn. (18) to find a formulation of the mutual fund separation theorem:

    ω* = Σ⁻¹1 / (1^T Σ⁻¹ 1) + (1/γ) Σ⁻¹ ( µ − (1^T Σ⁻¹ µ / 1^T Σ⁻¹ 1) 1 ).    (21)

The first term on the right is the lowest-risk portfolio and the second term is the zero-cost portfolio that encapsulates the relative views of the assets. We will typically work with the separation theorem in the form given in Eqn. (21). The second term will give us an efficient method of generating zero-cost portfolios.
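The closed-form solution can be sanity-checked numerically. The sketch below (illustrative dimensions, seed and risk aversion; not the paper's data) verifies that the solution as we read it satisfies the first-order conditions, the fully-invested constraint, and the benchmark plus zero-cost decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
m, gamma = 5, 2.5                              # illustrative size and risk aversion
mu = 1.0 + 0.01 * rng.standard_normal(m)       # expected price relatives
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)                # symmetric positive-definite covariance
one = np.ones(m)
Si = np.linalg.inv(Sigma)

lam = (one @ Si @ mu - gamma) / (one @ Si @ one)   # Lagrange multiplier
omega = Si @ (mu - lam * one) / gamma              # optimal control

omega_B = Si @ one / (one @ Si @ one)              # lowest-risk (benchmark) fund
omega_A = Si @ (mu - (one @ Si @ mu) / (one @ Si @ one) * one) / gamma  # zero-cost fund
```

The zero-cost term sums to zero and the benchmark term to one, so their sum is fully invested by construction.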
It is then convenient to re-write the Mutual Fund Separation theorem as an Active Fund Separation theorem explicitly from Eqn. (21) by defining the lowest-risk portfolio as the benchmark portfolio:

    ω = ω_B + ω_A,    (22)

where

    ω_B = Σ⁻¹1 / (1^T Σ⁻¹ 1)    (23)

and

    ω_A = (1/γ) Σ⁻¹ ( µ − (1^T Σ⁻¹ µ / 1^T Σ⁻¹ 1) 1 ).    (24)

The formulae for ω_B and ω_A will be directly used in the agent-generating algorithms based on views encoded in the mean, µ, and the covariances, Σ, as a function of the various agent-generating parameters. The resulting controls H_{n,t} will then be determined from the m-th component of either ω_A for the active agents or ω_B + ω_A for the absolute agents, for the n-th agent and time-increment t.
For situations where we want agents constructed from zero-cost portfolios we will use the tactical solution from Eqn. (24) to generate the agents for a given k-tuple.In situations where we need fully invested agents we will use the combination of the benchmark fund and the active (or tactical) fund.
Suppressing indexes over the m objects, the agent controls for the n-th agent for the two possible cases, (1.) the absolute agents and (2.) the active agents, are then H_{n,t} = ω_B + ω_A and H_{n,t} = ω_A respectively. Here the m-th component of H_{n,t} is H_{nm,t} and the portfolio weights are dependent on the agent-tuples x_{n,t} for a given agent. For the active agent we enforce the leverage-unity constraint at the beginning of each time increment; this can be considered equivalent to setting the risk-aversion γ, at the beginning of each time increment, such that the leverage is always unity. This is an important feature of the algorithm as we do not enforce uniform risk-aversion through time. We rather choose to ensure that capital is fully utilized given the available information. The following sections describe how the agent-tuples are constructed for the various agent-generating algorithms.
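As a hedged sketch of the two agent types (the helper name and example inputs are ours), the controls can be generated from µ and Σ, with the active agent rescaled to unit leverage in place of fixing γ:

```python
import numpy as np

def agent_controls(mu, Sigma, gamma=1.0, active=True):
    """Sketch: omega_B + omega_A for absolute agents, or a unit-leverage
    rescaling of the zero-cost fund omega_A for active agents."""
    one = np.ones(len(mu))
    Si = np.linalg.inv(Sigma)
    omega_B = Si @ one / (one @ Si @ one)
    omega_A = Si @ (mu - (one @ Si @ mu) / (one @ Si @ one) * one) / gamma
    if active:
        # Enforce unit leverage, sum |w| = 1; equivalent to re-choosing gamma
        lev = np.sum(np.abs(omega_A))
        return omega_A / lev if lev > 0 else omega_A
    return omega_B + omega_A

mu = np.array([1.002, 0.999, 1.001])   # toy expected price relatives
Sigma = 1e-4 * np.eye(3)               # toy diagonal covariance
w_active = agent_controls(mu, Sigma, active=True)
w_abs = agent_controls(mu, Sigma, active=False)
```

The active weights are long the above-average asset and short the below-average one, with unit gross exposure and zero net exposure; the absolute weights sum to one.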

Agent generating algorithms from patterns
In order to reduce the space of portfolio controls and efficiently generate a reasonable approximation to universally consistent strategies using Eqn. (13), we reduce the set of applicable controls using agent-generating algorithms. The agent-generating algorithm we use in our numerical experiments will be a pattern-matching algorithm [20]. One can make various decisions about how to break data up into manageable pieces for the various algorithms. The most basic decisions relate to how to break up the data in time, which we call partitioning; the other choice relates to how we break the data up in terms of the objects themselves (often called the features), which we call clustering. Partitioning is typically a more intricate task because it has implications for the algorithm and system structure.
The pattern-matching algorithm is based on two steps subsequent to the choice of clusters s(n): (1.) partitioning and (2.) pattern-matching. Clusters can be chosen by a variety of methods; we would like to promote two: (i) correlation-matrix based methods [27], and (ii) clusters based on economic classifications of stocks. The former, correlation-based methods, have outputs that can be directly used as inputs into the algorithms discussed here, specifically via s(n), the cluster membership parameters. It is, however, the method based on fixed economic sector classifications [29] that will be explicitly used in this paper for the intraday experiments in Section 6.5, both for speed and simplicity.
In the daily numerical experiments we have ignored the impact of clustering and treated the cluster s(n) of the n-th stock as trivial, i.e. we consider a single stock cluster that includes all m objects. The inclusion of cluster indexing can be important to the practical implementation of these techniques as it is often useful to restrict trading signal decisions to similar stocks. There is a wealth advantage to this, as we have shown when we considered the impact of clustering for the numerical experiments using intraday data (see Table 15).
The pattern-matching algorithm is split into two key components. First, the partitioning algorithm, which selects collections of time-ordered features from the full set of feature data. Second, the matching algorithm, in which a measured pattern derived from the feature data is used to find similar patterns in a given partition of the feature data.

Partitioning
Subsets of time-ordered data are selected from the original time-ordered data for a given collection of objects. The collection of objects can in turn be a sub-collection of the original set of objects. Partitioning takes place in the time domain while clustering is in the object dimension. The purpose of partitioning is to prepare data subsets for pattern-matching [22]. Four distinct approaches to data partitioning are enumerated here; however, only the trivial partition is used in the experiments.
A partition is a collection {p t } represented by a logical vector of the length of a given time-series where true is represented as one and false as zero to index membership in a given partition.When a partition is determined from features that determine the state of the system at a given time we will use that partition to represent the system in that state for the sake of pattern-matching.
For the numerical experiments presented here we will use variations of the trivial partition, in which all the temporally ordered data is kept in a single partition, represented by a vector of ones of the length of the time-series.
There are wealth advantages associated with more sophisticated partitions. We considered four different partitioning approaches: the trivial partition; the overlapping partition, where data membership in partitions is repeated in order to bias the data towards a given time (for example, the last time-increments are repeated across all partitions for a time-series of length T, as in {p_t}^T = {(0, . . ., 0, 0, 1), (0, . . ., 0, 1, 1), . . ., (1, . . ., 1, 1, 1)}); the exclusive partition, where the partitions are mutually exclusive subsets of the full partition; and the side-information partition [18]. The most heuristically useful partition is the side-information partition, where partitions can be preselected in the partitioning algorithm based on rules conditioned on side-information [18]. Partitioning can be useful both as a nuanced exploitation of information, for example by splitting feature data over different regimes and thus generating distinct agents for different regimes, and as an effective approach to parallelization of algorithms.
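The partition schemes enumerated above can be sketched as logical membership vectors; the block sizes below are illustrative, not the paper's choices:

```python
import numpy as np

T, L = 6, 3  # time-series length and number of partitions (illustrative)

# Trivial partition: all time points live in a single partition.
trivial = [np.ones(T, dtype=bool)]

# Overlapping partitions biased towards recent data: the ell-th partition
# keeps the last (ell + 1) blocks of the series, so the most recent
# increments appear in every partition.
overlapping = [np.arange(T) >= T - (ell + 1) * (T // L) for ell in range(L)]

# Exclusive partitions: mutually exclusive, collectively exhaustive blocks.
exclusive = [(np.arange(T) // (T // L)) == ell for ell in range(L)]
```

With T = 6 and L = 3 the overlapping scheme reproduces the nested pattern (0,0,0,0,1,1), (0,0,1,1,1,1), (1,1,1,1,1,1) given in the text.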
Here we would partition the time-series based on side-information arising from additional features drawn from the system being observed, as in [18]. For example, we could use a Markov-switching algorithm with a set of states, assign each time in the time-series a state index and then define the partition membership based on states; or we could choose a feature as side-information, tile the data into groups, and then assign a given time to a given partition based on whether its side-information feature falls in a particular group.
Partitioning serves as a convenient mechanism for breaking up the feature data into distinct states. This can be useful when choosing to search for patterns while the system is in a distinct state, as it enables the algorithm to search for patterns only in historic data from times in the past when the system was in a similar state. By combining a partitioning algorithm with a state-detection algorithm one can improve both computational times and algorithm performance in terms of wealth generation [28]; this is not explored further here.

Pattern-matching
The pattern-matching algorithm will take a k-tuple and search a given partition of the feature data for similar patterns by finding the smallest distance measure between the k-tuple and data in a given partition. This best-matching set of data in the partition will then be used to determine a pattern-matching time j. The matching time will then be used to select a future outcome some time period τ ahead of the matched pattern. This future outcome is used to construct a tuple of data, the agent-tuple, iteratively using the look-ahead rule: j_n = j + τ. A number of such pattern-matches will be accumulated to construct the agent-tuple x_{n,t}, and from this a mean and covariance are computed.
This mean and covariance will then serve as the input into Eqn. (22) to determine the agent controls H_{n,t+1}, the n-th agent's controls to be held for time-period t + 1.
The pattern-matching algorithm is split into two separate algorithms. The first algorithm, which we will call the pattern algorithm, generates patterns to be matched and partitions of the data into which the pattern will be matched. The second algorithm will then take the pattern and the data partitions and generate matching times. The matching times will then be used to generate an agent-tuple x_{n,t}.
The pattern algorithm generates a k-tuple {x_{t−k}^t}_{s(n)} [22] for matching, and a data partition {x_t}_{(p_ℓ, s(n))} using a predefined temporal partition {p_ℓ} of the data and the cross-sectional cluster for the n-th agent, s(n). This is done iteratively for each agent as enumerated by the parameters that define a given agent: the cluster membership w = s(n) of the n-th agent, the partition variable ℓ, the k-tuple variable k and the look-ahead horizon variable τ.
For each set of variables that define the n-th agent the pattern algorithm will then call the matching algorithm.

[Algorithm 2: PATTERN Algorithm (PMA). Requires: the partitions {p_ℓ}; loops over the n agents, calling the matching algorithm for each.]

The matching algorithm will find matches for the k-tuples x_{t−k}^t in the partitions. If there is a single partition of data, the matching algorithm will find the ℓ̂ closest matches. We consider two rules for calculating ℓ̂ and will refer to these as rule P_ℓ. This rule is introduced in order to easily compare our algorithms with prior literature, more specifically [20,22]. The difference is related to how the partitions are defined and implemented.
We consider the trivial rule, ℓ̂ = ℓ, and the rule required to recover the Nearest-Neighbour (NN) algorithm performance described in [22]. The Györfi et al Nearest-Neighbour rule is ℓ̂ = ⌊p_ℓ t⌋, where ℓ̂ is determined by a variable p_ℓ ∈ (0, 1). The choice of p_ℓ used in the experiments is the same as in [22].
Here t represents the number of time periods in the history, and the floor is taken to find the smallest partition at the given time. This modification serves primarily to allow us to recover prior results in the literature using the framework we implemented in the software for the numerical experiments.
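A hedged sketch of the two rules for the number of matches ℓ̂, as we read the text (the particular p_ℓ and t values below are illustrative):

```python
import math

def ell_hat_trivial(ell, t):
    # Trivial rule: use ell matches regardless of the history length t.
    return ell

def ell_hat_nn(p_ell, t):
    # Nearest-neighbour style rule: the number of neighbours scales with the
    # history length t via a fraction p_ell in (0, 1), floored as in the text.
    return max(1, math.floor(p_ell * t))

n_matches = ell_hat_nn(0.02, 500)
```

Under the NN-style rule the neighbourhood grows with the available history, whereas the trivial rule keeps it fixed.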
If there are multiple partitions of data the algorithm will find the best match in each partition. The matching algorithm will find the best matches and from those extract the matching times j associated with the time of each k-tuple match. From the look-ahead rule the matching algorithm will then construct the agent-tuple x_{n,t}. The matching algorithm will then compute the agent-control for this given agent-tuple, h_{n,t}.
The distance between tuples is the 2-norm. Although we could use the distance between two matrices as the general distance in the algorithm, we have chosen to differentiate: for the case k = 1 we select the most recent vector of object features and compute the vector distance between it and the test-tuple, while for k > 1 we measure the distance of each object from the same object at a different time independently from the other objects. This allows us to search for the best fits of objects independently rather than in collective. This is an important refinement: in the original version of the algorithm we followed [22] and used the 2-norm in full generality independent of the window size k; we found better performance by independently selecting for patterns using column-wise computed distances.
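A simplified sketch of the matching step under these distance conventions (names and toy data are ours; for brevity the k > 1 case sums the per-column distances rather than matching each object fully independently):

```python
import numpy as np

def match_times(X, k, ell_hat):
    """Find the ell_hat historic end-times whose k-window best matches the
    most recent k-window of X (shape (T, m)): a plain vector 2-norm for
    k = 1, column-wise (per-object) 2-norms for k > 1."""
    T, m = X.shape
    probe = X[T - k:T]                       # the pattern to be matched
    dists = []
    for j in range(k, T - k):                # candidate match end-times
        window = X[j - k:j]
        if k == 1:
            d = float(np.linalg.norm(window - probe))
        else:
            # distance per object (column), then aggregated over objects
            d = float(np.sum(np.linalg.norm(window - probe, axis=0)))
        dists.append((d, j))
    dists.sort()
    return [j for _, j in dists[:ell_hat]]

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))            # toy feature data
times = match_times(X, k=2, ell_hat=5)
```

Each returned time j would then be advanced by the look-ahead rule j + τ to harvest the outcomes that populate the agent-tuple.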

OHLC data
The data we will consider will be sequential data, but not necessarily continuously sequential. For this reason we will study OHLC (Open-High-Low-Close) bar-data, where the closing price of a given bar is not necessarily the opening price of the subsequent bar. We will first study daily sampled data and then intraday data. The algorithms will be initially tested using synthetic data (see Section 4.2), and then the real-world test data used in prior research [17,22] (see Section 4.3), which are sequences of daily sampled closing prices.
The data and algorithms can be easily extended to accommodate additional features as side-information [18], such as volumes, spreads, and various financial indicators and asset-specific and state attributes. The side-information can be trivially used to re-partition data into additional sets of agents and then used as inputs into the learning algorithm. The wealth performance enhancement relating to the side-information extension is not demonstrated in the numerical experiments presented here.
[Algorithm 3: MATCHING Algorithm (MTA). Requires: the look-ahead rule τ; loops over t-states, partitions p ∈ {p_ℓ} and j-states, computing test-tuple distance measures of dim(objects) via column-wise 2-norms.]

OHLC bar data is typically represented by a candle-stick graph as in Figure (1). The time-series data is such that the closing price of time-increment t is not necessarily the opening price of time-increment t + 1; the closing price can in fact be at some time t + δ for some arbitrary data-specific time-increment δ. A low-frequency example is that of a typical trading day on the JSE: the market opens in the morning with some opening price, o_t, at 9h00; the market may then close at some closing time, 17h00; after a closing auction period, the official closing price, c_t, is then printed soon after the market close (perhaps after some randomisation period). The market is then closed for some time-period over-night until the market opens again on the subsequent day. There is a period, δ, when the market is closed and as such information is not continuously being priced into the traded assets. Information that accumulates over-night will then be priced into the market prices through the process of the opening auction and subsequent trading in the various assets.
Our approach to OHLC data is applicable to a variety of synchronously sampled or re-sampled data sets, including intraday data: 1. close-to-close: Here the prices p_{m,t} for the m-th asset are the time-series of close prices. The price relatives x_{m,t} are then computed from the close price time-series c_{m,t} as x_{m,t} = c_{m,t}/c_{m,t−1}. The algorithm is trying to exploit information relating to price changes from the close of trading of one time increment to the close of trading of a subsequent time increment.

open-to-close:
Here the prices p_{m,t} for the m-th asset are the ordered time-series pairs of open and close prices on the same date; the price relatives are then computed as x_{m,t} = c_{m,t}/o_{m,t}. Here one is trying to exploit price relative changes within a trade increment, for example, across a single day from the market opening to the market close, ignoring the over-night price changes.

close-to-open:
Here the prices p_{m,t} for the m-th asset are the price changes from the close of the trade period at t − 1 to the next trade period at time t, with price relatives x_{m,t} = o_{m,t}/c_{m,t−1}. Here one is looking to exploit the change in prices between trade periods, where the information cannot yet be fully reflected in trading until the trading commences in the next trade period.

Figure 2: Feature time-series investment period for the t-th time increment, showing that the end of the t-th increment does not always have to coincide with the start of the next, here the (t+1)-th, investment period. The opening price is denoted o_{m,t} and the close price for the period c_{m,t} for the m-th asset.
This is looking for inefficiencies in the price changes from market opening to market opening.
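The three price-relative conventions enumerated above can be sketched directly (the helper and toy prices are ours); note the identity close-to-close = close-to-open × open-to-close, which ties the schemes together:

```python
import numpy as np

def price_relatives(o, c, scheme):
    """Price relatives x_{m,t} from open (o) and close (c) arrays of shape (T, m)."""
    if scheme == "close-to-close":
        return c[1:] / c[:-1]      # x_t = c_t / c_{t-1}
    if scheme == "open-to-close":
        return c[1:] / o[1:]       # x_t = c_t / o_t (within the trade period)
    if scheme == "close-to-open":
        return o[1:] / c[:-1]      # x_t = o_t / c_{t-1} (across the gap)
    raise ValueError(scheme)

o = np.array([[10.0], [10.2], [10.1]])   # toy opening prices, one asset
c = np.array([[10.1], [10.0], [10.3]])   # toy closing prices
x_cc = price_relatives(o, c, "close-to-close")
x_oc = price_relatives(o, c, "open-to-close")
x_co = price_relatives(o, c, "close-to-open")
```

Because c_t/c_{t−1} = (o_t/c_{t−1})(c_t/o_t), the close-to-close relatives factor exactly into the overnight and intraday legs.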
The important missing component of information is that related to volume (and additional features, such as spread, order-imbalance and order-book resilience, for intraday data). For example, the opening price is a less reliable price when it has been determined from significantly lower volumes of trading, as compared with a typical closing price. In the case where the closing auction of a given market has more volume than the typical opening auction, the relative uncertainties in the prices can be substantial. The typical time increment for a given feature is given in Figure (2). We promote the use of a state-detection algorithm and side-information partitioning in order to address these types of concerns. In the context of this work such issues do not change our conclusions. It is expected that the learning algorithm will still attempt to maximise the long-term wealth given a specific agent-generating algorithm for a given feature set. For both daily data and intraday data, the feature sets that are of most interest in this study are those associated with the "close-to-close" and "close-to-open" price relative features.

Synthetic Data
The algorithm was tested on four synthetic data cases (SDC) for both active and absolute portfolios. The synthetic data was generated for 10 stocks over 1000 time periods. The price relatives x_{m,t} for each stock at each time period were randomly generated from a lognormal distribution (the lognrnd function in MATLAB, using the Mersenne Twister pseudorandom number generator [31] initialised with a specific seed value); each synthetic data case defines a mean, µ, and variance, v, used to generate the dataset. The mean, μ̃, and standard deviation, σ, of the associated normal distribution are given by σ² = log(1 + v/µ²) and μ̃ = log(µ) − σ²/2. Table 1 summarises the four synthetic data cases; each case was generated 30 times and initialised with seed values 1, 2, . . ., 30 respectively.
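A sketch of this moment inversion (these are the standard lognormal formulas; NumPy's generator stands in for MATLAB's lognrnd, and the seed is illustrative):

```python
import numpy as np

def lognormal_params(mean, var):
    # Invert the lognormal moment formulas: sigma^2 = log(1 + v / mu^2),
    # mu_normal = log(mu) - sigma^2 / 2, for the underlying normal distribution.
    sigma2 = np.log(1.0 + var / mean ** 2)
    return np.log(mean) - 0.5 * sigma2, np.sqrt(sigma2)

mu_n, sigma_n = lognormal_params(1.001, 0.0002)      # SDC 2 settings
rng = np.random.default_rng(7)                       # illustrative seed
x = rng.lognormal(mu_n, sigma_n, size=(1000, 10))    # synthetic price relatives
```

The resulting draws have the requested mean price relative up to sampling error.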
1. Synthetic Data Case 1 (SDC 1): was generated from a lognormal distribution with a mean price relative, µ = 1, and a variance, v = 0.0002, to simulate a stock market where there is no significant increase or decrease in the value of a stock over time.
The expected outcome is that neither the active portfolio nor the absolute portfolio will be able to learn which stocks it should hold a long position or short position.
2. Synthetic Data Case 2 (SDC 2): was generated from a lognormal distribution with a mean price relative, µ = 1.001, and a variance, v = 0.0002, to simulate a stock market where the values of the stocks are increasing over time.
The expected outcome is that the absolute portfolio will learn which stocks to hold a long position on; however, the active portfolio will not be able to learn which stocks to hold a short position on, as no stocks decrease in value over time.
3. Synthetic Data Case 3 (SDC 3): was generated from a lognormal distribution with a random mean price relative, µ ≥ 1, assigned to each stock and a variance, v = 0.0002. The random means are calculated as in Eqn. (37), where δ is a random number generated from a standard normal distribution (using the randn function in MATLAB with the Mersenne Twister pseudorandom number generator [31], initialised with a specific seed value). This simulates a stock market where some stocks are increasing in value and some stocks are decreasing in value over time.
The expected outcome is that both the active portfolio and the absolute portfolio will learn to hold a long position on the stocks increasing in value over time and a short position on the stocks decreasing in value over time; however, it is expected that the absolute portfolio will beat the active portfolio due to the growth rate of the stocks increasing in value over time.

Table 1: Summary of the means and variances that were chosen when generating the synthetic data sets. The random means for SDC 3 were calculated using Eqn. (37) and the means for SDC 4 were generated as described in Section 4.

4. Synthetic Data Case 4 (SDC 4): was generated from a lognormal distribution with mixed means assigned to the price relatives: µ = 0.999 was assigned to 3 stocks and µ = 1.001 to the remaining stocks, with a variance, v = 0.0002. This dataset simulates a stock market where the values of some stocks are increasing and the values of some stocks are decreasing.
The expected outcome is that both the active portfolio and the absolute portfolio will learn to hold a long position on the stocks increasing in value over time and hold a short position on the stocks decreasing in value over time.

Real Data
The algorithm is tested on four sets of real data, summarised in Table 2, two data sets from the New York Stock Exchange (NYSE) obtained at [32] and two data sets from the Johannesburg Stock Exchange (JSE) obtained at [33].

JSE OHLC Data:
This was obtained from Thomson Reuters Tick History (TRTH) [33] and contains daily data for 42 stocks listed on the Johannesburg Stock Exchange (JSE) from 1995-2015 (using RIC chain 0#.JTOPI). However, not all of the 42 stocks were listed in 1995; the data for these stocks begins at a later time, and missing data were handled by assigning a price relative of 1 for that day.
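A minimal sketch of this missing-data rule (the toy array is ours):

```python
import numpy as np

# Periods with no data for a stock are assigned a price relative of
# exactly 1, i.e. the missing stock neither gains nor loses wealth.
x = np.array([[np.nan, 1.01],
              [np.nan, 0.99],
              [1.02,   np.nan]])
x_filled = np.where(np.isnan(x), 1.0, x)
```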

JSE Intraday Data:
The transaction data was obtained from Thomson Reuters Tick History (TRTH) [33] and consisted of top-of-book and transaction updates for 40 stocks listed on the Johannesburg Stock Exchange (JSE) during 2013 in RIC chain 0#.JTOPI.
The transaction data was converted into 5-minute bar data using the trade price and volume-weighted averaging. The 5-minute bar-data starts at 9h30 and ends at 16h30 for normal trading days, and starts at 9h30 and ends at 11h30 for early-close days. A normal trading day on the JSE starts with an opening auction between 8h30 and 9h00, continuous trading takes place between 9h00 and 16h50, and the day ends with a closing auction between 16h50 and 17h00.

Implementation
The wealth achieved by the portfolio and the wealth achieved by the agents is determined using Algorithm 1 (OLA). The agent controls H_{n,t}, introduced in Section 2 and used in Algorithm 1, are determined using Algorithm 2 (PMA), which calls Algorithm 3 (MTA) to determine the agent controls for each agent. Algorithm 3 updates an agent's wealth as described in Eqn. (25). In the experiments 50 agents were used, with K = (1, 2, . . ., 5) and L = (1, 2, . . ., 10), similar to the choice of agents ('experts') used by Györfi et al in [20,22].
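The expert grid can be enumerated directly; a minimal sketch:

```python
# The 50 agents ('experts') are enumerated over the grid of window sizes
# K = 1..5 and partition/neighbour indices L = 1..10.
K = range(1, 6)
L = range(1, 11)
agents = [(k, ell) for k in K for ell in L]
n_agents = len(agents)
```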
All results and data processing were done in MATLAB. The algorithm was implemented for both the absolute and active cases using a MATLAB class that we named pattern; a MATLAB class was used instead of a function because this allows the algorithm to easily be extended to a more online approach. The pattern class was extended to include our recovered version of the Györfi et al Nearest-Neighbour algorithm [22] so that the running-time comparisons in Section 6 will be accurate. The Cover [17] Universal Portfolios algorithm was recovered by creating a MATLAB function that implements the algorithm.

Synthetic Data
The algorithm was tested on four synthetic data cases (SDC) to illustrate how the algorithm performs in different types of markets. Table 3 displays the best and average wealth achieved by the active and absolute portfolios for 30 runs of each synthetic data case, initialised with seed values 1, 2, . . ., 30 respectively.
On all of the datasets the algorithm, when using the absolute portfolio, eventually learns the stocks that are increasing in value over time, as observed for SDC 2, 3 and 4. Similarly the algorithm, when using the active portfolio, eventually learns to hold a long position on the stocks that are increasing in value over time and a short position on the stocks that are decreasing in value over time, as observed for SDC 3 and 4. Figures 3, 4, 5 and 6 show the wealth achieved by the active and absolute portfolios, as well as the wealth achieved by each synthetic stock, when randomly generated using an initial seed value of 7.
Tables 4 and 5 display average p values from the two-sample Kolmogorov-Smirnov tests when comparing the following combinations of the total wealth gained from the portfolio (S_1), the wealth gained from the best agent of the portfolio (S_2) and the wealth gained from the best stock (S_3):

Table 4: Comparisons of the average p values of the wealth gained from the active portfolio. The first p value in each column is the average p value, over the 30 data sets for each case, using two-sample Kolmogorov-Smirnov tests for the alternative hypotheses (Hyp.). The second p value is obtained from the two-sample Kolmogorov-Smirnov test for the alternative hypothesis that the cumulative distribution function (CDF) of the p values for the 30 data sets for each case is larger than the CDF of the average p value at the 5% significance level.

1. S_1 > S_2: The alternative hypothesis that the CDF of the total wealth gained from the portfolio, S_1, is larger than the CDF of the wealth gained from the best agent of the portfolio, S_2, at the 5% significance level.
2. S 2 > S 3 : The alternative hypothesis that the CDF of the wealth gained from the best agent of the portfolio, S 2 , is larger than the CDF of the wealth gained from the best stock, S 3 , at the 5% significance level.
3. S 3 > S 1 : The alternative hypothesis that the CDF of the wealth gained from the best stock, S 3 , is larger than the CDF of the total wealth gained from the portfolio, S 1 , at the 5% significance level.
The two-sample Kolmogorov-Smirnov test was chosen because it is a non-parametric test and makes no assumption about the distribution of the datasets.
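A minimal sketch of the two-sample KS statistic on synthetic wealth samples (the data and shapes are ours; in practice one would use e.g. SciPy's ks_2samp, which also returns one-sided p values via its alternative argument):

```python
import numpy as np

def ks_statistic(a, b):
    # Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    # between the two empirical CDFs; no distributional assumptions needed.
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.max(np.abs(Fa - Fb)))

rng = np.random.default_rng(3)
S1 = rng.normal(1.5, 0.2, 30)   # e.g. terminal portfolio wealth over 30 runs
S3 = rng.normal(1.0, 0.2, 30)   # e.g. terminal best-stock wealth
D = ks_statistic(S1, S3)
```

With clearly separated samples the statistic is large; identical samples give exactly zero.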

NYSE Data
The algorithm was run on the NYSE data set for both absolute and active portfolios on the same pairs of stocks used by Cover in [17] and by Györfi et al in [20,22]. Table 8 shows the wealth achieved by the active and absolute portfolios, compared to reference results from the literature when using the nearest-neighbour strategy (G_NN) by Györfi et al [20,22] and the universal portfolio strategy (UP) by Cover [17].
G*_NN denotes our best recovery of the results of the nearest-neighbour strategy [20,22]. The results of the universal portfolio strategy [17] were identically recovered. The last row of Table 8 shows the results of the strategies when run on all 36 NYSE stocks.
The algorithm compares well on the two-stock combinations used by Cover in [17] and by Györfi et al in [20,22]. A surprising result is how the wealth achieved by the portfolio when run over all 36 stocks compares to the results by Györfi et al in [20,22]; this may be due to a loss of accuracy in the quadratic approximation step of the algorithm as the number of stocks increases.

Table 5: Comparisons of the average p values of the wealth gained from the absolute portfolio. The first p value in each column is the average p value, over the 30 data sets for each case, using two-sample Kolmogorov-Smirnov tests for the alternative hypotheses (Hyp.). The second p value is obtained from the two-sample Kolmogorov-Smirnov test for the alternative hypothesis that the cumulative distribution function (CDF) of the p values for the 30 data sets for each case is larger than the CDF of the average p value at the 5% significance level.

          Hyp. 1           Hyp. 2           Hyp. 3
          p      p >       p      p >      p      p >
SDC 1     0.893  0.407     0.000  0.000    0.647  0.013
SDC 2     0.642  0.013     0.000  0.000    0.567  0.001
SDC 3     0.593  0.002     0.000  0.000    0.795  0.100
SDC 4     0.846  0.274     0.000  0.000    0.644  0.006

NYSE Merged Data
The algorithm was run for both absolute and active portfolios on the NYSE Merged dataset, on two-stock combinations and on all 23 stocks in the dataset. The two-stock combinations chosen were Commercial Metals and Kin Ark Corp., and IBM and Coca-Cola.

Daily sampled JSE data
The algorithm was run for both absolute and active portfolios on various two-stock combinations, namely AngloGold Ashanti Ltd and Anglo American PLC, Standard Bank Group Ltd and FirstRand Ltd, and Tiger Brands Ltd and Woolworths Holdings Ltd. The algorithm was also run for both absolute and active portfolios on combinations of 10, 20 and 30 stocks. In each case the date on which the data of a stock starts may differ; the time period for the algorithm therefore starts with the stock that has a

Intraday JSE data
The algorithm was run for both absolute and active portfolios on various two-stock combinations, namely AngloGold Ashanti Ltd and Anglo American PLC, Standard Bank Group Ltd and FirstRand Ltd, Tiger Brands Ltd and Woolworths Holdings Ltd, and MTN Group Ltd and Vodacom Group Ltd. The algorithm was also run for both absolute and active portfolios on the same combination of 10 stocks used on the JSE OHLC dataset.

The Impact of Market Frictions
An important criticism of any strategy simulation relates to the need to account for the impact of market frictions, including: transaction costs, the cost of capital for trading, the cost of market access, the cost of regulatory capital for taking risky trading positions, and market impact. These are all required in any estimate of performance slippage for a realistic assessment of the viability of trading activity.

Daily strategy trading frictions
The argument that the zero-cost low-frequency (daily traded) strategies are viable, even when unleveraged, is based on the 7 years of history. Here we find that there is no particular combination of OHLC data for which there is a systematic preference; e.g. close-to-close, the case of considering the close price change from one day end to another, is not systematically more profitable than other combinations of data times. These tests do consider the reality of trading prior to a time point: for example, one cannot a priori know what the close price will be at the market close, so this has to be approximated. This excludes price-impact effects. The timing comparisons also demonstrate the speed advantage of using the analytic quadratic approximation as compared to numerically solving the log-optimal constrained optimization at each time-step for each agent combination: as expected, the fully-invested analytic solution is fastest, the zero-cost portfolio next, because of the additional leverage constraint, and the slowest is the algorithm that requires the numerical solution of the optimization.

Consider Table 10 for the active case for the Top 10 JSE stocks (see Appendix A). Here we would argue for 15bps of daily profit before costs (from Table 10, using the accumulated daily wealth of 9.53). We consider the strategy that trades the close of one day to the close of the next day (close-to-close). This is considered in order to take into account liquidity effects: the closing auction is the most liquid time to trade on the JSE, and it is unlikely that one would be able to achieve low slippage trading near the daily market opening. We consider the combination of the cost of capital (the borrowing costs required to source trading capital and the cost of regulatory capital) and a small penalty for slippage, due to the differences between the realized closing prices and the estimated closing prices that the algorithm would require in order to estimate the portfolio controls, to be 10bps per day. In practice it should also be noted that such a trading strategy can be converted to one that trades in equity swaps, so-called contracts-for-difference (CFDs); this would convert the uncertainty about slippage into an up-front fee and allow for excellent implementation of the required model positions with a known cost and no meaningful liquidity concerns. If the daily strategy was implemented with these types of delta-one instruments our estimates of slippage can be considered conservative.
We argue that we can realistically earn a modest 5bps of unleveraged self-funded trading profit per day, or an annual return of 12% of unleveraged profit-and-loss.
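The back-of-envelope arithmetic behind this estimate can be checked directly. The sketch below uses only the 15bps gross and 10bps cost figures quoted above; the 250-day trading year is an assumption, and compounding rather than simple summation gives a figure slightly above the quoted 12%:

```python
# Daily-cost arithmetic for the close-to-close strategy (figures from the text).
gross_bps = 15.0  # estimated daily profit before costs (from Table 10)
cost_bps = 10.0   # combined cost of capital and slippage estimate
net_bps = gross_bps - cost_bps  # 5bps of net daily profit

# Annualised over an assumed 250 trading days.
daily = net_bps / 10_000.0
annual_simple = daily * 250                   # ~12.5%
annual_compound = (1.0 + daily) ** 250 - 1.0  # ~13.3%
print(f"net {net_bps}bps/day; simple {annual_simple:.1%}, compounded {annual_compound:.1%}")
```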

Intraday strategy trading frictions
We assume: 1.) a daily slippage of 50bps for the (self-financing) zero-cost statistical arbitrage strategy, 2.) borrowing costs on the capital required for trading over the year of 10%, and 3.) that the strategy we denote as the active strategy generated a wealth of 4.63 over a year of trading (see Table 15). Putting these together, we argue for an upper limit on the profit, even when unleveraged, of a 20% return over 250 days of trading19.
The turn-over is important in realistic assessments of profitability for intraday trading. We try to account for this in our indicative costing of the slippage by assuming that we have 100% turn-over of inventory at each trading period, with a consistent cost of 0.55bps (0.0055%) per trading period20 and an additional 4bps, to give an indicative slippage of 50bps per day for the intraday trading.
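One way to sanity-check these figures is the rough reconciliation below; the geometric daily-growth decomposition, the 250-day year, and subtracting the 10% funding cost directly from the annual return are our simplifying assumptions, not the paper's exact accounting, but the result lands in the region of the 20% upper limit argued above:

```python
# Intraday cost arithmetic (50bps/day slippage, 10% funding, wealth 4.63/year).
per_period_bps = 0.55  # assumed cost per trading period at 100% turn-over
extra_bps = 4.0        # additional daily allowance
implied_periods = (50.0 - extra_bps) / per_period_bps  # ~84 periods per day

wealth, days = 4.63, 250
gross_daily = wealth ** (1.0 / days) - 1.0       # geometric daily growth, ~61bps
net_daily = gross_daily - 50.0 / 10_000.0        # less 50bps/day of slippage
annual = (1.0 + net_daily) ** days - 1.0 - 0.10  # less 10% borrowing cost
print(f"~{implied_periods:.0f} periods/day; net annual return ~ {annual:.0%}")
```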

Conclusion
In prior work it has been shown that in South African financial markets persistence and long-memory are generic [2]. This paper adds to our knowledge of the South African market by showing that, in addition to evidence supporting long-memory processes, price processes have patterns that are exploitable in a straight-forward manner.
We provide a simple portfolio-value based learning algorithm, a multi-manager in the language of asset management, that selects an overall portfolio with weights b by considering a selection of N different strategies H_n, whose underlying portfolio weights are constructed for strategies enumerated over a variety of combinations of time-series patterns, time-scales, clusters and partitions. This is considered in the context of universally consistent strategies [17,22], but with an extension to directly consider self-financing zero-cost quantitative trading strategies, what we call the active portfolio.
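As a minimal sketch of the multi-manager idea (not the paper's exact update rule): the overall portfolio b can be formed as a performance-weighted average of the experts' portfolio weight vectors, with each expert H_n weighted by its accumulated wealth S_n:

```python
import numpy as np

def aggregate(expert_weights: np.ndarray, expert_wealth: np.ndarray) -> np.ndarray:
    """Wealth-weighted average of expert portfolios.

    expert_weights: (N, m) array, one m-asset portfolio per expert H_n.
    expert_wealth:  (N,) accumulated wealth S_n of each expert to date.
    """
    w = expert_wealth / expert_wealth.sum()  # normalise performance weights
    return w @ expert_weights                # b = sum_n w_n * b_n

# Toy usage: three experts over two assets; the best expert gets double weight.
b_n = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]])
S_n = np.array([2.0, 1.0, 1.0])
b = aggregate(b_n, S_n)  # a valid portfolio: non-negative weights summing to 1
```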
When applying the algorithms to real daily test data, they compare well to results from our implementation of algorithms from the literature [17,22], and to actual results from the literature on the New York Stock Exchange (NYSE) dataset (see Tables 8 and 9).
The active version of the algorithm, when applied to intraday data from the Johannesburg Stock Exchange (JSE), is shown to have performed well in comparison to the best stock, and compares favourably with methods from prior work [17,22] (see Table 14). We show that on the Johannesburg Stock Exchange data the algorithms can learn trends and patterns and enhance out-of-sample wealth accumulation for both daily and intraday applications (see Tables 10 and 14). This is demonstrated on both low frequency (daily sampled) data and higher frequency (intraday uniformly sampled transaction) data.
We have shown that there is an advantage to including agents that are clustered on stock economic sector classifications (see Table 15); this increases the number of agents (or experts) considered by the learning algorithm by including sector membership in the resources, financial and industrial sectors, which in turn boosts the out-of-sample performance. This suggests that combining more sophisticated clustering algorithms [27] with machine learning can be advantageous in the domain of quantitative trading.
The trade-off between computational performance and wealth accumulation can be seen by considering the increased duration of the simulation as one increases from 10 stocks, to 20 stocks, through to 30 stocks in Figure 13. The commensurate loss in performance can be seen in Table 14. For example, the 20-stock simulation generated a wealth of 1.74 for the absolute portfolio and 5.68 for the Györfi et al nearest-neighbour strategy, with the absolute portfolio being almost 5 × 10^4 seconds (or 18%) faster. For intraday statistical arbitrage problems in quantitative trading with many (>50) assets, computational delays can lead to lags between information arrival and order-execution that can negatively impact a strategy's profit-and-loss performance.
We have shown that in the daily dataset for the Johannesburg Stock Exchange, when considering open, high, low and close price data, there is an advantage to strategies that relate to the patterns arising across closing-price-to-closing-price data (see Table 10). It is difficult to profitably trade the market opening price to the market closing price, as intraday dynamics seem to become important and one tends to incur significant market frictions associated with poor market liquidity near the market opening. This provides evidence that one can in principle beat the best stock (or the money market account in the case of the self-financing strategy), as pattern persistence is sufficiently robust in the markets considered.
Towards addressing the key criticism related to correctly estimating the impact of market frictions, in Section 6.6 we give our estimates for the impact of market frictions on both the intraday strategy, where we estimate an annual return of up to 20% for the unleveraged self-financing strategy, and the daily strategy, trading the closing price of the market from one day to the next, at an annual return of up to 12% (see Section 6.6). Using this we argue that the self-financing zero-cost portfolio strategy can be considered tractable both intraday and across days: after reasonable estimates of costs, one is still able to learn how to exploit patterns that recur in the financial time-series data considered in this study. It should also be noted that for optimised intraday trading the event-time paradigm should be implemented, rather than the calendar-time approach that was used for simplicity in the experiments in this paper. This is fairly straight-forward to implement, using equal volume buckets and online down-sampling of transaction data to a time-series of volume-weighted average prices for equal volume buckets [35].
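A minimal sketch of the equal-volume-bucket downsampling referred to above (the bucket size and the toy ticks are illustrative assumptions; [35] describes the actual method):

```python
def vwap_buckets(prices, volumes, bucket_volume):
    """Downsample (price, volume) ticks into one VWAP per equal-volume bucket."""
    vwaps, pv_sum, v_sum = [], 0.0, 0.0
    for p, v in zip(prices, volumes):
        while v > 0:
            take = min(v, bucket_volume - v_sum)  # volume that fits this bucket
            pv_sum += p * take
            v_sum += take
            v -= take
            if v_sum >= bucket_volume:            # bucket full: emit its VWAP
                vwaps.append(pv_sum / v_sum)
                pv_sum, v_sum = 0.0, 0.0
    return vwaps

# Toy ticks: 1000 shares traded, downsampled into 500-share buckets.
bars = vwap_buckets([100.0, 101.0, 99.0, 100.5], [300, 200, 400, 100], 500)
```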
The aim here is to show that there are repeated patterns that can be exploited on both daily and intraday time-scales. We do not claim that being able to exploit such patterns is necessarily profitable as a commercial enterprise. What we are claiming is that structure does exist in financial market time-series, indicative of repeated structures that emerge and change through time, and that after reasonable costs this can be considered a riskless profit, or at least a signature of the ability to generate systematic profits from patterns in financial time-series data.
We do not know whether there is a finite state representation of the system that could be used to generate the observed time-series dynamics. We have evidence for non-linear structure in the time-series data, by providing a simplistic algorithm that can exploit structure in time-series data when it exists, and we know that the algorithm would behave quite differently on random data. To show that this is indicative of some finite state representation would require online state-detection, either via some type of clustering methodology [28], or via some sort of state-space reconstruction algorithm following the methods of deterministic chaos [5,6,7]. This paper makes no statement about the existence of a sufficiently stable finite state representation.
The other criticisms could relate to barriers to entry for reasonably cost-effective market access, as well as the scalability of these types of strategies due to stock liquidity. In terms of the former, many proprietary trading structures within hedge-funds and banks would have very low transaction costs due to bulk trading activities; hence we consider our daily transaction costs of 50bps as onerous but realistic. In terms of the liquidity concerns, we have limited ourselves, in the Johannesburg Stock Exchange data set, to collections of the 10 and 20 most liquid stocks. These stocks can be traded in meaningful volumes.
We could speculate that it is the buying and selling patterns of large institutional mutual funds, or the capital flows of large institutional participants in capital markets, that create the key feed-backs generating persistence in patterns of price dynamics. Realistically there are a variety of potential candidate sources of top-down and bottom-up feedbacks within a system as complex and adaptive as the financial markets; these could provide various mechanisms that balance disorder with order in a meta-stable configuration of states over various time-scales [3]. We argue that fairly naive computational learning agents can generate wealth within the system without special insights into, or understanding of, the system itself.

Acknowledgements
TG would like to thank AIMS South Africa for their support and hospitality at Muizenberg. The authors would like to thank Diane Wilcox for conceptual contributions, Turgay Celik for discussions and ideas relating to algorithm testing, and Dieter Hendricks and Raphael Nkomo for various conversations relating to quantitative trading and machine learning for trading. This work was in part supported by NRF grant number 89250. The conclusions herein are due to the authors and the NRF accepts no liability in this regard.

Figure 1 :
Figure 1: The feature time-series data is best thought of as OHLC (Open-High-Low-Close) bar data. The filled box in the candle chart denotes the situation where the close price is lower than the open price; conversely, the unfilled box has the close price higher than the open price.

Figure 3 :
Figure 3: (a) The wealth achieved by the active and absolute portfolios on SDC 1, which consists of a time period of 1000 and 10 stocks. (b) The wealth of each randomly generated stock.

Figure 7 :
Figure 7: Comparison of the wealth gained from different methods when investing in (a) iroqu and kinar (b) 36 stocks from the NYSE dataset.

Figure 8 :
Figure 8: Comparison of the wealth gained from different methods when investing in (a) comme and kinar (b) 23 stocks from the NYSE Merged dataset.
The JSE OHLC dataset was processed into four datasets containing close-to-close, open-to-close, close-to-open and open-to-open price relatives; the algorithm was run on each of these datasets.

Figure 9 :
Figure 9: Comparison of the wealth gained from different methods when investing in (a) ANGJ and AGLJ, (b) 10 stocks, (c) 20 stocks, (d) 30 stocks from the JSE OHLC close-to-close dataset. This does not account for price-impacts and frictions, nor for the need to approximate an expected close price just prior to market close as one solves for the portfolio controls; there will always be a difference between the controls solved for just prior to market close and those required once the market has closed and the official closing prices are printed.

Figure 13 :
Figure 13: Running time in seconds of the different strategies. This demonstrates the speed advantage of using the analytic quadratic approximation as compared to numerically solving the log-optimal constrained optimization at each time-step for each agent combination. As expected, the fully invested analytic solution is fastest, the zero-cost portfolio next, because of the additional leverage constraint, and the slowest is the algorithm that required the numerical solution of the optimization.
1. NYSE Data: This is described in [32] and contains close-to-close price relatives for 36 stocks listed on the New York Stock Exchange from 1962-1984. This is the same data set used by Györfi et al in [20,22] and Cover in [17]. 2. NYSE Merged Data: This is described in [32]13 and the dataset contains close-to-close price relatives data for 23 stocks listed on the New York Stock Exchange from 1962-2006. The data of the 23 stocks during 1962-1984 is identical to the data described above in point 1.
The data lists the open, high, low and close prices for all of the 42 stocks. This raw data was processed into four datasets containing close-to-close, open-to-close, close-to-open and open-to-open price relatives respectively. Splits, mergers15

Table 3 :
Wealth from investing in synthetic data: the wealth achieved by the active and absolute portfolios for 30 runs of each synthetic data case.
Average p Values of Wealth Gained (S) from the Absolute Portfolio

Table 6 :
Comparison of the average p values from two-sample Kolmogorov-Smirnov tests for the alternative hypothesis that the cumulative distribution function (CDF) of wealth gained from the active portfolio on SDC i is larger than the CDF of wealth gained from the active portfolio on SDC j at the 5% significance level, where i represents the rows and j represents the columns of the table. The p values are the averages of 30 comparisons, each comparison using a seed value of 1, 2, . . ., 30 respectively.

Table 7 :
Comparison of the average p values from two-sample Kolmogorov-Smirnov tests for the alternative hypothesis that the cumulative distribution function (CDF) of wealth gained from the absolute portfolio on SDC i is larger than the CDF of wealth gained from the absolute portfolio on SDC j at the 5% significance level, where i represents the rows and j represents the columns of the table. The p values are the averages of 30 comparisons, each comparison using a seed value of 1, 2, . . ., 30 respectively.

Table 8 :
Comparison of the total wealth achieved from the active (Act.) and absolute (Abs.) portfolios to the wealth achieved from the Györfi et al nearest neighbour (G_NN), the attempted recovery of the Györfi et al nearest neighbour (G*_NN), the universal portfolio (UP) and a buy-and-hold of the best stock strategies.

Table 9 :
Comparison of the total wealth achieved from the active (Act.) and absolute (Abs.) portfolios to the wealth achieved from the attempted recovery of the Györfi et al nearest neighbour strategy (G*_NN), the attempted recovery of the universal portfolio strategy (UP*) and a buy-and-hold strategy of the best stock (Best).

Table 10 :
The total wealth achieved by the active (Act.) and absolute (Abs.) portfolios compared to the wealth achieved from the attempted recovery of the Györfi et al nearest neighbour strategy (G*_NN), the attempted recovery of the universal portfolio strategy (UP*) and a buy-and-hold strategy of the best stock (Best) on the close-to-close dataset.

Table 11 :
The total wealth achieved by the active (Act.) and absolute (Abs.) portfolios compared to the wealth achieved from the attempted recovery of the Györfi et al nearest neighbour strategy (G*_NN), the attempted recovery of the universal portfolio strategy (UP*) and a buy-and-hold strategy of the best stock (Best) on the close-to-open dataset.

Table 12 :
The total wealth achieved by the active (Act.) and absolute (Abs.) portfolios compared to the wealth achieved from the attempted recovery of the Györfi et al nearest neighbour strategy (G*_NN), the attempted recovery of the universal portfolio strategy (UP*) and a buy-and-hold strategy of the best stock (Best) on the open-to-close dataset.

Table 13 :
The total wealth achieved by the active (Act.) and absolute (Abs.) portfolios compared to the wealth achieved from the attempted recovery of the Györfi et al nearest neighbour strategy (G*_NN), the attempted recovery of the universal portfolio strategy (UP*) and a buy-and-hold strategy of the best stock (Best) on the open-to-open dataset.

Table 14 :
The total wealth achieved by the active (Act.) and absolute (Abs.) portfolios compared to the wealth achieved from the attempted recovery of the Györfi et al nearest neighbour strategy (G*_NN).
Figure 12: Comparison of the wealth gained from different methods when investing in 20 stocks from the JSE Intraday dataset; the plot includes the results of using clusters on the stocks. It is important to note that the clustered portfolios have 150 agents and the portfolios without clusters have 50 agents.