Segmentation of time series in up- and down-trends using the epsilon-tau procedure with application to USD/JPY foreign exchange market data

We propose the epsilon-tau procedure to determine up- and down-trends in a time series, working as a tool for its segmentation. The name of the method reflects the use of a tolerance level ε for the series values and a patience level τ in the time axis to delimit the trends. We first illustrate the procedure on discrete random walks, deriving the exact probability distributions of trend lengths and trend amplitudes, and then apply it to segment and analyze the trends of U.S. dollar (USD)/Japanese yen (JPY) market time series from 2015 to 2018. Besides studying the statistics of trend lengths and amplitudes, we investigate the internal structure of the trends by grouping trends with similar shapes and selecting clusters of shapes that rarely occur in the randomized data. In particular, we identify a set of down-trends showing a similarly sharp appreciation of the yen that are associated with exceptional events such as the Brexit Referendum in 2016.

S1 Appendix. Trend length and trend amplitude marginal probability distributions from the epsilon-tau procedure for random walks.
The epsilon-tau procedure presented in the main text (considering a time-constant patience level $\tau$ and, for the up-trend case, the tolerance level $\varepsilon = \max_{m+1 \le t' \le t} x_{t'} - x_m$, where $m$ is the reference point) imposes restrictions on the sequence of values $x_t$ of a time series that can form an up-trend (analogously for a down-trend).
The tolerance level $\varepsilon$ restricts the values $x_t$ of the up-trend $[m+1, m+\ell]$ of length $\ell$ to be always above the reference value $x_m$:

$$x_t > x_m, \quad \forall t \in [m+1, m+\ell]. \tag{1}$$
The patience level $\tau$ requires that, for every point $t$ before the end of the trend $m+\ell$, there is at least one point $t'$ in the window $[t+1, \min(t+\tau, m+\ell)]$ whose value is at least $x_t$ (otherwise the trend would end at $t$, because the time between consecutive maximum values would have reached the patience level):

$$\exists\, t' \in [t+1, \min(t+\tau, m+\ell)] : x_{t'} \ge x_t, \quad \forall t \in [m+1, m+\ell-1]. \tag{2}$$

For points beyond the end of the trend $m+\ell$, one of the following sets of restrictions, arising from the stop conditions, must hold:
(a) the value of the time series reaches the tolerance level $\varepsilon$: where $1 \le \mu \le \tau$.
(b) the time between consecutive maxima reaches the patience level $\tau$:
Observe that, from the above restrictions, the first increment $\xi_{m+1} = x_{m+1} - x_m$ of an up-trend is always positive and the first increment $\xi_{m+\ell+1} = x_{m+\ell+1} - x_{m+\ell}$ beyond the end of an up-trend is always negative.
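To make restrictions (1) and (2) and the two stop conditions concrete, the following minimal Python sketch locates the end of an up-trend from a reference point. It illustrates one reading of the constraints above and is not the implementation used in the main text. The function name `up_trend_end` and its arguments are hypothetical. The tolerance stop is assumed to be the series falling back to the reference value or below, and ties with the running maximum are assumed to extend the trend.

```python
def up_trend_end(x, m, tau):
    """Index of the end of the up-trend that starts after reference point m.

    Returns None when x[m+1] <= x[m], i.e. no up-trend starts at m.
    Assumptions (not taken from the source): stop by tolerance when the
    series falls back to x[m] or below; stop by patience when tau steps
    pass without matching or exceeding the running maximum.
    """
    if m + 1 >= len(x) or x[m + 1] <= x[m]:
        return None
    end = m + 1  # last time the running maximum was matched or exceeded
    for t in range(m + 2, len(x)):
        if x[t] <= x[m]:        # stop (a): tolerance level reached
            break
        if x[t] >= x[end]:      # new (or repeated) maximum extends the trend
            end = t
        elif t - end >= tau:    # stop (b): patience level reached
            break
    return end


series = [0, 1, 2, 1, 2, 3, 2, 1, 0]
print(up_trend_end(series, m=0, tau=2))  # prints 5
print(up_trend_end(series, m=0, tau=1))  # prints 2
```

With the returned index `end`, the trend length is `end - m` and the trend amplitude is `x[end] - x[m]`. If the data run out before a stop condition is met, the sketch simply returns the last maximum seen so far.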
Using the constraints for up-trends, we derive the trend length and trend amplitude marginal probability distributions for the random walk

$$x_t = x_{t-1} + \xi_t,$$

where the independent and identically distributed increments $\xi_t$ take the value $+1$ with probability $p$, $-1$ with probability $q$, or $0$ with probability $r = 1 - p - q$. In the derivation, we take the reference point $m = 0$ to simplify the notation.
Trend length marginal probability distribution
For patience level $\tau = 1$, the restrictions on the increments of the random walk translate as:
And thus the probability of an up-trend with length $\ell$ for $\tau = 1$ is:
For patience level $\tau = 2$, we have two cases according to the trend amplitude $a$:
(i) Trend amplitude $a = 1$: The probability of an up-trend with length $\ell$ and amplitude $a = 1$ is:
(ii) Trend amplitude $a \ge 2$: where $\nu$, $0 \le \nu \le \ell - 2$, is the number of zero increments between the first positive increment $\xi_1$ and the next positive increment $\xi_{\nu+2}$.
We obtain the probability of an up-trend with length $\ell$, amplitude $a \ge 2$ and number of initial zero increments $\nu$ by considering a Markov process in the increments $\xi_t$ with transition matrix $T^{(\xi)}$:
Therefore, the probability of an up-trend with length $\ell$ for patience level $\tau = 2$ is:
It would be possible to derive the probability distributions for patience level $\tau \ge 3$ using higher-order Markov chains, but the computation becomes involved with this method, so we do not develop it here.
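For $\tau = 1$, restriction (2) together with the sign conditions on the first increment inside and beyond the trend reduces, for the random walk, to $\xi_1 = +1$, $\xi_t \in \{0, +1\}$ for $2 \le t \le \ell$, and $\xi_{\ell+1} = -1$, where $\ell$ is the trend length. The joint probability of that increment pattern is $p\,(p+r)^{\ell-1}\,q$. The sketch below checks this by exhaustive enumeration with exact fractions. It is an illustrative check, not the derivation of this appendix. The helper names, the parameter values, and the stop rule coded in `trend_length` are assumptions. The quantity is the unnormalized pattern probability (it sums to $p$ over $\ell$), so it may differ from the expression in the source by a normalization.

```python
from fractions import Fraction
from itertools import product


def trend_length(increments, tau):
    """Length of the up-trend from reference point 0, or 0 if none starts."""
    x = [0]
    for xi in increments:
        x.append(x[-1] + xi)
    if x[1] <= x[0]:
        return 0
    end = 1
    for t in range(2, len(x)):
        if x[t] <= x[0]:            # tolerance level reached
            break
        if x[t] >= x[end]:          # running maximum matched or exceeded
            end = t
        elif t - end >= tau:        # patience level reached
            break
    return end


def length_probability(length, tau, p, q):
    """Sum the probabilities of all increment sequences giving this length."""
    r = 1 - p - q
    weight = {+1: p, -1: q, 0: r}
    total = Fraction(0)
    # length + tau increments are enough to decide where the trend stops
    for seq in product((+1, 0, -1), repeat=length + tau):
        if trend_length(seq, tau) == length:
            prob = Fraction(1)
            for xi in seq:
                prob *= weight[xi]
            total += prob
    return total


p, q = Fraction(3, 10), Fraction(1, 2)   # hypothetical parameter values
r = 1 - p - q
for length in range(1, 5):
    assert length_probability(length, 1, p, q) == p * (p + r) ** (length - 1) * q
```

The same `length_probability` function can be called with `tau=2` to tabulate values numerically for short trends, which gives a way to cross-check a closed-form expression for that case.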

Trend amplitude marginal probability distribution
In order to derive the probability of an up-trend with amplitude $a$ for arbitrary patience level $\tau$, we use the combinatorial approach schematized in Fig 1-a: we represent an up-trend of amplitude $a$ by $a$ positive increments interleaved with boxes $b_h$, $1 \le h \le a$, to be filled with sequences of increments with zero sum (i.e., not contributing to the trend amplitude) that respect the restrictions due to the patience and tolerance levels, and a final box $b_{a+1}$ to be filled with a sequence of increments indicating the stop of the epsilon-tau procedure (due either to the tolerance level or to the patience level).
Fig 1-b shows the representation of a set of minimal sequences of increments with zero sum having length $j$ ($\ge 1$) and maximum depth $k$ ($\ge 0$), where the value of the initial position is repeated only at the end of the sequence; any sequence of increments respecting those limits can occupy the shaded gray area. An indefinite number of such minimal sequences can be inserted in each box $b_h$, $1 \le h \le a$, provided that $j \le \tau$ (so that the patience level is not reached) and the depth $k$ of the sequence does not reach the reference value $x_0$ of the up-trend (so that the tolerance level is not reached).
Fig 1-c represents a set of sequences indicating the stop of the epsilon-tau procedure when the tolerance level $\varepsilon$ is reached; the initial position of the sequence cannot be revisited, its length $j$ must be at most $\tau$, and its depth $k$ must be equal to the trend amplitude $a$, reaching the reference value $x_0$ of the up-trend only at the end of the sequence.
Fig 1-d represents a sequence indicating the stop of the epsilon-tau procedure when the patience level $\tau$ is reached; the initial position of the sequence cannot be revisited either, its length $j$ must be equal to $\tau$, and its maximum depth $k$ must be $a - 1$ (not reaching the tolerance level).
We compute the probabilities of each mentioned set of sequences by utilizing a Markov process in the positions $y_t$ (and not in the increments $\xi_t$, as done for the trend length). Note the different notation: $y_t$ denotes the position within the sequence of the set being studied, not the position $x_t$ in the whole up-trend. The transition matrix $T_k$ of order $k$ in this case reads:

$$
T_k = \begin{pmatrix}
P(y_t = 1 \mid y_{t-1} = 1) & P(y_t = 1 \mid y_{t-1} = 2) & P(y_t = 1 \mid y_{t-1} = 3) & \cdots & P(y_t = 1 \mid y_{t-1} = k) \\
P(y_t = 2 \mid y_{t-1} = 1) & P(y_t = 2 \mid y_{t-1} = 2) & P(y_t = 2 \mid y_{t-1} = 3) & \cdots & P(y_t = 2 \mid y_{t-1} = k) \\
P(y_t = 3 \mid y_{t-1} = 1) & P(y_t = 3 \mid y_{t-1} = 2) & P(y_t = 3 \mid y_{t-1} = 3) & \cdots & P(y_t = 3 \mid y_{t-1} = k) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
P(y_t = k \mid y_{t-1} = 1) & P(y_t = k \mid y_{t-1} = 2) & P(y_t = k \mid y_{t-1} = 3) & \cdots & P(y_t = k \mid y_{t-1} = k)
\end{pmatrix}
$$
(i) Set $z_{jk}$ of minimal sequences of length $j$ and maximum depth $k$ with zero sum:
• Case $j \ge 2$, $k = 0$: there is no minimal sequence in this set $z_{jk}$ because, for length $j \ge 2$, at least one negative and one positive increment (and thus a depth $k \ge 1$) are necessary to have a sequence with zero sum. Hence the probability of this set is zero.
• Case $j = 1$, $k \ge 0$: the only possible sequence in this set $z_{jk}$ is the one formed by a single zero increment, with initial position $y_0 = 0$ and final position $y_1 = 0$. Its probability is $r$.
• Case $j \ge 2$, $k \ge 1$: represented in Fig 1-b, sequences in this set $z_{jk}$ have a negative increment from the initial position $y_0 = 0$, a positive increment to the final position $y_j = 0$, and all intermediate positions between $y_t = 1$ (otherwise the sequence would not be minimal) and $y_t = k$ (the maximum depth). The probability of this set is given by (using results on powers of tridiagonal Toeplitz matrices, reference [28] of the main text): where $\lambda_u^{(k+1)} = r + 2\sqrt{pq}\,\cos\left(\frac{u\pi}{k+1}\right)$.
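As a numerical illustration of the case $j \ge 2$, $k \ge 1$, the sketch below builds a tridiagonal Toeplitz matrix $T_k$ for the random walk, computes the probability of $z_{jk}$ as $q\,[T_k^{\,j-2}]_{11}\,p$, and compares it with the spectral form $\frac{2}{k+1}\sum_{u=1}^{k} \sin^2\!\left(\frac{u\pi}{k+1}\right)\lambda_u^{\,j-2}$ of the $(1,1)$ entry of the matrix power, a standard identity for tridiagonal Toeplitz matrices. The layout of $T_k$ is an assumption made for this illustration: depth is taken to increase with a negative increment, so $q$ sits on the subdiagonal and $p$ on the superdiagonal. The function names and parameter values are likewise hypothetical, and the closed form in the source may be arranged differently.

```python
import math


def transition_matrix(k, p, q):
    """T[i][j] = P(depth i+1 at time t | depth j+1 at time t-1), depths 1..k."""
    r = 1.0 - p - q
    T = [[0.0] * k for _ in range(k)]
    for j in range(k):
        T[j][j] = r                 # zero increment: depth unchanged
        if j + 1 < k:
            T[j + 1][j] = q         # negative increment: one level deeper
        if j - 1 >= 0:
            T[j - 1][j] = p         # positive increment: one level up
    return T


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, n):
    size = len(A)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


def z_prob_matrix(j, k, p, q):
    """P(z_jk), j >= 2, k >= 1: down from 0, stay within depths 1..k, up to 0."""
    return q * mat_pow(transition_matrix(k, p, q), j - 2)[0][0] * p


def z_prob_spectral(j, k, p, q):
    """Same quantity from the eigenvalues r + 2*sqrt(pq)*cos(u*pi/(k+1))."""
    r = 1.0 - p - q
    total = 0.0
    for u in range(1, k + 1):
        theta = u * math.pi / (k + 1)
        lam = r + 2.0 * math.sqrt(p * q) * math.cos(theta)
        total += math.sin(theta) ** 2 * lam ** (j - 2)
    return q * p * 2.0 / (k + 1) * total


p, q = 0.3, 0.5   # hypothetical parameter values
for j, k in [(2, 1), (4, 2), (6, 3)]:
    assert math.isclose(z_prob_matrix(j, k, p, q), z_prob_spectral(j, k, p, q))
```

For example, with $j = 4$ and $k = 2$ both routes give $qp\,(r^2 + pq)$: the two intermediate increments are either both zero or a further step down followed by a step up.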
(ii) Set $s^{(\varepsilon)}_{jk}$ of sequences of length $j$ and depth $k$ indicating the stop of the procedure due to the tolerance level $\varepsilon$:
• Case $j \ge 1$, $k = 0$: there is no sequence in this set $s^{(\varepsilon)}_{jk}$ because sequences indicating the stop of the epsilon-tau procedure start with a negative increment and, thus, $k \ge 1$. Hence the probability of this set is zero.
• Case $j = 1$, $k \ge 2$: a sequence with depth $k \ge 2$ must have length $j \ge 2$. Hence the probability of this set is zero.
• Case $j \ge 2$, $k = 1$: because sequences in $s^{(\varepsilon)}_{jk}$ start and end with a negative increment, there is no sequence in this set. Its probability is zero.
• Case $j = 1$, $k = 1$: the first increment of a sequence indicating the stop of the epsilon-tau procedure must be negative, which already satisfies the conditions of this set $s^{(\varepsilon)}_{jk}$. Thus its probability is $q$.
• Case $j \ge 2$, $k \ge 2$: represented in Fig 1-c, sequences in this set $s^{(\varepsilon)}_{jk}$ start with a negative increment from the initial position $y_0 = 0$ and end with a negative increment to the final position $y_j = k$; all intermediate positions must lie between $y_t = 1$ (because the initial position cannot be revisited) and $y_t = k - 1$ (because depth $k$ is only reached in the final position). Then, the probability of this set is:
(iii) Set $s^{(\tau)}_{jk}$ of sequences of length $j$ and maximum depth $k$ indicating the stop of the procedure due to the patience level $\tau$:
• Case $j \ge 1$, $k = 0$: there is no sequence in this set $s^{(\tau)}_{jk}$ since sequences indicating the stop of the epsilon-tau procedure start with a negative increment ($k \ge 1$). Hence the probability of this set is zero.
• Case $j \ge 1$, $k \ge 1$: represented in Fig 1-d, sequences in this set $s^{(\tau)}_{jk}$ start with a negative increment from the initial position $y_0 = 0$, and all other positions must lie between $y_t = 1$ (because the initial position cannot be revisited) and $y_t = k$ (the maximum depth). Then, the probability of this set is:
We can now write an expression for the probability of an up-trend with amplitude $a$ for arbitrary patience level $\tau$.
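Before the pieces are combined, the two stop sets can be checked numerically with the same position-based Markov process. The sketch below assumes a tridiagonal form of $T_k$ for the random walk (probability $q$ of moving one level deeper, $p$ of moving one level up, $r$ of staying) and reads the constraints of the general cases of $s^{(\varepsilon)}_{jk}$ and $s^{(\tau)}_{jk}$ literally. The resulting expressions $q\,[T_{k-1}^{\,j-2}]_{k-1,1}\,q$ and $q \sum_i [T_k^{\,j-1}]_{i,1}$ belong to this illustration and are compared against brute-force enumeration rather than taken from the source. All function names and parameter values are hypothetical.

```python
import math
from itertools import product


def transition_matrix(k, p, q):
    """T[i][j] = P(depth i+1 at time t | depth j+1 at time t-1), depths 1..k."""
    r = 1.0 - p - q
    T = [[0.0] * k for _ in range(k)]
    for j in range(k):
        T[j][j] = r
        if j + 1 < k:
            T[j + 1][j] = q         # negative increment: one level deeper
        if j - 1 >= 0:
            T[j - 1][j] = p         # positive increment: one level up
    return T


def mat_pow(A, n):
    size = len(A)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = [[sum(result[i][m] * A[m][j] for m in range(size))
                   for j in range(size)] for i in range(size)]
    return result


def stop_tolerance(j, k, p, q):
    """j >= 2, k >= 2: depths 1..k-1 in between, last step from k-1 to k."""
    power = mat_pow(transition_matrix(k - 1, p, q), j - 2)
    return q * power[k - 2][0] * q


def stop_patience(j, k, p, q):
    """j >= 1, k >= 1: first step down, then j-1 steps within depths 1..k."""
    power = mat_pow(transition_matrix(k, p, q), j - 1)
    return q * sum(power[i][0] for i in range(k))


def brute_force(j, k, p, q, kind):
    """Enumerate all increment sequences of length j and sum those in the set."""
    weight = {+1: p, 0: 1.0 - p - q, -1: q}
    total = 0.0
    for seq in product((+1, 0, -1), repeat=j):
        depth, path = 0, []
        for xi in seq:
            depth -= xi             # a negative increment increases the depth
            path.append(depth)
        if kind == "tolerance":
            ok = path[-1] == k and all(1 <= d <= k - 1 for d in path[:-1])
        else:
            ok = all(1 <= d <= k for d in path)
        if ok:
            total += math.prod(weight[xi] for xi in seq)
    return total


p, q = 0.3, 0.5   # hypothetical parameter values
for j, k in [(2, 2), (4, 2), (5, 3)]:
    assert math.isclose(stop_tolerance(j, k, p, q),
                        brute_force(j, k, p, q, "tolerance"))
for j, k in [(1, 1), (3, 2), (4, 3)]:
    assert math.isclose(stop_patience(j, k, p, q),
                        brute_force(j, k, p, q, "patience"))
```

The enumeration grows as $3^j$, so it is only meant as a cross-check for small $j$; the matrix route scales to the values of $\tau$ and $a$ of practical interest.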
In each box $b_h$, $1 \le h \le a$, we can insert any number of sequences from the set union $\bigcup_{j=1}^{\tau} z_{j(h-1)}$, and in box $b_{a+1}$ we place a single sequence from the set union