Modeling aggressive market order placements with Hawkes factor models

Price changes in stock markets are induced by aggressive market orders. We introduce a bivariate marked Hawkes process to model aggressive market order arrivals at the microstructural level. The order arrival intensity consists of an exogenous part and two endogenous processes reflecting self-excitation and cross-excitation, respectively. We calibrate the model for a Shenzhen Stock Exchange stock. We find that the exponential kernel with a smooth cut-off (i.e. the difference of two exponentials) produces a much better calibration than the monotonic exponential kernel (i.e. the sum of two exponentials). The exogenous baseline intensity explains the U-shaped intraday pattern. Our empirical results show that the endogenous submission clustering is mainly caused by self-excitation rather than cross-excitation.


Introduction
Self-exciting and mutually exciting point processes, first proposed by Alan G. Hawkes [1,2], are a natural extension of Poisson processes. Hawkes processes have been applied to characterize clustering events in finance, particularly in high-frequency data and market microstructure [3,4], because many types of events cluster in time, such as order submissions [5], mid-quote changes [6], transactions [7] and occurrences of extreme returns [8].
As a class of branching processes, self-exciting Hawkes models can be used to compute the so-called branching ratio, which is defined as the average number of triggered events of the first generation per source [9][10][11]. By calibrating the self-exciting Hawkes model on time series of price changes, endogeneity and structural regime shifts have been quantified in commodity markets [12]. In other words, the branching ratio can serve as an effective measure of endogeneity for the autoregressive conditional duration point processes [13]. In addition, a marked self-exciting process model can successfully characterize intraday clustering of extreme fluctuations and the instantaneous conditional VaR [14]. Hawkes models have been further extended to a quadratic form by allowing feedback effects in the jump intensity that are linear and quadratic in past returns [15].
Beyond self-exciting processes, a growing number of researchers study cross-exciting effects through multivariate Hawkes processes. Multivariate Hawkes processes have been applied to the timing of trades and mid-quote changes for a New York Stock Exchange stock [16], to study complex interactions between the time of arrival of orders and their sizes [17], to fit the observations of trades-through [18], to measure the resilience of the London Stock Exchange order book [5], to account for the dynamics of market prices [19][20][21], to model price changes by a self-exciting mechanism and an exogenous component generated by the pre-announced arrival of macroeconomic news [22], and to model financial contagion across six international stock indices [23]. Multivariate Hawkes processes are prevalent in modeling the high-frequency order book and have been extended to non-linear Hawkes processes [24] and to a new nonparametric kernel estimation procedure [25].
In this paper, we are interested in modeling aggressive market order placement, i.e. orders whose size is greater than the size available at the opposite best quote. These orders consume liquidity and walk up the limit order book, causing the best quotes to change. Aggressive market orders are very important in price formation and microstructure. For example, the submission pattern of aggressive market orders may contain information about order splitting behavior according to the liquidity available in the order book. We introduce a bivariate marked Hawkes process to model aggressive market order arrivals. It is reasonable to apply a Hawkes process to model order events because the inter-trade durations have fat tails and long memory [26][27][28]. The Autoregressive Conditional Duration (ACD) model, introduced in [29], can also characterize market order arrivals via the durations between events. However, the ACD model is formulated indirectly in terms of durations and does not give the order arrival intensity directly, as the Hawkes model does. We find that the exponential kernel with a smooth cut-off at short times (i.e. the difference of two exponentials) produces a better calibration than the monotonic exponential kernel (i.e. the sum of two exponentials). Our empirical results show that the endogenous submission clustering is mainly caused by self-excitation rather than cross-excitation. In addition, the exogenous aggressive order arrivals show a clear intraday pattern.

The model
Let $N_1$ and $N_2$ denote the counting processes for aggressive market buy orders and aggressive market sell orders, respectively. These two processes are assumed to form a bivariate Hawkes process with intensities $\lambda_1$ and $\lambda_2$, where $\mu_i > 0$ is a baseline intensity describing the arrival of exogenous events, and the kernels $\phi_{ii}$ and $\phi_{ij}$ represent the self-exciting and cross-exciting effects, respectively. The kernel $\phi(t - s)$ describes the impact of a previous event at time $s$ on the current intensity at time $t$. Previous studies advocate the use of exponential or power-law kernels. Here we use the difference of two exponentials as the kernel to account for the self- or cross-excitations, as given in Eq (2), where $v_j$ is the share volume of the order event. The negative exponential term provides a smooth cut-off at short times. This function has several advantages. First, it satisfies $\phi_{ij}(0) = 0$, since we cannot expect market participants to react instantaneously to events. Second, it allows the excitation to increase smoothly to its maximum and then fade gradually over time (see Fig 1), which characterizes the reaction of market participants more reasonably. A few studies also suggest similar kernel functions [9,11]. Third, compared with power-law kernels, the use of exponential kernels reduces the computational complexity from $O(N^2)$ to $O(N)$. In our empirical analysis, we will compare the kernel in Eq (2) with the sum-of-exponentials kernel in Eq (3). In this case, we also set $\phi_{ij}(0) = 0$, as in the literature.
Stationarity condition. A multivariate point process is stationary if the joint distribution of any number of types of events on any number of given intervals is invariant under translation. According to Theorem 7 in [30], the stationarity condition of a multivariate point process is that the matrix $Q$ with entries $q_{ij} = \int_0^{+\infty} |\phi_{ij}(u)|\,du$ has a spectral radius strictly less than 1.
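To illustrate the smooth cut-off shape and the integral that enters $q_{ij}$, the following Python sketch evaluates a difference-of-exponentials kernel, the lag at which it peaks, and its integral over $[0, +\infty)$. It is a minimal sketch under stated assumptions and not the paper's Eq (2): the amplitude `c` is a hypothetical scale parameter, the volume mark $v_j$ is omitted because its functional form is not reproduced in this text, and $\beta > \alpha > 0$ is assumed so that the kernel is non-negative. The numerical values are the $\alpha_{22}$ and $\beta_{22}$ estimates quoted in the Results section, used here only as an example.

```python
import math

def diff_exp_kernel(u, alpha, beta, c=1.0):
    """Smooth cut-off kernel c * (exp(-alpha*u) - exp(-beta*u)) for u >= 0.

    c is a hypothetical amplitude; beta > alpha > 0 is assumed.
    """
    if u < 0:
        return 0.0  # causal: an event cannot act before it occurs
    return c * (math.exp(-alpha * u) - math.exp(-beta * u))

def peak_time(alpha, beta):
    """Lag at which the kernel is largest, obtained from d(phi)/du = 0."""
    return math.log(beta / alpha) / (beta - alpha)

def kernel_integral(alpha, beta, c=1.0):
    """Integral of the kernel over [0, inf): c * (1/alpha - 1/beta)."""
    return c * (1.0 / alpha - 1.0 / beta)

alpha, beta = 0.0089, 0.1733  # example values from the Results section
u_star = peak_time(alpha, beta)
print(diff_exp_kernel(0.0, alpha, beta))            # kernel vanishes at lag zero
print(u_star, diff_exp_kernel(u_star, alpha, beta)) # location and height of the peak
print(kernel_integral(alpha, beta))                 # L1 norm of the unit-amplitude kernel
```

Because the kernel is non-negative under the assumption $\beta > \alpha$, its integral equals the integral of its absolute value, so `kernel_integral` is the type of quantity that would fill an entry of $Q$ once the amplitude and mark are specified.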
For our bivariate Hawkes model, the matrix $Q$ is obtained by integrating the kernel in Eq (2) and, in the same way, the kernel in Eq (3). We recall that the spectral radius of the matrix $Q$ is defined as $\rho(Q) = \max_{a \in \ell(Q)} |a|$, where $\ell(Q)$ denotes the set of all eigenvalues of $Q$.
Parameter estimation. For each type of order, $\mu_i$ is taken as a seasonal piecewise linear spline with 4 knots at 9:30, 10:00, 10:30 and 11:30 for the morning session, and at 13:00, 14:00, 14:30 and 15:00 for the afternoon session. Therefore, the intensity $\lambda_i$ depends on the parameter set $\theta_i$. In other words, there are 12 parameters to be estimated for each type of order and for each time interval. Suppose that the data are observed over the interval $[0, T]$; then the maximum likelihood estimate of $\theta_i$ is obtained by maximizing the log-likelihood function of the event times, where $\{t_{n,i}\}$ is the sequence of the times of the events of type $i$ (see Theorem 3.1 in [16]).
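The estimation step can be sketched in Python as follows. This is a simplified illustration rather than the paper's likelihood: it uses the generic point-process log-likelihood (sum of log-intensities at the events of type $i$ minus the integrated intensity), a constant baseline `mu` instead of the piecewise linear spline, and unmarked kernels `c[j] * (exp(-alpha[j]*u) - exp(-beta[j]*u))` with a hypothetical amplitude `c[j]`. The function name and argument layout are assumptions made for this example. The recursive update of the sums `A` and `B` is what gives the $O(N)$ cost of exponential kernels.

```python
import math

def loglik_component(i, events, T, mu, c, alpha, beta):
    """Log-likelihood of component i of a bivariate Hawkes process on [0, T].

    events: list of (time, type) pairs sorted by time, with type in {0, 1}.
    mu: constant baseline of component i (simplification of the spline).
    c, alpha, beta: length-2 lists; entry j parameterises the kernel phi_ij,
        assumed to be c[j] * (exp(-alpha[j]*u) - exp(-beta[j]*u)).
    """
    A = [0.0, 0.0]  # running sums of exp(-alpha[j] * (t - t_k)) over past type-j events
    B = [0.0, 0.0]  # running sums of exp(-beta[j] * (t - t_k)) over past type-j events
    last_t = 0.0
    sum_log = 0.0
    compensator = mu * T  # integral of the baseline over [0, T]
    for t, j in events:
        dt = t - last_t
        for k in range(2):
            # decay the running sums to the current time (O(1) per event)
            A[k] *= math.exp(-alpha[k] * dt)
            B[k] *= math.exp(-beta[k] * dt)
        if j == i:
            lam = mu + sum(c[k] * (A[k] - B[k]) for k in range(2))
            sum_log += math.log(lam)
        # the new event starts exciting component i from now on
        A[j] += 1.0
        B[j] += 1.0
        # closed-form integral of its kernel contribution over (t, T]
        compensator += c[j] * ((1.0 - math.exp(-alpha[j] * (T - t))) / alpha[j]
                               - (1.0 - math.exp(-beta[j] * (T - t))) / beta[j])
        last_t = t
    return sum_log - compensator

# Hypothetical toy data: one sell order (type 1) followed by one buy order (type 0).
toy_events = [(1.0, 1), (2.0, 0)]
print(loglik_component(0, toy_events, T=3.0, mu=0.5,
                       c=[0.0, 0.8], alpha=[0.1, 0.1], beta=[1.0, 1.0]))
```

In practice this function would be negated and passed to a numerical optimizer, once for each order type and each session.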
Mis-specification testing. The quality of the fits is then assessed on the time-deformed series of durations $\{\tau_{n,i}\}$, defined by $\tau_{n,i} = \int_{t_{n-1,i}}^{t_{n,i}} \hat{\lambda}_i(t)\,dt$, where $\hat{\lambda}_i$ is the estimated intensity and $\{t_{n,i}\}$ are the empirical time stamps. If a Hawkes process describes the data correctly, the values of $\tau_{n,i}$ must be independent and exponentially distributed with rate 1. This can be verified visually in QQ-plots and rigorously with the Kolmogorov-Smirnov test [16].
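The time-deformation check can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `compensator` stands for any function returning the integrated estimated intensity up to time $t$ (here a hypothetical constant-intensity example), and the Kolmogorov-Smirnov distance to the unit-rate exponential law is computed by hand rather than with a statistics library.

```python
import math

def rescaled_durations(event_times, compensator):
    """tau_n = Lambda(t_n) - Lambda(t_{n-1}), with Lambda the integrated intensity."""
    values = [compensator(t) for t in event_times]
    return [b - a for a, b in zip(values[:-1], values[1:])]

def ks_statistic_exp1(sample):
    """Kolmogorov-Smirnov distance between a sample and the Exp(1) distribution."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-x)  # Exp(1) distribution function
        # compare with the empirical CDF just after and just before x
        d = max(d, k / n - cdf, cdf - (k - 1) / n)
    return d

# Hypothetical example: constant intensity of 2 events per second.
times = [0.5, 1.0, 2.0, 2.2, 3.1]
taus = rescaled_durations(times, lambda t: 2.0 * t)
print(taus)
print(ks_statistic_exp1(taus))
```

For large samples, the statistic is commonly compared with an asymptotic 5% critical value of roughly $1.36/\sqrt{n}$, which is only a rule of thumb here because the intensity parameters are themselves estimated.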

Data
We use order flow data of the stock China Vanke (000002.SZ) traded on the Shenzhen Stock Exchange from April 10th, 2003 to May 20th, 2003. China Vanke is a highly liquid stock. We choose these 21 trading days because of the high activity of order events around the annual financial report announcement. In this paper we consider the order flow occurring in the continuous double auction period (9:30 AM to 11:30 AM and 1:00 PM to 3:00 PM). Note that there is a lunch effect, that is, the morning orders have almost no impact on the submission of afternoon orders after the 1.5-hour lunch break. Therefore, we estimate our model separately for the morning and afternoon sessions. A problem arises regarding the granularity of the data. Because time stamps are rounded to the nearest 10 milliseconds, the data set contains multiple events with the same time stamp. In our sample, only 0.61% of events have the same time stamp as some other event. By comparison, in the data in [5], which are rounded to the nearest second, 40% of events share a time stamp. Because the probability of multiple events sharing a time stamp is negligible in our sample, we leave the data unadjusted and assume that each event occurring within a 10-millisecond interval is independent of any others within the same interval.
Order size $v_i$ enters our model as a mark. We collect all the aggressive orders whose sizes are greater than the size available at the opposite best quote and then calculate their median values: 3000 shares (30 lots) for aggressive market buy orders and 3200 shares (32 lots) for aggressive market sell orders. We also calculate the proportions of order penetration, which is the number of price levels on the opposite order book that the order consumes. The results are presented in Table 1. We can see that most aggressive market orders consume only the orders at the first price level. Fig 2(A) shows the empirical distributions of the inter-arrival durations of aggressive market buy and sell orders. In general, the distribution looks more like a power law than an exponential, which indicates that the order events do not follow a Poisson process. The autocorrelations of inter-arrival durations of order events presented in Fig 2(B) also confirm this point. Both market buy orders and market sell orders have an autocorrelation lasting more than 40 seconds. Therefore, it is reasonable to apply a Hawkes process to model the order events.
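The share of events that have the same time stamp as some other event can be computed with a few lines of Python. This is a minimal sketch, assuming time stamps are stored as integers in units of 10 milliseconds; the function name and the sample values are hypothetical.

```python
from collections import Counter

def tied_share(timestamps):
    """Fraction of events whose time stamp is shared with at least one other event."""
    counts = Counter(timestamps)
    tied = sum(n for n in counts.values() if n > 1)
    return tied / len(timestamps)

# Hypothetical time stamps in units of 10 ms: two of the four events coincide.
print(tied_share([100, 101, 101, 102]))
```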
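The duration autocorrelation used as evidence against a Poisson process can be sketched in Python as follows. This is a minimal illustration with the standard sample autocorrelation estimator; the helper names and the toy time stamps are hypothetical, and the lag here counts successive durations rather than seconds.

```python
def durations(timestamps):
    """Inter-arrival durations between consecutive events."""
    return [b - a for a, b in zip(timestamps[:-1], timestamps[1:])]

def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

# Hypothetical arrival times (seconds): short and long gaps alternate.
arrivals = [0.0, 1.0, 3.0, 4.0, 6.0, 7.0, 9.0, 10.0, 12.0]
gaps = durations(arrivals)
print(gaps)
print(autocorrelation(gaps, 1), autocorrelation(gaps, 2))
```

For a homogeneous Poisson process the durations are independent, so their sample autocorrelations should be close to zero at every positive lag.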

Results
In Fig 3, we present a sample of the estimated intensity path of aggressive market buy orders in the morning of April 10th, 2003. The kernel function used here is the smooth cut-off biexponential function given in Eq (2). We rescale the instantaneous intensity to a per-minute basis. To assess the goodness of fit, we also plot the realized intensity, which is the number of aggressive market buy orders in every minute. The figure shows that our model describes the intensity dynamics quite well.
The QQ plots of the time-deformed durations defined in Eq (8) on April 10th, 2003 are presented in Fig 4. We carry out the tests on both types of market orders in both the morning and afternoon sessions. All the points collapse onto the corresponding diagonals, indicating that the data are exponentially distributed. Therefore, all four fits are rather good. When the sum-of-exponentials kernel in Eq (3) is used, we find that, except for the case of market sell orders in the morning, the time-deformed durations are obviously not consistent with the exponential distribution. The case of market sell orders in the morning shows a good fit because the estimated second term in the kernel is very small. More specifically, we obtain $\alpha_{22} = 0.0089$ and $\beta_{22} = 0.1733$. If 20 seconds have passed since the last market sell order arrival ($u = 20$), the first term is $e^{-\alpha_{22} u} = 0.8369$ and the second term is $e^{-\beta_{22} u} = 0.0312$. Therefore, the second term contributes little to the self-excitation process, and both the sum-of-exponentials kernel and the difference-of-exponentials kernel provide a high goodness-of-fit. However, this is not the usual case. For the usual cases like the other three plots in Fig 4, the second term in the kernel function is essential and the kernel in Eq (2) gives a much better goodness-of-fit than the kernel in Eq (3). Hence, we will only consider the smooth cut-off kernel in the following analyses.
Now we use the Kolmogorov-Smirnov test to analyze the goodness of fit for all sample days. With rare exceptions, almost all of the samples pass the Kolmogorov-Smirnov test by a large margin. This further confirms that our bivariate Hawkes model with smooth cut-off kernels fits the market order events correctly.
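The arithmetic for the morning sell-order case quoted above can be checked directly. The short Python snippet below evaluates the two exponential terms at $u = 20$ seconds with the reported estimates; the variable names are chosen for this example, and the amplitude and volume mark of the kernel are left out, so only the relative size of the two terms is illustrated.

```python
import math

alpha_22, beta_22 = 0.0089, 0.1733  # estimates reported in the text
u = 20.0                            # seconds since the last market sell order

first = math.exp(-alpha_22 * u)     # slowly decaying term
second = math.exp(-beta_22 * u)     # fast-decaying term

print(round(first, 4))   # 0.8369
print(round(second, 4))  # 0.0312
print(second / first)    # relative size of the second term
```

At this lag the second term is only a few percent of the first, so adding it or subtracting it changes the kernel value very little, which is why both kernels fit this particular case well.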
Then we examine whether the estimated parameters result in a stationary bivariate Hawkes process. We fix the share volume at $v = 3000$ (30 lots). Fig 6 presents the spectral radii for the 42 estimated bivariate marked Hawkes processes, comprising 21 morning sessions and 21 afternoon sessions during the 21 sample days. All 42 spectral radii are strictly less than 1, and thus all 42 bivariate Hawkes processes are stationary.
We recall that the baseline intensity $\mu(t)$ describes the arrival of exogenous events. In the left panel of Fig 7, we first count the average number of market orders in every minute. The average number of orders displays the well-known U-shaped intraday pattern of order placement. Then, we plot the baseline intensity $\mu(t)$ in the right panel of Fig 7. The estimated exogenous part $\mu(t)$ clearly exhibits an intraday pattern. In addition, it is reasonable that the exogenous intensity is lower than the total intensity. The endogenous intensity depends on the self- and cross-exciting kernel functions. In Fig 8, we show the four estimated kernel functions $\phi_{ij}(u)$ with fixed share volume $v = 3000$ for aggressive market buy orders and aggressive market sell orders. We find that these kernel functions have a similar pattern but remarkably different scales. The kernel functions representing the self-exciting impact have higher values than those representing the cross-exciting impact, especially for market buy orders. This indicates that self-excitation plays a major role in the endogenous part of aggressive market order placement.
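The stationarity check amounts to computing the spectral radius of a $2 \times 2$ matrix, which can be done in closed form. The Python sketch below uses the trace and determinant to obtain both eigenvalues; the function name and the entries of the example matrix are hypothetical and are not the estimated values behind Fig 6.

```python
import cmath

def spectral_radius_2x2(q):
    """Largest eigenvalue modulus of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = q
    trace = a + d
    det = a * d - b * c
    # eigenvalues solve x^2 - trace*x + det = 0; cmath handles complex roots
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + disc) / 2.0), abs((trace - disc) / 2.0))

# Hypothetical kernel integrals: strong self-excitation, weak cross-excitation.
Q = [[0.5, 0.1],
     [0.1, 0.5]]
rho = spectral_radius_2x2(Q)
print(rho, rho < 1.0)  # the process is stationary when rho < 1
```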

Conclusion
In this work, a bivariate marked Hawkes model is proposed to characterize aggressive market order arrivals. The order arrival intensity consists of an exogenous part and two endogenous processes reflecting self-excitation and cross-excitation, respectively. The kernel function is crucial to characterizing the endogenous self-excitation and cross-excitation. We propose and compare two types of kernel function. One is a smooth cut-off exponential function (i.e. the difference of two exponentials), and the other is a monotonic exponential kernel (i.e. the sum of two exponentials). We calibrate the bivariate Hawkes models with different kernel functions using order flow data of a stock traded on the Shenzhen Stock Exchange. The bivariate Hawkes model is well estimated when the kernel is a smooth cut-off exponential function, and the estimated parameters satisfy the stationarity condition. The exogenous baseline intensity explains the U-shaped intraday pattern. We confirm that the endogenous part of the order arrival intensity is mainly attributable to the self-exciting process, while the cross-exciting influence is weak, especially for aggressive market buy orders. With our model, high-frequency traders can better understand and predict market order arrivals, and then design their own order submission strategies to make a profit. In addition, quantifying the endogenous contribution to the order arrival intensity will help to predict some extreme events, such as flash crashes.