AQM based on the queue length: A real-network study

Active Queue Management (AQM) is recommended by the Internet Engineering Task Force to mitigate the bufferbloat phenomenon in the Internet. In this paper, we show the results of comprehensive measurements carried out in our university network, in which a device with an AQM algorithm, designed and programmed for this purpose, was running. The implemented AQM algorithm was based on the dropping function, i.e. arriving packets were dropped randomly, with the probability being a function of the queue length. Several dropping function forms proposed in the networking literature were used, in addition to the classic FIFO queue (no AQM). The experiment lasted over a month, during which the state of the network was measured and recorded several thousand times. This made the results independent of the natural fluctuations of the users' behavior and the network load. Conclusions were reached on the general performance improvement offered by the implemented AQM, as well as on the differences in performance between particular forms of the dropping function. Some of these conclusions differ from those drawn previously from simulations. This underlines the need for carrying out measurements of new AQMs in real, operating networks, with complex, natural traffic.


Introduction
In the early Internet, the buffers at network devices were meant to mitigate the losses caused by the statistical multiplexing of flows incoming from different directions. Namely, the random nature of incoming flows caused occasional bursts of traffic at nodes. These bursts had to be stored in buffers before being forwarded. This was initially the only purpose of buffers. At this early stage, congestion control in the Internet was poor, which led to a few spectacular congestion collapses and forced the search for a new, efficient method of congestion control. Such a method, the Reno algorithm, was found and widely implemented, which eliminated congestion collapses and improved inter-flow fairness.
Unfortunately, the new algorithm, and its descendants used today, exploit the buffers in a different way. They constantly increase the sending rates of individual end systems, until one or more buffers on the transmission path fill up, causing long queueing delays, packet loss, or both. In this way, by noticing the long delays and/or losses, modern congestion control mechanisms learn that the maximum possible sending rate has been reached. This mode of operation keeps buffers persistently full, producing the bufferbloat phenomenon that AQM algorithms, which drop packets before the buffer overflows, are meant to counteract. So far, AQM algorithms based on the dropping function have been widely studied via mathematical modeling and computer simulation. We are, however, unaware of any published results of tests of these algorithms implemented in a real device and carried out in a real, fully operational network.
In this paper, we show results of measurements of the fully operational network of our university campus, in which a device with AQM based on the dropping function, designed and programmed by us, was running. Nine different dropping function forms, proposed in the literature, were tested and compared with each other, and with the classic FIFO tail-drop queue (no AQM). The experiment lasted over a month, during which the state of the network was measured and recorded several thousand times, with random changes of the dropping function. This made the results independent of the natural fluctuations of the users' behavior and the network load. Each measurement lasted until 2 million packets had passed through the AQM device. During each such period, several performance characteristics were recorded, including the network load, the average queue size and its standard deviation, the packet loss ratio and the packet burst ratio. These detailed characteristics were then aggregated using cost functions (6) and (7), defined in the next section, which enabled us to assess the overall impact of particular dropping functions, and of the lack of AQM.
The device with AQM used in the measurements was an x86, multi-core server, running the Linux operating system, with the dropping function mechanism implemented. DPDK cards and software [25] were used to provide fast packet processing. They made it possible to process and forward the arriving traffic at the full speed of the output interface, i.e. 1Gb/s. Prior to the measurements carried out in the real network, our AQM device had been tested in a networking laboratory, using artificial traffic from a Spirent generator. The results of these tests were described in [26]. It is important to note, however, that tests in a local lab cannot properly expose the advantages or disadvantages of an AQM scheme, for several reasons. The most important reason is that the round trip propagation times (RTPTs) are unrealistically small in a local lab (below 1ms). In the measurements described in this paper, many flows traversing the AQM device experienced much longer RTPTs, of tens or hundreds of milliseconds. Such RTPTs are hard to mimic in a local lab. At the same time, it is well known that high RTPTs have a profound impact on the behavior of the TCP congestion control mechanism and, as a consequence, on the AQM algorithm (e.g. [27]). Therefore, results of AQM tests obtained in a local lab may differ from those obtained in a real, working network.
Finally, it is worth mentioning that queueing models with the dropping function have recently been studied using the mathematical methods of classic queueing theory, under different assumptions on the arrival stream and the service time distribution, e.g. [24, 28-30]. Some of these theoretical results were used to parameterize the dropping functions implemented in the AQM device so as to meet a specific performance goal, i.e. the average queue size of 50 packets at the network load of 1.
The rest of the paper is organized in the following way. In Section 2, the mathematical results used to parameterize the dropping functions in the AQM device are recalled first. Then, nine formulas for dropping functions implemented in the AQM device are given, as well as the definitions of the performance characteristics collected in the experiment. The academic network in which the measurements were carried out is described in Section 3. In Section 4, the design of the device with the AQM algorithm based on the dropping function is presented. Namely, the internal structure exploiting the multi-core architecture, the actual packet flow and processing, as well as the advantages of using the DPDK framework, are described. Then, in Section 5, the results of the measurements are presented and discussed. In particular, five graphs depicting the mean queue length, its standard deviation, the loss ratio, the burst ratio and the total impairment factor, for different loads and dropping functions, are presented and discussed. In addition, a table with all the performance characteristics, averaged over short load intervals, is shown and discussed. The final conclusions and recommendations are given in Section 6.

Theoretical results, dropping functions and characteristics used in the experiment
In what follows, the dropping function will be denoted by d(n), where n is the queue length upon a packet arrival, and d(n) is the probability of dropping this packet. Naturally, for every n we have 0 ≤ d(n) ≤ 1.
The particular shapes of the dropping function used in the measurements were taken from the literature (references are given next to the formulas). The specific parameterization of each of them was carried out using a model of the queue with the dropping function taken from [24]. Namely, the following theorem was proven there.

Theorem 1. If the interarrival time distribution is exponential with parameter λ and the service time distribution is exponential with parameter μ, then the steady-state probability P_k that the queue size is k, the average queue size EX, its standard deviation

DX = sqrt( Σ_{k=0}^∞ k² P_k − (EX)² ),

the loss ratio L, and the average response time of an accepted packet can all be expressed in closed form (see [24]), where ρ = λ/μ is the load offered to the queue.

Using this theorem, nine dropping function types, d0-d8, were parameterized in such a way that each of them provides EX = 50, i.e. the average queue size of 50 packets, for ρ = 1 (see [26]). The resulting dropping functions (all depicted in Fig 1) include, among others:
• the small-buffer shape, see e.g. [31]: d0(n) = 0 for n < 108 and d0(n) = 0.5 for n ≥ 108;
• the constant shape, d8, in which every arriving packet is dropped with the same fixed probability, regardless of the queue length.
Besides the nine functions listed above, the classic FIFO algorithm (no dropping function) was used. In that case, the default buffer size was 300 packets.
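Under the assumptions of Theorem 1, the queue behaves as a birth-death chain in which an arrival that finds n packets in the queue is admitted with probability 1 − d(n). The steady-state characteristics can therefore be computed numerically, as in the following sketch (a plain Python illustration written for this text, not the authors' code; the truncation level N and the function names are our own):

```python
def queue_stats(d, rho, N=1000):
    """Steady-state characteristics of a Poisson-arrival, exponential-service
    queue with dropping function d(n) (birth-death chain truncated at N states).

    Balance equations: P[k+1] = P[k] * rho * (1 - d(k)).
    """
    p = [1.0]
    for k in range(N - 1):
        p.append(p[-1] * rho * (1.0 - d(k)))
    total = sum(p)
    p = [x / total for x in p]                       # normalize
    ex = sum(k * pk for k, pk in enumerate(p))       # average queue size EX
    dx = (sum(k * k * pk for k, pk in enumerate(p)) - ex * ex) ** 0.5  # DX
    loss = sum(pk * d(k) for k, pk in enumerate(p))  # loss ratio L (by PASTA)
    return ex, dx, loss

# Example: the small-buffer shape d0 at the design load rho = 1
small_buffer = lambda n: 0.5 if n >= 108 else 0.0
ex, dx, loss = queue_stats(small_buffer, rho=1.0)
```

For d(n) = 0 and ρ < 1 the sketch reduces to the classic M/M/1 queue, which gives a quick sanity check: EX = ρ/(1 − ρ).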
Therefore, we used 10 queueing schemes in total: nine with dropping functions and one without. The AQM device switched randomly between the ten, so that there was no correlation between the particular day, or time of day, and the queueing scheme used. The switching scheme will be described in detail in Section 5.
Each measurement lasted until 2 million packets had passed through the AQM device. During each such period, several performance characteristics were recorded, in particular: the load of the AQM device, ρ, the average size of an arriving packet and its standard deviation, the average queue length in packets (EX) and its standard deviation (DX), the loss ratio (L), and the burst ratio (B).
All these characteristics are rather well known and commonly used. Nevertheless, to avoid any misunderstanding, we clarify now how we understood and measured them. The load was understood as the average arrival rate of packets bound to the output link, divided by the capacity of this link, i.e. 1Gb/s. The loss ratio was measured as the fraction of packets dropped during the measurement, either by the dropping function mechanism or due to buffer overflow. Finally, the burst ratio was understood as the ratio of the average number of packets dropped in a row to the theoretical average number of packets dropped in a row that would be observed for purely random and independent losses [32]. By this definition, B informs us about the statistical structure of packet losses, i.e. their tendency to cluster together. If B = 1, the losses seem to occur randomly and independently of each other. If B < 1, the losses tend to occur separately, as single events. If B > 1, the losses tend to group together, in long series. The latter is known to negatively influence important networking services, e.g. real-time multimedia transmissions. (It is also worth mentioning that B is not the only parameter used to characterize the structure of losses; see e.g. [33, 34] for other approaches.)
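The burst ratio definition above can be made concrete with a short sketch (illustrative Python, not the measurement code): for independent losses with loss ratio L, the mean length of a loss series is 1/(1 − L), so B is the observed mean series length multiplied by (1 − L).

```python
def loss_and_burst_ratio(drops):
    """Compute the loss ratio L and burst ratio B from a drop record.

    drops: sequence of booleans, True meaning the packet was dropped.
    """
    n = len(drops)
    lost = sum(drops)
    L = lost / n  # loss ratio: fraction of dropped packets

    # collect lengths of consecutive-drop runs (loss series)
    runs, cur = [], 0
    for d in drops:
        if d:
            cur += 1
        elif cur:
            runs.append(cur)
            cur = 0
    if cur:
        runs.append(cur)

    mean_run = sum(runs) / len(runs)
    # for independent (Bernoulli) losses, the mean run length is 1/(1 - L)
    B = mean_run * (1 - L)
    return L, B
```

Alternating drops give isolated losses (B < 1), while one long block of drops gives strongly clustered losses (B > 1), matching the interpretation in the text.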
Because high values of both the loss ratio and the burst ratio have a negative (but different) influence on the quality of transmission, it is useful to estimate their combined negative effect using a single factor. Such an impairment factor, combining L and B, has been proposed in [35], with Ie and R denoting constants (by default, Ie = 0 and R = 4.3). Therefore, in our experiment, after each measurement of L and B, the factor I was computed, to allow for an easy assessment of the total deterioration of transmission caused by losses, including their structure.
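The excerpt above does not reproduce formula (6) itself; a plausible sketch, assuming the effective-impairment form of the ITU-T G.107 E-model (which matches the stated defaults Ie = 0 and R = 4.3, the packet-loss robustness value used for G.711), is:

```python
def impairment(L, B, Ie=0.0, R=4.3):
    """Combined loss/burstiness impairment I, sketched after the E-model
    effective equipment impairment factor. Assumed form, not necessarily the
    paper's exact formula (6): I = Ie + (95 - Ie) * Ppl / (Ppl / B + R),
    with the loss ratio expressed in percent (Ppl = 100 * L)."""
    Ppl = 100.0 * L
    return Ie + (95.0 - Ie) * Ppl / (Ppl / B + R)
```

In this form, I grows both with the loss ratio L and with the burst ratio B, which is exactly the combined effect the factor is meant to capture.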
Obviously, the most important parameter exposing the occurrence of bufferbloat is the average queue size. Of almost the same importance is the factor I. To assess their combined negative impact, we use a global impairment factor, C, defined in formula (7), which combines the average queue size with I. Finally, the standard deviation of the queue size can be considered a supplementary characteristic. It reflects the stability of the queue size provided by the AQM mechanism and influences the delay jitter of the transmission.

Academic network used in the experiment
The scheme of the campus network of the Silesian University of Technology, used in the measurements, is shown in Fig 2. The network routing is handled by a Juniper MX480 core router (on the left of Fig 2). During the experiment, the AQM device with the dropping function (in the middle) was connected to the part of the network servicing student dormitories (on the right). Each dormitory building has a dedicated 802.1Q virtual LAN. Two of the buildings are located in neighboring cities, so to reach them, the VLANs are carried across an MPLS cloud using VPLS.
The network in each dormitory is an autonomous network, administered by a different entity. From the point of view of our experiment, each such network is a distinct sub-ISP network, connecting users through layer 2 devices. Access to the Internet, DHCP and security are handled by the devices in the campus network. Each end system in a dormitory has a public IP address, so there is no NAT on the path to the Internet.
During the measurements, the aggregated traffic from the Internet towards the dormitories was typically up to 1.2Gb/s, rarely exceeding this value. The return traffic was about 10 times smaller and was unaffected by the AQM device. The TCP/UDP proportions varied from 75/25 to 50/50 percent. The average packet size was about 1300 bytes, with a standard deviation of about 450 bytes.
The network measurements have been formally approved by the Vice Rector for Science and Development of the Silesian University of Technology, prof. Marek Pawełczyk.

AQM device used in the experiment
The AQM device, shown in the middle of Fig 2, was an x86, multi-core server running the Linux operating system. It was designed and programmed by us to test the dropping function mechanism in real networks. Importantly, DPDK cards and software were used in this device, allowing it to achieve processing and forwarding speeds exceeding by far the bit rates actually occurring in the experiment. First of all, the DPDK framework allows disabling the interrupts on the participating network interfaces. To further improve performance, CPU cores can be assigned to separate tasks performed concurrently. Further speedup is gained by processing buffer descriptors, pointing to memory locations with packets, rather than the actual packets.
The schematic internal structure of our AQM device is depicted in Fig 3. The interface on the right has the speed of 1Gb/s, which constitutes the potential bottleneck for the aggregated traffic bound to dormitories. The interface on the left runs at the speed of 10Gb/s, which enables it to receive traffic exceeding 1Gb/s.
In general, each CPU core in Fig 3 runs in an infinite loop, polling one buffer for packets, performing perhaps some tasks, and sending the packets to some other buffer.
In particular, the CPU core on the left of Fig 3 performs the actual dropping of packets, depending on the selected dropping function and the current queue size. Unfortunately, the DPDK framework does not allow querying the NIC queue size, due to the lengthy PCIe operation required to access NIC processor registers. Therefore, to be able to read the queue size, we had to introduce an intermediate queue in memory (in the middle of Fig 3). It exploits the ring-buffer library built into DPDK, which enables concurrent atomic access without locking the queue for reading or writing. Thus the current size of this queue can easily be calculated. The CPU core on the left does so after reading a packet from the left NIC RX queue. Then it looks up the dropping probability in a tabularized dropping function, draws a random number, and combines that information into a drop or no-drop decision. If the packet is dropped, its memory is freed. Otherwise, the packet is enqueued into the intermediate queue. From this queue, the CPU core on the right reads packets and outputs them to the right NIC TX buffer. This buffer has a small size of 4 packets, to minimize its influence on the performance of the device.
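The per-packet decision logic of the left core can be illustrated in a few lines (the real implementation runs in C on top of DPDK; the Python form, table layout and names here are our own):

```python
import random

def make_drop_decider(table, seed=None):
    """Build a drop/no-drop decision function from a tabularized
    dropping function: table[n] holds d(n) for queue size n."""
    rng = random.Random(seed)
    last = len(table) - 1

    def should_drop(queue_size):
        p = table[min(queue_size, last)]  # clamp beyond the table end
        return rng.random() < p           # drop with probability d(n)

    return should_drop

# Example: tabularized small-buffer shape (0 below 108 packets, 0.5 at/above)
table = [0.0] * 108 + [0.5] * 192
should_drop = make_drop_decider(table, seed=1)
```

Tabularizing d(n) in advance keeps the per-packet work to an array lookup, a random draw and a comparison, which matches the tight polling loop described above.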
The CPU core in the middle of Fig 3 simply forwards packets on the uncongested return path.
The secondary task of the AQM device is to carry out the measurements during the course of the experiment. This task is also handled by the CPU core on the left. All the characteristics are gathered in every cycle performed by this core. As it tracks the current queue size at all times (an argument of the dropping function), the average queue size and its standard deviation can be computed easily. The same core drops packets, so the loss ratio and burst ratio can be obtained with equal ease.

PLOS ONE
To make sure that all these operations did not cause any performance overhead influencing the measurements, we tested the device using a hardware generator. Namely, a high load of 3Gb/s of small packets (100 bytes long) was generated. No performance issues were observed under such load. Both the bit rate and the number of packets processed per second in this stress test exceeded by far what was later encountered in the real experiment. In this way, we made sure that the actual measurements were not affected by any performance issues of the device.

Experiment results and discussions
The experiment ran continuously for over a month. On the hour, i.e. at 7:00, 8:00, 9:00, etc., the queueing algorithm was switched randomly between d0-d8 and no AQM, with an operation period of one hour. Namely, an integer random number from the set {0, 1, ..., 9} was generated first. If the number 9 was generated, the AQM device operated with no AQM for the next hour. If a number i ∈ {0, 1, ..., 8} was generated, the device used the dropping function di for the next hour. An hour later, a new integer was generated, the queueing scheme was switched accordingly, and so on.
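The hourly switching described above amounts to a uniform draw over the ten schemes; a minimal sketch (illustrative only, with our own naming):

```python
import random

def pick_scheme(rng):
    """Draw the queueing scheme for the next hour: one of d0-d8
    or no AQM, each with probability 1/10."""
    i = rng.randrange(10)
    return "no-AQM" if i == 9 else f"d{i}"
```

Over a month of hourly draws, every scheme is selected many times and at many different times of day, which is what decorrelates the scheme from the time-dependent network load.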
This operation scheme was meant to ensure that every dropping function worked under all networking conditions, i.e. in the morning, in the middle of the day, at night, at the beginning of the week, at the end of the week, during the weekend, when the load was high, low, moderate, etc. The one-hour operation time of a particular dropping function was chosen as long enough to allow the algorithm to reach its steady-state operation regime, and short enough to provide approximately stable networking conditions, i.e. a stable level of users' activity and network load.
During the operation of a particular dropping function, measurements of the performance characteristics provided by this function were carried out. In every measurement, the current dropping function, the load, the average queue size and its standard deviation, the packet loss ratio and the packet burst ratio were collected. In addition, the values of the cost functions (6) and (7) were computed. The results are presented in Figs 4-8. In these graphs, each individual measurement is presented as a single, colored dot, where the color represents the particular dropping function, the same as in Fig 1. As we can see, for one value of the load and one dropping function, we can have several quite different measurements of the performance characteristic. This is natural in a real networking environment. It can be caused, for instance, by different proportions of TCP and UDP traffic at different times. Therefore, a third-degree polynomial is fitted, using the generalized additive model, to all measurements obtained with a particular dropping function. These polynomials are plotted as curves in the same color as the dots of the corresponding dropping function.
All the graphs begin at the load of 0.9. There were also measurements with loads below 0.9, but such results are less interesting, as queue sizes and losses are small at low loads, regardless of whether AQM is used. Finally, there were very few measurements for ρ > 1.2, so the graphs end at the load of 1.2. Now, as we can see in Fig 4, all dropping functions d0-d8 provide similar queue sizes within the whole range of ρ. The lack of a dropping function, on the other hand, results in significantly (several times) longer queues, i.e. the bufferbloat. Therefore, when the bare queueing delay is taken into account, the application of the dropping function is clearly advantageous, no matter which of the nine shapes is used. Note also that all the dropping functions induce an average queue size quite close to 50 packets for ρ = 1, which was the design goal while parameterizing these functions by means of the mathematical model of [24].
In Fig 5 we can observe the stability of the queue size. All the dropping functions provide similar, good stability, with slightly worse results for d2. Again, the lack of a dropping function results in a much larger deviation of the queue size.

All of the observations made so far are consistent with the previous studies conducted via simulations (see e.g. [36]).
However, Fig 6 is quite surprising and inconsistent with the results of the previous studies. According to simulations, an application of the dropping function should improve the queueing delay, but always at the cost of an increased loss ratio. This is not confirmed in Fig 6, which is good news. On the contrary, most of the queueing schemes used, including no AQM, induce similar loss ratios. What is more, the smallest loss ratios are obtained not when no AQM is used, but for one of the dropping functions. Namely, d4 provides the lowest loss ratios in the whole range of the offered load. On the other hand, slightly worse-than-average results are obtained for d2 and d0.
In Fig 7, the burst ratio can be studied. As we can see, three queueing algorithms induce visibly worse results. In particular, especially bad results are obtained for no AQM and for the trivial dropping function d0 (small buffer). They are both far worse than the results for all non-trivial dropping functions, d1-d7, which are rather close to each other. Quite bad results are also obtained for another trivial dropping function, d8. Furthermore, the absolute values of B obtained in the measurements for the no-AQM case are very high. In the whole range of the load, they oscillate around 1.75, sometimes even exceeding 2. Therefore, it is really important to decrease such high values of B to values much closer to 1, which can easily be achieved by applying a proper dropping function.
Finally, in Fig 8, the total impairment factor, C, is depicted. It combines the impact of the queueing delay with the loss impairment, as defined in formula (7). As we can see, the no-AQM case is far worse than anything else. Among the dropping functions, the trivial function d0 performs visibly worse than the others. This is consistent with the results of the previous studies.
We may also notice in Fig 8 that, in general, d4 is better, and d2 worse, than any other non-trivial dropping function among d1-d7. This is inconsistent with the results of the previous studies: in simulations, the convex dropping function d2 prevailed. It must be stressed, however, that the differences between d4 and d2 are small when compared with no AQM at all.

Summarizing these considerations, we can draw the following general picture. All the non-trivial dropping functions, d1-d7, provide a significant improvement in performance compared with no AQM, when both the queueing delay and losses are taken into account. The trivial dropping functions, d0 and d8, provide a similar improvement of the queueing delay, but their performance is inferior in terms of the structure of losses.
There are some differences in performance among the proper dropping functions, d1-d7. For instance, d4 provides the best loss ratios, d3 the best burst ratios, etc. However, these differences are not crucial: they are far smaller than the difference between applying a proper dropping function and using no AQM at all.
The same conclusions that were drawn from Figs 4-8 can also be drawn from Table 1. In this table, the average measurements obtained for different load ranges are presented. Namely, the following load ranges are used: (0.90-0.94], (0.94-0.98], ..., (1.14-1.18]. Within each load range, the average value of a particular characteristic, e.g. the queue size, is given for every dropping function separately. Moreover, the worst result within a particular load range and characteristic is boxed, while the best result is printed in a bold font. We can see that the great majority of bad (boxed) results are associated with the no-AQM scheme. Moreover, in not a single case is the best result associated with the no-AQM scheme. These are very strong arguments for using AQM.
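The aggregation used in Table 1, averaging within half-open load ranges of width 0.04, can be sketched as follows (illustrative Python; the bin edges follow the table, the rest is our own):

```python
import math

def average_by_load_bin(measurements, lo=0.90, width=0.04, hi=1.18):
    """Average (rho, value) pairs within half-open load ranges
    (0.90, 0.94], (0.94, 0.98], ..., (1.14, 1.18]."""
    nbins = round((hi - lo) / width)
    sums, counts = [0.0] * nbins, [0] * nbins
    for rho, value in measurements:
        if not (lo < rho <= hi):
            continue  # loads outside the table range are skipped
        # small epsilon guards against floating-point edge cases at bin borders
        idx = math.ceil((rho - lo) / width - 1e-9) - 1
        sums[idx] += value
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Empty bins are reported as None, mirroring the fact that some load ranges received few or no measurements.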

Conclusions
We presented the results of an experiment carried out in our university network, in which an AQM device, designed and programmed for this experiment, was running. In particular, the class of AQM algorithms in which the packet dropping probability is a function of the queue size was used in the device. Nine dropping function shapes, known from the networking literature, were implemented and tested, in addition to the classic FIFO queue. Several commonly used performance characteristics, including the load, queue size, loss ratio and burst ratio, were recorded during a month of measurements. In addition, two cost functions, expressing the combined impact of these characteristics, were computed. Importantly, the AQM device itself did not introduce any limitations on the network throughput: its internal performance was far better than needed to fully saturate the 1Gb/s link used. This was confirmed in laboratory stress tests.
From the obtained results, we can draw at least three clear conclusions.
Firstly, applying AQM with any of the non-trivial dropping functions, d1-d7, proved to be highly beneficial compared with no AQM. Typically, the queueing delay, the stability of the queue and the burst ratio were greatly improved, while the loss ratio was kept at a level similar to the no-AQM case.
Secondly, it was not crucial which particular shape of the dropping function, taken from the literature, was used. All the shapes used in the measurements, except for the trivial d0 and d8, provided very good performance, much better than the lack of AQM. Naturally, some of the non-trivial functions, d1-d7, proved to be slightly better than others (e.g. d4). The differences, however, are not critical: any one of d1-d7 can be recommended to replace the classic FIFO buffer.
Thirdly, the study underlined the necessity of performing experiments and measurements of this type in real, operating networks. The very complex nature of real traffic, with diverse round trip times across flows, a large number of distinct protocols, and different flow characteristics and durations, sometimes leads to conclusions different from those obtained via simulation models. For instance, the lowest loss ratio in the whole range of load was observed here for one of the dropping functions, rather than for the FIFO buffer, as predicted by simulations [36]. Also, the best results in simulations were obtained for a convex dropping function, which was speculated to be a consequence of some special properties of convex functions (e.g. [37]). This effect was not confirmed herein. In fact, the convex dropping function d2 performed slightly worse than the other non-trivial functions. It is hard to explain the detailed impact of the shape of a particular dropping function on the system performance. This is again due to the nature of real traffic, which is not only very complex at any particular moment in time, but also highly variable. The overall load may vary greatly, the proportions of TCP and UDP traffic may vary greatly, the number of flows and the data types they carry may vary greatly, and the statistical properties of flows (rates, packet sizes, round trip times, durations) may vary greatly. Each of these traffic features influences the resulting performance characteristics when combined with a particular shape of the dropping function. Therefore, the only reasonable way to study this is by exposing each of the tested dropping functions to different traffic types, many times, over a long period of time. This was done in the experiment described in this paper.

There are several possible directions of future work. Firstly, other networks can be used to carry out similar experiments. Until then, we cannot be sure to what extent the results obtained in one network are representative of other networking environments. Secondly, a much more detailed analysis can be performed. For instance, in our study we collected and discussed the performance characteristics for the aggregated traffic only. There was no distinction between individual flows in the measurements, so the results are averaged across all flows. It would be interesting to see how different dropping functions influence the characteristics of individual flows (the per-flow loss ratio, the per-flow burst ratio, etc.), depending on the flow rate, its round trip time, and so on. Thirdly, AQM algorithms of types different from those studied herein can be implemented and tested in real networks. Such a study is needed not only to test the advantages and disadvantages of a particular algorithm in a real networking environment, but also to demonstrate the possibility of an implementation efficient enough to make the algorithm useful in contemporary networks.