An experimental comparison of different hierarchical self-tuning regulatory control procedures for under-actuated mechatronic systems

This paper presents an experimental comparison of four different hierarchical self-tuning regulatory control procedures for enhancing the robustness of under-actuated systems against bounded exogenous disturbances. The proposed hierarchical control procedure augments the ubiquitous Linear-Quadratic-Regulator (LQR) with an online reconfiguration block that acts as a superior regulator to dynamically adjust the critical weighting-factors of the LQR's quadratic-performance-index (QPI). The Algebraic-Riccati-Equation (ARE) uses these updated weighting-factors to re-solve the optimal control problem, after every sampling interval, and deliver time-varying state-feedback gains. This article experimentally compares four state-of-the-art rule-based online adaptation mechanisms that dynamically restructure the constituent blocks of the ARE. The proposed hierarchical control procedures are synthesized by self-adjusting (i) the controller's degree-of-stability, (ii) the control-weighting-factor of the QPI, (iii) the state-weighting-factors of the QPI as a function of "state-error-phases", and (iv) the state-weighting-factors of the QPI as a function of "state-error-magnitudes". Each adaptation mechanism is formulated via pre-calibrated hyperbolic scaling functions that are driven by the state-error variations. The implications of each mechanism for the controller's behaviour are analyzed in real-time by conducting credible hardware-in-the-loop experiments on the QNET Rotary-Pendulum setup. The rotary pendulum is chosen as the benchmark platform owing to its under-actuated configuration and kinematic instability. The experimental outcomes indicate that the last of these self-adaptive controllers demonstrates superior adaptability and disturbance-rejection capability throughout the operating regime.


Introduction
The design principles of under-actuated self-stabilizing systems are extensively used in the fabrication of humanoid robotic systems, aeronautical systems, and self-balancing transporters. Adaptive controllers enhance the baseline control procedure by self-tuning the critical controller parameters [32]. This capability enables the system to quickly adapt to abrupt state-variations [32][33][34]. Historically, adaptive controllers are categorized as either direct or indirect. The direct approach self-adjusts the critical controller parameters as a function of the error variables [13]. In the indirect approach, an identification scheme estimates the system's unknown model parameters to update the control law [14].
Extensive research has been done to synthesize robust adaptive controllers for under-actuated mechatronic systems [35,36]. The Model-Reference-Adaptive-Controllers utilize the Lyapunov theory to track a reference control model, which leads to the online dynamic adjustment of the critical controller parameters [37,38]. However, identifying an accurate reference model for the tracking purpose is a difficult task [39]. The gain-scheduling mechanism employs a state-driven look-up table to select pre-configured feedback controllers; where, each controller is designed specifically for a given operating condition [40]. The calibration and stability assurance of the constituent controllers become quite laborious for a system with a wide range of uncertainty [41]. The model-predictive-controllers solve a receding-horizon optimization problem over short time frames to deliver time-varying controller gains [42]. However, they may render inaccurate predictions, leading to a fragile control effort under long-drifting disturbances or model variations [43]. The State-Dependent-Riccati-Equation based controllers require accurate state-dependent-coefficient matrices to update the Riccati-equation solutions [44]. However, an accurate definition of these matrices is quite hard to obtain due to the restrictions imposed by the nonlinear dynamics of higher-order systems [45]. The Markov-Jump-Linear-System is a stochastic control technique that is renowned for its resilience against random faults occurring in cyber-physical systems [46]. However, acquiring accurate a priori transition probabilities for the necessary computations is expensive, and their availability is arguable [47].
The state-error-driven nonlinear scaling functions have also been extensively used for the development of expert adaptive systems that adapt the controller parameters online [48]. Retrofitting linear compensators with nonlinear scaling functions to adaptively modify the critical controller gains has garnered a lot of traction in developing robust control for nonminimum-phase systems [49]. The nonlinear-type feedback controllers tend to improve the system's damping against oscillations, reference-tracking accuracy, and error-convergence rate [50,51]. There are two main categories of nonlinear-type gain-adaptation laws that are widely used in the adaptive control field; namely, the state-error-magnitude observers and the state-error-phase observers. In state-error-magnitude observers, the online dynamic gain-adjustment depends on the magnitude of the state-error variable and its higher-order derivatives [52]. In state-error-phase observers, the online dynamic gain-adjustment is driven by the magnitude of the classical state-error as well as the direction of motion of the state response (commonly referred to as the "phase" of the state-response) [53]. The phase information helps in flexibly manipulating the controller's characteristics as the response deviates from or converges to the reference [54]. The biologically-inspired artificial-immune system is a computationally intelligent adaptive mechanism that efficiently rejects exogenous disturbances [55]. It mimics the self-regulation capability of biological immune systems to adaptively tune the controller parameters, which optimizes the controller's adaptability to environmental indeterminacies [56].
The hierarchical self-tuning state-feedback regulators are yet another emerging control paradigm [57,58]. They are implemented by dynamically adjusting the constituent weighting matrices of the LQR's QPI to indirectly modify the controller gains [59]. The online variations in these weighting-factors manipulate the critical parameters in the succeeding layers of the controller's structure, which eventually delivers time-varying state-feedback gains.

Proposed approach
The main contribution of this article is the development and experimental comparison of four unique state-of-the-art nonlinear-type hierarchical self-tuning state-feedback regulators for under-actuated mechatronic systems in order to improve their disturbance-rejection capability against exogenous disturbances. The proposed control scheme follows a hierarchical architecture that re-computes the state-feedback gains, after every sampling interval, based on the state-error-dependent adaptive tuning of the weighting-factors associated with the LQR's quadratic-performance-index (QPI). For this purpose, the generic LQR structure is retrofitted with an auxiliary online self-tuning mechanism that acts as a superior regulator to adaptively tune the constituent weighting-factors associated with the QPI. The Riccati equation uses these adjusted weights to deliver the time-varying state-feedback gains. Each self-tuning mechanism is designed such that it exploits a specific aspect of the system's state-error profile and harnesses it to effectively reposition the system's closed-loop eigenvalues in the stable region of the complex plane. The said hierarchical control procedure is quite innovative because, apart from adjusting the state-feedback gains online, the solution of the Riccati equation concurrently guarantees the asymptotic convergence of the control law as long as the concerned weighting-factors are varied within pre-defined bounds. Hence, additional stability proofs are not required. The salient innovative contributions of this article are postulated as follows:
1. Development of a self-tuning mechanism for the LQR's "degree-of-stability".
2. Development of a self-tuning mechanism for the control-weighting-factor associated with the QPI.
3. Development of a self-tuning mechanism for the state-weighting-factors of the QPI that depends on the system's state-error-phase.
4. Development of a self-tuning mechanism for the state-weighting-factors of the QPI that depends on the magnitudes of the system's state-error variables.
5. Formulation of each self-tuning mechanism by using pre-configured hyperbolic functions to re-scale the critical weighting-factors in real-time.
6. Comparative performance assessment of the proposed self-tuning controller variants by conducting credible real-time experiments, designed specifically to emulate practical disturbance scenarios in the physical environment, on the standard QNET Rotary Pendulum setup [11].
The experimental results (shown later in this article) indicate that each self-tuning-regulator variant significantly enhances the system's robustness against the exogenous disturbances and the control-input economy to a certain degree while preserving the system's asymptotic-stability throughout the operating regime. The experimental comparison of the four different structures of hierarchical self-tuning regulators, that employ innovative rule-based adaptation mechanisms to dynamically adjust the critical weighting-factors of the QPI, has not been attempted previously in the open literature. Hence, this is the main focus of this article.
The remaining paper is organized as follows: The pendulum system is mathematically modeled in Section 2. The baseline fixed-gain LQR is synthesized in Section 3. The detailed design of the four prescribed hierarchical self-tuning regulators is presented in Section 4. The experimental comparison of the proposed self-tuning regulators is presented in Section 5. The paper is concluded in Section 6.

System model
In this paper, a standard rotary inverted pendulum (RIP) system is used as the benchmark platform to experimentally analyze the implications of the proposed control procedure [60]. It requires an active control system to stabilize itself vertically. Apart from being under-actuated in nature, the said multivariable system also exhibits all the properties typically associated with mechatronic systems; such as open-loop (or kinematic) instability, complex geometry, and nonlinear dynamics [61]. The block diagram of an RIP system is illustrated in Fig 1. The system employs a DC geared servo-motor to apply the necessary control torque to rotate the pendulum's arm, which is coupled to the motor's shaft. The arm's angular displacement energizes the pendulum rod to swing-up and balance itself vertically. The angular-displacements of the arm and the rod are denoted as α and θ, respectively.
The system's nonlinear equations of motion are formulated via the Euler-Lagrange approach [62]. The system's Lagrangian (L), expressed in Eq 1, is evaluated by computing the difference between the total kinetic energy (T) and the total potential energy (V) of the system, expressed in terms of the coordinates (α and θ) and their corresponding angular-velocities ($\dot{\alpha}$ and $\dot{\theta}$).
The Euler-Lagrange equations of the RIP system are derived as follows [62].
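The equation bodies are not reproduced in this extracted text; in their standard form (a hedged reconstruction consistent with the coordinates defined above, with the motor torque τ acting on the arm coordinate):

```latex
L = T - V, \qquad
\frac{d}{dt}\!\left(\frac{\partial L}{\partial \dot{\alpha}}\right) - \frac{\partial L}{\partial \alpha} = \tau, \qquad
\frac{d}{dt}\!\left(\frac{\partial L}{\partial \dot{\theta}}\right) - \frac{\partial L}{\partial \theta} = 0 .
```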
where, τ represents the torque applied by the DC motor. It is expressed as follows.
The viscous damping forces and frictional forces are neglected in this research. The resulting nonlinear relationship between α, θ, and τ is expressed as follows [62].
$$\ddot{\alpha} = \frac{-\,r M_p^2 l_p^2 g(\cos\theta)\,\theta - J_p M_p r^2 \cos\theta \sin\theta\,\dot{\alpha}^2 - (J_p + M_p l_p^2)\,\tau}{(M_p r^2 \sin^2\theta - J_e - M_p r^2)\,J_p - M_p l_p^2 J_e}$$

$$\ddot{\theta} = \frac{-\,M_p l_p \big( (M_p r^2 g \sin^2\theta - J_e g - M_p r^2 g)\,\theta + r(J_e \dot{\alpha}^2 \sin\theta - \tau\cos\theta) \big)}{(M_p r^2 \sin^2\theta - J_e - M_p r^2)\,J_p - M_p l_p^2 J_e}$$

The aforementioned set of nonlinear equations can be linearized around the upright equilibrium point. Furthermore, the small angular-displacements of the pendulum rod are approximated via the following expressions.
The state-space model of a linear dynamical system is represented via Eq 6 [11].
where, x is the state-vector, y is the output-vector, u is the control-input signal, A is the state-transition matrix, B is the input matrix, C is the output matrix, and D is the feed-forward matrix. The state-vector and the control-input vector of the RIP system are identified in Eq 7 [59].
where, V_m is the control-input voltage applied to operate the DC motor. The nominal state-space model of the RIP system is presented as follows [59]. The model parameters of the QNET RIP are identified in Table 1 [11].
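As a concrete illustration, the linearized model can be assembled and checked for controllability in a few lines. The numeric entries below are hypothetical placeholders standing in for the coefficients that follow from the Table 1 parameters; they are not the identified QNET values.

```python
import numpy as np

# State vector x = [alpha, theta, alpha_dot, theta_dot]^T.
# Hypothetical placeholder entries for the linearized matrices; the
# actual coefficients follow from the identified parameters in Table 1.
A = np.array([[0.0,  0.0,   1.0, 0.0],
              [0.0,  0.0,   0.0, 1.0],
              [0.0, 39.3, -14.9, 0.0],
              [0.0, 81.8, -13.9, 0.0]])
B = np.array([[0.0], [0.0], [25.5], [24.6]])
C = np.eye(4)            # full-state output
D = np.zeros((4, 1))     # no feed-forward path

# Kalman controllability matrix [B, AB, A^2B, A^3B].
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(4)])
rank = np.linalg.matrix_rank(ctrb)
print(rank)  # rank 4 => the pair (A, B) is controllable
```

A full-rank controllability matrix is what later licenses the stabilizing ARE solutions used throughout the paper.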

Linear quadratic regulator
The LQR is a standard state-space control strategy that is widely favored for the optimal position-regulation of multivariable electro-mechanical systems [19]. The LQR yields an optimal control trajectory by minimizing an energy-like QPI, expressed in Eq 9, that captures the state-variations and the control input associated with the linear dynamical system [17].
where, $Q \in \mathbb{R}^{4\times4}$ and $R \in \mathbb{R}$ are the state- and control-input weighting matrices, respectively. The QPI minimization is followed by the solution of the Hamilton-Jacobi-Bellman (HJB) equation to acquire the state-feedback gains offline [17]. The weighting-matrices are selected such that Q is a positive semi-definite matrix and R is a positive-definite matrix. For the RIP system considered in this research, the Q and R matrices are symbolically represented as shown in Eq 10.
where, q_x and ρ represent the real-numbered coefficients of the Q and R matrices, respectively. The value of ρ is selected as unity to maintain a reasonable control-input economy. The Q matrix is tuned in this research by iteratively minimizing the performance criterion given in Eq 11 to minimize the position-regulation error as well as the control-input energy [63].
such that $e_\alpha(t) = \alpha_{ref} - \alpha(t)$ and $e_\theta(t) = \pi - \theta(t)$; where, e_α(t) and e_θ(t) represent the errors in the angular displacements of the arm and the rod from their corresponding reference positions, respectively. The reference position of the pendulum's rod is set as π radians in order to stabilize it vertically. The angular position of the pendulum's arm recorded at the beginning of every experimental trial is considered as its reference, α_ref. The LQR delivers the optimal set of state-feedback gains with the lowest cost of J_lq. However, these optimal gains are computed by using a specific set of Q and R matrices. This arrangement may not always contribute a good position-regulation behavior with respect to J_lq. Hence, to optimize the selection procedure, J_c is used to tune the state-weighting-factors in this research [59]. To acquire the best-fit solution, each state-weighting-factor is selected from the range [0, 500]. The search is initiated from a random point in the range-space, conducted in the direction of the descending gradient of J_c, and terminated when the minimum cost is achieved. The coefficients of the Q and R matrices acquired for this research (corresponding to the minimum cost of J_c) are presented as follows.
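The weight-selection loop can be sketched as follows. The model matrices are hypothetical placeholders, the cost below is a simplified stand-in for J_c (assumed here to integrate the absolute position errors plus the control energy), and a coarse candidate sweep stands in for the descending-gradient search; only the overall structure mirrors the procedure described above.

```python
import numpy as np

def solve_care(A, B, Q, R):
    """Solve A'P + PA - P B inv(R) B' P + Q = 0 via the stable
    invariant subspace of the associated Hamiltonian matrix."""
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
                  [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    S = V[:, w.real < 0]                   # n stable eigenvectors
    return np.real(S[n:, :] @ np.linalg.inv(S[:n, :]))

# Hypothetical placeholder linearized RIP model (not the Table 1 values).
A = np.array([[0, 0, 1, 0], [0, 0, 0, 1],
              [0, 39.3, -14.9, 0], [0, 81.8, -13.9, 0]], float)
B = np.array([[0.0], [0.0], [25.5], [24.6]])
R = np.array([[1.0]])                      # rho = 1

def J_c(Q, x0=np.array([0.3, 0.2, 0.0, 0.0]), dt=1e-3, T=5.0):
    """Assumed simplified criterion: integral of the absolute
    position-regulation errors plus the control energy."""
    P = solve_care(A, B, Q, R)
    K = np.linalg.inv(R) @ B.T @ P
    Acl = A - B @ K
    x, cost = x0.copy(), 0.0
    for _ in range(int(T / dt)):
        u = float(-K @ x)
        cost += (abs(x[0]) + abs(x[1]) + u**2) * dt
        x = x + Acl @ x * dt               # forward-Euler propagation
    return cost

# Coarse sweep over candidate state-weighting factors within [0, 500].
candidates = [np.diag([q, 2 * q, q / 5, q / 10]) for q in (10.0, 50.0, 150.0, 400.0)]
costs = [J_c(Q) for Q in candidates]
best = candidates[int(np.argmin(costs))]
print(np.diag(best), min(costs))
```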
where, $P \in \mathbb{R}^{4\times4}$ is a symmetric positive-definite matrix. It is well-known that if the system is controllable, $Q = Q^{T} \succeq 0$, and $R = R^{T} > 0$, the solution of the ARE yields an asymptotically-stable control behavior [17]. The state-feedback gain vector, K_f, is calculated as shown in Eq 14.
where, $K = [\,k_\alpha \;\; k_\theta \;\; k_{\dot{\alpha}} \;\; k_{\dot{\theta}}\,]$. The optimal control law is expressed as follows.
The evaluation of the gain vector yields $K = [\,-6.21 \;\; 130.56 \;\; -4.22 \;\; 17.83\,]$. The linear control law is restructured by equipping it with the following state-error-integral variables.
This augmentation improves the pendulum's damping against fluctuations and its reference-tracking behavior [18]. The integral control law is expressed as follows.
The integral-gain vector K_i is tuned by iteratively minimizing the cost function, J_c, to minimize the position-regulation error. The K_i vector that yields the minimum cost in the range [−5, 0] is selected. In this paper, the integral gains are tuned as $K_i = [\,-2.06 \;\; -7.47\times10^{-6}\,]$. The baseline control law is given by the linear combination of the optimal control law and the integral control law as shown in Eq 18.
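Since the printed bodies of the integral variables and the composite law are not reproduced in this text, a hedged reconstruction consistent with the surrounding definitions is:

```latex
\varepsilon(t) =
\begin{bmatrix}
\int_0^{t} e_\alpha(\tau)\,d\tau \\[3pt]
\int_0^{t} e_\theta(\tau)\,d\tau
\end{bmatrix},
\qquad
u(t) = -K\,x(t) + K_i\,\varepsilon(t).
```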

Hierarchical self-tuning-regulator design
The ubiquitous LQR uses the system's linear state-space model to deliver fixed state-feedback gains. Thus, it lacks robustness against the state-deviations caused by bounded disturbances, modeling uncertainties, identification errors, and other parametric variations. To solve this problem, the LQR is augmented with an online adaptation law that dynamically reconfigures the critical controller parameters. The adaptation law is realized by using state-error-dependent nonlinear scaling functions. These synthetic "nonlinear" functions flexibly manipulate the control profile to reject the exogenous disturbances. This arrangement significantly improves the controller's adaptability and disturbance-rejection capability, although the resulting self-tuning regulator continues to utilize the system's linear state-space model. This section presents the theoretical background and formulation of four different state-of-the-art hierarchical adaptive state-feedback control procedures. Each self-tuning mechanism adaptively modulates the gains of the LQR. The proposed mechanisms redesign the nominal LQR, after every sampling interval, to flexibly manipulate the control-input trajectory, which aids in efficiently rejecting the exogenous disturbances and parametric variations. It is to be noted that only the state-feedback gains are updated online in the proposed adaptive control procedures; the integral gains are kept fixed at $K_i = [\,-2.06 \;\; -7.47\times10^{-6}\,]$ as discussed in the previous section. Each proposed adaptive control procedure strives to achieve a beneficial compromise between the position-regulation behaviour and the control energy expenditure while maintaining the system's stability across a broad range of operating conditions. As discussed earlier, the proposed adaptation laws self-adjust specific parameters (existing naturally) within the hierarchical structure of the LQR control system.
The online reconfiguration of these targeted parameters indirectly leads to the re-computation of the state-feedback gains after every sampling interval. In this article, four unique hierarchical self-tuning control procedures are investigated. These control procedures are individually synthesized by:
1. Self-adjusting the degree-of-stability of the LQR by using state-error feedback.
2. Self-adjusting the R matrix by using state-error feedback.
3. Self-adjusting the coefficients of the Q matrix by using a well-established rationale that depends on the state-error-phase feedback.
4. Self-adjusting the Q and R matrices by using well-postulated meta-rules that depend on the state-error-magnitude feedback.
The adaptation laws are formulated via pre-calibrated hyperbolic nonlinear scaling functions. These functions are continuous, which allows for a smooth variation of the concerned weights as the operating conditions change. These functions are bounded, which limits the variation of the concerned weights and thus ensures an asymptotically-stable control behaviour. The symmetry of the hyperbolic functions about the vertical axis helps to appropriately steer the control trajectory as the polarities of the state-error variables change. Finally, these algebraic equations can be solved in a single step after every sampling interval. Unlike iterative auto-tuning or gradient-descent techniques, the real-time computation of hyperbolic scaling functions does not put an excessive recursive computational burden on the embedded processor. Hence, they are computationally economical and can be easily programmed in the control software by using modern-day digital computers.
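The cited properties are easy to verify numerically. The sketch below uses a generic bounded sech-based weight-adjuster, g(z) = g_min + (g_max − g_min)·sech(μz); the symbol names and numeric bounds are illustrative, not the paper's calibrated values.

```python
import math

def weight(z, g_min=0.5, g_max=5.0, mu=2.0):
    """Bounded, smooth, even-symmetric hyperbolic weight-adjuster.
    Evaluated in a single step per sampling interval."""
    sech = 1.0 / math.cosh(mu * z)
    return g_min + (g_max - g_min) * sech

print(weight(0.0))                  # peak value g_max at zero error
print(weight(3.0), weight(-3.0))   # even symmetry about the vertical axis
print(weight(50.0))                # saturates near g_min for large errors
```

Because the map is a closed-form algebraic expression, each update costs one `cosh` evaluation, in contrast with iterative auto-tuning loops.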

Adjustable degree-of-stability
The baseline LQR is transformed into a self-tuning-regulator by retrofitting it with a self-adjusting degree-of-stability (DoS) [21]. The QPI is equipped with a reconfiguration block that relocates the system's closed-loop poles on the left-hand side of the vertical line, s = −β(t), on the complex s-plane; where, β(.) is a state-error-dependent, positive, time-varying parameter. The original QPI is modified by associating a time-varying exponential multiplying factor of the form e^{2β(t)t} with it, as shown in Eq 19 [22].
The multiplication of the typical cost-function with the time-varying exponential term shifts the eigenvalues of the state-transition matrix A to the left side of the line s = −β(t), which ensures the asymptotic stability of the controller's operation [22]. The revised cost-function can be simplified according to the following expression.
This simplification implies that the expressions of the state-vector, as well as the control-input vector, can be revised as expressed below [23].
The substitution of the revised expressions of the state-vector and control-input vector yields the following expression of the cost-function.
The system's state-equation is also modified as expressed below [40].
The expression above reveals that the augmentation of the exponential term, e^{2β(t)t}, in the quadratic cost-function ends up transforming the system's state-matrix A into A+β(t)I. Hence, this arrangement contributes to varying the coefficients of the state-matrix as a function of the state-variables. The modified expression of the ARE is shown below [24].
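The transformation underlying this section can be summarized as follows (a standard derivation, with β treated as frozen at its current value during each sampling interval):

```latex
J = \int_0^{\infty} e^{2\beta t}\left(x^{T}Qx + u^{T}Ru\right)dt,
\qquad \hat{x}(t) = e^{\beta t}x(t),\;\; \hat{u}(t) = e^{\beta t}u(t)
\;\;\Rightarrow\;\;
J = \int_0^{\infty}\left(\hat{x}^{T}Q\hat{x} + \hat{u}^{T}R\hat{u}\right)dt,
```

so that the transformed state obeys

```latex
\dot{\hat{x}} = \beta e^{\beta t}x + e^{\beta t}\dot{x} = (A + \beta I)\hat{x} + B\hat{u},
```

and the modified ARE becomes

```latex
(A+\beta I)^{T}P + P(A+\beta I) - PBR^{-1}B^{T}P + Q = 0 .
```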
The time-varying state-feedback gain vector is updated online as follows.
The updated gain vector, K d (t), flexibly steers the control trajectory using the following Self-Tuning-Regulator (STR) control law.
In order to constitute the adaptive control law, the value of β is dynamically adjusted via an online adaptation law. The proposed adaptation mechanism is formulated by using continuous nonlinear scaling functions that dynamically reconfigure the value of β online based on the real-time variations in the system's cumulative position-regulation error. The cumulative position-regulation error and the projected error, contributed by the pendulum's arm and the rod, are evaluated by taking the linear combination of the individual state-error variables. The modified Riccati equation (expressed in Eq 24) uses the updated values of β to re-compute its solution after every sampling interval, and thus, yields a time-varying state-feedback gain vector. The structure of the STR employing the aforementioned adjustable-DoS (ADoS-STR) mechanism is illustrated in Fig 2 [64].
The online adaptation law for β is formulated by using a pre-calibrated continuous Hyperbolic-Secant-Function (HSF) that depends on the weighted sum of state-error variables [64]. The HSF is chosen because its waveform is continuous, bounded, and even-symmetric. The shape of HSF's waveform is calibrated according to the following rationale [64].
1. The magnitude of β is enlarged when the state-error magnitudes increase in order to place the eigenvalues farther from the imaginary-axis. This arrangement yields stronger damping against overshoots and quickly reverses the direction of the response.
2. The magnitude of β is reduced when the state-error magnitudes decrease in order to place the eigenvalues closer to the imaginary-axis. This allows the response to settle naturally (and smoothly).
These characteristics yield rapid convergence with strong damping against oscillations without contributing large actuating torques under the influence of bounded exogenous disturbances. The proposed HSF is formulated as follows.
such that $z(t) = \sigma_1 e_\alpha(t) + \sigma_2 e_\theta(t) + \sigma_3 \dot{e}_\alpha(t) + \sigma_4 \dot{e}_\theta(t)$; where, sech(.) represents the HSF, β_min and β_max represent the minimum and maximum limits of the HSF, z(t) is the weighted sum of all the state-error variables in real-time, and the parameters σ_1, σ_2, σ_3, and σ_4 are the preset weights linked with each state-error variable in z(t). The waveform of the weight-adjusting function is shown in Fig 3. The inclusion of the four state-error variables in the computation of z(t) informs the adaptation law regarding the effect of the disturbance on the system's behavior. This self-reasoning capability improves the controller's adaptability. To acquire the proposed self-reasoning capability, "positive" weights are selected for each state-error variable in z(t). Hence, when the state-responses diverge from the reference, the positive weights promote an increment in the magnitude of z(t) owing to the same polarities of the classical error variables and their derivatives in this phase. Conversely, when the responses revert and approach the reference, the positive weights allow a decrement in the magnitude of z(t) owing to the opposite polarities of the classical error variables and their derivatives in this phase. This arrangement enhances the controller's flexibility and ensures a stiff damping control effort under large error conditions to quickly attenuate the oscillations, and a softer control effort under small error conditions. The parameters are selected by minimizing J_e to improve the reference-tracking and disturbance-rejection behavior. The tuned parameters are recorded in Table 2.
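One ADoS adaptation step can be sketched as follows. Since the printed body of the β-law is not reproduced in this text, an assumed inverted-sech form is used that matches the stated rationale (β grows with the error magnitude and stays within [β_min, β_max]); the model matrices and every numeric value are hypothetical placeholders, not the calibrated Table 2 parameters.

```python
import numpy as np

def solve_care(A, B, Q, R):
    # Stable-subspace solution of the continuous-time ARE.
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.inv(R) @ B.T], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    S = V[:, w.real < 0]
    return np.real(S[n:, :] @ np.linalg.inv(S[:n, :]))

# Hypothetical placeholder linearized RIP model.
A = np.array([[0, 0, 1, 0], [0, 0, 0, 1],
              [0, 39.3, -14.9, 0], [0, 81.8, -13.9, 0]], float)
B = np.array([[0.0], [0.0], [25.5], [24.6]])
Q, R = np.eye(4), np.array([[1.0]])

def beta_of(z, b_min=0.5, b_max=4.0, mu=3.0):
    # Assumed inverted-sech law: small |z| -> b_min, large |z| -> b_max.
    return b_max - (b_max - b_min) / np.cosh(mu * z)

def ados_gain(errors, sigma=(1.0, 1.5, 0.1, 0.1)):
    z = float(np.dot(sigma, errors))                # weighted error sum
    beta = beta_of(z)
    P = solve_care(A + beta * np.eye(4), B, Q, R)   # modified ARE
    return np.linalg.inv(R) @ B.T @ P, beta

K, beta = ados_gain([0.4, 0.3, 0.0, 0.0])
# Degree-of-stability guarantee: every closed-loop pole lies left of -beta.
poles = np.linalg.eigvals(A - B @ K)
print(beta, poles.real.max())
```

Solving the LQR for the shifted pair (A + βI, B) is exactly what pushes all closed-loop eigenvalues of A − BK to the left of s = −β.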

Adjustable control-weighting-factor
In the LQR problem, the control-weighting-factor (ρ) steers the control-input trajectory. The selection of ρ makes a compromise between the system's position-regulation behavior and the control energy expenditure [22]. A small value of ρ increases the controller's robustness against disturbances but also induces highly discontinuous control activity. On the contrary, a large value of ρ limits the system's control activity under disturbance conditions, which inevitably degrades the position-regulation and transient-recovery behavior [65]. Hence, a fixed value of ρ renders the overall control mechanism uneconomical under rapidly changing operating conditions [66,67]. On one hand, it applies superfluous control force under small error conditions. On the other hand, it contributes inadequate control resources under transient disturbances. A viable solution is to adaptively modulate ρ in the LQR's QPI, while keeping the coefficients of the Q matrix fixed at their prescribed values, as shown below.

$$Q = \mathrm{diag}(32.8 \;\; 52.2 \;\; 6.1 \;\; 2.5), \qquad R(t) = \rho(t) \qquad (28)$$

The idea is to smoothly slide the factor ρ across a continuous surface so that the control profile can be flexibly manipulated to minimize the reference-tracking error and to maintain a reasonable control-input economy throughout the operating regime. This arrangement automatically relocates the eigenvalues of the closed-loop system to effectively compensate for the disturbances. With the modification, R(t) = ρ(t), incorporated in the nominal LQR procedure, the QPI is revised as follows.
The modified Riccati Equation is expressed in Eq 30.
The gain vector is re-computed online as follows.

$$K_c(t) = R(t)^{-1} B^{T} P(t)$$
The time-varying gain vector, K c (t), delivers the following STR control law.
The STR equipped with the adjustable-control-weighting-factor (or ACWF) is denoted as ACWF-STR in this research [68]. Its block diagram is shown in Fig 4. The ACWF-STR yields an asymptotically-stable control behavior, as long as ρ(t)>0.
The proposed STR is implemented by augmenting the baseline LQR with a reconfiguration module that self-adjusts the value of ρ as a pre-calibrated nonlinear scaling function of the state-error variables. The following meta-rules are used to formulate the proposed reconfiguration module [68].
1. Under small error conditions (or equilibrium state), the value of ρ is enlarged to allow for position-regulation with minimal control input expenditure.
2. Under large error conditions (or disturbance state), the value of ρ is proportionally reduced to deliver a tighter control effort to efficiently reject the disturbances.
3. If the control-input inflates drastically under the influence of bounded disturbances, the variation-rate of ρ is reduced to economize the control effort and limit the peak servo requirements.
With these qualities, the module dynamically restructures the control procedure to enhance the system's response speed, strengthen its damping against oscillations, and ensure optimum allocation of control resources under exogenous disturbances. The HSF is used to ensure smooth transitions in the value of ρ as the operating conditions change [64]. The linear combination of the real-time state-error variables is used as the input to the HSF which aids in diagnosing the occurrence (and impact) of the exogenous disturbances. The feature dictated by the third meta-rule prevents the RIP's DC motor from getting saturated while maintaining a reasonable response-speed and damping against oscillations [68]. This feature is incorporated in the HSF-based adaptation law by means of an auxiliary control-input-dependent function. The proposed ACWF adaptation law is formulated as follows.
$$\rho(t) = \rho_{min} + \left[(\rho_{max} - \rho_{min}) \times \mathrm{sech}\big(\gamma(u,t)\cdot z(t)\big)\right] \qquad (33)$$

where, ρ_max and ρ_min represent the upper and lower bounds of the HSF, μ_c is the preset variation-rate of the HSF, z(t) is the same state-error-driven variable as shown in Eq 27 [64], and γ(u, t) is the control-input-dependent self-adjusting variance of the HSF. The function γ(u, t) is specifically designed and implanted in the adaptation law to realize the third meta-rule. The augmentation of γ(u, t) dynamically adjusts the variance of the adaptation law to maintain the controller's robustness without contributing highly discontinuous control activity. The shape of the HSF waveform is adjusted, under large servo requirements, as shown in Fig 5. The parameter γ_o is the basic variance of the function, ω is a positive constant in [0, 1] that presets the lower bound of the variance, η is the positive weight of u(t), and ψ is the positive fractional exponent of the scaled u(t) that prevents the self-adjustment at smaller control signals. The aforementioned parameters are tuned offline by iteratively minimizing J_e. The selected values of these parameters are recorded in Table 3 [68].
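The ρ-adaptation of Eq 33 can be sketched as follows. The inner variance function γ(u, t) is not printed in full above, so an assumed form built from the stated ingredients (basic variance γ_o, lower bound ω, weight η, exponent ψ) is used; all symbol values are illustrative, not the Table 3 calibration.

```python
import math

RHO_MIN, RHO_MAX = 0.2, 2.0

def gamma(u, g0=2.0, omega=0.3, eta=0.05, psi=2.0):
    # Assumed form of the self-adjusting variance: it decays from g0
    # toward omega*g0 as |u| grows, which slows the variation of rho
    # under large servo demands (third meta-rule). Hypothetical.
    return g0 * (omega + (1.0 - omega) / math.cosh((eta * abs(u)) ** psi))

def rho(z, u):
    # Eq 33: rho slides between rho_max (small error) and rho_min (large error).
    return RHO_MIN + (RHO_MAX - RHO_MIN) / math.cosh(gamma(u) * z)

print(rho(0.0, 0.0))   # rho_max: relaxed weighting near equilibrium
print(rho(4.0, 0.0))   # near rho_min: tight control under a large error
print(rho(4.0, 40.0))  # larger than the previous value: variation curbed at high |u|
```

Note how the same error z yields a larger ρ when the control signal is already large, which is precisely the economizing behavior demanded by the third meta-rule.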

Adjustable SWFs using error-phase observers
This section presents another practical adaptive control scheme that self-tunes the LQR gains by adaptively modulating all the state weighting-factors associated with the QPI [57].
PLOS ONE
Comparison of different hierarchical self-tuning regulators for under-actuated systems

For the under-actuated systems, the degrees-of-freedom to be stabilized are greater than the rank of R, which makes it quite hard to establish a correlation between ρ and the state-variables [58]. However, the coefficients of the state-weighting-matrix Q (denoted as q_x) hold a one-to-one correspondence with the respective state-variables. This arrangement provides a pragmatic approach to dynamically adjust the values of q_x online. Apart from obviating the necessity to tune and preset the state-weighting-factors based on a specific performance criterion, this approach increases the degrees-of-freedom of the controller design [58]. Each weighting-factor is dynamically adjusted by using pre-calibrated nonlinear functions that are driven by the corresponding state-error variables of the system, as shown in Eq 34.
$$Q(t) = \mathrm{diag}\big(q_\alpha(t) \;\; q_\theta(t) \;\; q_{\dot{\alpha}}(t) \;\; q_{\dot{\theta}}(t)\big), \qquad R = 1 \qquad (34)$$

The control-weighting-factor is preset to unity to maintain an economical control activity. The time-varying state-weighting-matrix, Q(t), is used to modify the solution of the Matrix-Riccati-Equation, after every sampling interval, as shown below [17].
The updated P(t) re-computes the state-feedback gains online by using the following update law.
The STR equipped with adjustable State-Weighting-Factors (SWF) is shown below.

$$u(t) = -K_s(t)\,x(t) + K_i\,\varepsilon(t) \qquad (37)$$
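One SWF update step can be sketched as follows: a state-error-driven Q(t) feeds the Riccati equation, whose fresh solution re-computes the gains. The model matrices are hypothetical placeholders, and a simple magnitude-driven sech scaler stands in for the phase-driven law detailed later in this section.

```python
import numpy as np

def solve_care(A, B, Q, R):
    # Stable-subspace solution of the continuous-time ARE.
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.inv(R) @ B.T], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    S = V[:, w.real < 0]
    return np.real(S[n:, :] @ np.linalg.inv(S[:n, :]))

# Hypothetical placeholder linearized RIP model.
A = np.array([[0, 0, 1, 0], [0, 0, 0, 1],
              [0, 39.3, -14.9, 0], [0, 81.8, -13.9, 0]], float)
B = np.array([[0.0], [0.0], [25.5], [24.6]])
R = np.array([[1.0]])

def swf_gain(errors, q_min=1.0, q_max=200.0, mu=4.0):
    # Each q_x rises from q_min toward q_max as its own error grows
    # (illustrative bounds; Q(t) stays positive-definite throughout).
    q = q_max - (q_max - q_min) / np.cosh(mu * np.abs(errors))
    P = solve_care(A, B, np.diag(q), R)   # re-solved every sample
    return np.linalg.inv(R) @ B.T @ P     # refreshed gain vector

K_small = swf_gain(np.array([0.01, 0.01, 0.0, 0.0]))
K_large = swf_gain(np.array([0.6, 0.5, 0.2, 0.2]))
print(np.linalg.norm(K_small), np.linalg.norm(K_large))
```

Large state errors inflate Q(t) and hence the gain magnitudes, which is the stiff-versus-soft control behavior the adaptation rules below formalize.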
The block diagram of the SWF-STR is shown in Fig 6. The following Lyapunov function is used to verify the asymptotic stability of the SWF-STR architecture [17].

V(t) = xᵀ(t)P(t)x(t)   (38)

Differentiating along the closed-loop trajectories yields

V̇(t) = xᵀ(t)[Ṗ(t) + (A − BK_s(t))ᵀP(t) + P(t)(A − BK_s(t))]x(t)   (39)

The term Ṗ(t) approaches zero in an infinite-horizon control problem [35]. Thus, after substituting the Riccati equation, the simplified expression of V̇(t) reduces to Eq 40.

V̇(t) = −xᵀ(t)[Q(t) + P(t)BR⁻¹BᵀP(t)]x(t)   (40)

This first-derivative expression is negative-definite as long as Q(t) > 0, which justifies the stability of the proposed STR.
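The negative-definiteness claim can be spot-checked numerically. For V(t) = xᵀPx with P the infinite-horizon ARE solution, the standard LQR substitution gives V̇ = −xᵀ(Q + PBR⁻¹BᵀP)x, so asymptotic stability follows whenever that matrix is positive-definite. A sketch with placeholder model matrices (not the identified RIP model):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [0., 39.3, -14.5, 0.],
              [0., 81.8, -13.9, 0.]])   # illustrative placeholder model
B = np.array([[0.], [0.], [25.5], [24.6]])
Q = np.diag([50.0, 20.0, 2.0, 2.0])     # Q(t) > 0 at this sampling instant
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
# With V(x) = x^T P x and the ARE substituted into the closed-loop derivative,
# V_dot = -x^T (Q + P B R^-1 B^T P) x, so V_dot < 0 iff this matrix is PD.
M = Q + P @ B @ np.linalg.solve(R, B.T) @ P
assert np.all(np.linalg.eigvalsh(M) > 0)  # certifies negative-definite V_dot
```

Since Q(t) > 0 and PBR⁻¹BᵀP is positive semi-definite, the check holds for any positive-definite adapted Q(t), which is exactly the condition the text imposes on the weight-adjusting functions.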
This adaptation law relies upon the "phase" of the system's state-response(s) to adaptively tune the state-weighting-factors [53]. The baseline weight-adjusting functions are implemented via pre-calibrated HSFs that depend on the variations in the magnitude of the classical state-error and the phase of the state-response. These HSFs are retrofitted with an auxiliary phase-observer that accurately "deduces" and informs the adaptation mechanism regarding the movement of the state-response (away from or towards the reference) based only on the instantaneous polarities of the classical state-error and the state-error-derivative variables [54]. The "phase" information is also used to automatically "mutate" the shape of each HSF waveform. This synthetic self-deduction and self-mutation capability significantly enhances the robustness of the adaptive control procedure against exogenous disturbances, making it highly suitable for damping control applications. The following qualitative rules constitute the online adaptation mechanism [53].
1. When the response is diverging from the reference, the values of q x are inflated to apply a stiff control effort which damps the overshoots and reverses the direction of response.
2. When the response is converging to the reference, the values of q x are reduced to apply a soft control effort which allows the response to settle (naturally) with minimum fluctuations.
These characteristics induce rapid transits in the response with strong damping against oscillations while suppressing the peak servo requirements. However, this rationale requires precise information regarding the phase (direction of motion) of the response to restructure the control procedure. Consider the time-domain error profile of an arbitrary under-damped system, shown in Fig 7, under the influence of a bounded disturbance.
The error profile is divided into four phases: A, B, C, and D. Each phase represents a distinct operating condition that is addressed individually to attain the best control effort. The polarities of the error and the error-derivative are the same when the response is deviating from the reference (phases A and C), and opposite when the response is converging to the reference (phases B and D) [53,54]. In light of this state-error behavior, the phase is observed as follows [69].

m_x = step(e_x(t) · ė_x(t))   (41)

where step(·) is a function that yields "zero" if its argument is negative and "one" if its argument is positive, and x denotes the state-variable being considered. This phase-observer is embedded within the structure of a state-error-dependent HSF to alter the waveform's shape as the state-error changes [69]. The proposed self-mutating HSF is given in Eq 42 [59].

q_x(t) = m_x a_x − ((b_x − (1 − m_x)a_x) · sech(γ_x e_x(t)))   (42)
where a_x and b_x are the positive upper and lower bounds of each function such that a_x ≥ b_x to ensure q_x(t) ≥ 0, and γ_x represents the variance of each function. The proposed HSF complies with the aforementioned meta-rules. Each weight-adjusting function is augmented with its corresponding Boolean operator, m_α or m_θ. The logical rules governing the self-mutation of q_x(t) are defined in Table 4. The mutation scheme is illustrated in Fig 8 [59]. In phases A and C, the response deviates from the reference. Since the error and error-derivative variables have the same polarities, the Boolean setting m_x = 1 selects the growing form of q_x(t). This setting delivers a tight control effort to damp the overshoot (or undershoot). In phases B and D, the response converges to the reference. The error and error-derivative variables have opposite polarities, which leads to the Boolean setting m_x = 0. This setting contributes a relatively gentle control effort that allows a quick yet smooth settlement of the response.
With the commissioning of the phase-observer, the weight-adjusting function(s) autonomously reconfigure their waveforms, as illustrated in Fig 9 [53]. The proposed augmentation strengthens the controller's disturbance-rejection capability by autonomously transforming the growing behaviour of the waveform into a decaying behaviour as the state-response transits from a divergence phase to a convergence phase, and vice-versa. The self-mutating error-phase-based HSFs are formulated as follows [59].

q_α(t) = m_α a_α − ((b_α − (1 − m_α)a_α) · sech(γ_α e_α(t)))   (43)

q_θ(t) = m_θ a_θ − ((b_θ − (1 − m_θ)a_θ) · sech(γ_θ e_θ(t)))   (44)

q_α̇(t) = m_α a_α̇ − ((b_α̇ − (1 − m_α)a_α̇) · sech(γ_α̇ e_α(t)))   (45)

q_θ̇(t) = m_θ a_θ̇ − ((b_θ̇ − (1 − m_θ)a_θ̇) · sech(γ_θ̇ e_θ(t)))   (46)

such that m_α = step(e_α(t) · ė_α(t)) and m_θ = step(e_θ(t) · ė_θ(t)).
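The phase-observer and the self-mutating HSF pattern of Eqs 43 through 46 can be sketched compactly. The bound values a, b, and γ below are illustrative placeholders, not the Table 5 tunings:

```python
import numpy as np

def sech(x):
    return 1.0 / np.cosh(x)

def step(v):
    """Phase observer: 1 while diverging (e and e_dot share a sign), else 0."""
    return 1.0 if v > 0 else 0.0

def q_phase(e, e_dot, a=100.0, b=20.0, gamma=5.0):
    """Self-mutating HSF (Eqs 43-46 pattern); a >= b keeps q(t) >= 0.

    m = 1 (phases A/C): growing waveform -> stiff control while diverging.
    m = 0 (phases B/D): decaying waveform -> soft control while converging.
    Bounds a, b and variance gamma are illustrative, not the paper's tunings.
    """
    m = step(e * e_dot)
    return m * a - (b - (1.0 - m) * a) * sech(gamma * e)
```

With m = 1 the weight grows from a − b toward a as |e| increases (stiff effort against a diverging response); with m = 0 it collapses to (a − b)·sech(γe), decaying with |e| so that a converging response settles under a gentle effort.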

The hyper-parameters associated with each weight-adjusting function are tuned by iteratively minimizing J_e to yield strong damping control. The tuned parameters are shown in Table 5 [59].
The adapted values of q_x(t) remain positive throughout the operating regime, which ensures the system's stability. The STR equipped with the self-mutating error-phase-dependent HSFs is denoted as the "EP-STR" [59].

Adjustable SWFs using error-magnitude observers
The proposed scheme dynamically updates the state-feedback gains, after every sampling interval, by concurrently modulating the state-weighting-factors as well as the control-input weighting-factor of the QPI via online state-error-dependent expert self-tuning mechanism(s) [57]. This arrangement is beneficial because it indirectly alters the state-feedback gains while harnessing the full potential of the proposed hierarchical adaptive LQR scheme, dynamically adjusting all the user-specified constituent weighting-factors of the Riccati equation.
It enhances the adaptability of the control procedure to environmental indeterminacies and flexibly steers the control profile to compensate for the consequent parametric variations. The weighting-matrices containing the self-adjusting coefficients are represented as follows.
Q(t) = diag(q_α(e_α, t), q_θ(e_θ, t), q_α̇(ė_α, t), q_θ̇(ė_θ, t)),  R(t) = ρ(e_α, e_θ, t)   (47)

The nominal value of R is kept near unity to economize the control-energy expenditure. The rationale and methodology used to formulate the state-error-dependent online self-tuning mechanism(s) for the weighting-factors are discussed below. The restructured Riccati equation is expressed in Eq 48.

AᵀP(t) + P(t)A − P(t)BR(t)⁻¹BᵀP(t) + Q(t) = 0   (48)

The Riccati equation yields a time-varying solution, P(t), at every sampling instant. The self-adjusting state-feedback gain vector is calculated by using Eq 49.

K_s(t) = R(t)⁻¹BᵀP(t)   (49)

The proposed STR law is defined as follows.

u(t) = −K_s(t)x(t) + K_i ε(t)   (50)
This self-tuning strategy observes the real-time variations in the state-error magnitudes to dynamically adjust the weighting-factors while preserving the system's stability throughout the operating regime. The rationale used to develop the error-magnitude observer for self-tuning control of robotic systems has been experimentally verified in the available literature [11]. It relies upon the following two meta-rules to modify the critical controller parameters [52].

1. The proportional state-weighting-factors (q_α and q_θ) are inflated as the magnitudes of the corresponding classical state-errors reduce, and vice-versa.
2. The differential state-weighting-factors (q_α̇ and q_θ̇) are inflated as the magnitudes of the corresponding state-error-derivatives increase, and vice-versa.
Together, these characteristics dynamically reconfigure the control procedure to strengthen the system's disturbance-compensation capability [11,52]. To ensure a smooth transition of the weighting-factors, the nonlinear scaling functions are required to be continuous, bounded, and even-symmetric. Hence, the weight-adaptation functions are implemented via partial-hyperbolic-functions (PHFs), whose shapes and forms are configured offline according to the above-mentioned qualitative rules [70]. It is to be noted that hyperbolic-secant functions and zero-mean Gaussian functions can also be used instead of the PHFs to mathematically program the said adaptation law [64,68]. The error-magnitude-driven PHFs used to scale each state and control weighting-factor are formulated below [71].
q_θ̇(ė_θ, t) = a_θ̇ − b_θ̇ / (1 + |γ_θ̇ ė_θ(t)|²)   (54)

ρ(e_α, e_θ, t) = a_u − b_u / (1 + |γ_u(γ_α e_α(t) + γ_θ e_θ(t))|³)   (55)

where a_x and b_x represent the prescribed upper and lower bounds of the weighting functions, and γ_x represents the variance of each function. The waveforms of the weight-adjusting functions are shown in Fig 10. A proper selection of γ_x enables the controller to apply a stiffer control effort in the disturbed state and a softer control effort in the equilibrium state of the system. This arrangement strengthens the system's damping against fluctuations, yields minimum-time transient recovery, and renders a smoother control activity [71]. It also averts the limit-cycles contributed by static friction during dead-zones. In this mechanism, the value of ρ is also adaptively modulated as a nonlinear function of the classical state-error variables. This arrangement prevents the actuator from saturating due to rapid fluctuations and large overshoots in the control-input profile, without trading off the system's robustness under exogenous disturbances [52]. It contributes rapid transits with strong damping against disturbances while economizing the control-energy expenditure [71]. The prescribed bounds of each hyperbolic function are carefully selected so that q_x(.) > 0 and ρ(.) > 0 under every operating condition, to ensure an asymptotically stable control behavior. The hyper-parameters associated with each weight-adjusting function are tuned by iteratively minimizing J_c to attain the best position-regulation accuracy. The tuned parameters are presented in Table 6 [71]. The STR constructed via the error-magnitude-driven PHFs is denoted as the "EM-STR".
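The PHF shapes of Eqs 54 and 55 can be sketched directly; the bound and variance values below are illustrative placeholders, not the Table 6 tunings:

```python
import numpy as np

def phf(e, a, b, gamma, p=2):
    """Partial-hyperbolic function a - b / (1 + |gamma * e|^p) (Eq 54 pattern):
    continuous, bounded in [a - b, a), and even-symmetric in e."""
    return a - b / (1.0 + abs(gamma * e) ** p)

# Differential weight q_theta_dot (Eq 54): inflates as |e_dot| grows
# (meta-rule 2). Bounds 10.0 / 8.0 and variance 4.0 are illustrative.
q_td = lambda e_dot: phf(e_dot, a=10.0, b=8.0, gamma=4.0, p=2)

# Control weight rho (Eq 55): a cubic PHF driven by the weighted sum of both
# position errors; it rises under large errors to temper actuator peaks.
def rho(e_alpha, e_theta, a_u=2.0, b_u=1.5, g_u=1.0, g_a=3.0, g_t=3.0):
    return a_u - b_u / (1.0 + abs(g_u * (g_a * e_alpha + g_t * e_theta)) ** 3)
```

Choosing a > b > 0 keeps every adapted weight strictly positive, which is the condition the text imposes for asymptotic stability; the proportional weights q_α and q_θ would use the complementary shape implied by meta-rule 1.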

Comparative performance assessment
This section presents a detailed overview of the hardware setup, testing procedure, and comparative experimental analysis of the proposed control schemes.

Experimental setup
The proposed self-tuning control mechanisms are analyzed by conducting hardware experiments on the QNET RIP setup [62]. The angular displacements, θ and α, are measured in real-time by the optical rotary encoders commissioned on-board the hardware setup; these encoders are installed at the pivot of the pendulum rod and on the motor's shaft, respectively. The setup uses an NI-ELVIS II data-acquisition board to capture the encoder measurements and digitize them at a sampling rate of 1000 Hz [11]. The digitized measurements are then serially transmitted to the software control routine at 9600 bps. The customized control routine is digitally implemented by using the "Block Diagram" tool as well as the built-in mathematical functions available in the virtual-instrument file of the LabVIEW software, which runs on a 2.0 GHz digital computer with 8.0 GB of RAM. After every sampling instant, the control routine receives the updated sensor measurements, adjusts the critical controller parameters, and computes the modified control signal. The control routine uses the computer's built-in real-time clock to schedule the successive updates of the weighting-factors after every sampling interval. The front-end of the control software acts as a user interface that records and graphically displays the real-time variations in the state and control-input. The generated control signals are serially transmitted back to the motor-driver circuit installed on-board the hardware setup. The driver circuit translates the incoming motor-control signals into pulse-width-modulated commands that are subsequently amplified to actuate the DC motor. The DC motor and its driving circuit, commissioned on the RIP setup, are durable and agile enough to handle the discontinuous control activity contributed by the proposed control schemes. The QNET Rotary Pendulum hardware setup is shown in Fig 11.

Tests and results
The position-regulation and disturbance-compensation capability of the proposed adaptive control schemes are compared by conducting five unique "hardware-in-the-loop" experiments on the QNET pendulum setup. The time-domain state and control-input variations are recorded for comparative analysis. The graphical results pertaining to θ and α are depicted in degrees (deg.) to simplify the visual understanding. The detailed testing procedures along with the corresponding graphical results are presented as follows:

A. Reference tracking: The position-regulation behavior of the pendulum under normal conditions is analyzed by allowing the rod and the arm to track their respective reference positions. The variations in θ(t), α(t), V_m(t), and K(t) are shown in Fig 12.

B. Impulsive-disturbance compensation: The controller's ability to compensate for bounded impulsive disturbances is examined by applying a pulse signal to the V_m(t) profile to perturb the state-response(s). The applied pulse has a time-duration of 100.0 ms and a peak magnitude of −5.0 V. The pulse signal is injected into the control response at discrete intervals. The resulting variations in θ(t), α(t), V_m(t), and K(t) are shown in Fig 13.

C. Step-disturbance attenuation: The controller's ability to attenuate random exogenous torques is assessed by injecting a −5.0 V step-disturbance signal into the V_m(t) profile at t ≈ 5 s.

Analysis and discussions
The quantitative analysis of the experimental results is done with the aid of the following seven Key-Performance-Indicators (KPIs):
a. The root-mean-squared value of the error (RMSE_x) in the pendulum angle response(s).
b. The mean-squared value of the applied DC motor voltage (MSV m ).
c. The magnitude of the peak overshoot (OS_θ) observed in θ(t).
d. The time taken by the pendulum's rod (t_set) to settle within ±2% of the reference after a disturbance.
e. The disturbance-induced angular offset in the arm's position (α offset ).
f. The peak-to-peak amplitude of the disturbance-induced fluctuations in the arm's position (α pp ).
g. The magnitude of peak motor voltage (V m,p ).
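The KPIs above can be computed directly from the logged trajectories, as sketched below. The settling detector assumes a constant reference, and the 1-unit floor used for the ±2% band of a zero reference is an assumption of this sketch:

```python
import numpy as np

def kpis(t, theta, theta_ref, vm, band=0.02):
    """Compute a subset of the listed KPIs from logged trajectories.

    t: time stamps [s]; theta: pendulum angle [deg]; vm: motor voltage [V].
    theta_ref is assumed constant; for a zero reference the +/-2% band is
    taken against a 1-unit floor (an assumption, not the paper's convention).
    """
    err = theta - theta_ref
    rmse = float(np.sqrt(np.mean(err ** 2)))    # RMSE_x
    msv = float(np.mean(vm ** 2))               # MSV_m
    overshoot = float(np.max(np.abs(err)))      # OS_theta (peak deviation)
    vmp = float(np.max(np.abs(vm)))             # V_m,p (peak motor voltage)
    # t_set: first time after which |error| stays inside the +/-2% band.
    tol = band * max(abs(theta_ref), 1.0)
    outside = np.where(np.abs(err) > tol)[0]
    if outside.size and outside[-1] + 1 < len(t):
        t_set = float(t[outside[-1] + 1])
    else:
        t_set = float(t[0])
    return {"RMSE": rmse, "MSV_m": msv, "OS": overshoot,
            "V_m,p": vmp, "t_set": t_set}
```

The offset and peak-to-peak measures for the arm (α_offset, α_pp) follow the same pattern applied to the α(t) log, using the mean and the max-minus-min of the post-disturbance window.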
The aforementioned KPIs are used as standard performance measures in the available literature to critically analyze the position-regulation behavior, disturbance-rejection capability, and control-energy requirements of the system [15,64]. The experimental results, expressed in terms of these KPIs, are summarized in Table 7. The proposed control schemes remain stable under every disturbance condition. The results clearly indicate that the generic LQR underperforms compared to the adaptive controller variants in every test case. The ADoS-STR exhibits moderately better position-regulation behavior than the generic LQR, and its control-input economy is better than that of the ACWF-STR in every test case. The ACWF-STR manifests a significant improvement in robustness but also renders a highly discontinuous control activity, which contributes to chattering in the response of θ(t).
The EP-STR exhibits time-optimal behavior as compared to the ACWF-STR and ADoS-STR. Apart from contributing enhanced disturbance-rejection, it delivers better control-input efficiency than the other STR variants while maintaining the system's asymptotic stability throughout the operating regime. However, its time-domain performance is inferior to that of the EM-STR, especially under Tests A, C, and E. The EM-STR demonstrates a significant enhancement in disturbance-compensation capability and position-regulation accuracy as compared to the EP-STR. However, amid transient disturbances, the EM-STR consumes relatively large control energy and exhibits large peaks in the control-voltage profile. A concise qualitative analysis of the performances of the proposed STR variants is summarized as follows: In Test A, the RIP exhibits the largest deviations in the angular responses under the influence of the LQR. The deviations in the responses of θ and α progressively reduce as the nominal LQR is retrofitted with enhanced adaptation mechanisms. The ASWF-adapted EM-STR shows optimum position-regulation accuracy with minimum reference-tracking error, minimum chattering, and reasonably low control-energy consumption as compared to the other adaptive controller variants (except for the EP-STR). The EP-STR shows the second-best position-regulation performance and the best control-energy expenditure amongst the other STR variants. The pendulum response of the ADoS-STR shows a fixed offset of 0.3 deg. from the vertical reference throughout the experimental trial. The ACWF-STR shows persistent chattering in θ(t).
In Test B, the ILQR demonstrates the slowest transient recovery and insufficient damping against the impulsive disturbances. It demonstrates the largest peak overshoot in the pendulum's response, followed by persistent steady-state oscillations. The ACWF-STR continues to exhibit a highly discontinuous control activity. The EP-STR exhibits the minimum transient-recovery time to effectively attenuate the oscillations and shows the minimum OS_θ while attenuating the impulsive disturbances. The EP-STR also consumes the minimum average control-input energy (MSV_m), and its peak servo requirements are much smaller than those of the EM-STR. The EM-STR shows the minimum steady-state fluctuations upon convergence, owing to the augmentation of the phase-based self-learning capability of the controllers.
In Test C, the step-disturbance permanently displaces the arm from its reference position. The LQR manifests the largest post-disturbance displacement in the arm's position and large oscillations in the rod. The intermediate STR variants demonstrate moderately better transient-recovery behavior with reasonable damping against the oscillations. The EM-STR, however, effectively suppresses the influence of the applied step-disturbance by contributing the minimum RMSE and offset in the nominal positions of the pendulum and the arm, respectively. It exhibits the minimum α_offset and the minimum peak-to-peak amplitude of the oscillations in the pendulum's response, θ(t). Furthermore, the EM-STR contributes a slightly better control-input economy as compared to the EP-STR. The ADoS-STR exhibits the most economical control-input behavior in this test case.
In Test D, the EP-STR effectively attenuates the ripples in the response caused by the noise. Despite the noise, the EP-STR-controlled system manages to regulate the pendulum at the desired reference position(s) with minimal RMSE and minimum control-voltage requirements. The EM-STR exhibits the second-best time-domain behavior in terms of control-energy expenditure and position-regulation accuracy.
In Test E, the EM-STR again surpasses the other STR variants compared in this article. It robustly compensates for the perturbations induced by the modeling error by delivering strong damping against the oscillations in the state-responses, thus minimizing the reference-tracking error as well as the control-energy consumption. The EM-STR effectively attenuates the peak-to-peak amplitude of the post-disturbance oscillations in the state-responses, and its control activity is relatively smoother than that of the EP-STR. The EP-STR exhibits the minimum RMSE in the pendulum's angular profile, θ(t). The ADoS-STR exhibits the most economical control-input behavior in this test case.
From a functional point of view, the state-feedback gains associated with the EM-STR respond and adapt to the real-time state-variations relatively quickly. Unlike the other STR variants, the abrupt yet small variations of the EM-STR gains account for its enhanced adaptability, robustness, and smoother control activity under exogenous disturbances. This flexibility is attributed to the dynamic self-adjustment of all the weighting-factors associated with the ARE. The enhanced adaptability of the EM-STR comes at the cost of tuning a relatively large number of hyper-parameters (as compared to the other adaptation mechanisms discussed here). However, the resulting performance improvement outweighs this drawback.
The experimental analysis validates the superior position-regulation accuracy and enhanced robustness of the EM-STR in almost every test case. It manifests better adaptability under perturbed conditions as compared to the other STR variants. It effectively removes the inherent shortcomings of the other adaptation mechanisms, which enables it to flexibly steer the control trajectory. However, the EM-STR does consume more control energy than the other controllers in almost every test case. The EP-STR shows the second-best time-domain performance after the EM-STR.
The proposed hierarchical control procedure is highly scalable. Each controller variant exhibits a certain degree of resilience against the aforementioned disturbance scenarios. In the future, the proposed control procedure can also be augmented with auxiliary neuro-fuzzy adaptive compensators, as suggested in [72,73], to effectively handle the hardware limits imposed on under-actuated systems, such as input and actuator dead-zones, limit cycles, and parametric uncertainties associated with the system's actuated and un-actuated state-variables.
The constitution of the proposed hierarchical control procedure only requires the a priori identification of the system's linear state-space model and pre-calibrated weight-adjusting functions. Thus, apart from self-stabilizing mechatronic platforms, the proposed control schemes can be easily extended to flexible-joint robotic manipulators and other classes of under-actuated systems as well [74].

Conclusion
This paper presents a comparative performance assessment of four state-error-driven hierarchical adaptive control strategies that enhance the disturbance-rejection capability of closed-loop under-actuated mechatronic systems. Each adaptation mechanism dynamically reconfigures the constituents of the Riccati equation in an innovative manner to self-tune the state-feedback gains of the LQR. The proposed architecture delivers adaptive actions in real-time without explicitly relying on the estimation of state-dependent coefficients in the system's state-space model, which makes it highly scalable and computationally economical. The improvement in time-domain performance and robustness imparted by each self-tuning regulator discussed in this article is analyzed under practical disturbance scenarios by conducting real-time hardware experiments on the QNET rotary pendulum system. The experimental outcomes validate the superior robustness and position-regulation accuracy of the EM-STR scheme in almost every test case. It is a resourceful scheme that utilizes the full state-error feedback to self-adjust the state and control-input weighting-factors of the QPI online. The EP-STR delivers the second-best time-domain performance and maintains a reasonable control-input economy; furthermore, it outperforms the EM-STR under the influence of the step-disturbance. Its ability to self-mutate in real-time increases the controller's degrees-of-freedom, which enhances the system's response speed and damping against disturbances. In the future, the performance of the proposed control schemes can be further investigated by employing expert adaptive systems driven by soft-computing techniques. The proposed reconfiguration schemes can also be enhanced by self-regulating the variances and exponents of the hyperbolic functions, and the feasibility of the proposed controllers can be analyzed by extending them to other mechatronic systems.