An artificial EMG generation model based on signal-dependent noise and related application to motion classification

This paper proposes an artificial electromyogram (EMG) signal generation model based on signal-dependent noise, which has been ignored in existing methods, by introducing the stochastic construction of the EMG signals. In the proposed model, an EMG signal variance value is first generated from a probability distribution with a shape determined by a commanded muscle force and signal-dependent noise. Artificial EMG signals are then generated from the associated Gaussian distribution with a zero mean and the generated variance. This facilitates representation of artificial EMG signals with signal-dependent noise superimposed according to the muscle activation levels. The frequency characteristics of the EMG signals are also simulated via a shaping filter with parameters determined by an autoregressive model. The proposed model also incorporates a method that estimates the EMG variance distribution from rectified and smoothed EMG signals, thereby allowing model parameter estimation with a small number of samples. Moreover, the prediction of the variance distribution under strong muscle contraction from EMG signals recorded under low muscle contraction, and the related artificial EMG generation, are also described. In experiments, the reproduction capability of the proposed model was evaluated through comparison with measured EMG signals in terms of amplitude, frequency content, and EMG distribution; the results demonstrate that the proposed model can reproduce the features of measured EMG signals. Further, utilizing the generated EMG signals as training data for a neural network resulted in the classification of upper limb motion with higher accuracy than learning from measured EMG signals alone. This indicates that the proposed model is also applicable to motion classification.


Introduction
Surface electromyogram (EMG) signals obtained from the skin surface represent the action potential generated from muscle fibers constituting each motor unit, and reflect muscle [...] H and variance $\sigma_t^2$. Variance $\sigma_t^2$ is the value at $t$ of a random variable $\sigma^2$ having a distribution determined by a commanded muscle force component of variance $\bar{\sigma}^2$, derived from Eq (1), and signal-dependent noise $\varepsilon$ according to the commanded muscle force $\bar{F}$. The relationship between $\bar{F}$ and $\bar{\sigma}$ can be expressed as

$$\bar{\sigma} = k\bar{F}^{a} \tag{1}$$

where $k$ and $a$ are constants that can be experimentally estimated [6,7]. Then, $\sigma^2$ is represented by the sum of $\bar{\sigma}^2$ and $\varepsilon$:

$$\sigma^2 = \bar{\sigma}^2 + \varepsilon \tag{2}$$

Assuming that $\varepsilon$ is a random noise with a zero mean, the mean $E[\sigma^2]$ and variance $\mathrm{Var}[\sigma^2]$ of $\sigma^2$ are calculated as follows:

$$E[\sigma^2] = \bar{\sigma}^2 \tag{3}$$

$$\mathrm{Var}[\sigma^2] = E\left[(\sigma^2 - \bar{\sigma}^2)^2\right] = E[\varepsilon^2] = \mathrm{Var}[\varepsilon] \tag{4}$$

Considering that $\sigma^2 > 0$, the inverse gamma distribution $\mathrm{IG}(\alpha, \beta)$ is chosen as the distribution of $\sigma^2$ [13]:

$$\mathrm{IG}(\sigma^2; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} (\sigma^2)^{-\alpha-1} \exp\left(-\frac{\beta}{\sigma^2}\right) \tag{5}$$

where $\alpha$ and $\beta$ are parameters that determine the inverse gamma distribution and are referred to as the shape parameter and the scale parameter, respectively [20]. The relationships between $[\alpha, \beta]$ and the mean and variance of $\sigma^2$ are expressed as follows:

$$\bar{\sigma}^2 = E[\sigma^2] = \frac{\beta}{\alpha - 1} \tag{6}$$

$$\mathrm{Var}[\varepsilon] = \mathrm{Var}[\sigma^2] = \frac{\beta^2}{(\alpha - 1)^2(\alpha - 2)} \tag{7}$$

The artificial EMG signal $z_t$ can be defined as the product of $w'_t$ and the random number series $\sigma_t$, whose square $\sigma_t^2$ is generated from the inverse gamma distribution determined by the mean $\bar{\sigma}^2$ and the variance $\mathrm{Var}[\varepsilon]$:

$$z_t = \sigma_t w'_t \tag{8}$$

where $w'_t$ is the Gaussian noise process, which has the same power spectrum as stationary EMG signals, and is generated from the following shaping filter based on an $M$th-order autoregressive (AR) model:

$$w'_t = -\sum_{j=1}^{M} a_j w'_{t-j} + \sqrt{v}\, w_t \tag{9}$$

where $w_t$ is white Gaussian noise with mean = 0 and variance = 1, $v$ is the estimated variance of error, and $a_j$ ($j = 1, \dots, M$) is the coefficient of the AR model. Because these AR parameters are estimated from normalized EMG signals with variance = 1, $w'_t$ becomes Gaussian noise with mean = 0 and variance = 1.
The above procedure is the general form for generating artificial EMG containing signal-dependent noise superimposed according to the commanded muscle force (expressed as the commanded muscle force-based generation in Fig 1). However, this procedure requires the estimation of the parameters in Eq (1), which change depending on the skin condition and the location of the electrodes for each subject. Therefore, this paper proposes another approach called EMG variance-based generation. The proposed approach uses rectified and smoothed EMG signals to directly estimate the distribution of $\sigma^2$ from a small number of EMG samples recorded in advance.
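As a rough illustration of the generation procedure described above, the following Python sketch draws a variance series from an inverse gamma distribution matched to a given mean and variance, shapes white Gaussian noise with an AR recursion, and multiplies the two. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sign convention of the AR recursion, the AR(1) coefficient used in the demonstration, and the sampling of the inverse gamma distribution as the reciprocal of a gamma variate (via Python's `random.gammavariate`) are all illustrative choices.

```python
import math
import random


def inverse_gamma_params(mean_var, var_of_var):
    """Moment-match IG(alpha, beta): mean = beta/(alpha-1), var = mean^2/(alpha-2)."""
    alpha = mean_var ** 2 / var_of_var + 2.0
    beta = mean_var * (alpha - 1.0)
    return alpha, beta


def sample_inverse_gamma(alpha, beta, rng):
    """If G ~ Gamma(shape=alpha, scale=1/beta), then 1/G ~ IG(alpha, beta)."""
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)


def shaping_filter(white, ar_coeffs, noise_var):
    """AR recursion (assumed sign): out[t] = -sum_j a_j*out[t-j] + sqrt(v)*white[t]."""
    gain = math.sqrt(noise_var)
    out = []
    for t, w in enumerate(white):
        acc = gain * w
        for j, a in enumerate(ar_coeffs, start=1):
            if t - j >= 0:
                acc -= a * out[t - j]
        out.append(acc)
    return out


def generate_emg(n, mean_var, var_of_var, ar_coeffs, noise_var, seed=0):
    """Artificial EMG z_t = sigma_t * w'_t with sigma_t^2 drawn from IG(alpha, beta)."""
    rng = random.Random(seed)
    alpha, beta = inverse_gamma_params(mean_var, var_of_var)
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    shaped = shaping_filter(white, ar_coeffs, noise_var)
    sigma = [math.sqrt(sample_inverse_gamma(alpha, beta, rng)) for _ in range(n)]
    return [s * w for s, w in zip(sigma, shaped)]


if __name__ == "__main__":
    # Hypothetical AR(1) example: a_1 = -0.5 with v = 0.75 gives unit output variance.
    z = generate_emg(5000, mean_var=2.0, var_of_var=1.0,
                     ar_coeffs=[-0.5], noise_var=0.75)
    print(sum(v * v for v in z) / len(z))  # sample power, close to mean_var
```

In practice the AR coefficients and error variance would come from an AR fit to normalized measured EMG, and the mean and variance of the variance from the estimation step described in the next section.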

EMG variance-based generation
Hayashi et al. assumed that an EMG signal $x$ follows a Gaussian distribution with a mean of zero and a variance that follows an inverse gamma distribution. Consequently, they proposed an approximate estimation method for the mean and variance of the variance that utilizes the property of rectified-smoothed EMG signals [13]. Unlike [13], we propose a new method to determine $\bar{\sigma}^2$ and $\mathrm{Var}[\varepsilon]$ by proportional modulation with an introduced gain $\eta$: [...] where $a_0, a_1, \dots, a_{N-1}$ and $b_0, b_1, \dots, b_N$ are the coefficients of an $N$th-order low-pass filter, $\eta$ is the gain for proportional variance modulation, $y_t$ is a rectified-smoothed EMG signal at $t$, $E[y_t]$ is the expectation of $y_t$, and $\hat{\alpha}$ is the shape parameter of the variance distribution. From Eqs (3) and (4) [...] $\bar{\sigma}^2$ by setting $\hat{\alpha}$ in advance. $\hat{\alpha}$ is fixed and is estimated beforehand by maximizing the marginal likelihood of the pre-measured EMG dataset $x^{\mathrm{pre}}$ using the steepest descent method. Note that the proportional gain $\eta$ modulates the EMG variance; thus, in the case of $\eta = 1$, the proposed model generates artificial EMG signals to reproduce the variance distribution of the measured signals.
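The rectification and smoothing step that produces $y_t$ can be sketched as follows. This is only an illustrative sketch: it assumes the second-order Butterworth low-pass filter with a 1 Hz cutoff and the 1000 Hz sampling frequency reported in the Experiments section, derives the filter coefficients with the bilinear transform, and uses hypothetical function names. It does not reproduce the estimation equations themselves.

```python
import math


def butterworth2_lowpass(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass biquad via the bilinear transform.

    Returns (b0, b1, b2) and (a1, a2) for
    y[t] = b0*x[t] + b1*x[t-1] + b2*x[t-2] - a1*y[t-1] - a2*y[t-2].
    """
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def rectify_and_smooth(x, cutoff_hz=1.0, fs_hz=1000.0):
    """Full-wave rectify x, then low-pass filter it (zero initial conditions)."""
    b, a = butterworth2_lowpass(cutoff_hz, fs_hz)
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for sample in x:
        r = abs(sample)  # full-wave rectification
        out = b[0] * r + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, r
        y2, y1 = y1, out
        y.append(out)
    return y
```

Because the filter starts from zero initial conditions, the first second or so of output is a transient; discarding the early part of a recording, as done with the first five seconds in Experiment 1, avoids it.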
As stated above, $\bar{\sigma}^2$ and $\mathrm{Var}[\varepsilon]$ can be estimated and modulated using the rectified-smoothed signal $y_t$, shape parameter $\hat{\alpha}$, and proportional gain $\eta$ based on Eqs (10) and (11).
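To see why fixing the shape parameter in advance leaves only one quantity to estimate, the standard moments of the inverse gamma distribution can be rearranged as below. This is a worked derivation from the textbook mean and variance of $\mathrm{IG}(\alpha, \beta)$, valid for $\hat{\alpha} > 2$; it is offered as a sketch and is not claimed to be the exact form of Eq (11).

```latex
\begin{align}
\bar{\sigma}^2 &= \frac{\beta}{\hat{\alpha}-1}
  \quad\Longrightarrow\quad \beta = \bar{\sigma}^2(\hat{\alpha}-1), \\
\mathrm{Var}[\varepsilon] &= \frac{\beta^2}{(\hat{\alpha}-1)^2(\hat{\alpha}-2)}
  = \frac{(\bar{\sigma}^2)^2}{\hat{\alpha}-2}, \qquad \hat{\alpha} > 2 .
\end{align}
```

For example, with $\hat{\alpha} = 4$ and $\bar{\sigma}^2 = 2$, this gives $\beta = 6$ and $\mathrm{Var}[\varepsilon] = 4/2 = 2$, which agrees with $\beta^2 / ((\hat{\alpha}-1)^2(\hat{\alpha}-2)) = 36/18$.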

Experiments
Ethics statement. This study was approved by the Human Research Ethics Committee of the Hyogo Institute of Assistive Technology. All subjects were told the aim of the experiments and provided written informed consent before participating in the trial. The individual in this manuscript has given written informed consent (as outlined in PLOS consent form) to publish these case details.
Subjects. Ten healthy young adults (males, age range: 22-24 years; mean age: 21.8 ± 1.0 years) and four healthy young adults (males, age range: 21-23 years; mean age: 22.6 ± 0.8 years) were recruited for Experiment 1 and Experiment 2, respectively. Both experiments were conducted at the Hyogo Rehabilitation Center from August 2015 to January 2016. All subjects were right-handed and were included on the basis of the following criteria: no previous physical, neurological, or sensory disorders, no medication that might influence their muscle activity, and no history of intense exercise in the previous 24 hours.

Experiment 1: Evaluation of generated artificial EMG signals. We conducted an evaluation experiment for artificial EMG signals generated using the proposed model. In the experiment, we first measured the EMG signals during constant isometric contraction of the biceps brachii of ten healthy subjects. Using the measured EMG signals, the variance distribution parameters $\bar{\sigma}^2$ and $\mathrm{Var}[\varepsilon]$ and the parameters of the shaping filter H were estimated. Artificial EMG signals were then generated based on the estimated parameters. Further, the accuracy of the generated signals was evaluated by comparing them with the measured EMG signals.
For EMG signal recording, the subjects were seated, with the right upper arm pointing downward, the right forearm bent forward to the horizontal, and the palm turned upward (Fig 2). EMG signals were recorded using a pair of electrodes attached to the skin surface of the biceps brachii at a sampling frequency of 1000 Hz while the subjects were weighted with a load hanging vertically on the right wrist with the elbow on a desk. The subjects were instructed to maintain the posture for 10 seconds with the elbow at 90°. The load weight was varied through values of 500, 1000, 1500, and 2000 g, and one trial was conducted for each load weight. The range of these load weights was selected on the basis of the following criteria: it appears in everyday activities, subjects do not feel muscle fatigue, and the differences in the muscle activation levels can be acquired clearly. The latter five-second part of the ten seconds of recorded data was used for comparison and variance distribution estimation. A multi-telemeter system (NIHON KOHDEN, WEB-5000, high-frequency cutoff: 100 Hz, low-frequency cutoff: 5.4 Hz) was used for measurement.
To evaluate the reproducibility of the proposed model with respect to the measured EMG signals, the estimation of variance distribution was conducted for each load weight. The model order of the shaping filter was determined as M = 20 based on the Bayesian information criterion (BIC) [21], and the estimated variance of error v and the model coefficients a j were determined using the Burg method [22]. The proportional gain was determined as η = 1.0, meaning that the variance modulation was not applied. A second-order Butterworth low-pass filter (cutoff frequency: 1 Hz) was used to smooth the EMG signals. Generation of artificial EMG signals using estimated parameters was also conducted, with ten trials for each load weight. Tanizaki's method [23] and Box-Muller's method [24] were used to generate inverse gamma and Gaussian random numbers, respectively.
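The Box-Muller transform mentioned above can be sketched in a few lines of Python. This is a generic textbook version with a hypothetical function name, not the authors' code; Tanizaki's method for inverse gamma random numbers is not reproduced here.

```python
import math
import random


def box_muller(rng):
    """Return two independent N(0, 1) samples from two uniform samples."""
    u1 = 1.0 - rng.random()  # shift to (0, 1] so that log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [value for _ in range(5000) for value in box_muller(rng)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(mean, var)  # expected to be near 0 and 1
```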
The accuracy of the generated artificial EMG was evaluated in terms of average amplitude, frequency component, and kurtosis of the EMG distribution. In general, the amplitude and the frequency component are the important features of an EMG signal. Kurtosis is the standardized fourth central moment of a distribution, and was utilized to evaluate the influence of the variation in variance on the shape of the EMG distribution. The average amplitude was determined from the average value of the rectified and smoothed signals. Further, the absolute percentage errors in the average amplitudes between the measured and artificial EMG signals were calculated for each load weight. With respect to the frequency component, the power spectrum densities of the measured and artificial EMG signals were calculated using the $M'$th-order AR model as follows:

$$P(f) = \frac{v'}{\left|1 + \sum_{j=1}^{M'} a'_j e^{-i 2\pi f j}\right|^2} \tag{12}$$

where $f$ is frequency, $a'_j$ ($j = 1, \dots, M'$) is the AR coefficient, and $v'$ is the estimated variance of error. The model order $M'$ was determined as $M' = 20$ based on the BIC. $a'_j$ ($j = 1, \dots, M'$) and $v'$ were determined using the Burg method. The correlation coefficients in the power spectrum densities between the measured and artificial EMG signals were then calculated. Finally, the kurtosis of the EMG distribution was calculated for each load weight. Note that kurtosis is a measure of the tailedness of a probability distribution. The sample kurtosis for a univariate random process $x = \{x_n\}_{n=1}^{N}$ can be calculated as follows:

$$K = \frac{1}{N} \sum_{n=1}^{N} \left(\frac{x_n - \bar{x}}{s}\right)^4 \tag{13}$$

where $\bar{x}$ and $s$ are the mean value and standard deviation of $x$, respectively.

Fig 2. The subjects were seated with the right upper arm pointing downward, the right forearm bent forward to the horizontal, and the palm turned upward. EMG signals were recorded from a pair of electrodes attached to the biceps brachii while the subjects were weighted with a load on the right wrist and maintained the right elbow on a desk.
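The frequency-domain comparison described above (an AR-model power spectrum followed by a correlation coefficient) can be sketched as follows. The sign convention of the AR polynomial, the explicit division by the sampling frequency, and the function names are assumptions made for this illustration; fitting the coefficients with the Burg method is not shown.

```python
import cmath
import math


def ar_psd(ar_coeffs, noise_var, freqs_hz, fs_hz):
    """AR power spectrum v / |1 + sum_j a_j exp(-i 2 pi f j / fs)|^2 (unnormalized)."""
    psd = []
    for f in freqs_hz:
        denom = 1.0 + sum(a * cmath.exp(-2j * math.pi * f * j / fs_hz)
                          for j, a in enumerate(ar_coeffs, start=1))
        psd.append(noise_var / abs(denom) ** 2)
    return psd


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


if __name__ == "__main__":
    freqs = [float(f) for f in range(0, 501, 5)]
    # Hypothetical AR(1) models standing in for measured and artificial signals.
    measured = ar_psd([-0.5], 0.75, freqs, 1000.0)
    artificial = ar_psd([-0.45], 0.80, freqs, 1000.0)
    print(pearson(measured, artificial))
```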
The root mean square error (RMSE) in the kurtosis between the measured and the artificial EMG signals was then calculated over all trials as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left(K_t - \hat{K}_t\right)^2} \tag{14}$$

where $T$ is the number of trials ($T = 10$) for each load weight, and $K_t$ and $\hat{K}_t$ are the sample kurtosis of the measured and the artificial EMG signals at trial $t$, respectively.

In the proposed model, the shape parameter $\hat{\alpha}$ and the parameters of the shaping filter need to be set before artificial EMG generation, using pre-measured EMG signals for each subject. Hayashi et al. [13] assumed that the shape parameter is a constant within an individual regardless of muscle activation levels. However, this assumption has not been verified experimentally. Therefore, because the muscle activation level of the pre-measured EMG signals can affect the generation accuracy of the proposed model, generation and evaluation of the artificial EMG signals were conducted by changing the source of the preset parameters. These preset parameters were set for each subject using EMG signals recorded in advance under each load weight.
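The kurtosis-based evaluation can be sketched as follows. The sketch assumes the non-excess kurtosis (a Gaussian gives 3) with the population (1/N) standard deviation, which may differ from the exact estimator used in the experiments; the function names and the numbers in the demonstration are hypothetical.

```python
import math


def sample_kurtosis(x):
    """Standardized fourth central moment with 1/N normalization (assumed form)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / var ** 2


def rmse(measured, artificial):
    """Root mean square error between paired per-trial values."""
    if len(measured) != len(artificial):
        raise ValueError("both sequences must have the same number of trials")
    total = sum((m - a) ** 2 for m, a in zip(measured, artificial))
    return math.sqrt(total / len(measured))


if __name__ == "__main__":
    # Hypothetical per-trial kurtosis values, for illustration only.
    k_measured = [3.9, 4.1, 3.7]
    k_artificial = [3.8, 4.0, 3.9]
    print(rmse(k_measured, k_artificial))
```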
For comparison, artificial EMG signals were also generated based on Hogan and Mann's model [6,7] and evaluated. The major difference between the method based on Hogan and Mann's model and the proposed method is that, in the former, the EMG variance is handled as a constant and is estimated using maximum likelihood estimation. Finally, artificial EMG generation based on the proposed model with variance modulation by the proportional gain was conducted to evaluate its generation accuracy. The EMG signals recorded under a 1000 g load were set as the reference, and the gain $\eta$ in Eq (10) was defined as follows:

$$\eta = \frac{f_w}{f_{1000}} \tag{15}$$

where $f_w$ is the muscle force of the biceps brachii at load weight $w$ = 500, 1000, 1500, 2000 g, and is calculated from each load weight following the procedure adopted by Hayashi et al. [13], with the body weight and the length from the elbow axis to the ulnar styloid. Note that by using this proportional gain, the artificial EMG signals at each load weight were generated only from the measured signals under a 1000 g load. These generated EMG signals were evaluated by comparing them with the measured EMG signals at each load weight. We compared the average value of the ten subjects in each index among the proposed method, the constant variance-based method, and the proposed method with the variance modulation. In this comparison, the shape parameter $\hat{\alpha}$ and the parameters of the shaping filter were set for each subject using the pre-measured EMG signals under a 2000 g load, and were held fixed throughout the comparison.

Experiment 2: Motion classification. To evaluate the applicability of the proposed model to motion classification, a classification experiment was conducted in which the generated EMG signals were utilized as training data for a neural network. This experiment was conducted on four healthy subjects (Subjects A-D). EMG signals were recorded using six electrodes (L = 6; Ch. 1: extensor carpi ulnaris; Ch. 2: flexor digitorum profundus; Ch. 3: extensor digitorum; Ch. 4: flexor carpi ulnaris; Ch. 5: triceps brachii; Ch. 6: biceps brachii) at a sampling frequency of 1000 Hz (Fig 3). The subjects performed six motions (C = 6): flexion, extension, supination, pronation, hand open, and hand grasp. The EMG measurement system and the parameters for the smoothing process were the same as in the evaluation experiment. Prior to the experiment, pre-measurement of EMG signals during a maximum voluntary contraction (MVC) was conducted for each motion. The channel corresponding to the agonist muscle in each motion was then determined as follows:

$$m_c = \operatorname*{arg\,max}_{l} E_t\left[\hat{y}_t^{(l)}\right] \tag{16}$$

where $\hat{y}_t^{(l)}$ is a pre-measured rectified-smoothed EMG signal of channel $l$ at time $t$ and $E_t[\hat{y}_t^{(l)}]$ is the expectation of $\hat{y}_t^{(l)}$ with respect to $t$. The shape parameter $\hat{\alpha}$ and the parameters of the shaping filter were also calculated from the pre-measured EMG signals for each subject.
In EMG signal recording, the muscle activation level $r_t^{(l)}$ ($l = 1, 2, \dots, L$) in each channel was calculated as a percent of MVC (%MVC) simultaneously with the measurement:

$$r_t^{(l)} = \frac{y_t^{(l)}}{y_{\max}^{(l)}} \times 100 \tag{17}$$

where $y_t^{(l)}$ is the rectified and smoothed EMG signal of channel $l$ at $t$, and $y_{\max}^{(l)}$ is the maximum value of the pre-measured $\hat{y}_t^{(l)}$ during the MVC. The subjects were presented with the muscle activation level of the agonist muscle $r_t^{(m)}$ as a bar graph in real time (Fig 4). The white vertical line in Fig 4 shows the desired muscle activation level, and the subjects were instructed to perform each motion while keeping the bar at this line. First, the subjects performed and maintained each motion for 10 seconds with the desired muscle activation level $r_t^{(m)}$ at 40%, and the training data were recorded. Next, the task of maintaining each motion for 10 seconds was conducted over 10 trials under two conditions, with the desired muscle activation level at 40% and at 80%, and the testing data were recorded. Feature extraction for the training and testing data was then conducted according to the method proposed by Fukuda et al. [4], in which the measured EMG signals are rectified and smoothed, and then normalized to make the sum of all the channels equal to 1.0. In the motion classification task, a neural network called the log-linearized Gaussian mixture network (LLGMN) [4,25], which can estimate the posterior probability of each class, was used. For LLGMN learning, 100 samples of the artificial EMG signals at 80%MVC, which were generated based on the proposed model, and 100 samples of the measured EMG signals, which were randomly sampled from the training data at 40%MVC, were used for each motion.
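The per-sample quantities used in this experiment (the %MVC activation level, the channel-sum normalization of the features, and the choice of an agonist channel) can be sketched as follows. The function names are hypothetical, and selecting the agonist channel as the one with the largest time-averaged pre-measured amplitude is an assumption made for illustration.

```python
def percent_mvc(y, y_max):
    """Muscle activation level as a percentage of the MVC amplitude."""
    return 100.0 * y / y_max


def normalize_channels(smoothed):
    """Scale one time sample so that the values of all channels sum to 1.0."""
    total = sum(smoothed)
    if total <= 0.0:
        raise ValueError("channel sum must be positive")
    return [v / total for v in smoothed]


def agonist_channel(pre_measured):
    """Index of the channel with the largest time-averaged amplitude (assumed rule).

    pre_measured is a list of channels, each a list of rectified-smoothed samples.
    """
    means = [sum(channel) / len(channel) for channel in pre_measured]
    return max(range(len(means)), key=lambda l: means[l])


if __name__ == "__main__":
    sample = [0.1, 0.3, 0.2, 0.1, 0.1, 0.2]  # hypothetical six-channel sample
    print(normalize_channels(sample))
    print(percent_mvc(0.3, 0.75))
```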
To calculate the mean variance $\bar{\sigma}^{2}_{(c,l)}$ of channel $l$ in motion $c$ ($c = 1, 2, \dots, C$) at 80%MVC, Eq (10) was vectorized: [...] where $\eta^{(c,l)}$ is the proportional gain of channel $l$ in motion $c$, and is defined as follows: [...] where $\lambda^{(c,m_c)}$ is the time-averaged muscle activation level of channel $m_c$, which corresponds to the agonist muscle in motion $c$, at 40%MVC. The classification rate was calculated for 1000 samples, taken from the 5000th to the 6000th sample of the testing data, and the average classification rate was derived from ten trials.
In the proposed method, 200 samples, including 100 samples of the measured data and 100 samples of the artificially generated data, were used for LLGMN learning, as described above. In general, the classification accuracy of machine-learning-based motion classification tends to improve simply by increasing the number of training samples. For comparison, therefore, the average classification rate was also calculated for two cases: (1) learning conducted with only 100 samples of the measured data at 40%MVC, and (2) learning conducted with 200 samples accumulated by simply increasing the number of samples of the training data at 40%MVC by random sampling.

Fig 5 shows that the amplitudes of the measured and artificial EMG signals increased together as the load weights increased. In addition, the amplitudes of the artificial EMG signals are similar to those of the measured EMG signals for each load weight. This result can be confirmed from the average absolute percentage errors in the amplitude shown in Fig 6. In the case where the measured EMG signals were reproduced for each load weight, the error rates in the amplitude of the proposed method are approximately 4% for all sources of the preset parameters and the load weights. No significant differences were found between the results of the proposed method without modulation and those of the constant variance-based method (Fig 6b). However, the standard deviations of the constant variance-based method are larger than those of the proposed method because the accuracy of the variance estimation based on Hayashi et al.'s method [13] is better than that of the estimation based on the maximum likelihood method.
When the proportional variance modulation with the gain is used, the proposed model can predict and generate artificial EMG signals for other load weights with an error rate of 10% or less, although the reproducibility of the EMG amplitude tends to be worse at load weights other than the reference load weight of 1000 g.

Fig 7 shows that the correlation coefficients in the power spectrum densities exhibit strong correlations of over 0.90 regardless of the source of the preset parameters and the load weights for each subject. In addition, no significant differences were confirmed among the three methods because the frequency component was set in the same way in each method (Fig 7b). The proposed model can therefore reproduce the frequency component with high accuracy because the frequency components of the proposed model are determined using the AR model obtained from each subject.

Discussion
In Fig 8a, no significant differences in RMSE in the kurtosis due to the differences in the recording source of the preset parameters are shown. Fig 8b shows that the RMSE in the kurtosis for the proposed method without/with modulation tends to be lower than that for the constant variance-based method. This result indicates that the artificial EMG signals generated using the proposed model can express the kurtosis of the measured EMG more precisely than those generated by the constant variance-based method. In the case of EMG generation using the constant variance-based method, the artificial EMG follows a Gaussian distribution. However, Hunter et al. experimentally showed that the probability density of EMG is more sharply peaked near zero than a Gaussian distribution [26]. Bilodeau et al. and Nazarpour et al. also reported that the measured EMG density has a larger kurtosis than a Gaussian distribution [27,28]. In contrast, because the variance is randomly determined from the variance distribution in the proposed model, the artificial EMG signals generated based on the proposed model do not follow a Gaussian distribution. Instead, they follow a distribution with a kurtosis that is more similar to the measured EMG. This indicates that the proposed model based on variance distribution enables artificial EMG signal generation with consideration of the signal-dependent noise affecting fluctuations in the EMG variance value. It should be noted that this representation capability is the most significant point of the proposed model as conventional generation methods cannot generate artificial EMG signals including this noise.
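The claim that drawing the variance from a distribution yields a kurtosis larger than the Gaussian value can be checked with a short calculation. The derivation below is a sketch that assumes $\sigma_t^2 \sim \mathrm{IG}(\alpha, \beta)$ with $\alpha > 2$, drawn independently of a unit-variance Gaussian $w'_t$, and uses the non-excess kurtosis (equal to 3 for a Gaussian).

```latex
\begin{align}
E[z_t^2] &= E[\sigma_t^2]\,E[w_t'^2] = \bar{\sigma}^2, \\
E[z_t^4] &= E[\sigma_t^4]\,E[w_t'^4]
          = 3\left(\mathrm{Var}[\sigma^2] + \bar{\sigma}^4\right)
          = 3\,\bar{\sigma}^4\,\frac{\alpha-1}{\alpha-2}, \\
\frac{E[z_t^4]}{E[z_t^2]^2} &= 3\,\frac{\alpha-1}{\alpha-2} > 3 .
\end{align}
```

For instance, $\alpha = 6$ gives a kurtosis of $3 \times 5/4 = 3.75$, and the value approaches 3 as $\alpha \to \infty$, where the variance distribution concentrates at $\bar{\sigma}^2$ and the constant-variance Gaussian case is recovered.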
Thus, it is clear that the proposed model can generate artificial EMG signals that reproduce the amplitude, frequency component, and kurtosis of the measured EMG signals. The usability of the proposed artificial EMG generation model is also suggested from the viewpoint of the preset parameter setting because the recording source of the preset parameters does not significantly influence the generation accuracy of the model. Moreover, artificial EMG signals at an arbitrary muscle activation level can be generated from EMG signals recorded under other muscle activation levels with a certain precision if the proportional gain in Eq (10) is appropriately given.

Fig 9 shows that the artificially generated EMG signals based on the proposed model possess features that are close to those of the measured EMG signals at 80%MVC. This result indicates that multi-channel artificial EMG signals with a high muscle contraction level can be generated from pre-measured EMG signals with a low muscle contraction level using the proposed method. In Fig 10a, the testing data for the muscle activation level of 40%MVC show that there are no significant differences between the methods in the individual subjects, except for Subject C, or in the averages over all the subjects. These results suggest that the classification ability of the proposed method is equal to or better than that of every other method when the muscle activation levels of the training and testing data are equal.
By contrast, on the testing data for the muscle activation level of 80%MVC, the proposed method shows significantly higher classification rates than the other methods in all the subjects and in their average (Fig 10b). The decreases in the classification rates for the methods in which only the measured EMG signals are learned can probably be attributed to the increased fluctuation of the EMG patterns of the test data caused by signal-dependent noise during strong muscle contraction. Increasing the number of learning samples from the measured signals tends to improve the classification rate. However, the proposed method can generate artificial EMG signals for strong muscle contraction, with noise superimposed onto the EMG according to the increased muscle force, using only the measured EMG signals recorded under low muscle contraction. This facilitates accurate classification without increasing the burden on subjects during the training data collection. These results indicate that the proposed artificial EMG generation model is highly applicable to motion classification via machine learning.
The motion classification experiment conducted in this study was performed only in an offline condition. However, it is known that high classification accuracy in an offline condition yields better classification performance in a real-time context [29]. The classification accuracy in the online condition also tends to be better than that in the offline condition because the subjects can adjust their EMG pattern according to the feedback of the classification results [30]. Therefore, it can be expected that the proposed classification method will also work effectively in an online environment. In addition, the proposed method should have high applicability in a real-time context because it can directly generate artificial EMG signals for training data using measured EMG signals by setting the parameters in advance. On the other hand, because many factors (e.g. sensor movement, sweating, and muscle fatigue) affect classification performance in the online condition, verification of the robustness of the proposed method against such environments will have to be carried out in future studies.

Conclusion
This paper proposed an artificial EMG generation model based on signal-dependent noise. The proposed model estimates the variance distribution of EMG signals using the inverse gamma distribution, and generates artificial EMG signals with signal-dependent noise superimposed according to muscle activation levels. This is the major distinctive feature of our method compared with existing artificial EMG generation models.
The evaluations conducted on the generated artificial EMG signals and the comparison in terms of amplitude, frequency component, and kurtosis of EMG distribution revealed that the proposed variance distribution-based generation method can reproduce the features of the measured EMG signals during isometric muscle contraction. In the motion classification experiments conducted, the classification rates during strong muscle contraction were improved by using artificial EMG signals for training data. Thus, it is clear that the proposed model can generate artificial EMG signals having similar features to the measured EMG signals by setting suitable variance distribution and frequency characteristics. Moreover, it is possible to effectively apply the proposed model to motion classification.
A limitation of the proposed model and the proposed classification method is an assumption that a linear relationship exists between the muscle activation level and the mean of σ in Eqs (10) and (18). Previous studies found that the relationship between the muscle force and the EMG amplitude is sometimes nonlinear [31,32]. In future research, therefore, it will be necessary to consider this nonlinear relationship for a more accurate estimation and generation of EMG signals during strong muscle activation. Further, we would like to apply the proposed model to control myoelectric hands.

Author Contributions
Conceptualization: TT HH.