Abstract
Machine learning has become a popular, convenient and efficient computing tool applied in many industries. The multi-hole pressure probe is an important technique widely used in flow vector measurement, and integrating machine learning into multi-hole probe measurement is a new attempt. In this work, six typical supervised learning methods from the scikit-learn library are first selected for parameter tuning. Based on the optimal parameters, a comprehensive evaluation is conducted from four aspects: prediction accuracy, prediction efficiency, feature sensitivity and robustness to the failure of a hole port. The results show that the random forests and K-nearest neighbors algorithms have the best overall prediction performance. Compared with the in-house traditional algorithm, the machine learning algorithms have great advantages in computational efficiency and coding convenience. The multi-layer perceptron and support vector machines are the most time-consuming of the six algorithms. The prediction accuracy of all the algorithms is very sensitive to the features, and features based on physical knowledge yield highly accurate predictions. Finally, the KNN algorithm is successfully applied to field measurements of the angle of attack of a wind turbine blade. These findings provide a new reference for the application of machine learning methods in multi-hole probe calibration and measurement.
Citation: Zhou S, Wu G, Dong Y, Ni Y, Hao Y, Jiang Y, et al. (2023) Evaluations on supervised learning methods in the calibration of seven-hole pressure probes. PLoS ONE 18(1): e0277672. https://doi.org/10.1371/journal.pone.0277672
Editor: Joseph Donlan, PLOS ONE, UNITED KINGDOM
Received: December 6, 2021; Accepted: November 1, 2022; Published: January 23, 2023
Copyright: © 2023 Zhou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files.
Funding: The Southern Marine Science and Engineering Guangdong Laboratory (Zhanjiang) provided support to SZ and YD [ZJW-2019-02], The Guangdong Branch of National Engineering Research Center for Offshore Windpower provided support to SZ [2019B090904005]. The National Engineering Research Center for Offshore Windpower [HSFD22001], the National Natural Science Foundation of China [52276187], the Fundamental Research Funds for the Central Universities [2021MS029], and North China Electric Power University provided support to GW. Guangdong Haizhuang Offshore Windpower Research Center Company Limited provided support in the form of salaries for SZ, YN, and YH. CSSC Haizhuang Windpower Company Limited provided support in the form of salaries for SZ, YD, YN, YJ, YH, CZ, and ZT. The specific roles of these authors are articulated in the ‘author contributions’ section. CSSC Haizhuang Windpower Company Limited played a role in data collection and analysis, and preparation of the manuscript. The North China Electric Power University played a role in study design and decision to publish.
Competing interests: The authors have read the journal’s policy and have the following competing interests: Guangdong Haizhuang Offshore Windpower Research Center Company Limited provided support in the form of salaries for Shuni Zhou, Yuanxiang Ni, and Yuheng Hao. CSSC Haizhuang Windpower Company Limited provided support in the form of salaries for Shuni Zhou, Yehong Dong, Yuanxiang Ni, Yunhe Jiang, Yuheng Hao, Chuang Zhou, and Zhiyu Tao. There are not any patents, products in development or marketed products associated with this research. The support received from Guangdong Haizhuang Offshore Windpower Research Center Company Limited and/or CSSC Haizhuang Windpower Company Limited didn’t alter our adherence to PLOS ONE policies on sharing data and materials.
Abbreviations: Cpi, pressure coefficient of port pressure; Cq, Cps, pressure coefficient of static pressure; Cα, pressure coefficient of angle of attack; Cβ, pressure coefficient of sideslip angle; Cϕ, pressure coefficient of roll angle; Co, Cpt, pressure coefficient of total pressure; Cθ, pressure coefficient of pitch angle; Pi, port pressure; Ps, static pressure; Pt, total pressure; U, freestream velocity; u, x components of freestream velocity; v, y components of freestream velocity; w, z components of freestream velocity; x, x coordinate; y, y coordinate; z, z coordinate; α, angle of attack for seven-hole probe; β, sideslip angle for seven-hole probe; ϕ, roll angle for seven-hole probe; θ, pitch angle for seven-hole probe; σ, standard error
1. Introduction
Multi-hole probes (MHPs), which can measure the flow angle and speed of airstreams, are widely used in many fields, such as wind-tunnel and field tests for aviation [1], automobiles [2], compressors [3] and wind turbines [4–7]. Optical techniques for measuring flow velocity, which do not contact the measured fluid, have advantages in capturing the undisturbed flow, but they have limitations in test conditions and cost. For instance, particle image velocimetry (PIV) is difficult to apply to tests with a large field of view, such as field tests of large-scale wind turbines, and complex optical adjustments are necessary before accurate flow measurement can be achieved [8–11]. Doppler LiDAR usually has a lower spatial resolution for flow structure measurement [12, 13]. In contrast, multi-hole probes in many cases provide the easiest-to-use and most cost-effective method for three-velocity-component and pressure measurements in laboratory and industrial environments [14–16]. Moreover, compared with optical techniques, MHPs are more robust and reliable for industrial applications in harsh working conditions, such as airspeed tubes on airplanes, atmospheric measurements with remotely piloted aircraft [17], and inflow measurements on rotating wind turbine blades [5, 7].
The minimum number of holes used for three-dimensional flow measurement is four, while probes with five or seven holes are common and probes with more than ten holes can also be found [18]. As the number of holes increases, the measurable range of the angle of attack widens. There are also probes with only one pressure port, which is continuously rotated inside the probe [19]. The shape of the probe (conical, hemispherical or faceted) affects the maximum angle of attack as well as the sensitivity to Reynolds number changes [20], due to the different points of flow separation.
The measurement principle of multi-hole pressure probes is based on the correlation between the pressure distribution over a solid surface immersed in the flow and the flow angles [21–23]. After measuring the pressure distribution on the surface of the probe head, the flow velocity and angles can be calculated using the correlation matrix. Therefore, the accuracy of probe measurement is dominated by the correlation matrix calculation. Due to manufacturing errors, each probe must be calibrated individually, that is, the correlation matrix of each probe must be obtained. It is generally obtained through the following steps [24, 25]. First, port pressures, static pressure and total pressure are collected at flow conditions whose velocity magnitude and direction are known, generating the calibration data set. Usually, this procedure is conducted in a wind tunnel or free-jet nozzle with an actuator device adjusting the probe attitude angles. Second, the pressure data are transformed into appropriate non-dimensional directional coefficients, which are sensitive to the flow variables. The definitions of directional coefficients by Krause [26] and Gallington [21] have been widely used in five-hole and seven-hole probe calibration, respectively. Pisasale and Ahmed [27] proposed a new normalization technique to extend the calibration range of five-hole probes. Paul [28] defined different non-dimensional pressure coefficients to reduce computational errors. Finally, the velocity magnitude and direction of an unknown flow field are calculated by comparing the measured port pressures with the calibration data set. The calculation methods include direct interpolation developed by Treaster and Yocum [29] and Zilliac [22, 23], least-squares polynomial curve fitting developed by Gallington [21, 30], look-up error tables developed by Wenger [31], and neural networks developed by Rediniotis [32–35].
Machine learning, as an independent research topic, has been developed since the 1940s. The most representative early work is the M-P neuron model proposed by McCulloch and Pitts in 1943 [36]. Since then, more and more machine learning models have been developed to help solve problems in data analysis and decision making [37–40]. They are widely used in a growing number of fields today, including computer vision, natural language processing, and medical diagnosis [41]. Machine learning algorithms can be divided into supervised learning, unsupervised learning and reinforcement learning. The calibration of seven-hole pressure probes is a classic supervised learning process: the port pressures recorded at known flow velocity magnitudes and directions can be split into a training dataset and a test dataset. However, only a few studies [32–35] have applied machine learning to probe calibration, and they focused exclusively on artificial neural networks.
In 2012, Scikit-learn, a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, was introduced. The package focuses on bringing machine learning to non-specialists using a general-purpose high-level language [42]. With Scikit-learn, machine learning algorithms become a user-friendly toolkit, especially for parametric studies. Supervised problems include regression and classification [38]; the calibration of seven-hole pressure probes is a regression problem. Scikit-learn provides many regression algorithms, such as nearest neighbors, polynomial regression, neural networks, support vector machines, decision trees and random forests.
Are the machine learning methods suitable for seven-hole probe calibration? What are the advantages and disadvantages of the above algorithms in the seven-hole pressure probe calibrations? Which algorithm is the most suitable in comparison?
To answer these questions, six typical supervised learning methods were chosen for seven-hole probe calibration. The contributions of this work lie mainly in two aspects. First, the performance of the supervised learning methods in probe calibration is systematically evaluated in terms of prediction accuracy, computational time, feature sensitivity, and robustness to the failure of a hole port. Second, compared with the in-house traditional algorithm, the random forests and K-nearest neighbors algorithms achieve the same prediction accuracy with great advantages in computational efficiency and coding convenience. This shows that supervised learning methods have great potential for real-time online calibration of multi-hole probes.
2 Calibration data set from wind tunnel measurement
The seven-hole probe tested in this work was manufactured by 3-D printing with a manufacturing accuracy of 0.01 mm. The calibration measurements were carried out in a closed test section of a wind tunnel with an outlet size of 0.5 m by 0.5 m. The turbulence intensity is less than 0.2% at a free-stream wind speed of 20 m/s. The probe was installed on a two-degrees-of-freedom calibration apparatus, which includes an electric rotator, a lifting platform and a customized rolling device. The experimental setup for the seven-hole probe calibration is shown in Fig 1. The detailed parameters of the experimental setup can be found in the authors' publication [25].
The body-coordinate system is defined in Fig 2, in which the x-axis is parallel to the axis of the probe tip and points to the probe tail, the z-axis points from Port 7 to Port 1, and the y-axis completes the right-hand coordinate system. Two flow-angle systems are discussed. One is the α-β system, where the angle of attack α is the angle between the probe's x-axis and the projection of the velocity vector in the vertical plane, and the sideslip angle β is the angle between the velocity vector and the vertical plane. The other is the θ-ϕ system, where the pitch angle θ is the absolute angle between the velocity vector and the probe's x-axis, and the roll angle ϕ describes the azimuthal orientation of the velocity vector in the y-z plane, measured positive counterclockwise from the negative z-axis as viewed from the front. The relationships between the two angle-system definitions are given in Eq (1).
The angular range of the calibration data set shown in Fig 3 is from -80° to 80° for the pitch angle and from -90° to 90° for the roll angle. The interval of the pitch angle is 2° from -20° to 20°, and 5° from ±20° to ±80°. The interval of the roll angle is 5°. The free-stream wind speeds in the calibration dataset are 10 m/s, 20 m/s and 30 m/s. Therefore, the data set is a 4995 × 10 matrix. The 4995 rows cover the combinations of 3 wind speeds, 45 pitch angles, and 37 roll angles. The 10 columns contain the wind speed, pitch angle, roll angle, and 7 port pressures, respectively. The table header in the attached data file lists the variable and unit of each column. In the machine learning algorithms, the data at 10 m/s and 30 m/s serve as the training set, and the data at 20 m/s with pitch angles from -50° to 50° serve as the test set.
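As a sketch of how the data set described above can be assembled and split, the angle grids follow the text exactly, while the port pressures are random placeholders, not the measured calibration data:

```python
import numpy as np

# Angle grids described in the text: pitch in 2-deg steps within +/-20 deg,
# 5-deg steps outside; roll in 5-deg steps.
pitch_vals = np.concatenate([np.arange(-80, -24, 5),
                             np.arange(-20, 21, 2),
                             np.arange(25, 81, 5)])           # 45 pitch angles
roll_vals = np.arange(-90, 91, 5)                             # 37 roll angles

speeds = np.repeat([10.0, 20.0, 30.0], pitch_vals.size * roll_vals.size)
pitch = np.tile(np.repeat(pitch_vals, roll_vals.size), 3)
roll = np.tile(roll_vals, 3 * pitch_vals.size)

rng = np.random.default_rng(0)
ports = rng.normal(size=(speeds.size, 7))                     # placeholder p1..p7
data = np.column_stack([speeds, pitch, roll, ports])          # 4995 x 10 matrix

# Train on 10 and 30 m/s; test on 20 m/s with |pitch| <= 50 deg.
train = data[np.isin(data[:, 0], [10.0, 30.0])]
test = data[(data[:, 0] == 20.0) & (np.abs(data[:, 1]) <= 50.0)]
```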
The port with the highest pressure was recorded as the dominant port at each flow angle. Based on this definition, the dominant-partition distribution of the database was obtained and is indicated in θ-ϕ and α-β coordinates in Fig 3, where the black lines are the theoretical boundaries dividing the seven partitions. The partitions are helpful for understanding the distribution of prediction errors.
3 Calibration process
Probe calibration is the process of finding the correlation between the port pressure distribution and the flow parameters, i.e., the flow angles and velocity. The flow chart of seven-hole probe calibration with machine learning methods is shown in Fig 4. The process includes three parts: feature extraction, model training, and model evaluation. The purpose of feature extraction is to find the most suitable features for the machine learning model. In the Python module Scikit-learn, the machine learning models can be called directly, which is convenient for non-specialists, and the calibration code is simple and compact.
Usually, feature extraction is time-consuming and plays an important role in the prediction accuracy of machine learning models. For seven-hole probe calibration, machine learning methods are used to find the correlation between the port pressure distribution and the flow parameters. In the probe calibration dataset, the variables include θ, ϕ, pt, ps, p1,…,p7, where pt and ps are the total and static pressure representing the flow velocity, and p1,…,p7 are the pressures of the seven ports. In most traditional calibration methods, data-fitting algorithms are developed to find the correlation between the flow angles α, β and normalized pressure values, because normalized pressure values (pressure coefficients) are directly related to the flow angles α, β according to aerodynamic theory [25]. However, as a mathematical technique, machine learning can still work without this physical knowledge. Therefore, this work investigates the effect of feature extraction on the prediction accuracy, i.e., the feature sensitivity of the machine learning methods. Based on the choice of flow-angle coordinate system and pressure normalization, the feature relationships can be expressed in three forms, as shown in Eqs (2–4).
(α, β) = f(Cp1, Cp2, …, Cp7) (2)

(θ, ϕ) = f(Cp1, Cp2, …, Cp7) (3)

(α, β, dp) = f(p1, p2, …, p7) (4)

where Cpi is the normalized pressure coefficient, computed with the pressure normalization method proposed in the authors' previous work [25], and dp = pt − ps is the dynamic pressure. In Eq (2), the features are the flow angles α, β and the normalized pressure values. In Eq (3), the features are the flow angles θ, ϕ and the normalized pressure values. In Eq (4), the features are the flow angles α, β, the dynamic pressure and the dimensional pressure values. The flow chart of the calibration process with the three feature extraction methods, highlighted with a yellow background, is shown in Fig 5. For the feature extraction methods in Eqs (2) and (3), the normalized pressure values are independent of the flow velocity (dynamic pressure), so only the flow angles are predicted first; a secondary prediction of the total and static pressures is then conducted with K-nearest neighbors.
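The paper's own normalization scheme [25] is not reproduced here; as an illustrative stand-in, a common choice is to subtract the mean port pressure and divide by the dynamic pressure dp = pt − ps, which makes the coefficients independent of the flow velocity:

```python
import numpy as np

def extract_features(ports, pt, ps):
    """Illustrative normalized pressure coefficients (NOT the scheme of [25]).

    Subtracts the mean port pressure and divides by the dynamic pressure
    dp = pt - ps so the coefficients depend on flow angle, not speed.
    ports: (n, 7) port pressures p1..p7; pt, ps: (n,) total/static pressure.
    """
    dp = pt - ps
    return (ports - ports.mean(axis=1, keepdims=True)) / dp[:, None]

ports = np.ones((4, 7)) + np.arange(7)     # toy rows [1, 2, ..., 7]
pt, ps = np.full(4, 2.0), np.zeros(4)
cp = extract_features(ports, pt, ps)
```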
Based on the above calibration process, the parameters of each machine learning algorithm were first tuned based on the global error. The global error is computed as the root mean square error (RMSE), defined in Eq (5):

RMSE = sqrt[(1/m) Σ_{i=1}^{m} (h(xi) − yi)²] (5)

where m is the size of the test set, h(xi) is the predicted result at the input point xi, and yi is the true output value.
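Eq (5) is a one-liner in NumPy; `rmse` is a hypothetical helper name used only for this sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, Eq (5): sqrt((1/m) * sum((h(xi) - yi)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```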
Then all the machine learning algorithms with optimal parameters were compared in terms of prediction accuracy, computational time, feature sensitivity and robustness to the failure of a hole port.
4 Parameter adjustment based on the RMSEs
The regression algorithms this work focuses on include K-nearest neighbors, polynomial regression, multi-layer perceptron, support vector machines, decision trees and random forests. Detailed descriptions of these algorithms can be found in the Scikit-learn user guide and are not repeated here [42]. The key parameters of each algorithm were tested first. In this section, only the features in Eq (2) were used. The RMSEs are calculated from all the error data in the test set.
4.1 K-nearest neighbors
The K-nearest neighbors algorithm finds a predefined number of training samples closest in distance to a new point and predicts the label from these training samples [43]. The parameters of the K-nearest neighbors algorithm in the Scikit-learn module include the number of neighbors, the weights, the neighbor-search algorithm, and the power parameter of the Minkowski metric. In this work, the effects of the number of neighbors and the weights are studied. The power parameter is set to 2, and the search algorithm is selected automatically among BallTree, KDTree and brute-force search.
Six cases with different parameters in Table 1 were tested. Fig 6 shows that the number of neighbors has only a slight effect on the RMSEs. Overall, the RMSEs of the flow angles α, β first increase and then decrease as the number of neighbors increases. Comparing cases 3 and 6, a better calibration is obtained with distance weighting. The best calibration results are RMSEs of 0.6° for α, 0.4° for β and 1% for v. The computation time in all cases is less than 0.1 s, which shows that the K-nearest neighbors algorithm is very efficient.
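The settings above map onto scikit-learn's `KNeighborsRegressor` roughly as follows; the synthetic data stand in for the calibration features and are not the paper's:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in: 7 pressure-coefficient features -> two flow angles.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 7))
y = np.column_stack([X[:, 0] + X[:, 3], X[:, 1] - X[:, 2]])

# Distance weighting, Euclidean metric (p=2), automatic choice among
# BallTree, KDTree and brute-force search, as in the parameter study.
knn = KNeighborsRegressor(n_neighbors=5, weights="distance", p=2,
                          algorithm="auto")
knn.fit(X, y)
pred = knn.predict(X[:10])
```

With `weights="distance"`, querying a training point reproduces its target exactly, since the zero-distance neighbor dominates.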
4.2 Polynomial regression
Polynomial regression fits a polynomial model with coefficients chosen to minimize the residual sum of squares between the observed targets in the dataset, based on the least-squares method. When the polynomial degree is 1, polynomial regression degenerates into linear regression. Cases with polynomial degrees from 1 to 7 were tested. Fig 7 shows that the cases with polynomial degrees of 4, 5 and 6 have acceptable prediction accuracy, while the computation time increases exponentially when the polynomial degree is larger than 5. Hence polynomial regression with a degree of 5 is the most suitable for the calibration.
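In scikit-learn, a degree-5 polynomial least-squares fit can be assembled from `PolynomialFeatures` and `LinearRegression`; the toy target below is itself a low-order polynomial, so the fit is essentially exact:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(300, 3))      # toy stand-in features
y = X[:, 0] ** 3 - 2.0 * X[:, 1] * X[:, 2]     # smooth polynomial target

# Degree-5 polynomial regression, the degree selected in the text.
model = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())
model.fit(X, y)
score = model.score(X, y)                      # R^2 on the training data
```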
4.3 Multi-layer perceptron
The multi-layer perceptron is a supervised learning algorithm that learns a non-linear function approximator for regression. Between the input and output layers there can be one or more hidden layers with non-linear activation functions [44]. The hidden layer sizes and the activation function are the most important parameters determining the performance of the multi-layer perceptron. The solver is 'adam', the L2 penalty (regularization) parameter is 0.0001, and the learning rate is constant with an initial value of 0.001.
As shown in Table 2, 12 cases were tested. The results of cases 1 to 9 are used to identify the effects of the hidden layer sizes, and the results of cases 10 to 12 together with case 1 are used to identify the effects of the activation function. Fig 8 shows that the first layer size plays a dominant role in the prediction accuracy and computation time. Meanwhile, the prediction accuracy and computation time are not sensitive to the activation function, except for 'identity'. It is noteworthy that the multi-layer perceptron is a time-consuming learning method, taking about two orders of magnitude more time than the K-nearest neighbors algorithm.
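The stated hyper-parameters map onto scikit-learn's `MLPRegressor` as below; the hidden layer sizes are one candidate setting (Table 2 lists the full grid) and the data are synthetic:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 7))
y = X[:, 0] - 0.5 * X[:, 4]                    # toy single target

# solver='adam', L2 penalty alpha=0.0001, constant learning rate with
# initial value 0.001, as quoted in the text.
mlp = MLPRegressor(hidden_layer_sizes=(100, 50), activation="relu",
                   solver="adam", alpha=1e-4,
                   learning_rate="constant", learning_rate_init=1e-3,
                   max_iter=2000, random_state=0)
mlp.fit(X, y)
score = mlp.score(X, y)                        # R^2 on the training data
```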
4.4 Support vector machines
Support vector regression is extended from support vector classification. In support vector regression, the inner product in a high-dimensional space is computed implicitly by introducing a kernel function, which makes nonlinear regression tractable [45]. The choice of kernel function and the error penalty factor significantly affect the precision of the support vector machine. The kernel coefficient is left at its default value, 'scale'.
Six cases with different parameters in Table 3 were tested. The results of cases 1 to 4 are used to identify the effect of the kernel function, and the results of cases 5 to 6 together with case 1 are used to identify the effect of the error penalty factor C. Fig 9 shows that the kernel function has a significant influence on the prediction accuracy and computation time. Among cases 1 to 4, the 'rbf' kernel has the best learning performance for seven-hole probe calibration. The error penalty factor C has little effect on the accuracy but affects the computation time significantly.
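A sketch with scikit-learn's `SVR` using the 'rbf' kernel and the default gamma='scale': since `SVR` fits one target at a time, predicting both flow angles takes one regressor per target, wrapped here in `MultiOutputRegressor` (the data are synthetic and the C value is arbitrary):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 7))
Y = np.column_stack([X[:, 0], X[:, 1]])        # toy (alpha, beta) targets

# One SVR per output angle; 'rbf' kernel with the default gamma='scale'.
svr = MultiOutputRegressor(SVR(kernel="rbf", C=10.0, gamma="scale"))
svr.fit(X, Y)
pred = svr.predict(X[:5])
```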
4.5 Decision trees and random forests
Decision trees are a tree structure in which each internal node represents a decision on an attribute, each branch represents the outcome of a judgment, and each leaf node represents a classification or regression result [46]. The maximum depth is the longest path from the root to a node; when the depth reaches the specified value, splitting stops. The criterion is 'squared_error'. The minimum number of samples required to split an internal node is 2, and the minimum number of samples required at a leaf node is 1.
Random forests achieve reduced variance by combining diverse decision trees [47]. The number of estimators, i.e., the number of trees in the forest, is the main parameter of random forests. The maximum depth of the trees is set to 'None', which means nodes are expanded until all leaves are pure or until all leaves contain fewer than the minimum number of samples required to split an internal node.
Nine cases with different parameters in Table 4 were tested. Fig 10 shows that, overall, random forests achieve better calibration accuracy but take slightly more computation time than decision trees. For the decision trees, the calibration accuracy stops improving significantly once the maximum depth exceeds 20, and the computation time is insensitive to the maximum depth. For the random forests, as the number of estimators increases, the RMSEs of the flow angles decrease and the computation time increases to the order of seconds.
5 Evaluations on machine learning methods
Based on the above parametric studies, the results of the six machine learning algorithms with the corresponding optimal parameters in Table 5 were compared with each other. They were systematically evaluated from four aspects: prediction accuracy, computational time, feature sensitivity, and robustness to the failure of a hole port.
5.1 Prediction accuracy and computational time
The RMSEs and computation times of the six machine learning algorithms and the traditional method are shown in Fig 11. The prediction accuracy of the traditional method has RMSEs of 0.86° for α, 0.72° for β and 0.97% for v, as reported in Wu's previous work [25]. By comparison, RFs has the highest prediction accuracy for both the flow angles and the velocity, with RMSEs of 0.57° for α, 0.40° for β and 1.02% for v. DTs has the lowest prediction accuracy for the flow angles, with RMSEs of 1.41° for α and 0.98° for β. It is worth noting that the RMSE of β is lower than that of α for all the algorithms. This is because more pressure ports relate directly to β than to α: β is mainly dominated by the pressures of ports 2, 3, 5, 6 and 7, while α is mainly dominated by the pressures of ports 1, 4 and 7, as shown in Fig 2. Therefore, for each algorithm, the more data relating to a feature, the higher the prediction accuracy.
Among the six algorithms, only MLP and SVM are time-consuming, taking 63 s and 17 s respectively. The other algorithms are highly efficient, with computation times of less than 1 s, while the traditional method takes about 4 s.
The RMSE represents a global prediction error and cannot show the distribution of the prediction error. Quantile statistics provide the probability that the prediction error is less than a given value. In this work, quantile statistics were computed on the absolute values of the prediction errors to indicate the extent of deviation from the true values. As shown in Fig 12, five quantile points are considered: 10%, 25%, 50%, 75%, and 90%. The results of the DTs algorithm are notable: its 75% quantile points of the prediction errors on the flow angles α and β are 0°, yet its RMSEs of the predicted flow angles are the largest, which indicates that the DTs algorithm is prone to overfitting. For the prediction of the flow velocity, note that the KNN algorithm was applied on top of the predicted flow angles, as illustrated in Fig 5; hence, for the DTs algorithm, the quantile distribution of the prediction error on the flow velocity differs from that on the flow angles. Overall, except for the DTs algorithm, the RMSEs are close to the 75% quantile points, and the other quantile points show the same trend as the RMSEs across algorithms. RFs has the best prediction performance, with 90% quantiles of the prediction errors on α, β and v of less than 0.8°, 0.6° and 2%, respectively. This prediction accuracy is acceptable for flow measurement in wind tunnel or field tests.
(a) angle of attack, α, (b) sideslip angle, β, (c) flow velocity, v.
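The quantile statistics of Fig 12 amount to `numpy.quantile` evaluated at the five probability points; the error sample below is synthetic, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(6)
abs_errors = np.abs(rng.normal(scale=0.5, size=1221))   # stand-in |error| sample

# The five quantile points used in Fig 12, plus the global RMSE for comparison.
q = np.quantile(abs_errors, [0.10, 0.25, 0.50, 0.75, 0.90])
rmse = float(np.sqrt(np.mean(abs_errors ** 2)))
```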
The DTs algorithm has maximum RMSEs on the predicted flow angles, while the RFs algorithm has minimum RMSEs. The detailed error distributions of these two algorithms are shown in Figs 13 and 14, respectively. The error distributions of the traditional method are shown in Fig 15 as a comparison group. By comparison, there are more randomly distributed data points with prediction errors larger than 5° for the DTs algorithm, which results in the larger RMSEs.
5.2 Feature sensitivity
In the above studies, the feature in Eq (2), based on aerodynamic knowledge, was chosen, and acceptable prediction accuracy was obtained. However, as a mathematical technique, machine learning can still work without the physical knowledge. Therefore, this section investigates the effect of feature extraction on the prediction accuracy, i.e., the feature sensitivity.
5.2.1. Machine learning with feature in Eq (3).
Based on the feature in Eq (3), the machine learning methods are used to fit the function between the flow angles (θ, ϕ) and the normalized pressure coefficients. The RMSEs of the flow angles (θ, ϕ) and the velocity are shown in Fig 16. A large and unacceptable prediction error on the roll angle is obtained for all the machine learning algorithms. The quantile statistics in Fig 17 show that the RMSEs of the pitch angle and flow velocity are close to or even larger than the 90% quantile points, which indicates that the errors at 10% of the data points are huge and abnormal. In comparison, the SVM algorithm has the largest RMSEs on the predicted flow angles, while the RFs algorithm has the smallest.
(a) pitching angle, θ, (b) rolling angle, ϕ, (c) flow velocity, v.
Figs 18 and 19 present the error distributions in θ-ϕ coordinates. First, there are more points with large errors when using the SVM algorithm. Second, large errors in the pitch angle appear at high roll angles, while large errors in the roll angle appear at points with a pitch angle of 0°. This is because the port pressure values are directly sensitive to α and β rather than to θ and ϕ. The relationships between θ-ϕ and α-β are shown in Fig 20. At high roll angles, for example 90° or -90°, α becomes unrelated to θ, which means different pressure values can correspond to the same pitch angle; this leads to larger errors in the predicted pitch angle at high roll angles. Similarly, at a pitch angle of 0°, both α and β become unrelated to ϕ, which leads to larger errors in the predicted roll angle, as shown in Figs 18 and 19.
(a) α ~ θ (b) β ~ θ, ϕ.
5.2.2. Machine learning with feature in Eq (4).
The basic principle of multi-hole probe calibration is to find the relationship between the flow parameters and the port pressure values. The function in Eq (4) is the initial, direct relation without any preprocessing of the pressure data. Based on the feature in Eq (4), the machine learning methods are used to fit the function between the flow parameters (α, β, dp) and the dimensional pressure values.
The RMSEs of the flow angles (α, β) and velocity v are shown in Fig 21. The prediction accuracy of the flow velocity is higher than that of the flow angles for most of the machine learning methods. This is because the port pressure values are proportional to the square of the flow velocity, while they vary approximately linearly with the flow angles. When the flow angles and velocity are predicted simultaneously, the flow velocity becomes the dominant feature and weakens the effect of the flow angles on the pressure values.
The quantile statistics in Fig 22 show that the Poly, MLP and SVM algorithms have significant advantages in predicting the flow velocity but are weak in predicting the flow angles. This indicates that the Poly, MLP and SVM algorithms are suited to capturing the dominant feature, while the KNN, DTs and RFs algorithms are suited to prediction problems with high-density training samples. The training dataset includes only two flow velocities, 10 m/s and 30 m/s, but a high density of flow angles.
(a) angle of attack, α (b) sideslip angle, β (c) flow velocity, v.
Figs 23 and 24 present the error distributions with SVM and RFs algorithm. Only the predicted flow velocity by SVM is satisfactory.
In general, the above studies show that the prediction accuracy is very sensitive to the features. Without appropriate features, the machine learning methods may fail. Hence, feature extraction plays an important role in machine learning methods. In practice, combining physical knowledge with data mining is the best solution for high-accuracy predictions.
5.3 Robustness on the failure of some hole port
In hostile environments, such as sandstorms on wind farms, a port may malfunction. For the seven-hole probe whose port numbering is defined in Fig 1, Port 4 is symmetric with Port 1, and Port 2 is symmetric with Ports 3, 5 and 6. Therefore, the cases in which Port 1, Port 2 or Port 7 is invalid were studied.
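A failed port can be simulated by dropping that port's pressure column before training; a minimal sketch under synthetic data, with the random forest chosen as one of the better-performing algorithms:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 7))                  # toy port pressures p1..p7
y = X[:, 0] - X[:, 3]                          # toy flow-angle target

failed_port = 6                                # zero-based index of Port 7

# Drop the failed port's column and retrain, mimicking the invalid-port cases.
X_reduced = np.delete(X, failed_port, axis=1)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_reduced, y)
```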
The RMSEs of the six algorithms and the traditional method are shown in Fig 25. It can be observed from Fig 25(a) and 25(b) that when Port 7 is invalid, the prediction accuracy of the flow velocity worsens markedly for all the algorithms in comparison with the normal case. In the normal case in Fig 11, the RMSEs of the predicted flow velocity are close to 1%, whereas they are close to 10% when Port 7 is invalid. By comparison, the traditional method has the highest prediction accuracy when Port 7 is invalid. Meanwhile, the RMSEs of the predicted flow angles are about twice those of the normal case, except for the Poly algorithm. Fig 26 presents the quantile statistics of the prediction errors when Port 7 is invalid. The RMSEs of the flow angles are much higher than the 90% quantile for the Poly algorithm, which indicates that some predictions are abnormal. The error distributions in Fig 27 show that, compared with the traditional method, the prediction errors in the region where the flow angles are close to 0° are large and abnormal. It is worth noting that the color scale in Fig 27 runs from -20 to 20, which differs from the other error-distribution figures.
(a) port 7 is invalid (zoomed out), (b) port 7 is invalid, (c) port 1 is invalid, (d) port 2 is invalid.
(a) angle of attack, α, (b) sideslip angle, β, (c) flow velocity, v.
(a) Poly algorithm, (b) traditional method.
It can be observed from Fig 25(c) and 25(d) that when Port 1 or Port 2 is invalid, the prediction accuracy of the flow velocity and angles is only slightly affected, except for the Poly and DTs algorithms. It is worth noting that the prediction accuracy of the traditional method deteriorates significantly when Port 1 or Port 2 is invalid, which indicates that the machine learning methods are more robust than the traditional method.
It can be observed from Figs 11 and 25 that the RFs and KNN algorithms present the best prediction performance. To study the effects of the missing port data, the error distributions of the RFs and KNN algorithms for the three invalid-port cases and the normal case are shown in Figs 28 and 29. The numbers 1 to 7 in the sectors of Figs 28 and 29 indicate the port whose pressure is maximum in that sector, and the black lines are the theoretical sector boundaries. When Port 7 is invalid, the errors in all sectors increase in comparison with the normal case. When Port 1 is invalid, the flow-angle errors increase only in sector 1, while the flow-velocity errors increase in all sectors; when Port 2 is invalid, the flow-angle errors increase only in sector 2, while the flow-velocity errors again increase in all sectors. The results in Figs 25, 28 and 29 indicate that missing Port 7 data increases the prediction errors of all the flow parameters in all sectors, whereas missing data from any other port has only a slight effect on the flow-angle errors outside the corresponding sector. Overall, the prediction accuracy is acceptable for a probe with one invalid port.
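A failed port can be emulated in this kind of study simply by dropping the corresponding pressure column before training and testing. The sketch below (synthetic data; the toy pressure model and names are illustrative assumptions, not the paper's setup) shows the pattern.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
alpha = rng.uniform(-50.0, 50.0, 2000)  # hypothetical angle of attack, deg
ports = np.stack(
    [np.cos(np.deg2rad(alpha) - k * np.pi / 3.5) for k in range(7)], axis=1
)

def rmse_with_dead_port(dead_port=None):
    """Drop one port column (simulating a failed port), retrain, return RMSE."""
    X = ports if dead_port is None else np.delete(ports, dead_port, axis=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, alpha, random_state=0)
    model = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr)
    return float(np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2)))

baseline = rmse_with_dead_port()   # all ports valid
degraded = rmse_with_dead_port(0)  # "Port 1" removed
```

Comparing `baseline` and `degraded` for each port, and for each regressor, reproduces the structure of the robustness study in outline.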
(a) port 7 is invalid, (b) port 1 is invalid, (c) port 2 is invalid, (d) normal case: all ports are valid.
(a) port 7 is invalid, (b) port 1 is invalid, (c) port 2 is invalid, (d) normal case: all ports are valid.
5.4 Comprehensive evaluation on the machine learning methods
The above studies have discussed the prediction performance of the six machine learning algorithms in detail from four aspects: prediction accuracy, computational time, feature sensitivity, and robustness to the failure of a hole port. Based on these findings, a min-max normalization study was conducted to provide a quantitative and comprehensive evaluation of the machine learning methods. The min-max normalization is defined by Eq (6), where x is the RMSE or the computational time, min is 0, and max is the maximum value of all the RMSEs or of the computational times.
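The normalization of Eq (6) can be stated directly in code; with the minimum fixed at 0 it reduces to dividing each metric by its maximum, mapping every score onto [0, 1]. The RMSE values below are illustrative, not the paper's.

```python
def min_max(x, x_min, x_max):
    """Min-max normalization: maps x from [x_min, x_max] onto [0, 1]."""
    return (x - x_min) / (x_max - x_min)

# With min fixed at 0 (as in Eq (6)), the score reduces to x / max.
rmses = [0.5, 1.2, 3.0]  # illustrative RMSE values for three algorithms
scores = [min_max(r, 0.0, max(rmses)) for r in rmses]
```

Applying the same scaling to both RMSEs and computation times puts the four evaluation aspects on a common dimensionless axis, which is what allows them to be plotted together in Fig 30.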
Fig 30 presents the comprehensive prediction performance of the six algorithms. Regarding prediction accuracy, all the algorithms can obtain satisfactory predictions when appropriate parameters and features are chosen; in contrast, DTs typically exhibit high variance and tend to overfit. Regarding computational efficiency, only MLP and SVM are time-consuming, taking 63 s and 17 s, respectively; the other algorithms are highly efficient, with computation times below 1 s. The machine learning methods have a great advantage in computational efficiency over the in-house traditional method, which is described in detail in [25]. Regarding feature sensitivity, all the algorithms are very sensitive to the features, and using features based on physical knowledge yields high-accuracy predictions. The Poly, MLP and SVM algorithms are suited to identifying the most important feature without physical knowledge, whereas the KNN, DTs and RFs algorithms are suited to prediction problems with high-density training samples. Regarding robustness to port failure, missing the central-port data degrades the flow-velocity prediction accuracy from 1% to about 10% for all the algorithms, and the flow-angle predictions are unacceptable only for the Poly algorithm. Missing a surrounding port's data has only a slight effect, and all the algorithms retain an acceptable prediction accuracy.
Overall, the RFs and KNN algorithms have the best comprehensive prediction performance. Meanwhile, compared with the in-house traditional algorithm, which takes about 100 times as long, they have great advantages in computational efficiency and in the convenience of writing code.
6 Application in measuring the angle of attack of a wind turbine blade in the field
An aerodynamic measurement system was constructed on a 100 kW wind turbine in previous research [6]. In the field test, the flow velocity vector was measured by the self-developed seven-hole probe shown in Fig 31, and the aerodynamic force was obtained with the surface-pressure measurement method. Since Fig 30 shows that the KNN algorithm has the best comprehensive performance, it was used to calculate the angle of attack of the wind turbine blade in the standstill case and in the normal operating case. During data processing, the KNN algorithm was found to be about 100 times faster than the in-house traditional algorithm, and the results of the two algorithms agree well, as shown in Fig 32. In the standstill case, the rotating speed is close to zero, so the angle of attack of the blade is dominated by the yaw angle and the azimuth angle; its variation range, from about -15° to 15°, is very large relative to the change in aerodynamic characteristics. In the normal operating case, the rotating speed is higher, keeping the blade near the optimal angle of attack. These field measurements of the angle of attack show that machine learning methods can be applied effectively to the calibration and measurement of seven-hole probes.
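The field-processing workload follows a fit-once, predict-many pattern: the model is trained on the calibration dataset once, then applied to long time series of probe pressures in a single vectorized call. The sketch below uses random stand-in data and illustrative names; actual timings are machine-dependent and the ~100x speedup reported above is not reproduced here.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
X_cal = rng.standard_normal((5000, 7))    # stand-in calibration pressures
y_cal = X_cal @ rng.standard_normal(7)    # stand-in angle of attack
model = KNeighborsRegressor(n_neighbors=5).fit(X_cal, y_cal)  # fit once

X_field = rng.standard_normal((20_000, 7))  # a long field time series
t0 = time.perf_counter()
aoa = model.predict(X_field)                # one vectorized batch prediction
elapsed = time.perf_counter() - t0
```

Because the neighbor search is built once at fit time, per-sample prediction cost stays low even for long field records, which is the practical source of the efficiency gain over a per-sample iterative scheme.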
(a) standstill case (b) normal operating case.
7 Conclusions
Machine learning methods have become popular, convenient and efficient computing tools, in which supervised learning mainly solves data prediction problems. The calibration of a multi-hole probe is exactly the process of using a calibration dataset to predict the tested results, so machine learning methods are expected to offer clear advantages in multi-hole probe calibration. In the literature, specific machine learning methods have been developed for probe calibration, but a comprehensive evaluation has been lacking. In this work, six typical supervised learning methods from the scikit-learn library were first selected for parameter tuning. Based on the optimal parameters, a comprehensive evaluation was made from four aspects: prediction accuracy, prediction efficiency, feature sensitivity and robustness to the failure of a hole port. The main conclusions are summarized as follows:
- The random forests and K-nearest neighbors algorithms have the best comprehensive prediction performance among the six algorithms.
- Compared with the in-house traditional algorithm, the machine learning algorithms have great advantages in computational efficiency and in the convenience of writing code.
- The multi-layer perceptron and support vector machine are the most time-consuming of the six algorithms.
- The prediction accuracy of all the algorithms is very sensitive to the features. Using features based on physical knowledge yields high-accuracy predictions.
- Missing central-port data degrades the flow-velocity prediction accuracy from 1% to about 10% for all the algorithms, and the flow-angle predictions are abnormal and unacceptable only with the polynomial algorithm. Missing a surrounding port's data has only a slight effect on the prediction accuracy for all the algorithms.
- Because more ports are directly related to the sideslip angle than to the angle of attack, the prediction accuracy of the sideslip angle is always higher than that of the angle of attack for all the algorithms.
- The KNN algorithm was applied to field measurements of the angle of attack of a wind turbine blade. It was found to be about 100 times faster than the in-house traditional algorithm during data processing, and the results of the two algorithms agree well.
The results of this work show that supervised learning methods have great potential for real-time online calibration of multi-hole probes. Nevertheless, several aspects should be addressed in further studies. First, the supervised learning methods should be integrated into the online calibration of probes to test their real-time performance. Second, this work considered only the seven-hole probe; the approach can be extended to n-hole probes in the future. Third, this work mainly discussed the calibration accuracy for pitch angles between -50° and 50°, where the port pressures are insensitive to the Reynolds number. The application to larger-angle calibration can be analyzed in the future.
References
- 1. Crawford T.L., Dobosy R.J., A sensitive fast-response probe to measure turbulence and heat flux from any airplane, Boundary Layer Meteorology 59(3) (1992) 257–278.
- 2. Zimmer G., Alam F., Watkins S., Peric C., Comparison of a High Blockage Wind Tunnel, an Open Jet Wind Tunnel and On-road Testing with Respect to External Surface Pressures, Sae World Congress, 2001.
- 3. Han F., Wang Z., Mao Y., Zhang Y., Tan J., Large-Scale Flow Measurements and Analysis for Radial Inlets of Industrial Centrifugal Compressors Based on Multi-hole Probe System, International Journal of Precision Engineering and Manufacturing 20(1) (2019) 79–92.
- 4. Simms D., Hand M.M., Fingersh L., Jager D., Cotrell J., Schreck S., et al, Unsteady aerodynamics experiment phase V: test configuration and available data campaigns, National Renewable Energy Laboratory, 2001.
- 5. Madsen H.A., Bak C., Paulsen U., Gaunaa M., Sørensen N., Fuglsang P., et al, The DAN-AERO MW Experiments, 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, 2010.
- 6. Wu G., Zhang L., Yang K., Development and Validation of Aerodynamic Measurement on a Horizontal Axis Wind Turbine in the Field, Applied Sciences 9(3) (2019) 482.
- 7. Wu G., Zhang C., Cai C., Yang K., Shi K., Uncertainty prediction on the angle of attack of wind turbine blades based on the field measurements, Energy 200 (2020) 117515.
- 8. Hong J., Toloui M., Chamorro L.P., Guala M., Howard K., Riley S., et al, Natural snowfall reveals large-scale flow structures in the wake of a 2.5-MW wind turbine, Nature Communications 5(1) (2014) 4216. pmid:24960397
- 9. Wernert P., Koerber G., Wietrich F., Raffel M., Kompenhans J., Demonstration by PIV of the non-reproducibility of the flow field around an airfoil pitching under deep dynamic stall conditions and consequences thereof, Aerospace Science and Technology 1(2) (1997) 125–135.
- 10. Ferreira C.S., van Kuik G., van Bussel G., Scarano F., Visualization by PIV of dynamic stall on a vertical axis wind turbine, Experiments in Fluids 46(1) (2009) 97–108.
- 11. Tian W., Ozbay A., Wang X.D., Hu H., Experimental investigation on the wake interference among wind turbines sited in atmospheric boundary layer winds, Acta Mechanica Sinica 33(4) (2017) 12.
- 12. Giyanani A., Savenije F.J., van Bussel G.J.W., Correlation between Lidar measured wind speeds and aerodynamic loading, Journal of Physics: Conference Series 1037 (2018) 052038.
- 13. Mikkelsen T., Hansen K.H., Angelou N., Sjöholm M., Harris M., Hadley P., et al, Lidar wind speed measurements from a rotating spinner, 2010 European Wind Energy Conference and Exhibition, 2010.
- 14. Schepers J., Brand A., Bruining A., Graham J., Hand M., Infield D., Madsen H., Paynter R., Simms D., Final report of IEA Annex XIV: Field Rotor Aerodynamics, ECN-C—97-027, 1997.
- 15. Madsen H.A., Fischer A., Wind Shear and Turbulence Characteristics from Inflow Measurements on the Rotating Blade of a Wind Turbine Rotor, Scientific Proceedings of the 2009 European Wind Energy Conference and Exhibition, Marseille, France, Mar. 16–19, 2009.
- 16. Aagaard Madsen H., Bak C., Schmidt Paulsen U., Gaunaa M., Fuglsang P., Romblad J., et al, The DAN-AERO MW experiments: final report, Danmarks Tekniske Universitet, Risø Nationallaboratoriet for Bæredygtig Energi, 2010.
- 17. Wildmann N., Ravi S., Bange J., Towards higher accuracy and better frequency response with standard multi-hole probes in turbulence measurement with remotely piloted aircraft (RPA), Atmospheric Measurement Techniques 7(4) (2013) 1027–1041.
- 18. Shaw Ward S., Titchmarsh A., Birch D.M., Calibration and use of n-hole velocity probes, AIAA Journal (2015).
- 19. Schlienger J., Pfau A., Kalfas A.I., Single Pressure Transducer Probe for 3D Flow Measurements, 16th Symposium on Measuring Techniques in Transonic and Supersonic Flow in Cascades and Turbomachines, Cambridge, UK, 2002.
- 20. Telionis D., Yang Y., Rediniotis O.K., Recent developments in multi-hole probe (MHP) technology, 20th International Conference of Mechanical Engineering, Gramado, RS, Brazil, 2009.
- 21. Gallington R.W., Measurement of very large flow angles with non-nulling seven-hole probes, (1981).
- 22. Zilliac G.G., Calibration of seven-hole pressure probes for use in fluid flows with large angularity, (1989).
- 23. Zilliac G., Modelling, calibration, and error analysis of seven-hole pressure probes, Experiments in Fluids 14(1) (1993) 104–120.
- 24. Benay R., A global method of data reduction applied to seven-hole probes, Experiments in fluids 54(6) (2013) 1535.
- 25. Wu G., Guo X., Yang K., Yang H., A robust calibration method for seven-hole pressure probes, Experiments in Fluids 60(8) (2019) 120.
- 26. Krause L.N., Dudzinski T.J., Flow Direction Measurement with Fixed Position Probes in Subsonic Flow over a Range of Reynolds Number, NASA TM X-52576, 1969.
- 27. Pisasale A., Ahmed N., A novel method for extending the calibration range of five-hole probe for highly three-dimensional flows, Flow Measurement and Instrumentation 13(1) (2002) 23–30.
- 28. Paul A.R., Upadhyay R.R., Jain A., A novel calibration algorithm for five-hole pressure probe, International Journal of Engineering, Science and Technology 3(2) (2011).
- 29. Treaster A.L., Yocum A.M., The calibration and application of five-hole probes, Pennsylvania State University Applied Research Laboratory, 1978.
- 30. Gerner A., Maurer C., Gallington R., Non-nulling seven-hole probes for high angle flow measurement, Experiments in Fluids 2(2) (1984) 95–103.
- 31. Wenger C.W., Devenport W.J., Seven-hole pressure probe calibration method utilizing look-up error tables, AIAA journal 37(6) (1999) 675–679.
- 32. Rediniotis O., Chrysanthakopoulos G., Application of neural networks and fuzzy logic to the calibration of the seven-hole probe, Journal of fluids engineering 120(1) (1998) 95–101.
- 33. Rediniotis O.K., Vijayagopal R., Miniature multihole pressure probes and their neural-network-based calibration, AIAA journal 37(6) (1999) 666–674.
- 34. Jeyan J.V.M.L., Kumar M.S., Performance evaluation for multi-hole probe with the aid of artificial neural network, Journal of Theoretical & Applied Information Technology 65(3) (2014).
- 35. Nikpey Somehsaraei H., Hölle M., Hönen H., Assadi M., A novel approach based on artificial neural network for calibration of multi-hole pressure probes, Flow Measurement and Instrumentation 73 (2020) 101739.
- 36. McCulloch W.S., Pitts W., A logical calculus of the ideas immanent in nervous activity, The bulletin of mathematical biophysics 5(4) (1943) 115–133.
- 37. Cover T., Hart P., Nearest neighbor pattern classification, IEEE Transactions on Information Theory 13(1) (1967) 21–27.
- 38. Gerstner W., Chapter 1 Supervised Learning for Neural Networks: A Tutorial with JAVA-exercises, (2000).
- 39. Mitchell T.M., Generative and discriminative classifiers: Naive Bayes and Logistic Regression, (2010).
- 40. Zhou Z.H., Ensemble Methods: Foundations and Algorithms, Chapman and Hall/CRC, 2012.
- 41. Ji S., Pan S., Li X., Cambria E., Long G., Huang Z., Suicidal Ideation Detection: A Review of Machine Learning Methods and Applications, (2019).
- 42. Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., et al, Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
- 43. Bentley J.L., Multidimensional binary search trees used for associative searching, Communications of the ACM 18(9) (1975) 509–517.
- 44. Glorot X., Bengio Y., Understanding the difficulty of training deep feedforward neural networks, Proc. AISTATS (2010).
- 45. Platt J.C., Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, Advances in Large Margin Classifiers (2000).
- 46. Breiman L.I., Friedman J.H., Olshen R.A., Stone C.J., Classification and Regression Trees. Wadsworth, Biometrics 40(3) (1984) 358.
- 47. Breiman L., Random Forests, Machine Learning 45(1) (2001) 5–32.