
Machine learning approach for automatic recognition of tomato-pollinating bees based on their buzzing-sounds

  • Alison Pereira Ribeiro,

    Roles Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft

    Affiliation Instituto de Informática, Universidade Federal de Goiás, Goiânia, Goiás, Brazil

  • Nádia Felix Felipe da Silva,

    Roles Conceptualization, Formal analysis, Methodology, Supervision, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Instituto de Informática, Universidade Federal de Goiás, Goiânia, Goiás, Brazil

  • Fernanda Neiva Mesquita,

    Roles Conceptualization, Investigation, Writing – original draft, Writing – review & editing

    Affiliation Instituto de Informática, Universidade Federal de Goiás, Goiânia, Goiás, Brazil

  • Priscila de Cássia Souza Araújo,

    Roles Data curation, Investigation, Resources, Validation, Writing – review & editing

    Affiliation Programa de Pós-graduação em Zoologia, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

  • Thierson Couto Rosa,

    Roles Conceptualization, Investigation, Methodology, Supervision, Writing – review & editing

    Affiliation Instituto de Informática, Universidade Federal de Goiás, Goiânia, Goiás, Brazil

  • José Neiva Mesquita-Neto

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Centro de Investigación en Estudios Avanzados del Maule, Vicerrectoría de Investigación y Postgrado, Universidad Católica del Maule, Talca, Chile


Bee-mediated pollination greatly increases the size and weight of tomato fruits. Therefore, distinguishing, among the local set of flower-visiting bees, those that are efficient pollinators is essential to improve the economic returns for farmers. To achieve this, it is important to know the identity of the visiting bees. Nevertheless, traditional taxonomic identification of bees is not an easy task, requiring the participation of experts and the use of specialized equipment. Given these limitations, the development and implementation of new technologies for the automatic recognition of bees becomes relevant. Hence, we aim to verify the capacity of Machine Learning (ML) algorithms to recognize the taxonomic identity of bees visiting tomato flowers based on the characteristics of their buzzing sounds. We compared the performance of the ML algorithms combined with the Mel Frequency Cepstral Coefficients (MFCC) against classifications based solely on the fundamental frequency, providing a direct comparison between the two approaches. Some classifiers powered by the MFCC, especially the SVM, achieved better performance than the randomized and sound frequency-based trials. Moreover, the buzzing sounds produced during sonication were more relevant for the taxonomic recognition of bee species than analyses based on flight sounds alone. On the other hand, the ML classifiers performed better in recognizing bee genera based on flight sounds. Even so, the maximum accuracy obtained here (73.39%, by SVM) is still low by ML standards. Further studies analyzing larger recording samples and applying unsupervised learning systems may yield better classification performance. Therefore, ML techniques could be used to automate the taxonomic recognition of flower-visiting bees of the cultivated tomato and other buzz-pollinated crops.
This would be an interesting option for farmers and other professionals who have no experience in bee taxonomy but are interested in improving crop yields by increasing pollination.

Author summary

Bees are the most important pollinators of cultivated tomatoes. We also know that distinct bee species perform differently as pollinators, and these performances are directly related to the size and weight of the fruits. Moreover, the characteristics of the buzzing sounds tend to vary between bee species. However, buzzing sounds are complex and can vary widely over time, making this data difficult to analyze with the statistical methods usually applied in Ecology. Facing this problem, we proposed to automatically recognize pollinating bees of tomato flowers based on their buzzing sounds using Machine Learning (ML) tools. Indeed, we found that ML algorithms are capable of recognizing bees based solely on their buzzing sounds. This could lead to automating the recognition of flower-visiting bees of the cultivated tomato, which would be a practical option for farmers and other professionals who have no experience in bee taxonomy but are interested in improving crop yields. In turn, this encourages farmers to adopt sustainable agricultural practices for the conservation of native tomato pollinators. To achieve this goal, the next step is to develop smartphone-compatible applications capable of recognizing bees by their buzzing sounds.


Tomato (Solanum lycopersicum L.) is the second most important vegetable crop in the world [1]. Global tomato production was around 180,766 thousand tonnes in 2019, and production grew 14.1% over the past decade [1]. Although cultivated tomato is self-pollinating, bee-mediated pollination greatly enhances the quantity and quality of the fruits (greater size and weight), also contributing to the increase of overall crop productivity [2–9]. Tomato pollinator-dependency is so evident that when tomatoes are cultivated in greenhouses, pollination typically needs to be carried out by bumblebees reared particularly for this purpose, generating an extra cost to the growers [10]. For instance, the pollination service in the tomato crop is estimated at about US$ 992 million/year in Brazil [5].

The morphological specialization of tomato flowers, characterized by the presence of poricidal anthers, restricts the exit of the pollen to a tiny opening located at the apex of the anther [9, 11, 12]. During visits to these flowers, pollen-collecting bees firmly grasp the anthers and quickly contract their flight muscles without moving the wings, producing an audible sound [13, 14]. The resulting vibrations are transferred to the anthers, which shake the pollen inside them, stimulating it to leave through the pores, a phenomenon known as floral sonication or buzz-pollination [12, 14, 15].

Although sonicating bees are among the best pollinators of tomatoes, bees belonging to different taxonomic groups tend to differ in their performance as pollinators [4, 6–9, 16]. In this context, taxonomic recognition is an indispensable requirement for distinguishing, among the local set of flower-visiting bees, those that are the most efficient pollinators.

However, the huge number of bee species and other insects is a challenge for taxonomists. It is estimated that there are about 20,000 bee species worldwide [17], and 58% of them, about 11,600 species from 74 genera, are able to vibrate flowers to extract pollen [18]. Furthermore, taxonomic identification normally depends on minute visible morphological characteristics, which requires the active participation of experts in the decision-making process, since, to an untrained eye, the species are very similar [19]. Besides that, the decreasing number of taxonomists seriously affects the efficiency of species recognition [20]. This is especially evident in regions where bee diversity is poorly sampled and underestimated, such as Africa, Asia, and some tropical regions [17]. Therefore, the development and implementation of new technologies that also fulfill taxonomic requirements are needed [20–22].

Due to the limitations of traditional taxonomy, automatic classification based on artificial intelligence algorithms has been applied to the identification of plants and animals during the last decades. Automatic classifications based on the recognition of images and/or sounds have been implemented [23–26]. However, recognition based on images is difficult due to complications derived from the orientation of the object, the image quality, the lighting conditions, and/or the image background [20]. On the other hand, sound is relatively easy to acquire and can, in principle, be picked up remotely and continuously [19].

Classification based on Machine Learning (ML) algorithms has demonstrated high efficiency and accuracy for the recognition of animal vocalizations, such as those of birds and frogs [27–29]. ML algorithms powered by a method for sound feature extraction (e.g., Mel Frequency Cepstral Coefficients, Hilbert–Huang Transform) have also been employed for beehive monitoring using audio as one of the inputs (see S1 Table for a detailed description; [30] and references therein). These studies sought to differentiate bee buzzing sounds from other sounds (cricket chirping and ambient noise) [31], recognize the presence of the queen in a beehive and detect an orphaned colony [32, 33], or identify the circadian rhythm of a honeybee colony [34]. However, only three studies address the problem of automatic bee species classification, dealing with twelve, two, and four classes, respectively [19, 35, 36]. Random Forest, Support Vector Machines, and Logistic Regression are the most applied classifiers, and Mel Frequency Cepstral Coefficients (MFCC) is the most used feature extraction strategy (see S1 Table). Although so far restricted to a few bee taxa, these studies indicated that ML algorithms can generate classifiers able to quickly and accurately recognize bee identity.

In this context, the automatic recognition of bees would be especially relevant for the pollination of commercial tomatoes, which need local native pollinators to enhance crop productivity [4, 6–8]. Moreover, the professionals typically involved with the management of tomato crops (e.g. farmers, agronomists) have no experience in bee taxonomy. Based on this, we aim to verify the capacity of ML algorithms to automatically recognize the taxonomic identity of bees visiting tomato flowers based on the characteristics of their buzzing-sounds. In addition, we compared the performance of the ML algorithms with the MFCC feature extraction method against classifications based on the fundamental frequency performed on the same data set, thus providing a direct comparison between the two approaches. Due to the high efficiency and accuracy demonstrated by ML tools powered by MFCC features in automatic sound classification, we expected that combining these two methods would yield greater performance than classifications based solely on the fundamental frequency (hypothesis 1). Additionally, we compared the performance of ML algorithms in recognizing bee taxa from buzzing-sounds produced during two different behavioral contexts: flight and sonication. While the flight sound generally has few oscillations and is roughly time-independent [37, 38], sonication produces more complex sounds that may be associated with intrinsic characteristics of the bee [39–42]. Based on this, we predicted that the buzzing-sounds produced by floral vibrations are more relevant for recognizing bee taxa (hypothesis 2).
Therefore, the main contributions of our work are: (1) to assess the performance of ML classifiers in recognizing flower-visiting bee species relative to the statistical approach used by [43]; (2) to classify a higher diversity of taxonomic groups than previous studies, covering the largest number of genera and families of bees; (3) to provide evidence of which buzzing sound is more relevant for taxonomic recognition of the bees by the ML classifiers; and (4) to indicate the best ML algorithms that could lead to automating the taxonomic recognition of flower-visiting bees in tomato crops.

Materials and methods

Buzzing sounds acquisition

The acoustic recording of buzzes was carried out using tomato plants (Solanum lycopersicum ac. BGH 7488) grown at the experimental fields of the Federal University of Viçosa (Minas Gerais, Brazil). A portable hand recorder (SongMeter SM2, Wildlife Acoustics, USA) was used to record the buzzing sounds of bees visiting the tomato flowers. To record the buzzing sounds, a researcher continuously walked through the rows of tomatoes hand-holding the recorder while searching for flower-visiting bees. When the researcher spotted a visiting bee, she carefully approached, holding the recorder microphone as close as about 10 cm from the flower being visited. The microphone was constantly pointed toward the bee body and, whenever possible, toward the dorsum. Sound recordings were obtained for 15 bee species from eight genera and two families (see Table 1). Just after leaving the flower, the bees were captured with an entomological net and placed in glass vials with ethyl acetate for taxonomic identification. When the researcher was not able to capture a bee individual, the corresponding audio sample was not considered in our analysis. We adopted this procedure to ensure correct bee taxonomic recognition; consequently, the number of bee individuals sampled corresponds to the number of audio files (see Table 1). All sampled individuals were identified at the species level by an expert in bee taxonomy.

Table 1. Taxonomic diversity of sonicating bees recorded visiting tomato flowers and the corresponding higher taxonomic group (according to [44]).

(N recordings) denotes the number of individuals with buzzing-sounds recorded; (AF) average frequency ± standard deviation; (Flight segments) the total number of flight segments per species; (Sonication segments) the total number of sonication segments per species.

Acoustic pre-processing

The original sound recordings (.wav files) were manually classified into two behavioral contexts: (i) sonication and (ii) flight (see Fig 1). We categorized as sonication all segments of buzzing-sounds produced by bees vibrating tomato flowers, and as flight the sounds produced by the flying displacement of bees between tomato flowers, as illustrated in Fig 2. As a result, the set of 59 recordings generated 321 segments: 218 of sonication and 103 of flight (see Table 1). Flight and sonication buzzes present pronounced differences in acoustic characteristics, so they can easily be distinguished in the recordings afterward by an experienced user. Parts with no bee sound were not selected and were therefore excluded from the subsequent analyses. We performed these analyses using the Raven Lite software (Cornell Laboratory of Ornithology, Ithaca, New York). The length of the recordings ranged from five seconds to over one minute.

Fig 1. Overview of the approach adopted for the acoustic classification of bees buzzing-sounds and machine learning workflow.

The original audio files (.wav format) containing recordings of bees buzzing-sounds during visits to tomato flowers were manually classified into sonication or flight segments. Then, the Mel Frequency Cepstral Coefficients method (MFCC) was used to extract the audio features. After, the resulting data set was split into 50% for the training/development set (delimited by the red dashed line) and 50% for the testing data set. The GridSearchCV method was used to tune the hyperparameters of the training set (using 5-cross validations). The test data set was used to evaluate the performance of the Machine-Learning classifiers in correctly assigning the buzzing sound to the respective bee taxa.

Fig 2. Spectrograms of different types of buzzing (sonication and flight) for two visiting-bees species of tomato flowers (Melipona bicolor and Exomalopsis analis).

Note that the duration and amplitude and frequency of the buzzing-sounds vary between the species and among the type of buzzing.

Audio feature extraction

After the acoustic processing, audio feature extraction was applied to transform raw audio data into features that explicitly represent properties of the data and may be relevant for classification. This process is carried out through the MFCC [45], as implemented in the librosa library [46]. As input, the algorithm takes an audio segment (flight or sonication), which goes through the following steps: pre-emphasis, framing, windowing, Discrete Fourier Transform (DFT), and filter bank (applying the Discrete Cosine Transform, DCT), as described by [45] (see Fig 3).

Fig 3. Overview of the steps for audio feature extraction by the Mel Frequency Cepstral Coefficients method (MFCC).

Pre-emphasis, framing, windowing, Discrete Fourier Transform (DFT), and filter bank (applying Discrete Cosine Transform—DCT).

The Discrete Fourier Transform (DFT) was applied to each frame to calculate its spectrum; we subsequently computed the filter banks, which are formed by triangular filters spaced according to the Mel frequency scale, and then obtained the log-energy output of each of the Mel filters. Finally, the MFCC coefficients were obtained by applying the inverse cosine transformation (DCT) to the logarithm of the energy coefficients obtained in the previous step. The parameters applied to generate the MFCC coefficients were kept at their defaults, except for the minimum frequency of each audio segment and the number of features, which was set to 40. This was necessary because MFCC cannot generate a larger number of features, given that the average duration of the segments was small, approximately 1.54 seconds.

Similarity of the buzzing-sounds.

After audio feature extraction by the MFCC, we applied the Euclidean distance score to estimate the similarity between the two types of sounds produced by bees visiting tomato flowers (sonication and flight). We calculated the Euclidean distance score for all possible combinations between sonication and flight sounds, and the average Euclidean distance per bee taxon (species and genera). The Euclidean distance has been used to measure sound similarities, especially in the human context (e.g., between the voices produced by different speakers [47], between speech, music, and non-vocal sounds [48], and to characterize signals and model their probability density functions [49]), and is therefore likely to be applicable to bee buzzing classification. By definition, the greater the distance, the less similar the sonication and flight features. However, to simplify the interpretation of the results, we standardized the index between 0 and 1, where 1 represents maximum similarity and 0 no similarity (according to [50]).
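A sketch of this similarity computation, assuming the simple rescaling 1/(1 + d) for the 0–1 standardization (the paper cites [50] for its exact procedure, which may differ):

```python
import numpy as np


def buzz_similarity(a, b):
    """Similarity between two MFCC feature vectors via Euclidean distance,
    rescaled so that 1 means identical and values near 0 mean very
    dissimilar. The 1/(1 + d) rescaling is an assumption for illustration."""
    d = np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return 1.0 / (1.0 + d)


identical = buzz_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # distance 0
distant = buzz_similarity([0.0, 0.0], [3.0, 4.0])              # distance 5
```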


Data splitting.

During an exploratory analysis, we detected an imbalance of the sampled data between the classes (species/genera) and between the two behaviors (sonication and flight). There were 103 samples containing segments of flight and 218 of sonication. Moreover, the distribution of these segments across the classes was even more unbalanced (see Table 1). For example, the flight of the species Augochloropsis brachycephala was recorded just once; considering that we need to distribute the data into training and testing sets, this species could not be part of both. This problem does not occur at the genus level, because the number of classes decreases and, consequently, the number of samples per class increases.

Due to the mentioned issues, the data division was stratified. As shown in Fig 1, the data set was divided into 50% for training and 50% for testing. The division was done through the train_test_split function of scikit-learn [51], which is able to separate the data in a stratified way through the StratifiedKFold method. This method is a variation of k-fold that returns stratified folds: each set contains approximately the same percentage of samples of each target class as the complete set, dividing each class by 50%.

After data division, it is necessary to standardize the data set. This step is important for many machine learning estimators, because they can behave badly if the individual features do not look more or less like standard normally distributed data (for example, Gaussian with mean 0 and unit variance) [51]. Elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance of the same order. If a feature has a variance that is orders of magnitude greater than the others, it may dominate the objective function and make the estimator unable to learn correctly from the other features. Therefore, to solve this problem, data normalization was done with the StandardScaler method.
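The split-then-standardize procedure can be sketched with scikit-learn as follows; the toy feature matrix and labels are placeholders for the real MFCC data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data: 40 segments x 40 MFCC features, four genus labels
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 40))
y = np.repeat(["Exomalopsis", "Melipona", "Bombus", "Centris"], 10)

# 50/50 stratified split: both halves keep each class's proportion
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)

# Fit the scaler on the training half only, then apply it to both halves,
# so no information from the test set leaks into training
scaler = StandardScaler().fit(X_tr)
X_tr_std = scaler.transform(X_tr)
X_te_std = scaler.transform(X_te)
```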

Machine learning algorithms.

Machine Learning techniques have demonstrated high efficiency and accuracy for the classification of bumblebees and other groups of bees based on the characteristics of their buzzing-sounds [19, 35, 36]. Therefore, we chose some of the most commonly used ML classifiers for recognizing the taxonomic identity of bees during visits to tomato flowers (according to S1 Table): Logistic Regression [52], Support Vector Machines [53, 54], Random Forest [55], Decision Trees [56, 57], and a classifier ensemble [58, 59], a combination of multiple, diversified classifiers into a single classifier model. Ensemble methods train multiple learners to solve the same problem [59]. In contrast to classic learning approaches, which construct one learner from the training data, ensemble methods construct a set of learners and combine them. We therefore combined three classifiers (Random Forest, SVM, and Logistic Regression) by majority vote. These classifiers were chosen because they achieved the best performance in recognizing bee buzzing-sounds (see S1 Table).
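A majority-vote ensemble of the three classifiers can be sketched with scikit-learn's VotingClassifier; the hyperparameter values and the synthetic data below are placeholders, not the tuned settings or real MFCC features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hard voting: each base model casts one vote and the majority class wins
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(kernel="rbf", C=10, gamma=0.01)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)

# Synthetic stand-in for the MFCC feature matrix (segments x 40 features)
X, y = make_classification(n_samples=120, n_features=40, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
ensemble.fit(X, y)
preds = ensemble.predict(X)
```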

Tuning the hyperparameters.

In many cases, the performance of an algorithm in a given learning task depends on its hyperparameter settings. To obtain the best performance, the hyperparameters were thoroughly tested. Methods to tune hyperparameters to the problem at hand require the definition of a search space: the set of hyperparameters and ranges to be considered.

An important decision is which hyperparameters should be considered, because a very large set of hyperparameters is computationally expensive, and the cost grows as the search space expands. So far, there is no empirical evidence on which hyperparameters are most important to adjust and which result in similar performance when set to a reasonable default value. Hyperparameters that fall into this last category can be completely eliminated from the search space when computational resources are limited [60].

We used the GridSearchCV method found in the scikit-learn library [51]. This method performs an exhaustive search: as input, it receives an estimator, a hyperparameter dictionary, and the cross-validation method, and then creates a model for each combination. Cross-validation is used to evaluate each individual model; this step divides the training data into 5 folds (see Fig 1), which are also built with the StratifiedKFold method. After training and validation, the GridSearchCV method returns the model that achieved the best performance, which is then used on the test set.

The sets of hyperparameters were defined as follows: for SVM, we varied the kernel (RBF, polynomial, sigmoid, linear), C ranging over {0.001, 0.01, 0.1, 1, 10}, and γ ranging over {1e−2, 1e−3, 1e−4}. For Logistic Regression, we considered the penalty {l1, l2} and C ranging over {0.001, 0.01, 0.1, 1, 10}. For Decision Trees, we evaluated "Gini impurity" and "entropy" for the information gain; these functions are used to measure the quality of a split in the tree. For Random Forest, we considered the number of trees in the forest varying over {100, 200}. Finally, for the ensemble model we varied only the C parameter, as already described, due to the great computational cost this model requires.
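For the SVM, the search described above can be sketched with GridSearchCV; the synthetic data and the Macro-F1 scoring choice are our assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# SVM search space as described in the text
param_grid = {
    "kernel": ["rbf", "poly", "sigmoid", "linear"],
    "C": [0.001, 0.01, 0.1, 1, 10],
    "gamma": [1e-2, 1e-3, 1e-4],
}

# Synthetic stand-in for the training half of the MFCC data
X, y = make_classification(n_samples=100, n_features=40, n_informative=8,
                           random_state=0)

# Exhaustive search over all 60 combinations, each evaluated by 5-fold
# stratified cross-validation; the scoring metric is an assumption here
search = GridSearchCV(SVC(), param_grid,
                      cv=StratifiedKFold(n_splits=5), scoring="f1_macro")
search.fit(X, y)
best_model = search.best_estimator_  # refit on the full training data
```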

Evaluation metrics

To evaluate the performance of the classification generated by the algorithms and baselines, we used the following metrics: Accuracy (Acc), Macro-Precision (MacPrec), Macro-Recall (MacRec) and Macro-F1 (MacF1).

Let $i$ be a class from the set of classes $\mathcal{L}$. Let $\mathcal{T}$ be the test set and let $c$ be a classifier, such that $c(t) = l$, where $t$ is an element of the test set and $l \in \mathcal{L}$ is the label assigned to $t$ by $c$. Let $g(t)$ be the ground-truth class label of $t$. In regard to the classifier $c$ we define:

  • True Positives of class $i$, denoted by $TP_i$, as the number of elements in $\mathcal{T}$ correctly labeled with class $i$ by $c$, i.e., $TP_i = |\{t \in \mathcal{T} : c(t) = i \wedge g(t) = i\}|$.
  • False Positives of class $i$, denoted by $FP_i$, as the number of elements in $\mathcal{T}$ that were wrongly classified by $c$ as belonging to class $i$. Formally, $FP_i = |\{t \in \mathcal{T} : c(t) = i \wedge g(t) \neq i\}|$.
  • False Negatives of class $i$, denoted by $FN_i$, as the number of elements in $\mathcal{T}$ belonging to class $i$ but classified by $c$ with a label different from $i$, that is, $FN_i = |\{t \in \mathcal{T} : g(t) = i \wedge c(t) \neq i\}|$.

The above numbers are used to define traditional effectiveness measures of classifiers. These measures are: Precision, Recall and F1 [61]. Precision $p(c, i)$ of a classifier $c$ in relation to a class $i$ is defined in Eq 1:

$$p(c, i) = \frac{TP_i}{TP_i + FP_i} \qquad (1)$$

Informally, precision is the ratio between the number of test elements correctly labeled by $c$ with the class label $i$ and the number of all elements labeled (correctly or incorrectly) with class $i$ by $c$.

Recall, denoted by $r(c, i)$, of a classifier $c$ in relation to a class $i$ is defined by Eq 2:

$$r(c, i) = \frac{TP_i}{TP_i + FN_i} \qquad (2)$$

Thus, recall is the ratio between the number of test elements belonging to class $i$ which were correctly labeled by $c$ and the total number of test elements of class $i$.

The F1 measure is a combination of the precision and recall measures and is defined by Eq 3:

$$F1(c, i) = \frac{2 \cdot p(c, i) \cdot r(c, i)}{p(c, i) + r(c, i)} \qquad (3)$$

When comparing the effectiveness of classifiers generated from distinct learning methods, it is common to use a global measure of effectiveness. A global measure aims at summarizing the effectiveness of the classifier over all classes in the test set. In this work we use the following global measures to compare the results of the classifiers: Accuracy (Acc) (which is equivalent to Micro-F1), Macro-Precision (MacPrec), Macro-Recall (MacRec) and Macro-F1 (MacF1). Accuracy of a classifier $c$ is the fraction of test elements that were correctly labeled by $c$, and is formally defined by Eq 4:

$$Acc(c) = \frac{|\{t \in \mathcal{T} : c(t) = g(t)\}|}{|\mathcal{T}|} \qquad (4)$$

The Macro measures (Macro-Precision, Macro-Recall and Macro-F1) are the averages of the corresponding measures (Precision, Recall and F1) over all classes, and are defined by Eqs 5, 6 and 7:

$$MacPrec(c) = \frac{1}{|\mathcal{L}|} \sum_{i \in \mathcal{L}} p(c, i) \qquad (5)$$

$$MacRec(c) = \frac{1}{|\mathcal{L}|} \sum_{i \in \mathcal{L}} r(c, i) \qquad (6)$$

$$MacF1(c) = \frac{1}{|\mathcal{L}|} \sum_{i \in \mathcal{L}} F1(c, i) \qquad (7)$$
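All of these measures are available in scikit-learn; a small worked example on toy genus labels:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

# Toy ground truth and predictions for a three-class problem
y_true = ["Bombus", "Bombus", "Melipona", "Melipona", "Centris", "Centris"]
y_pred = ["Bombus", "Melipona", "Melipona", "Melipona", "Centris", "Bombus"]

acc = accuracy_score(y_true, y_pred)                         # 4 of 6 correct
mac_prec = precision_score(y_true, y_pred, average="macro")  # mean of 1/2, 1, 2/3
mac_rec = recall_score(y_true, y_pred, average="macro")      # mean of 1/2, 1/2, 1
mac_f1 = f1_score(y_true, y_pred, average="macro")           # mean of per-class F1
```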

Baselines establishment

To assess and compare the performance of the ML algorithms in recognizing bees based on their buzzing sounds, we built three baselines. The first, named "Fundamental frequency", was estimated to compare our results (based on ML techniques and audio feature extraction by the MFCC) with the results obtained by [43] (based on differences in the average fundamental frequency of the bee buzzing). The fundamental frequency was obtained by dividing each sound recording into three sections of similar duration and measuring the lowest frequency of each section using Avisoft-SASLab Lite (Avisoft Bioacoustics, Germany); the average fundamental frequency of each sound recording was the mean of the three frequencies, as performed by [43]. Then, the values of the average fundamental frequency (±SD) were associated with the corresponding bee taxon (species/genus). The species/genera whose averages lay between the lowest and highest frequency were selected, and the species/genus with the lowest standard deviation was predicted. For the second baseline, named "Fundamental frequency (SVM)", we employed the best classifier found here (based on the best F1-score) to recognize the bee taxa based only on the fundamental frequency. Lastly, we report the result of a majority baseline that assigns all samples to the majority class, that is, Exomalopsis for genus-level and Exomalopsis analis for species-level classification.
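The majority baseline corresponds to scikit-learn's DummyClassifier; a minimal sketch with placeholder labels:

```python
from sklearn.dummy import DummyClassifier

# Placeholder training labels in which Exomalopsis is the majority class
y_train = ["Exomalopsis"] * 6 + ["Melipona"] * 3 + ["Bombus"] * 1
X_train = [[0.0]] * len(y_train)  # features are ignored by this baseline

majority = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Every test segment is assigned the majority class, whatever its features
y_test_pred = list(majority.predict([[0.0]] * 4))
```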


Acoustic characteristics of the buzzing sounds

The acoustic properties (amplitude, frequency, and duration) of the buzzing can vary depending on the behavioral activity and the bee species visiting tomato flowers. For example, the spectrograms of Melipona bicolor and Exomalopsis analis show that the flight and sonication buzzing-sounds are distinct from each other (Fig 2). During flight (Fig 2), the spectrograms are time-independent and consist of a continuous frequency; amplitude variations may be related to the intensity of the sound over time, since the distance from the bee to the microphone can also vary. During sonication, the fundamental frequency increases significantly (around 240 Hz) and the amplitude reaches higher values at higher frequencies.

The acoustic properties of the buzz also differ among bee species. For example, while M. bicolor (Fig 2, upper spectrogram) presents successive short sonication buzzing-sounds with brief breaks between them, E. analis (Fig 2, bottom spectrogram) shows sonication intervals of irregular duration (generally longer than in M. bicolor) and longer breaks between them.

Performance of the machine learning algorithms

Regarding the type of buzzing-sound.

The sonication and flight features extracted by the MFCC can be easily distinguished. They presented very low similarity to each other, ranging from 0.01 to 0.03 (Euclidean distance score) for bee species and between 0.01 and 0.02 for genera. Moreover, the type of buzzing-sound also influenced the capacity of the classifiers to recognize the visiting bees of tomato flowers. The ML algorithms reached a better performance recognizing bees at the species level based on sonication sounds (based on the best Macro-F1 score, see Table 2). The accuracy and Macro-F1 score were higher in classifications considering only the segments of floral sonication sounds than in those of flight (Table 2). The sonication sounds classified by the SVM algorithm achieved the best performance among all combinations tested here (Accuracy = 73.39%; Macro-F1 = 59.06%, Table 2).

Table 2. Predictive performance of different Machine-Learning algorithms on acoustic recognition of bee species based on the type of buzzing-sound (flight, sonication, and flight+sonication) during visits to tomato flowers.

The performance of the ML algorithms was measured by Accuracy (Acc), Macro-Precision (MacPrec), Macro-Recall (MacRec) and Macro-F1 (MacF1) and compared with three baseline scenarios: (1) Majority class: assigning all samples to the majority class; (2) Fundamental frequency: bee recognition based solely on the average frequency of the sonication, as performed by [43]; (3) Fundamental frequency (SVM): bee recognition based on the fundamental frequency and using the SVM algorithm, the classifier with the best performance (based on the MacF1 score). Bold numbers represent the best results per evaluation metric within each buzz-sound; different superscript letters denote significant differences in the MacF1 score among the algorithms of the same buzzing behavior (p ≤ 0.05, T-test); (**) denotes that the performance of the algorithm is higher than the baselines (based on the MacF1 measure; p ≤ 0.05, T-test).

Nonetheless, at genus-level recognition, the performance of the algorithms did not seem to depend on the type of buzzing-sound (based on the highest Macro-F1 measure, Table 3). Even so, the buzzing sounds from flight led to a marginally better performance of the ML algorithms than sonication in recognizing bee genera (Table 3).

Table 3. Predictive performance of different Machine-Learning algorithms on acoustic recognition of bee genera based on the type of buzzing-sound (flight, sonication, and flight+sonication) during visits to tomato flowers.

The performance of the ML algorithms was measured by Accuracy (Acc), Macro-Precision (MacPrec), Macro-Recall (MacRec) and Macro-F1 (MacF1) and compared with three baseline scenarios: (1) Majority class: assigning all samples to the majority class; (2) Fundamental frequency: bee recognition based solely on the average frequency of the sonication, as performed by [43]; (3) Fundamental frequency (SVM): bee recognition based on the fundamental frequency and using the SVM algorithm, the classifier with the best performance (based on the MacF1 score). Bold numbers represent the best results per evaluation metric within each buzz-sound; different superscript letters denote significant differences in the MacF1 scores among the algorithms of the same buzzing behavior (p ≤ 0.05, T-test); (**) denotes that the performance of the algorithm is higher than the baselines (based on the MacF1 measure; p ≤ 0.05, T-test).

Regarding the level of taxonomic resolution.

The performance of the ML classifiers differed between acoustic recognition of bees at the species and genus levels. Indeed, species recognition is more complex than genus recognition: there are 15 classes (against 8 genera) and the number of samples for some of them is very small (N ≤ 5). The SVM reached the best Macro-F1 value at genus-level recognition (flight, 60.2%; Table 3), which was similar to the Macro-F1 obtained by the SVM in species recognition (sonication, 59.06%; Table 2). However, based on Accuracy, the Ensemble was the best for genus recognition and the SVM for species recognition (Tables 2 and 3).

Only the LR and SVM classifiers consistently presented Macro-F1 values higher than the baselines at genus-level classification (S2 Table). On the other hand, only the DTree consistently achieved lower performance than the baselines (based on the Macro-F1 measure, S2 Table). For species recognition, besides the LR and SVM, the Ensemble also reached a better score than the baselines (based on the Macro-F1 measure, S3 Table).

The confusion matrix shows the number of correctly versus erroneously predicted genera by the SVM, the classifier with the best performance here (based on the MacF1 score, Table 4). The SVM correctly recognized 64% (34 of 53) of the flight-sound samples. However, the capacity of the algorithm to identify bees was unequal among the genera: the SVM correctly recognized more than 50% of the samples of four out of eight genera (Bombus, Centris, Melipona and Eulaema, Table 4).

Table 4. Confusion matrix with the best performance for bee buzzing-sounds classification at genus-level using MFCC features (flight with SVM classifier, MacF1 = 60.20% and Acc = 64.15%).

The numbers in the matrix correspond to correctly (diagonal elements, bold) and incorrectly (out-of-diagonal elements) recognized samples in the data set. The best parameters of this classification were C = 10, decision_function_shape = “ovo”, gamma = 0.01, kernel = “rbf”.
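A minimal sketch of this best-performing configuration (C = 10, gamma = 0.01, RBF kernel, one-vs-one decision function) and of the confusion-matrix evaluation, using scikit-learn. The random feature vectors and four-class labels below are hypothetical stand-ins for the MFCC features and bee genera, not the study's recordings.

```python
# Sketch: SVM with the reported best hyperparameters, evaluated with a
# confusion matrix. Data are random stand-ins for MFCC feature vectors.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))      # 40 samples x 20 MFCC-like features
y = rng.integers(0, 4, size=40)    # 4 hypothetical genus labels

clf = SVC(C=10, gamma=0.01, kernel="rbf", decision_function_shape="ovo")
clf.fit(X, y)

# Diagonal entries count correctly recognized samples per class;
# off-diagonal entries are misclassifications.
cm = confusion_matrix(y, clf.predict(X), labels=[0, 1, 2, 3])
print(cm)
```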

On the other hand, for species-level recognition, the SVM correctly predicted 79% of the sonication samples (80 of 109) (Table 5). Moreover, this algorithm was capable of recognizing E. analis (28 of 30 samples), the most representative species (Table 5). It also correctly recognized some species with a small number of samples, like Augochloropsis brachycephala, Augochloropsis sp.2, Melipona quadrifasciata and Centris trigonoides.

Table 5. Confusion matrix with the best performance for bee buzzing-sound classification at species level using MFCC features (sonication with SVM classifier, MacF1 = 59.06% and Acc = 73.39%).

The numbers in the matrix correspond to correctly (diagonal elements, bold) and incorrectly (out-of-diagonal elements) recognized samples in the data set. The best parameters of this classification were C = 10, decision_function_shape = “ovo”, gamma = 0.01, kernel = “rbf”.


The accuracy of the tested ML algorithms in recognizing flower-visiting bees of tomato crops ranged from 49 to 74% on a data set of 59 audio recording samples. The algorithms performed better at assigning the bee buzzing sounds to their respective taxa than the frequency-based trials. Moreover, the sonication sounds were more relevant to bee species recognition: the ML algorithms achieved greater performance in recognizing bee species when we considered only the sounds produced during sonication. On the other hand, genus recognition did not depend on the type of buzzing sound.

Advantages of machine-learning over classifications based on fundamental frequency

The ML algorithms achieved higher performance in recognizing bee taxa than analyses based on the fundamental frequency performed on the same data set. Moreover, the statistical analysis based on fundamental frequency differences performed by [43] failed to distinguish between most bee species. In fact, analyses based solely on the fundamental frequency (average frequency) necessarily lose part of the intrinsic complexity of buzzing sounds, which is multifactorial and time-dependent [12]. The buzz has other acoustic features, like amplitude and duration, that, combined with each other and with frequency, help characterize the buzzing sounds [62]. However, this results in a huge amount of data with unusual distributions, non-linearity, complex interactions, and dependence among observations, which would not be well handled by statistical methods commonly used in ecology [63, 64]. On the other hand, the ML algorithms combined with the MFCC method were able to correctly predict 66% of all samples, and the SVM correctly predicted 79% of the species samples based on sonication sounds. Likely due to the ML attributes boosted by the MFCC features, we reached a higher performance on acoustic recognition of bees than classifications based only on the fundamental frequency.

The recognition of bees depends on the type of buzzing

There are pronounced differences between the biomechanical properties of the buzzes produced during sonication and those produced during flight [38]. Sonication sounds have higher amplitudes and frequencies than flight buzzes [38]. The flight sound has few oscillations and is roughly time-independent; it consists of the natural frequency (the frequency at which the wings oscillate) and its higher harmonics [37, 38]. Therefore, flight buzzes may be more similar among species within higher-level taxa. This may be the reason why flight sounds were more relevant to the recognition of genera than of species. Besides that, the incorporation of both buzzing sounds (sonication+flight) does not seem to interfere with the performance of the algorithms in recognizing bees at the genus level, since the performance of the ML algorithms was similar.

On the other hand, the sonication sounds were associated with higher performance in recognizing bee species. Although the mechanical characteristics of sonication have been related to the amount of pollen released from poricidal anthers [12, 43, 65–67], the acoustic properties of buzzing sounds are also related to intrinsic attributes of the species [39–42]. Consequently, the higher specificity of sonication sounds may make them more relevant to species recognition by the ML algorithms. Thus, although the behavioral context in which the buzzing sound was produced was not relevant for genus recognition by the ML algorithms, it was for species recognition.

Limitations of buzzing-sound classification with machine learning

Although we classified a greater taxonomic diversity of flower-visiting bees based on their buzzes than previous studies (see S1 Table), grouping the largest number of genera and families of bees (15 species from 8 genera and 2 subfamilies; Table 1), the machine-learning approach presented some limitations in recognizing bees based on buzzing sounds. Firstly, ML algorithms are domain-dependent. This means that a classifier can perform very well when applied to the same domain as the one on which it was trained, yet its performance decreases when it is applied to a different domain (e.g. species/genus, sonication/flight). Thus, the classifier needs to be retrained in order to perform well on a different domain. Secondly, the performance of the ML algorithms was not homogeneous among the classes of species and genera. The performance was very high for some bee taxa, especially the most sampled ones, and varied for under-represented taxa (Augochloropsis sp.2, 100%; E. analis, 93%; P. graminea, 72%), which may be related to the unbalanced number of samples per bee taxon, an issue also reported by related studies (see S1 Table). Consequently, bees that rarely visited the tomato flowers and/or were difficult to capture were under-sampled. This bias is inherent to the system, since the local abundance of individuals per species naturally varies and the bees spontaneously visit the flowers. On the other hand, to ensure taxonomic identification, all specimens had to be captured. This was a requirement to include the buzz sound associated with a given bee in the acoustic analysis; when a bee could not be captured, we deleted the corresponding sound file. The oil-collecting bees (Centris sp.), for example, were more difficult to sample because they flew quite fast between flowers, remained for a short time in the same flower, and/or did not visit nearby flowers, which made it difficult to follow them.
The general consequence of these two factors (different abundance of individuals among bee taxa and sampling bias) was an unbalanced sampling among classes of buzzing bees. Unfortunately, machine learning algorithms suffer a considerable loss of performance when classifying unbalanced data [68].

Nevertheless, the performance in recognizing the buzzing sounds was uneven among the ML algorithms. The LR and especially the SVM stood out from the other algorithms and consistently obtained better performance than the baselines. Based on the macro-averaged evaluation metrics, the SVM achieved the best performance in recognizing the visiting bees of tomato. In fact, the SVM has strong theoretical foundations with excellent empirical successes [69] and has demonstrated tolerance to data sets with few samples per class and unbalanced data (see Evaluation metrics in text) [70]. This may be the main reason why this classifier almost always produced the best classifications relative to the baselines and the other algorithms. Despite that, the SVM performance is still low on the small data set tested here, compared to ML standards (see S1 Table). Therefore, we suppose that there is so little audio data that no classical ML classifier can, in principle, generalize well on it. Further studies considering larger recording samples and/or applying algorithms that can perform more complex processing tasks, such as unsupervised learning systems (e.g., clustering, dimensionality reduction, recommender systems) or deep learning, may reach better classification performance.
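The cited tolerance of SVMs to unbalanced classes [70] can also be encouraged explicitly in scikit-learn through class weighting. The sketch below is a hypothetical illustration, not a setting reported in this study: `class_weight="balanced"` reweights training errors inversely to class frequency, so a rarely captured taxon is not simply absorbed by the majority class.

```python
# Hypothetical sketch: handling unbalanced classes with class weighting.
# Labels mimic one well-sampled and one rarely captured bee taxon.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(35, 20))          # random MFCC-like feature vectors
y = np.array([0] * 30 + [1] * 5)       # 30 vs. 5 samples: strong imbalance

# class_weight="balanced" scales C inversely to class frequency, so
# errors on the 5-sample class weigh as much as errors on the 30-sample one.
clf = SVC(C=10, gamma=0.01, kernel="rbf", class_weight="balanced")
clf.fit(X, y)
print(clf.classes_)
```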

Consequences of automating the bee recognition to tomato yields

Bee identity is associated with pollination effectiveness and fruit yields, since performance as pollinators tends to differ among species/groups of visiting bees (e.g. [4, 6–9, 16]). Differences in the body size of the bees in relation to the distance between anthers and stigma may be a key factor in explaining this. Larger pollinators transfer more pollen than smaller ones [71], since their body size fits or exceeds the distance between anthers and stigma [72–74]. Therefore, automating the taxonomic recognition of flower-visiting bees would be especially relevant for tomato production, since the quality of the pollination provided is linked to the identity of the bee. Thus, farmers, agronomists, and other professionals interested in improving the pollination of cultivated tomatoes could identify the species of visiting bees without needing an expert in insect taxonomy. Aware of the value of bees to the crop income, farmers could be motivated to adopt practices that benefit the most successful pollinators and, indirectly, the overall local bee community, promoting profitable and sustainable agricultural practices.

Moreover, the automated taxonomic recognition of bees may apply to other buzz-pollinated plants, since they are primarily visited by bees that produce buzzing sounds to extract pollen [9, 12, 14, 15]. Like tomato, some of these plants are important food crops, such as blueberry, kiwi, cranberry, and eggplant [9, 75–77]. However, some procedures must be adopted to avoid sampling bias and facilitate acoustic recognition: (1) studies must focus on one plant species, because the same bee species produces vibrations with different frequencies and durations when visiting different plant taxa [42, 78]; (2) limitations must be considered when analyzing the relative acoustic amplitude, because this energy-related parameter depends on the measurement procedure (e.g. the recorder model and configuration, the distance to the focal object) and does not necessarily correspond to the vibrational amplitude [19, 79].

In summary, the ML algorithms powered by the MFCC feature extraction method could help automate the taxonomic recognition of flower-visiting bees of the tomato crop. We found advantages of the ML classifiers in recognizing bee species based on their buzzing sounds over conventional analyses based on the fundamental frequency alone [43]. Some classifiers, especially the SVM, an algorithm that better handles small data sets, achieved better performance relative to the randomized and sound frequency-based trials. The buzzing sounds produced during sonication were more relevant for the taxonomic recognition of bee species than the flight sounds. On the other hand, we found that the ML classifiers achieved better performance in recognizing bee genera based on flight sounds. As far as we know, the use of ML algorithms to explore these two kinds of bee sounds for bee taxa identification has not been reported previously. Future studies may focus on the extension of this approach to other buzz-pollinated crops as well as on the technological application of this model, for example, the development of apps based on ML techniques and compatible with smartphones.

Supporting information

S1 Table. Overview of the studies applying machine learning and audio feature extraction methods to the acoustic monitoring/detection of bees.


S2 Table. Pairwise comparison of the performance of the machine-learning algorithms and baseline scenarios (majority class, fundamental frequency, and fundamental frequency (SVM)) in acoustic recognition of bee genera based on buzzing-sounds produced during three behavioral contexts (flight, sonication, and flight + sonication).

Internal numbers correspond to P-values obtained by the T-test; P-values highlighted in bold (p ≤ 0.05) indicate significant differences among the F1-scores of the ML algorithms/baselines.


S3 Table. Pairwise comparison of the performance of the machine-learning algorithms and baseline scenarios (majority class, fundamental frequency, and fundamental frequency (SVM)) in acoustic recognition of bee species based on buzzing-sounds produced during three behavioral contexts (flight, sonication, and flight + sonication).

Internal numbers correspond to P-values obtained by the T-test; P-values highlighted in bold (p ≤ 0.05) indicate significant differences among the F1-scores of the ML algorithms/baselines.



We thank Eduardo Almeida of the University of São Paulo (USP) and Fernando Silveira of the Federal University of Minas Gerais (UFMG) for identifying bees.


  1. FAOSTAT. database. Food and Agriculture Organization of the United Nations, Rome, Italy. 2020;1(6):491–499.
  2. Franceschinelli EV, Neto CMS, Lima FG, Gonçalves BB, Bergamini LL, Bergamini BAR, et al. Native bees pollinate tomato flowers and increase fruit production. Journal of Pollination Ecology. 2013;11(4):234–253.
  3. Deprá MS, Girondi Delaqua GC, Freitas L, Gaglianone MC. Pollination deficit in open-field tomato crops (Solanum lycopersicum L., Solanaceae) in Rio de Janeiro state, southeast Brazil. Journal of Pollination Ecology. 2014;12(1):233–258.
  4. Santos A, Bartelli B, Nogueira-Ferreira F. Potential pollinators of tomato, Lycopersicon esculentum (Solanaceae), in open crops and the effect of a solitary bee in fruit set and quality. Journal of Economic Entomology. 2014;107(3):987–994. pmid:25026657
  5. Giannini TC, Cordeiro GD, Freitas BM, Saraiva AM, Imperatriz-Fonseca VL. The dependence of crops for pollinators and the economic value of pollination in Brazil. Journal of Economic Entomology. 2015;108(3):849–857. pmid:26470203
  6. Silva-Neto CdM, Bergamini LL, Elias MAdS, Moreira G, Morais J, Bergamini BAR, et al. High species richness of native pollinators in Brazilian tomato crops. Brazilian Journal of Biology. 2017;77(3):506–513. pmid:27683812
  7. Vinícius-Silva R, Parma DdF, Tostes RB, Arruda VM, Werneck MdV. Importance of bees in pollination of Solanum lycopersicum L. (Solanaceae) in open-field of the Southeast of Minas Gerais State, Brazil. Hoehnea. 2017;44(3):349–360.
  8. Toni HC, Djossa BA, Ayenan MAT, Teka O. Tomato (Solanum lycopersicum) pollinators and their effect on fruit set and quality. The Journal of Horticultural Science and Biotechnology. 2020; p. 1–13.
  9. Cooley H, Vallejo-Marín M. Buzz-Pollinated Crops: A Global Review and Meta-analysis of the Effects of Supplemental Bee Pollination in Tomato. Journal of Economic Entomology. 2021;14(1):179–213. pmid:33615362
  10. Peet M, Welles G, et al. Greenhouse tomato production. Crop production science in horticulture. 2005;13:257.
  11. Buchmann SL, et al. Buzz pollination in angiosperms. Buzz pollination in angiosperms. 1983;28(1):73–113.
  12. De Luca PA, Vallejo-Marin M. What’s the ‘buzz’ about? The ecology and evolutionary significance of buzz-pollination. Current Opinion in Plant Biology. 2013;16(4):429–435. pmid:23751734
  13. Michener CD. An interesting method of pollen collecting by bees from flowers with tubular anthers. Revista de Biologia Tropical. 1962;10(2):167–175.
  14. Buchmann SL, Hurley JP. A biophysical model for buzz pollination in angiosperms. Journal of Theoretical Biology. 1978;72(4):639–657. pmid:672247
  15. Vallejo-Marín M. Buzz pollination: studying bee vibrations on flowers. New Phytologist. 2019;224(3):1068–1074. pmid:30585638
  16. Nunes-Silva P, Hrncir M, Shipp L, Imperatriz-Fonseca VL, Kevan PG. The behaviour of Bombus impatiens (Apidae, Bombini) on tomato (Lycopersicon esculentum Mill., Solanaceae) flowers: pollination and reward perception. Journal of Pollination Ecology. 2013;11(5):33–40.
  17. Orr MC, Hughes AC, Chesters D, Pickering J, Zhu CD, Ascher JS. Global patterns and drivers of bee distribution. Current Biology. 2020;50(3):53–78. pmid:33217320
  18. Cardinal S, Buchmann SL, Russell AL. The evolution of floral sonication, a pollen foraging behavior used by bees (Anthophila). Evolution. 2018;72(3):590–600. pmid:29392714
  19. Gradišek A, Slapničar G, Šorn J, Luštrek M, Gams M, Grad J. Predicting species identity of bumblebees through analysis of flight buzzing sounds. Bioacoustics. 2017;26(1):63–76.
  20. Gaston KJ, O’Neill MA. Automated species identification: why not? Philosophical Transactions of the Royal Society of London Series B: Biological Sciences. 2004;359(1444):655–667. pmid:15253351
  21. Soberón J, Peterson T. Biodiversity informatics: managing and applying primary biodiversity data. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences. 2004;359(1444):689–698. pmid:15253354
  22. Lewis OT, Basset Y. Insect conservation in tropical forests. Insect conservation biology. 2007;456(2):34–56.
  23. Schroder S, et al. The new key to bees: automated identification by image analysis of wings. Pollinating bees-the Conservation Link Between Agriculture and Nature. Brasilia: Ministry of Environment. 2002;94(2):691–596.
  24. Santana FS, Costa AHR, Truzzi FS, Silva FL, Santos SL, Francoy TM, et al. A reference process for automating bee species identification based on wing images and digital image processing. Ecological Informatics. 2014;24:248–260.
  25. Yanikoglu B, Aptoula E, Tirkaz C. Automatic plant identification from photographs. Machine Vision and Applications. 2014;25(6):1369–1383.
  26. Valliammal N, Geethalakshmi SN. Automatic Recognition System Using Preferential Image Segmentation For Leaf And Flower Images. Computer Science & Engineering: An International Journal (CSEIJ). 2011;1(4):13–25.
  27. Huang CJ, Yang YJ, Yang DX, Chen YJ. Frog classification using machine learning techniques. Expert Systems with Applications. 2009;36(2):3737–3743.
  28. Cheng J, Sun Y, Ji L. A call-independent and automatic acoustic system for the individual recognition of animals: A novel model using four passerines. Pattern Recognition. 2010;43(11):3846–3852.
  29. Lee CH, Han CC, Chuang CC. Automatic classification of bird species from their sounds using two-dimensional cepstral coefficients. IEEE Transactions on Audio, Speech, and Language Processing. 2008;16(8):1541–1550.
  30. Terenzi A, Cecchi S, Spinsante S. On the importance of the sound emitted by honey bee hives. Veterinary Sciences. 2020;7(4):168. pmid:33142815
  31. Kulyukin V, Mukherjee S, Amlathe P. Toward Audio Beehive Monitoring: Deep Learning vs. Standard Machine Learning in Classifying Beehive Audio Samples. Applied Sciences. 2018;8(9).
  32. Nolasco I, Terenzi A, Cecchi S, Orcioni S, Bear HL, Benetos E. Audio-based identification of beehive states. In: ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2019. p. 8256–8260.
  33. Terenzi A, Cecchi S, Orcioni S, Piazza F. Features Extraction Applied to the Analysis of the Sounds Emitted by Honey Bees in a Beehive. In: 2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA); 2019. p. 03–08.
  34. Cejrowski T, Szymański J, Logofătu D. Buzz-based recognition of the honeybee colony circadian rhythm. Computers and Electronics in Agriculture. 2020;175:505–486.
  35. Arruda H, Imperatriz-Fonseca V, de Souza P, Pessin G. Identifying Bee Species by Means of the Foraging Pattern Using Machine Learning. In: 2018 International Joint Conference on Neural Networks (IJCNN); 2018. p. 1–6.
  36. Kawakita S, Ichikawa K. Automated classification of bees and hornet using acoustic analysis of their flight sounds. Apidologie. 2019;50(1):71–79.
  37. De Luca PA, Buchmann S, Galen C, Mason AC, Vallejo-Marín M. Does body size predict the buzz-pollination frequencies used by bees? Ecology and Evolution. 2019;9(8):4875–4887. pmid:31031950
  38. Pritchard DJ, Vallejo-Marín M. Floral vibrations by buzz-pollinating bees achieve higher frequency, velocity and acceleration than flight and defence vibrations. Journal of Experimental Biology. 2020;223(11):63–103. pmid:32366691
  39. Arroyo-Correa B, Beattie C, Vallejo-Marín M. Bee and floral traits affect the characteristics of the vibrations experienced by flowers during buzz pollination. Journal of Experimental Biology. 2019;222(4):391–396. pmid:30760551
  40. Kawai Y, Kudo G. Effectiveness of buzz pollination in Pedicularis chamissonis: significance of multiple visits by bumblebees. Ecological Research. 2009;24(1):2–15.
  41. Burkart A, Lunau K, Schlindwein C. Comparative bioacoustical studies on flight and buzzing of neotropical bees. Journal of Pollination Ecology. 2011;6(2):491–596.
  42. Switzer CM, Combes SA. Bumblebee sonication behavior changes with plant species and environmental conditions. Apidologie. 2017;48(2):223–233.
  43. Rosi-Denadai CA, Araújo PCS, Campos LAdO, Cosme L Jr, Guedes RNC. Buzz-pollination in Neotropical bees: genus-dependent frequencies and lack of optimal frequency for pollen release. Insect Science. 2018;27(1):133–142. pmid:29740981
  44. Melo GA, Gonçalves RB. Higher-level bee classifications (Hymenoptera, Apoidea, Apidae sensu lato). Revista Brasileira de Zoologia. 2005;22(1):153–159.
  45. Logan B, et al. Mel frequency cepstral coefficients for music modeling. In: Ismir. vol. 270. Citeseer; 2000. p. 1–11.
  46. McFee B, Raffel C, Liang D, Ellis DP, McVicar M, Battenberg E, et al. librosa: Audio and music signal analysis in python. In: Proceedings of the 14th Python in Science Conference. vol. 8. Citeseer; 2015. p. 18–25.
  47. San Segundo E, Tsanas A, Gómez-Vilda P. Euclidean Distances as measures of speaker similarity including identical twin pairs: A forensic investigation using source and filter voice characteristics. Forensic Science International. 2017;270:25–38. pmid:27912151
  48. Foote J. A similarity measure for automatic audio classification. Proc. AAAI 1997 Spring Symposium on Intelligent Integration and Use of Text, Image, Video, and Audio Corpora; 1997.
  49. Helén M, Virtanen T. Query by example of audio signals using Euclidean distance between Gaussian mixture models. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07. vol. 1. IEEE; 2007. p. 200–225.
  50. Segaran T. Programming collective intelligence: building smart web 2.0 applications. “O’Reilly Media, Inc.”; 2007.
  51. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12(1):2825–2830.
  52. Hosmer DW Jr, Lemeshow S, Sturdivant RX. Applied logistic regression. 3rd ed. John Wiley & Sons; 2013.
  53. Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20(3):273–297.
  54. Steinwart I, Christmann A. Support vector machines. Springer Science & Business Media; 2008.
  55. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
  56. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks; 1984.
  57. Quinlan JR. C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 1993.
  58. Zhou ZH. Ensemble Methods: Foundations and Algorithms. 1st ed. Chapman & Hall/CRC; 2012.
  59. Kuncheva LI. Combining Pattern Classifiers: Methods and Algorithms. USA: Wiley-Interscience; 2004.
  60. Weerts HJ, Mueller AC, Vanschoren J. Importance of tuning hyperparameters of machine learning algorithms. arXiv preprint arXiv:2007.07588. 2020;22(6):91–96.
  61. Fatourechi M, Ward RK, Mason SG, Huggins J, Schlögl A, Birch GE. Comparison of Evaluation Metrics in Classification Applications with Imbalanced Datasets. In: 2008 Seventh International Conference on Machine Learning and Applications; 2008. p. 777–782.
  62. De Luca PA, Cox DA, Vallejo-Marín M. Comparison of pollination and defensive buzzes in bumblebees indicates species-specific and context-dependent vibrations. Naturwissenschaften. 2014;101(4):331–338. pmid:24563100
  63. Crisci C, Ghattas B, Perera G. A review of supervised machine learning algorithms and their applications to ecological data. Ecological Modelling. 2012;240:113–122.
  64. Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, et al. Random forests for classification in ecology. Ecology. 2007;88(11):2783–2792. pmid:18051647
  65. Harder L, Barclay R. The functional significance of poricidal anthers and buzz pollination: controlled pollen removal from Dodecatheon. Functional Ecology. 1994;89(2):509–517.
  66. De Luca PA, Bussiere LF, Souto-Vilaros D, Goulson D, Mason AC, Vallejo-Marín M. Variability in bumblebee pollination buzzes affects the quantity of pollen released from flowers. Oecologia. 2013;172(3):805–816. pmid:23188056
  67. Pritchard DJ, Vallejo-Marín M. Buzz pollination. Current Biology. 2020;30(15):858–860. pmid:32750339
  68. Tyagi S, Mittal S. Sampling approaches for imbalanced data classification problem in machine learning. In: Proceedings of ICRIC 2019. Springer; 2020. p. 13–40.
  69. Akbani R, Kwek S, Japkowicz N. Applying support vector machines to imbalanced datasets. In: European Conference on Machine Learning. Springer; 2004. p. 39–50.
  70. Palade V. Class imbalance learning methods for support vector machines. Imbalanced Learning: Foundations, Algorithms, and Applications; Wiley: Hoboken, NJ, USA. 2013;34(5):83.
  71. Földesi R, Howlett BG, Grass I, Batáry P. Larger pollinators deposit more pollen on stigmas across multiple plant species—A meta-analysis. Journal of Applied Ecology. 2021;58(4):699–707.
  72. Solís-Montero L, Vallejo-Marín M. Does the morphological fit between flowers and pollinators affect pollen deposition? An experimental test in a buzz-pollinated species with anther dimorphism. Ecology and Evolution. 2017;7(8):2706–2715. pmid:28428861
  73. Morais JM, Consolaro HN, Bergamini LL, Ferrero V. Patterns of pollen flow in monomorphic enantiostylous species: the importance of floral morphology and pollinators’ size. Plant Systematics and Evolution. 2020;306(2):1–12.
  74. Mesquita-Neto JN, Vieira ALC, Schlindwein C. Minimum size threshold of visiting bees of a buzz-pollinated plant species: consequences for pollination efficiency. American Journal of Botany. 2021. pmid:34114214
  75. Kim Y, Jo Y, Lee S, Lee M, Yoon H, Lee M, et al. The comparison of pollinating effects between honeybees (Apis mellifera) and bumblebee (Bombus terrestris) on the Kiwifruit raised in greenhouse. Korean Journal of Apiculture. 2005.
  76. Stubbs C, Drummond F. Blueberry and cranberry (Vaccinium spp.) pollination: a comparison of managed and native bee foraging behavior. In: VII International Symposium on Pollination 437; 1996. p. 341–344.
  77. Hikawa M. Effects of pollination by honeybees on yield and the rate of unmarketable fruits in forcing eggplant [Solanum melongena] cultures. Horticultural Research (Japan). 2004.
  78. Corbet SA, Huang SQ. Buzz pollination in eight bumblebee-pollinated Pedicularis species: does it involve vibration-induced triboelectric charging of pollen grains? Annals of Botany. 2014;114(8):1665–1674. pmid:25274550
  79. De Luca PA, Giebink N, Mason AC, Papaj D, Buchmann SL. How well do acoustic recordings characterize properties of bee (Anthophila) floral sonication vibrations? Bioacoustics. 2018;29(1):1–14.