MosquitoSong+: A noise-robust deep learning model for mosquito classification from wingbeat sounds

  • Akara Supratak,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Faculty of ICT, Mahidol University, Nakhon Pathom, Thailand

  • Peter Haddawy,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliations Faculty of ICT, Mahidol University, Nakhon Pathom, Thailand, Bremen Spatial Cognition Center, University of Bremen, Bremen, Germany

  • Myat Su Yin,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing

    myatsu.yin@mahidol.ac.th

    Affiliation Faculty of ICT, Mahidol University, Nakhon Pathom, Thailand

  • Tim Ziemer,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Bremen Spatial Cognition Center, University of Bremen, Bremen, Germany, Institute of Systematic Musicology, University of Hamburg, Hamburg, Germany

  • Worameth Siritanakorn,

    Roles Data curation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Faculty of ICT, Mahidol University, Nakhon Pathom, Thailand

  • Kanpitcha Assawavinijkulchai,

    Roles Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Faculty of ICT, Mahidol University, Nakhon Pathom, Thailand

  • Kanrawee Chiamsakul,

    Roles Data curation, Software

    Affiliation Faculty of ICT, Mahidol University, Nakhon Pathom, Thailand

  • Tharit Chantanalertvilai,

    Roles Data curation, Software

    Affiliation Faculty of ICT, Mahidol University, Nakhon Pathom, Thailand

  • Wish Suchalermkul,

    Roles Methodology, Software, Visualization

    Affiliation Faculty of ICT, Mahidol University, Nakhon Pathom, Thailand

  • Chaitawat Sa-ngamuang,

    Roles Data curation

    Affiliations Faculty of ICT, Mahidol University, Nakhon Pathom, Thailand, Bremen Spatial Cognition Center, University of Bremen, Bremen, Germany

  • Patchara Sriwichai

    Roles Data curation, Investigation, Resources, Validation

    Affiliation Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand

Abstract

In order to assess risk of mosquito-vector borne disease and to effectively target and monitor vector control efforts, accurate information about mosquito vector population densities is needed. The traditional and still most common approach to this involves the use of traps along with manual counting and classification of mosquito species, but the costly and labor-intensive nature of this approach limits its widespread use. Numerous previous studies have sought to address this problem by developing machine learning models to automatically identify species and sex of mosquitoes based on their wingbeat sounds. Yet little work has addressed the issue of robust classification in the presence of environmental background noise, which is essential to making the approach practical. In this paper, we propose a new deep learning model, MosquitoSong+, to identify the species and sex of mosquitoes from raw wingbeat sounds so that it is robust to the environmental noise and the relative volume of the mosquito’s flight tone. The proposed model extends the existing 1D-CNN model by adjusting its architecture and introducing two data augmentation techniques during model training: noise augmentation and wingbeat volume variation. Experiments show that the new model has very good generalizability, with species classification accuracy above 80% on several wingbeat datasets with various background noise. It also has an accuracy of 93.3% for species and sex classification on wingbeat sounds overlaid with various background noises. These results suggest that the proposed approach may be a practical means to develop classification models that can perform well in the field.

Introduction

Mosquito vector-borne diseases such as malaria, dengue, and Zika pose some of the most serious public health burdens in tropical and sub-tropical countries [1]. Due to the ongoing climate change, urbanization, and other global changes, the geographical range of vector-borne diseases is expected to further expand [2, 3]. In order to assess risk, effectively target public health interventions, and monitor the effectiveness of vector control efforts, accurate information about mosquito vector population densities is needed. Since different species of mosquitoes transmit different diseases, we require not only overall mosquito population density estimates, but also estimates by species.

The traditional and most common approach for adult mosquito vector monitoring is to deploy collection methods and then manually count and identify the species of the mosquitoes caught. Commonly used approaches include traps that contain one or more attractants such as light, CO2, heat, or odor [4, 5], cow bait tents [6], and human landing catch [7]. These approaches are highly labor-intensive, both in deployment and in classification of the mosquitoes caught, so they are typically used only for occasional surveys. In addition, because of the effort required to deploy the collection methods, they are primarily used with limited coverage and thus only to estimate the relative populations of different species, rather than population densities, which would be highly valuable information to have. Thus, there is a need for an alternative approach that could enable accurate estimation of population densities of the variety of mosquito species present on a continuing basis and at low cost.

It has long been shown that different mosquito species have different wingbeat audio signatures [8], enabling them to recognize each other’s species and sex [9, 10]. Many studies have developed machine learning models to automatically identify species and sex of mosquitoes based on their wingbeat sounds. Several works have proposed extracting features from the wingbeat sounds, such as fundamental frequency [11] and MFCC [12], for the classification. Other researchers have suggested extracting features after processing the mosquito sound in a way that mimics the sound processing in the mosquito antenna [13]. Some researchers have found that the fundamental frequency of wingbeat seems to be insufficient to differentiate between mosquito species [14]. Recently, researchers have turned to employing the popular deep learning models for image classification [15–17] to extract features from the spectrogram representations of the wingbeat recordings for species and sex identification [18–22]. The utilization of spectrograms with deep learning models is similar to the approaches to other bioacoustic classification problems such as classification of birds [23–25], bats [26], elephants [27], fish [28] and insects [29]. However, recent work has suggested that use of spectrograms may overlook some details that are important for fine high-resolution discriminations [30, 31]. Training a model to discover and extract features from raw audio signals may overcome these issues. For example, Varma et al. [32] showed that a SincNet model [33] operating on raw audio outperformed a CNN operating on a Mel-spectrogram representation of the same data for the task of distinguishing between crickets and katydids. Another recent study has demonstrated the potential of a 1D-CNN model and a 1D-CNN with LSTM model for mosquito classification based on raw audio waveforms [34].

Since the acoustic methods are sensitive to the quality and type of the microphones, as well as background noise present under field conditions, several approaches have been proposed to address these issues. Wavelet transforms have been applied to transform from audio waveforms into spectrograms before training a CNN model for mosquito detection [21]. Later, this work was evaluated with the existing Humbug Zooniverse [35] dataset, which contains wingbeat recordings captured using smartphones in noisy field environments. Results showed promising performance with true positive and true negative rates of 89% and 97%, respectively, for the task of distinguishing between wingbeat sounds and noise [19]. Another study also proposed to use Mel-frequency spectrograms with a CNN model to identify mosquito species and evaluated the model using the recordings from the field-captured mosquitoes in cups with background noise [18]. The results showed an average classification accuracy of only 60%, which could be due to the class imbalance and the limited number of labeled examples. Apart from utilizing spectrograms, a recent study has developed a low-cost acoustic sensor to monitor and classify mosquitoes in a field environment [11]. The fundamental frequencies extracted using Fast Fourier Transform (FFT) and a simple rule-based model were used for counting mosquitoes. Band-pass filtering and smoothing functions in a scrolling window were used to alleviate the impact of background noises. Later, the same approach was evaluated in a different area with help from specialists in labeling the species and sex of mosquitoes in the traps [36]. They found that the fundamental frequency alone could not distinguish different species due to the overlapping frequencies and the effect of environmental noise.

Another research direction is to use an optical sensor to capture light fluctuations when a mosquito flies across the sensor and then synthesize pseudo wingbeat sounds. Such synthesized wingbeat sounds are unaffected by wind noise and ambient sounds, and their potential has been demonstrated for mosquito species and sex identification [14]. A recent study has demonstrated in a controlled environment that such optical sensors can achieve high accuracy in sex classification, but not in species and sex classification [37]. Another study has proposed to convert the synthesized wingbeats to spectrograms and use CNN models to identify mosquito species [38]. This optical sensing technology should be considered as complementary to acoustic sensors. While the optical sensors are unaffected by audio background noise, they are sensitive to optical background noise such as ambient light and airborne dust. The same techniques for classification using acoustic data can also be applied to the wingbeat sounds synthesized from optical sensors.

In order to train a model to be robust to environmental noise, a potential solution is to include the noise during the model training, which has shown promising results in bird classification [24]. However, the process of gathering mosquito wingbeat sounds with noises from the real-field environment is challenging. A more practical alternative is to collect the noises from the field and overlay them with wingbeat sounds collected in controlled environments. This approach also enables us to create variations of the background noises in the wingbeat sounds that can be used to train the classification model to be more robust to such noises (i.e., data augmentation) and that can be used to evaluate the models under varying noise conditions. In the fields of computer vision and audio classification, data augmentation has been shown to help reduce the overfitting problem when the amount of data is limited [39, 40].

In this paper, we propose a new deep learning model to identify the species and sex of mosquitoes from raw wingbeat sounds in such a way that it is robust to the environmental noise and the relative volume of the mosquito’s flight tone. The proposed method that we call MosquitoSong+ extends the existing MosquitoSong 1D-CNN model [34], by making adjustments to its architecture and introducing two data augmentation techniques during model training: noise augmentation and wingbeat volume variation. To evaluate our approach in a variety of settings, we carry out evaluation experiments using several datasets. Two datasets contain pure wingbeat sounds: the HumBugDB dataset [35] and indoor recordings collected in our previous work [34]. For the current study, we collected a third dataset in an outdoor urban setting so that it contains wingbeat sounds with background noise. We also use two noise datasets: noise recordings from HumBugDB and noise that we recorded in an outdoor urban environment. The noise datasets are used to simulate a noisy environment by overlaying them on the two pure wingbeat datasets for the purposes of augmentation and testing. Our main contributions are as follows:

  • We show that the MosquitoSong+ model outperforms the previous MosquitoSong model for species and sex classification on wingbeats with synthesized noise.
  • We demonstrate model generalizability by showing that MosquitoSong+ has very good performance (accuracy above 0.8) for species classification across a variety of datasets: indoor wingbeats with noise overlay, HumBugDB wingbeats with noise overlay, and outdoor recordings.
  • We show that MosquitoSong+ has excellent performance for the harder problem of both species and sex classification on the indoor recordings (average accuracy 0.933) under a wide variety of simulated background noise, but that due to data limitations performance on outdoor recordings is not as good.
  • In addition, all experiments show robustness to the volume of the wingbeat sounds relative to the background noise.

Materials and methods

Datasets

Indoor wingbeat recordings (W-INDOOR).

The mosquito wingbeat sounds were previously collected from the laboratory of the Medical Entomology Department of the Faculty of Tropical Medicine at Mahidol University [34]. The research has been approved by the Institutional Review Board of the Faculty of Tropical Medicine, Mahidol University (FTM-ACUC 030/2020). The dataset consists of recordings of laboratory strains of four mosquito species: Aedes aegypti, Aedes albopictus, Anopheles dirus, and Culex quinquefasciatus from both males (M) and females (F). Each mosquito was individually put into a small cylindrical net cage (8 cm width and 12 cm height). A condenser microphone (Studio Behringer ECM8000 measurement microphone) and a low-cost microphone (Primo EM172) were used to record the wingbeat sounds at 24-bit depth and 96 kHz sampling rate. The raw recordings were processed by extracting only the periods containing wingbeat sounds. These wingbeat sounds were then split into 300-ms epochs with 150-ms overlap for training and evaluating the classification model. Table 1 summarizes the total duration (in seconds) of the wingbeat recordings and the number of 300-ms epochs of the wingbeat sounds from each species and sex after the split.

Table 1. The total duration (in seconds) of the wingbeat recordings and the number of 300-ms epochs from each species and sex for the three datasets.

For the HumBug dataset, mosquitoes of some sexes are not available and the sexes of some species are not indicated.

https://doi.org/10.1371/journal.pone.0310121.t001
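
The epoching step described for the indoor recordings (300-ms epochs with 150-ms overlap) can be illustrated with a minimal sketch. The function name `split_into_epochs` is hypothetical, and dropping a trailing partial epoch is an assumption; the source does not say how the end of a recording was handled.

```python
def split_into_epochs(signal, fs, epoch_s=0.3, overlap_s=0.15):
    """Split a 1-D sequence of samples into fixed-length, overlapping epochs."""
    epoch_len = round(epoch_s * fs)          # samples per epoch
    hop = epoch_len - round(overlap_s * fs)  # step between consecutive epoch starts
    if hop <= 0:
        raise ValueError("overlap must be shorter than the epoch")
    # Assumption: only full-length epochs are kept; a trailing partial epoch is dropped.
    return [signal[start:start + epoch_len]
            for start in range(0, len(signal) - epoch_len + 1, hop)]


# One second of dummy samples at 8 kHz gives epochs starting every 150 ms.
epochs = split_into_epochs(list(range(8000)), fs=8000)
```

At 8 kHz each epoch holds 2400 samples and consecutive epochs start 1200 samples apart.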

Outdoor wingbeat recordings (W-OUTDOOR).

In addition to the indoor recordings, we collected wingbeat sounds from an urban environment. This involved placing live adult mosquitoes inside a small netted cylinder (measuring 6 cm × 8 cm) equipped with a low-cost Primo EM172 microphone. The cylinder was positioned approximately 5 cm above a Biogents BG-counter 2 trap designed for outdoor mosquito monitoring. Our objective was to record the wingbeat sounds of mosquitoes in their natural surroundings while considering outdoor noise and the trap’s fan noise. We maintained the same audio configuration in collecting data for the four mosquito species listed above. To accommodate the mosquitoes’ diurnal and nocturnal activity patterns, we recorded them during specific time periods. For day-active Aedes mosquitoes, recordings were made during the daytime. In contrast, recordings were conducted from dusk to the following morning for the Anopheles and Culex species, which are active during the evening and night. All recordings were captured at a sampling rate of 96 kHz and a depth of 24 bits. Table 1 summarizes the data.

Environmental noise recordings (N-OUTDOOR).

In the same urban environment as the outdoor setting, we also recorded environmental noises, comprising vehicles, animals (cats and dogs), and human activity (watering with a garden hose, sweeping, and cutting grass) using a condenser microphone. Since we are interested in the possibility of using our classifiers in conjunction with mosquito traps, the classifiers need to be robust to the noise of the fan that is used in such traps. Thus, we also collected the fan noises from a CDC light trap model 512 (John W. Hock, Gainesville, FL) and a miniature light trap Model 2836BQ (BioQuip, Rancho Dominguez, CA).

HumBugDB dataset (W-HUMBUG).

To further evaluate the generalizability of the proposed model, we utilized the existing mosquito wingbeat and noise recordings from the HumBugDB public dataset [35]. As our indoor and outdoor recordings contain wingbeat recordings from four species, we only used the subset of the recordings from these species to facilitate the performance comparison across different datasets.

  • Aedes aegypti. The recordings were gathered from wild Aedes aegypti mosquitoes sampled in Tanzania. These mosquitoes were collected and recorded in sample cups using the Telinga EM23 field microphone at a sampling rate of 44.1 kHz and a 24-bit depth. According to the given metadata, only female Aedes aegypti appear in the total of 1322.4 seconds of recordings.
  • Aedes albopictus. Laboratory cultures raised at the US Centers for Disease Control and Prevention were recorded using smartphones with 8 kHz sampling rate and 24-bit depth. We selected those recordings consisting of only a single mosquito, resulting in 33.2 seconds of wingbeat sounds. The sex labels of more than half of the audio signals were not available.
  • Anopheles dirus. Wild mosquitoes sampled at a mosquito monitoring site in Thailand were brought to a laboratory and recorded using a setup similar to the one used to record the Aedes aegypti sounds. A total of 909.8 seconds of recordings are provided.
  • Culex quinquefasciatus. Laboratory cultures at the University of Oxford, UK, were recorded using the same setup as for Aedes aegypti. Since there are no labels indicating the number of mosquitoes in a cup, we manually selected recordings with a single mosquito. In addition, the sex of the mosquitoes was not indicated.

Since sex annotations were not available for some species, this dataset can only be used to evaluate the species classification. The total duration (in seconds) and the total number of epochs after splitting are shown in Table 1.

Environmental noises (N-HUMBUG).

In the recordings of mosquito wingbeats, there are segments labeled as background noise that do not overlap with the wingbeat sounds. These noises come from human activities, including human speech and tapping sounds caused by handling containers. We used these noise segments from two different recording setups: one conducted in Thailand using a Telinga EM23 microphone, and the other in the UK using a Telinga EM23 microphone and a phone. The total duration of the noise recordings used is summarized in Table 2.

Table 2. The total duration of the N-HUMBUG and N-OUTDOOR datasets from each category and recording device.

https://doi.org/10.1371/journal.pone.0310121.t002

Each audio file has a unique ID and date when it was recorded. Some of these files were parts of the same longer recording, meaning that they probably captured the wingbeat sound of the exact same individual mosquito. Therefore, we randomly selected only one segment from those longer recordings. This is to ensure we obtained a variety of 300-ms segments from various mosquitoes.

Data preprocessing.

Since the proposed approach is intended to be used with low-cost IoT devices with relatively low computing power, the whole dataset of both mosquito wingbeats and noise was downsampled before being used in the model training and evaluation. A recent study demonstrated that a deep learning model can achieve a classification performance on wingbeat recordings with a sampling rate of 8 kHz and 16-bit depth similar to that achieved on the same recordings with a sampling rate of 96 kHz with a 24-bit depth [41]. We, therefore, downsampled wingbeat sounds from all datasets as well as noise recordings to have a sampling rate of 8 kHz with 16-bit depth. So that we could use the entire bit depth without any clipping, the recordings were normalized using the maximum absolute value of the amplitude from all wingbeat recordings. The recording of each species from the HumBugDB dataset was divided by the maximum absolute amplitude of the species due to a variety of collection methods and sources. On the other hand, since the same recording setting was used for each of the entire indoor clean wingbeat and outdoor datasets, the normalization was done using the maximum absolute value from the entire dataset.
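
The peak normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `peak_normalize` and `to_int16` are hypothetical, the quantization to 16-bit integers uses a scale of 32767 as one common convention, and the downsampling step itself (which needs an anti-aliasing filter) is omitted.

```python
def peak_normalize(recordings):
    """Scale a group of recordings by their shared maximum absolute amplitude."""
    peak = max(abs(v) for rec in recordings for v in rec)
    if peak == 0:
        return [list(rec) for rec in recordings]  # silent input: nothing to scale
    return [[v / peak for v in rec] for rec in recordings]


def to_int16(recording):
    """Quantize samples in [-1, 1] to 16-bit integers (assumed scale: 32767)."""
    return [int(round(v * 32767)) for v in recording]


# Group = whole dataset for the indoor/outdoor recordings,
# or one species at a time for HumBugDB, following the text above.
normalized = peak_normalize([[0.5, -2.0], [1.0]])
```

Calling `peak_normalize` once per dataset or once per species changes only which recordings share a peak, which is the distinction the paragraph draws between our own recordings and HumBugDB.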

Noise overlay simulation

We simulated epochs of noisy wingbeat sounds, $x_{nw}$, with the following equation:

$$x_{nw} = G \cdot x_w + x_n \tag{1}$$

where $x_w$ and $x_n$ are randomly selected epochs of the original wingbeat and the noise recordings, respectively. Both $x_w$ and $x_n$ are of the same size: $x_w, x_n \in \mathbb{R}^{s \cdot f_s}$. The value of $s$ represents the duration of each epoch in seconds, and $f_s$ denotes the sampling rate in Hz, which depends on the model architecture that we will discuss later. The gain factor $G$ is a parameter that is used to adjust the amplitude of mosquito sounds relative to the background noises. Essentially, this factor can be interpreted as varying the distance between the mosquitoes and the recording microphone. When $G = 1$, the original wingbeat recordings were used, without any adjustments. When $G$ is between 1 and 2, a fairly audible mosquito sound can be heard even in the presence of continuous background noise. However, the same mosquito sound becomes inaudible in the presence of obtrusive sounds, such as bird calls and car brakes.
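
As a minimal sketch of the overlay in Eq 1, assuming both epochs are already equal-length sequences of samples (the function name `overlay_noise` is hypothetical and not taken from the released code):

```python
def overlay_noise(x_w, x_n, gain=1.0):
    """Return gain * x_w + x_n, sample by sample (Eq 1)."""
    if len(x_w) != len(x_n):
        raise ValueError("wingbeat and noise epochs must have the same length")
    return [gain * w + n for w, n in zip(x_w, x_n)]


# Doubling the wingbeat amplitude before adding the noise epoch.
x_nw = overlay_noise([1.0, -1.0], [0.5, 0.5], gain=2.0)
```

Only the wingbeat term is scaled, so a larger gain raises the wingbeat level relative to a fixed noise level.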

Fig 1 shows an example of the spectrogram of the noise-overlaid wingbeat sound. The mosquito produces a relatively steady, harmonic sound (parallel lines in Fig 1a). The recorded watering noise (Fig 1b) is partly impulsive (vertical line at 6 seconds) and partly harmonic (parallel lines from 7.5 to 10 seconds). When merging the two (Fig 1c), both the mosquito and the watering can be seen in the spectrograms and heard in the audio.

Fig 1. Example spectrograms of the noise-overlaid wingbeat sounds.

The spectrograms of 10-s recordings of (a) a female Aedes aegypti wingbeat, (b) a watering noise, and (c) a wingbeat merged (or overlaid) with watering noise. The frequency components of the wingbeat sounds (e.g., around 512 Hz) are still present in the spectrogram of the noise-overlaid sound.

https://doi.org/10.1371/journal.pone.0310121.g001

MosquitoSong+ model

Our proposed model, named MosquitoSong+, is an extension of the previously proposed MosquitoSong [34] deep learning model for mosquito species and sex classification from low-sample-rate raw audio signal without noise. The MosquitoSong+ model uses a modified architecture, as well as two data augmentation techniques to add variations of background noise and gain factors to the wingbeat sounds during the model training.

Model architecture.

In the MosquitoSong+ model, we replaced the initial three layers of the MosquitoSong model with three new 1-D convolutional layers, as illustrated in Fig 2. Our findings indicate that this stack of convolutional layers exhibits improved generalizability, primarily attributable to its learnable weights for downsampling, in contrast to the previous approach that relied on simple statistical summarization with the maximum value.

Fig 2. Comparison between the model architectures of the MosquitoSong (left) and MosquitoSong+ (right).

The value of num_class in the final layer of both models varies depending on the number of species and/or sex intended for training.

https://doi.org/10.1371/journal.pone.0310121.g002

The model receives an epoch of input sounds and determines the species and sex of a mosquito. There are eight classes in this study, corresponding to four mosquito species: Aedes aegypti, Aedes albopictus, Anopheles dirus, and Culex quinquefasciatus for both males and females. Each input passes through two blocks of 1-D convolution and 1-D max-pooling layers, two fully-connected layers, and a final softmax layer to predict a probability value between 0 and 1 for each output class.

Formally, suppose there are $N$ epochs of input sounds: $\{x_1, \ldots, x_N\}$, where $x_i \in \mathbb{R}^{s \cdot f_s}$, $s$ is the duration in seconds of each epoch, and $f_s$ is the sampling rate in Hz of the input sound. The model determines the species and sex for all epochs, resulting in $N$ predicted classes $\{\hat{y}_1, \ldots, \hat{y}_N\}$, where $\hat{y}_i$ is the predicted class of $x_i$ and takes one of eight values, corresponding to each species and sex mentioned earlier. In this study, $s$ and $f_s$ are 0.3 seconds (or 300 milliseconds) and 8000 Hz, resulting in an input size of 2400 values.

Model training with data augmentation.

Our technique trains the model end-to-end via minibatch gradient descent, equipped with data augmentation. Such data augmentation helps to produce new training examples from the original ones for every training epoch. The weighted cross-entropy loss is used to minimize the class imbalance problem. The model is trained for 1000 epochs with the Adam optimizer [42] using a learning rate of 0.0001. The best performing model based on the validation set is used.
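
To illustrate how a weighted cross-entropy loss counteracts class imbalance, the sketch below uses inverse-frequency class weights. This weighting scheme is an assumption made for illustration; the text does not state how the weights were chosen, and the function names are hypothetical.

```python
import math


def inverse_frequency_weights(labels, num_classes):
    """Assumed scheme: weight_c = N / (C * count_c), so rare classes weigh more."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p[y], normalised by the sum of the applied weights."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


labels = [0, 0, 0, 1]                      # class 1 is under-represented
weights = inverse_frequency_weights(labels, num_classes=2)
loss = weighted_cross_entropy([[0.5, 0.5]] * 4, labels, weights)
```

In this toy case the single example of class 1 receives three times the weight of each class 0 example, so both classes contribute equally to the loss.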

The two data augmentation techniques used are noise augmentation and wingbeat volume variation:

  • Noise augmentation. An epoch of the environmental noise is randomly sampled from the pool of different types of noises, which is then added to the wingbeat sounds. This technique helps introduce new wingbeat sounds with different background noises, such that the model is better at learning the patterns of the wingbeat sounds. To prevent the model from overfitting to the environmental noises, the original pure wingbeat sounds are also included during the training with a probability of 10%. As a result, 90% of the training set in each training epoch will have background noise and the other 10% will not.
  • Wingbeat volume variation. A gain factor (i.e., G in Eq 1) randomly selected from the range of 1 to 2 is used to multiply the wingbeat sounds before the environmental noise addition. As the mosquitoes are not flying at the same distance from the microphones all the time, it is more realistic to vary the amplitude of the wingbeat sounds relative to the background noise. This technique helps generate realistic dynamics of the mosquitoes flying past the microphones.
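
The two augmentations above can be combined in a short sketch. It assumes that the gain is drawn uniformly from the stated range and that the 10% of clean examples are returned unchanged; the text fixes the range and the probability but not these details. The function name `augment_with_noise` is hypothetical.

```python
import random


def augment_with_noise(x_w, noise_pool, rng, p_noise=0.9, gain_range=(1.0, 2.0)):
    """Return either the clean wingbeat epoch or G * x_w + x_n (Eq 1)."""
    if rng.random() >= p_noise:
        return list(x_w)                   # assumption: clean epochs stay unmodified
    x_n = rng.choice(noise_pool)           # random noise epoch from the pool
    gain = rng.uniform(*gain_range)        # assumption: G sampled uniformly in [1, 2]
    return [gain * w + n for w, n in zip(x_w, x_n)]


rng = random.Random(0)
# A constant "wingbeat" and an all-zero noise epoch make the applied gain visible.
augmented = augment_with_noise([1.0, 1.0], [[0.0, 0.0]], rng, p_noise=1.0)
```

Because the function is called anew for every training epoch, the same wingbeat epoch is paired with different noise and gain each time.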

These techniques are also used in conjunction with the data augmentation techniques used in the previous model [34], which are Gaussian noise addition, time shifting, and amplitude variation:

  • Gaussian noise addition. An epoch of Gaussian noise multiplied with a factor randomly sampled from a range of 0.001 to 0.01 is added to an input sound.
  • Time shifting. An input sound is randomly shifted along the time axis. The shifting amount is uniformly sampled from a range of ±10% of the 300-ms epoch.
  • Amplitude variation. An input sound is multiplied with a random amplitude factor of 1/4 to 4 to reduce or increase the volume with a range of −12 to 12 dB.
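
A minimal sketch of these three inherited augmentations follows. Several details are assumptions not given in the text: the Gaussian noise has unit variance before scaling, the time shift is circular, and the amplitude factor is sampled uniformly on the dB scale. All function names are hypothetical.

```python
import random


def add_gaussian_noise(x, rng, scale_range=(0.001, 0.01)):
    """Add Gaussian noise scaled by a factor drawn from scale_range."""
    scale = rng.uniform(*scale_range)
    return [v + scale * rng.gauss(0.0, 1.0) for v in x]  # assumption: unit variance


def time_shift(x, rng, max_fraction=0.1):
    """Shift the epoch by up to +/- max_fraction of its length (assumed circular)."""
    max_shift = int(len(x) * max_fraction)
    k = rng.randint(-max_shift, max_shift) % len(x)
    return x[-k:] + x[:-k] if k else list(x)


def vary_amplitude(x, rng, max_db=12.0):
    """Scale the epoch by a gain between -max_db and +max_db decibels."""
    gain_db = rng.uniform(-max_db, max_db)   # assumption: uniform on the dB scale
    factor = 10 ** (gain_db / 20)            # +/-12 dB is roughly a factor of 4 or 1/4
    return [factor * v for v in x]


rng = random.Random(0)
epoch = [float(i) for i in range(2400)]      # dummy 300-ms epoch at 8 kHz
shifted = time_shift(epoch, rng)
scaled = vary_amplitude([1.0], rng)
noisy = add_gaussian_noise(epoch, rng)
```

The dB conversion shows why the factor range of 1/4 to 4 and the range of −12 to 12 dB describe the same augmentation: 20·log10(4) is approximately 12.04.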

With these data augmentation techniques, we can train the model to be robust to the variation of background noise and the dynamics of the wingbeat sounds from the flying mosquitoes. Our code is publicly available at https://github.com/akaraspt/mosquitosongp.

Results

Performance metrics

Per-class precision (PR), per-class recall (RE), per-class F1-score (F1), macro-averaged F1-score (MF1), and overall accuracy (ACC) were used to evaluate the proposed approach. The per-class metrics were computed by considering one species and sex as a positive class, and all others combined as a negative class. The ACC and MF1 were computed as follows:

$$\mathrm{ACC} = \frac{\sum_{c=1}^{C} TP_c}{N} \tag{2}$$

$$\mathrm{MF1} = \frac{\sum_{c=1}^{C} F1_c}{C} \tag{3}$$

where $TP_c$ is the true positives of class $c$, $F1_c$ is the per-class F1-score of class $c$, $C$ is the number of classes (i.e., the number of mosquito species and sex), and $N$ is the total number of test examples. The non-parametric Mann-Whitney U Test (alpha value = 0.05, one-sided test) was also used to examine the statistical significance when we compared the performance metrics between two settings.
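
The overall accuracy and macro-averaged F1-score defined above can be computed from integer class labels as in the following sketch. The function name is hypothetical, and assigning an F1-score of 0 to a class with no true or predicted examples is an assumption.

```python
def accuracy_and_macro_f1(y_true, y_pred, num_classes):
    """Return (ACC, MF1) using one-vs-rest counts for each class."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[t] += 1   # true class t was missed
    acc = sum(tp) / len(y_true)
    f1_scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumption: a class with no true or predicted examples gets F1 = 0.
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    return acc, sum(f1_scores) / num_classes


acc, mf1 = accuracy_and_macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In this toy example ACC is 0.75, while the per-class F1-scores are 2/3 and 0.8, giving an MF1 of about 0.733.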

Experiment 1: Model improvement—Classification of species and sex under noise overlay simulation

Experimental setup.

The goal of this experiment is to evaluate the effectiveness of our extension to the previous MosquitoSong model [34]. The experimental setup is similar to the earlier work. Our proposed approach was evaluated using stratified 10-fold cross-validation under noise overlay simulation, utilizing the W-INDOOR and N-OUTDOOR datasets. In particular, the wingbeat sounds from each species and sex were split chronologically into 10 folds. For each fold, one part was used as the test set, and the remaining parts were used for the training and the validation sets. This process was repeated 10 times, yielding a total of 10 models that were trained and evaluated to get the predictions from all folds. As the proposed model would eventually be deployed with a low-cost microphone, only the wingbeat recordings from the low-cost microphone were used in the test set (i.e., no wingbeat recordings from the condenser microphone).

We made sure that there was no overlap between the training, validation, and test sets in each fold for both the wingbeat and environmental noises. Additionally, we maintained the original wingbeat sounds in the test set, such that there were wingbeats both with and without background noise. This allowed for testing whether the model was overfitted to the environmental noise. It is also realistic to assume that in the real environment, there would be periods of wingbeat without background noise.

To simulate situations in which mosquitoes fly close to or far away from the microphones, we also applied different gain factors (G): 1, 1.5 and 2 to the wingbeat sounds in the test set for both simulations. By applying three different G values to the test sets, the stratified 10-fold cross-validation was repeated three times. The predictions from all folds and the three G values (30 in all) were combined and used to compute the performance metrics, which are discussed in the next section.
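
The chronological, per-class fold assignment described in this setup can be sketched as below. It assumes the epochs of each species and sex are already in recording order and are cut into ten contiguous chunks; the function name and the dictionary input format are hypothetical.

```python
def chronological_folds(epochs_by_class, k=10):
    """Return k folds; fold i holds the i-th contiguous chunk of every class."""
    folds = [[] for _ in range(k)]
    for label, epochs in epochs_by_class.items():
        n = len(epochs)
        for i in range(k):
            start, stop = i * n // k, (i + 1) * n // k   # contiguous, non-overlapping
            folds[i].extend((label, e) for e in epochs[start:stop])
    return folds


# Two toy classes with 20 and 10 epochs, identified here by their time index.
folds = chronological_folds({"a": list(range(20)), "b": list(range(10))}, k=10)
test_fold = folds[0]                                     # remaining folds: train/validation
```

Each fold then serves once as the test set, and the whole procedure is repeated for each of the three gain factors.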

Impact of environmental noise.

The previous MosquitoSong model [34] was tested with and without the presence of environmental noise (see Table 3). We observed a significant drop in all performance metrics in the presence of environmental noise. The overall ACC/MF1 reduced from 0.908/0.906 to 0.862/0.856 (p = 0.0003). This shows that environmental noise has a significant impact on the model performance, and should always be considered in the model evaluation.

Table 3. Comparison between MosquitoSong+ (proposed model) and MosquitoSong (previous model) in terms of overall accuracy (ACC), macro-averaging F1-score (MF1), and per-class F1-score.

These performance metrics were computed by combining the test sets from three stratified 10-fold cross-validations, each corresponding to the application of one of the three gain factors (G): 1, 1.5, and 2. The numbers in bold indicate the highest performance metrics of all methods (excluding the first row that was evaluated with the original recorded wingbeat sounds without noise).

https://doi.org/10.1371/journal.pone.0310121.t003

Performance gain from the proposed model.

As shown in Table 3, our MosquitoSong+ model achieved significantly better classification performance (ACC/MF1 = 0.918/0.911) than the MosquitoSong model (ACC/MF1 = 0.862/0.859; p = 9e-4). Additionally, the per-class F1-scores indicate that MosquitoSong+ outperformed MosquitoSong across all species and sexes. This strongly suggests that the proposed model architecture and data augmentation techniques are effective in reducing the impact of environmental noise on classification performance.

Upon careful analysis of the confusion matrix of the best method (Fig 3), we observed that most of the misclassifications occurred between Ae. aegypti and Cx. quinquefasciatus species. This can be attributed to the overlapping wingbeat frequency components between these species and sex [41], with the noise potentially adding further ambiguity to the distinction.

Fig 3. Confusion matrix of our MosquitoSong+ model from one cross-validation fold.

Most of the misclassifications occurred between Ae. aegypti and Cx. quinquefasciatus, which could be due to the overlap in wingbeat frequency components among these species and sexes, with the noise possibly further blurring the distinction. Note: Cx. quin refers to Cx. quinquefasciatus.

https://doi.org/10.1371/journal.pone.0310121.g003

Impact of different gain factors.

Because the results above pool the test sets from three different gain factors (G), we further investigated whether the classification performance is influenced by the volume of the wingbeat sounds. Table 4 shows the ACC, MF1, PR, RE and F1 for each of the gain factors (G): 1, 1.5 and 2. Our analysis indicates that the volume of the wingbeat sound in the presence of noise has only a minimal impact on classification performance. The overall ACC/MF1 is 0.911/0.902 when G = 1, 0.922/0.915 when G = 1.5, and 0.922/0.916 when G = 2. This observation can be attributed, in part, to our data augmentation techniques, which enhance the robustness to variations in the wingbeat volume.
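For readers who want to reproduce this kind of comparison, the following sketch shows one way to compute overall accuracy and macro-averaged F1 either per gain factor or pooled over all gain factors. The function names and the `runs` layout are hypothetical, chosen only for illustration.

```python
from collections import Counter


def accuracy_and_macro_f1(y_true, y_pred):
    """Return (ACC, MF1) for two equal-length, non-empty label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    acc = sum(tp.values()) / len(y_true)
    # Macro-averaging: unweighted mean of the per-class F1-scores.
    f1_scores = []
    for c in sorted(set(y_true)):
        f1_scores.append(2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]))
    return acc, sum(f1_scores) / len(f1_scores)


def pooled_metrics(runs):
    """`runs` maps a gain factor to (y_true, y_pred); pool all before scoring."""
    all_true = [t for y_true, _ in runs.values() for t in y_true]
    all_pred = [p for _, y_pred in runs.values() for p in y_pred]
    return accuracy_and_macro_f1(all_true, all_pred)
```

Calling `accuracy_and_macro_f1` on the predictions of a single gain factor gives per-G figures of the kind reported in Table 4, while `pooled_metrics` corresponds to combining all gain factors before scoring.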

Table 4. Species and sex classification performance of our MosquitoSong+ model when tested with different gain factors (G): 1, 1.5 and 2, in terms of overall accuracy (ACC), per-class precision (PR), per-class recall (RE), per-class F1-score (F1), and the macro-averages of these metrics (i.e., the Total column).

The Total column represents either the total number of examples or the macro-averaged metrics from all species and sex.

https://doi.org/10.1371/journal.pone.0310121.t004

Experiment 2: Model generalizability—Classification of species across different datasets

Experimental setup.

To demonstrate the model’s generalizability across a variety of data characteristics, we conducted both training and testing on a combined wingbeat dataset composed of all three datasets: W-INDOOR, W-OUTDOOR, and W-HUMBUG. However, due to incomplete sex annotations in some species within W-HUMBUG, our evaluation and comparison of model performance is limited to mosquito species classification.

Prior to merging these datasets, we independently divided each of them into three distinct subsets: training, validation, and test sets. Subsequently, we combined the training portions from the datasets to train the model. To address imbalances within the training set, we applied the random over-sampling method separately to each class within each dataset before merging them. The noise overlay simulation was applied only to the W-INDOOR and W-HUMBUG datasets, because these two datasets were recorded without background noise; it was not applied to W-OUTDOOR. The noise recordings utilized in this experiment were from the N-OUTDOOR and N-HUMBUG datasets.
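The per-dataset over-sampling step can be sketched as follows. This is a minimal illustration under the assumption that each class is over-sampled up to the size of the largest class in its own dataset; the function names and the `(features, label)` layout are hypothetical.

```python
import random
from collections import defaultdict


def oversample(examples, seed=0):
    """Randomly duplicate minority-class examples within ONE dataset.

    `examples` is a non-empty list of (features, label) pairs. Every class
    is grown to the size of the largest class (an assumed target).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for example in examples:
        by_class[example[1]].append(example)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label in sorted(by_class):
        items = by_class[label]
        balanced.extend(items)
        # Draw the missing examples with replacement from the same class.
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced


def merge_training_sets(datasets, seed=0):
    """Over-sample each dataset separately, then concatenate the results."""
    merged = []
    for name in sorted(datasets):
        merged.extend(oversample(datasets[name], seed))
    return merged
```

Balancing before merging keeps a large dataset from dictating the class proportions of a smaller one.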

During the training, we applied our data augmentation techniques to the W-INDOOR and W-HUMBUG datasets. In contrast, the W-OUTDOOR dataset underwent augmentation using only the techniques previously employed in our earlier work [34], which do not involve noise augmentation. The combined validation sets were used in selecting the best performing model during the training phase.

During testing, the noise simulation was only applied to the W-INDOOR and W-HUMBUG datasets. As in Experiment 1, we also kept the original wingbeat sounds in the test set of each dataset, so that it contained wingbeats both with and without background noise. It is realistic to assume that there would be periods of wingbeat with and without background noise. The test set from the W-OUTDOOR dataset was used directly for evaluation, as it already contained background noise from the real environment.

It is important to note that during the experiment, there was no overlap in mosquito and environmental noise recordings among the training, validation, and test sets. This was done to ensure that the evaluation of the model reflects a real-world scenario where the model is exposed to wingbeat recordings from different mosquitoes and background noises.
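One way to guarantee this kind of non-overlap is to assign whole recordings, rather than individual epochs, to each subset. The sketch below assumes each epoch carries a recording identifier; the split fractions and all names are hypothetical and are not taken from the study.

```python
import random


def split_by_recording(epochs, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole recordings to train/val/test so no recording is shared.

    `epochs` is a list of (recording_id, epoch) pairs. `fractions` gives the
    assumed share of recordings for train and validation; the rest is test.
    """
    rec_ids = sorted({rec for rec, _ in epochs})
    random.Random(seed).shuffle(rec_ids)
    n_train = int(fractions[0] * len(rec_ids))
    n_val = int(fractions[1] * len(rec_ids))
    subset_of = {}
    for i, rec in enumerate(rec_ids):
        if i < n_train:
            subset_of[rec] = "train"
        elif i < n_train + n_val:
            subset_of[rec] = "val"
        else:
            subset_of[rec] = "test"
    splits = {"train": [], "val": [], "test": []}
    for rec, epoch in epochs:
        splits[subset_of[rec]].append((rec, epoch))
    return splits
```

The same function could be applied to the noise recordings, so that a noise clip heard during training never reappears in validation or testing.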

Species classification performance.

Our model performed well on all datasets, with overall ACC and MF1 scores exceeding 0.80 and 0.79, respectively (see Table 5). Specifically, the overall ACC/MF1 score was 0.893/0.873 for W-INDOOR, 0.825/0.807 for W-HUMBUG, and 0.805/0.791 for W-OUTDOOR. Performance was therefore highest on W-INDOOR and lowest on W-OUTDOOR. Cx. quinquefasciatus consistently had the highest F1-score, while the species with the lowest score varied across datasets. These variations in performance may be attributed to differences in recording hardware and environmental conditions among the datasets. Such disparities may have led to variations in wingbeat sound patterns, making it harder for the model to identify distinguishing features.

Table 5. Species classification performance of MosquitoSong+ evaluated on test sets from W-INDOOR, W-HUMBUG, and W-OUTDOOR datasets in terms of overall accuracy (ACC), macro-averaging F1-score (MF1), and per-class F1-score.

For the W-INDOOR and W-HUMBUG datasets, the performance metrics were evaluated on an independent test set with and without noise overlay simulation, each corresponding to the application of the three gain factors (G): 1, 1.5, and 2. The W-OUTDOOR dataset was tested directly without the simulation.

https://doi.org/10.1371/journal.pone.0310121.t005

Analysis of the confusion matrices of each dataset showed that most misclassifications involved the Aedes genus. On the W-INDOOR dataset, the misclassification was mostly due to An. dirus being misclassified as Ae. aegypti (Fig 4). On the W-HUMBUG dataset, the largest misclassification error was due to Ae. aegypti being misclassified as Ae. albopictus (Fig 5). On the W-OUTDOOR dataset, the main misclassification error was Cx. quinquefasciatus being misclassified as Ae. albopictus (Fig 6).

Fig 4. Confusion matrix of the species classification from W-INDOOR.

The confusion matrix indicates that the majority of the misclassifications were due to An. dirus being misclassified as Ae. aegypti. Note: Cx. quin refers to Cx. quinquefasciatus species.

https://doi.org/10.1371/journal.pone.0310121.g004

Fig 5. Confusion matrix of the species classification from W-HUMBUG.

The confusion matrix indicates that the majority of the misclassifications were due to Ae. aegypti being misclassified as Ae. albopictus. Note: Cx. quin refers to Cx. quinquefasciatus species.

https://doi.org/10.1371/journal.pone.0310121.g005

Fig 6. Confusion matrix of the species classification from W-OUTDOOR.

The confusion matrix indicates that the majority of the misclassifications were due to Cx. quin being misclassified as Ae. albopictus. Note: Cx. quin refers to Cx. quinquefasciatus species.

https://doi.org/10.1371/journal.pone.0310121.g006

Upon analyzing the model performance across all three different G values on various datasets, we observed that the amplitude of mosquito sounds relative to background noises had minimal impact on the model’s performance. This finding aligns with the results obtained in Experiment 1, indicating that our data augmentation techniques effectively mitigate variations in wingbeat amplitude within noisy environments.

Experiment 3: Model performance in practice—Classification of species and sex under simulated vs. real noisy environments

Experimental setup.

To further investigate the MosquitoSong+ model’s ability to determine mosquito species and sex in real noisy environments, we compared the classification performance between the W-INDOOR and W-OUTDOOR datasets. The experimental setup closely resembled that of Experiment 1. Specifically, the two datasets were combined before being used to evaluate the model using stratified 10-fold cross-validation. There was no overlap between the training, validation, and test sets in each fold for both the wingbeat and environmental noise recordings.

For the W-INDOOR dataset, we maintained the original wingbeat sounds in the test set, so that it contained wingbeat sounds both with and without background noise. Three gain factors (G): 1, 1.5, and 2 were applied, resulting in three separate 10-fold cross-validations. However, the simulation for W-INDOOR differed from Experiment 1 in one respect: both the N-OUTDOOR and N-HUMBUG datasets were employed to simulate noisy environments, because we wanted the classifier to be robust to a wide range of environmental noise. For the W-OUTDOOR dataset, the recordings, which were collected in real noisy environments, were used as-is, and the 10-fold cross-validation was conducted once.
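As a rough illustration of the stratification used in this setup, the sketch below deals the examples of each class round-robin into k folds so that every fold keeps approximately the same class proportions. It is a simplified, hypothetical helper: it does not enforce the recording-level non-overlap described above, which would additionally require grouping epochs by recording.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return a fold index (0..k-1) for every example, stratified by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    fold_of = [None] * len(labels)
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the shuffled examples of this class round-robin across folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    return fold_of
```

For fold `f`, the examples with `fold_of[i] == f` would form the test set and the remaining examples would be divided into training and validation sets.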

Comparison of species and sex classification performance.

The results (see Table 6) indicate that the inclusion of N-HUMBUG and W-OUTDOOR datasets contributed to an enhancement in species and sex classification performance, measured in terms of ACC/MF1, on the W-INDOOR dataset compared to Experiment 1 (i.e., 0.933/0.928 vs. 0.918/0.911). This improvement is likely attributed to the model’s exposure to a broader range of wingbeat patterns in the presence of more diverse noises from both simulated and real environments. Similar to the findings in Experiments 1 and 2, the model performance was minimally impacted by the different gain factors, resulting in ACC/MF1 scores of 0.927/0.921, 0.935/0.930, and 0.936/0.933 when G was set to 1, 1.5, and 2, respectively.

Table 6. Comparison of species and sex classification performance of MosquitoSong+ between the simulated (W-INDOOR) and the real (W-OUTDOOR) noisy environments in terms of overall accuracy (ACC), macro-averaging F1-score (MF1), and per-class F1-score.

For the W-INDOOR dataset, the performance metrics were evaluated on an independent test set with and without noise overlay simulation, each corresponding to the application of the three gain factors (G): 1, 1.5, and 2. The W-OUTDOOR dataset was tested directly without the simulation.

https://doi.org/10.1371/journal.pone.0310121.t006

Conversely, the performance of species and sex classification on the W-OUTDOOR dataset (i.e., ACC/MF1 = 0.673/0.611) did not match the species classification results in Experiment 2 (i.e., ACC/MF1 = 0.805/0.791). Among all the genera, Anopheles achieved the highest F1-scores (between 0.73 and 0.86), while Aedes, especially the males, had the lowest (between 0.324 and 0.614). This decrease in performance may be attributed to three main factors. First, species and sex classification is a more difficult task than species classification. Second, the distribution of samples across species is imbalanced (see Table 1). Despite employing the data and noise augmentation during model training, the results suggest that the augmented data may not generate adequate variation in wingbeat patterns from a limited number of samples to accurately represent the diverse range of wingbeat patterns encountered in practice. Third, there was an imbalance in the number of male and female samples within the same species. For instance, the ratio between the numbers of male and female epochs for Ae. aegypti and Ae. albopictus was approximately 50% or more. Although the model performed well in species classification for Cx. quinquefasciatus in Experiment 2, there was a significant drop in species and sex classification due to the absence of female samples of this species in the W-OUTDOOR dataset. These findings suggest that collecting more wingbeat examples from the target environments could substantially improve model performance.

Discussion

With the aim of achieving a classification model that is robust to a variety of background noise, this study has presented an approach that combines an architecture that improves upon the existing MosquitoSong model, along with data augmentation techniques that incorporate noise, as well as its volume relative to the wingbeat sounds. The results from Experiment 1 show that the MosquitoSong+ model achieves significantly better species and sex classification performance than the previous MosquitoSong model in the presence of noise. After evaluating the performance of the model across different datasets in Experiment 2, we observed that it is capable of classifying species with high accuracy even in noisy environments, whether simulated or real. These environments include various recording hardware and environmental conditions, which demonstrates the model’s versatility. In the final experiment, additional noise was added from both N-OUTDOOR and N-HUMBUG datasets to the simulated noisy environment. Surprisingly, the model performed even better than in Experiment 1. This could be due to the additional variation of wingbeat sounds from the real noisy environment. However, we noticed a performance gap between the simulated and actual noisy environments. Specifically, the model showed suboptimal performance in species and sex classification in real noisy conditions. This is likely due to the limited sample size and imbalanced class distribution within the W-OUTDOOR dataset.

Data augmentation in mosquito wingbeat classification

Even though data augmentation that introduces variation of the original signals [40] and background noises [24] has already demonstrated its potential in bird species identification in noisy environments, there remains the question of whether such data augmentation is effective in mosquito wingbeat classification. This study is the first to evaluate and demonstrate the effectiveness of data augmentation that introduces the presence of noise and the relative volume of the wingbeat sound in mosquito wingbeat classification based on raw audio signals. Because our method trains the model on the raw audio waveform, another interesting technique that can be incorporated is band-pass filtering [36, 43]. This can be used as a pre-processing step, before model training and prediction, to filter out frequency components that are, for instance, irrelevant to mosquito hearing [9]. The CNN model can then focus more on learning the filters that are useful for distinguishing species and sex, instead of the noise. It can also be used to filter out human voices to address privacy concerns.
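To make the band-pass idea concrete, the following sketch applies a single second-order (biquad) band-pass filter to a raw waveform, using the standard constant-peak-gain biquad coefficients. The centre frequency, quality factor, and function names are hypothetical choices for illustration; the study does not specify a filter design, and a practical system would more likely use a higher-order filter from a signal-processing library.

```python
import math


def bandpass_biquad(samples, sample_rate, center_hz, q=1.0):
    """Second-order band-pass filter with unity gain at `center_hz`."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Normalised coefficients; the b1 coefficient is zero for this design.
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))
```

Comparing `rms` before and after filtering for a tone inside the pass band and a tone well outside it gives a quick check that out-of-band components are attenuated while in-band components pass largely unchanged.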

Compared to existing works

In contrast to other work on bioacoustic systems to monitor the general mosquito density in the field by detecting periods of wingbeat sounds in audio streams [19, 35], our study focuses on the classification task (i.e., identifying species and sex). The HumBug Project [19], for instance, has developed a Bayesian CNN model (extended from [21]) to detect mosquito wingbeat sounds based on the Mel-frequency spectrogram representation of the audio recordings. They were able to distinguish mosquito wingbeat sounds from background noise with 89% accuracy in 7.1 hours of field recordings. However, the main difference between their study and ours is that they did not have noise mixed in with the wingbeat sounds; rather, the sounds were played consecutively. A 2D CNN model with a similar architecture was also developed for the species classification task on field recordings [18]. They achieved an average classification accuracy of 60% across six species plus a no-mosquito class. In comparison, our study achieved an average classification accuracy of 82.5% across four species using the same dataset. To achieve the full functionality that is needed to monitor mosquito vector populations, future work should focus on integrating the best models for detection with those for classification or developing single end-to-end models.

Impact of wingbeat variations

The species classification performance of W-INDOOR presented in Experiment 2 was not as high as the species and sex classification performance in Experiments 1 and 3, even though it is an easier task. This decline in performance may well be due to the significant variations in wingbeat patterns in W-HUMBUG, which were introduced by discrepancies in recording configurations, hardware, environments, and mosquito origins. Despite the lower species classification performance of the W-INDOOR dataset, our model still maintained an accuracy (ACC) and macro F1 score (MF1) higher than 0.8 across all three wingbeat datasets, demonstrating its generalization capabilities to other datasets.

Noise cancellation via dual-microphone signal subtraction

We also investigated whether dual-microphone signal subtraction for noise cancellation could reduce the impact of noise on species and sex classification performance. This approach assumes that the source of the desired sound is closer to one microphone than to the other. The envisaged configuration is that one microphone will be located close to the entrance of a mosquito trap, with the second microphone 30 cm away from the trap. In this way, the microphone close to the trap should pick up the wingbeat sound from a flying mosquito much more loudly than the other microphone does. In contrast, the background noise coming from a farther distance will reach both microphones at approximately equal volume.
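The principle can be sketched with a sample-by-sample subtraction of the two channels. This is a minimal illustration that assumes the two signals are time-aligned and that the background noise is identical at both microphones; the function name is hypothetical and the sketch is not the processing chain used in the study.

```python
def subtract_reference(near_mic, far_mic):
    """Subtract the far (reference) channel from the near channel.

    Assumption: both channels are time-aligned and equally long, and the
    background noise reaches both microphones at the same level.
    """
    if len(near_mic) != len(far_mic):
        raise ValueError("both channels must have the same number of samples")
    return [n - f for n, f in zip(near_mic, far_mic)]
```

For example, if the near microphone records `w + n` and the far microphone records `0.2 * w + n` (with a hypothetical attenuation of 0.2 for the wingbeat `w`), the difference is `0.8 * w`: the shared noise `n` cancels while most of the wingbeat remains.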

We compared the model performance using a setup similar to Experiment 1, between noise overlay simulations with and without noise cancellation. We found that noise cancellation had no effect on the performance of the MosquitoSong+ model. The ACC/MF1 were the same with and without noise cancellation: 0.918/0.911 vs. 0.918/0.911. This may be because the data augmentation techniques already provided a high level of robustness to background noise. However, we believe that noise cancellation may help reduce the impact of environmental noises not included in the model training data.

Limitations

Even though our results are encouraging, the proposed method is still subject to several limitations. First, the datasets used in the study did not control for the temperature, humidity, or age of the mosquitoes. However, research has shown that factors like mosquito age and environmental conditions, including temperature and humidity, have an impact on wingbeat frequencies [19]. Thus, it would be important to conduct further data collection and to add to the model environmental factors, such as temperature, that can be sensed and used as input at inference time. Second, our data augmentation techniques rely on the availability of environmental noise datasets, and only two noise datasets were utilized in this study. To address this limitation, acoustic sensors could be deployed in the field at the beginning of the surveillance process to record noise from the target location. Subsequently, the model can be calibrated with noises specific to the target location. Third, while our data and noise augmentation techniques helped introduce variation of the mosquito wingbeat patterns with various background noises, they may not fully account for the actual variation in wingbeat patterns observed in practice. As the number of mosquito recordings from real noisy environments for different species and sexes is limited, further studies are necessary to validate the model performance when more wingbeat examples become available.

In the future, we plan to deploy the MosquitoSong+ model on a low-cost IoT device to evaluate its effectiveness in a real field environment. This includes the investigation of how to incorporate the proposed model into a pipeline for mosquito detection and classification from wingbeat sounds [41], how to calibrate or fine-tune the model after the initial data collection of the environmental noises at a target location, and how to fuse the features from acoustic and optical sensors for the classification. We also plan to study how to combine the mosquito counts from the IoT devices with the density maps of the potential vector breeding sites from geotagged images [44] to improve the estimation of vector populations. We aim to incorporate such estimates into predictive models to improve their accuracy, since current models typically rely on proxies to estimate mosquito vector populations [45].

Conclusion

We presented the MosquitoSong+ model to identify mosquito species and sex from wingbeat sounds in the presence of environmental noise. Our experimental results, which included noise overlay simulations, indicate that the new 1D-CNN architecture, along with two data augmentation techniques, significantly improved the model’s performance in species and sex classification. Furthermore, our results demonstrate the model’s generalizability, achieving accuracy above 80% in species classification across various datasets, including indoor wingbeats with noise overlay, HumBugDB wingbeats with noise overlay, and outdoor wingbeat recordings. Even though the initial results of the species and sex classification on the outdoor recordings were less accurate than the noise overlay simulation, this is likely due to the limited and imbalanced number of examples used. Therefore, further study is required to validate the model performance on a larger number of outdoor recordings. Future work should focus on integrating the detection and classification models to create a comprehensive mosquito monitoring system. Additionally, field evaluations are crucial to validate the effectiveness of MosquitoSong+ and to identify practical challenges that may arise in deploying the approach in real-world settings outside of controlled laboratory environments.

Acknowledgments

This work was partially supported by a grant from the Mahidol University Office of International Relations to Haddawy in support of the Mahidol-Bremen Medical Informatics Research Unit (MIRU), by a grant from the Hanse-Wissenschaftskolleg Institute for Advanced Study to Haddawy and a fellowship to Su Yin, and by a Young Researcher grant from Mahidol University to Su Yin. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  1. World Health Organization (WHO). A global brief on vector-borne diseases. World Health Organization; 2014.
  2. Caminade C, McIntyre KM, Jones AE. Impact of recent and future climate change on vector-borne diseases. Annals of the New York Academy of Sciences. 2019;1436(1):157–173. pmid:30120891
  3. Franklinos LH, Jones KE, Redding DW, Abubakar I. The effect of global change on mosquito-borne disease. The Lancet Infectious Diseases. 2019;19(9):e302–e312. pmid:31227327
  4. Wilke AB, Vasquez C, Carvajal A, Moreno M, Petrie WD, Beier JC. Evaluation of the effectiveness of BG-Sentinel and CDC light traps in assessing the abundance, richness, and community composition of mosquitoes in rural and natural areas. Parasites and Vectors. 2022;15:1–9. pmid:35135589
  5. Dormont L, Mulatier M, Carrasco D, Cohuet A. Mosquito attractants. Journal of Chemical Ecology. 2021;47(4):351–393. pmid:33725235
  6. St Laurent B, Oy K, Miller B, Gasteiger EB, Lee E, Sovannaroth S, et al. Cow-baited tents are highly effective in sampling diverse Anopheles malaria vectors in Cambodia. Malaria Journal. 2016;15(1):1–11. pmid:27577697
  7. Mathenge EM, Misiani GO, Oulo DO, Irungu LW, Ndegwa PN, Smith TA, et al. Comparative performance of the Mbita trap, CDC light trap and the human landing catch in the sampling of Anopheles arabiensis, An. funestus and culicine species in a rice irrigation in western Kenya. Malaria Journal. 2005;4(1):1–6.
  8. Offenhauser WH, Kahn MC. The Sounds of Disease-Carrying Mosquitoes. The Journal of the Acoustical Society of America. 1949;21:259–263.
  9. Göpfert MC, Briegel H, Robert D. Mosquito hearing: sound-induced antennal vibrations in male and female Aedes aegypti. Journal of Experimental Biology. 1999;202:2727–2738. pmid:10504309
  10. Ziemer T, Koch J, Sa-Ngamuang C, Yin MS, Siai M, Berkhausen B, et al. A bio-inspired acoustic detector of mosquito sex and species. The Journal of the Acoustical Society of America. 2020;148:2480–2480.
  11. Vasconcelos D, Nunes N, Ribeiro M, Prandi C, Rogers A. LOCOMOBIS: a low-cost acoustic-based sensing system to monitor and classify mosquitoes. In: 2019 16th IEEE Annual Consumer Communications & Networking Conference (CCNC). IEEE; 2019. p. 1–6. Available from: https://doi.org/10.1109/CCNC.2019.8651767.
  12. Lukman A, Harjoko A, Yang CK. Classification MFCC feature from culex and aedes aegypti mosquitoes noise using support vector machine. Proceedings – 2017 International Conference on Soft Computing, Intelligent System and Information Technology: Building Intelligence Through IOT and Big Data, ICSIIT 2017. 2017;2018-January:17–20. https://doi.org/10.1109/ICSIIT.2017.28
  13. Ziemer T, Wetjen F, Herbst A. The Antenna Base Plays a Crucial Role in Mosquito Courtship Behavior. Frontiers in Tropical Diseases. 2022;3.
  14. Chen Y, Why A, Batista G, Mafra-Neto A, Keogh E. Flying insect classification with inexpensive sensors. Journal of Insect Behavior. 2014;27(5):657–677. pmid:25350921
  15. Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556. 2014.
  16. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. p. 770–778.
  17. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016.
  18. Li Y, Kiskin I, Sinka M, Zilli D, Chan H, Herreros-Moya E, et al. Fast mosquito acoustic detection with field cup recordings: an initial investigation. In: Proceedings of Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018). Tampere University of Technology; 2018. p. 153–157.
  19. Sinka M, Zilli D, Li Y, Kiskin I, Kirkham D, Rafique W, et al. HumBug – An Acoustic Mosquito Monitoring Tool for use on budget smartphones. Methods in Ecology and Evolution. 2021;12:1848–1859.
  20. Steinfath E, Palacios-Muñoz A, Rottschäfer JR, Yuezak D, Clemens J. Fast and accurate annotation of acoustic signals with deep neural networks. eLife. 2021;10. pmid:34723794
  21. Kiskin I, Zilli D, Li Y, Sinka M, Willis K, Roberts S. Bioacoustic detection with wavelet-conditioned convolutional neural networks. Neural Computing and Applications. 2020;32:915–927.
  22. Khalighifar A, Jiménez-García D, Campbell LP, Ahadji-Dabla KM, Aboagye-Antwi F, Ibarra-Juárez LA, et al. Application of Deep Learning to Community-Science-Based Mosquito Monitoring and Detection of Novel Species. Journal of Medical Entomology. 2022;59:355–362. pmid:34546359
  23. Joly A, Goëau H, Glotin H, Spampinato C, Bonnet P, Vellinga WP, et al. Biodiversity information retrieval through large scale content-based identification: a long-term evaluation. In: Information Retrieval Evaluation in a Changing World. Springer; 2019. p. 389–413.
  24. Stowell D, Petrusková T, Šálek M, Linhart P. Automatic acoustic identification of individuals in multiple species: improving identification across recording conditions. Journal of the Royal Society Interface. 2019;16. pmid:30966953
  25. Gupta G, Kshirsagar M, Zhong M, Gholami S, Ferres JL. Comparing recurrent convolutional neural networks for large scale bird species classification. Scientific Reports. 2021;11:1–12. pmid:34429468
  26. Zualkernan I, Judas J, Mahbub T, Bhagwagar A, Chand P. A Tiny CNN Architecture for Identifying Bat Species from Echolocation Calls. 2020 IEEE / ITU International Conference on Artificial Intelligence for Good, AI4G 2020. 2020; p. 81–86. https://doi.org/10.1109/AI4G50087.2020.9311084
  27. Bjorck J, Rappazzo BH, Chen D, Bernstein R, Wrege PH, Gomes CP. Automatic Detection and Compression for Passive Acoustic Monitoring of the African Forest Elephant. Proceedings of the AAAI Conference on Artificial Intelligence. 2019;33:476–484.
  28. Waddell EE, Rasmussen JH, Širović A, Bolgan M, Iorio LD. Applying Artificial Intelligence Methods to Detect and Classify Fish Calls from the Northern Gulf of Mexico. Journal of Marine Science and Engineering. 2021;9:1128. https://doi.org/10.3390/JMSE9101128
  29. Hibino S, Suzuki C, Nishino T. Classification of singing insect sounds with convolutional neural network. Acoustical Science and Technology. 2021;42:E2152.
  30. Stowell D. Computational bioacoustics with deep learning: a review and roadmap. PeerJ. 2022;10:e13152. pmid:35341043
  31. Morfi V, Lachlan RF, Stowell D. Deep perceptual embeddings for unlabelled animal sound events. The Journal of the Acoustical Society of America. 2021;150:2. pmid:34340499
  32. Varma ALS, Bateshwar V, Rathi A, Singh A. Acoustic Classification of Insects using Signal Processing and Deep Learning Approaches. In: 2021 8th International Conference on Signal Processing and Integrated Networks (SPIN). IEEE; 2021. p. 1048–1052.
  33. Ravanelli M, Bengio Y. Speaker recognition from raw waveform with sincnet. In: 2018 IEEE Spoken Language Technology Workshop (SLT). IEEE; 2018. p. 1021–1028.
  34. Yin MS, Haddawy P, Nirandmongkol B, Kongthaworn T, Chaisumritchoke C, Supratak A, et al. A Lightweight Deep Learning Approach to Mosquito Classification from Wingbeat Sounds. In: Proceedings of the Conference on Information Technology for Social Good. GoodIT’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 37–42. Available from: https://doi.org/10.1145/3462203.3475908.
  35. Kiskin I, Cobb AD, Wang L, Roberts S. Humbug Zooniverse: A Crowd-Sourced Acoustic Mosquito Dataset. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings. 2020;2020-May:916–920. https://doi.org/10.1109/ICASSP40776.2020.9053141
  36. Vasconcelos D, Yin MS, Wetjen F, Herbst A, Ziemer T, Förster A, et al. Counting Mosquitoes in the Wild: An Internet of Things Approach. In: Proceedings of the Conference on Information Technology for Social Good. GoodIT’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 43–48. Available from: https://doi.org/10.1145/3462203.3475914.
  37. Genoud AP, Basistyy R, Williams GM, Thomas BP. Optical remote sensing for monitoring flying mosquitoes, gender identification and discussion on species identification. Applied Physics B. 2018;124:46.
  38. Fanioudakis E, Geismar M, Potamitis I. Mosquito wingbeat analysis and classification using deep learning. European Signal Processing Conference. 2018;2018-September:2410–2414. https://doi.org/10.23919/EUSIPCO.2018.8553542
  39. Shorten C, Khoshgoftaar TM. A survey on Image Data Augmentation for Deep Learning. Journal of Big Data. 2019;6:1–48.
  40. Nanni L, Maguolo G, Paci M. Data augmentation approaches for improving animal audio classification. Ecological Informatics. 2020;57:101084.
  41. Yin MS, Haddawy P, Ziemer T, Wetjen F, Supratak A, Chiamsakul K, et al. A deep learning-based pipeline for mosquito detection and classification from wingbeat sounds. Multimedia Tools and Applications. 2022; p. 1–17. https://doi.org/10.1007/S11042-022-13367-0
  42. Kingma D, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
  43. Mehyadin AE, Abdulazeez AM, Hasan DA, Saeed JN. Birds Sound Classification Based on Machine Learning Algorithms. Asian Journal of Research in Computer Science. 2021;9:1–11.
  44. Haddawy P, Wettayakorn P, Nonthaleerak B, Su Yin M, Wiratsudakul A, Schöning J, et al. Large scale detailed mapping of dengue vector breeding sites using street view images. PLoS Neglected Tropical Diseases. 2019;13(7):e0007555. pmid:31356617
  45. Haddawy P, Hasan AI, Kasantikul R, Lawpoolsri S, Sa-Angchai P, Kaewkungwal J, et al. Spatiotemporal Bayesian networks for malaria prediction. Artificial Intelligence in Medicine. 2018;84:127–138. pmid:29241658