Identifying indicator species in ecological habitats using Deep Optimal Feature Learning | PLOS One

Fig 1.

Overall data analysis workflow in block diagram form.
(Step 1): The collection of raw input data samples, together with a corresponding set of labelled “ground-truth” targets. (Step 2): The pre-processing of raw input data into structures suitable for modelling, guided by any available domain or expert knowledge. (Step 3): The training of several types of classification models (including Deep Learning), which map inputs to their corresponding discrete class labels. (Step 4): The design of a special objective function within a Deep Learning classifier, which identifies a latent space with improved class separation. The most dominant latent features are then distinguished by the magnitude (e.g., ℓ₂ norm) of the neural network weights.
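The weight-magnitude ranking mentioned in Step 4 can be illustrated with a minimal sketch. It assumes a first-layer weight matrix laid out as (inputs × hidden units) and ranks inputs by the ℓ₂ norm of their outgoing weights; the function name, the matrix layout, and the demo values are hypothetical and are not taken from the article.

```python
import numpy as np

def rank_features_by_weight_norm(W: np.ndarray) -> np.ndarray:
    """Return input indices sorted by decreasing l2 norm of their weights.

    Assumption: W has shape (n_inputs, n_hidden), so row i holds the
    weights leaving input i. A different layout would need axis=0.
    """
    norms = np.linalg.norm(W, axis=1)  # one l2 norm per input row
    return np.argsort(-norms)          # largest norm first

# Hypothetical 3-input, 2-hidden-unit weight matrix.
W_demo = np.array([[0.1, 0.0],
                   [3.0, 4.0],
                   [1.0, 0.0]])
ranking = rank_features_by_weight_norm(W_demo)
```

Here the second input has the largest weight norm and would be ranked first under this assumed criterion.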

Fig 2.

Printout of pandas dataframes containing raw data collected directly from ecological sites.
(Left) Dataframe containing raw input variables. (Right) Dataframe containing output class labels.

Fig 3.

Discovery of an optimally-separating latent feature space.
(Top Left) The high-dimensional and confounded raw inputs X make class separation a challenging task. (Bottom Left) A case example including six microorganism species (A through F) which are entangled in the raw input space. (Top Right) A DL model which learns a latent space Z that optimally separates the classes. (Bottom Right) The disentanglement of the six species into distinct classes, which can be further aggregated into two major classes: Class 0 (A and B) and Class 1 (C through F).

Fig 4.

Species abundance counts after each pre-processing transformation.
(Top) Distribution of counts from all 21721 species; the density is extremely skewed towards the low end. (Middle) Distribution of only the top 50 species by sum. (Bottom) Distribution of the top 50 species after a log₁₀ transformation. Rarer but higher-abundance species are now recognizable.
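The two transformations named in this caption (keeping the top species by total count, then applying log₁₀) can be sketched as follows. This is an illustration only: the `+ 1` offset used to keep zero counts finite, the function name, and the toy dataframe are assumptions, not details stated in the caption.

```python
import numpy as np
import pandas as pd

def top_species_log10(counts: pd.DataFrame, k: int = 50) -> pd.DataFrame:
    """Keep the k species (columns) with the largest total count, then log10."""
    top = counts.sum(axis=0).nlargest(k).index
    # Assumption: log10(1 + x) so that zero counts map to 0 instead of -inf.
    return np.log10(counts[top] + 1)

# Hypothetical abundance table: rows are samples, columns are species.
demo = pd.DataFrame({"sp_a": [0, 9, 99],
                     "sp_b": [0, 0, 1],
                     "sp_c": [999, 0, 0]})
log_demo = top_species_log10(demo, k=2)
```

With `k=2`, the low-total species `sp_b` is dropped and the remaining counts are compressed onto a scale where both small and large abundances are visible.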

Fig 5.

Histogram of maximum abundance counts.

Fig 6.

Visualization of a linear separating hyperplane and separating margins in a 2-class SVM model.

Fig 7.

Visualization of a classification problem with a non-linear separation boundary between the two classes.
(Left) The raw feature-space spanned by raw variables x₁ and x₂ renders linear separation impossible. (Right) A desired latent feature space which optimally separates the two classes. The goal of the DL model is to learn its coordinates, z₁ and z₂.

Fig 8.

A two-layer neural network with a hinge-loss-like objective function.
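For readers unfamiliar with the term, the standard hinge loss that the caption alludes to can be sketched as below. This is the textbook form, max(0, 1 − y·f(x)) with labels in {−1, +1}; the article's "hinge-loss-like" objective may differ, and the function name and demo values are hypothetical.

```python
import numpy as np

def hinge_loss(scores, labels) -> float:
    """Mean standard hinge loss for labels in {-1, +1}."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Samples beyond the margin (label * score >= 1) contribute zero loss.
    return float(np.mean(np.maximum(0.0, 1.0 - labels * scores)))

# Three positive-class samples: outside the margin, inside it, misclassified.
demo_loss = hinge_loss([2.0, 0.5, -1.0], [1, 1, 1])
```

Only samples inside the margin or on the wrong side of the boundary are penalized, which is what pushes such an objective towards larger class separation.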

Fig 9.

Feature selection based on direction of optimal separation.

Fig 10.

Selection of relevant input variables, by reverse-engineering matrix multiplication.

Fig 11.

The overall data analysis workflow applied to the Mount Polley case study.

Fig 12.

The ANN neuron architecture used in the Mount Polley case study.

Fig 13.

The 4-layer ANN architecture used to classify the Fisher Iris dataset.

Table 1.

Iris ANN classifier details.

Table 2.

ANN training and testing accuracies on the Iris dataset.

Table 3.

Comparison of separating margins between traditional and optimally-separating ANNs, for the Iris classification problem.

Fig 14.

Visualization of data separation within the first ANN hidden layer.
(Left) Inter-class separation distance in the traditional ANN, between Class 2 (red) samples and Class 1 (black) samples. (Right) Inter-class separation distance in the optimally-separating ANN.

Table 4.

Feature-ranking of the Iris dataset, according to our proposed optimally-separating ANN.

Table 5.

Testing-set accuracy comparison between traditional and optimally-separating ANNs.

Fig 15.

Inter-class separations in the traditional ANN.
Black samples belong to the undisturbed class. Red samples belong to the disturbed class. Only the first 4 out of the total 5000 runs are shown.

Fig 16.

Inter-class separations in the optimally-separating ANN.
The separations are noticeably larger than those in Fig 15. Black samples belong to the undisturbed class. Red samples belong to the disturbed class. Only the first 4 out of the total 5000 runs are shown.

Table 6.

Comparison of separating margins between traditional and optimally-separating ANNs.

Table 7.

Testing-set accuracy comparison between classifiers, using either all species or only indicator species as inputs.
Plus-minus value represents one standard deviation.

Table 8.

Indicator species identified by our proposed feature extractor, compared to those identified by [34].
The Frequency column denotes the number of times each species was identified among the top weights, divided by the total of 5000 experiments. The Garris column indicates whether each species was also identified in [34], and if so, which habitat it belongs to. The maximum and mean abundance counts of each species are also shown, along with its taxonomic Class.
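The Frequency column described here can be illustrated with a small sketch: count how many runs selected each species, then divide by the number of runs. The function name, the species labels, and the four toy runs are hypothetical; the article reports 5000 runs.

```python
from collections import Counter

def selection_frequency(top_sets, n_runs):
    """Fraction of runs in which each species appears among the top weights."""
    # set(run) ensures a species is counted at most once per run.
    counts = Counter(species for run in top_sets for species in set(run))
    return {species: c / n_runs for species, c in counts.items()}

# Hypothetical top-weight selections from four repeated experiments.
runs = [["sp_a", "sp_b"], ["sp_a"], ["sp_c", "sp_a"], ["sp_b"]]
freq = selection_frequency(runs, n_runs=len(runs))
```

A species with a frequency close to 1 is selected in nearly every run, which is the sense in which it would be treated as a stable indicator in this sketch.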

Fig 17.

Taxonomic comparison of indicator species at the domain level.
Blue bars represent the indicators identified by Garris et al. [34]. Purple bars represent the indicators identified by our proposed feature extractor. The horizontal axis represents the percentage of indicators belonging to each domain.

Fig 18.

Taxonomic comparison of indicator species at the phylum level.
Blue bars represent the indicators identified by Garris et al. [34]. Purple bars represent the indicators identified by our proposed feature extractor. The horizontal axis represents the percentage of indicators belonging to each phylum.

Fig 19.

Taxonomic comparison of indicator species at the class level.
Blue bars represent the indicators identified by Garris et al. [34]. Purple bars represent the indicators identified by our proposed feature extractor. The horizontal axis represents the percentage of indicators belonging to each class.
