Fig 1.

Overall data analysis workflow in block diagram form.

(Step 1): Collection of raw input data samples, together with a corresponding set of labelled “ground-truth” targets. (Step 2): Pre-processing of the raw input data into structures suitable for modelling, guided by any available domain or expert knowledge. (Step 3): Training of several types of classification models (including Deep Learning), each of which maps inputs to their corresponding discrete class labels. (Step 4): Design of a special objective function within a Deep Learning classifier, which identifies a latent space with improved class separation. The most dominant latent features are then distinguished by the magnitude (e.g., the 2-norm) of the neural network weights.
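Step 4 ranks features by the magnitude of the network weights. A minimal sketch of that ranking, assuming the relevant weights form a first-layer matrix of shape (n_hidden, n_inputs) and that "magnitude" means the 2-norm of each input's column of weights; the function name `rank_features_by_weight_norm`, the toy weights and the species names are hypothetical:

```python
import numpy as np

def rank_features_by_weight_norm(W, feature_names, top_k=None):
    """Rank input features by the 2-norm of their outgoing weights.

    W is assumed to have shape (n_hidden, n_inputs), so column j holds
    every weight that connects input feature j to the hidden layer.
    """
    norms = np.linalg.norm(W, axis=0)   # one 2-norm per input column
    order = np.argsort(norms)[::-1]     # largest norm first
    if top_k is not None:
        order = order[:top_k]
    return [(feature_names[j], float(norms[j])) for j in order]

# Hypothetical toy weights for three hypothetical species inputs.
W = np.array([[0.1, 2.0, -0.3],
              [0.2, -1.0, 0.4]])
ranking = rank_features_by_weight_norm(W, ["sp_a", "sp_b", "sp_c"])
print(ranking)
```

Using the norm of a whole column, rather than a single weight, means an input counts as dominant when it feeds strongly into the hidden layer as a whole, regardless of sign.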

Fig 2.

Printout of pandas dataframes containing raw data collected directly from ecological sites.

(Left) Dataframe containing raw input variables. (Right) Dataframe containing output class labels.

Fig 3.

Discovery of an optimally-separating latent feature space.

(Top Left) The high-dimensional and confounded raw inputs X make class separation a challenging task. (Bottom Left) A case example including six microorganism species (A through F), which are entangled in the raw input space. (Top Right) A DL model that learns a latent space Z which optimally separates the classes. (Bottom Right) The disentanglement of the six species into distinct classes, which can be further aggregated into two major classes: Class 0 (A and B) and Class 1 (C through F).

Fig 4.

Species abundance counts after each pre-processing transformation.

(Top) Distribution of counts from all 21721 species; the density is extremely skewed towards the low end. (Middle) Distribution of only the top 50 species by total (summed) abundance. (Bottom) Distribution of the top 50 species after a log10 transformation. Rarer but higher-abundance species are now recognizable.
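The two pre-processing steps in this caption (keep the top 50 species by summed counts, then apply log10) can be sketched in a few lines. This assumes the counts are held as a samples-by-species NumPy array and adds a pseudocount of 1 before the logarithm so that zero counts stay finite; the pseudocount, the function name `top_species_log10` and the toy counts are assumptions, not details from the caption:

```python
import numpy as np

def top_species_log10(counts, top_n=50):
    """Keep the top_n species by summed abundance, then log10-transform.

    counts: (n_samples, n_species) array of raw abundance counts.
    A pseudocount of 1 is added (an assumption) so zero counts stay finite.
    """
    totals = counts.sum(axis=0)                    # total abundance per species
    top_idx = np.argsort(totals)[::-1][:top_n]     # highest totals first
    return top_idx, np.log10(counts[:, top_idx] + 1)

# Hypothetical counts: 2 samples x 3 species.
counts = np.array([[0, 9, 99],
                   [1, 0, 999]])
top_idx, logged = top_species_log10(counts, top_n=2)
print(top_idx, logged)
```

The log transform compresses counts spanning several orders of magnitude into a narrow range, which is why the bottom panel no longer collapses onto the low end.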

Fig 5.

Histogram of maximum abundance counts.

Fig 6.

Visualization of a linear separating hyperplane and separating margins in a 2-class SVM model.
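In the standard linear SVM formulation, the separating hyperplane is w·x + b = 0 and the two margins are w·x + b = ±1, so the distance between the margins is 2/‖w‖. A minimal sketch of that geometry; the values of `w` and `b` are hypothetical and hand-picked, not fitted to any data:

```python
import numpy as np

def svm_margin_width(w):
    """Width between the margins w.x + b = +1 and w.x + b = -1."""
    return 2.0 / np.linalg.norm(w)

def svm_predict(X, w, b):
    """Class (+1 or -1) given by the side of the hyperplane w.x + b = 0."""
    return np.where(X @ w + b >= 0, 1, -1)

# Hypothetical, hand-picked hyperplane parameters (not fitted to any data).
w = np.array([3.0, 4.0])
b = -5.0
X = np.array([[2.0, 2.0], [0.0, 0.0]])
print(svm_margin_width(w))   # 2 / ||(3, 4)|| = 2 / 5
print(svm_predict(X, w, b))
```

Because the width is inversely proportional to ‖w‖, maximizing the margin is equivalent to minimizing ‖w‖ subject to every sample lying on the correct side of its margin.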

Fig 7.

Visualization of a classification problem with a non-linear separation boundary between the two classes.

(Left) The raw feature space spanned by variables x1 and x2 makes linear separation impossible. (Right) A desired latent feature space that optimally separates the two classes. The goal of the DL model is to learn its coordinates, z1 and z2.
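A toy illustration of why a change of coordinates can turn a non-linear problem into a linear one. Here two concentric rings cannot be split by a straight line in (x1, x2), but the hand-picked map z1 = x1², z2 = x2² places every sample on the line z1 + z2 = r², so a single straight line separates the classes. In the figure the DL model *learns* z1 and z2; this fixed map, the ring data and the function names are hypothetical stand-ins chosen only to show the idea:

```python
import numpy as np

def ring_data(n=200, seed=0):
    """Two concentric rings: radius 1 (class 0) and radius 3 (class 1)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    radius = np.where(np.arange(n) < n // 2, 1.0, 3.0)
    X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
    y = (radius > 2.0).astype(int)
    return X, y

def feature_map(X):
    """Hand-picked latent coordinates z1 = x1**2, z2 = x2**2."""
    return X ** 2

X, y = ring_data()
Z = feature_map(X)
# In latent space the straight line z1 + z2 = 4 separates the rings.
pred = (Z[:, 0] + Z[:, 1] > 4.0).astype(int)
print((pred == y).mean())   # 1.0
```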

Fig 8.

A two-layer neural network with a hinge-loss-like objective function.
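A minimal forward pass for a two-layer network scored with the standard hinge loss, mean(max(0, 1 − y·f(x))) with labels in {−1, +1}. The caption only says the objective is "hinge-loss-like", so the plain hinge loss, the ReLU activation, the layer sizes and the random weights below are all assumptions for illustration:

```python
import numpy as np

def forward(X, W1, b1, w2, b2):
    """Two-layer network: ReLU hidden layer, then one linear output score."""
    H = np.maximum(0.0, X @ W1 + b1)   # hidden (latent) representation
    return H @ w2 + b2                 # one real-valued score per sample

def hinge_loss(scores, y):
    """Mean hinge loss; labels y are assumed to be -1 or +1."""
    return float(np.mean(np.maximum(0.0, 1.0 - y * scores)))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                   # 4 samples, 3 inputs
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
w2, b2 = rng.normal(size=5), 0.0
y = np.array([1, -1, 1, -1])
print(hinge_loss(forward(X, W1, b1, w2, b2), y))
```

Unlike cross-entropy, the hinge loss is exactly zero once a sample lies beyond the margin (y·f(x) ≥ 1), so training effort concentrates on samples near the class boundary, which is what pushes the classes apart in the hidden layer.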

Fig 9.

Feature selection based on direction of optimal separation.

Fig 10.

Selection of relevant input variables, by reverse-engineering matrix multiplication.
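One way to read "reverse-engineering matrix multiplication" is as follows. If the latent coordinates are z = W1·x and the direction of optimal separation in latent space is a vector v, then the projection v·z equals (W1ᵀv)·x, so |W1ᵀv| scores how much each input contributes along that direction. This sketch ignores biases and non-linearities, and the function name, toy weights and direction are hypothetical; it is an interpretation, not necessarily the authors' exact procedure:

```python
import numpy as np

def input_relevance(W1, v):
    """Relevance of each input along latent direction v.

    W1: (n_latent, n_inputs) weights mapping inputs x to latent z = W1 @ x.
    v:  (n_latent,) direction of optimal separation in latent space.
    Since v . (W1 x) == (W1^T v) . x, the entries of W1^T v act as the
    effective input weights along v.
    """
    return np.abs(W1.T @ v)

# Hypothetical toy weights: 2 latent units, 3 inputs.
W1 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, -2.0]])
v = np.array([1.0, 1.0])
rel = input_relevance(W1, v)
print(rel)
```

Note that the third input has the largest raw weights, yet its contributions cancel along v; projecting onto the separation direction, rather than ranking raw weight magnitudes alone, is what exposes this.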

Fig 11.

The overall data analysis workflow applied to the Mount Polley case study.

Fig 12.

The ANN neuron architecture used in the Mount Polley case study.

Fig 13.

The 4-layer ANN architecture used to classify Fisher's Iris dataset.

Table 1.

Iris ANN classifier details.

Table 2.

ANN training and testing accuracies on the Iris dataset.

Table 3.

Comparison of separating margins between traditional and optimally-separating ANNs, for the Iris classification problem.

Fig 14.

Visualization of data separation within the first ANN hidden layer.

(Left) Inter-class separation distance in the traditional ANN, between Class 2 (red) samples and Class 1 (black) samples. (Right) Inter-class separation distance in the optimally-separating ANN.
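The caption does not define "inter-class separation distance" precisely. One plausible definition, assumed here, is the smallest Euclidean distance between any sample of one class and any sample of the other in the hidden-layer representation. The function name and the toy activations below are hypothetical:

```python
import numpy as np

def min_interclass_distance(H, y, class_a, class_b):
    """Smallest Euclidean distance between hidden representations of two classes.

    H: (n_samples, n_hidden) hidden-layer activations; y: class labels.
    """
    A = H[y == class_a]
    B = H[y == class_b]
    diffs = A[:, None, :] - B[None, :, :]          # all pairwise differences
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())

# Hypothetical 2-D hidden activations for Class 1 and Class 2 samples.
H = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]])
y = np.array([1, 1, 2, 2])
print(min_interclass_distance(H, y, 1, 2))
```

A larger value under this definition means that even the closest pair of opposing samples is well apart, which is the property the optimally-separating objective aims to increase.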

Table 4.

Feature-ranking of the Iris dataset, according to our proposed optimally-separating ANN.

Table 5.

Testing-set accuracy comparison between traditional and optimally-separating ANNs.

Fig 15.

Inter-class separations in the traditional ANN.

Black samples belong to the undisturbed class. Red samples belong to the disturbed class. Only the first 4 out of the total 5000 runs are shown.

Fig 16.

Inter-class separations in the optimally-separating ANN.

The separations are noticeably larger than those in Fig 15. Black samples belong to the undisturbed class. Red samples belong to the disturbed class. Only the first 4 out of the total 5000 runs are shown.

Table 6.

Comparison of separating margins between traditional and optimally-separating ANNs.

Table 7.

Testing-set accuracy comparison between classifiers, using either all species or only indicator species as inputs.

The ± value represents one standard deviation.

Table 8.

Indicator species identified by our proposed feature extractor, compared to those identified by [34].

The Frequency column denotes the number of times each species was identified as a top weight, divided by the total number of experiments (5000). The Garris column indicates whether each species was identified in [34] and, if so, which habitat it belongs to. The maximum and mean abundance counts of each species are also shown, along with its taxonomic Class.
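The Frequency column is a count divided by the number of runs. A minimal sketch, assuming each run's output is a list of the species it selected as top weights (a species is counted at most once per run); the function name and species labels are hypothetical, and four toy runs stand in for the 5000 experiments:

```python
from collections import Counter

def selection_frequency(top_sets):
    """Fraction of runs in which each species was selected as a top weight."""
    n_runs = len(top_sets)
    counts = Counter(sp for run in top_sets for sp in set(run))  # once per run
    return {sp: c / n_runs for sp, c in counts.items()}

# Hypothetical top-weight selections from 4 runs.
runs = [["A", "B"], ["A", "C"], ["A", "B"], ["D"]]
freq = selection_frequency(runs)
print(sorted(freq.items()))
```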

Fig 17.

Taxonomic comparison of indicator species at the domain level.

Blue bars represent the indicators identified by Garris et al. [34]. Purple bars represent the indicators identified by our proposed feature extractor. The horizontal axis represents the percentage of indicators belonging to each taxon.
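Each bar is the share of an indicator list that falls into one taxon at the chosen level (domain, phylum or class). A minimal sketch of that percentage, assuming each indicator species has already been mapped to its taxon label; the function name and labels are hypothetical:

```python
from collections import Counter

def taxon_percentages(taxa):
    """Percentage of indicators that fall into each taxon."""
    counts = Counter(taxa)
    total = len(taxa)
    return {t: 100.0 * c / total for t, c in counts.items()}

# Hypothetical domain labels for four indicator species.
pct = taxon_percentages(["Bacteria", "Bacteria", "Bacteria", "Archaea"])
print(pct)
```

Computing the percentages separately for each indicator list lets the two lists be compared even when they contain different numbers of species.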

Fig 18.

Taxonomic comparison of indicator species at the phylum level.

Blue bars represent the indicators identified by Garris et al. [34]. Purple bars represent the indicators identified by our proposed feature extractor. The horizontal axis represents the percentage of indicators belonging to each taxon.

Fig 19.

Taxonomic comparison of indicator species at the class level.

Blue bars represent the indicators identified by Garris et al. [34]. Purple bars represent the indicators identified by our proposed feature extractor. The horizontal axis represents the percentage of indicators belonging to each taxon.