
Fig 1.

Internal structure and organization of LTR-retrotransposons in plants for (A) the Ty1/copia superfamily and (B) the Ty3/gypsy superfamily. Depending on the position of the integrase (INT) domain, an element is classified into the Ty1/copia or the Ty3/gypsy superfamily.


Table 1.

Tools that use ML or DL approaches to analyze TEs.

TIR-Learner uses a neural network, k-nearest neighbors, random forest, and AdaBoost for the ensemble method, while ClassifyTE uses k-nearest neighbors, extra trees, random forest, support vector machine, AdaBoost, logistic regression, gradient boosting, and XGBoost classifiers for the stacking method. Abbreviations: RFSB: random forest selective binary classifier; C: classification; D: detection; A: annotation; CL: curation of TE libraries; NI: novel insertions; TU: TransposonUltimate.


Fig 2.

Performance of YORO in detecting internal domains of LTR retrotransposons using the Genomic Object Detection approach.

(A) Loss function during model training. Parameters used: Adam optimizer, learning rate of 0.001, batch size of 128, 100 epochs, no dropout, and a data split of training (80%), validation (10%), and testing (10%). (B) Precision-Recall curve with TP (True Positive), TN (True Negative), FP (False Positive) and FN (False Negative) defined on a per-nucleotide basis. Only domain detection is considered, regardless of classification. (C) Parity plot for the start positions of the domains. (D) Visualization of the domains in the 50,000 bp window (X-axis). The upper part shows the predictions made by YORO; the lower part shows the actual labels. AP: Aspartic Protease (black); GAG: Capsid protein (red); ENV: Envelope (green); INT: Integrase; RT: Reverse Transcriptase (blue); RNAseH: Ribonuclease H (light blue).
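
To make the per-nucleotide definition in panel (B) concrete, the following is a minimal sketch of how TP, FP, FN, and TN could be counted base by base. It is not taken from the YORO code: the function names, the half-open `(start, end)` interval convention, and the example coordinates are assumptions made for illustration.

```python
def nucleotide_counts(predicted, actual, window_bp):
    """Return (TP, FP, FN, TN) counted per nucleotide.

    Intervals are half-open (start, end) pairs in window coordinates
    (an assumed convention, not necessarily the one used by YORO).
    """
    pred_mask = [False] * window_bp
    true_mask = [False] * window_bp
    for mask, intervals in ((pred_mask, predicted), (true_mask, actual)):
        for start, end in intervals:
            # Clip each interval to the window before marking its bases.
            for pos in range(max(0, start), min(window_bp, end)):
                mask[pos] = True
    tp = sum(p and t for p, t in zip(pred_mask, true_mask))
    fp = sum(p and not t for p, t in zip(pred_mask, true_mask))
    fn = sum(t and not p for p, t in zip(pred_mask, true_mask))
    tn = window_bp - tp - fp - fn
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision and recall, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: one predicted and one actual domain in a 1,000 bp window.
counts = nucleotide_counts([(100, 200)], [(150, 250)], window_bp=1000)
print(counts)                         # (50, 50, 50, 850)
print(precision_recall(*counts[:3]))  # (0.5, 0.5)
```

Because only detection is scored here, the sketch ignores the domain class entirely; a base counts as positive if any domain covers it.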


Table 2.

COCO mAP@0.5 calculations for domain classification.
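
In COCO-style evaluation, mAP@0.5 counts a prediction as correct when its intersection over union (IoU) with a ground-truth object of the same class is at least 0.5. The sketch below shows what that matching criterion might look like for one-dimensional genomic intervals. The interval representation, the dictionary keys, and the function names are assumptions for illustration, not the evaluation code used for this table.

```python
def interval_iou(a, b):
    """IoU of two half-open (start, end) intervals on the same sequence."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(prediction, truth, threshold=0.5):
    """A prediction matches when the class agrees and IoU >= threshold."""
    same_class = prediction["label"] == truth["label"]
    return same_class and interval_iou(prediction["span"], truth["span"]) >= threshold


# Hypothetical coordinates for a single RT domain.
truth = {"span": (100, 200), "label": "RT"}
good = {"span": (120, 200), "label": "RT"}   # IoU = 80 / 100 = 0.8
poor = {"span": (150, 250), "label": "RT"}   # IoU = 50 / 150, below 0.5
print(is_match(good, truth), is_match(poor, truth))  # True False
```

Average precision per class is then computed from the ranked list of matched and unmatched predictions, and mAP is the mean over classes.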


Table 3.

COCO mAP@0.5 calculations for lineage classification of the domains.


Fig 3.

YORO’s performance in detecting the internal part of LTR retrotransposons, defined as clusters of at least three domains with a maximum distance of 3,000 bp, according to the Genomic Object Detection approach.

(A) Precision-Recall curve with TP (True Positive), TN (True Negative), FP (False Positive) and FN (False Negative) defined on a per-nucleotide basis, for clusters with a minimum of three domains and a maximum separation of 3,000 bp. (B) Parity plot for the start positions of these clusters. (C) Visualization of the clusters, as defined in (A), in the 50,000 bp window. The upper section shows the predictions by YORO; the lower section shows the actual labels. This example contains a false negative. (D) Visualization of the clusters, as defined in (A), in the 50,000 bp window. The upper section shows the predictions by YORO; the lower section shows the actual labels.
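
The clustering rule used throughout this figure (at least three domains, at most 3,000 bp apart) can be sketched as a single pass over sorted domain coordinates. This is an illustrative reading of the caption, not the authors' implementation: the function name is hypothetical, and measuring the separation from the end of the current cluster to the start of the next domain is an assumption.

```python
def cluster_domains(domains, max_gap=3000, min_domains=3):
    """Group (start, end) domains into clusters of nearby domains.

    A domain joins the current cluster when the gap between the cluster's
    end and the domain's start is at most max_gap (assumed definition of
    "maximum separation"). Clusters with fewer than min_domains are dropped.
    Returns a list of (cluster_start, cluster_end) tuples.
    """
    clusters = []
    members = 0
    cluster_start = cluster_end = None
    for start, end in sorted(domains):
        if members and start - cluster_end > max_gap:
            # Gap too large: close the current cluster and start a new one.
            if members >= min_domains:
                clusters.append((cluster_start, cluster_end))
            members = 0
        if members == 0:
            cluster_start, cluster_end = start, end
        else:
            cluster_end = max(cluster_end, end)
        members += 1
    if members >= min_domains:
        clusters.append((cluster_start, cluster_end))
    return clusters


# Hypothetical domain coordinates in a 50,000 bp window.
domains = [(1000, 1500), (2000, 2600), (3000, 3900), (20000, 20500), (21000, 21400)]
print(cluster_domains(domains))  # [(1000, 3900)]
```

In this example the last two domains are close to each other but form a group of only two, so they are not reported as an internal region.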


Table 4.

Precision and recall for YORO predictions on the Oryza sativa ssp. indica genome versus its publicly available annotation [48].

GT: Ground Truth. P: process applied.


Fig 4.

Relationship between speed-up and assembly variables for genomes analyzed with YORO and BLAST: (A) speed-up vs. genome size (sample: 17 plant genomes), and (B) speed-up vs. normalized N50 value.
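
For readers unfamiliar with the two quantities plotted here, the sketch below computes a standard N50 from a list of contig lengths and a speed-up ratio from two run times. The caption does not state how N50 was normalized or how speed-up was measured, so the sketch leaves normalization out and assumes speed-up is the BLAST run time divided by the YORO run time. All names and numbers are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def speed_up(baseline_seconds, candidate_seconds):
    """Ratio of baseline run time to candidate run time (assumed definition)."""
    return baseline_seconds / candidate_seconds


# Hypothetical assembly and timings.
print(n50([10, 8, 6, 4, 2]))   # 8
print(speed_up(120.0, 30.0))   # 4.0
```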


Fig 5.

Nucleotide distances between domains reported by [12] for (A) Ty3/gypsy and (B) Ty1/copia, and predicted by the Genomic Object Detection approach (through YORO) for (C) Ty3/gypsy and (D) Ty1/copia.


Table 5.

Average distance between domains for the Ty1/copia superfamily as observed in the analysis of 300 plant genomes [12] and in the YORO prediction.


Table 6.

Average distance between domains for the Ty3/gypsy superfamily as observed in the analysis of 300 plant genomes [12] and in the YORO prediction.


Fig 6.

Structure of the labels encoding the domain information of the LTR-RT domains.

The initial nucleotide sequence (upper section) is divided into 100 bp sections, each with 22 labels (lower section). The labels represent the following: (1) Detection and structure information: presence of a domain, starting position of the domain within the 100 bp section, and domain length (3 labels); (2) Domain classification: the domain type, in one-hot encoding (6 labels, e.g. GAG = 1,0,0,0,0,0); (3) Lineage classification: the lineage to which the domain belongs, in one-hot encoding (13 labels, e.g. Tork = 1,0,0,0,0,0,0,0,0,0,0,0,0). Both the domain start position and the length are normalized to values between 0 and 1. With this encoding, a single neural network can learn three tasks at once: domain detection, domain classification, and lineage classification.
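
The label layout described above (3 + 6 + 13 = 22 values per 100 bp section) can be sketched as follows. Only the section size, the label count, and the positions of GAG and Tork at index 0 of their one-hot groups come from the caption. The order of the remaining domains, the use of a numeric lineage index, and the constant used to normalize domain length are assumptions made for illustration.

```python
SECTION_BP = 100
# GAG is first per the caption; the order of the other five is assumed.
DOMAINS = ["GAG", "AP", "INT", "RT", "RNAseH", "ENV"]
N_LINEAGES = 13          # Tork is index 0 per the caption
MAX_DOMAIN_BP = 2000     # hypothetical constant for normalizing length
N_LABELS = 3 + len(DOMAINS) + N_LINEAGES  # 22


def encode_window(window_bp, annotations):
    """Build one 22-value label row per 100 bp section.

    annotations: iterable of (start, length, domain_name, lineage_index),
    with start inside the window. In this sketch, a section that receives
    several domains keeps values from all of them mixed together, so it
    assumes at most one domain start per section.
    """
    labels = [[0.0] * N_LABELS for _ in range(window_bp // SECTION_BP)]
    for start, length, domain, lineage_index in annotations:
        row = labels[start // SECTION_BP]
        row[0] = 1.0                                  # domain present
        row[1] = (start % SECTION_BP) / SECTION_BP    # start within section
        row[2] = min(length / MAX_DOMAIN_BP, 1.0)     # normalized length
        row[3 + DOMAINS.index(domain)] = 1.0          # domain one-hot
        row[3 + len(DOMAINS) + lineage_index] = 1.0   # lineage one-hot
    return labels


# Hypothetical GAG domain of lineage index 0 (Tork) starting at position 1,250.
labels = encode_window(50_000, [(1250, 600, "GAG", 0)])
print(len(labels), len(labels[0]))  # 500 22
print(labels[12][:4])               # [1.0, 0.5, 0.3, 1.0]
```

Every section without a domain start stays all zeros, which lets the network treat the first value of each row as a detection score and the remaining values as regression and classification targets.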


Fig 7.

Neural network architecture of YORO.
