Fig 1.

Summary of the final datasets used throughout the study.

Fig 2.

Overview of the methodology designed to detect significant patterns within NSCLC drug target proteins, aimed at identifying potential opportunities for drug repositioning.

Fig 3.

The algorithm is designed to identify patterns in the treatment dataset.

This process involves two key functions. First, in lines 1 to 8, the algorithm identifies the amino acids that meet the occurrence threshold, starting with patterns of length 1, and then iteratively extends these candidate patterns by appending amino acids to the previously identified patterns of maximum length. Second, in lines 9 to 21, the algorithm takes the previously found maximum-length patterns, keeps those that still meet the occurrence criterion, and eliminates those that are subpatterns of newly extended patterns (lines 19 to 21).

Fig 4.

The procedure involves verifying whether patterns from NSCLC treatment sequences are present in sequences from various disease datasets, which include lung cancer and four other cancers (breast, colon, pancreas, and head and neck).
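
At its core, this check is a substring search of each treatment pattern across the proteins of each disease dataset. A minimal sketch, assuming exact, contiguous matches and a hypothetical input layout (a dict from disease name to a dict of protein ID to sequence); `find_pattern_hits` is an illustrative name, not the authors' code:

```python
def find_pattern_hits(patterns: list[str],
                      disease_datasets: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """For each pattern, list (per disease) the protein IDs whose sequence contains it."""
    hits: dict[str, dict[str, list[str]]] = {}
    for pattern in patterns:
        per_disease: dict[str, list[str]] = {}
        for disease, proteins in disease_datasets.items():
            # Exact, contiguous match of the pattern inside each protein sequence.
            matched = sorted(pid for pid, seq in proteins.items() if pattern in seq)
            if matched:
                per_disease[disease] = matched
        hits[pattern] = per_disease  # empty dict: pattern absent from every dataset
    return hits
```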

Fig 5.

An overview of the procedure for generating the 3D structures of the proteins and identifying common patterns using ChimeraX.

Fig 6.

Methodology used to calculate the similarity between the sequences of the proteins under study using several similarity metrics.

Fig 7.

The pseudocode outlines the proposed method for scoring the similarity between two protein sequences.

The Needleman-Wunsch (NW) alignment algorithm was adapted to allow a custom score for each character pair in the alphabet, stored in a matrix-like structure. The process begins by setting up the scores for all amino acid pairs in lines 2 to 11, where values are assigned according to specified rules (pair_scores, line 5). Values for matches, mismatches, and gaps are also defined. Lines 12 to 18 describe the initialization of the score matrix: all cells are first set to 0, and gap penalties are then assigned to row 0 and column 0. The scoring algorithm itself starts at line 19. It uses dynamic programming to compute the optimal alignment score between the two sequences: at each cell, the maximum value is selected among the match, delete, and insert operations (lines 21, 22, and 23). The final output is a score that quantifies the similarity between the two input sequences (line 25).

Fig 8.

Sankey plot showing the patterns identified in the amino acid sequences of target proteins for NSCLC drugs at both the 10% and 5% occurrence thresholds.

Fig 9.

Analysis of significant NSCLC treatment patterns with 10% and 5% occurrence in the other selected cancer types.

Fig 10.

Superposition of protein structures was performed based on the detected pattern (‘CEGCKGFF’, in red) with a 10% occurrence threshold in the lung cancer dataset.

The reference proteins RARB (P10826, in brown) and VDR (P11473, in blue) were identified in the cancer dataset. For comparison, the proteins from the treatment dataset, NR1I2 (O75469, in pink) and PPARG (P37231, in green), are also highlighted.

Table 1.

Summary of the treatment proteins and drug targets from the lung cancer datasets. The table includes the following columns: Protein ID treatment (the unique identifier of each treatment protein), Protein name (the name or description of the protein), Drug treatment (the name of the drug administered), Drug action (a description of the drug’s effect on the protein), Protein ID new (the identifier of the newly identified protein), Protein name (the name of the newly identified protein), and Cancer types (here, lung cancer).

Fig 11.

Superposition of protein structures. a) Protein structures were aligned based on the detected pattern (‘CEGCKGFF’, in red) using a threshold of 10% occurrence in the lung cancer treatment dataset.

In a different cancer dataset, a similar recurring pattern was found in the proteins THB (P10828, in brown), VDR (P11473, in blue), NR1D1 (P20393, in purple), and RORA (P35398, in green). The same pattern was also found in the treatment protein dataset, which included NR1I2 (O75469, in orange), RARA (P10276, in yellow), and PPARG (P37231, in pink). b) Superposition of protein structures based on the detected ‘QKCL’ pattern (shown in red), with a 10% occurrence threshold in the lung cancer treatment dataset. The reference protein, VEGFC (P49767, in brown), associated with breast cancer, is compared to proteins from the treatment dataset: RXRA (P19793, in blue), PPARG (P37231, in pink), and SMO (Q99835, in green). c) The reference protein, NR4A3 (Q92570, in brown), associated with breast cancer, is compared to proteins from the treatment dataset: RXRA (P19793, in blue) and PPARG (P37231, in pink). d) Protein structures were aligned based on the detected pattern (‘AALET’, in red) with a 5% occurrence threshold from the lung cancer treatment dataset. The reference protein, AL1A3 (P47895, in brown), is linked to breast cancer. For comparison, proteins from the treatment dataset, including ALDH2 (P05091, in blue), are highlighted.

Table 2.

Overview of the treatment proteins and drug targets derived from the cancer datasets.

Table 3.

Root-mean-square deviation (RMSD) values obtained by comparing the reference structure with each target structure.
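
For reference, the RMSD between N paired atoms is sqrt((1/N) Σᵢ ‖rᵢ − tᵢ‖²), where rᵢ and tᵢ are the coordinates of the i-th paired atom in the reference and target structures. The sketch below assumes the structures have already been superposed (for example in ChimeraX) and that the atom pairing is known; it does not perform the superposition itself, and `rmsd` is a hypothetical helper name.

```python
import math


def rmsd(ref: list[tuple[float, float, float]],
         target: list[tuple[float, float, float]]) -> float:
    """RMSD between two equally long lists of already-superposed, paired 3D coordinates."""
    if not ref or len(ref) != len(target):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances over all atom pairs (x, y, z components).
    total = sum((a - b) ** 2 for p, q in zip(ref, target) for a, b in zip(p, q))
    return math.sqrt(total / len(ref))
```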

Fig 12.

Boxplot showing the distribution of protein similarity across datasets using the different similarity metrics.

The results of the statistical test for both considered scenarios are also indicated. P-value annotation legend: *: 1.00 × 10⁻² < p ≤ 5.00 × 10⁻²; **: 1.00 × 10⁻³ < p ≤ 1.00 × 10⁻²; ***: 1.00 × 10⁻⁴ < p ≤ 1.00 × 10⁻³.
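
The annotation legend can be expressed as a small lookup. The `"ns"` label for p > 5.00 × 10⁻² and `"****"` for p ≤ 1.00 × 10⁻⁴ are assumptions borrowed from common plotting conventions; the legend itself does not define those ranges. `p_annotation` is a hypothetical name.

```python
def p_annotation(p: float) -> str:
    """Map a p-value to the star annotation used in the boxplot legend."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p > 5.00e-2:
        return "ns"    # assumption: not covered by the legend
    if p > 1.00e-2:
        return "*"     # 1.00e-2 < p <= 5.00e-2
    if p > 1.00e-3:
        return "**"    # 1.00e-3 < p <= 1.00e-2
    if p > 1.00e-4:
        return "***"   # 1.00e-4 < p <= 1.00e-3
    return "****"      # assumption: not covered by the legend
```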

Table 4.

Mean protein similarity and standard deviation for each dataset, computed with four similarity metrics.

Fig 13.

The number of unique triplets identified in each case was analyzed.

Each triplet links a new protein to a potential drug, where the new protein shares at least one pattern with the drug’s original target protein.
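
One way to enumerate such triplets is sketched below. It assumes a triplet is (original target, drug, new protein) and that "sharing a pattern" means the new protein's sequence contains at least one of the patterns found in the original target. The input dictionaries and `build_triplets` are hypothetical, not the study's data structures.

```python
def build_triplets(drug_targets: dict[str, list[str]],
                   target_patterns: dict[str, list[str]],
                   new_proteins: dict[str, str]) -> set[tuple[str, str, str]]:
    """Unique (original target, drug, new protein) triplets linked by a shared pattern."""
    triplets: set[tuple[str, str, str]] = set()
    for drug, targets in drug_targets.items():
        for target in targets:
            patterns = target_patterns.get(target, [])
            for pid, seq in new_proteins.items():
                # Skip the original target itself; keep proteins containing any of its patterns.
                if pid != target and any(p in seq for p in patterns):
                    triplets.add((target, drug, pid))
    return triplets
```

Using a set means each triplet is counted once even if the new protein shares several patterns with the same target, which matches counting unique triplets.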

Fig 14.

The method developed in this research identified a potentially successful case of drug repurposing, revealing that the two proteins shown share at least one common pattern despite their low similarity.
