Fig 1.
Scheme of treatment of incomplete sequencing samples.
For each incomplete sequencing sample, all possible amino acid combinations were enumerated, and each combination was weighted by its conditional probability of occurrence. The conditional probability of occurrence of each amino acid at each position was determined from the amino acid sequences of the complete sequencing samples.
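The enumeration and weighting described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that ambiguous positions are given as sets of candidate amino acids, that positions are treated independently, and that add-one smoothing is applied to the counts. The function names `position_frequencies` and `expand_incomplete` are hypothetical.

```python
from collections import Counter
from itertools import product


def position_frequencies(complete_seqs):
    """Count amino acids at each position across complete sequences."""
    length = len(complete_seqs[0])
    return [Counter(seq[i] for seq in complete_seqs) for i in range(length)]


def expand_incomplete(sample, freqs):
    """Enumerate every resolved sequence of an incomplete sample.

    `sample` is a list whose items are either a single letter (resolved
    position) or a set of candidate letters (ambiguous position).
    Returns (sequence, weight) pairs whose weights sum to 1.
    """
    per_position = []
    for i, site in enumerate(sample):
        candidates = [site] if isinstance(site, str) else sorted(site)
        # Assumption: add-one smoothing so unseen candidates keep nonzero weight.
        counts = {aa: freqs[i][aa] + 1 for aa in candidates}
        total = sum(counts.values())
        # Probability of each candidate, conditional on the candidate set.
        per_position.append([(aa, counts[aa] / total) for aa in candidates])

    results = []
    for combo in product(*per_position):
        seq = "".join(aa for aa, _ in combo)
        weight = 1.0
        for _, p in combo:
            # Assumption: positions are independent, so probabilities multiply.
            weight *= p
        results.append((seq, weight))
    return results


complete = ["AK", "AK", "AR", "GK"]
freqs = position_frequencies(complete)
expanded = expand_incomplete(["A", {"K", "R"}], freqs)
```

Each expanded sequence could then enter a downstream model as a separate row, with its weight used as a sample weight.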
Fig 2.
Normalized mutual information on co-occurrence of amino acids between any two positions.
(A) Gray-scale image matrix of normalized mutual information (NMI). (B) Statistical significance of NMI determined by permutation test at P < 0.05 (black).
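For readers unfamiliar with the quantities in this figure, the sketch below computes NMI between the amino acids observed at two positions and a permutation-test P value. The caption does not state the normalization, so dividing by the mean of the two marginal entropies is an assumption here, as are the function names and the number of permutations.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(x, y):
    """Mutual information of x and y, normalized to the range [0, 1].

    Assumption: normalized by the mean of the marginal entropies.
    """
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))  # joint entropy
    mi = hx + hy - hxy
    denom = (hx + hy) / 2.0
    return mi / denom if denom > 0 else 0.0


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Fraction of label shufflings whose NMI reaches the observed NMI."""
    rng = random.Random(seed)
    observed = nmi(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any association between x and y
        if nmi(x, y_perm) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


pos_i = list("A" * 10 + "B" * 10)
pos_j = list("C" * 10 + "D" * 10)
p = permutation_p_value(pos_i, pos_j)
```

A pair of positions would be marked significant, as in panel (B), when `p` falls below 0.05.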
Table 1.
Numbers of sequencing samples listed in the database for each HIV-1 protease inhibitor.
Fig 3.
Selected grid points of steric (left) and electrostatic (right) molecular field parameters in the analysis of drug resistance for each HIV-1 protease inhibitor.
The grid points were selected by preprocessing and SVR feature selection.
Table 2.
Weighted determination coefficients (R²) for prediction on the external test dataset.
Table 3.
Goodness of classification by the LGBM model.
Fig 4.
Contour map of steric effects in drug resistance acquisition.
Contours were generated based on PLS standardized partial regression coefficients. Yellow and green contours indicate 1st and 99th percentiles of standardized partial regression coefficients, respectively. Steric interaction of the protease with yellow regions negatively affects drug resistance acquisition, whereas green regions show a positive effect. Dimerized wild-type HIV-1 protease (gray ribbon) and lopinavir (wireframe) are shown in the same figure. Pink indicates amino acids involved in drug resistance.
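The percentile thresholds used for the contours can be illustrated with a short sketch. It assumes the standardized partial regression coefficients are available as a flat list with one value per grid point, and it uses linear interpolation between ranks; the function names are hypothetical and the authors' contouring software is not reproduced here.

```python
def percentile(values, q):
    """Percentile with linear interpolation between neighboring ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def classify_grid_points(coefs, lower=1.0, upper=99.0):
    """Label each grid point by the contour region it falls in.

    "yellow": at or below the lower percentile (negative effect)
    "green":  at or above the upper percentile (positive effect)
    "none":   everything in between
    """
    low = percentile(coefs, lower)
    high = percentile(coefs, upper)
    labels = []
    for c in coefs:
        if c <= low:
            labels.append("yellow")
        elif c >= high:
            labels.append("green")
        else:
            labels.append("none")
    return labels


# Illustrative coefficients only, not data from the study.
coefs = [i / 100.0 for i in range(-100, 101)]
labels = classify_grid_points(coefs)
```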