Fig 1.
Overall framework design.
Fig 2.
Example of a code clone instance using Siamese [49].
Table 1.
Top 10 most popular Java artefacts on Maven with their computed reuse values.
Fig 3.
Code snippet from “Netty/Buffer” Maven artefact.
Fig 4.
Code snippet from “Apache Dubbo” GitHub project.
Table 2.
Overview of software metrics for software reuse evaluation.
Fig 5.
Gamma distribution of reuse.
Table 3.
Top 5 features most correlated with reuse.
Table 4.
Classification results based on the feature selection method (F1-score).
Fig 6.
RR performance when the KBF, PCA and RFI feature selection methods and intensities are used (from left to right, respectively).
Fig 7.
Performance of the ensemble methods when the PCA feature selection method and intensities are used.
Top left: RF. Top right: ADA. Bottom left: XG. Bottom right: GB.
Table 5.
Regression results based on the feature selection method (R-squared).
Table 6.
Regression results based on the feature selection method (RMSE).
Table 7.
Features with the top importance score.
Fig 8.
High reuse distribution of the file-level metric PUA aggregated by sum.
Left: Full distribution. Right: Distribution less than 1000.
Fig 9.
High reuse distribution of the number of files in an artefact.
Left: Full distribution. Right: Distribution less than 1000.
Fig 10.
High reuse distribution of the class-level metric NII aggregated by max.
Left: Full distribution. Right: Distribution less than 500.
Fig 11.
High reuse distribution of the class-level metric NL aggregated by sum.
Left: Full distribution. Right: Distribution less than 1000.
Fig 12.
High reuse distribution of the class-level metric CBO aggregated by standard deviation.
Fig 13.
Software characteristics distribution based on important features.