Predicting software reuse using machine learning techniques—A case study on open-source Java software systems

doi:10.1371/journal.pone.0314512

Fig 1.

Overall framework design.

Fig 2.

Example of a code clone instance detected using Siamese [49].

Table 1.

Top 10 most popular Java artefacts on Maven and their computed reuse values.

Fig 3.

Code snippet from “Netty/Buffer” Maven artefact.

Fig 4.

Code snippet from “Apache Dubbo” GitHub project.

Table 2.

Overview of software metrics for software reuse evaluation.

Fig 5.

Gamma distribution of reuse.
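
The caption does not say how the gamma distribution was fitted to the reuse values. As a minimal sketch, assuming a method-of-moments fit (a stand-in, not necessarily the authors' procedure), the shape k and scale θ can be estimated from the sample mean and variance as k = mean²/variance and θ = variance/mean. The function name `fit_gamma_moments` and the sample values are hypothetical.

```python
from statistics import mean, pvariance


def fit_gamma_moments(values):
    """Method-of-moments estimate of gamma shape k and scale theta.

    Assumes `values` are strictly positive reuse values (hypothetical input);
    this is an illustrative stand-in, not the paper's fitting procedure.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    if any(v <= 0 for v in values):
        raise ValueError("gamma support is strictly positive")
    m = mean(values)
    var = pvariance(values, m)  # population variance around the sample mean
    if var == 0:
        raise ValueError("zero variance: gamma fit undefined")
    shape = m * m / var  # k = mean^2 / variance
    scale = var / m      # theta = variance / mean
    return shape, scale


# Hypothetical reuse values, for illustration only.
reuse = [1, 2, 2, 3, 5, 8, 13, 40]
k, theta = fit_gamma_moments(reuse)
print(f"shape={k:.3f}, scale={theta:.3f}")
```

A maximum-likelihood fit would usually be preferred for heavily skewed data; the moment estimates are shown here only because they are easy to check by hand.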

Table 3.

Top 5 correlated features based on reuse.

Table 4.

Classification results based on the feature selection method (F1-score).

Fig 6.

RR performance using the KBF, PCA, and RFI feature selection methods (left to right) at varying intensities.

Fig 7.

Performance of the ensemble methods using the PCA feature selection method at varying intensities.

Top left: RF. Top right: ADA. Bottom left: XG. Bottom right: GB.

Table 5.

Regression results based on the feature selection method (R-squared).

Table 6.

Regression results based on the feature selection method (RMSE).
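
Tables 5 and 6 report R-squared and RMSE. For reference, a minimal sketch of both standard metrics in plain Python follows; the function names and example values are illustrative assumptions, not the authors' evaluation code.

```python
import math


def _check(y_true, y_pred):
    # Both sequences must be non-empty and aligned element by element.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    _check(y_true, y_pred)
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq_err / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    m = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("constant target: R-squared undefined")
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted reuse values.
observed = [3, 5, 7, 9]
predicted = [2.5, 5.0, 7.5, 9.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

Lower RMSE and higher R-squared both indicate a better fit; RMSE is in the units of the target, while R-squared is unitless.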

Table 7.

Features with the top importance score.

Fig 8.

High reuse distribution of the file-level metric PUA aggregated by sum.

Left: Full distribution. Right: Distribution of values below 1000.

Fig 9.

High reuse distribution of the number of files in an artefact.

Left: Full distribution. Right: Distribution of values below 1000.

Fig 10.

High reuse distribution of the class-level metric NII aggregated by max.

Left: Full distribution. Right: Distribution of values below 500.

Fig 11.

High reuse distribution of the class-level metric NL aggregated by sum.

Left: Full distribution. Right: Distribution of values below 1000.

Fig 12.

High reuse distribution of the class-level metric CBO aggregated by standard deviation.
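
Figs 8 to 12 describe file- and class-level metrics (PUA, NII, NL, CBO) aggregated to the artefact level by sum, max, or standard deviation. The sketch below assumes each artefact is represented as a list of per-class metric dictionaries and uses the population standard deviation, since the captions do not say which variant was used; the function name `aggregate_metrics` and the `METRIC_agg` feature naming are hypothetical.

```python
from statistics import pstdev

# Aggregation functions named in the figure captions (sum, max, std).
AGGREGATORS = {
    "sum": sum,
    "max": max,
    "std": pstdev,  # population std: an assumption, the source does not specify
}


def aggregate_metrics(class_metrics, aggregations=("sum", "max", "std")):
    """Collapse per-class metric dicts into one artefact-level feature dict.

    class_metrics: list of dicts, one per class, e.g. {"CBO": 3, "NL": 2}.
    Returns features keyed as "<METRIC>_<agg>" (hypothetical naming).
    """
    if not class_metrics:
        return {}
    names = sorted({name for row in class_metrics for name in row})
    features = {}
    for name in names:
        # Only classes that report this metric contribute to its aggregates.
        values = [row[name] for row in class_metrics if name in row]
        for agg in aggregations:
            features[f"{name}_{agg}"] = AGGREGATORS[agg](values)
    return features


# Hypothetical per-class metrics for a single artefact.
artefact = [{"CBO": 2, "NL": 1}, {"CBO": 4, "NL": 3}, {"CBO": 6, "NL": 0}]
print(aggregate_metrics(artefact))
```

Each metric thus yields several artefact-level features, one per aggregation, which is consistent with the captions naming both a metric and its aggregation.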

Fig 13.

Distribution of software characteristics based on the most important features.
