A novel code representation for detecting Java code clones using high-level and abstract compiled code representations

doi:10.1371/journal.pone.0302333

Fig 1.

Example of AST extracted from f() function.

Fig 2.

Two functions for summation using integer and float data types.

Fig 3.

Two functions for summation using integer and float data types.

Fig 4.

Baf IR for the Integer_Sum() and Float_Sum() functions.

Fig 5.

Code fragment validate() and its optimized Jimple IR.

Fig 6.

Code fragments with different loop statements and their Jimple IR.

Fig 7.

Jimple block program dependence graph example.

Table 1.

Code representation techniques used in the literature.

Fig 8.

Architecture and workflow of the proposed approach.

Fig 9.

The process of extracting syntactic and semantic features.

Table 2.

Selected AST non-terminal nodes.

Table 3.

Selected Baf instructions.

Table 4.

Selected Jimple and Block PDG features.

Table 5.

Dataset details.

Fig 10.

Performance comparison of fifteen classifiers utilizing various feature combinations on the BigCloneBench dataset.

Fig 11.

Performance of top 5 classifiers with linearly combined features on four BigCloneBench datasets.

Fig 12.

Performance of top 5 classifiers with multiplicative combined features on four BigCloneBench datasets.

Fig 13.

Performance of top 5 classifiers with distance combined features on four BigCloneBench datasets.

Fig 14.

Performance of XGBoost, LightGBM, Random Forest, and Rotation Forest classifiers with different feature sizes using the distance combination approach on the BigCloneBench dataset.

Fig 15.

Performance of XGBoost, LightGBM, Random Forest, and Rotation Forest classifiers with different feature sizes using the multiplicative combination approach on the BigCloneBench dataset.

Fig 16.

Performance of XGBoost, LightGBM, Random Forest, and Rotation Forest classifiers with different feature sizes using the linear combination approach on the BigCloneBench dataset.

Table 6.

Results of detected semantic clones using the proposed technique.

Fig 17.

Recall values of the proposed method compared against selected methods with respect to different clone types.

Fig 18.

F1-score values of the proposed method compared against selected methods with respect to different clone types.
