Table 1.

Descriptions of the features used in our study of how page language affects the spam detection rate.

Note that the numbers are per page.

Fig 1.

Cumulative distribution function (CDF) for different features in both data sets.

Table 2.

Results obtained after applying the feature selection algorithms to both data sets.

Table 3.

Performance of the decision tree classifier using different sets of features (where S = spam and NS = non-spam).

Table 4.

Confusion matrix obtained by the decision tree classifier using different sets of features (where S = spam, NS = non-spam).

Table 5.

Performance measurement indices.

Fig 2.

Percentages of the collected URLs in each Google Trends category.

Fig 3.

Process flow employed for collecting and building our web spam corpus.

Fig 4.

Distribution of the URL categories in the data set.

Fig 5.

Distribution of positive and negative URLs for different manually labeled categories.

Fig 6.

System sequence diagram.

Table 6.

Descriptions of the detection features.

Fig 7.

Cumulative distribution function (CDF) for features 2, 5, 6, and 7 in the spam, borderline, and non-spam categories.

Fig 8.

Probability density function (PDF) for features 2, 5, 6, and 7 in the spam, borderline, and non-spam categories.

Fig 9.

Distributions of features 1, 3, and 4 in the spam, borderline, and non-spam categories.

Fig 10.

Probability density functions Pn and Ps for different combinations of features, where n denotes non-spam pages (in red) and s denotes spam pages (in green).

Table 7.

Parameters used in the decision tree, Bayesian network, support vector machine (SVM), and multilayer neural network methods (see Part II of the WEKA Manual for descriptions of the various algorithms used in our study [40]).

Table 8.

Classification accuracy for three classes.

Table 9.

Classification accuracy for two classes.

Table 10.

Confusion matrix for three-class classifiers.

Table 11.

Confusion matrix for two-class classifiers.
