
A novel machine learning model for perimeter intrusion detection using intrusion image dataset

  • Shahneela Pitafi ,

    Roles Data curation, Formal analysis, Investigation, Methodology, Writing – original draft

    shahneela_22000124@utp.edu.my

    Affiliation Computer & Information Sciences Department (CISD), Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak, Malaysia

  • Toni Anwar,

    Roles Funding acquisition, Project administration

    Affiliation Computer & Information Sciences Department (CISD), Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak, Malaysia

  • I Dewa Made Widia,

    Roles Resources

    Affiliation Faculty of Vocational Studies, Universitas Brawijaya, Malang, East Java, Indonesia

  • Zubair Sharif,

    Roles Validation, Visualization, Writing – review & editing

    Affiliation Computer & Information Sciences Department (CISD), Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak, Malaysia

  • Boonsit Yimwadsana

    Roles Visualization

    Affiliation Computer Science Academic Group, Faculty of Information and Communication Technology, Mahidol University, Salaya, Nakhon Pathom, Thailand

Abstract

Perimeter Intrusion Detection Systems (PIDS) are crucial for protecting physical locations by detecting and responding to intrusions around their perimeters. Despite the availability of several PIDS, challenges remain in detection accuracy and precise activity classification. To address these challenges, a new machine learning model is developed. This model uses the pre-trained InceptionV3 network for feature extraction on a PID intrusion image dataset, followed by t-SNE for dimensionality reduction and subsequent clustering. When handling high-dimensional data, the existing Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm faces efficiency issues due to its complexity and varying densities. To overcome these limitations, this research enhances the traditional DBSCAN algorithm: in the enhanced DBSCAN, distances between minimum points are computed with the Manhattan distance formula, and the epsilon value is obtained through estimation. The effectiveness of the proposed model is evaluated by comparing it with state-of-the-art techniques from the literature. The analysis reveals that the proposed model achieved a silhouette score of 0.86, while the comparative techniques failed to produce similar results. This research contributes to societal security by improving perimeter protection, and future researchers can utilize the developed model for human activity recognition from image datasets.

1 Introduction

The Internet of Things (IoT) has permeated everyday life, contributing to social sensing and information collection in a variety of fields such as smart transportation, intelligent cities, healthcare, and smart homes. Numerous smart-sensor nodes are installed within specified domains to manage the sensing, processing, and transmission of diverse data types, including environmental characteristics, traffic density, human activity, and health data [1]. Fig 1 depicts a high-level overview of IoT applications. In addition, IoT is used in intrusion detection systems such as perimeter intrusion detection, network intrusion detection, and cloud intrusion detection [2].

PIDS are essential for protecting vital organizations by preventing unauthorized entry and potential security breaches. These systems are specifically engineered to identify and address intrusions occurring along the perimeter of a facility, acting as the first line of defense against external threats. Nevertheless, with the evolving complexity and diversity of security threats, conventional PIDS approaches frequently encounter challenges, leading to false alarms, missed detections, and compromised security [3, 4]. Vision-based technologies have seen substantial advancements over the previous two decades [5].

PID is the task of identifying the presence of an unauthorized object within a secured outdoor area during specific time intervals [6–8]. To maintain the security of an outdoor space, cameras continuously record footage of the environment. The outdoor setting is an essential part of the problem because of challenges such as changing weather conditions, variations in light, and the presence of insects and animals, unlike indoor environments [9, 10]. The user defines the protected zone at the location, the potential intrusion elements, and the specific times when the system should offer protection, for example exclusively during nighttime.

The widespread deployment of cameras in critical settings such as banks, grocery stores, and prominent sidewalks has significantly contributed to the advancement and assessment of these systems. Among the noteworthy and relevant applications for intelligent vision systems is visual monitoring [11]. This encompasses various initiatives, including object identification, object monitoring, and the identification of unusual behaviors, all achievable through visual inspection of a scene [12].

The task of intrusion detection is inextricably linked to other aspects of surveillance, and numerous modern solutions target one of these supplementary tasks within the monitoring system. However, only a few solutions address PIDS in depth [13, 14]. Given that missing an intrusion is regarded as a serious failure for a PIDS [14], current techniques prioritize boosting detection rates even at the risk of additional false alarms.

To determine the success of the suggested technique, extensive testing and comparative analysis were conducted. The conclusions of this investigation are presented, emphasizing the effectiveness of the machine learning-driven technique. The results of the proposed technique and the comparative techniques are evaluated using the Adjusted Rand Index (ARI), the Fowlkes-Mallows Index (FMI), and the silhouette score, which are evaluation metrics for clustering approaches; details are presented in the methodology section. Afterwards, the k-NN classification algorithm is applied to the clustered data and labeled images to classify the various intrusion activities. Finally, the approach's capacity to identify and distinguish intrusions is demonstrated by evaluation measures such as classification accuracy, precision, recall, and F1 score. As a result, this work contributes to improved security of major infrastructure. Moreover, this study considers only six types of intrusion activities, with an additional class considered as non-intrusion, resulting in a total of seven classes.

The remainder of the document is structured as follows: a selection of well-known and closely related work is covered in Section 2. In Section 3, the proposed novel PIDS model is presented in depth with all essential explanations. In Section 4, the experimental results are discussed and the presented work is validated. Section 5 concludes the research.

2 Related work

A study conducted in [15] exclusively delved into advanced signal processing, proposing an algorithm for event categorization. This algorithm analyzed both static and dynamic signals related to vibrations in a perimeter security system. Utilizing wavelet packet decomposition and constructing a multiclass classification tree with support vector machines, it achieved a commendable 94.6 percent recognition rate for vibration signals across nine distinct events.

In a related study [8], authors demonstrated the usage of autoencoders. However, no approach was provided for selecting the threshold. To ensure IoT network safety, Intrusion Detection Systems (IDSs) play a crucial role in identifying intruders. However, deploying complex IDSs in IoT devices is often impractical due to resource constraints. IDSs encompass signature-based and anomaly-based types, with the former recognizing threats based on predefined patterns, and the latter learning typical system behavior to detect anomalies.

Martin Ester and colleagues proposed the traditional Density-Based Spatial Clustering of Applications with Noise (T-DBSCAN) [22], which depends on neighborhood parameters. Recent enhancements have focused on clustering analysis, as seen in [23], where a modified DBSCAN was developed. Additionally, the authors in [24, 25] proposed a modified DBSCAN method for gene expression data, and researchers in [26] suggested an adaptive DBSCAN method for constellation reconstruction and modulation detection. A binary differential evolution method was proposed by [27] as a means of improving the DBSCAN parameter (Eps) and determining the Eps interval. To tackle the problem of tuning two crucial DBSCAN parameters, the authors of [28] proposed a method that combines Particle Swarm Optimization (PSO) with DBSCAN. A hierarchical clustering technique was presented by [18] to provide a range of values and improve the Eps parameter. To speed up the overall procedure, researchers in [29] developed a model for the automated selection of Eps and MinPts using the k-distance graph approach. In addition, the OPTICS approach was presented by [30]; it locates clusters of different densities by keeping track of the reachability and core distances of every point. A point is a core point if it has at least MinPts points in its Eps-neighborhood, and its core distance is its distance to its MinPts-th nearest neighbor. The reachability distance of a point with respect to a core point p is the larger of the distance between the two points and the core distance of p. The research did identify a drawback, however, in automatically and efficiently obtaining both intrinsic and conventional clustering information [30]. To overcome the difficulty of finding clusters in high-dimensional data, [31] suggested a method that can handle a variety of sizes, shapes, and densities. The method examines each data point's nearest neighbors and forms clusters around points based on shared neighbors. The work in [31] also introduced a revised version of DBSCAN, called EPDCA, to detect clusters of varied forms and sizes, demonstrating improved performance on benchmark datasets.

The approximate adaptive density-based spatial clustering of applications with noise (AA-DBSCAN) was proposed by [32], with an emphasis on minimizing the computation required for parameter estimation, using the e-distance for finding clusters. GCMDDBSCAN was developed by [33] to improve clustering performance on big datasets. A multi-agent-system-based technique called FlockStream was introduced by [34]; it performed well on both synthetic and real datasets. The D-Stream model was established by [35] as a density-based method for grouping data in real time. Labeled data or pairwise constraints are used by semi-supervised clustering algorithms, such as multi-density-based spatial clustering of applications with noise (MDBSCAN) by [36], density-based clustering with constraints (C-DBSCAN) by [37], and semi-supervised density-based clustering (SSDBSCAN) by [38], to increase clustering efficiency. However, multi-density data and clusters with different pairwise constraints pose difficulties for these techniques. The MPCKmeans method by [39], a well-known semi-supervised variant of K-means, is restricted in its ability to handle clusters of varying densities and sizes, but it is successful at tackling massive datasets. To overcome the density variation problem of DBSCAN, [40] developed the VDBSCAN algorithm, which is intended for the analysis of datasets with different densities. The basic idea behind VDBSCAN is to use pre-processing to obtain several values of the Eps parameter from a k-dist plot and then apply the standard DBSCAN algorithm. Through the use of several Eps values, the method finds clusters of multiple densities at the same time. The process involves five stages: (1) finding, storing, and splitting k-dist plots for each object; (2) counting the number of densities from the k-dist plot; (3) choosing the Eps parameters automatically for each density; (4) scanning the dataset and grouping different densities based on the matching Eps values; and (5) displaying the valid clusters. Further, the authors of [41] presented LDBSCAN, founded on a local-density-based notion of clusters. This gives it an edge over previous density-based clustering algorithms, since it streamlines parameter selection and uses the Local Outlier Factor (LOF) for outlier identification. In addition, a novel clustering approach was presented that overcomes the restrictions of DBSCAN and achieves a significant runtime acceleration in comparison to prior improvements. The i-LIDS user guide's assessment technique has limits: alerts that are more than 10 seconds old are penalized without taking the duration of the incursion into account [42].

This thorough review emphasizes the many strategies and difficulties associated with intrusion detection and clustering methods, highlighting the dynamic security environment of the Internet of Things and its associated applications. A further comparison is presented in Table 1.

Table 1. Comparison of various density-based algorithms.

https://doi.org/10.1371/journal.pone.0313890.t001

3 Proposed methodology

This section presents the methodology of the novel PIDS model for perimeter intrusion detection, in which an enhanced DBSCAN algorithm is used. With the proposed method, the existing issues in PID can be resolved and detection accuracy is increased, along with the classification and identification of various intrusion activities. The steps of the proposed model, named the novel PID model using an image dataset, are illustrated in Fig 2. The PID image-based dataset is used for the evaluation of the proposed technique. The methodology of this study is divided into two phases: a clustering phase and a classification phase. The outcome of phase one is the input to the second phase; details of the methodology are given in the following subsections.

3.1 Pre-processing

Pre-processing involves removing duplicate images. During this phase, a meticulous step was taken to enhance the quality of the dataset: duplicate images were systematically identified and removed using their unique hash values. This procedure ensured that the dataset maintained a high level of integrity and eliminated redundancy. By filtering out duplicate instances based on their hash signatures, we aimed to enhance the overall robustness and efficiency of the dataset, laying the groundwork for more accurate and reliable analyses in subsequent stages of the research.
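The hashing step can be sketched as follows. This is a minimal illustration only: the paper states that duplicates were identified by their unique hash values, but it does not name the hash function or the file layout, so the MD5 digest over raw file bytes and the flat directory of JPEG files below are assumptions.

import hashlib
from pathlib import Path

def remove_duplicate_images(image_dir: str) -> int:
    """Delete images whose byte content hashes to an already-seen value."""
    seen_hashes = set()
    removed = 0
    for path in sorted(Path(image_dir).glob("*.jpg")):   # assumed JPEG files in one folder
        digest = hashlib.md5(path.read_bytes()).hexdigest()   # hash of the raw file bytes
        if digest in seen_hashes:
            path.unlink()        # exact duplicate: remove the file
            removed += 1
        else:
            seen_hashes.add(digest)
    return removed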

Data augmentation is a technique to increase the quantity and quality of a dataset [43]. Additionally, to improve the generalization ability of a deep learning-based image classification model, data augmentation can be applied to both the training and validation sets [44]. A similar approach is followed in this study, and data augmentation has been applied to the whole dataset. Sample data are presented in Fig 3.

Data augmentation methods, such as geometric transformations, kernel filters, mixing images, random erasing, and color space transformations [43], are evaluated for their suitability for intrusion scene augmentation. Details are provided in Table 2.
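As an illustration of this stage, the snippet below sketches a purely geometric augmentation pipeline with TensorFlow/Keras preprocessing layers. The specific operations and factor values are assumptions for illustration; they are not the exact choices reported in Table 2.

import tensorflow as tf

# Hypothetical augmentation pipeline restricted to geometric transformations;
# factors are illustrative, not taken from the paper.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),           # up to ~36 degrees either way
    tf.keras.layers.RandomTranslation(0.1, 0.1),   # shift up to 10% of height/width
    tf.keras.layers.RandomZoom(0.1),
])

def augment_batch(images):
    """images: float tensor of shape (batch, height, width, 3) scaled to [0, 1]."""
    return augment(images, training=True)   # training=True enables the random ops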

After the augmentation techniques were applied, the dataset increased to 7,962 images in total, comprising the seven classes discussed in Table 3.

3.2 Feature extraction of images

After data augmentation, the feature extraction phase was executed using the pre-trained Google InceptionV3 model. This transfer learning approach allowed us to leverage the well-established features learned by InceptionV3 on a large dataset. By extracting these high-level features from augmented images, we aimed to capture intricate patterns and representations essential for effective intrusion detection. The utilization of pre-trained models like InceptionV3 enhances the efficiency of our model training process, empowering it with a rich set of discriminative features for improved performance in the classification task.

In an Inception network such as InceptionV3, the features for t-SNE (t-distributed Stochastic Neighbor Embedding) are extracted from the penultimate layer, i.e. the layer just before the final classification layer. This layer provides high-level features that have been processed by the network but are not yet reduced to the final class probabilities. The penultimate layer contains a rich representation of the image, capturing high-level abstract features. These features are useful for tasks like clustering or dimensionality reduction, as they represent the most informative parts of the input data.
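A minimal sketch of this feature-extraction step is given below. It assumes a TensorFlow/Keras implementation with 299×299 RGB inputs (the framework and input size are not stated in the paper); dropping the classification head and applying global average pooling yields the 2048-dimensional penultimate-layer representation described above.

import numpy as np
import tensorflow as tf

# Pre-trained InceptionV3 without its classification head; global average pooling
# over the last convolutional block gives one 2048-d vector per image.
backbone = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg", input_shape=(299, 299, 3)
)

def extract_features(images: np.ndarray) -> np.ndarray:
    """images: array of shape (n, 299, 299, 3) in RGB order."""
    x = tf.keras.applications.inception_v3.preprocess_input(images.astype("float32"))
    return backbone.predict(x, verbose=0)   # shape (n, 2048)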

CNNs utilize a series of convolutional layers that apply convolution operations to the input images. The convolution operation involves sliding a small filter or kernel across the image and computing the dot product between the filter and the local receptive field of the image. This process extracts local patterns and features. The resulting feature maps are then passed through non-linear activation functions such as ReLU (Rectified Linear Unit) to introduce non-linearity. Mathematically, the convolution operation can be expressed as in Eq (1):

FeatureMap(i, j) = σ( Σm Σn Image(i + m, j + n) · Filter(m, n) + b )   (1)

Here, i and j denote the spatial coordinates of the feature map, σ is the activation function, Image(i + m, j + n) represents the pixel values in the local receptive field of the image, Filter(m, n) represents the values of the convolution filter, and b is the bias term.
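As a toy illustration of Eq (1), the snippet below evaluates the convolution at a single output position using a made-up 2×2 kernel and a ReLU activation; the values are arbitrary and serve only to show the receptive-field sum plus bias.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_at(image, filt, i, j, b=0.0):
    """Eq (1) at output position (i, j): activation of the element-wise product
    between the filter and the local receptive field, summed, plus a bias."""
    m, n = filt.shape
    receptive_field = image[i:i + m, j:j + n]
    return relu(np.sum(receptive_field * filt) + b)

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
filt = np.array([[1.0, 0.0], [0.0, -1.0]])         # toy 2x2 kernel
print(conv_at(image, filt, 0, 0))                  # relu(0*1 + 1*0 + 4*0 + 5*(-1)) = 0.0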

InceptionV3 stands out for its efficiency, multi-scale feature extraction, and strong performance on large image datasets. Its ability to balance high accuracy with lower computational requirements makes it a solid choice for a wide range of image classification tasks compared with other CNN models.

3.3 Dimensionality reduction with t-SNE for clustering

Following feature extraction with the InceptionV3 model, dimensionality reduction using t-SNE was executed for clustering. t-SNE effectively condensed the high-dimensional feature space, preserving data structure for insightful clustering analysis. This approach enhances interpretability, offering valuable insights into inherent patterns within the dataset.

The cost function for t-SNE can be mathematically represented as in Eq (2) [45]:

C = KL(P || Q) = Σi Σj pij log(pij / qij)   (2)

where:

  • pij represents the similarity between data point i and data point j in the high-dimensional space,
  • qij represents the similarity between the corresponding data points in the low-dimensional space,
  • Pi and Qi are the probability distributions in the high-dimensional and low-dimensional spaces, respectively.

The t-SNE algorithm iteratively minimizes this cost function to find an optimal low-dimensional representation of the data that preserves local structure and relationships from the high-dimensional space.
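A minimal sketch of this reduction step with scikit-learn is shown below; the perplexity, initialization, and random seed are illustrative values, since the paper does not report its t-SNE settings.

from sklearn.manifold import TSNE

def reduce_with_tsne(features, n_components=2, perplexity=30.0, seed=42):
    """Map the high-dimensional InceptionV3 features to a low-dimensional embedding."""
    tsne = TSNE(n_components=n_components, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(features)   # shape (n_samples, n_components)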

3.4 Proposed enhanced DBSCAN

DBSCAN is a clustering algorithm that relies on the proximity of data points in the feature space; it groups data points based on their density, effectively discerning dense and less crowded regions to capture the inherent data structure. After extraction and dimensionality reduction, the features are treated as data points, and DBSCAN computes their density, assigning clusters using user-defined thresholds for the minimum density and distance parameters. In the traditional DBSCAN method, users set the values of epsilon and Nmin. To automate this and eliminate human intervention, we introduce an estimation for the parameter epsilon in Algorithm 1, utilized in our proposed enhanced DBSCAN method as depicted in Fig 4. This clustering procedure aids in differentiating real intrusions from non-intrusive scenarios, enabling researchers to identify meaningful clusters representing distinct patterns in the multi-dimensional data. The method contributes significantly to improving intrusion detection accuracy by effectively distinguishing between various intrusion types and non-intrusive scenarios, as outlined in the results section. Point distances are measured with the Manhattan distance given in Eq (3):

d(Xa, Xb) = Σ(i=1..d) |xai - xbi|   (3)

Algorithm 1 Epsilon parameter estimation ε

1: Input: dataset A = {a1, a2, …, an}

2: Variables: MinPts   ⊳ MinPts are calculated by Eq (3)

3: Calculate distance between data points

4: for P in range [2, n] do   ⊳ data items

5:  Calculate dp using Eq (3)   ⊳ For each detected object/feature point P in the image,

6:  Append (P, dp) to dp values

7:  Polynomial fitting for discrete function dp   ⊳ To obtain continuous function coefficients

8:  if S2 < 0.99 then   ⊳ If S2 (fitting accuracy) is less than 0.99

9:   P = P + 1 and go to line 3   ⊳ Increase P and repeat from the polynomial fitting step

10:  else

11:   Adding correction   ⊳ Compute the correction factor based on the maximum value of DP

12:  end if

13:  Calculate the derivative   ⊳ Derive the first derivative of the corrected DP

14:  if p0 > P then   ⊳ Solve for p0 if p0 > P

15:   P = P + 1 and go to line 3   ⊳ Increase P and repeat

16:  else

17:   Calculate the estimated radius ε   ⊳ Compute the estimated radius ε

18:  end if

19: end for

The Manhattan distance, also known as the "city block distance," computes the total distance between two points Xa and Xb in d-dimensional space as the sum of the absolute differences over all of their features, as defined in Eq (3).
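The sketch below is a simplified stand-in for the enhanced DBSCAN: it replaces the polynomial-fitting procedure of Algorithm 1 with a k-distance heuristic for estimating epsilon and then runs DBSCAN with the Manhattan metric. The MinPts rule and the percentile used as the knee are assumptions, not values taken from the paper.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def estimate_eps(points, min_pts):
    """Crude stand-in for Algorithm 1: use the knee of the sorted Manhattan
    k-distance curve (k = MinPts) as the estimated radius epsilon."""
    nn = NearestNeighbors(n_neighbors=min_pts, metric="manhattan").fit(points)
    dists, _ = nn.kneighbors(points)
    k_dist = np.sort(dists[:, -1])            # farthest of the MinPts neighbors per point
    return float(np.percentile(k_dist, 90))   # heuristic knee choice (assumed)

def enhanced_dbscan(points, min_pts=None):
    if min_pts is None:
        min_pts = max(4, int(np.log(len(points))))   # assumed MinPts rule, not from the paper
    eps = estimate_eps(points, min_pts)
    labels = DBSCAN(eps=eps, min_samples=min_pts, metric="manhattan").fit_predict(points)
    return labels, eps   # labels of -1 mark noise points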

3.5 Preparing the data for classification with K-NN

Fig 5 presents the complete flow chart of the proposed model, and Algorithm 2 introduces the proposed DBSCAN-based model for PIDS, enhancing perimeter intrusion detection through advanced data processing and machine learning techniques. It starts by loading the PID image dataset, specialized for perimeter intrusion scenarios. Feature extraction is performed using the InceptionV3 model, translating raw data into a structured feature space. To manage complexity, t-SNE is applied for dimensionality reduction. The algorithm then uses the enhanced DBSCAN (Algorithm 1) for density-based spatial clustering, which effectively identifies unusual activities or security breaches without needing a predefined number of clusters. Clustering results are evaluated using the metrics ARI, FMI, and silhouette score, as detailed in Eqs (4)-(7). After clustering, the algorithm proceeds to classification using the k-Nearest Neighbors (k-NN) approach. The cluster outcomes are treated as labels for training the k-NN model. The dataset is split into 70% for training and 30% for testing.

Adjusted Rand Index (ARI).

ARI measures the similarity between two clusterings by adjusting for the chance grouping of elements. It considers all pairs of samples and checks whether they fall in the same or in different clusters in the predicted and the true clusterings, as mathematically presented in Eqs (4) and (5):

ARI = (RI - E[RI]) / (max(RI) - E[RI])   (4)

where RI (Rand Index) is given by:

RI = (a + b) / C(n, 2)   (5)

Algorithm 2 Novel enhanced DBSCAN based model for PIDS

1: Input: load the PID image dataset

2: Applying inception v3 for feature extraction

3: Performing dimensionality reduction using t-SNE

4: Applying enhanced DBSCAN see Algorithm 1

5: Outcome of enhanced DBSCAN

6: K-NN classification phase

7: Input: load original dataset with clusters

8: X = data.data

9: Y = data.target

10: Split the dataset into training and testing sets.

11: • 70% of data goes to training set

12: • 30% of data goes to testing set

13: for each instance in the test data do

14:  Calculate the distance from each instance in the train data

15:  for i, train_instance in enumerate(X_train) do

16:   dist = Euclidean(test_instance, train_instance)

17:   distances.append((dist, i))

18:  end for

19:  distances.sort(key = lambda x: x[0])   ⊳ Sort distances and take the k nearest neighbors

20:  Find the majority class among the k nearest neighbors

21:  Return the predicted class

22: end for

23: Measure the performance by using validation metrics from equation

24: End

Here, a is the number of pairs of elements that are in the same cluster in both the predicted and the true clusterings, b is the number of pairs that are in different clusters in both, and C(n, 2) is the total number of pairs.

Fowlkes-Mallows Index (FMI).

FMI is a measure of the similarity between the true cluster assignments and the predicted clusters. It is particularly useful for evaluating the precision and recall of clustering results, which are critical aspects of clustering algorithms:

FMI = TP / √((TP + FP)(TP + FN))   (6)

where TP (True Positives) is the number of pairs correctly clustered together, FP (False Positives) is the number of pairs incorrectly clustered together, and FN (False Negatives) is the number of pairs incorrectly not clustered together.

Silhouette score.

The Silhouette Score evaluates how similar a sample is to its own cluster compared with other clusters. It is a metric for assessing the quality of a clustering solution: it measures how close each point is to the points of its own cluster relative to the points of other clusters, providing insight into how well separated and how compact the clusters are:

s(i) = (b(i) - a(i)) / max(a(i), b(i))   (7)

where a(i) is the average distance from the i-th sample to all other points in the same cluster, and b(i) is the minimum average distance from the i-th sample to the points of a different cluster. The Silhouette Score is the mean of s(i) over all samples.
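A minimal sketch of computing Eqs (4)-(7) with scikit-learn is shown below; excluding the DBSCAN noise points (label -1) from the silhouette computation is an assumption, since the paper does not say how noise is handled in the score.

import numpy as np
from sklearn.metrics import adjusted_rand_score, fowlkes_mallows_score, silhouette_score

def evaluate_clustering(embedded, labels_pred, labels_true):
    """ARI and FMI compare predicted labels with the ground truth; the silhouette
    score needs only the embedded points and the predicted labels."""
    embedded = np.asarray(embedded)
    labels_pred = np.asarray(labels_pred)
    mask = labels_pred != -1                  # drop noise points (assumed)
    return {
        "ARI": adjusted_rand_score(labels_true, labels_pred),
        "FMI": fowlkes_mallows_score(labels_true, labels_pred),
        "silhouette": silhouette_score(embedded[mask], labels_pred[mask]),
    }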

The k-NN classifier predicts the class of test instances by calculating Euclidean distances and identifying the nearest neighbors. The classification categorizes data into intrusion activities, validating the proposed algorithm.
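The classification stage can be sketched with scikit-learn as follows; the value of k and the stratified split are assumptions, since the paper reports only the 70/30 split and the use of Euclidean distance.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def knn_classify(features, cluster_labels, k=5, seed=42):
    """Train and test the k-NN stage on the cluster-derived labels (70/30 split)."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, cluster_labels, test_size=0.30,
        random_state=seed, stratify=cluster_labels)
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")   # k=5 is assumed
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    return y_pred, y_test, accuracy_score(y_test, y_pred)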

As presented in Table 4, the proposed enhanced DBSCAN performs better owing to its novel enhancement, which has not been incorporated in other variants of DBSCAN. Furthermore, the study's findings indicate that the proposed technique significantly improves the detection and classification of intrusion activities. This is particularly evident in the comparative performance metrics, where the proposed model consistently outperformed the other comparative models.

Table 4. Overall comparison of proposed enhanced DBSCAN with other variants of DBSCAN.

https://doi.org/10.1371/journal.pone.0313890.t004

4 Experimental results

In this study, the methodology presented in Section 3 is followed. The features are selected using Python [46]. The original DBSCAN, K-DBSCAN, MDBSCAN, DBHC, and ADBSCAN algorithms, along with our proposed novel enhanced DBSCAN algorithm, are applied to the PID image dataset; the clustering visualizations are presented in Figs 6–10, respectively.

Fig 6 illustrates the clustering results of DBSCAN, which segmented the data into seven distinct clusters. Cluster 4, the largest, contains 1,821 points, Cluster 5 includes 1,124 points, and Cluster 7 contains 1,029 points, whereas smaller clusters such as Cluster 1 and Cluster 3 contain only 783 and 758 points, respectively. The large clusters may contain data points belonging to other intrusion activities, which compromises the silhouette score. Furthermore, DBSCAN detected 596 noise points and may have mis-clustered intrusion activities as noise, further compromising the clustering results.

Fig 7 illustrates the clustering results of K-DBSCAN, which formed seven distinct clusters. Cluster 6, the largest, contains 2,165 points and cluster 2 includes 1,799 points, whereas smaller clusters such as cluster 3 and cluster 5 contain only 368 and 38 points, respectively. Furthermore, K-DBSCAN ignores noise; because it does not detect noise values, the outliers of the dataset are included in the clusters and the clustering results are compromised.

The MDBSCAN clustering graph, shown in Fig 8, illustrates a dataset segmented into multiple clusters and noise, demonstrating diverse groupings of data points. Notably, cluster 1 contains 1,183 points, while cluster 6 has fewer data points than the ground truth. MDBSCAN labels 452 points as noise, highlighting variability and outliers.

Fig 9 presents the visualization of DBHC clustering applied to the PID image dataset; it reveals a diverse distribution of data points across several clusters, with sizes ranging from 599 to 1,224 points, and identifies 452 points as noise.

The ADBSCAN clustering output is depicted in Fig 10, where the graph reveals only six clusters; clusters such as Cluster 2 and Cluster 5 hold only 928 and 923 points, respectively.

The evaluation metrics include precision, recall, F1 score, and accuracy. Precision is the ratio of true positive predictions to the total predicted positives; it measures the accuracy of the positive predictions, as represented in Eq (8). Recall is the ratio of true positive predictions to the total actual positives; it measures the model's ability to capture all positive instances, as given in Eq (9). The F1 score is the harmonic mean of precision and recall; it provides a balance between the two, especially when there is an imbalance between classes, see Eq (10). Lastly, accuracy is the ratio of correctly predicted instances to the total instances; it provides an overall measure of model performance, as presented in Eq (11).

Precision = TP / (TP + FP)   (8)

Recall = TP / (TP + FN)   (9)

F1 = 2 × (Precision × Recall) / (Precision + Recall)   (10)

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (11)

In these equations:

True Positives (TP): Instances correctly predicted as positive.

False Positives (FP): Instances incorrectly predicted as positive.

False Negatives (FN): Instances incorrectly predicted as negative.

True Negatives (TN): Instances correctly predicted as negative.
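A brief sketch of computing Eqs (8)-(11) with scikit-learn follows; averaging the per-class scores with weighted averaging over the seven classes is an assumption, since the paper reports only overall values.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def classification_summary(y_true, y_pred):
    """Overall accuracy plus precision, recall, and F1 averaged across classes."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="weighted", zero_division=0),
    }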

The results obtained from dimensionality reduction with t-SNE are shown in Fig 11. The clustering results of the proposed technique are presented in Fig 12, applied to the dataset of 7,962 images across seven classes. The figure shows several clusters (labeled cluster 1 to cluster 7), each in a different color, indicating groups of images that are similar with respect to the analyzed features. Cluster 1 comprises 1,187 points, and the sizes of the other clusters vary, suggesting varying degrees of similarity within the dataset. Additionally, the proposed algorithm identified 85 points as noise; these are outliers that do not fit well into any cluster because of their distinct features. In these graphs each point represents an image, so the cluster sizes correspond to numbers of images, including both the identified clusters and the noise. As shown in Fig 12, the clusters are well separated and do not overlap. As presented in Table 3, there are a total of seven categories of intrusion; therefore, the seven clusters correspond to the seven types of intrusion, matching the ground truth values.

The ARI, FMI, and silhouette scores of the comparative techniques and the proposed technique are presented in Figs 13–15, respectively.

ADBSCAN does not perform well on the ARI metric compared with the other techniques; the low ARI is likely because ADBSCAN mis-clustered the data and formed only six clusters, compromising its ARI. In contrast, MDBSCAN performed better on ARI, scoring 0.79, while DBHC scored 0.75; DBSCAN and K-DBSCAN obtained ARI scores of 0.69 and 0.63, respectively. Similarly, K-DBSCAN performed the worst in terms of FMI because it formed two large clusters that mixed data points from more than one intrusion activity, giving it an FMI of 0.60, while ADBSCAN obtained an FMI of 0.63. MDBSCAN again performed better, with an FMI of 0.80, DBHC reached 0.74, DBSCAN 0.70, and the proposed technique 0.83, as presented in Fig 14. Moreover, the silhouette score of ADBSCAN is the lowest, 0.58, because it formed only six clusters, whereas DBHC performed better with 0.82. The highest silhouette score, 0.86, is achieved by our proposed technique, since its clusters are well separated and exhibit strong cohesion between data points.

Details of the clustering results are further presented in Table 5. The proposed technique produced the highest average silhouette score, 0.86; accordingly, the ARI and FMI of the model are 0.82 and 0.83, respectively. These results are significantly higher than those produced by the other techniques. These performance measures confirm that the proposed algorithm is suitable for clustering the PID image dataset.

The k-NN classification algorithm is applied to the clustered data. The proposed enhanced DBSCAN clustering model achieves an overall accuracy of 0.91, an overall precision of 0.89, an overall recall of 0.91, and an overall F1-score of 0.90, as shown in Figs 16–19.

5 Conclusion

In this research, a novel PIDS model is introduced: a machine learning framework designed for perimeter intrusion detection systems. Leveraging the pre-trained InceptionV3 model, feature extraction is performed on the PID image dataset, focusing on perimeter intrusion scenarios. Subsequently, t-SNE is employed for dimensionality reduction, followed by density-based clustering, for which a substantial enhancement of the conventional DBSCAN is introduced. This enhanced version incorporates epsilon value estimation and deploys the Manhattan distance formula, optimizing point-distance calculations for multi-class data. The outcomes are highly promising, with the novel PIDS model achieving a silhouette score of 0.86 on the challenging PID image dataset. This success not only underscores the effectiveness of the model in accurately identifying intrusion activities but also signifies a significant advancement in PID system capabilities, paving the way for further exploration and refinement in the realm of perimeter intrusion detection. This study focuses on six types of intrusion activities, with an additional class considered as non-intrusion, resulting in a total of seven classes. Future researchers may further extend these classes. Additionally, intrusion from the top of the perimeter fence is not within the scope of this research but can be addressed in future studies.

References

  1. Perera C, Zaslavsky A, Christen P, Georgakopoulos D. Context aware computing for the internet of things: A survey. IEEE Communications Surveys & Tutorials. 2013;16(1):414–454.
  2. Qiu T, Chen N, Li K, Atiquzzaman M, Zhao W. How can heterogeneous internet of things build our future: A survey. IEEE Communications Surveys & Tutorials. 2018;20(3):2011–2027.
  3. Pajouh HH, Javidan R, Khayami R, Dehghantanha A, Choo KKR. A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks. IEEE Transactions on Emerging Topics in Computing. 2016;7(2):314–323.
  4. Allwood G, Wild G, Hinckley S. Optical fiber sensors in physical intrusion detection systems: A review. IEEE Sensors Journal. 2016;16(14):5497–5509.
  5. Ibrahim SW. A comprehensive review on intelligent surveillance systems. Communications in Science and Technology. 2016;1(1).
  6. Vijverberg JA, Janssen RT, de Zwart R, de With PH. Perimeter-intrusion event classification for on-line detection using multiple instance learning solving temporal ambiguities. In: 2014 IEEE International Conference on Image Processing (ICIP). IEEE; 2014. p. 2408–2412.
  7. Zhang YL, Zhang ZQ, Xiao G, Wang RD, He X. Perimeter intrusion detection based on intelligent video analysis. In: 2015 15th International Conference on Control, Automation and Systems (ICCAS). IEEE; 2015. p. 1199–1204.
  8. Lohani D, Crispim-Junior C, Barthélemy Q, Bertrand S, Robinault L, Tougne L. Spatio-temporal convolutional autoencoders for perimeter intrusion detection. In: International Workshop on Reproducible Research in Pattern Recognition. Springer; 2021. p. 47–65.
  9. Matern D, Condurache AP, Mertins A. Automated intrusion detection for video surveillance using conditional random fields. In: MVA; 2013. p. 298–301.
  10. Villamizar M, Martínez-González A, Canévet O, Odobez JM. Watchnet: Efficient and depth-based network for people detection in video surveillance systems. In: 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE; 2018. p. 1–6.
  11. Tsakanikas V, Dagiuklas T. Video surveillance systems-current status and future trends. Computers & Electrical Engineering. 2018;70:736–753.
  12. Valera M, Velastin SA. Intelligent distributed surveillance systems: a review. IEE Proceedings-Vision, Image and Signal Processing. 2005;152(2):192–204.
  13. Kim SH, Lim SC, et al. Intelligent intrusion detection system featuring a virtual fence, active intruder detection, classification, tracking, and action recognition. Annals of Nuclear Energy. 2018;112:845–855.
  14. Cermeño E, Pérez A, Sigüenza JA. Intelligent video surveillance beyond robust background modeling. Expert Systems with Applications. 2018;91:138–149.
  15. Yan H, Shi G, Wang Q, Hao S. Identification of damaging activities for perimeter security. In: 2009 International Conference on Signal Processing Systems. IEEE; 2009. p. 162–166.
  16. Gholizadeh N, Saadatfar H, Hanafi N. K-DBSCAN: An improved DBSCAN algorithm for big data. The Journal of Supercomputing. 2021;77(6):6214–6235.
  17. Qian J, Zhou Y, Han X, Wang Y. MDBSCAN: A multi-density DBSCAN based on relative density. Neurocomputing. 2024;576:127329.
  18. Latifi-Pakdehi A, Daneshpour N. DBHC: A DBSCAN-based hierarchical clustering algorithm. Data & Knowledge Engineering. 2021;135:101922.
  19. Fahim A. Adaptive Density-Based Spatial Clustering of Applications with Noise (ADBSCAN) for clusters of different densities. Computers, Materials & Continua. 2023;75(2).
  20. Fahim A. A varied density-based clustering algorithm. Journal of Computational Science. 2023;66:101925.
  21. Kumar KM, Reddy ARM. A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups method. Pattern Recognition. 2016;58:39–48.
  22. Cao F, Ester M, Qian W, Zhou A. Density-based clustering over an evolving data stream with noise. In: Proceedings of the 2006 SIAM International Conference on Data Mining. SIAM; 2006. p. 328–339.
  23. Li B, Wang Q, Barney E, Hart L, Wall C, Chawarska K, et al. Modified DBSCAN algorithm on oculomotor fixation identification. In: Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications; 2016. p. 337–338.
  24. Edla DR, Jana PK, Member IS. A prototype-based modified DBSCAN for gene clustering. Procedia Technology. 2012;6:485–492.
  25. Pitafi S, Anwar T, Sharif Z. A taxonomy of machine learning clustering algorithms, challenges, and future realms. Applied Sciences. 2023;13(6):3529.
  26. Cai Y, Xie K, Ma X. An improved DBSCAN algorithm which is insensitive to input parameters. Acta Scientiarum Naturalium Universitatis Pekinensis. 2004;40(3):480–486.
  27. Karami A, Johansson R. Choosing DBSCAN parameters automatically using differential evolution. International Journal of Computer Applications. 2014;91(7):1–11.
  28. Guan C, Yuen KKF, Coenen F. Particle swarm optimized density-based clustering and classification: Supervised and unsupervised learning approaches. Swarm and Evolutionary Computation. 2019;44:876–896.
  29. Valarmathy N, Krishnaveni S. WITHDRAWN: A novel method to enhance the performance evaluation of DBSCAN clustering algorithm using different distinguished metrics; 2020. pmid:32728537
  30. Ankerst M, Breunig MM, Kriegel HP, Sander J. OPTICS: Ordering points to identify the clustering structure. ACM SIGMOD Record. 1999;28(2):49–60.
  31. Ertöz L, Steinbach M, Kumar V. Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 2003 SIAM International Conference on Data Mining. SIAM; 2003. p. 47–58.
  32. Kim JH, Choi JH, Yoo KH, Nasridinov A. AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities. The Journal of Supercomputing. 2019;75(1):142–169.
  33. Zhang L, Xu Z, Si F. GCMDDBSCAN: Multi-density DBSCAN based on grid and contribution. In: 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing. IEEE; 2013. p. 502–507.
  34. Forestiero A, Pizzuti C, Spezzano G. A single pass algorithm for clustering evolving data streams based on swarm intelligence. Data Mining and Knowledge Discovery. 2013;26:1–26.
  35. Chen Y, Tu L. Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2007. p. 133–142.
  36. Huang Tq, Yu Yq, Li K, Zeng Wf. Reckon the parameter of DBSCAN for multi-density data sets with constraints. In: 2009 International Conference on Artificial Intelligence and Computational Intelligence. vol. 4. IEEE; 2009. p. 375–379.
  37. Ruiz C, Spiliopoulou M, Menasalvas E. C-DBSCAN: Density-based clustering with constraints. In: 11th International Conference, RSFDGrC 2007. Springer; 2007. p. 216–223.
  38. Lelis L, Sander J. Semi-supervised density-based clustering. In: 2009 Ninth IEEE International Conference on Data Mining. IEEE; 2009. p. 842–847.
  39. Wagstaff K, Cardie C, Rogers S, Schrödl S, et al. Constrained k-means clustering with background knowledge. In: ICML. vol. 1; 2001. p. 577–584.
  40. Liu P, Zhou D, Wu N. VDBSCAN: varied density based spatial clustering of applications with noise. In: 2007 International Conference on Service Systems and Service Management. IEEE; 2007. p. 1–4.
  41. Duan L, Xu L, Guo F, Lee J, Yan B. A local-density based spatial clustering algorithm with noise. Information Systems. 2007;32(7):978–986.
  42. Pitafi S, Anwar T, Widia IDM, Yimwadsana B. Revolutionizing perimeter intrusion detection: A machine learning-driven approach with curated dataset generation for enhanced security. IEEE Access. 2023.
  43. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. Journal of Big Data. 2019;6(1):1–48.
  44. Olsen A, Konovalov DA, Philippa B, Ridd P, Wood JC, Johns J, et al. DeepWeeds: A multiclass weed species image dataset for deep learning. Scientific Reports. 2019;9(1):2058. pmid:30765729
  45. Hajibabaee P, Pourkamali-Anaraki F, Hariri-Ardebili M. Dimensionality reduction techniques in structural and earthquake engineering. Engineering Structures. 2023;278:115485.
  46. Pilnenskiy N, Smetannikov I. Feature selection algorithms as one of the python data analytical tools. Future Internet. 2020;12(3):54.
  47. Ester M, Kriegel HP, Sander J, Xu X, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD. vol. 96; 1996. p. 226–231.