Anomaly recognition in surveillance based on feature optimizer using deep learning

  • Shaista Khanam,

    Roles Methodology

    Affiliation Department of Computer Science, COMSATS University Islamabad, Wah Campus, Wah Cantt, Punjab, Pakistan

  • Muhammad Sharif,

    Roles Methodology

    Affiliation Department of Computer Science, COMSATS University Islamabad, Wah Campus, Wah Cantt, Punjab, Pakistan

  • Mudassar Raza,

    Roles Conceptualization

    Affiliation Department of Computer Science, Namal University, Mianwali, Pakistan

  • Waqar Ishaq,

    Roles Conceptualization

    Affiliation Telecommunication Department, Hazara University, Mansehra, Pakistan

  • Muhammad Fayyaz,

    Roles Investigation

    Affiliation Department of Computer Science, FAST - National University of Computer and Emerging Sciences, Chiniot-Faisalabad Campus, Chiniot, Pakistan

  • Seifedine Kadry

    Roles Project administration, Supervision

    skadry@gmail.com

    Affiliations Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon, Department of Applied Data Science, Noroff University College, Kristiansand, Norway

Abstract

Surveillance systems are integral to ensuring public safety by detecting unusual incidents, yet existing methods often struggle with accuracy and robustness. This study introduces an advanced framework for anomaly recognition in surveillance, leveraging deep learning to address these challenges and achieve significant improvements over current techniques. The framework begins with preprocessing input images using histogram equalization to enhance feature visibility. It then employs two DCNNs for feature extraction: a novel 63-layer CNN, “Up-to-the-Minute-Net,” and the established Inception-Resnet-v2. The features extracted by both models are fused and optimized through two sophisticated feature selection techniques: Dragonfly and Genetic Algorithm (GA). The optimization process involves rigorous experimentation with 5- and 10-fold cross-validation to evaluate performance across various feature sets. The proposed approach achieves an unprecedented 99.9% accuracy in 5-fold cross-validation using the GA optimizer with 2500 selected features, demonstrating a substantial leap in accuracy compared to existing methods. This study’s contribution lies in its innovative combination of deep learning models and advanced feature optimization techniques, setting a new benchmark in the field of anomaly recognition for surveillance systems and showcasing the potential for practical real-world applications.

Introduction

Surveillance is described as observing, tracking, documenting, and controlling the behavior of individuals, objects, and events to regulate activity in today’s social life [1]. Being safe and secure is one of the basic human needs [2]. Unexpected data patterns that cause problems are referred to as anomalies. However, because of the wide variety of specific contexts, unusual event detection is a difficult problem to solve [3]. Theft, robbery, shoplifting, snatching, automobile burglary, arson, and running arbitrarily are all examples of security threats [4,5]. These security threats can be minimized by applying modern techniques [6,7] of computer vision [8] and deep learning. Identifying anomalous frames and categorizing suspicious images is complicated and time-consuming [9]. Numerous applications use anomaly recognition, such as intrusion detection, visual monitoring, and recognition of suspicious or anomalous behavior. Surveillance is usually carried out with different types of cameras, and all public and private places need proper surveillance [10].

Detection of anomalous patterns in surveillance [11] is now much more challenging and complex because massive amounts of data must be handled instantaneously. Human operators are limited in this respect: they cannot examine enormous volumes of data for anomaly recognition in surveillance scenes [12]. Automated early warning of violent activity in surveillance scenes could significantly reduce the risk of dangerous incidents. Anomalous incident recognition, a well-known research area, is a sub-field of intelligent surveillance using computer vision techniques. The outcomes of this classification can be used in various applications, including avoiding security threats in public places and other routine surveillance at various locations [10].

In today’s smart city era, video monitoring has become extremely vital. Large numbers of surveillance cameras are installed in public and restricted locations to monitor infrastructure and public safety. Separating anomalous frames from normal footage consumes much time and effort [13]. The categorization of suspicious images is complicated by variations in shape, texture, people’s orientation, size, and background. Anomalous image classification is also recognized as an important issue in the domain of surveillance [14]. Some techniques and algorithms for anomaly recognition operate under indirect supervision [15].

This paper addresses issues of accuracy, robustness, and the shortcomings of previous approaches by presenting a unique framework for anomaly identification in surveillance systems. The following are this work’s main contributions:

  1. Introduction of Up-to-the-Minute-Net: This research proposes a new 63-layer DCNN named Up-to-the-Minute-Net, specifically designed for feature extraction in anomaly detection tasks. This model is optimized to handle complex image data, enhancing its effectiveness in identifying suspicious activities in surveillance footage.
  2. Dual Feature Extraction Mechanism: This approach combines the feature extraction capabilities of the proposed Up-to-the-Minute-Net with the well-established Inception-Resnet-v2 model. By fusing features from both networks, we create a rich and diverse set of features that improve the model’s ability to detect subtle anomalies.
  3. Advanced Feature Selection Techniques: To further refine the feature set, we introduce two state-of-the-art optimization algorithms—Dragonfly and GA—for feature selection. These methods ensure that only the most relevant features are used, significantly enhancing the model’s performance and reducing computational complexity.
  4. Extensive Cross-Validation: We rigorously evaluate our model using 5-fold and 10-fold cross-validation, demonstrating its robustness and generalizability. The proposed framework achieves an unprecedented 99.9% accuracy through these experiments, substantially improving the existing technique.
  5. Benchmark Dataset and Real-World Application Potential: The model is tested on the Suspicious Activity Recognition (SAR) benchmark dataset, and its performance surpasses existing state-of-the-art methods. Our model demonstrates strong potential for practical, real-world applications, particularly enhancing public safety through more accurate and reliable surveillance systems.

The manuscript has six sections. Section 1 introduces the topic. Section 2 reviews related work. Section 3 presents the proposed deep neural network-based methodology together with the feature optimizers used. Section 4 reports the experimental results, Section 5 covers discussion and limitations, and Section 6 presents the conclusion and future directions.

Nomenclature used in the paper

Several methods are discussed in this paper, and Table 1 lists the nomenclature used throughout.

Related work

Any recognition system mainly follows four important steps, notwithstanding their complexities: (1) pre-processing [16], (2) feature extraction [17,18], (3) optimal feature selection [19,20], and (4) classification [21,22]. Before being used in the construction and application of models, images must undergo preprocessing [23]. This includes, but is not limited to, scaling, organizing, and color modifications. The 3D Scale Invariant Feature Transform (3DSIFT) [24] finds local features in an image. A one-dimensional convolutional layer (Conv1D) is employed to limit noise and to obtain relevant information from the embedding produced by the spatial feature extractor [25]. Deep neural networks give exceptional results in surveillance anomaly detection, for example in methods proposed to detect and segment individuals in images and videos [26–28]. Features are extracted from images of suspicious actions, and methods such as MPPCA [29], MMPC+SFA [30], Conv-AE [31], ConvLSTM-AE [32], Deep Generic [33], and GANs [34] have been proposed for feature extraction. Heuristic and hand-crafted features, such as the 3D extensions of HOG [35] and HOF [36], also give strong results. These features are also assessed using diverse deep-learning models. Pre-trained DL models such as VGG-19 [37], Inception-v3 [38], ResNet-50 [39], GoogLeNet [40], ResNet-18 [41], and SqueezeNet [42] are used for feature extraction. Furthermore, feature selection techniques allow knowledge to be discovered and mined while redundant and outdated features are eliminated and the core details are maintained [43].

Irrelevant features can be further eliminated using dimension-reduction methods to improve the system’s effectiveness and performance [44–46]. Principal Component Analysis (PCA) [47] attempts to represent all n data vectors as linear combinations of a few eigenvectors while ignoring sample points that do not match the norm [48]. Object identification [49], object tracking [50], and object recognition [51] are the three phases of surveillance. Object classification would ultimately allow for distinction not only between humans and vehicles [52] but also between various types of behavior, such as drunk drivers and suspicious humans [52]. Members of the intoxicated-driver class are likely to cause an accident, so classifying them indirectly amounts to behavior prediction. Members of the suspicious-human class who are discovered in a parking lot may attempt to steal a car [53].

In the domain of surveillance, pedestrian classification is a significant object recognition problem [54]. DL has recently seen considerable advances in image tasks such as object recognition and pose estimation, which exploit non-linear relations in high-dimensional data [55,56]. Because of its capacity to learn sequential information, a CNN-based LSTM [57,58] is used alongside Generative Adversarial Networks (GANs) to encode temporal information in images [59]. Some object detection methods lead on to object classification; for example, YOLO is proposed for object detection [60]. Table 2 presents a comparative analysis of state-of-the-art deep learning approaches for anomaly identification in surveillance.

Table 2. Comparative analysis of existing approaches in anomaly recognition for surveillance systems.

https://doi.org/10.1371/journal.pone.0313692.t002

Motivation

The proposed research is motivated by the critical need to enhance surveillance systems to ensure public safety in an increasingly complex environment where manual monitoring is no longer feasible. Current DL methods for anomaly recognition face challenges in achieving high accuracy, efficient feature extraction, and scalability for real-world applications. To bridge these gaps, this paper introduces a novel framework that combines two DCNNs— the custom-built “Up-to-the-Minute-Net” and the established Inception-ResNet-v2—for comprehensive feature extraction. This dual approach is complemented by a hybrid optimization technique using Dragonfly and GA for feature selection. This approach notably enhances the model’s performance by concentrating on the most pertinent features. The model’s ability to achieve 99.9% accuracy with 5-fold cross-validation demonstrates its superiority over existing methods. By advancing the state-of-the-art in both feature extraction and selection, this study advances not only the theoretical knowledge of anomaly recognition but also offers a practical solution that is scalable and applicable to real-world surveillance scenarios.

Proposed methodology

The main objective of this study is to introduce a method for detecting anomalies in surveillance footage and identifying suspicious activities. The overall architecture of the proposed anomaly detection framework is illustrated in Fig 1. It shows the integration of the various stages, beginning with image preprocessing using contrast enhancement [67] and histogram equalization. In this manuscript, a DCNN named Up-to-the-Minute-Net is proposed. The other significant phase is feature extraction from the proposed network and from the pre-trained deep network Inception-Resnet-V2 [68]. The figure helps in understanding how each component contributes to the overall anomaly detection process, from preprocessing to feature extraction and classification.

Features extracted from both networks are combined in the fusion phase. The fused features are passed to two feature optimizers, Dragonfly and GA. The optimized features are then transferred to different classifiers for categorization. After classification, the methodology produces improved outcomes compared with much other research. The experiments are performed with two cross-validation settings, 5 folds and 10 folds.

Data preprocessing

Before the images are processed by the networks, the data undergoes several preprocessing steps. Initially, contrast enhancement [67] and histogram equalization are applied to improve image quality. Contrast enhancement is used to suppress noise in the image, and histogram equalization further improves the contrast by redistributing the image’s intensity levels, making features more distinguishable.
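The redistribution of intensity levels described above can be illustrated with a minimal sketch. It assumes a single-channel 8-bit image stored as nested lists and uses the textbook cumulative-histogram mapping; the function name and the toy image are hypothetical, and the paper's MATLAB implementation may differ in detail.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as rows of ints in [0, levels - 1]."""
    pixels = [p for row in image for p in row]
    total = len(pixels)

    # Count how often each intensity level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of the intensity levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero cumulative count (the darkest level present).
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to redistribute.
        return [row[:] for row in image]

    # Stretch the cumulative distribution over the full intensity range.
    scale = (levels - 1) / (total - cdf_min)
    lut = [round((c - cdf_min) * scale) if c > 0 else 0 for c in cdf]
    return [[lut[p] for p in row] for row in image]


# Toy low-contrast image (assumption): all intensities lie in 100..103.
low_contrast = [
    [100, 100, 101, 101],
    [100, 102, 102, 101],
    [103, 103, 102, 101],
    [103, 103, 102, 100],
]
equalized = equalize_histogram(low_contrast)
print(equalized[0])  # [0, 0, 85, 85]
```

In this toy case the four nearly identical gray levels are spread over the whole 0 to 255 range, which is the sense in which equalization makes features more distinguishable.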

Feature extraction

Feature extraction is integral to the classification procedure. A key contribution of this research is the fusion of features extracted from the proposed Up-to-the-Minute-Net and the pre-trained Inception-Resnet-V2 [68] network. Both networks are used for feature extraction from the SAR [69] dataset. Up-to-the-Minute-Net yields 4096 features per image and Inception-Resnet-V2 yields 1536.

Proposed Deep CNN Up-to-the-Minute-Net.

A new CNN-based deep neural network named Up-to-the-Minute-Net is proposed. First, the CIFAR-100 dataset is used to train the proposed CNN model [70]. The dataset contains 100 classes of images belonging to different categories. After the proposed network has been converted into a pre-trained model in this way, features are extracted from the SAR [69] dataset. The proposed Up-to-the-Minute-Net is a branched network. Table 3 details the FMS, FD, ST, PD, and pooling window size of each layer.

Table 3. Detailed layers’ configuration of proposed Up-to-the-Minute-Net.

https://doi.org/10.1371/journal.pone.0313692.t003

The proposed deep CNN starts from the input layer. Its FMS is 227 × 227 × 3 and the FD is 11 × 11 × 3 × 96. The input is followed by CN1, which is a CN layer. The network has 63 layers in total. It has 20 CN layers, 7 R layers, 2 D layers, 3 FC layers, and one SoftMax layer, as well as 14 BN layers, 5 P layers, 4 ADD layers, 4 GCN layers, and 1 final CL layer for classification. The features are extracted from FC20 in the form of 1 × 1 × 4096. The detailed architecture is depicted in Fig 2.

Fig 2. Detailed layered architecture of proposed Up-to-the-Minute-Net.

https://doi.org/10.1371/journal.pone.0313692.g002

The network output type is classification. Certain hyperparameters must be fixed at implementation time; Table 4 lists them and their values. These hyperparameters play a major role in training the network.

Table 4. Hyperparameters and their values for proposed Up-to-the-Minute-Net.

https://doi.org/10.1371/journal.pone.0313692.t004

Up-to-the-Minute-Net is the first network used for feature extraction. Features for every dataset image are obtained from its FC20 layer, giving 4096 features per image. The resulting feature matrix therefore has size 13250 × 4096.

Inception-Resnet-V2.

The second network used for feature extraction is Inception-Resnet-V2 [68]. This DCNN is pre-trained on more than 1,000,000 images from the ImageNet [71] dataset and is used here without its FC layer. The net is 164 layers deep and can categorize images into 1000 object classes. The network has therefore learned rich feature representations for a wide range of images. It has an image input size of 299 by 299. In the Inception-Resnet block, convolutional filters of multiple sizes are combined with residual connections. Counting all layers and functions, Inception-Resnet-V2 has 824 in total. Its input type is image, and its output type is classification. The features are extracted from the average P layer. The network yields 1536 features per image, and the SAR [69] dataset contains 13250 images, so the resulting feature matrix is 13250 × 1536.

Feature fusion

Feature fusion aims to combine information from several sources describing the same scene into a single composite representation that describes the scene more accurately than any individual source. Before merging, the salient features present in the sources are extracted with a suitable feature extraction method. A key contribution of this research is the fusion of features obtained from the proposed Up-to-the-Minute-Net and the pre-trained deep CNN Inception-Resnet-V2. Both networks are used for feature extraction, and fusion is then performed on the features extracted from them. Inception-Resnet-V2 yields 1536 features and Up-to-the-Minute-Net yields 4096. These are fused into a combined feature matrix of size 13250 × 5632. Salient features are first identified in each source, and the saliency of each feature is computed as a coefficient. The fusion process ensures that the majority of the dominant features are retained.
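The fused dimensionality of 5632 equals 4096 + 1536, which is what serial (concatenation-based) fusion would produce. The following minimal sketch shows that arithmetic under the assumption that fusion is a per-image concatenation; the function name and the placeholder feature values are hypothetical, and the saliency coefficients mentioned above are not modeled.

```python
def fuse_features(features_a, features_b):
    """Concatenate two per-image feature matrices row by row."""
    if len(features_a) != len(features_b):
        raise ValueError("both matrices must describe the same images")
    return [row_a + row_b for row_a, row_b in zip(features_a, features_b)]


# Toy stand-ins (assumption): 3 images instead of the 13250 in SAR.
num_images = 3
net_a = [[0.0] * 4096 for _ in range(num_images)]  # Up-to-the-Minute-Net width
net_b = [[1.0] * 1536 for _ in range(num_images)]  # Inception-Resnet-V2 width

fused = fuse_features(net_a, net_b)
print(len(fused), len(fused[0]))  # 3 5632
```

With all 13250 images the same operation would give the 13250 × 5632 matrix reported in the text.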

Feature optimization

The feature extraction phase retrieves an immense number of features. After the fusion process, there are 5632 features per image. So many features can slow down the training phase for classification. Additionally, problems such as uninformative features and the curse of dimensionality can emerge, and performance can decline as a result. To resolve this issue, the fused features are delivered to both the Dragonfly [72] and GA [73] feature optimizers. These algorithms were chosen because of their demonstrated effectiveness in solving high-dimensional optimization problems, which is essential in the context of anomaly recognition and feature selection from large datasets.

Dragonfly algorithm.

The Dragonfly algorithm is implemented mathematically as follows. According to Reynolds, swarm behavior follows three primitive principles: separation, alignment, and cohesion [74]. Separation is calculated as

$$S = -\sum_{k=1}^{N} (m - m_k) \tag{1}$$

where $m$ is the position of the current individual, $m_k$ is the position of the kth neighboring individual, and $N$ is the number of neighboring individuals.

Alignment is calculated as

$$A = \frac{\sum_{k=1}^{N} v_k}{N} \tag{2}$$

where $v_k$ is the velocity of the kth neighboring individual.

The cohesion is calculated as follows:

$$C = \frac{\sum_{k=1}^{N} m_k}{N} - m \tag{3}$$

where $m$ is the position of the current individual [72]. These principles are incorporated into the Dragonfly Algorithm to guide the search process toward optimal solutions in a high-dimensional space.
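The three swarm terms can be made concrete with a small sketch. It assumes the standard Dragonfly formulation from [72], in which separation is the negated sum of offsets to the neighbors, alignment is the mean neighbor velocity, and cohesion is the offset from the current position to the neighbors' centroid. The function names and the two-dimensional toy values are hypothetical.

```python
def separation(current, neighbors):
    """Negated sum of offsets from the current position to each neighbor."""
    dims = range(len(current))
    return [-sum(current[d] - n[d] for n in neighbors) for d in dims]


def alignment(velocities):
    """Mean velocity of the neighboring individuals."""
    count = len(velocities)
    dims = range(len(velocities[0]))
    return [sum(v[d] for v in velocities) / count for d in dims]


def cohesion(current, neighbors):
    """Offset from the current position to the neighbors' centroid."""
    count = len(neighbors)
    dims = range(len(current))
    return [sum(n[d] for n in neighbors) / count - current[d] for d in dims]


# Toy swarm (assumption): one individual with two neighbors in 2-D.
position = [1.0, 2.0]
neighbor_positions = [[2.0, 2.0], [4.0, 6.0]]
neighbor_velocities = [[1.0, 0.0], [3.0, 2.0]]

print(separation(position, neighbor_positions))  # [4.0, 4.0]
print(alignment(neighbor_velocities))            # [2.0, 1.0]
print(cohesion(position, neighbor_positions))    # [2.0, 2.0]
```

In a full optimizer these vectors are weighted and combined to update each individual's step; those weights are among the parameters an implementation has to set.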

Genetic algorithm.

GA reaches an optimal solution through the probabilities of crossover and mutation, continuously changing the search space. Here e is the number of generations and E is the total number of evolutionary generations set for the population [75].

In crossover, two parents are chosen for mating: one parent donates part of its genetic material, and the corresponding part of the other parent completes the offspring [76].

Dragonfly and GA were chosen due to their demonstrated suitability for feature selection in complex, high-dimensional datasets. Both algorithms are well-suited to balancing exploration and exploitation, which is critical for ensuring that the most relevant features are selected from the fused feature set. The Dragonfly optimizer is used to select 950 and 1250 optimal features, and the GA optimizer is used to select 500 and 2500 optimal features.
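The crossover and mutation operators described above can be sketched for feature selection by encoding each candidate as a binary mask over the features. This is an illustrative assumption: the paper does not state its chromosome encoding, fitness function, or operator variants, so single-point crossover, bit-flip mutation, and every name and value below are hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Child takes the head of one parent and the matching tail of the other."""
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the mask
    return parent_a[:point] + parent_b[point:]


def mutate(mask, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy chromosomes (assumption): 12 features instead of 5632.
num_features = 12
parent_a = [1] * num_features  # selects every feature
parent_b = [0] * num_features  # selects no feature

child = single_point_crossover(parent_a, parent_b, rng)
mutated = mutate(child, rate=0.1, rng=rng)

# Indices of the features this candidate would keep.
selected = [i for i, bit in enumerate(mutated) if bit == 1]
print(child, selected)
```

A complete GA would score each mask with a fitness function, for example classifier accuracy on the selected columns, and repeat selection, crossover, and mutation over the generations.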

Classification

The selected features are passed on to different classifiers. The chosen classifiers are variants of SVM [77]: Quad-SVM [78], Linear-SVM [79], FineG-SVM [80], CoarG-SVM, MedG-SVM, and Cu-SVM [81]. The SVM classifier and its kernels are described in [82–84]. The classifiers are evaluated on several performance metrics. The detailed results and experiments are presented in the results section. Results are obtained for two cross-validation settings with the two feature optimizers, Dragonfly and GA.

Experimental results and discussion

A total of 8 experiments are performed: 4 with the Dragonfly optimizer and 4 with GA. The dataset used for implementation is SAR [69]. Features are extracted from two DCNNs, the proposed Up-to-the-Minute-Net and the pre-trained Inception-Resnet-V2. The extracted features from both DCNNs are fused into a combined set of 5632 features. Results are obtained with 5- and 10-fold cross-validation. The numbers of optimal features selected are 500, 950, 1250, and 2500. The total number of iterations is 25 for each experiment. The presented research is implemented in MATLAB R2021a on the Microsoft Windows 10 Pro operating system. The system has an Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz (2501 MHz, 2 cores, 4 logical processors), BIOS version Hewlett-Packard 68SCF Ver. F.67, and 8.00 GB of installed physical memory (RAM). The detailed results of all experiments are described in section 4.2. The performance measures used for the experiments are Precision (Pre) [85], Recall (Rec) [86], Accuracy [87], F1-Sc [88], ROC, and AUC [89]. The performance criteria are the same for all experiments.

Dataset used for implementation

A dataset containing five anomaly classes is assembled by taking four classes, (a) Falling, (b) Fighting, (c) Firing, and (d) Running, from HMDB51 (https://www.kaggle.com/datasets/easonlll/hmdb51) [79] and one class (Fire) from AIDER (https://zenodo.org/records/3888300#.XvCPQUUzaUk) [80]. The dataset is named SAR (Suspicious Activity Recognition). It is also used in [59].

Simulation and data details

The SAR dataset is essential for assessing how well the proposed anomaly recognition methodology works. The dataset used in the experiments comprises 13,250 images covering different scenarios and conditions, making it suitable for testing the robustness and accuracy of the proposed models. It contains 6625 original images and 13250 images after augmentation. The distribution of original and augmented images is depicted in Fig 3, where the blue bars represent the original images and the yellow bars indicate the augmented images. The figure shows how the dataset has been expanded through augmentation to create a larger, more diverse dataset that improves the model’s training and reduces overfitting.

Fig 3. Graphical representation of original and augmented images.

https://doi.org/10.1371/journal.pone.0313692.g003

After the fusion process, the feature matrix has size 13250 × 5632. The k-fold cross-validation method is used to validate this multiclass classification.

Experiments performed based on feature optimizer and no. of folds

Two different feature optimizers are used in the experiments. The Dragonfly feature optimizer is applied first to select optimal features, and the GA feature optimizer second. The parameters chosen for the Dragonfly feature optimizer are given in Table 5. A total of 4 experiments were executed with Dragonfly optimization: two with 5-fold and two with 10-fold cross-validation. Different numbers of folds are used to check whether changing the number of folds affects the results.

Table 5. Parameters Values used for Dragonfly optimization.

https://doi.org/10.1371/journal.pone.0313692.t005

Two experiments are performed with 950 selected features, using 5- and 10-fold cross-validation, and two with 1250 selected features on both fold settings. In this manuscript, the model is also tested with a GA feature optimizer to attain the best results. In this category, a total of 4 experiments are performed with the GA feature optimizer: two with 5 folds and two with 10 folds. The parameter values used for these experiments are given in Table 6.

Two experiments are performed with 500 selected features, using 5- and 10-fold cross-validation, and two with 2500 selected features on both fold settings. The outcomes of all 8 experiments, comprising Pre, Rec, F1-Sc, and Accuracy, are reported in Table 7.

The highest accuracy achieved is 99.9%, with a Pre of 1.00, Rec of 1.00, and F1-Sc of 1.00. In this experiment, 2500 optimal features are selected from the 5632 fused features. The CM in Fig 4 is that of the experiment with the highest accuracy and shows how well the model classified the different anomaly classes in the SAR dataset. The CM details the model’s performance on each anomaly type in terms of TP, FP, TN, and FN.
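For readers who want to see how Pre, Rec, F1-Sc, and Accuracy follow from a multiclass CM, the sketch below derives the one-vs-rest TP, FP, FN, and TN counts for each class. The three-class matrix is a made-up toy example, not the paper's results, and the function name is hypothetical.

```python
def per_class_metrics(cm):
    """cm[i][j] is the number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c missed
        fp = sum(cm[r][c] for r in range(n)) - tp  # others labeled as c
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"tp": tp, "fp": fp, "fn": fn, "tn": tn,
                        "precision": precision, "recall": recall, "f1": f1})
    accuracy = sum(cm[i][i] for i in range(n)) / total
    return results, accuracy


# Toy confusion matrix (assumption): 3 classes with 50 samples each.
toy_cm = [
    [48, 2, 0],
    [1, 47, 2],
    [0, 0, 50],
]
metrics, accuracy = per_class_metrics(toy_cm)
for label, m in enumerate(metrics):
    print(label, round(m["precision"], 3), round(m["recall"], 3), round(m["f1"], 3))
print("accuracy", round(accuracy, 3))
```

Per-class values like these are typically averaged across classes to obtain a single Pre, Rec, and F1-Sc figure per experiment.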

The ROC curve in Fig 5 visualizes the trade-off between TP rates (sensitivity) and FP rates across different classification thresholds. The AUC value close to 1.0 signifies that the model is highly effective at distinguishing between anomaly and non-anomaly cases. The curve in this figure shows that the model’s predictive power is excellent, as it achieves near-perfect classification with minimal FP and FN.

Fig 6 shows the performance metrics for Experiment 7, which achieved the highest accuracy. The figure includes graphical representations of Accuracy, Pre, Rec, and F1-Sc for the model trained with 2500 features using GA optimization and 5-fold cross-validation. This figure emphasizes the model’s superior performance across all metrics, with perfect scores indicating that the model is highly effective at detecting anomalies. The detailed performance metrics underscore the impact of optimal feature selection and the effectiveness of the GA optimization technique.

Fig 6. Graphical representation of Accuracy, Pre, Rec, and F1 Score on 2500 features with GA optimization on 5-fold cross-validation.

https://doi.org/10.1371/journal.pone.0313692.g006

The highest accuracy is achieved with 5 folds. In this experiment, the data is split into testing and training portions in a 20:80 ratio. The portions assigned to testing and training change after every iteration.
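The 20:80 ratio follows directly from 5-fold cross-validation: each fold serves once as the test set while the remaining four folds are used for training. The sketch below shows this index bookkeeping for 13250 samples; the shuffling seed and the function name are assumptions for illustration, and the paper's MATLAB tooling performs the split internally.

```python
import random


def k_fold_indices(num_samples, k, seed=0):
    """Yield (train, test) index lists, one pair per fold."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(13250, k=5))
for train, test in splits:
    print(len(train), len(test))  # 10600 2650
```

Calling the same function with k=10 gives a 10:90 split, with 1325 test samples per fold.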

Accuracy over selected features

The selection of optimal features has a great impact on accuracy. Four feature counts are selected for experimentation: 500, 950, 1250, and 2500. Table 8 describes the association between the number of features and accuracy.

The numbers of optimal features are chosen at random out of the total of 5632 fused features. The careful selection of features shown in Fig 7, rather than simply using all available features, leads to significant improvements in accuracy. The performance improvements as the number of features increases show that the model can leverage additional information up to a point, after which further increases offer limited benefits.

Fig 7. The relational graph between No. of features and Accuracy.

https://doi.org/10.1371/journal.pone.0313692.g007

The highest accuracy achieved is 99.9% on 2500 features on Cu-SVM with GA optimization on 5-fold cross-validation.

Comparison with existing results

This section compares the proposed work with the previously published work of [69]. That work reported 5 experiments, with 100, 250, 500, 450, and 1000 selected features. All experiments used 5-fold cross-validation, and ASO was used for feature optimization in all of them. The highest result achieved was 99.3% on Cu-SVM with 500 selected features. Table 9 provides a comprehensive comparison of the proposed work with the existing work.

Table 9. A widespread comparison of proposed work with existing work.

https://doi.org/10.1371/journal.pone.0313692.t009

In the proposed work, a total of 8 experiments are executed with 500, 950, 1250, and 2500 selected features. The results are obtained with two feature optimizers, the Dragonfly algorithm and GA, using 5-fold and 10-fold cross-validation. The highest accuracy obtained is 99.9%, from 2500 selected features on Cu-SVM with 5-fold cross-validation. While [69] achieved 99.3% accuracy with 500 features using entropy-coded ant colony optimization and L4-branched-ActionNet, our model achieved the same accuracy with 950 features using Up-to-the-Minute-Net and the GA optimizer. However, our model was designed to explore the impact of using a larger feature set (up to 2500 features), ultimately achieving 99.9% accuracy. This suggests that while [69] may be more efficient with fewer features, our approach excels when higher feature counts yield better performance. Future work will focus on optimizing feature selection to balance accuracy and computational efficiency for real-world applications.

Statistical analysis

To ensure the robustness and reliability of our model, we employed rigorous statistical methods throughout the study. We utilized 5-fold and 10-fold cross-validation to validate the model’s efficiency, ensuring generalizability across different data splits. Additionally, we implemented two distinct feature optimization techniques—Dragonfly and GA—to select the most relevant features, testing different feature subsets to optimize accuracy and reliability. The models were evaluated using key performance metrics such as Pre, Rec, F1-Sc, and accuracy, across multiple iterations and parameter settings, confirming the consistency and validity of our findings. These comprehensive steps ensure that the statistical analysis conducted is thorough and methodologically sound.

The base paper achieved a prediction speed of 650 obs/sec, while the proposed model achieved 87 obs/sec. This speed discrepancy can largely be attributed to the computational power available in the respective environments. The base paper benefited from the use of a dedicated NVIDIA GTX 1070 GPU, which is designed to handle intensive computational tasks such as model training and prediction at a much faster rate than a CPU. In contrast, the proposed model was run on a CPU-based system without GPU acceleration, which naturally results in slower processing speeds. Although the prediction speed of the proposed model is lower, the focus of our research is on improving accuracy, achieving 99.9%, as compared to the 99.3% accuracy reported in [69]. The increase in accuracy, despite the computational limitations, demonstrates the effectiveness of our feature extraction and selection approach, making the trade-off between speed and accuracy a key consideration. For applications where high accuracy is more critical than prediction speed, the proposed model offers a valuable alternative.

Based on our feature selection process and the trends observed with 500, 950, 1250, and 2500 features, we anticipate at most a marginal increase in accuracy beyond 99.9% when using the full feature set. The primary goal of our feature selection was to find an optimal balance between computational efficiency and accuracy, as the marginal increase in accuracy with significantly more features is unlikely to justify the additional computational cost. In real-time applications where speed is crucial, a model with fewer features might be preferable, whereas in other cases, the higher accuracy of the proposed model may justify the additional computational time.

Discussion and limitations

The importance of anomaly recognition in surveillance has been discussed earlier. For this purpose, a model is designed and tested through different experiments. This section summarizes the experiments and results of the proposed approach and compares them with existing techniques. The experiments are conducted with the standard benchmark dataset Suspicious Activity Recognition (SAR). A list of nomenclature is presented in Table 1, and a comparative analysis of existing DL-based techniques for detecting anomalies in surveillance is presented in Table 2. Fig 1 shows the flow chart of the proposed approach. Table 3 shows the detailed layer configuration of the proposed Up-to-the-Minute-Net. Fig 2 depicts the detailed layered architecture of the proposed DCNN. Table 4 shows the hyperparameters and their values for the proposed Up-to-the-Minute-Net. The augmented dataset is used for experimentation, and a graphic illustration of the dataset is shown in Fig 3. The features extracted from both networks are fused and passed to two feature optimizers to attain the best results. The hyperparameters set for the experiments based on the Dragonfly feature optimizer are listed in Table 5, and those for the GA feature optimizer are listed in Table 6. The selected features were evaluated with two validation methods, 5-fold and 10-fold cross-validation. The outcomes of all 8 experiments, comprising Pre, Rec, F1-Sc, and accuracy, are reported in Table 7. The highest results were achieved in experiment 7 on Cu-SVM with 5-fold cross-validation on 2500 features selected by the GA feature optimizer. The CM and ROC of the highest accuracy are represented in Fig 4. The ROC of the highest accuracy obtained is shown in Fig 5. The Pre, Rec, F1-Sc, and accuracy of experiment 7 are represented graphically in Fig 6. The number of selected features has a great impact on the results. The highest accuracy obtained on the selected features is given in Table 8 and graphically depicted in Fig 7. For evaluation purposes, a comprehensive comparison of the proposed work with existing work is presented in Table 9.

While our proposed framework demonstrates significant improvements in anomaly recognition accuracy, there are several limitations to consider. The method’s performance may vary when applied to larger or more diverse datasets outside of the benchmark dataset used in this study. Additionally, the computational complexity of the DL models and optimization algorithms may pose challenges in real-time applications or on resource-constrained devices. Further research is needed to evaluate the scalability and robustness of our approach in different operational environments and to address these practical constraints.

Conclusion and future work

In this manuscript, a DCNN-branched model named Up-to-the-Minute-Net is proposed. The benchmark dataset SAR, which consists of 5 classes of anomalies, is used for the experiments. After the images are preprocessed, features are extracted from the proposed DCNN. A pre-trained deep neural network, Inception-Resnet-V2, is also used for feature extraction. The features extracted from both DCNNs are fused. Two optimization-based feature selection algorithms, Dragonfly and GA, are then applied to the fused features to select the optimal features. Results are obtained with both 5-fold and 10-fold cross-validation to examine their variance. A total of 8 experiments are performed: four with the Dragonfly feature optimizer and four with the GA feature optimizer. The numbers of features selected for the different experiments are 500, 950, 1250, and 2500. The highest accuracy, 99.9%, is achieved on Cu-SVM with 2500 selected features and 5-fold cross-validation based on GA optimization. While the GA optimizer with 2500 features provided the highest accuracy, it is important to consider the computational requirements associated with this feature set. In real-world applications, the increased number of features may result in higher computational overhead, affecting processing time and memory usage. A comparison between GA with 2500 features and GA with 500 features highlights this trade-off: the model with 500 features, although slightly lower in accuracy (99.2%), may be more suitable for resource-constrained environments. Future work will focus on evaluating these computational aspects to optimize the framework for practical scenarios.
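The GA-based selection step summarized above can be sketched in a few lines of Python. This is a minimal, generic illustration of choosing a fixed-size feature subset with a genetic algorithm, not the configuration used in this study (whose hyperparameters are given in Table 6). The function `ga_select_features`, its default parameters, the union-based crossover, the swap mutation, and the toy `relevance` scores are all assumptions; in practice the `fitness` callable would typically be a cross-validated classifier score on the fused features.

```python
import random


def ga_select_features(n_features, n_select, fitness, generations=30,
                       pop_size=20, mutation_rate=0.2, seed=0):
    """Search for a subset of `n_select` feature indices maximizing `fitness`."""
    rng = random.Random(seed)
    population = [frozenset(rng.sample(range(n_features), n_select))
                  for _ in range(pop_size)]

    def tournament():
        # Binary tournament: keep the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best subset survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Crossover: draw the child from the union of both parents.
            child = set(rng.sample(sorted(p1 | p2), n_select))
            if rng.random() < mutation_rate:
                # Mutation: swap one selected feature for an unselected one.
                unselected = sorted(set(range(n_features)) - child)
                if unselected:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(unselected))
            children.append(frozenset(child))
        population = children
        best = max(population, key=fitness)
    return sorted(best)


# Toy fitness (hypothetical): sum of made-up per-feature relevance scores.
relevance = [(7 * i) % 11 for i in range(40)]


def toy_fitness(subset):
    return sum(relevance[i] for i in subset)


selected = ga_select_features(40, 5, toy_fitness, seed=1)
print(selected, toy_fitness(selected))
```

Fixing the subset size mirrors the experiments with 500, 950, 1250, and 2500 features; a binary-mask encoding with a variable number of selected features would be an equally valid design.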

References

  1. Fan Y, Levine MD, Wen G, Qiu S. A deep neural network for real-time detection of falling humans in naturally occurring scenes. Neurocomputing. 2017;260:43–58.
  2. Underwood B, Saiedian HJS. Mass surveillance: a study of past practices and technologies to predict future directions. 2021;4(2):e142.
  3. Yang H, Yuan C, Xing J, Hu W. SCNN: Sequential convolutional neural network for human action recognition in videos. In: 2017 IEEE International Conference on Image Processing (ICIP). IEEE; 2017. p. 355–9. https://doi.org/10.1109/icip.2017.8296302
  4. Sung CS, Park JY. Design of an intelligent video surveillance system for crime prevention: applying deep learning technology. Multimedia Tools Appl. 2021:1–13.
  5. Yuan S, Wu X. Deep learning for insider threat detection: review, challenges and opportunities. Comput Secur. 2021:102221.
  6. Chaudhary D, Kumar S, Dhaka VS. Estimating crowd size for public place surveillance using deep learning. Deep learning and big data for intelligent transportation: enabling technologies and future trends. p. 175.
  7. Doshi K, Yilmaz Y. Any-shot sequential anomaly detection in surveillance videos. In: Proc IEEE/CVF Conf Comput Vis Pattern Recognit Work. IEEE; 2020. p. 934–5.
  8. Pawar K, Attar V. Deep learning approaches for video-based anomalous activity detection. World Wide Web. 2018;22(2):571–601.
  9. Doshi K, Yilmaz Y. Continual learning for anomaly detection in surveillance videos. In: Proc IEEE/CVF Conf Comput Vis Pattern Recognit Work. IEEE; 2020. p. 254–5.
  10. Santhosh KK, Dogra DP, Roy PP. Anomaly detection in road traffic using visual surveillance. ACM Comput Surv. 2020;53(6):1–26.
  11. Mekruksavanich S, Jitpattanakul A. Smartwatch-based human activity recognition using hybrid LSTM network. In: 2020 IEEE Sensors. IEEE; 2020. p. 1–4.
  12. Kumar M, Biswas M. Abnormal human activity detection by convolutional recurrent neural network using fuzzy logic. Multimed Tools Appl. 2023;83(22):61843–59.
  13. Sujatha E, Janani D. Real time activity monitoring using deep learning. In: 2024 5th International Conference on Innovative Trends in Information Technology (ICITIIT). IEEE; 2024. p. 1–6.
  14. Arul U, Prabhakara Rao T, Baskaran R, Kirubakaran S, Thariq Hussan MI. Effective anomaly identification in surveillance videos based on adaptive recurrent neural network. 2024;19(3):1793–1805.
  15. Erhan L, Ndubuaku M, Di Mauro M, Song W, Chen M, Fortino G, et al. Smart anomaly detection in sensor systems: A multi-perspective review. Information Fusion. 2021;67:64–79.
  16. Raza M, Sharif M, Yasmin M, Khan MA, Saba T, Fernandes SL. Appearance based pedestrians’ gender recognition by employing stacked auto encoders in deep learning. Future Generation Comput Syst. 2018;88:28–39.
  17. Cao X, Gao S, Chen L, Wang Y. Ship recognition method combined with image segmentation and deep learning feature extraction in video surveillance. Multimed Tools Appl. 2019;79(13–14):9177–92.
  18. Khan SU, Hussain T, Ullah A, Baik SW. Deep-ReID: deep features and autoencoder assisted image patching strategy for person re-identification in smart cities surveillance. Multimed Tools Appl. 2021;83(5):15079–100.
  19. Theng D, Bhoyar KK. Feature selection techniques for machine learning: a survey of more than two decades of research. Knowl Inf Syst. 2023;66(3):1575–637.
  20. Khan MA, Javed K, Khan SA, Saba T, Habib U, Khan JA, et al. Human action recognition using fusion of multiview and deep features: an application to video surveillance. Multimed Tools Appl. 2020;83(5):14885–911.
  21. Shah JH, Sharif M, Yasmin M, Fernandes SL. Facial expressions classification and false label reduction using LDA and threefold SVM. Pattern Recognit Lett. 2020;139:166–73.
  22. Dai J, Li Q, Wang H, Liu L. Understanding images of surveillance devices in the wild. Knowledge-Based Syst. 2024;284:111226.
  23. Sharif M, Ansari GJ, Yasmin M, Fernandes SL. Reviews of the implications of VR/AR health care applications in terms of organizational and societal change. Emerging technologies for health and medicine: virtual reality, augmented reality, artificial intelligence, internet of things, robotics, industry 4.0. 2018. p. 1–19.
  24. Scovanner P, Ali S, Shah M. A 3-dimensional sift descriptor and its application to action recognition. In: Proceedings of the 15th ACM international conference on Multimedia. ACM; 2007. p. 357–60. https://doi.org/10.1145/1291233.1291311
  25. Bento FRO, Vassallo RF, Samatelo JLA. Anomaly detection on public streets using spatial features and a bidirectional sequential classifier. J Control Autom Electr Syst. 2021;33(1):156–66.
  26. Zahra A, Ghafoor M, Munir K, Ullah A, Ul Abideen Z. Application of region-based video surveillance in smart cities using deep learning. Multimed Tools Appl. 2021:1–26. pmid:34975282
  27. Wang J, Hu F, Abbas G, Albekairi M, Rashid N. Enhancing image categorization with the quantized object recognition model in surveillance systems. Exp Syst Appl. 2024;238:122240.
  28. Santos T, Oliveira H, Cunha AJCSR. Systematic review on weapon detection in surveillance footage through deep learning. 2024;51:100612.
  29. Kim J, Grauman K. Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates. In: 2009 IEEE Conf Comput Vis Pattern Recognit. IEEE; 2009. p. 2921–8.
  30. Mahadevan V, Li W, Bhalodia V, Vasconcelos N. Anomaly detection in crowded scenes. In: 2010 IEEE Comput Soc Conf Comput Vis Pattern Recognit. IEEE; 2010. p. 1975–81.
  31. Hasan M, Choi J, Neumann J, Roy-Chowdhury AK, Davis LS. Learning temporal regularity in video sequences. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2016. p. 733–42.
  32. Luo W, Liu W, Gao S. Remembering history with convolutional LSTM for anomaly detection. In: 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE; 2017. p. 439–44. https://doi.org/10.1109/icme.2017.8019325
  33. Hinami R, Mei T, Satoh S. Joint detection and recounting of abnormal events by learning deep generic knowledge. In: Proceedings of the IEEE Int Conf Comput Vis. 2017. p. 3619–27.
  34. Ravanbakhsh M, Nabi M, Mousavi H, Sangineto E, Sebe N. Plug-and-play cnn for crowd motion analysis: an application in abnormal event detection. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE; 2018. p. 1689–98.
  35. Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2005. p. 886–93.
  36. Dalal N, Triggs B, Schmid C. Human detection using oriented histograms of flow and appearance. In: European Conf Comput Vis. Springer; 2006. p. 428–41.
  37. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014.
  38. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016. p. 2818–26. https://doi.org/10.1109/cvpr.2016.308
  39. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conf Comput Vis Pattern Recognit. 2016. p. 770–8.
  40. Szegedy C, Liu W, Jia Y, Sermanet P, Chuang N, Vanhoucke V, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. p. 1–9.
  41. Das S. CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet. ed: Accessed: Sep, 2019.
  42. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015;115(3):211–52.
  43. Pham NT, Foo E, Suriadi S, Jeffrey H, Lahza HFM. Improving performance of intrusion detection system using ensemble methods and feature selection. In: Proceedings of the Australasian Computer Science Week Multiconference. ACM; 2018. p. 1–6. https://doi.org/10.1145/3167918.3167951
  44. Li S, Yoon H-S. Enhancing camera calibration for traffic surveillance with an integrated approach of genetic algorithm and particle swarm optimization. Sensors (Basel). 2024;24(5):1456. pmid:38474992
  45. Chen Y, Gao B, Lu T, Li H, Wu Y, Zhang D, et al. A hybrid binary dragonfly algorithm with an adaptive directed differential operator for feature selection. Remote Sens. 2023;15(16):3980.
  46. Ravagnani A, Lillo F, Deriu P, Mazzarisi P, Medda F, Russo AJ. Dimensionality reduction techniques to support insider trading detection. 2024.
  47. Jolliffe IJE. Principal component analysis. 2005.
  48. BenAbdelkader C, Cutler R, Davis L. Stride and cadence as a biometric in automatic person identification and verification. In: Proceedings of Fifth IEEE International Conference on Automatic Face and Gesture Recognition. IEEE; 2002. p. 372–7.
  49. Raj M, Bakas J. Detection of object-based forgery in surveillance videos utilizing motion residual and deep learning. In: Distributed Computing and Intelligent Technology: 19th International Conference, ICDCIT 2023, Bhubaneswar, India, January 18–22, 2023, Proceedings. Springer; 2023. p. 141–8.
  50. Al Jaberi S, Patel A, AL-Masri A. Object tracking and detection techniques under GANN threats: a systemic review. 2023:110224.
  51. Raja R, Sharma PC, Mahmood MR, Saini DK. Analysis of anomaly detection in surveillance video: recent trends and future vision. Multimed Tools Appl. 2022;82(8):12635–51.
  52. Xiang Y, Li T, Ren W, Zhu T, Choo K-KR. A lightweight privacy-preserving scheme using label-based pixel block mixing for image classification in deep learning. arXiv preprint arXiv:2105.08876. 2021.
  53. Kalaivani P, Roomi SMM. Towards Comprehensive Understanding of Event Detection and Video Summarization Approaches. In: 2017 Second International Conference on Recent Trends and Challenges in Computational Models (ICRTCCM). IEEE; 2017. p. 61–6. https://doi.org/10.1109/icrtccm.2017.84
  54. Brunetti A, Buongiorno D, Trotta GF, Bevilacqua V. Computer vision and deep learning techniques for pedestrian detection and tracking: a survey. Neurocomputing. 2018;300:17–33.
  55. Tang S, Wang Z, Yu C, Sun C, Li Y, Xiao J. Fast and accurate novelty detection for large surveillance video. CCF Trans HPC. 2024;6(2):130–49.
  56. Chen J, Wang Y, Wang Q, Wan H, Ma X. Free-Ride Transmission of Semantic Features in Wireless Video Surveillance Systems. In: 2024 IEEE Wireless Communications and Networking Conference (WCNC). IEEE; 2024. p. 1–6. https://doi.org/10.1109/wcnc57260.2024.10571305
  57. Heckler L, König RJPC. Feature selection for unsupervised anomaly detection and localization using synthetic defects. 2024;154:165.
  58. An S, Kim J, Kim S, Chikontwe P, Jung J, Jeon H, et al. Few-shot anomaly detection using positive unlabeled learning with cycle consistency and co-occurrence features. Exp Syst Appl. 2024;256:124890.
  59. Kaiser Ł, Nachum O, Roy A, Bengio S. Learning to remember rare events. arXiv preprint arXiv:1703.03129. 2017.
  60. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conf Comput Vis Pattern Recognit. IEEE; 2016. p. 779–88.
  61. Wang D, Wu KJMS. Anomaly detection in surveillance videos using transformer with margin learning. J Title Abbrev. 2024;30(5):1–13.
  62. Jebur SA, Hussein KA, Hoomod HK, Alzubaidi L, Saihood AA, Gu YJ. A scalable and generalized deep learning framework for anomaly detection in surveillance videos. 2024.
  63. Al-lahham A, Zaheer MZ, Tastan N, Nandakumar K. Collaborative Learning of Anomalies with Privacy (CLAP) for unsupervised video anomaly detection: a new baseline. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2024. p. 12416–25. https://doi.org/10.1109/cvpr52733.2024.01180
  64. Iqbal A, Amin R. Time series forecasting and anomaly detection using deep learning. Comput Chem Eng. 2024;182:108560.
  65. Surianarayanan C, Kunasekaran S, Chelliah PR. A high-throughput architecture for anomaly detection in streaming data using machine learning algorithms. Int J Inf Tecnol. 2023;16(1):493–506.
  66. Altice B, Nazario E, Davis M, Shekaramiz M, Moon TK, Masoum MAS. Anomaly detection on small wind turbine blades using deep learning algorithms. Energies. 2024;17(5):982.
  67. Cheng HD, Shi XJ. A simple and effective histogram equalization approach to image enhancement. Digital Signal Process. 2004;14(2):158–70.
  68. Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings AAAI. 2017. vol. 31. no. 1.
  69. Saba T, Rehman A, Latif R, Fati SM, Raza M, Sharif MJIA. Suspicious activity recognition using proposed deep L4-branched-actionnet with entropy coded ant colony system optimization. J Ambient Intell Humaniz Comput. 2021;9:89181–97.
  70. Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. In: cs.utoronto.ca. 2009.
  71. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE; 2009. p. 248–55.
  72. Mirjalili S. Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Comput Appl. 2015;27(4):1053–73.
  73. Jh HJAA. Adaptation in natural and artificial systems. 1975.
  74. Reynolds CW. Flocks, herds and schools: A distributed behavioral model. In: Proceedings of the 14th annual conference on Computer graphics and interactive techniques. ACM; 1987. p. 25–34. https://doi.org/10.1145/37401.37406
  75. Katoch S, Chauhan SS, Kumar V. A review on genetic algorithm: past, present, and future. Multimed Tools Appl. 2021;80(5):8091–126. pmid:33162782
  76. Goldberg D, Lingle R. Alleles, loci, and the traveling salesman problem. In: Proceedings of the First International Conference on Genetic Algorithms and Their Applications. Psychology Press; 2014. p. 154–9.
  77. Noble WSJN. What is a support vector machine? Nat Rev Genet. 2006;24(12):1565–7.
  78. Dagher I. Quadratic kernel-free non-linear support vector machine. J Glob Optim. 2007;41(1):15–30.
  79. Chang YW, Lin CJ. Feature ranking using linear SVM. In: Causation and Prediction Challenge. PMLR; 2008. p. 53–64.
  80. Virdi P, Narayan Y, Kumari P, Mathew L. Discrete wavelet packet based elbow movement classification using fine Gaussian SVM. In: 2016 IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES). IEEE; 2016. p. 1–5.
  81. Liu Z, Zuo MJ, Zhao X, Xu H. An analytical approach to fast parameter selection of Gaussian RBF kernel for support vector machine. J Comput Sci Eng. 2015;31(2):691–710.
  82. Rüping S. Svm kernels for time series analysis. Technical report 2001.
  83. Ayat NE, Cheriet M, Suen CY. Automatic model selection for the optimization of SVM kernels. Pattern Recognit. 2005;38(10):1733–45.
  84. Haasdonk B. Feature space interpretation of svms with indefinite kernels. Int J Pattern Recognit Artif Intell. 2005;27(4):482–92.
  85. van Halem N, van Klaveren C, Cornelisz I. The effects of implementation barriers in virtually proctored examination: a randomised field experiment in Dutch higher education. High Educ Q. 2020;75(2):333–47.
  86. Voss C, Haber NJ. Systems and methods for detection of behavior correlated with outside distractions in examinations. Google Patents; 2018.
  87. Korman M. Behavioral detection of cheating in online examination. 2010.
  88. Butler-Henderson K, Crawford J. A systematic review of online examinations: a pedagogical innovation for scalable authentication and integrity. Comput Educ. 2020;159:104024. pmid:32982023
  89. Metzger R, Maudoodi R. Using access reports and API logs as additional tools to identify exam cheating. In: Society for Information Technology & Teacher Education International Conference. Association for the Advancement of Computing in Education (AACE); 2020. p. 294–9.