Abstract
Deep neural networks have been shown to be highly vulnerable to adversarial examples—inputs crafted to mislead models by adding subtle, human-imperceptible perturbations. Transferability and stealthiness are two crucial metrics for evaluating adversarial attacks. However, these goals often conflict: examples with high transferability typically exhibit noticeable adversarial noise, while those with imperceptible perturbations tend to perform poorly in black-box attacks. To tackle this, we propose Diff-AdaNAG, a novel framework that introduces Nesterov’s Accelerated Gradient (NAG) into diffusion-based adversarial example generation. Specifically, the diffusion mechanism guides the generation process toward the natural data distribution, achieving stealthy attacks with imperceptible adversarial examples. Meanwhile, an adaptive step-size strategy is utilized to harness the strong acceleration and generalization capabilities of NAG in optimization, enhancing black-box transferability in adversarial attacks. Extensive experiments demonstrate that Diff-AdaNAG consistently outperforms state-of-the-art methods in both white-box and black-box scenarios, significantly boosting transferability without compromising stealthiness. The code is available at https://github.com/Linc2021/Diff-AdaNAG.
Citation: Lin C, Long S (2025) The strength of Nesterov’s accelerated gradient in boosting transferability of stealthy adversarial attacks. PLoS One 20(11): e0337463. https://doi.org/10.1371/journal.pone.0337463
Editor: Asadullah Shaikh, Najran University College of Computer Science and Information Systems, SAUDI ARABIA
Received: April 10, 2025; Accepted: November 8, 2025; Published: November 25, 2025
Copyright: © 2025 Lin, Long. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All image data used in this study are publicly available from the ImageNet database (https://image-net.org/challenges/LSVRC/2012/index.php). A subset of this dataset, selected with a fixed seed for replication, was used in our experiments. All relevant code required to reproduce the study’s findings is available in the GitHub repository at https://github.com/Linc2021/Diff-AdaNAG.
Funding: We would like to acknowledge the financial support provided by the Postgraduate Scientific Research Innovation Project of Hunan Province (No. CX20240105). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Deep neural networks are vulnerable to adversarial attacks—subtle perturbations that are visually indistinguishable from natural data yet capable of misleading model predictions [1,2]. This vulnerability poses serious security risks in real-world applications, making robustness against such attacks a critical research problem. Understanding the mechanisms behind adversarial attacks is essential for developing effective defenses. Typically, adversarial example generation is formulated as a constrained optimization problem, including sign-based methods such as Fast Gradient Sign Method (FGSM) [1], Basic Iterative Method (BIM) [3], Projected Gradient Descent (PGD) [4], etc. While PGD performs well in white-box scenarios, its performance declines in black-box contexts, often underperforming compared to FGSM. Consequently, various PGD-inspired algorithms have been proposed, highlighting the importance of rigorously evaluating these methods to enhance model robustness and security.
Among evaluation metrics, transferability and stealthiness stand out as the two core criteria for assessing the effectiveness of adversarial attacks. Transferability refers to the ability of adversarial examples to generalize across different models, enabling successful attacks even when the attacker has incomplete knowledge of the target model’s structure and parameters. Stealthiness, on the other hand, demands that adversarial examples be visually indistinguishable from original images to avoid human detection and being easily identified and filtered by defense mechanisms. In practice, however, there is often a trade-off between these two metrics. Algorithms that focus on generating imperceptible adversarial examples often exhibit poor performance in black-box attack scenarios, with their transferability significantly limited [5]. In contrast, some algorithms enhance transferability by introducing more noticeable adversarial noise at the expense of stealthiness [6].
To further improve the stealthiness of adversarial attacks, diffusion-based mechanisms have been introduced to generate adversarial examples. DiffAttack employs the Denoising Diffusion Implicit Model (DDIM) inversion to project clean images into the diffusion latent space and add noise there; this latent space alteration yields adversarial examples that are visually nearly indistinguishable from the original images [7]. Recently, Diff-PGD [8] exploits a diffusion mechanism to generate adversarial examples that are closer to the natural data distribution, improving the stealthiness and effectiveness of the attack. Similar work includes the NS-Diff-PGD presented by [9]. However, Diff-PGD focuses on producing visually stealthy adversarial examples and suffers from a low attack success rate in black-box settings.
To address these limitations, one can draw on the numerous effective strategies that exist for training transferable adversarial examples on a white-box surrogate model. In particular, representative algorithms derived from FGSM, namely MI-FGSM [10] and NI-FGSM [11], correspond to Heavy-Ball (HB) [12] and Nesterov’s Accelerated Gradient (NAG) [13], respectively. Notably, in traditional optimization, NAG is a more advanced momentum algorithm than HB. NI-FGSM, which incorporates the NAG mechanism, demonstrates remarkable black-box transferability. However, MI-FGSM and NI-FGSM still rely on the sign-based optimization used in FGSM and PGD. The use of the sign function may result in uncontrollable algorithmic non-convergence [14] and poor generalization [15]. Therefore, we believe that the potential of NAG has not been fully exploited. To address this concern, an adaptive step-size strategy was proposed to improve MI-FGSM [16]. It evades the adverse effects of the sign function and generates adversarial noise that is both transferable and imperceptible.
Motivated by these insights, we propose Diff-AdaNAG, an innovative adversarial attack framework that synergistically integrates the Adaptive Nesterov’s Accelerated Gradient optimizer (AdaNAG) with diffusion-based adversarial example generation inspired by Diff-PGD. By replacing problematic sign-based operations with adaptive NAG momentum, Diff-AdaNAG establishes a direct connection between NAG in convex optimization theory and adversarial attack practice, unleashing its strength in black-box transferable attacks. Additionally, the diffusion mechanism significantly increases the stealthiness and effectiveness of black-box adversarial attacks by generating examples closer to the natural data distribution. Comprehensive experiments demonstrate that Diff-AdaNAG consistently outperforms state-of-the-art methods under both white-box and black-box scenarios, highlighting its superior attack success rates and improved transferability. Our contributions are as follows.
- We propose an innovative adversarial attack algorithm that produces stealthy adversarial examples by a diffusion-based mechanism inspired by Diff-PGD.
- We introduce adaptive NAG momentum to enhance controllability by replacing problematic sign-based operations with adaptive step-size.
- We conduct comprehensive experiments to show that the proposed method consistently outperforms state-of-the-art methods under both white-box and black-box scenarios, highlighting its superior attack success rates and improved transferability.
Notation. We use lower-case bold letters x to denote vectors and upper-case bold letters X to denote matrices. We use ‖x‖ to denote the ℓ2-norm of a vector, and ‖x‖∞ to denote the ℓ∞-norm. For any vectors u and v, all standard operations such as uv and u/v are assumed to be element-wise. We use ∇x f(x) to denote the gradient of a function f with respect to x. We use I to denote the identity matrix, and 0 to denote the zero vector.
Related works
We aim to generate a non-targeted adversarial example x_adv from a given input x with the true label y. The goal is to maximize the loss J(x_adv, y), in order to maximize the discrepancy between the predicted label of the adversarial example and the true label. The optimization problem can be formulated as follows:

max_{x_adv} J(x_adv, y)  s.t.  ‖x_adv − x‖∞ ≤ ε,   (1)

where ε is the ℓ∞-norm constraint on the perturbation. The loss function J is typically the cross-entropy loss between the predicted label of the adversarial example and the true label. We can learn adversarial examples using the gradient ascent direction in practice.
Adversarial attacks
There are many methods to solve problem (1), such as FGSM [1], BIM [17], and PGD [4] (all sign-based). FGSM generates adversarial examples by taking a single step in the direction of the sign of the gradient of the loss function with respect to the input:

x_adv = x + ε · sign(∇x J(x, y)).
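As a minimal illustration, a single FGSM step can be sketched in NumPy; the linear "loss" and its gradient below are hypothetical stand-ins for a real network and its back-propagated gradient.

```python
import numpy as np

def fgsm(x, grad, eps):
    """One FGSM step: move eps along the sign of the loss gradient,
    then clip back to the valid pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy example: for a linear "loss" J(x) = w . x, the gradient is w.
x = np.array([0.2, 0.5, 0.8])
w = np.array([1.0, -2.0, 0.5])          # hypothetical loss gradient
x_adv = fgsm(x, w, eps=4 / 255)
```

The perturbation is bounded by ε per pixel by construction, which is why the single-step FGSM example automatically satisfies the ℓ∞ constraint of problem (1).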
BIM iteratively takes small steps in the direction of the gradient and projects the perturbed input back to a valid ℓ∞-ball centered around the original input. Then, PGD is proposed as a multi-step variant of BIM by introducing random noise initialization:

x_{t+1} = Π_{x,ε}(x_t + α · sign(∇x J(x_t, y))),

where Π_{x,ε}(·) projects its argument back to the ℓ∞-ball centered around x with radius ε, and α is the step size. However, their attack effectiveness falls short in black-box scenarios. Dong et al. propose MI-FGSM to address this issue by incorporating momentum into the optimization process [10], and Lin et al. then modify the gradient calculation position of MI-FGSM and propose NI-FGSM to further improve the transferability of adversarial examples [11]:

x_t^{nes} = x_t + α · μ · g_t,
g_{t+1} = μ · g_t + ∇x J(x_t^{nes}, y) / ‖∇x J(x_t^{nes}, y)‖_1,
x_{t+1} = Π_{x,ε}(x_t + α · sign(g_{t+1})),

where g_t is the momentum term and μ is the momentum factor, suggested as a constant value.
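Under these definitions, one NI-FGSM iteration can be sketched as follows; `grad_fn` is a hypothetical stand-in for back-propagation through the surrogate model.

```python
import numpy as np

def ni_fgsm_step(x, x0, g, grad_fn, alpha, mu, eps):
    """One NI-FGSM iteration: look ahead along the momentum, accumulate the
    L1-normalized gradient, take a sign step, and project to the eps-ball."""
    x_nes = x + alpha * mu * g                        # look-ahead point
    grad = grad_fn(x_nes)
    g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
    x = np.clip(x + alpha * np.sign(g), x0 - eps, x0 + eps)
    return x, g

# Toy run with a quadratic "loss" whose gradient is grad_fn(z) = z.
x0 = np.array([0.1, -0.3])
x, g = x0.copy(), np.zeros_like(x0)
for _ in range(5):
    x, g = ni_fgsm_step(x, x0, g, lambda z: z, alpha=0.01, mu=1.0, eps=0.03)
```

Note that the final step direction is sign(g_{t+1}) regardless of the gradient magnitudes, which is exactly the information loss discussed below.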
Nevertheless, these approaches still share a common drawback: they all incorporate the sign function into the iterative update process of adversarial samples. Multiple studies have demonstrated that reliance on the sign function may adversely affect the algorithm’s practical efficacy [14] and, in extreme cases, compromise its convergence guarantees [15]. Moreover, while NI-FGSM introduces extrapolation-based momentum updates into its iterative framework, this implementation does not maintain strict mathematical equivalence to the NAG method (which will be elaborated in the next section). This divergence suggests that NI-FGSM fails to harness the optimization advantages inherent to NAG. Consequently, its untapped potential in black-box transferable adversarial attacks warrants further exploration through refined algorithmic adaptations.
In addition to the generic gradient-based attack methods mentioned above, there are also some advanced feature-level techniques such as FIA [18], NEAA [19], and MFAA [20]. The core principle of these methods is to enhance transferability by selectively disrupting category-related and cross-model invariant features encoded in the intermediate layers of deep networks. By focusing on object-aware representations that consistently drive decisions across models, the resulting adversarial examples exhibit stronger transferability. In contrast, CAAM [21] improves transferability by exploiting cross-model channel redundancy and invariance, through channel shuffling/reweighting and channel-invariant blocks. Yet the same layer- or channel-level tampering that boosts transferability also leaves larger, more structured perturbation footprints, so stealth drops in exchange for higher black-box success.
Diffusion models
Diffusion models [22–24] show significant performance in many fields, such as image generation [23], text-to-image generation [25–27], video generation [28–30], 3-D generation [31,32], and adversarial attacks [33–35].
The Denoising Diffusion Probabilistic Model (DDPM) [23] is a discretized variant of diffusion models. Let x_0 ~ q(x_0) denote a sample from the natural image distribution. Through a forward diffusion process, Gaussian noise is incrementally added to x_0, resulting in progressively noisier samples x_1, …, x_T over T steps, governed by the Markovian process:

q(x_t | x_{t−1}) = N(x_t; √(1 − β_t) x_{t−1}, β_t I),

where N(·; ·, ·) represents a Gaussian distribution, and β_t increases from 0 to 1. The conditional probability q(x_t | x_0) is:

q(x_t | x_0) = N(x_t; √(ᾱ_t) x_0, (1 − ᾱ_t) I),

where α_t = 1 − β_t and ᾱ_t = ∏_{s=1}^{t} α_s. As ᾱ_T approaches zero, q(x_T | x_0) converges to an isotropic Gaussian distribution.
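The closed-form conditional q(x_t | x_0) can be sampled directly, as in this small sketch (the linear β schedule is an assumption, chosen as a common default):

```python
import numpy as np

def diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    abar = np.cumprod(1.0 - betas)[t]     # abar_t = prod_{s<=t} (1 - beta_s)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)     # assumed linear schedule
x0 = np.ones(8)
x_T = diffuse(x0, 999, betas, rng)        # abar_T ~ 0: nearly pure noise
```

As ᾱ_T → 0 the signal term vanishes, matching the isotropic-Gaussian limit above.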
The joint distribution p_θ(x_{0:T}) is called the reverse process, modeled by a Markov chain with learned Gaussian transitions starting from p(x_T) = N(x_T; 0, I). It aims to reconstruct samples by iteratively refining Gaussian noise. The transition model, p_θ(x_{t−1} | x_t), is optimized by minimizing the variational bound on the negative log-likelihood. A modified U-Net [23] is typically used for denoising. The reverse diffusion process is:

p_θ(x_{t−1} | x_t) = N(x_{t−1}; μ_θ(x_t, t), Σ_θ(x_t, t)).

We can use x_0 ~ p_θ(x_0 | x_T) to represent such a sampling process with x_T ~ N(0, I).
Diffusion models [23,36] are also used in guided image synthesis. Based on a diffusion-model generative prior, SDEdit (Stochastic Differential Editing) [37] synthesizes realistic images by iteratively denoising via a stochastic differential equation. The key idea is to perturb the input sample according to q(x_K | x_0) and then iteratively apply the learned denoising function p_θ:

x_{t−1} ~ p_θ(x_{t−1} | x_t),  t = K, …, 1.

SDEdit applies K forward diffusion steps, followed by K reverse steps to align the sample with the natural data distribution q(x_0). In adversarial learning, SDEdit enhances stealthiness, generating images that reside in a transitional space between the input and realistic data distributions. Several diffusion-based methods have been proposed for adversarial attacks [8,9,34]. These approaches typically involve applying SDEdit to image samples at each step of the gradient ascent procedure. In addition, such a mechanism can also be used in adversarial defense tasks [38,39].
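The K-forward/K-reverse structure of SDEdit can be made explicit in a short sketch; the `denoise` callable is a hypothetical stand-in for the learned reverse transition p_θ.

```python
import numpy as np

def sdedit(x, K, betas, denoise, rng):
    """SDEdit sketch: noise the input with K forward steps in closed form,
    then apply the learned denoiser for K reverse steps."""
    abar_K = np.cumprod(1.0 - betas)[K - 1]
    x_t = np.sqrt(abar_K) * x + np.sqrt(1.0 - abar_K) * rng.standard_normal(x.shape)
    for t in range(K - 1, -1, -1):   # t = K-1, ..., 0
        x_t = denoise(x_t, t)        # one learned reverse transition
    return x_t

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
x = np.zeros(4)
purified = sdedit(x, K=10, betas=betas, denoise=lambda z, t: 0.5 * z, rng=rng)
```

Because only K ≪ T steps are taken, the output stays close to the input while being pulled toward the natural data distribution.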
Methodology
In order to further strengthen the transferability of stealthy adversarial attacks, we present the Diff-AdaNAG algorithm for generating adversarial examples. The algorithm integrates the AdaNAG optimizer with a diffusion-based mechanism inspired by Diff-PGD. In the first subsection, we highlight the differences between NAG and NI-FGSM. Then, we introduce AdaNAG to address this gap in the second subsection. Subsequently, in the third subsection, we propose a diffusion-based AdaNAG method for achieving stealthy adversarial attacks. Finally, we present an efficient accelerated algorithm for practical implementation in the last subsection. Fig 1 provides a high-level overview of the proposed Diff-AdaNAG algorithm.
q is forward diffusion, p_θ is backward denoising, and x̃ is the denoised sample.
The gap between Nesterov’s accelerated gradient and NI-FGSM
NAG [13,40] is a popular momentum method that has been shown to converge faster than traditional gradient descent in smooth convex optimization, and its form is as follows:

y_t = x_t + μ_t (x_t − x_{t−1}),
x_{t+1} = y_t − L^{−1} ∇f(y_t),

where f is the smooth convex objective, L is the Lipschitz constant satisfying L > 0, and μ_t is the momentum parameter satisfying 0 ≤ μ_t < 1. From the second line of the algorithm, we can observe that the NAG method distinguishes itself by strategically shifting the gradient computation to the extrapolated point y_t rather than the current position x_t. This adjustment is pivotal as it allows the algorithm to effectively harness the momentum from previous iterations, thereby accelerating convergence.
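On a simple smooth convex quadratic, the two-line NAG recursion can be checked numerically; the momentum schedule μ_t = (t − 1)/(t + 2) is one standard choice, used here as an assumption.

```python
import numpy as np

def nag(grad, x0, L, steps):
    """Nesterov's accelerated gradient: the gradient is evaluated at the
    extrapolated point y_t, and the step size is 1/L."""
    x_prev = x = np.asarray(x0, dtype=float)
    for t in range(1, steps + 1):
        mu = (t - 1) / (t + 2)             # standard momentum schedule
        y = x + mu * (x - x_prev)          # look-ahead point
        x_prev, x = x, y - grad(y) / L     # gradient step taken at y_t
    return x

# Minimize f(x) = 0.5 * (x1^2 + 10 * x2^2); Lipschitz constant L = 10.
d = np.array([1.0, 10.0])
x_star = nag(lambda z: d * z, np.array([3.0, -4.0]), L=10.0, steps=500)
```

The classical guarantee f(x_t) − f* ≤ 2L‖x_0 − x*‖²/(t + 1)² bounds the residual here by about 2·10·25/501² ≈ 0.002.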
NI-FGSM adopts a similar concept but introduces an additional ℓ1-norm normalization step for calculating the momentum term,

g_{t+1} = μ · g_t + ∇x J(y_t, y) / ‖∇x J(y_t, y)‖_1,

and employs the sign function to adjust the algorithm’s final update direction,

x_{t+1} = Π_{x,ε}(x_t + α · sign(g_{t+1})).

However, these operations, which break the strict mathematical equivalence between NI-FGSM and NAG, carry the risk of distorting gradient information, which may lead to suboptimal algorithmic performance. This further exacerbates the gap between theoretical convergence in optimization (NAG) and practical effectiveness in adversarial attacks (NI-FGSM).
Adaptive Nesterov’s accelerated gradient
Fortunately, sign-based methods are closely connected to Adam-type methods [16]. Therefore, adaptive step-sizes offer a viable alternative to mitigate the potential drawbacks associated with the sign function. By incorporating them into the iterative steps of NAG, we propose the AdaNAG method:

y_t = x_t + μ_t (x_t − x_{t−1}),
g_t = ∇f(y_t),
v_t = v_{t−1} + g_t g_t,
x_{t+1} = y_t − α H_t^{−1} g_t,

where v_0 = 0, μ_t is the momentum parameter, H_t = diag(√(v_t) + δ) with α the step size, and δ is a small constant to avoid division by zero. As we can see, unlike the step-size strategy L^{−1} used in NAG for smooth optimization problems or the sign function employed in NI-FGSM for adversarial attacks, AdaNAG introduces an adaptive strategy H_t^{−1}, which accumulates historical squared gradient information. Here, H_t is a diagonal matrix, with its diagonal entries representing the updated weights for the parameters in each dimension. Compared to the sign function, these entries may preserve more useful information. Moreover, given a clean image x and its corresponding label y, this method can also be used to generate adversarial examples. Consequently, AdaNAG bridges the gap between convex optimization theory and adversarial attack practice and, on the other hand, enhances interpretability by replacing problematic sign-based operations with adaptive step-sizes.
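Put together, an AdaNAG attack iteration might look like the sketch below. This is our reading of the update in the attack setting: `loss_grad` is a hypothetical stand-in for the surrogate model’s gradient (ascent direction), and the ε-ball projection is included because adversarial examples must stay near the clean image.

```python
import numpy as np

def adanag_attack(x, loss_grad, alpha, eps, steps, mu=0.9, delta=1e-8):
    """AdaNAG sketch for attacks: NAG-style look-ahead plus an adaptive
    step built from accumulated squared gradients (replacing sign)."""
    x_adv, x_prev = x.copy(), x.copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        y = x_adv + mu * (x_adv - x_prev)          # look-ahead point
        g = loss_grad(y)                           # gradient at y_t
        v = v + g * g                              # squared-gradient history
        x_prev = x_adv
        x_adv = y + alpha * g / (np.sqrt(v) + delta)   # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project to the eps-ball
    return x_adv

x = np.array([0.4, 0.6])
x_adv = adanag_attack(x, lambda z: np.array([1.0, -1.0]),
                      alpha=0.01, eps=4 / 255, steps=10)
```

Unlike the sign step, per-dimension magnitudes of g survive in the update, scaled down by the accumulated history in √(v_t).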
Diffusion-based adaptive Nesterov’s accelerated gradient
To further enhance stealthiness, we integrate a diffusion mechanism into AdaNAG. In the update process of AdaNAG, we first obtain the weighted form of the perturbed image by linearly interpolating between the perturbed image x_t and the clean image x with a weight μ_t by y_t = x_t + μ_t (x_t − x). Then we calculate the gradient of the loss function J with respect to the perturbed image y_t and update x_{t+1} along the gradient ascent direction:

x_{t+1} = Π_{x,ε}( y_t + α H_t^{−1} ∇_{y_t} J(y_t, y) ).
In the spirit of Diff-PGD, we instead use the purified image ỹ_t obtained by applying SDEdit to y_t with K reverse steps. Then the update step of x_{t+1} in Diff-AdaNAG becomes

x_{t+1} = Π_{x,ε}( y_t + α H_t^{−1} ∇_{y_t} J(ỹ_t, y) ).

It can be observed that when computing the gradient, differentiation is taken with respect to y_t. This means we need to compute the adversarial gradient ∂J(ỹ_t, y)/∂y_t through back-propagation. For this reason, K cannot be too large due to GPU memory constraints.
Accelerated diff-AdaNAG with gradient approximation
To improve computational efficiency, we adopt the memory-saving gradient approximation strategy proposed in [32,41,42] along with fast sampling techniques [36]. Instead of backpropagating through the full K-step SDEdit process, which incurs high GPU memory costs, we approximate the Jacobian of the K-step outputs with a constant c, leading to the simplification ∂ỹ_t/∂y_t ≈ cI. This allows us to compute the gradient only at ỹ_t without storing intermediate gradients. So the adversarial gradient can be computed by

∂J(ỹ_t, y)/∂y_t ≈ c · ∇_{ỹ_t} J(ỹ_t, y).

Additionally, we accelerate inference by leveraging the DDIM [36] sampling strategy, which sub-samples the original diffusion steps (T) into a reduced set (Ts). Instead of using the full K steps in SDEdit, we scale the schedule down accordingly, significantly improving efficiency (e.g., Ts ≪ T). This reduces the number of function evaluations while preserving the effectiveness of SDEdit.
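The constant-Jacobian shortcut effectively treats the K-step SDEdit as a pass-through in the backward pass, similar in spirit to a straight-through estimator. A minimal sketch, where `sdedit` and `loss_grad` are hypothetical stand-ins for the diffusion purifier and the surrogate loss gradient:

```python
import numpy as np

def approx_adv_grad(y_t, sdedit, loss_grad, c=1.0):
    """Memory-saving approximation: run SDEdit forward only (no graph is
    stored), take the loss gradient at the purified image, and scale by
    the constant c that stands in for the K-step Jacobian."""
    y_pur = sdedit(y_t)            # forward pass only; nothing stored
    return c * loss_grad(y_pur)    # gradient evaluated at the purified image

y_t = np.array([0.2, 0.8])
g = approx_adv_grad(y_t, sdedit=lambda z: 0.9 * z, loss_grad=lambda z: 2.0 * z)
```

In a framework like PyTorch this corresponds to running the purifier under `no_grad` and back-propagating only through the classifier.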
By leveraging gradient approximation alongside DDIM acceleration, our method achieves substantial savings in both computation time and VRAM usage, since it operates without gradient computation through the diffusion process and relies solely on the DDIM inference model, all while maintaining a high success rate in global attack tasks. Finally, Algorithm 1 shows our method in its entirety.
Algorithm 1 Diff-AdaNAG.
Experiments
Experimental setting
Datasets. We evaluate our Diff-AdaNAG algorithm on a subset of the ImageNet validation set [43], consisting of 500 images randomly selected from the ILSVRC 2012 validation set (https://www.image-net.org/challenges/LSVRC/2012/index.php).
Models. In order to demonstrate the superiority of the method, we conduct experiments on different models as much as possible. These models include: a series of ResNet models [44] including ResNet-18 (Res-18), ResNet-34 (Res-34), ResNet-50 (Res-50), and ResNet-101 (Res-101), EfficientNet-b0 (EffNet) [45], GoogLeNet (GooLeNet) [46], Inception-v3 (Inc-v3) [47], MNASNet0-5 (MNAS) [48], MobileNet-v3-small (MobNet) [49], ShuffleNet-v2-x0-5 (ShufNet) [50], SqueezeNet-1-1 (SqueNet) [51] and VGG11 (VGG-11) [52]. These models are all pre-trained models from the torchvision library [53] and are used as both source models for generating adversarial examples and target models for testing these adversarial examples.
Baselines. We compare the proposed Diff-AdaNAG algorithm with several state-of-the-art adversarial attacks, including PGD [4], NI-FGSM [11], Diff-PGD [8], and AdaMSI-FGM [16]. Our focus is on transfer-based attacks, where no query access to the target model is granted. Therefore, we do not compare with query-based methods.
Evaluation metrics. We evaluate performance using the standard metric for adversarial attacks, Attack Success Rate (ASR), defined as the percentage of adversarial examples that are misclassified by the target model. Higher ASR values indicate better adversarial example quality. Additionally, we leverage the Fréchet Inception Distance (FID) [54] as an indicator of the human imperceptibility of the crafted adversarial examples. A full-reference metric, LPIPS [55], is also used to assess perceptual differences.
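Concretely, ASR reduces to a misclassification rate over the crafted examples; a minimal sketch:

```python
import numpy as np

def attack_success_rate(pred_labels, true_labels):
    """ASR: fraction of adversarial examples misclassified by the target."""
    pred = np.asarray(pred_labels)
    true = np.asarray(true_labels)
    return float(np.mean(pred != true))

# 2 of 4 adversarial examples flip the prediction -> ASR = 0.5.
asr = attack_success_rate([1, 2, 3, 4], [1, 0, 0, 4])
```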
Implementation details. Our experiments were conducted using an NVIDIA L20 GPU. Some experiments follow [56] to support the state-of-the-art baselines, and Diff-PGD follows [8]. The software versions used are Ubuntu 22.04.5, Python 3.11.11, PyTorch 2.5.1+cu124, and Torchvision 0.20.1. Previous research indicates that more iterations are favorable for convergence [57]. In order to better demonstrate the convergence process, n = 20 iterations are adopted in our experiments. The maximum ℓ∞-norm perturbation is ε = 4/255, and the batch size is set to 64 for all algorithms. For MI-FGSM and NI-FGSM, we adopt the default momentum parameter μ = 1.0 and the default step-size. For PGD, Diff-PGD, and AdaMSI-FGM, the step-size follows the corresponding default settings. The remaining hyperparameters of AdaMSI-FGM were set to their default values, following the original paper [16]. For Diff-AdaNAG, the momentum parameter, step-size, and diffusion settings are specified in our released code.
Results of adversarial attacks
In this section, we report various experimental results to demonstrate the effectiveness of the proposed Diff-AdaNAG method. First, we compare the performance of Diff-AdaNAG with several classic black-box and white-box attacks. Then, we compare the convergence between the proposed method and existing methods. Furthermore, we show the flexibility of the proposed method by combining Diff-AdaNAG with other existing black-box attacks such as Diverse Input Method (DIM) [58], Translation-Invariant Method (TIM) [59], and Scale-Invariant Method (SIM) [11]. Additionally, we also conduct an in-depth comparison with SI-NI-FGSM and several advanced feature-level attacks. Finally, we visualize the generated adversarial examples to demonstrate the stealthiness of the proposed method.
Comparison with classic attacks.
We compare the performance of Diff-AdaNAG with the classic attacks. The ASRs against the considered models are shown in Table 1.
As can be seen from Table 1, all algorithms exhibit strong white-box attack performance, with nearly 100% ASRs against all white-box models. On the other hand, our Diff-AdaNAG outperforms all other methods in black-box attacks by integrating AdaNAG and diffusion-based schemes.
Convergence comparison with Diff-PGD.
To compare the convergence of the proposed Diff-AdaNAG, we use Diff-AdaNAG, Diff-PGD, and PGD to generate adversarial examples on ResNet-18, ResNet-34, ResNet-50, and VGG11. Specifically, the adversarial examples are crafted on these models with various numbers of iterations ranging from 1 to 10, and we log the attack success rates (ASR) of the generated adversarial examples and the loss values at each iteration. The results are shown in Fig 2. It can be observed that Diff-PGD converges faster than PGD, which is consistent with the results in [8] and verifies the effectiveness of introducing the diffusion mechanism. Furthermore, our Diff-AdaNAG achieves the best convergence performance among all methods, yielding higher attack success rates than the other methods with the same number of iterations. Viewed differently, Diff-AdaNAG reaches the same attack success rates as the other methods with fewer iterations. This indicates that the proposed method achieves better convergence by combining the diffusion-based scheme with NAG.
The adversarial examples are generated on ResNet-18 (a), ResNet-34 (b), ResNet-50 (c), and VGG11 (d). The x-axis represents the number of iterations, and the y-axis represents the attack success rates and the logarithm of loss.
Flexibility.
To enhance the flexibility and transferability of our proposed method, we incorporate mechanisms from several existing black-box attack techniques. Specifically, we explore combinations of our method (Diff-AdaNAG) with three representative approaches: DIM [58], TIM [59], and SIM [11]. In our experiments, the baseline is an iterative FGSM variant (namely BIM or I-FGSM). We systematically denote the integrated variants—for instance, “DI” represents DIM combined with I-FGSM, whereas “DI+Ours” indicates DIM integrated with our proposed method. We conduct extensive experiments to evaluate both individual and joint combinations of these methods. In addition to assessing DI, TI, and SI separately, we also experiment with pairwise combinations such as DI+SI, DI+TI, and SI+TI, as well as the full combination DI+TI+SI.
From Table 2, it is evident that our method, when combined with DIM (denoted DI+Ours) and SIM (denoted SI+Ours), consistently outperforms their corresponding baselines (DI and SI alone). Specifically, DI+Ours and SI+Ours demonstrate clearly higher attack success rates across various model combinations. This indicates that our method has strong compatibility and synergy with DIM and SIM. However, when our method is integrated with TIM (denoted TI+Ours), although performance improves compared to the baseline (TI alone), it remains inferior to our method without TI (see Table 1). The reason could be that TIM inherently averages gradients over spatial translations, potentially diminishing the strength of location-sensitive adversarial features generated by our method. Consequently, while TI enhances translation invariance, it may weaken essential high-frequency perturbations required for successful cross-model attacks, thus leading to comparatively lower performance. The results of other source models (MNASNet, MobileNet, etc.) are provided in S1 Table, S2 Table, and S3 Table.
To further verify that the performance degradation of TI stems from its intrinsic issues, we conducted experiments on pairwise combinations of TI, DI, and SI, with results presented in Table 3. Our experiments show that combining DI and SI with our method (DI+SI+Ours) achieves a significantly higher attack success rate than using DI, SI, or our method alone, indicating strong synergy between our method and DI/SI. However, when any method is combined with TI, the performance drops significantly, even below the baselines (e.g., DI+TI underperforms DI, and SI+TI underperforms SI). This confirms that TI’s inherent inflexibility negatively impacts performance. In contrast, our method is plug-and-play and consistently outperforms baselines across all scenarios, demonstrating the superior transferability and flexibility of Diff-AdaNAG. The results of other source models (MNASNet, MobileNet, etc.) are provided in S4 Table, S5 Table, and S6 Table.
When extending the analysis to combinations involving all three transformations (SI+TI+DI), the experimental results align with the above observations. As shown in Table 4, SI+TI+DI outperforms SI+TI and DI+TI but underperforms DI+SI, further confirming that TI detracts from overall attack success rates when combined with other methods. Nevertheless, incorporating our method (SI+TI+DI+Ours) significantly enhances the baseline combination (SI+TI+DI), boosting its transferability. The results of other source models (MNASNet, MobileNet, etc.) are provided in S7 Table.
Further comparison with NI-FGSM.
As stated in the first comparison of this section, our method outperforms all compared algorithms, including NI-FGSM, in both white-box and black-box scenarios. To further highlight the performance gap between our method and NI-FGSM, we conduct a deeper comparison. Reference [11] introduced two adversarial attack methods, NI-FGSM and SIM, and demonstrated that SIM can be combined with other attack methods to enhance performance. This conclusion is also validated in the subsection above (e.g., SI+DI outperforms DI, SI+TI outperforms TI, and SI+TI+DI outperforms TI+DI). Reference [11] further emphasizes that combining SIM with NI-FGSM to form the SI-NI-FGSM method significantly improves attack performance, particularly in challenging black-box scenarios. Therefore, we compare our method with SI-NI-FGSM by introducing SI to form SI-DiffAdaNAG (SI+Ours). The adversarial examples are crafted on ResNet-18 with various numbers of iterations ranging from 0 to 20 and then transferred to attack ResNet-34, ResNet-50, and ResNet-101.
As shown in Fig 3(a), SIM, combined with our proposed method (denoted as SI+Ours), consistently achieves superior attack success rates compared to the original SI-NI-FGSM across different ResNet model combinations. Specifically, when attacking ResNet-18 itself, both methods show rapidly increasing success rates with more steps. However, SI+Ours reaches near-perfect success faster, indicating stronger attack capability. When the adversarial examples generated by ResNet-18 are transferred to black-box settings by choosing ResNet-34, ResNet-50, and ResNet-101 as the target models, our method consistently achieves higher success rates than SI-NI-FGSM as the step count increases, demonstrating improved transferability and robustness. Notably, at step 20, the attack success rate of SI+Ours exceeds SI-NI-FGSM by approximately 5%, highlighting a substantial improvement in the transferability.
(a) Attack success rates of different model combinations with increasing steps, attacks are launched from ResNet-18 and transferred to four target models: ResNet-18 (white-box), ResNet-34, ResNet-50, and ResNet-101 (black-box settings); (b) Attack success rates of different models, the source model is ResNet-18.
Additionally, the comparative results in Fig 3(b) further reinforce the effectiveness of our enhanced method across a variety of architectures, including ResNet series, EfficientNet, GoogLeNet, Inception-v3, and other models described in the experimental setting. Across all tested architectures, SI+Ours consistently surpasses the baseline SI-NI-FGSM, with improvements ranging from approximately 3% to 10% in terms of attack success rates. This consistent improvement across diverse architectures demonstrates the generalization ability of our method in black-box transfer settings. The results for the remaining models (MNASNet, MobileNet, etc.) are provided in S2 Table.
Overall, the performance gap between NI-FGSM and Diff-AdaNAG further illustrates the divide between the NAG method in optimization theory and its adversarial attack practice. The results not only indicate that Diff-AdaNAG has better transferability, but also demonstrate that, with its look-ahead property, Diff-AdaNAG can accelerate the generation of adversarial examples, making it highly effective for generating transferable adversarial attacks in practical scenarios.
Further comparison with advanced feature-level attacks.
In this subsection, we compare our approach with several advanced feature-level attacks that incorporate additional techniques, such as exploiting information extracted from intermediate layers of classifiers (FIA [18], NEAA [19], MFAA [20]), as well as leveraging channel redundancy and invariance to perturb convolutional features (CAAM [21]).
It should be noted that all comparative experiments in this paper were conducted with a perturbation budget of 4/255, rather than the 16/255 adopted in the original papers. The models evaluated in this subsection include ResNet-152 (Res-152) [44], Inception-v3 (Inc-v3) [47], Inception-v4 (Inc-v4) and Inception-ResNet-v2 (IncRes-v2) [60], as well as VGG16 (VGG-16) and VGG19 (VGG-19) [52]. For FIA, NEAA, and MFAA, we adopt the attack layers Mixed_5b for Inc-v3, Mixed_5a for Inc-v4, conv2d_4a for IncRes-v2, Conv3_3 for VGG-16, Conv3_4 for VGG-19, and the final layer of the second block for Res-152. For MFAA, which fuses feature representations from multiple layers, we select supplementary layers in addition to the designated attack layer. Specifically, for Res-152 we follow the configuration provided in its official GitHub repository (https://github.com/KWPCCC/MFAA) and include unit 9 of the third block (layer3.8), unit 19 of the third block (layer3.18), unit 29 of the third block (layer3.28), and the final convolutional layer (layer4), which we summarize as [layer2, layer3.8, layer3.18, layer3.28, layer4]. For the remaining source models, where the original paper does not explicitly specify additional layers, we adopt a similar strategy to that of Res-152. Concretely, we use [Mixed_5b, Mixed_5d, Mixed_6e, Mixed_7c] for Inc-v3; [Mixed_5a, ReductionA, ReductionB, InceptionC] for Inc-v4; [conv2d_4a, mixed_6a, mixed_7a, conv2d_7b] for IncRes-v2; [conv3_3, conv4_3, conv5_3] for VGG-16; and [conv3_4, conv4_4, conv5_4] for VGG-19. All other implementation details are kept consistent with those reported in the Implementation Details of Experiments section. The ASRs against the six considered models are shown in Table 5.
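The per-model attack-layer configuration above can be collected into a single lookup table. The sketch below mirrors the layers listed in the text; the dictionary structure and the `attack_layers` helper are our own illustration, while the layer names follow the stated setup.

```python
# Attack layers for FIA/NEAA (single layer) and MFAA (multiple layers),
# as described in the text. Dictionary layout and helper are illustrative.
SINGLE_LAYER = {
    "Inc-v3": "Mixed_5b", "Inc-v4": "Mixed_5a", "IncRes-v2": "conv2d_4a",
    "VGG-16": "Conv3_3", "VGG-19": "Conv3_4",
    "Res-152": "layer2",  # final layer of the second block
}
MFAA_LAYERS = {
    "Res-152":   ["layer2", "layer3.8", "layer3.18", "layer3.28", "layer4"],
    "Inc-v3":    ["Mixed_5b", "Mixed_5d", "Mixed_6e", "Mixed_7c"],
    "Inc-v4":    ["Mixed_5a", "ReductionA", "ReductionB", "InceptionC"],
    "IncRes-v2": ["conv2d_4a", "mixed_6a", "mixed_7a", "conv2d_7b"],
    "VGG-16":    ["conv3_3", "conv4_3", "conv5_3"],
    "VGG-19":    ["conv3_4", "conv4_4", "conv5_4"],
}

def attack_layers(model, method):
    """Return the layer name(s) a feature-level attack hooks for `model`."""
    return MFAA_LAYERS[model] if method == "MFAA" else [SINGLE_LAYER[model]]
```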
The results show that in the white-box scenario, the proposed method consistently pushes the success rate to over 99% on VGG-16/19 and Res-152, and to 98.2% on Inc-v3, outperforming the second-best method by 2–14 absolute points. More importantly, our method also obtains the highest average black-box ASR (45.0%), surpassing MFAA (43.9%) and CAAM (42.8%) while leaving FIA/NEAA further behind (≈ 41%). The improvement is especially pronounced when Res-152 or either VGG model serves as the surrogate, yielding cross-model gains of +2.6% and +4.6% over the runner-up, respectively.
Human imperceptibility.
Quantitative analysis. In the last two columns of Tables 1 and 5, we report FID and LPIPS, where the former serves as an indicator of the human imperceptibility of the crafted adversarial examples and the latter assesses their perceptual differences. From the results in Table 1, we observe that our method outperforms the others on both metrics except for PGD. Although PGD achieves slightly better imperceptibility scores, this may stem from its weaker attack strength, as reflected in its inferior ASR performance. By contrast, our method consistently achieves higher ASR while maintaining favorable perceptual quality. From the results in Table 5, Diff-AdaNAG exhibits the smallest perceptual distortion, achieving an average FID of 20.11 and LPIPS of 0.03—roughly 40% lower than the second-best CAAM. These results corroborate that the adaptive NAG momentum and diffusion-based noise scheduling embedded in our method successfully concentrate the perturbation energy on transferable yet perceptually redundant directions, delivering the best trade-off between transfer strength and visual stealth among the compared methods.
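For reference, the FID numbers above are Fréchet distances between Gaussian fits of two feature distributions. Below is a minimal numpy sketch of this distance; in practice the features come from an Inception network, whereas the toy inputs here are synthetic and purely illustrative.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """FID between two feature sets:
    ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a^(1/2) S_b S_a^(1/2))^(1/2))."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    def psd_sqrt(m):
        # Matrix square root of a symmetric PSD matrix via eigendecomposition.
        w, v = np.linalg.eigh(m)
        return (v * np.sqrt(np.clip(w, 0, None))) @ v.T

    s = psd_sqrt(cov_a)
    covmean = psd_sqrt(s @ cov_b @ s)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2 * covmean))

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 4))
fid_same = frechet_distance(x, x)        # identical features -> ~0
fid_diff = frechet_distance(x, x + 3.0)  # mean shift of 3 in 4 dims -> ~36
```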
Qualitative analysis. We visualize fifteen adversarial examples generated by PGD, NI-FGSM, Diff-PGD, AdaMSI-FGM, and our method. The original images are shown in Fig 4(a). We choose ResNet-34 as the source model. The resulting adversarial examples are displayed in Fig 4(b)–4(f). Notably, all of these adversarial perturbations are imperceptible to humans. The experiments demonstrate that the proposed method boosts transferability while preserving stealthiness.
Fig 4. (a) shows the original images; (b)–(f) show the adversarial examples generated by the five methods.
Parameter sensitivity and computational cost.
The diffusion model used in Diff-AdaNAG is moderately sensitive to the DDIM sampling parameters. Specifically, the number of sampling steps Ts and the number of steps Ks used in SDEdit should not be too large. In our experiments, using DDIM50 (Ts = 50) with Ks = 1 achieves the best trade-off between attack success rate and imperceptibility. Increasing Ts (e.g., to DDIM100) or enlarging Ks tends to reduce transferability while slightly improving imperceptibility. The parameters adopted in this paper are chosen based on this balance. In addition, we analyze the two remaining hyperparameters: the initial hyperparameter, which is set to 1 as it is theoretically required to lie within the range (0,1], and the step size η, which is selected through a grid search over {0.001, 0.005, 0.01, 0.05, 0.1}, with the optimal value adopted in our experiments. In terms of computational cost, most of the runtime comes from the diffusion denoising process, while the adaptive Nesterov optimization adds negligible overhead. Although Diff-AdaNAG is somewhat slower than purely gradient-based attacks, it remains computationally feasible and can be efficiently parallelized on GPUs.
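The step-size selection described above is a plain grid search over a small candidate set. A minimal sketch follows; the `evaluate` function here is a hypothetical stand-in for running the attack with a given step size and measuring validation attack success rate.

```python
# Grid search over the step-size candidates listed in the text.

def grid_search(candidates, evaluate):
    """Return the candidate with the highest evaluation score."""
    return max(candidates, key=evaluate)

def evaluate(eta):
    # Illustrative stand-in for a validation ASR curve peaking at eta = 0.01.
    return -abs(eta - 0.01)

best_eta = grid_search([0.001, 0.005, 0.01, 0.05, 0.1], evaluate)
```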
Discussion
The utilization of Diff-AdaNAG for adversarial sample generation can be viewed as analogous to training neural networks via adaptive NAG optimization augmented with a diffusion mechanism. This intrinsic relationship endows the proposed method with transferability that inherits the accelerated convergence properties and stable generalization guarantees native to NAG. The work presented in this paper further inspires the idea that advanced algorithms in optimization can be leveraged to enhance adversarial attack performance. Future convergence proofs for such algorithms will provide theoretical support for the interpretability of adversarial attack algorithms.
Additionally, a key innovation lies in the adaptive step-size mechanism, which enables automated per-dimension learning-rate assignment for adversarial perturbations. This refinement not only optimizes the noise manifold but also strategically reduces update magnitudes in perceptually non-salient regions, thereby minimizing the perturbation norm while maintaining attack success rates and enabling stealthy adversarial attacks that are imperceptible to the human eye. The introduction of the diffusion mechanism not only significantly enhances the concealment of generated samples but can also be utilized for style-customized adversarial sample generation to adapt to physical-world attacks. This will be pursued as part of our future research.
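To make the interplay of Nesterov look-ahead and per-dimension step sizes concrete, below is a generic numpy sketch of one such update for a perturbation under an l-infinity budget. This is our own illustrative reading under stated assumptions (AdaGrad-style gradient accumulator, untargeted loss ascent, clipping-based projection); it is not the exact Diff-AdaNAG update, which additionally interleaves the diffusion mechanism.

```python
import numpy as np

def adanag_attack(grad_fn, x, eps=4/255, eta=0.01, mu=0.9, steps=10):
    """Maximize a loss via Nesterov look-ahead plus adaptive per-dimension steps.

    grad_fn(x_adv) returns the loss gradient w.r.t. the input; the
    perturbation is kept within an l-infinity ball of radius `eps`.
    """
    delta = np.zeros_like(x)
    velocity = np.zeros_like(x)
    accum = np.zeros_like(x)                       # sum of squared gradients
    for _ in range(steps):
        lookahead = x + delta + mu * velocity      # NAG: peek one step ahead
        g = grad_fn(lookahead)
        accum += g * g
        step = eta * g / (np.sqrt(accum) + 1e-12)  # per-dimension step size
        velocity = mu * velocity + step
        delta = np.clip(delta + velocity, -eps, eps)  # project onto budget
    return x + delta

# Toy check: ascend the quadratic loss L(z) = ||z||^2, whose gradient is 2z.
x0 = np.array([0.1, -0.2, 0.05])
x_adv = adanag_attack(lambda z: 2 * z, x0)
```

On the toy quadratic, each coordinate is pushed away from zero (increasing the loss) while the perturbation stays inside the 4/255 budget, mirroring how the adaptive scaling keeps individual updates small.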
Conclusions
This paper introduces an innovative adversarial attack framework that leverages a diffusion-based perturbation synthesis mechanism to generate visually imperceptible noise patterns. By synergistically integrating momentum extrapolation principles with adaptive step-size scaling, the proposed method unlocks the untapped potential of the vanilla Nesterov’s Accelerated Gradient (NAG) optimizer in crafting transferable adversarial examples under black-box constraints. Extensive experiments on the ImageNet dataset validate the effectiveness of our algorithm, demonstrating its ability to combine with various transfer-based black-box adversarial attacks to boost transferability while maintaining the stealthiness of adversarial attacks.
Supporting information
S1 Table. Supplementary Results for Table 2 (a).
Attack success rates (%) of adversarial attacks against twelve models. This table contains the supplementary results for the subsection Flexibility.
https://doi.org/10.1371/journal.pone.0337463.s001
(XLSX)
S2 Table. Supplementary Results for Table 2 (b) and Fig 3.
Attack success rates (%) of adversarial attacks against twelve models. This table contains the supplementary results for the subsection Flexibility and Further Comparison with NI-FGSM.
https://doi.org/10.1371/journal.pone.0337463.s002
(XLSX)
S3 Table. Supplementary Results for Table 2 (c).
Attack success rates (%) of adversarial attacks against twelve models. This table contains the supplementary results for the subsection Flexibility.
https://doi.org/10.1371/journal.pone.0337463.s003
(XLSX)
S4 Table. Supplementary results for Table 3 (a).
Attack success rates (%) of adversarial attacks against twelve models. This table contains the supplementary results for the subsection Flexibility.
https://doi.org/10.1371/journal.pone.0337463.s004
(XLSX)
S5 Table. Supplementary results for Table 3 (b).
Attack success rates (%) of adversarial attacks against twelve models. This table contains the supplementary results for the subsection Flexibility.
https://doi.org/10.1371/journal.pone.0337463.s005
(XLSX)
S6 Table. Supplementary results for Table 3 (c).
Attack success rates (%) of adversarial attacks against twelve models. This table contains the supplementary results for the subsection Flexibility.
https://doi.org/10.1371/journal.pone.0337463.s006
(XLSX)
S7 Table. Supplementary results for Table 4.
Attack success rates (%) of adversarial attacks against twelve models. This table contains the supplementary results for the subsection Flexibility.
https://doi.org/10.1371/journal.pone.0337463.s007
(XLSX)
References
- 1.
Goodfellow IJ, Shlens J, Szegedy C. Explaining and Harnessing Adversarial Examples. In: 2015 International Conference on Learning Representations (ICLR). San Diego, CA, USA; 2015.
- 2.
Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow IJ, et al. Intriguing properties of neural networks. In: 2014 International Conference on Learning Representations (ICLR). Banff, AB, Canada; 2014.
- 3.
Balles L, Hennig P. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients. In: 2018 International Conference on Machine Learning (ICML). Stockholm, Sweden; 2018. p. 404–13.
- 4.
Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. In: 2018 International Conference on Learning Representations (ICLR). Vancouver, Canada; 2018.
- 5.
Moosavi-Dezfooli S, Fawzi A, Fawzi O, Frossard P. Universal adversarial perturbations. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA. 2017. p. 86–94.
- 6.
Athalye A, Carlini N, Wagner DA. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: 2018 International Conference on Machine Learning (ICML), Stockholm, Sweden. 2018. p. 274–83.
- 7.
Kang M, Song D, Li B. DiffAttack: evasion attacks against diffusion-based adversarial purification. In: 37th International Conference on Neural Information Processing Systems (NeurIPS). vol. 36. New Orleans, LA, USA; 2023. p. 73919–42.
- 8.
Xue H, Araujo A, Hu B, Chen Y. Diffusion-based adversarial sample generation for improved stealthiness and controllability. In: 37th International Conference on Neural Information Processing Systems (NeurIPS). vol. 36. New Orleans, LA, USA, 2023. p. 2894–921.
- 9.
Wu S, Sang Q. SAM and diffusion based adversarial sample generation for image quality assessment. In: 7th Chinese conference on Pattern Recognition and Computer Vision (PRCV). Urumqi, China; 2024. p. 383–97.
- 10.
Dong Y, Liao F, Pang T, Su H, Zhu J, Hu X, et al. Boosting adversarial attacks with momentum. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. p. 9185–93. https://doi.org/10.1109/cvpr.2018.00957
- 11.
Lin J, Song C, He K, Wang L, Hopcroft JE. Nesterov accelerated gradient and scale invariance for adversarial attacks. In: 2020 International Conference on Learning Representations (ICLR). Virtual Only Conference; 2020.
- 12. Polyak BT. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics. 1964;4(5):1–17.
- 13. Nesterov YE. A method for solving the convex programming problem with convergence rate O(1/k2). Doklady Akademii Nauk SSSR. 1983;269(3):543–7.
- 14.
Karimireddy SP, Rebjock Q, Stich SU, Jaggi M. Error feedback fixes SignSGD and other gradient compression schemes. In: 2019 International Conference on Machine Learning (ICML). vol. 97; 2019. p. 3252–61.
- 15.
Balles L, Hennig P. Dissecting Adam: the sign, magnitude and variance of stochastic gradients. In: Proceedings of Machine Learning Research, Stockholm, Sweden, 2018. p. 413–22.
- 16. Long S, Tao W, Li S, Lei J, Zhang J. On the convergence of an adaptive momentum method for adversarial attacks. AAAI. 2024;38(13):14132–40.
- 17.
Kurakin A, Goodfellow I, Bengio S. Adversarial examples in the physical world. In: 2017 International Conference on Learning Representations (ICLR). Toulon, France; 2017.
- 18.
Wang Z, Guo H, Zhang Z, Liu W, Qin Z, Ren K. Feature importance-aware transferable adversarial attacks. In: 2021 IEEE International Conference on Computer Vision (ICCV). 2021. p. 7639–48.
- 19. Ke W, Zheng D, Li X, He Y, Li T, Min F. Improving the transferability of adversarial examples through neighborhood attribution. Knowledge-Based Systems. 2024;296:111909.
- 20. Zheng D, Ke W, Li X, Duan Y, Yin G, Min F. Enhancing the transferability of adversarial attacks via multi-feature attention. IEEE Trans Inform Forensic Secur. 2025;20:1462–74.
- 21. Zheng D, Ke W, Li X, Zhang S, Yin G, Qian W, et al. Channel-augmented joint transformation for transferable adversarial attacks. Appl Intell. 2023;54(1):428–42.
- 22.
Sohl-Dickstein J, Weiss E, Maheswaranathan N, Ganguli S. Deep unsupervised learning using nonequilibrium thermodynamics. In: 2015 International Conference on Machine Learning (ICML), Lille, France. 2015. p. 2256–65.
- 23.
Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. In: 34th International Conference on Neural Information Processing Systems (NeurIPS). vol. 33. Virtual Only Conference. 2020. p. 6840–51.
- 24.
Song Y, Sohl-Dickstein JN, Kingma DP, Kumar A, Ermon S, Poole B. Score-based generative modeling through stochastic differential equations. In: 2021 International Conference on Learning Representations (ICLR). Virtual Only Conference; 2021.
- 25.
Rombach R, Blattmann A, Lorenz D, Esser P, Ommer B. High-resolution image synthesis with latent diffusion models. In: 2022 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, LA, USA; 2022. p. 10684–95.
- 26.
Balaji Y, Nah S, Huang X, Vahdat A, Song J, Zhang Q. eDiff-I: text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint 2023.
- 27.
Saharia C, Chan W, Saxena S, Li L, Whang J, Denton EL, et al. Photorealistic text-to-image diffusion models with deep language understanding. In: 36th International Conference on Neural Information Processing Systems (NeurIPS). vol. 35. New Orleans, LA, USA. 2022. p. 36479–94.
- 28.
Ho J, Salimans T, Gritsenko A, Chan W, Norouzi M, Fleet DJ. Video diffusion models. In: 36th International Conference on Neural Information Processing Systems (NeurIPS). vol. 35. New Orleans, LA, USA. 2022. p. 8633–46.
- 29.
Ho J, Chan W, Saharia C, Whang J, Gao R, Gritsenko A. Imagen video: high definition video generation with diffusion models. arXiv preprint 2022.
- 30.
Pan X, Qin P, Li Y, Xue H, Chen W. Synthesizing coherent story with auto-regressive latent diffusion models. In: 2024 Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA; 2024. p. 2920–30.
- 31.
Lin C-H, Gao J, Tang L, Takikawa T, Zeng X, Huang X, et al. Magic3D: high-resolution text-to-3D content creation. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023. p. 300–9. https://doi.org/10.1109/cvpr52729.2023.00037
- 32.
Poole B, Jain A, Barron JT, Mildenhall B. DreamFusion: text-to-3D using 2D diffusion. In: 2023 International Conference on Learning Representations (ICLR). Kigali, Rwanda; 2023.
- 33.
Song Y, Shu R, Kushman N, Ermon S. Constructing unrestricted adversarial examples with generative models. In: 32nd International Conference on Neural Information Processing Systems (NeurIPS). vol. 31. Montréal, QC, Canada. 2018.
- 34.
Chen X, Gao X, Zhao J, Ye K, Xu CZ. AdvDiffuser: natural adversarial example synthesis with diffusion models. In: 2023 IEEE International Conference on Computer Vision (ICCV). Paris, France; 2023. p. 4562–72.
- 35.
Dai X, Liang K, Xiao B. AdvDiff: generating unrestricted adversarial examples using diffusion models. In: 2024 European Conference on Computer Vision (ECCV). Milan, Lombardy, Italy; 2024. p. 93–109.
- 36.
Song J, Meng C, Ermon S. Denoising diffusion implicit models. In: 2021 International Conference on Learning Representations (ICLR). Virtual Only Conference; 2021.
- 37.
Meng C, He Y, Song Y, Song J, Wu J, Zhu JY, et al. SDEdit: guided image synthesis and editing with stochastic differential equations. In: 2022 International Conference on Learning Representations (ICLR). Virtual Only Conference; 2022.
- 38.
Nie W, Guo B, Huang Y, Xiao C, Vahdat A, Anandkumar A. Diffusion models for adversarial purification. In: 2022 International Conference on Machine Learning (ICML). vol. 162. Baltimore, MD, USA. 2022. p. 16805–27.
- 39.
Zhang K, Zhou H, Zhang J, Huang Q, Zhang W, Yu N. Ada3Diff: defending against 3D adversarial point clouds via adaptive diffusion. In: 31st ACM International Conference on Multimedia. Ottawa ON, Canada; 2023.
- 40. Tseng P. Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math Program. 2010;125(2):263–95.
- 41.
Xue H, Liang C, Wu X, Chen Y. Toward effective protection against diffusion-based mimicry through score distillation. In: 2023 International Conference on Learning Representations (ICLR). Kigali, Rwanda; 2023.
- 42.
Chen Z, Li B, Wu S, Jiang K, Ding S, Zhang W. Content-based unrestricted adversarial attack. In: 37th International Conference on Neural Information Processing Systems (NeurIPS). vol. 36. New Orleans, LA, USA. 2023. p. 51719–33.
- 43.
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Miami, FL, USA; 2009. p. 248–55.
- 44.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA; 2016. p. 770–8.
- 45.
Tan M, Le Q. EfficientNet: rethinking model scaling for convolutional neural networks. In: 2019 International Conference on Machine Learning (ICML). vol. 97. Long Beach, California, USA. 2019. p. 6105–14.
- 46.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015. p. 1–9. https://doi.org/10.1109/cvpr.2015.7298594
- 47.
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA. 2016.
- 48.
Tan M, Chen B, Pang R, Vasudevan V, Sandler M, Howard A, et al. MnasNet: platform-aware neural architecture search for mobile. In: 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA; 2019.
- 49.
Howard A, Sandler M, Chen B, Wang W, Chen L-C, Tan M, et al. Searching for MobileNetV3. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 2019. p. 1314–24. https://doi.org/10.1109/iccv.2019.00140
- 50.
Ma N, Zhang X, Zheng HT, Sun J. ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: 2018 European Conference on Computer Vision (ECCV). Munich, Germany; 2018.
- 51.
Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint 2016.
- 52.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: 2015 International Conference on Learning Representations (ICLR). San Diego, CA, USA; 2015.
- 53.
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: an imperative style, high-performance deep learning library. In: 33rd International Conference on Neural Information Processing Systems (NeurIPS). vol. 32. Vancouver, Canada. 2019.
- 54.
Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local nash equilibrium. In: 31st International Conference on Neural Information Processing Systems (NeurIPS). vol. 30. Long Beach, CA, USA. 2017.
- 55.
Zhang R, Isola P, Efros AA, Shechtman E, Wang O. The unreasonable effectiveness of deep features as a perceptual metric. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT, USA; 2018.
- 56.
Kim H. Torchattacks: a PyTorch repository for adversarial attacks. arXiv preprint 2021.
- 57.
Pintor M, Demetrio L, Sotgiu A, Demontis A, Carlini N, Biggio B, et al. Indicators of attack failure: debugging and improving optimization of adversarial examples. In: 36th International Conference on Neural Information Processing Systems (NeurIPS). vol. 35. New Orleans, LA, USA. 2022. p. 23063–76.
- 58.
Xie C, Zhang Z, Zhou Y, Bai S, Wang J, Ren Z, et al. Improving transferability of adversarial examples with input diversity. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019. p. 2725–34. https://doi.org/10.1109/cvpr.2019.00284
- 59.
Dong Y, Pang T, Su H, Zhu J. Evading defenses to transferable adversarial examples by translation-invariant attacks. In: 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA; 2019. p. 4307–16.
- 60. Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. AAAI. 2017;31(1).