Fig 1.
The overall architecture of the proposed CT-GateNet model.
Fig 2.
Diagram of our Gated Channel-Spatial Attention (GCSA) mechanism.
Fig 3.
The Adaptive Feature Fusion Gate (AFFG) mechanism.
Table 1.
Overview of the experimental datasets.
Fig 4.
Representative spectrograms of music samples from the employed datasets: (a) Country genre from the GTZAN dataset, (b) Folk genre from the FMA-SMALL dataset, (c) Electronic genre from the FMA-Medium dataset.
Table 2.
DDIM audio generation algorithm.
Fig 5.
An example of the generation model on the GTZAN dataset.
Table 3.
Performance comparison with state-of-the-art methods on the GTZAN dataset.
Table 4.
Performance comparison of deep learning backbones and the proposed method on the GTZAN dataset.
Fig 6.
Performance metrics for each target music genre on the GTZAN dataset.
Fig 7.
GTZAN normalized confusion matrix (test set).
Table 5.
Performance comparison with state-of-the-art methods on the FMA-SMALL dataset.
Table 6.
Performance comparison of deep learning backbones and the proposed method on the FMA-SMALL dataset.
Fig 8.
Performance metrics for each target music genre on the FMA-SMALL dataset.
Fig 9.
FMA-SMALL normalized confusion matrix (test set).
Table 7.
Performance comparison with state-of-the-art methods on the FMA-Medium dataset.
Table 8.
Performance comparison of deep learning backbones and the proposed method on the FMA-Medium dataset.
Fig 10.
Performance metrics for each target music genre on the FMA-Medium dataset.
Fig 11.
FMA-Medium normalized confusion matrix (test set).
Table 9.
The ablation experiment results on the GTZAN dataset.
Table 10.
The ablation experiment results on the FMA-SMALL dataset.
Table 11.
The ablation experiment results on the FMA-Medium dataset.
Table 12.
Performance comparison of different data augmentation methods on the GTZAN and FMA-SMALL datasets.