A robust zero-watermarking scheme based on non-negative matrix factorization for audio protection

Abstract

Copyright protection of digital products is becoming an increasingly prominent problem, and digital watermarking has therefore attracted the attention of many experts and scholars in the field of information security. Among the proposed technologies, zero-watermarking is particularly favored for its excellent imperceptibility. In this paper, a novel robust audio zero-watermarking scheme is designed by applying the non-negative matrix factorization (NMF) algorithm to zero-watermarking. First, the proposed scheme divides the input audio signal into fixed frames and applies the fast Fourier transform (FFT) and NMF to extract a feature vector of the original audio signal. Then, the feature vector is XORed with the digital watermark sequence to achieve zero-watermark embedding. The experimental results show that the proposed scheme resists common attacks and frame-desynchronization attacks more effectively than existing zero-watermarking schemes.

Introduction

Nowadays, due to the rapid development of network and computer communication technology, multimedia files can be easily distributed, shared and manipulated [1, 2]. This dramatically increases the demand for copyright protection, and digital watermarking is a promising technology for protecting digital audio products from tampering [3]. In particular, zero-watermarking binds the copyright information to features of the original audio signal without modifying the signal, so it is imperceptible, and it has become a main focus of researchers. However, a good zero-watermarking scheme also needs to be robust enough to resist many intentional or unintentional attacks [4, 5]. This paper therefore focuses on improving robustness.

Watermarking technology can be divided into non-blind (including semi-blind) and blind watermarking according to whether the original audio signal and watermark are needed for watermark extraction. Non-blind watermarking requires the original audio and watermark to participate in watermark extraction, while blind watermarking does not [6]. Blind watermarking is therefore more valuable in practical applications. The audio zero-watermarking scheme designed in this paper belongs to blind watermarking, an area that has already been explored extensively.

Existing zero-watermarking schemes can be broadly categorized into two groups: time-domain based and transform-domain based. Transform-domain methods are more popular than time-domain methods because of their simplicity and effectiveness. At present, the advanced zero-watermarking schemes are mainly based on the discrete cosine transform (DCT) [7], the discrete wavelet transform (DWT) [8], singular value decomposition (SVD) [9] or a combination of multiple transform domains [10, 11].

Zero-watermarking schemes based on the transform domain exploit essential characteristics of the original audio signal. For example, an audio zero-watermarking scheme based on the DCT coefficients symbol is proposed in [12]. It applies the DCT to the mean absolute value of each frame and then selects the maximum DCT coefficient to embed the watermark. In [13], the DWT is applied to the host signal to obtain the approximate power, or energy, of each audio segment, and the watermark is then extracted according to the relative energy of consecutive segments. This scheme is a suitable candidate for audio copyright protection, but it has a weak point: because the watermark is constructed from the relationship between adjacent segments, it is not robust against frame-desynchronization attacks. Lei et al. [11] propose a scheme based on DCT-DWT-SVD that effectively resists common attacks. The watermark is registered by performing SVD on the coefficients generated after the DWT and DCT. However, it still leaves room for improvement; for example, under a low-pass filter attack its NC value is less than 0.96.

Among more recent techniques, the phase spectrum of the short-time Fourier transform (STFT) has been explored to generate the watermark [14], but that work studies only three types of attacks and does not consider others. Xu et al. [15] propose an audio zero-watermarking method based on sparse representation, which adopts the orthogonal matching pursuit (OMP) and K-SVD algorithms. The authors of [16] propose a zero-watermarking technique based on the graph Fourier transform (GFT): they use a combined shift operator to construct the graph signal and then select the stable graph Fourier coefficients for encoding. These two schemes are robust against various attacks. Our experimental results show that the NC values of the proposed scheme can reach more than 99% under common attacks, so the robustness of the proposed scheme is comparable to that of the schemes in [15, 16].

This paper proposes an audio zero-watermarking scheme based on non-negative matrix factorization in order to further improve robustness. The reasons for using the NMF algorithm are explained below.

  • In image recognition and speech signal processing, audio and images are generally represented by high-dimensional data matrices, so handling multi-dimensional data has become an urgent problem for researchers. Non-negative matrix factorization is a simple, intuitive and effective technique that decomposes the original high-dimensional data matrix into a basis matrix and a coefficient matrix of lower dimension [17, 18].
  • More importantly, all elements of the factor matrices are non-negative, which not only reduces the storage space but also makes the decomposition results sparse. This sparsity can better represent the essential characteristics of the original audio signal [19]. Furthermore, the non-negativity constraint overcomes a limitation of SVD, whose factors may contain negative entries.
  • In this paper, we use the maximum value of each group of coefficients to represent each frame of the audio signal and then encode all the maximum values. Comparative experiments with several zero-watermarking schemes show that the proposed scheme resists attacks better and improves robustness.

Work contributions

Due to the rapid development of network technology, multimedia files can be easily distributed, shared and manipulated, so the demand for copyright protection is increasing day by day. The first key contribution of this work is to apply the NMF algorithm to zero-watermarking and to propose a new scheme for audio copyright protection.

Audio zero-watermarking is a promising audio copyright protection technology because of its excellent imperceptibility, and improving its robustness remains a goal pursued by researchers. The second key contribution of this work is that we have conducted experiments on a number of known data sets and compared the results with those of three related works. Experimental results show that the proposed scheme achieves satisfactory robustness against common attacks and frame-desynchronization attacks.

Materials and methods

The basic concept of non-negative matrix factorization

NMF is a matrix factorization algorithm with non-negativity constraints. Given a non-negative matrix V of size n × m, NMF looks for a non-negative basis matrix W of size n × r and a non-negative coefficient matrix H of size r × m that satisfy V ≈ W × H. A non-negative matrix is thus decomposed into the product of two non-negative matrices, and replacing the original data matrix with the coefficient matrix yields a dimensionality-reduced representation of the data features. The factorization problem is converted into the problem of minimizing the error between two matrices, as expressed by the following formula: (1) where V is the original matrix and V′ is the matrix to be updated (V′ = W × H). Vij is the element in row i and column j of V. To obtain the optimal value, the iterative update formulas used in the experiment for W and H are given below. (2) (3)

The NMF algorithm is divided into two parts: a training process and a testing process. During training, W and H are initialized randomly and then updated iteratively, where k counts the iterations. When Eq (1) converges and approaches 0, the input matrix V has been decomposed into W and H, and W is stored. During testing, the matrix V and the matrix W obtained from training are taken as input, and H is obtained according to the formula in Algorithm 2. Algorithms 1 and 2 describe the NMF training and testing processes.
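The numbered equations of this subsection are not reproduced in this text. As a reference point only, one widely used formulation is sketched below: the squared Euclidean error with the multiplicative updates of Lee and Seung. Whether Eqs (1) to (3) use this objective or a different one (for example a divergence) is an assumption here, not something the text confirms; the index a runs over the r columns of W.

```latex
% Assumed objective: squared Euclidean error between V and V' = W H
E(W, H) = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( V_{ij} - V'_{ij} \right)^{2},
\qquad V' = W \times H

% Assumed multiplicative updates (element-wise product and division)
W_{ia} \leftarrow W_{ia} \, \frac{(V H^{T})_{ia}}{(W H H^{T})_{ia}}

H_{aj} \leftarrow H_{aj} \, \frac{(W^{T} V)_{aj}}{(W^{T} W H)_{aj}}
```

Under these updates the entries of W and H stay non-negative as long as they are initialized with non-negative values, and the error does not increase from one iteration to the next.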

Algorithm 1: NMF training process

Input: V (n × m)

Initialize the random matrices W (n × r) and H (r × m).

for k ← 1 to iteration number do

   update W and H according to Eqs (2) and (3);

   V′ = W × H

end

Output: W (n × r), H (r × m)

Algorithm 2: NMF testing process

Input: V (n × m), W (n × r)

do

   compute H from V and the stored W;

Output: H (r × m)
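The two algorithms can be illustrated with a minimal pure-Python sketch. It assumes the Euclidean multiplicative updates, which may differ from the update formulas the authors used, and it assumes that the testing step iterates the H update with W held fixed. All function names (`nmf_train`, `nmf_test`, `update_w`, `update_h`) are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def update_h(v, w, h, eps=1e-12):
    """One assumed multiplicative update: H <- H * (W^T V) / (W^T W H)."""
    wt = transpose(w)
    num = matmul(wt, v)
    den = matmul(matmul(wt, w), h)
    return [[x * n / (d + eps) for x, n, d in zip(hr, nr, dr)]
            for hr, nr, dr in zip(h, num, den)]


def update_w(v, w, h, eps=1e-12):
    """One assumed multiplicative update: W <- W * (V H^T) / (W H H^T)."""
    ht = transpose(h)
    num = matmul(v, ht)
    den = matmul(w, matmul(h, ht))
    return [[x * n / (d + eps) for x, n, d in zip(wr, nr, dr)]
            for wr, nr, dr in zip(w, num, den)]


def random_matrix(rows, cols, rng):
    """Strictly positive random entries, so the updates never divide by zero."""
    return [[rng.random() + 1e-3 for _ in range(cols)] for _ in range(rows)]


def nmf_train(v, r, iterations=200, seed=0):
    """Training (cf. Algorithm 1): learn W (n x r) and H (r x m) from V (n x m)."""
    rng = random.Random(seed)
    w = random_matrix(len(v), r, rng)
    h = random_matrix(r, len(v[0]), rng)
    for _ in range(iterations):
        w = update_w(v, w, h)
        h = update_h(v, w, h)
    return w, h


def nmf_test(v, w, iterations=200, seed=1):
    """Testing (cf. Algorithm 2): keep the stored W fixed and solve only for H."""
    rng = random.Random(seed)
    h = random_matrix(len(w[0]), len(v[0]), rng)
    for _ in range(iterations):
        h = update_h(v, w, h)
    return h


def squared_error(v, w, h):
    """Squared Euclidean error between V and W x H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for vr, ar in zip(v, approx) for x, y in zip(vr, ar))
```

In this sketch only W needs to be stored after training; for new audio, `nmf_test` reuses it to compute the coefficient matrix.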

The proposed zero-watermarking scheme based on non-negative matrix factorization

The scheme designed in this paper includes two modules: watermark embedding and watermark extraction. Both processes are illustrated in Fig 1 and described in detail below.

  1. Watermark embedding process
    (1) Watermark image preprocessing: A binary image C of size N × N (N = 32) is used as the copyright information. The image is flattened into a one-dimensional signal vector whose elements are the pixels of the image, namely: (4) where c represents a pixel of the image and M = N × N represents the total number of pixels of the binary image.
    (2) Audio preprocessing: The number of frames of the audio signal must equal the total number of pixels of the watermark image, so the input audio signal Y is divided into M fixed-length frames, and the fast Fourier transform is performed on each frame. The length of the input signal is denoted as Ylen, and the length of each frame is denoted as Flen. (5)
    (3) Encoding: The preprocessed audio signal is decomposed by NMF to obtain the basis matrix and the coefficient matrix. The maximum value ηi (1 ≤ i ≤ M) in the coefficient matrix generated by each frame is taken, Mean(ηi) is obtained by averaging these maxima, and a polarity vector Bi is formed according to the relation between ηi and Mean(ηi). (6) (7)
    (4) Obtaining the watermark key: The polarity vector B is XORed with the one-dimensional watermark signal C to obtain the watermark key, which completes the embedding of the watermark. (8)
  2. Watermark extraction process
    (1) Let the audio signal after attacks be Y′. Step (2) of the embedding process is repeated on Y′ for audio preprocessing.
    (2) The polarity vector B′ is obtained with the same encoding method as step (3) of the embedding process.
    (3) The polarity vector B′ is XORed with the key to obtain the extracted watermark signal C′. (9)
    (4) The one-dimensional watermark signal is converted back into a two-dimensional signal, and the binary watermark image is obtained after image restoration.
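The encoding and XOR steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-frame maxima `eta` are taken as given (the FFT and NMF stages are not reproduced), leftover samples after framing are dropped, and a value equal to the mean maps to 1; the text does not specify these details. All function names are hypothetical.

```python
def split_frames(signal, m):
    """Split `signal` into m equal frames; leftover samples are dropped (assumption)."""
    frame_len = len(signal) // m
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(m)]


def polarity_vector(eta):
    """Binarize the per-frame maxima against their mean (>= maps to 1, an assumption)."""
    mean_eta = sum(eta) / len(eta)
    return [1 if value >= mean_eta else 0 for value in eta]


def xor_bits(a, b):
    """Element-wise XOR of two equally long bit sequences."""
    if len(a) != len(b):
        raise ValueError("bit sequences must have the same length")
    return [x ^ y for x, y in zip(a, b)]


def make_key(eta, watermark_bits):
    """Embedding: key = polarity vector XOR flattened watermark."""
    return xor_bits(polarity_vector(eta), watermark_bits)


def extract_watermark(eta_attacked, key):
    """Extraction: watermark = polarity vector of the attacked audio XOR key."""
    return xor_bits(polarity_vector(eta_attacked), key)


def to_image(bits, n):
    """Reshape a flat list of n * n bits into n rows of n pixels."""
    return [bits[row * n:(row + 1) * n] for row in range(n)]
```

Because XOR is its own inverse, the watermark is recovered exactly whenever the attacked audio yields the same polarity vector as the original; every polarity bit flipped by an attack flips exactly one extracted pixel.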

Experimental results and analyses

Experiment preparation

The input audio signals of this experiment are from the LibriTTS corpus [20], which is derived from LibriSpeech. It is composed of 585 hours of real voice data and the corresponding text from 2456 speakers at a sampling rate of 24 kHz. In this experiment, we set the resample point of the speech signal to 256. To demonstrate the validity and feasibility of the proposed scheme, the selected audio data is divided into a training data set and testing data sets. The training data set consists of 250 sentences from three randomly selected speakers, about 20 minutes in total. The testing data is divided into in-domain and out-of-domain conditions; each testing data set contains 50 sentences and lasts about 3 minutes. In-domain means that the testing data is within the range of the training data set, and out-of-domain means that it is outside that range. The detailed information is shown in Table 1. To evaluate the robustness of the proposed scheme, we choose six different types of attacks, each briefly introduced below.

  • Noise: additive white Gaussian noise (AWGN) is added to the original audio signal at an SNR of 10 dB, 20 dB or 30 dB.
  • Low-pass filter: a low-pass filter with a cut-off frequency of 5 kHz is applied.
  • Resampling: the sampling frequency is first reduced to one half of the original sampling frequency and then restored to the original sampling frequency.
  • MP3 compression: the original signal is compressed in MP3 format (128 kbps).
  • Re-quantization: the audio signal is quantized from 16 bits to 8 bits and then from 8 bits back to 16 bits.
  • Frame-desynchronization: 0.5 second, 1 second or 2 seconds of audio are cropped from the beginning or the end of the signal.

Table 1. Detailed experiment introduction and classification.

(a, x, y and z each represent a set of sentences from a particular speaker.)

https://doi.org/10.1371/journal.pone.0270579.t001
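Three of the attacks listed above are simple enough to sketch in a few lines. The sketch below is illustrative only and is not the authors' test harness: it assumes floating-point samples for the noise attack and 16-bit integer samples for re-quantization, and the function names are hypothetical. Low-pass filtering, resampling and MP3 compression need a filter design or a codec and are left out.

```python
import math
import random


def add_awgn(signal, snr_db, seed=0):
    """Add white Gaussian noise so that signal power / noise power matches snr_db."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def requantize_16_to_8_to_16(samples):
    """Drop the 8 low-order bits of 16-bit integer samples, then restore the scale."""
    return [(s >> 8) << 8 for s in samples]


def crop(signal, sample_rate, seconds, from_start=True):
    """Remove `seconds` of audio from the beginning or the end of the signal."""
    n = int(round(seconds * sample_rate))
    return signal[n:] if from_start else signal[:len(signal) - n]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to `clean` (same length)."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)
```

Cropping shifts every later frame boundary, which is why frame-desynchronization is harder on frame-based features than the attacks that preserve the signal length.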

In this paper, the bit error rate (BER) is used to measure the difference between the extracted watermark and the original watermark, and the normalized correlation coefficient (NC) is used to measure their similarity. Both BER and NC serve as evaluation metrics for the robustness of the proposed scheme under various attacks. The specific calculation formulas are as follows. (10) (11) where Nbc is the number of error bits, Nbo is the total number of bits of the original watermark sequence, C represents the embedded watermark sequence, C′ represents the extracted watermark sequence, and N is the length of the watermark sequence. The closer BER is to 0 and NC is to 1, the better the proposed scheme performs against attacks.
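The two metrics can be sketched in Python as follows. BER follows directly from the definitions of Nbc and Nbo above; for NC the sketch assumes a commonly used form, the inner product of the two sequences divided by the product of their Euclidean norms, which may differ in detail from Eq (11). The function names are hypothetical.

```python
import math


def bit_error_rate(original, extracted):
    """Fraction of bits that differ: number of error bits / total bits."""
    if len(original) != len(extracted):
        raise ValueError("sequences must have the same length")
    errors = sum(1 for a, b in zip(original, extracted) if a != b)
    return errors / len(original)


def normalized_correlation(original, extracted):
    """Inner product normalized by the Euclidean norms (assumed form of NC)."""
    dot = sum(a * b for a, b in zip(original, extracted))
    norm = math.sqrt(sum(a * a for a in original)) * math.sqrt(sum(b * b for b in extracted))
    return dot / norm if norm else 0.0
```

For example, with `original = [1, 0, 1, 1]` and `extracted = [1, 0, 0, 1]`, one bit out of four differs, so the BER is 0.25, while the NC is 2 / sqrt(6), roughly 0.82.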

Imperceptibility analysis

Zero-watermarking technology does not modify the original data. There is no perceptible difference between the watermarked carrier and the original carrier, so it has excellent imperceptibility.

Robustness analysis

Robustness comparison between the proposed scheme and other zero-watermarking schemes under common attacks.

Through BER and NC values, Table 2 compares the robust performance of the proposed scheme and some related zero-watermarking schemes [10, 12, 13] against various common attacks.

Table 2. Robustness comparison of the proposed scheme and other zero-watermarking schemes under common attacks, where **/** indicates average metrics BER/NC and the bold mark indicates the best number across all the schemes under each attack.

https://doi.org/10.1371/journal.pone.0270579.t002

For the proposed scheme, we observe that in both the in-domain and the out-of-domain condition, the BER values under all attacks except the noise attack with SNR = 10 dB are below 1%, and all NC values are close to or greater than 99%. This illustrates the good robustness of the proposed scheme against different common attacks. For the noise attack and the re-quantization attack, the proposed scheme clearly shows better BER and NC values than the other schemes. For the low-pass filter attack, the proposed scheme provides better BER and NC values than the schemes in [10, 12] while maintaining robustness comparable to the scheme proposed in [13]. For the resampling attack, the robustness of the proposed scheme is comparable to that of the schemes in [10, 12, 13], with slightly better BER and NC values. The robustness of the proposed scheme against the MP3 compression attack is similar to that of the scheme in [10] and better than that of the schemes in [12, 13]. In general, the NC values of the proposed scheme are almost all around 99%, while the NC values of the schemes in [10, 12, 13] range from 90% to 99%.

The original watermark image and the watermark images extracted under different types of common attacks by the proposed scheme and the other zero-watermarking schemes are shown in Figs 2–5. A–F denote the attack types: A, no attack; B, noise attack; C, low-pass filter attack; D, resampling attack; E, MP3 compression attack; and F, re-quantization attack. The watermark images extracted by the proposed scheme are all clearly visible. In contrast, under the noise attack (10 dB), the watermark images extracted by the DCT [12], DWT [13] and DWT-DCT [10] schemes are somewhat blurred, showing poorer robustness. This directly demonstrates the better robustness of our scheme.

Fig 2. Watermark images extracted by the proposed scheme.

https://doi.org/10.1371/journal.pone.0270579.g002

Fig 5. Watermark images extracted by the DWT-DCT scheme.

https://doi.org/10.1371/journal.pone.0270579.g005

Robustness comparison between the proposed scheme and other zero-watermarking schemes under frame-desynchronization attacks.

Table 3 shows the BER and NC values of the proposed scheme and the other zero-watermarking schemes under frame-desynchronization attacks. The proposed scheme, which is based on blind extraction, demonstrates good robustness against slight frame-desynchronization attacks but poor robustness against severe ones. When the duration of the removed frames is less than or equal to 1 second, the BER values are below 0.3 and the NC values are above 0.8. As more frames are removed, the robustness degrades.

Table 3. Robustness comparison of the proposed scheme and other zero-watermarking schemes under frame-desynchronization attacks.

https://doi.org/10.1371/journal.pone.0270579.t003

On the whole, however, the proposed scheme achieves better BER and NC values than the other zero-watermarking schemes, and it continues to outperform them as the frame shift increases.

In addition, we select the case in which 0.5 second is clipped from the beginning of the audio signal and compare the proposed scheme with the other zero-watermarking schemes. Fig 6 shows that when the duration of the removed frames is 0.5 second, the watermark image extracted by the proposed scheme is relatively clear, while the images extracted by the other three zero-watermarking schemes are very blurry. We therefore conclude that the proposed scheme resists slight frame-desynchronization attacks effectively.

Fig 6. Extracted watermark images when 0.5 second is clipped from the beginning of the audio signal.

(a) original watermark image; (b) image extracted by the proposed scheme; (c) image extracted by the DCT [12] scheme; (d) image extracted by the DWT [13] scheme; (e) image extracted by the DWT-DCT [10] scheme.

https://doi.org/10.1371/journal.pone.0270579.g006

The complexity comparison and analysis.

Table 4 lists the average time cost per audio segment and the memory cost of the watermark embedding process for the different schemes. As can be seen from Table 4, the proposed scheme is more time efficient than most of the other schemes.

Table 4. The experimental environment and complexity comparison.

https://doi.org/10.1371/journal.pone.0270579.t004

For the proposed scheme, the core of constructing the zero-watermarking information is extracting the coefficient matrix that represents the original audio signal. First, a training process on the input audio signal yields a basis matrix, which occupies 73,728 bytes of storage. Then the coefficient matrix is obtained according to Algorithm 2, which is essentially a simple matrix operation.

Conclusion

In this paper, a robust zero-watermarking scheme based on NMF is proposed. The audio signal is divided into fixed frames, and FFT and NMF are applied to each frame to obtain the basis matrix and the coefficient matrix. The maximum coefficient value of each frame after NMF decomposition is taken as the feature of that frame, and the feature sequence is then divided into two classes, encoded as 0 and 1 respectively, to form a binary sequence. The zero-watermarking sequence is created by XORing the encoded sequence with the watermark sequence. Comparative experiments show that the proposed scheme extracts the watermark image well and has good robustness under common attacks and slight frame-desynchronization attacks. However, it still has limitations, such as the need for a trusted third party: the key generated from each audio signal is registered with the third party in one-to-one correspondence, and extracting the watermark image requires looking up the corresponding key through this mapping.

References

  1. Zong T., Xiang Y., Natgunanathan I., Gao L., Hua G. and Zhou W., “Non-Linear-Echo Based Anti-Collusion Mechanism for Audio Signals,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 969–984, 2021.
  2. Fierro-Radilla A., Nakano-Miyatake M., Cedillo-Hernandez M., Cleofas-Sanchez L. and Perez-Meana H., “A Robust Image Zero-watermarking using Convolutional Neural Networks,” 2019 7th International Workshop on Biometrics and Forensics (IWBF), Cancun, Mexico, 2019, pp. 1–5.
  3. Xiang Y., Natgunanathan I., Peng D., Hua G. and Liu B., “Spread Spectrum Audio Watermarking Using Multiple Orthogonal PN Sequences and Variable Embedding Strengths and Polarities,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 3, pp. 529–539, March 2018.
  4. Hua G., Goh J. and Thing V. L. L., “Time-Spread Echo-Based Audio Watermarking With Optimized Imperceptibility and Robustness,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 2, pp. 227–239, Feb. 2015.
  5. Fan J. and U K., “A Novel Image Zero-Watermarking Scheme Based on Non-Uniform Triangular Partition,” 2019 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR), Kobe, Japan, 2019, pp. 1–6.
  6. Xiang Y., Natgunanathan I., Rong Y. and Guo S., “Spread Spectrum-Based High Embedding Capacity Watermarking Method for Audio Signals,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2228–2237, Dec. 2015.
  7. Zhang Z., “Zero-watermarking algorithm based on DC component in DCT domain,” 2021 International Conference on Electronic Information Engineering and Computer Science (EIECS), 2021, pp. 475–478.
  8. Panda J., Choudhary S., Nath K. and Kumar S., “Audio zero watermarking scheme based on sub band mean energy comparison using DWT-DCT,” 2016 International Conference on Signal Processing and Communication (ICSC), 2016, pp. 352–357.
  9. Farzaneh M. and Toroghi R. M., “Robust Audio Watermarking Using Graph-based Transform and Singular Value Decomposition,” 2020 10th International Symposium on Telecommunications (IST), 2020.
  10. Yu Y., Lei M., Liu X., Qu Z. and Wang C., “Novel zero-watermarking scheme based on DWT-DCT,” in China Communications, vol. 13, no. 7, pp. 122–126, July 2016.
  11. Lei M., Yang Y., Liu X., Cheng M. and Wang R., “Audio zero-watermark scheme based on discrete cosine transform-discrete wavelet transform-singular value decomposition,” in China Communications, vol. 13, no. 7, pp. 117–121, July 2016.
  12. Ruipeng S. and Haitao X., “An audio zero-watermarking algorithm based on dct coefficients symbol,” Computer Technology and Development, vol. 24, no. 5, pp. 146–149, 2014.
  13. Yang Y., Lei M., Cheng M., Liu B., Lin G. and Xiao D., “An audio zero-watermark scheme based on energy comparing,” in China Communications, vol. 11, no. 7, pp. 110–116, July 2014.
  14. Electa Alice Jayarani A., Bhatt M. R. and Geetha D. D., “Zero Watermarking on Audio Based on STFT,” 2018 International Conference on Computing, Electronics & Communications Engineering (iCCECE), 2018, pp. 253–256.
  15. Xu L. et al., “A novel robust zero-watermarking algorithm for audio based on sparse representation,” in China Communications, vol. 18, no. 8, pp. 237–248, Aug. 2021.
  16. Xu L., Huang D., Zaidi S. F. A., Rauf A. and Das R. K., “Graph Fourier Transform Based Audio Zero-Watermarking,” in IEEE Signal Processing Letters, vol. 28, pp. 1943–1947, 2021.
  17. Byun J. and Shin J. W., “Initialization for NMF-based audio source separation using priors on encoding vectors,” in China Communications, vol. 16, no. 9, pp. 177–186, Sept. 2019.
  18. Murata H. and Ogihara A., “Note Estimation by Contaminated Normal Distribution for Audio Watermarking Method Using Non-negative Matrix Factorization,” 2018 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 2018, pp. 469–474.
  19. Chen M., Gong M. and Li X., “Feature Weighted Non-Negative Matrix Factorization,” in IEEE Transactions on Cybernetics. pmid:34437084
  20. Zen H., Dang V., Clark R., et al., “LibriTTS: A corpus derived from LibriSpeech for text-to-speech,” CoRR, abs/1904.02882, 2019. http://arxiv.org/abs/1904.02882.