A Collusion-Resistant Fingerprinting System for Restricted Distribution of Digital Documents

Digital fingerprinting is a technique that consists of inserting the ID of an authorized user into the digital content that the user requests. This technique has mainly been used to trace pirate copies of multimedia content such as images, audio, and video. This study proposes the use of state-of-the-art digital fingerprinting techniques in the context of restricted distribution of digital documents. In particular, the system proposed by Kuribayashi for multimedia content is investigated. Extensive simulations show the robustness of the proposed system against averaging collusion attacks. The perceptual transparency of the fingerprinted documents is also studied. Moreover, by using an efficient Fast Fourier Transform core on standard computers, it is shown that the proposed system is suitable for real-world scenarios.


Introduction
The adoption of information systems is changing the way organizations work, allowing them to automate their processes and thus become more efficient. One of the recurring processes in these organizations is the management of the document lifecycle, which is the workflow that defines how users interact with documents from their creation until their use in other processes of the organization. Currently, many organizations have digitized their physical documents to automate the document lifecycle, which depends on the type of information contained in the digital documents but has a common flow [1][2][3]. In this lifecycle, digital documents with sensitive information must be available only to authorized users. Cryptography and other security techniques can be used to protect documents during their storage and distribution, but once an authorized user obtains a cleartext copy of a document, that copy is no longer protected. The authorized user could act dishonestly by distributing the document to unauthorized entities. A document copy distributed illegally is known as a pirate document, and the user who distributes it is called a traitor. The problem of pirate documents persists if it is not possible to identify the traitor [4][5][6][7]. To solve this problem, invisible watermarking techniques can be used to insert the ID of the user into the requested document. If a pirate copy is detected, the traitor can be identified by detecting the user's ID, which was inserted when the document was distributed to that user. This way of using watermarking techniques is known as digital fingerprinting.
Digital fingerprinting was originally used to combat the illegal distribution of multimedia content such as images [8], audio [9], and video [10], but the technique has also been used to protect digital documents [11], both by dissuading users from distributing pirate copies and by detecting the users who do so. In this study, "fingerprinting" is used as a synonym for "digital fingerprinting", and "fingerprint" refers to the unique ID of each user.
For fingerprinting techniques to be effective, two main properties must be satisfied:
- Perceptual transparency (also called unobtrusiveness, invisibility, or imperceptibility): The original document and its fingerprinted copy must be identical to the user's perception. This is measured in terms of natural language understanding: if the document is entirely legible after the fingerprint insertion, this property is achieved [12,13].
- Robustness: The capacity of the users' IDs to survive intentional and unintentional attacks after being inserted into the document. If the fingerprint is removed or destroyed, the value of the digital document is lost [8].
In an organizational environment, a likely attack is the collusion attack. This attack occurs when a set of traitors each obtain a copy of a document, every copy carrying its own fingerprint. The traitors can average their copies to generate a new pirate document, destroying their fingerprints in the process. Traitors who perform a collusion attack are called colluders.

Related Work
Different watermarking insertion techniques have been proposed for digital documents that can be used in fingerprinting. These techniques can be syntactic techniques, semantic techniques, and image-based techniques [12,14]. In this study, digital documents are represented as images; hence, from these three techniques, image-based techniques are chosen.
The first techniques for watermarking digital documents represented as images were proposed by Brassil et al. [15] and Low et al. [16]. They insert watermarks by shifting text lines vertically, which allows one bit to be encoded per line. To detect the watermark, the distance between the lines is measured. These techniques only work for formatted documents, and the inserted watermark is easy to remove by an averaging collusion attack. In [17,18], techniques based on shifting words horizontally in the document were proposed. However, these techniques can only be applied to documents that have variable spacing between adjacent words, and watermark detection is only possible when the spacing between words in the original document is known. Other techniques based on the distance between lines, words, or characters have been proposed in [19][20][21][22][23][24][25], and most of them have been proved to be robust against indirect attacks such as copying, printing, and scanning. Also, specific schemes have been proposed to face these problems [26,27]. However, the robustness of these techniques against collusion attacks is not reported.
Spread spectrum techniques for insertion have been widely used for natural images because they are robust against a wide range of attacks, including collusion attacks. Cox et al. [8] proposed the first watermarking technique using spread spectrum. In this technique, the user's fingerprint is represented as a spread spectrum sequence that is inserted in the most significant frequency regions of the image, because the less significant regions tend to be discarded when filters or other image processing techniques are applied. When the embedded sequence is extracted, it must be correlated with all the known user sequences to detect the traitor. This strategy makes the detection time grow linearly with the number of user sequences, and under a collusion attack it treats all users as equally likely to collude, which is not necessarily true. Although spread spectrum has been used in [28] and [29] as an insertion technique for digital documents, its resistance to collusion attacks has not been reported.
Wang et al. [30] considered that traitors are more likely to collude with users who share common characteristics, such as social groups, so the fingerprint is generated from the user ID and its group ID. At the detection stage, the group of the colluders is identified first, and then the users who belong to that group are identified. This reduces the computational cost along with the probability of false-positive detection. Kuribayashi proposed in [31] a hierarchical fingerprinting scheme based on Code Division Multiple Access (CDMA). In this scheme, users are organized in groups, and the user's fingerprint is represented by two spread spectrum sequences, one for the user ID and another for the group ID. These sequences are orthogonal because they are Discrete Cosine Transform (DCT) basis vectors modulated by a pseudorandom sequence (PN) of +1 and -1 values, which preserves orthogonality.
The spread spectrum sequence for a group $i$ is generated from a vector $V$ of length $L$ with all entries equal to 0, adding an amount of energy $\beta_g$ to the entry at position $i$. Then, the Inverse DCT (IDCT) is applied to $V$ to obtain the $i$-th basis function of the DCT. Finally, $V$ is modulated by a sequence PN generated from a secret key $s$, which provides security to the scheme because only someone who knows that key is able to detect groups. The spread spectrum sequence $W_i$ generated for the $i$-th group is expressed by Equation 1. Each component in the spread spectrum sequence for the group ID can be assigned to a group; therefore, the total number of groups supported is $L$. The spread spectrum sequence for a user $j$ belonging to a group $i$ is generated in the same way as the sequence of the group, with the difference that the PN seed is given by the group ID. The spread spectrum sequence assigned to the $j$-th user is computed according to Equation 2.
Using the group ID as the seed of PN, a link is established between the group and the user. As both the number of groups and the number of users per group are $L$, the total number of users supported is $L^2$. With the spread spectrum sequences of the user and the group, the fingerprint that represents a user is generated from Equation 3, and the total energy of the fingerprint is obtained from Equation 4.
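The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of [31]: the function names, the toy length L = 64, the key value, the use of Python's `random` module as the PN generator, and the additive combination of the group and user sequences are all assumptions made for the example.

```python
import math
import random


def dct(x):
    """Orthonormal DCT-II by direct summation (O(n^2), fine for a toy size)."""
    m = len(x)
    out = []
    for k in range(m):
        scale = math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m)
        out.append(scale * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / m)
                               for n in range(m)))
    return out


def idct_of_impulse(index, length, energy):
    """Orthonormal inverse DCT of a vector that is zero except for
    `energy` at position `index`, i.e. a scaled DCT basis function."""
    scale = math.sqrt(1.0 / length) if index == 0 else math.sqrt(2.0 / length)
    return [energy * scale * math.cos(math.pi * (n + 0.5) * index / length)
            for n in range(length)]


def pn_sequence(seed, length):
    """Reproducible +1/-1 sequence; stands in for the keyed PN generator."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def spread_sequence(index, length, energy, seed):
    """DCT basis function `index`, modulated element-wise by PN(seed)."""
    basis = idct_of_impulse(index, length, energy)
    return [b * p for b, p in zip(basis, pn_sequence(seed, length))]


def despread(sequence, seed):
    """Undo the PN modulation (PN values are +/-1) and return DCT coefficients."""
    pn = pn_sequence(seed, len(sequence))
    return dct([v * p for v, p in zip(sequence, pn)])


L = 64                      # toy fingerprint length
SECRET_KEY = 1234           # hypothetical secret key s
GROUP_ID, USER_ID = 5, 17   # hypothetical IDs, both below L
BETA_G, BETA_U = 50.0, 200.0

w_group = spread_sequence(GROUP_ID, L, BETA_G, seed=SECRET_KEY)
w_user = spread_sequence(USER_ID, L, BETA_U, seed=GROUP_ID)   # seeded by group ID
fingerprint = [g + u for g, u in zip(w_group, w_user)]        # assumed additive

group_view = despread(w_group, SECRET_KEY)   # single peak BETA_G at GROUP_ID
user_view = despread(w_user, GROUP_ID)       # single peak BETA_U at USER_ID
```

Because the PN values are +1 or -1, multiplying a sequence by its own PN a second time cancels the modulation, so the DCT concentrates all the energy at the index of the ID. Despreading with a different PN leaves the energy scattered, which is why only the holder of the key can detect groups.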
The generated fingerprint is inserted in the frequency components of the host image. The starting point of insertion, denoted as $P_w$, is selected from the low and middle frequencies. When a pirate document is detected, a sequence $\hat{W}_{i,j}$ is extracted from its frequency components starting from the point $P_w$, in a non-blind fashion. From $\hat{W}_{i,j}$, it is possible to obtain a detection sequence $\hat{d}_g$ for group IDs and a detection sequence $\hat{d}_u$ for user IDs. Detection is performed using two thresholds, one for the group ($T_g$) and another for the user ($T_u$). Upon detecting the presence of a group $i$, it is possible to detect the users belonging to that group. However, if a user belongs to the group $i+1$, the user will not be detected until detection is performed for that group and the spread spectrum sequences of its users in the image are examined. The system reported in [31] has proved to be faster than previous work because its fingerprint detection strategy is carried out using fast DCT algorithms.
In this study, the performance of the scheme presented in [31] is investigated for digital document distribution applications. Numerous experiments are performed to evaluate the properties of perceptual transparency and robustness. The outline of this paper is as follows: First, experimental results and discussion are presented. In the Materials and Methods section, details of the insertion, detection, and determination of thresholds are provided, and the Fast DCT along with the PSNR and SSIM Index metrics used to evaluate perceptual transparency are recalled. Finally, conclusions are provided.

Results and Discussion
The implementation of the proposed fingerprinting system must satisfy perceptual transparency and robustness under averaging collusion attacks on digital documents. To achieve that goal, adequate values for the user energy ($\beta_u$), the group energy ($\beta_g$), the fingerprint length ($L$), and the insertion position ($P_w$) are determined through experimentation. First, values that satisfy perceptual transparency are determined. Then, their best configuration is identified for maximum colluder detection. Simulations of collusion attacks are performed, generating pirate copies of digital documents in TIFF (lossless) and JPEG (lossy) formats. For the experiments, the input documents were selected from a set of 1000 different digital documents in JPEG format with average dimensions of $1900 \times 2700$ pixels; these documents were obtained from [32]. Statistical significance is achieved for this sample size, as random t-test results (for a significance level of 0.05) showed a p-value of about $9.9514 \times 10^{-209}$ for the biggest.
Fingerprinting System for Digital Documents PLOS ONE | www.plosone.org

Perceptual Transparency
To find values of $\beta_u$, $\beta_g$, $L$, and $P_w$ that satisfy perceptual transparency, these values were combined as defined in Table 1 to generate fingerprints, which were inserted in a set of 1000 digital documents. Then, the perceptual transparency was evaluated using PSNR and the SSIM Index as metrics (these metrics are detailed in the Materials and Methods section). It was found that $\beta_u$ and $\beta_g$ are the factors that most negatively affect perceptual transparency as their values increase, as shown in Figure 1. It was also noticed that lower values of $P_w$ slightly increase the PSNR value, whereas the PSNR value is slightly reduced when $L$ increases. By fixing the values of $P_w$ and $L$ and varying the values of $\beta_u$ and $\beta_g$, it can be observed in Figures 1 and 2 that the obtained values of PSNR and SSIM Index are correlated. This is consistent with the findings of Hore and Ziou [33].

Survey Evaluation
As the perceptual transparency of digital documents is achieved by preserving the legibility of the text, a survey was conducted with 100 respondents to determine the lowest PSNR values that preserve the legibility of digital documents. The survey consisted of the evaluation of a subset of fifteen digital documents with PSNR values in the range of 4-30 dB. The possible answers available to the respondents were the following:
1. I do not perceive image distortion.
2. I perceive image distortion but the text is easily readable.
3. I perceive image distortion and the text is hardly readable.
4. The text is not readable.
The results of the survey are shown in Table 2. These results are plotted in Figure 3, which shows how the perception of the respondents changes as the PSNR value of the fingerprinted documents decreases. Most of the respondents considered fingerprinted digital documents with PSNR values greater than 14 dB and SSIM Index values greater than 0.887 to be legible; otherwise, the documents were considered illegible. Therefore, configurations of $\beta_u$, $\beta_g$, $L$, and $P_w$ that generate fingerprinted documents with PSNR and SSIM Index values equal to or greater than these thresholds satisfy perceptual transparency.

Collusion Attack on Digital Documents in TIFF Format
From the generated set of configurations of $\beta_u$, $\beta_g$, $L$, and $P_w$, those that produce fingerprinted documents with a PSNR value of 16 dB were selected to satisfy perceptual transparency. The selected values of the parameters $\beta_u$, $\beta_g$, and $L$ are shown in Table 3. As the best value for $P_w$ is the lowest one (1/6), this value was selected as the most appropriate and was fixed for the subsequent simulations.

Robustness Factors
In Table 3, there are only five configurations of $\beta_u$ and $\beta_g$ for each value of $L$ with a PSNR of 16 dB. To determine the values of $\beta_u$ and $\beta_g$ that allow the highest colluder detection probability, the first five configurations of $\beta_u$ and $\beta_g$ were chosen, with $L$ = 50,000. Then, using the selected values, averaging collusion attacks were simulated over 50 digital documents, with 2 to 300 colluders belonging to the same group. The fingerprinted documents were generated in the lossless TIFF format with an RGB color scheme.
The results of the simulation are plotted in Figures 4 and 5. Configuration 1 ($\beta_u$ = 200,000, $\beta_g$ = 50,000, $L$ = 50,000) has the highest number of detected colluders, whereas configuration 5 ($\beta_u$ = 50,000, $\beta_g$ = 200,000, $L$ = 50,000) has the lowest. This is significant because the values of $\beta_u$ and $\beta_g$ in configurations 1 and 5 are inverted. In the averaging collusion attack, the energy $\beta_u$ of each colluder is reduced proportionally to the number of colluders. However, as $\beta_g$ is the same for every user of the group, the group energy is first accumulated as many times as there are colluders and then divided by the same number, so it remains unchanged. Therefore, the energy assigned to users must be as high as possible to resist a large number of colluders. Figure 6 shows the detection sequence from a pirate document generated from the collusion of two colluders with ID = 300 and ID = 600. The fingerprints of both colluders were defined by configuration 1, and despite the reduction in energy, they could still be detected with the threshold $T_u$. Simulations performed for lossless digital documents presented a considerable amount of noise below $T_u$. Much of this noise arose because some pixels of the host image fell outside the range 0-255 after fingerprint insertion. Negative values were therefore set to 0, and values above 255 were set to 255. This clipping was reflected as noise.
Table 3. Configurations of $\beta_u$, $\beta_g$ and $L$ that satisfy perceptual transparency for a fixed $P_w$.
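The effect of averaging on the two energies can be checked with a small numerical sketch. It works directly on despread DCT coefficients, where each sequence reduces to a single nonzero entry; this simplification, the toy length, the helper names, and the colluder IDs are assumptions made for illustration, while the energies are those of configuration 1.

```python
def one_hot(index, length, energy):
    """Despread-domain view of a sequence: `energy` at `index`, zero elsewhere."""
    vector = [0.0] * length
    vector[index] = energy
    return vector


def average_collusion(copies):
    """Element-wise average of equally sized copies (averaging collusion attack)."""
    count = len(copies)
    return [sum(values) / count for values in zip(*copies)]


L = 16                       # toy length
BETA_U, BETA_G = 200_000.0, 50_000.0
GROUP_ID = 3                 # all colluders belong to this group
COLLUDERS = [1, 4, 9, 12]    # hypothetical user IDs

# Every colluder carries the same group component but a different user component.
group_parts = [one_hot(GROUP_ID, L, BETA_G) for _ in COLLUDERS]
user_parts = [one_hot(user, L, BETA_U) for user in COLLUDERS]

avg_group = average_collusion(group_parts)
avg_user = average_collusion(user_parts)

group_energy_after = avg_group[GROUP_ID]               # still BETA_G
user_energy_after = [avg_user[u] for u in COLLUDERS]   # BETA_U / 4 for each colluder
```

With K colluders the user peak shrinks to $\beta_u / K$ while the group peak stays at $\beta_g$, which is consistent with the observation that assigning more energy to users than to groups improves colluder detection.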

Fingerprint Length
After establishing $\beta_u$ = 200,000 and $\beta_g$ = 50,000 as the configuration with the highest detection rate, a new simulation of collusion attacks was performed to determine the value of $L$ that provides the best detection rate. Configurations with different values of $L$, $\beta_u$ = 200,000, and $\beta_g$ = 50,000 were selected from Table 3; these were configurations 1, 6, 11, 16, 21, 26, and 31. The simulation results are plotted in Figures 7 and 8. As the value of $L$ increased, the detection ratio also increased. In [25] Kuribayashi had already mentioned this behavior for images in general, but in that work it was indicated that the value of $L$ is limited by the image size. It is possible to calculate the highest value of $L$ for the document samples, as shown in Equation 5, where $L_{max}$ is the maximum value of $L$, $D_w$ is the document width, $D_h$ is the document height, $D_{com}$ is the number of color components (typically 3), and $P_w$ is the insertion point of the fingerprint.
Using Equation 5, $L_{max}$ for the sample documents is 12,150,000. Using $L$ = 350,000, it was possible to detect 270 colluders in a collusion attack with 300 colluders. Reducing $\beta_u$ and $\beta_g$ in order to increase $L$ while maintaining perceptual transparency could be a good tradeoff.

Collusion Attack on Digital Documents in JPEG Format
Having identified that configuration 31 ($\beta_u$ = 200,000, $\beta_g$ = 50,000, and $L$ = 350,000) in Table 3 leads to the highest number of detected colluders, these values were used to generate fingerprints and simulate collusion attacks using the lossy image format JPEG. In these simulations, the fingerprinted digital documents were compressed by storing the image in JPEG format. Then, the collusion attack was simulated and the resulting document was stored again in JPEG format. This implies that the fingerprints in the digital document were affected twice by the compression process. Figure 9 shows the results of the simulated collusion attacks with 2 to 250 colluders, over 15 digital documents with their quality reduced to 80, 60, and 30%. Comparing the maximum number of colluders detected using documents in JPEG format (Figure 9) with that obtained using documents in TIFF format (Figure 7), it was observed that the detection rate for documents in JPEG format diminished drastically. However, a considerable number of colluders were still detected in digital documents in the lossy JPEG format, which makes the system attractive for organizations.

Performance Evaluation
The main functions of the proposed system are the insertion of a fingerprint and its detection. The execution time of these functions is critical when digital documents are fingerprinted in a production environment. To determine whether the system achieves feasible execution times, the performance of fingerprint insertion and detection was evaluated over 100 digital documents in JPEG and TIFF formats (implementation details can be found in the Software Implementation section). The insertion evaluation considers the document read time, the fingerprint insertion time, and the fingerprinted document write time. The detection evaluation considers the document read time and the fingerprint detection time. The results obtained are shown in Table 4. Insertion and detection times do not differ significantly between the formats. The most time-consuming task in insertion and detection is the execution of the DCT/IDCT, with an execution time higher than 3500 ms. This can be observed in the relation between the insertion and detection times in Table 4: insertion requires two transformations and takes twice the time of detection, in which just one transformation is needed. Considerable time is spent in the insertion due to the DCT transforms applied to digital documents of large dimensions; however, this time is still feasible for real-life applications. The time for performing the detection of users after the digital document has been read and transformed with the IDCT is very low, around 74 ms, which allows a large number of users per group while keeping an acceptable detection time.

Comparison
In related works (see the Related Work section), techniques for information insertion in digital documents can be divided into line shifting encoding, word shifting encoding, and character space encoding. As mentioned earlier, the robustness of these techniques to collusion attacks has not been reported. On the other hand, the schemes described in [29] and [28] could potentially resist collusion attacks because they are based on spread spectrum techniques. However, these works do not report their robustness to these attacks. It is worth mentioning that the impact of fingerprint insertion on perceptual transparency has not been evaluated for any of these techniques. Also, as these insertion techniques were not originally conceived for fingerprinting, the supported number of users is not provided. Table 5 shows a comparison of the approach proposed in this work against the approaches reviewed in the Related Work section, where insertion techniques are used on digital documents represented as images. Unlike related works, the fingerprinting scheme selected for the proposed system was validated experimentally with regard to perceptual transparency and robustness to collusion attacks.

Materials and Methods
This section describes the Fast DCT and Inverse DCT (IDCT) Algorithms as well as the fingerprint insertion and detection methods. Also, the PSNR and SSIM Index metrics used for evaluation of the perceptual transparency in the proposed system are described.

Fast DCT and Inverse DCT Algorithms
It is known that the Fourier transform of a real-even function, $f(-x) = f(x)$, is real-even, and that $i$ times the Fourier transform of a real-odd function, $f(-x) = -f(x)$, is real-odd; thus, under these symmetry conditions, it is not necessary to use complex inputs/outputs. Therefore, it is possible to compute the DCT or the Discrete Sine Transform (DST) by utilizing an FFT algorithm.
Let the input vector $x(n)$, $n = 0, \ldots, M-1$, be even around $n = -0.5$ and even around $n = M - 0.5$; it is possible to show that DFT($x$) is the nonnormalized DCT of $x$, $Y_{nonO}(k)$ (Equation 6), with basis $b(n,k) = 2\cos[(n + \tfrac{1}{2})\,k\pi/M]$ (Equation 7). The basis set described in Equation (7) is nonorthogonal; therefore, it is necessary to normalize Equation (6) to get the orthogonal transform. On the other hand, let the input vector $Y(k)$, $k = 0, \ldots, M-1$, be even around $k = 0$ and odd around $k = M$; it is possible to show that DFT($Y$) is the nonnormalized Inverse DCT of $Y$, $x_{nonO}(n)$. As with Equation (6), a normalization procedure is necessary to obtain the orthogonal transform. In the literature, fast algorithms for the DFT have been extensively reported, and very efficient software libraries exist [34]. In this work, these libraries are utilized as a DCT module.
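The relation between the DFT of a symmetric extension and the DCT can be illustrated with a short sketch. It uses a naive DFT instead of an FFT library to stay self-contained, and it assumes the common unnormalized DCT-II convention $Y_{nonO}(k) = \sum_n x(n)\, 2\cos[(n + \tfrac{1}{2})k\pi/M]$, which is also the convention documented for FFTW's REDFT10 transform. The scaling factors $1/\sqrt{4M}$ for $k = 0$ and $1/\sqrt{2M}$ otherwise are the standard orthonormal ones and are an assumption about the normalization used here; all function names are hypothetical.

```python
import cmath
import math


def dft(y):
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    size = len(y)
    return [sum(y[t] * cmath.exp(-2j * math.pi * t * k / size) for t in range(size))
            for k in range(size)]


def dct_unnormalized(x):
    """Unnormalized DCT-II computed from a 2M-point DFT of the even extension."""
    m = len(x)
    extended = list(x) + list(reversed(x))   # even around n = -0.5 and n = M - 0.5
    spectrum = dft(extended)
    # Remove the half-sample phase shift; the result is real up to rounding.
    return [(cmath.exp(-1j * math.pi * k / (2 * m)) * spectrum[k]).real
            for k in range(m)]


def normalize(y_non_orth):
    """Scale the unnormalized coefficients so that the transform is orthonormal."""
    m = len(y_non_orth)
    return [y / math.sqrt(4 * m) if k == 0 else y / math.sqrt(2 * m)
            for k, y in enumerate(y_non_orth)]


def idct_orthonormal(y):
    """Orthonormal inverse DCT (DCT-III) by direct summation."""
    m = len(y)
    return [y[0] / math.sqrt(m)
            + math.sqrt(2.0 / m) * sum(y[k] * math.cos(math.pi * (n + 0.5) * k / m)
                                       for k in range(1, m))
            for n in range(m)]


signal = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0]
coeffs = normalize(dct_unnormalized(signal))
restored = idct_orthonormal(coeffs)   # equals `signal` up to rounding error
```

After normalization the transform preserves energy, so the sum of the squared coefficients equals the sum of the squared samples. This property is what allows fingerprint energies such as $\beta_u$ and $\beta_g$ to be reasoned about in either domain.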

Fingerprinting Method
The fingerprinting method for digital images consists of two procedures, fingerprint insertion and fingerprint detection, as follows:
Fingerprint Insertion Method. The following steps describe the insertion method of the fingerprint $W_{i,j}$ described in Equation 1. This method is shown graphically in Figure 10:
1. Transform the image to the frequency domain using the DCT function.
2. Select $L$ coefficients of the low and middle frequencies, starting from a position $P_w$.
3. Insert the fingerprint additively into the selected coefficients.
4. Transform the image back to the spatial domain using the Inverse DCT function to get the fingerprinted image.
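The four insertion steps can be sketched on a one-dimensional toy signal. The sketch below is an illustration under several assumptions: the signal stands in for an image, the transforms are direct-summation orthonormal DCTs rather than a fast library, the mapping of $P_w$ = 1/6 to a start index is hypothetical, and no pixel clipping or rounding is applied.

```python
import math


def dct(x):
    """Orthonormal DCT-II by direct summation."""
    m = len(x)
    return [(math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m))
            * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / m) for n in range(m))
            for k in range(m)]


def idct(y):
    """Orthonormal inverse DCT (DCT-III) by direct summation."""
    m = len(y)
    return [y[0] / math.sqrt(m)
            + math.sqrt(2.0 / m) * sum(y[k] * math.cos(math.pi * (n + 0.5) * k / m)
                                       for k in range(1, m))
            for n in range(m)]


def insert_fingerprint(pixels, fingerprint, start):
    """Steps 1-4: DCT, select coefficients from `start`, add fingerprint, IDCT."""
    coeffs = dct(pixels)
    for offset, value in enumerate(fingerprint):
        coeffs[start + offset] += value
    return idct(coeffs)


def extract_nonblind(suspect, original, start, length):
    """Non-blind extraction: difference of DCT coefficients in the marked band."""
    suspect_coeffs, original_coeffs = dct(suspect), dct(original)
    return [suspect_coeffs[start + t] - original_coeffs[start + t]
            for t in range(length)]


pixels = [float((37 * n) % 256) for n in range(48)]          # toy host signal
fingerprint = [4.0, -4.0, 4.0, 4.0, -4.0, -4.0, 4.0, -4.0]   # toy sequence, L = 8
start = len(pixels) // 6     # hypothetical index for an insertion point of 1/6

marked = insert_fingerprint(pixels, fingerprint, start)
recovered = extract_nonblind(marked, pixels, start, len(fingerprint))
```

In this idealized setting `recovered` matches `fingerprint` up to rounding error. In a real image the values of `marked` would be rounded and clipped to the 0-255 range before storage, which adds noise to the extracted sequence.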
Fingerprint Detection Method. The following steps describe the detection of the user's fingerprint in an illicit copy of an image representing a digital document:
1. Transform the illicit copy to the frequency domain using the DCT function.
2. Select $L$ coefficients of the low and middle frequencies, starting from the position $P_w$.
3. Detect the group ID:
(a) Generate PN using the secret key $s$.
(b) Use the DCT function to extract the detection sequence $\hat{d}_g$.
(c) Calculate the variance $\sigma_g^2$ of $\hat{d}_g$ considering the probability distribution, and determine the threshold $T_g$ from a given false-positive probability denoted as $Pe_g$; the threshold expression uses $\mathrm{erfc}^{-1}$, which stands for the inverse complementary error function.
(d) If $\hat{d}_g$ at entry $k$ exceeds the threshold $T_g$, it is determined that $k$ is the group ID.
4. Detect the user ID:
(a) Generate PN using the ID $k$ of the detected group.
(b) Use the DCT function to extract the detection sequence $\hat{d}_u$.
(c) Calculate the variance $\sigma_u^2$ of $\hat{d}_u$ in a similar way as in Equation 17, considering the probability distribution, and determine the threshold $T_u$ from a given false-positive probability denoted as $Pe_u$.
(d) If $\hat{d}_u$ at entry $h$ exceeds the threshold $T_u$, it is determined that $h$ is the user ID.
This hierarchical detection of fingerprints is illustrated in Figure 11.
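The thresholding step can be sketched as follows. The sketch assumes the usual Gaussian-noise form $T = \sqrt{2\sigma^2}\,\mathrm{erfc}^{-1}(2\,Pe)$, which is equivalent to $\sigma$ times the standard normal quantile at $1 - Pe$; the exact threshold expression and the variance estimator are not reproduced in this text, so both are assumptions. The variance here is estimated as the mean square of the whole detection sequence, and the detection sequence itself is synthetic.

```python
import math
from statistics import NormalDist


def detection_threshold(variance, false_positive):
    """T = sqrt(2 * variance) * erfcinv(2 * Pe), via the normal quantile."""
    return math.sqrt(variance) * NormalDist().inv_cdf(1.0 - false_positive)


def detect_ids(sequence, false_positive):
    """Return the indices whose detection value exceeds the threshold, and T."""
    # Assumption: the noise is zero-mean, so the mean square estimates its variance.
    variance = sum(v * v for v in sequence) / len(sequence)
    threshold = detection_threshold(variance, false_positive)
    return [k for k, v in enumerate(sequence) if v > threshold], threshold


# Synthetic detection sequence: small deterministic "noise" plus two peaks.
sequence = [float(((k % 7) - 3) * (1 if k % 2 == 0 else -1)) for k in range(200)]
sequence[30] = 40.0   # hypothetical colluder IDs
sequence[60] = 40.0

ids, threshold = detect_ids(sequence, false_positive=1e-4)
```

The same routine would be applied twice in the hierarchical scheme: first to the group detection sequence with $Pe_g$, and then, for each detected group, to the user detection sequence with $Pe_u$. Lowering the false-positive probability raises the threshold, which trades missed colluders for fewer false accusations.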

Evaluation of Perceptual Transparency
In this study, two methods for evaluating the perceptual transparency of fingerprinted images were utilized. These methods are described as follows:
Peak Signal-to-Noise Ratio. Peak Signal-to-Noise Ratio (PSNR) measures the similarity of two images. It defines the relation between the maximum energy of a signal and the noise that affects it, expressing this relation in decibels [33,35]. Given an 8-bit grayscale image $f$ and an altered copy $g$ of that image, both of size $M \times N$, the PSNR between $f$ and $g$ is defined by:
\[ \mathrm{PSNR}(f,g) = 10 \log_{10}\!\left( \frac{255^2}{\mathrm{MSE}(f,g)} \right) \tag{20} \]
where
\[ \mathrm{MSE}(f,g) = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( f_{ij} - g_{ij} \right)^2 \tag{21} \]
For the Mean Square Error (MSE), the difference between pixels $f_{ij}$ and $g_{ij}$ is considered as an error that generates image quality loss. As the MSE tends to zero, the value of the PSNR approaches infinity. The higher the value of $\mathrm{PSNR}(f,g)$, the higher the image quality.
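A minimal sketch of the PSNR computation for 8-bit grayscale images stored as nested lists is given below. The function names and the tiny test images are illustrative assumptions; the evaluation reported in this work relied on an existing image quality library rather than on this code.

```python
import math


def mse(f, g):
    """Mean square error between two equally sized grayscale images."""
    rows, cols = len(f), len(f[0])
    total = sum((f[i][j] - g[i][j]) ** 2 for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def psnr(f, g, peak=255.0):
    """PSNR in decibels; infinite when the two images are identical."""
    error = mse(f, g)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / error)


original = [[10, 20, 30], [40, 50, 60]]
altered = [[11, 21, 31], [41, 51, 61]]   # every pixel off by one, so MSE = 1

quality = psnr(original, altered)        # 10 * log10(255^2), about 48.13 dB
```

A document fingerprinted with a given configuration could then be compared against the 14 dB legibility level found in the survey, for example with `psnr(original, fingerprinted) >= 14.0`.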
Structural Similarity Index. The Structural Similarity Index (SSIM) is a particular implementation of the structural similarity philosophy [33], and it is considered to be correlated with the human visual system [36]. For two image signals $x$ and $y$, three components are compared: luminance, contrast, and structure. These components are relatively independent because object structures in images depend on neither illumination nor contrast. The luminance comparison is defined by the function in Equation 22, where $\mu_x$ is the mean of $x$.
\[ l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \tag{22} \]
Then, the closeness of the contrast of the images is measured as shown in Equation 23, where $\sigma_x$ is the standard deviation of $x$.
\[ c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{23} \]
The structure comparison is defined by Equation 24, where $\sigma_{xy}$ is the covariance between $x$ and $y$.
\[ s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \tag{24} \]
Finally, the three components are combined to get the overall similarity measure expressed in Equation 25, where the exponents $\alpha$, $\beta$, and $\gamma$ are positive parameters that define the importance of each component.
\[ \mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma} \tag{25} \]
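The three SSIM components can be combined in a short sketch that computes a single global index over two flattened images. This is a simplification: practical SSIM implementations usually average the index over local windows. The constants $C_1 = (0.01 \cdot 255)^2$, $C_2 = (0.03 \cdot 255)^2$, and $C_3 = C_2 / 2$ and the unit exponents are commonly used defaults, not values stated in this text, and the function name is hypothetical.

```python
import math


def ssim_global(x, y, alpha=1.0, beta=1.0, gamma=1.0, dynamic_range=255.0):
    """Single-window SSIM of two equally long pixel sequences."""
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2      # assumed stabilizing constants
    c2 = (0.03 * dynamic_range) ** 2
    c3 = c2 / 2.0

    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    # With the default unit exponents this is simply the product of the three terms.
    return (luminance ** alpha) * (contrast ** beta) * (structure ** gamma)


reference = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
distorted = [v + (10.0 if i % 2 else -10.0) for i, v in enumerate(reference)]

same = ssim_global(reference, reference)        # 1 up to rounding error
different = ssim_global(reference, distorted)   # strictly below 1
```

The index equals 1 only when the two signals are identical, and it decreases as the means, the contrasts, or the correlation of the two signals drift apart.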

Software Implementation
The fingerprinting system was implemented in C++ using the GCC compiler version 4.2.1, with Ubuntu 12.10 as the operating system, on an Intel Core i5 processor at 2.7 GHz with 4 GB of RAM. To perform the DCT and IDCT transforms, the FFTW library version 3.3.3 was used [34]. To read and write TIFF images, libtiff version 3.6.1 was used [37], and for JPEG images, jpeglib version 8.0 was used [38]. To compute the PSNR and SSIM Index values, the IQA library version 1.1.2 was used [39]. Finally, the source code is available upon request.

Conclusions
In this study, the collusion-resistant fingerprinting method proposed by Kuribayashi has been implemented in the context of restricted distribution of digital documents. The performance of the system was evaluated by defining permissible levels of distortion in terms of the legibility of fingerprinted documents, and appropriate parameter values were determined to achieve higher colluder detection. The computing time required for insertion and detection of fingerprints in digital documents was also estimated. For the fingerprinting scheme, it was shown that the energy assigned to users must be higher than that assigned to groups to obtain a higher colluder detection probability. Furthermore, an equation to determine the maximum fingerprint length was proposed. The system could detect up to 270 of 300 colluders for losslessly compressed digital documents. The detector performance after lossy compression remains competitive for real-world environments. The number of users supported and the robustness to high-quality lossy compression make the proposed system suitable for implementation in a production environment.