Extreme learning machine based optimal embedding location finder for image steganography.

In image steganography, determining the optimum location for embedding the secret message precisely with minimum distortion of the host medium remains a challenging issue, and an effective approach for selecting the best embedding location with the least deformation is far from being achieved. To attain this goal, we propose a novel high-performance approach for image steganography, in which the extreme learning machine (ELM) algorithm is modified to create a supervised mathematical model. This ELM is first trained on a part of an image or any host medium before being tested in the regression mode. This allows us to choose the optimal location for embedding the message with the best values of the predicted evaluation metrics. Contrast, homogeneity, and other texture features are used for training on a new metric. Furthermore, the developed ELM is exploited to counter over-fitting while training. The performance of the proposed steganography approach is evaluated by computing the correlation, structural similarity (SSIM) index, fusion matrices, and mean square error (MSE). The modified ELM is found to outperform the existing approaches in terms of imperceptibility. The experimental results demonstrate that the proposed steganographic approach is highly proficient at preserving the visual information of an image. An improvement in imperceptibility of as much as 28% is achieved compared to the existing state-of-the-art methods.


Introduction
Over the decades, the ever-escalating advancements of communication technology have allowed the free transfer and sharing of confidential information over the complex internet network. This free sharing of sensitive information in the form of data files and video/audio recordings has posed severe security threats. The preservation of users' privacy is repeatedly threatened by highly sophisticated and deceptive phishing attacks. Thus, absolute protection of sensitive data communication from unauthorized access or attacks is demanded.
Presently, secured communication is achieved via cryptographic and steganographic techniques assisted by mathematical models. Cryptography, being the encryption of a plain-text to generate the cipher-text, does not obscure the existence of the data. It rather makes the data incomprehensible to protect the secret message from attacks or unauthorized access. For absolutely secured information communication, the limitations of cryptography are surmounted by introducing steganography. However, most of the conventional steganographic techniques suffer from high computational loads when selecting the best location for concealing the message in the host medium with minimal deformation. This shortcoming can be overcome by introducing the neural network (NN) based steganographic technique, where the NN uses a distributed representation to store the learned knowledge. Thus, accessing the concealed data without knowing the topology of the NN appears practically infeasible [1]. Although some researchers prefer interpretable models such as explicit mathematical or statistical models, or even heuristically encoded models such as fuzzy models, it has been shown that black-box models, when learning is feasible, are more capable of capturing complicated knowledge and proving functionality in real-world systems [2][3][4]. Such black-box models have proved highly efficient in state-of-the-art speech recognition, visual object recognition, and many other fields [5].
Using information-hiding protocols, the steganographic technique embeds the message into a cover medium to keep the hidden data from being detected. This cover medium may be an image, video, or audio file. Among the various steganographic techniques, image steganography (concealing data in an image) is the most popular and widely used because vast numbers of images are easily exchanged via the internet [6]. Moreover, data hidden by image steganography cannot be recognized through visual inspection [7]. Lately, heuristic search optimization has become attractive in the image steganography domain [5]. Despite much research, achieving an efficient steganographic algorithm that finds the best embedding location at reduced computational cost remains challenging.
Depending on the embedding location, image steganographic algorithms are categorized into spatial-domain [8][9] and frequency-domain embedding. The latter is also called transform-domain embedding [10]-[13]. In the spatial domain, least significant bit (LSB) based steganography [8][9] is the most extensively used method [14], where the LSBs of the carrier or cover image are used to conceal the secret message. However, in least significant bit replacement (LSBR) based steganography, the hidden secret message can be uncovered by existing steganalysis methods [15][16]. Thus, it is weak against visual and statistical attacks. The least significant bit matching (LSBM) method, also called the ±1 embedding method, provides better security than LSBR. However, it is incompatible with most of the model-preserving steganographic techniques [17]. Despite their high capacity, the spatial-domain techniques are not robust against image-processing operations, noise attacks, lossy compression, and filtering. Furthermore, they offset the statistical properties of the image due to the sole usage of the BMP format.
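As an illustration of the LSB replacement idea described above, the following is a minimal sketch, assuming 8-bit gray-scale pixel values stored in a flat list and a message supplied as a list of bits. The function names embed_lsb and extract_lsb and the sample values are illustrative assumptions and are not taken from the cited methods.

```python
def embed_lsb(pixels, bits):
    """Return a copy of `pixels` whose first len(bits) values carry `bits` in their LSB."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover")
    stego = list(pixels)
    for k, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the message bit.
        stego[k] = (stego[k] & ~1) | bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read the message back from the LSB of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


# Hypothetical cover pixels and message bits, for illustration only.
cover = [154, 37, 200, 91, 18, 255, 0, 64]
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_lsb(cover, message)
```

Each pixel changes by at most one gray level, which is why the change is hard to see, yet the parity pattern it leaves behind is what statistical steganalysis of LSBR exploits.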
In frequency-domain steganography, the secret data are concealed in the significant parts of the cover image. This domain comprises several transforms such as the discrete cosine transform (DCT), discrete wavelet transform (DWT), and discrete Fourier transform (DFT). These transforms are used as media for hiding a message in an image [18]. Although both DWT and DCT have relatively smaller capacities, the former is superior in terms of robustness against image-processing operations, statistical and noise attacks, as well as distortion [19]. Thus, steganographic techniques in the frequency domain possess better immunity to attacks than those in the spatial domain, and the limitations of the spatial-domain techniques are overcome using the frequency domain. Numerous studies have been performed with DWT [10], [12]. The presence of rounding error in the inverse DFT makes it disadvantageous for steganographic applications [20]. Table 1 presents a brief summary of embedding the secret information in the spatial or frequency domain.
Some researchers have combined the spatial and frequency domains. Refs. [21][22] introduced a framework for optimizing the adaptive distortion function to achieve minimal statistical detectability. Ref. [23] improved the detection percentage and classified the images as stego or clean. Furthermore, spatial or frequency domain techniques are integrated with other techniques including artificial NN (ANN), genetic algorithm (GA), or both to attain enhanced steganographic performance. Spatial-domain based GAs are used [1], [24] to minimize the distortion. GA and ANN are used [25] to accelerate the training speed. Frequency-domain ANN is used [26] to augment the embedding capacity. Spatial-domain based ANN is utilized [27] to realize good approximation capacity, faster convergence, and a more stable performance surface. This type of ANN is also used [28] to increase the approximation capacity and minimize distortion.
ANN is also used with steganography for message embedding [25], where the secret message is assumed to represent an image. This allows the steganographer to change the message data freely provided the visual information is preserved. However, this assumption does not apply to text messages. Meanwhile, ANN is also used for digital watermarking to authenticate the image [29], in which concealing a secret message is not required [30]. ANN is employed for capacity maximization [28], steganographic content detection [31][32][33], identification of the embedded data in an image when applied to steganalysis or as a classifier, and determination of the lower and upper bounds of embedding capacity [34]. Likewise, GAs are used in steganography for diverse purposes. GA is used to model the steganography problem [24] as a search and optimization problem. Besides, GAs are utilized for optimization with minimum distortion, where a stego image closer to the cover image is obtained [1], [35]. Ref. [11] presents DCT with Markov as a detector and classifier for images. Table 2 summarizes different embedding techniques with combined spatial and frequency domains.
Lately, the learning ability of NNs has been exploited to expand the optimization potential of conventional data-hiding techniques. In steganography, ANN is used either for the classification of the stego image or for the detection of the embedded data in an image. We intend to reduce the distortion in a stego image as much as possible by appropriately selecting the location in the image for message embedding. Theoretically, an ELM demonstrates good generalization performance and universal approximation at extremely fast learning speeds. It can be used for either classification or regression purposes [36]. Inspired by such notable advantages, we propose an ELM-based supervised mathematical model called Optimal Embedding Location Finder (OELF) for image steganography. In addition, a novel fusion metric (fusion1) is introduced for the training in the regression mode to realize the best performance metric for steganography. Another novel fusion metric (fusion2) is developed for evaluating the results. To the best of our knowledge, this is the first use of machine learning to determine the best embedding location with the least sensitive area. This paper is organized as follows. Section 2 depicts the proposed OELF model. Section 3 highlights the detailed mathematical background of steganography. Section 4 describes the proposed methodology. Section 5 explains the experimental results with various attributes. Section 6 concludes the paper with further outlook.

Optimal Embedding Location Finder (OELF) model
Most traditional steganographic methods embed the message into an image while ignoring the significance of the image's spatial features. Nevertheless, the identification of the best embedding location is critically decided by the message homogeneity and other texture features [37] of the blocks. A location having the least image distortion is considered to be the optimum one. To protect the embedding process from steganalysis, any form of distortion in the image must be minimized after the payload is inserted. Furthermore, the cover image and stego image must be approximately identical both visually and statistically. The selected area and the embedding method are the primary factors that affect the distortion. Based on the OELF model, an ELM is proposed for finding the best embedding location. It is worth noting that ELM is beneficial due to its universal approximation capacity, which allows rapid training with better overfitting avoidance than other classical NN based approaches [36]. Thus, a modified ELM is used to train a single-hidden-layer NN with a varying number of neurons. Appendix A provides a short depiction of ELM.

Table 2. The combined spatial and frequency domains with different embedding techniques for the existing state-of-the-art methods.
Reference | Domain and technique | Advantages | Limitations
Tomás Filler and Fridrich 2011 | Frequency-DCT and spatial | Strong against many types of steganalysis. | High complexity.
Tom Filler, Judas, and Fridrich 2011 | Frequency-DCT and spatial | The methods are not limited to binary embedding and allow the embedder to choose the amplitude of embedding changes dynamically based on the cover-image content. | Focuses on payload aspects rather than embedding.
Pathak and Selvakumar 2014 | Frequency-DCT and spatial | Used as a classifier and for embedding. | Omitted some features of images.
Iranpour and Rahmati 2014 | Spatial and GA | Enhances the security by minimizing the distortion. | Omitted the optimum number of blocks as well as their sizes.
El-Emam and AL-Zubidy 2013 | Frequency, GA and ANN | Allows the steganographer to change the message data freely provided the visual information is preserved. | Omits text steganography.
Tsai et al. 2009 | Frequency and ANN | Augments the embedding capacity and supports true-color secret images with a size constraint on shares. | Hides a small image in a large image.
Husien and Badi 2014 | Spatial and ANN | Good approximation capacity, faster convergence, and a more stable performance surface. | Did not present numerical comparisons with other works.
- | Spatial-LSB and ANN | Increases the approximation capacity. | PSNR and MSE are not satisfactory and the method was not tested against image processing.
El-Alfy 2013 | Spatial domain-PVD and ANN | Detection rates of 99% have been achieved. | Applied only in the transformed domain.
Pratt, Konda, and Chu 2008 | Spatial-LSB and ANN | It is especially challenging when the embedding rate is low, such as below 10 percent of all embedded data. | Used for steganalysis and not for embedding. Some error rates have been addressed in extracting the embedded data.
Nazeri and Kanan 2014 | Spatial domain and GA | Models the steganography problem as a search and optimization problem. | Not tested against image processing or any statistical analysis attack.
Roy and Laha 2015 | Spatial-LSB and GA | High security and robustness. | The image quality (PSNR) is not satisfactory.

Background of steganography modeling
As mentioned earlier, OELF locates the most suitable window for embedding the secret message into the image without affecting its visual features. Initially, the image is partitioned into blocks of (8 × 8) pixels and one bit of the message is inserted into each block. Depending on the message size, the image is then partitioned into overlapping square windows to embed the message. The features of contrast (C), energy (Enr), homogeneity (H), entropy (Ent), correlation (Corr), standard deviation (Std), and the mean (M) of each square window are then calculated. In the feature definitions, i and j are the horizontal and vertical pixel coordinates, respectively, p is the pixel value, and N is the number of the window pixels.
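To make the texture features concrete, here is a minimal sketch that computes four of them from a gray-level co-occurrence matrix, the structure named later in the texture feature extraction steps. It assumes the commonly used definitions (contrast as the sum of (i - j)^2 p, energy as the sum of p^2, homogeneity as the sum of p / (1 + |i - j|), entropy as the negative sum of p log2 p), a horizontal neighbor offset, and indices i and j that run over gray levels. These choices, the function names, and the sample window are assumptions and may differ from the exact expressions used by the authors.

```python
import math


def glcm(window, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, energy, homogeneity, and entropy of a normalized co-occurrence matrix."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


# Hypothetical 4 x 4 window with four gray levels, for illustration only.
window = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 2, 2, 2],
          [2, 2, 3, 3]]
features = texture_features(glcm(window, levels=4))
```

For real 8-bit windows the gray levels would usually be quantized to a small number of bins first, since a 256 x 256 co-occurrence matrix is sparse for small windows.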
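After calculating the window features and embedding the message in the corresponding window, the resultant imperceptibility is represented using one of three metrics: correlation, MSE, and SSIM. The MSE and SSIM are given by:

$$\mathrm{MSE} = \frac{1}{N M} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( x_{ij} - y_{ij} \right)^2$$

where N and M are the length and width of the image, respectively, and $x_{ij}$ and $y_{ij}$ are the pixel values of the host and stego images, and

$$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu_x$ and $\mu_y$ are the local means, $\sigma_x$ and $\sigma_y$ are the standard deviations, $\sigma_{xy}$ is the cross-covariance, and $C_1$ and $C_2$ are constants.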
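The two metrics can be sketched in a few lines. The version below is a simplified illustration: it computes SSIM from statistics taken over the whole window rather than from a sliding local window, and it assumes the conventional constants C1 = (0.01 × 255)^2 and C2 = (0.03 × 255)^2 for 8-bit images. Both choices, along with the function names and sample values, are assumptions and are not stated in the text.

```python
def mse(x, y):
    """Mean square error between two equally sized 2-D pixel arrays."""
    n_pixels = len(x) * len(x[0])
    return sum((a - b) ** 2
               for row_x, row_y in zip(x, y)
               for a, b in zip(row_x, row_y)) / n_pixels


def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM from window-wide statistics (no sliding window, no weighting)."""
    fx = [v for row in x for v in row]
    fy = [v for row in y for v in row]
    n = len(fx)
    mu_x, mu_y = sum(fx) / n, sum(fy) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in fx) / (n - 1)
    var_y = sum((b - mu_y) * (b - mu_y) for b in fy) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(fx, fy)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical host and stego windows, for illustration only.
host = [[10, 20], [30, 40]]
stego = [[11, 20], [30, 38]]
```

Identical windows give an MSE of zero and an SSIM of one, so a good embedding location is one that keeps the MSE low and the SSIM close to one.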

Methodology
The following subsections describe the detailed methodology including the input (host or cover image), the message to be embedded in the image, the output (stego image) and the evaluation metrics of imperceptibility.

Input and output determination
Two images from the standard database, Lena and Sails, are used to analyze the trends between the imperceptibility and the texture features of the image. Imperceptibility is measured in terms of correlation, MSE, and SSIM between two corresponding square windows of the host and stego images with respect to the extracted features. Figs 1-6 show the trends of the imperceptibility of the Lena and Sails images after the message is embedded into a square window, plotted against the corresponding texture features in this window. It is evident that all the features (contrast, energy, homogeneity, entropy, correlation, and Std) are strongly correlated. The small variability in the imperceptibility correlation with respect to the set of features implies their equivalent usage in the machine learning model. Table 3 summarizes the trends of imperceptibility with respect to the texture features. A detailed analysis of such trends between the imperceptibility and the texture features of the image allowed us to determine the possible causality among them. Thus, the machine learning model is designed with an optimized embedder or steganographer.

Model design
The following steps are adopted to develop the proposed model:
1. Partitioning of the image into (8 × 8) pixel blocks, with one message bit assigned to each block.
2. Determination of the number of blocks needed to embed the message according to the message size m in bits.
3. Determination of the minimum square window size (SWS) from the image that contains the required blocks.

Data set preparation
The size of the data set depends on N, M, and SWS, where N and M are the length and width of the image, respectively, and SWS is the square window size. Texture feature extraction, metric calculation, and embedding are performed to build the learning data set. The embedding and the feature extraction procedures are briefly explained as follows.
Wavelet transform based embedding. As aforementioned, the message must be embedded into its corresponding square window for each square window in the data set. The learning data are extracted from the raw data set using the embedding process and the calculation of the resultant visual imperceptibility metrics. To achieve this goal, the following steps are executed:
1. For message bit one, the value with the index (8,8) in the corresponding block is quantized to the nearest even number. Otherwise, it is quantized to the nearest odd number.
2. The wavelet transform for each sub-block is computed by inverting the wavelet.
3. The wavelet is transformed to its corresponding spatial domain.
4. The embedding process is repeated until the final bit of the message is embedded.
5. For each square window, the corresponding visual metrics are calculated. These metrics include correlation, MSE, SSIM, and fusion1.
Texture feature extraction. The texture features are extracted using the following steps:
1. The co-occurrence matrix is built for each square window whose sub-blocks are used for embedding the message bits.
2. The feature function (contrast, energy, homogeneity, entropy, correlation, and standard deviation) of the co-occurrence matrix is computed for each square window.
The learning data set X contains one row per square window j = 1, 2, ..., n, where n is the number of square windows, $f_{1j}, f_{2j}, \ldots, f_{7j}$ are the extracted features, $y_{ij}$ is the corresponding output metric, and i = 1, 2, 3, 4 correspond to Corr, MSE, SSIM, and fusion1, respectively. A neural network of ñ hidden neurons is built and trained on a part of X to predict $y_i$. Furthermore, the training and the testing phases are validated using the RMSE before applying the ELM-based model. Next, the optimal training percentage and the optimum number of neurons are determined.
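The parity rule used in the wavelet-based embedding steps of this subsection (message bit one maps to the nearest even number, any other bit to the nearest odd number) can be sketched as follows. The wavelet transform and its inverse are deliberately left out, the coefficient values are hypothetical, and the function names quantize_to_parity and read_bit are assumptions introduced for illustration.

```python
import math


def quantize_to_parity(value, bit):
    """Quantize a coefficient so that its parity encodes the message bit.

    Bit 1 maps to the nearest even integer and bit 0 to the nearest odd
    integer, following the convention stated in the text.
    """
    if bit == 1:
        return 2 * round(value / 2)
    return 2 * math.floor(value / 2) + 1


def read_bit(coefficient):
    """Recover the message bit from the parity of a quantized coefficient."""
    return 1 if round(coefficient) % 2 == 0 else 0


# Hypothetical (8,8) coefficients of four blocks and the bits to embed.
coefficients = [37.4, -12.9, 5.0, 88.2]
bits = [1, 0, 1, 0]
marked = [quantize_to_parity(c, b) for c, b in zip(coefficients, bits)]
recovered = [read_bit(c) for c in marked]
```

Because every coefficient moves by at most one unit, the distortion introduced per block is bounded, which is what makes the choice of window (rather than the embedding rule itself) the main lever for imperceptibility.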
RMSE for training and testing. Because OELF is a supervised model, validation of the training and testing phases is necessary, and both phases play a decisive role in the proposed model. In the present case, OELF is trained to predict the visual imperceptibility metrics (Corr and SSIM) and the fusion1 metric. The RMSEs of the proposed OELF model for the training and testing phases are computed to evaluate its predictive performance. Table 4 summarizes the RMSE values of the square window for each of the similarity metrics. The computed RMSEs for all the metrics in both the training and testing phases with different images are found to be approximately zero, indicating the suitability of the proposed model.
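For reference, the RMSE used to validate the training and testing phases is the square root of the mean squared difference between predicted and target metric values. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
import math


def rmse(predicted, target):
    """Root mean square error between predicted and target metric values."""
    if len(predicted) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))


# Hypothetical predicted and measured SSIM values for three windows.
predicted_ssim = [0.991, 0.987, 0.995]
measured_ssim = [0.990, 0.989, 0.994]
error = rmse(predicted_ssim, measured_ssim)
```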
Developed ELM training. A number of issues need to be addressed when using ELM. First, an appropriate training-testing ratio has to be determined accurately to avoid over-fitting from a high training percentage and under-fitting from a low training percentage. Second, the ELM does not provide the user with the exact number of neurons to be selected for building the network structure. Moreover, the performance of the model depends on the accurate determination of the number of neurons, where a large (small) number of neurons leads to over (under) fitting [38][2].
The data set is partitioned into 50% training and 50% testing. Next, the number of neurons is increased from 50 to 200 in steps of 5. In each case, the training data set is partitioned into 80% for training and 20% for validation. Validation is performed on a part of the training data set because the testing data set is unavailable in the normal functioning mode of the system. The number of neurons in the hidden layer corresponding to the best validation accuracy is then selected. Once the optimal number of neurons is selected, the search for the best training-testing ratio is performed by assigning a fixed testing data set size. Allocating a fixed percentage of the data for testing is required to avoid bias in the RMSE as the testing data set grows. Afterward, the percentage of the training data is increased from 10% to 60%, and each case is validated using the validation part composed of 20% of the training data set. From the total data set, 50% is found to be best for training. Table 5 summarizes the accuracy levels as a function of the training data set percentage for the Lena, Sails, and Baboon images. Figs 8-11 display the variation in the Corr, MSE, SSIM, and fusion1 values between the host and stego images (Lena, Sails, and Baboon) as a function of the training data set percentage.

Design and optimization of ELM
1. The data set is partitioned into 50% for training as well as validation and 50% for testing.
2. The ELM regression model is designed based on the training data set (Appendix A), which is partitioned into 80% for training and 20% for validation.
3. The ELM regression model is further used to predict the best square window in terms of the fusion2 metric.
4. The embedding process is performed to insert the secret message into the identified optimum square window for generating the stego image.
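The search for the number of hidden neurons described in this subsection (50 to 200 in steps of 5, scored on a 20% validation split of the training data) can be sketched generically. The callables fit and error are hypothetical placeholders for training a model of a given size and scoring it, for example with the RMSE. They, the function names, and the shuffling seed are assumptions and are not part of the original method description.

```python
import random


def split(data, train_fraction, seed=0):
    """Shuffle a copy of `data` and split it into two parts."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def select_hidden_size(train_val, fit, error, sizes=range(50, 201, 5)):
    """Return the hidden-layer size with the lowest validation error.

    `fit(train_rows, size)` builds a model and `error(model, val_rows)`
    scores it. Both are supplied by the caller.
    """
    train, val = split(train_val, 0.8)
    best_size, best_error = None, float("inf")
    for size in sizes:
        model = fit(train, size)
        current = error(model, val)
        if current < best_error:
            best_size, best_error = size, current
    return best_size, best_error


# Toy stand-ins so the sketch runs: the "model" is just its size and the
# "error" is smallest at 120 neurons. Real use would plug in an ELM here.
toy_rows = list(range(100))
best = select_hidden_size(toy_rows, lambda rows, size: size,
                          lambda model, rows: abs(model - 120))
```

Scoring on a validation split taken from the training data, rather than on the test set, mirrors the point made in the text that the test data are unavailable in normal operation.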
Using the ELM training, the message is embedded into each square window and all visual imperceptibility metrics are determined via the fusion2 metric.

Experiments and results
Experiments are conducted on a computer with an Intel® Core™ i7-2670QM CPU @ 2.20 GHz, 6 GB RAM, and a 64-bit operating system. The proposed OELF model is evaluated using 24 gray-scale images of size (512 × 512) pixels. A total of 5041 square windows of size (232 × 232) are obtained. A message of 100 bytes is used for embedding. Fig 13 illustrates the tested images before (left panel of each image) and after (right panel of each image) embedding. Table 6 lists the RMSEs of the ELM prediction for the visual imperceptibility metrics of the host and stego images for a 50% training data set.
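The reported window size and window count can be checked with a short calculation. The sketch below assumes that the square window is the smallest square grid of (8 × 8) blocks that holds one block per message bit, and that overlapping windows are taken with a stride of 4 pixels. Neither the formula nor the stride is stated in the text. They are assumptions chosen because they reproduce the reported values of 232 and 5041.

```python
import math

BLOCK = 8  # one message bit per (8 x 8) block, as stated in the text


def square_window_size(message_bytes):
    """Side length of the smallest square of blocks holding one bit per block."""
    bits = 8 * message_bytes
    blocks_per_side = math.ceil(math.sqrt(bits))  # assumed rule
    return BLOCK * blocks_per_side


def window_count(n, m, sws, stride):
    """Number of window positions in an n x m image for the given stride."""
    return ((n - sws) // stride + 1) * ((m - sws) // stride + 1)


sws = square_window_size(100)                   # 100-byte message
count = window_count(512, 512, sws, stride=4)   # stride of 4 is an assumption
```

With a 100-byte message there are 800 bits, so 29 × 29 blocks are needed, giving a 232-pixel window, and a 4-pixel stride yields 71 positions per axis.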
The experimental results obtained using the proposed OELF model are compared (Table 7 and Fig 14) with the existing state-of-the-art methods [24], [39] in terms of the fusion2 metric. The OELF approach is found to outperform the other methods [24], [39] in terms of imperceptibility and the fusion2 measure, with improvements of nearly 28% and 114%, respectively. Thus, OELF is demonstrated to be a useful steganography technique for embedding text in images.

Conclusion
Based on ELM, we proposed a novel OELF model to achieve high-performance image steganography. In this approach, a modified ELM algorithm is used to establish the supervised mathematical model for determining the optimum embedding location in the image with minimal distortion. The ELM is trained on a part of an image (or any host medium) and tested in the regression mode to select the best location for embedding the message. This allows the best values of the predicted evaluation metrics to be achieved. The training is performed on a set of extracted texture and statistical features and their corresponding visual imperceptibility metrics, using a part of the image. The trained model is further used for performance optimization. The proposed model is demonstrated to outperform the existing state-of-the-art models. The results suggest that the present model may constitute a basis for the development of secured image steganography algorithms. It is worth examining the robustness of the proposed method against various statistical attacks by incorporating a wider range of features. It would also be useful to develop the model further so that it has more degrees of freedom in region finding, by defining the region analytically instead of through an explicit geometrical definition (block region). Another worthwhile development is to create an index for ranking the solutions based on Pareto efficiency.

Appendix A
For ELM training, the data comprise n arbitrary distinct square windows $(x_j, t_j)$ with j = 1, ..., n, where $x_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$ denotes the input vector and $t_j$ denotes the target. The standard Single Hidden Layer Feed Forward Network (SLFN) with an activation function g(x) and ñ hidden layer neurons can be modeled via:

$$\sum_{i=1}^{\tilde{n}} \beta_i \, g(w_i \cdot x_j + b_i) = t_j$$

where j = 1, ..., n, $w_i = (a_{i1}, a_{i2}, \ldots, a_{in})^T$, $b_i$ is the threshold (bias) of the i-th hidden node, and $\beta_i$ is the weight connecting the i-th hidden node and the output. In compact form, the above equation yields:

$$H \beta = T$$

where H is called the hidden layer output matrix of the neural network, with entries $H_{ji} = g(w_i \cdot x_j + b_i)$, and T is the target vector.
One can prove that if the activation function is differentiable, then the required number of hidden layer neurons is lower than the data size, or ñ < n. The training of the neural network is achieved via the following steps:
1. Assigning random weights ($w_i$) and biases ($b_i$).
2. Calculating the hidden layer output matrix.
3. Computing the output weights (β) via:

$$\beta = H^{\dagger} T$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix.
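The three training steps above can be sketched with the standard library alone. This is a plain ELM for illustration and not the authors' modified version. It assumes a sigmoid activation, weights and biases drawn uniformly from [-1, 1], and output weights obtained from the normal equations, which coincide with the Moore-Penrose solution when the hidden layer output matrix has full column rank. A small ridge term is added for numerical stability. All function names, the ridge value, and the toy data are assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hidden(x, w, b):
    """One row of the hidden layer output matrix for input vector x."""
    return [1 / (1 + math.exp(-(sum(wk * xk for wk, xk in zip(wi, x)) + bi)))
            for wi, bi in zip(w, b)]


def train_elm(xs, ts, n_hidden, seed=0, ridge=1e-8):
    rng = random.Random(seed)
    dim = len(xs[0])
    # Step 1: random input weights and biases, never updated afterwards.
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    # Step 2: hidden layer output matrix H (one row per training sample).
    h = [hidden(x, w, b) for x in xs]
    # Step 3: output weights from (H^T H + ridge I) beta = H^T T.
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htt = [sum(row[i] * t for row, t in zip(h, ts)) for i in range(n_hidden)]
    return w, b, solve(hth, htt)


def predict(model, x):
    w, b, beta = model
    return sum(bi * hi for bi, hi in zip(beta, hidden(x, w, b)))


# Toy regression data (hypothetical): a single feature and a linear target.
xs = [[k / 13] for k in range(40)]
ts = [2 * x[0] + 1 for x in xs]
model = train_elm(xs, ts, n_hidden=6)
predictions = [predict(model, x) for x in xs]
```

Only the output weights are fitted, and in closed form, which is why ELM training is fast compared with iterative back-propagation.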