An Efficient Robust Blind Watermarking Method Based on Convolution Neural Networks in Wavelet Transform Domain

Digital watermarking is one of the most widely used techniques for the protection of ownership rights of digital audio, images, and videos. One of the desirable properties of a digital watermarking scheme is its robustness against attacks aiming at removing or destroying the watermark from the host data. Different from the common watermarking techniques based on the spatial domain or transform domain, in this paper, a novel scheme of digital image blind watermarking based on the combination of the discrete wavelet transform (DWT) and the convolutional neural network (CNN) is proposed. Firstly, the host images are decomposed by the DWT with 4 levels and, then, the low frequency sub-bands of the first level and the high frequency sub-bands of the fourth level are used as the input data and the output target data to train the CNN model for embedding and extracting the watermark. Experimental results show that the proposed scheme has superior performance against common attacks of JPEG compression, mean and median filtering, salt and pepper noise, Gaussian noise, speckle noise, brightness modification, scaling, cropping, rotation, and shearing operations.


I. INTRODUCTION
The explosive growth of the Internet and social networks has made it increasingly convenient to transmit and share digital multimedia such as audio, images and videos. With the development of advanced multimedia signal processing technologies, digital multimedia can be easily acquired, copied and tampered with. Thus, the issues related to multimedia information protection, copyright and content authentication have become significant concerns [1]. For digital image data, there are extensive studies on how to prevent unauthorized users from illegally copying, distributing and modifying digital images [1], [2]. Digital watermarking techniques, which embed hidden information (known as a watermark) into a host medium to detect and trace copyright violations, have attracted considerable interest from academia and industry [3]. Digital image watermarking can be found in various practical applications such as copyright protection, image authentication, medical applications, tamper detection and digital fingerprinting [4]. The most important and desirable properties in applications of watermarking for protecting the owners' copyright are invisibility and robustness. Invisibility measures the changes in the quality of host images before and after watermarking. Robustness measures the ability of the embedded watermarks to survive signal processing operations without being destroyed or removed. In general, there is a trade-off between invisibility and robustness [5]. Depending on the specific application, a watermarking technique can be chosen to obtain the desired properties.
Based on the domain in which the watermark is embedded, digital image watermarking techniques can typically be divided into categories such as the spatial domain [6], [7], the transform domain [8]-[10] and hybrid ones [11]. In spatial domain watermarking schemes, the watermarks are inserted in the host images by modifying the gray level values of chosen pixels [7]. Although spatial domain watermarking is simple to implement, it can be sensitive to common attacks such as JPEG compression and low-pass filtering, and the watermarks can be easily removed by using inverse operations [6], [12], [13]. Therefore, spatial domain watermarking techniques are not commonly used in many practical applications. Alternatively, to improve the robustness and imperceptibility, watermarking can be carried out in transform domains such as the fast Fourier transform (FFT) [14], discrete cosine transform (DCT) [15], discrete wavelet transform (DWT) [13], [16], and DWT-DCT [17], [18]. Transform domain watermarking can offer better robustness against common attacks since the watermark coefficients are spread over the host image.
More recently, to further enhance the imperceptibility of watermarked images and the robustness of watermarks, artificial intelligence (AI) based methods in digital image watermarking have attracted great interest; see, for example, [19]-[24] and the references therein. In [19], the authors introduced a blind watermarking scheme exploiting the back-propagation (BP) neural network (NN) in the DWT domain. The authors demonstrated that their algorithm offers imperceptibility and robustness to common attacks such as salt and pepper noise, median filtering, rotation, cropping and JPEG compression. Similarly, the authors in [20] studied a blind watermarking algorithm using a feed-forward NN in the DWT domain. Their simulation results revealed good imperceptibility and robustness against the same common attacks as in [19]. The authors of [22] developed a watermarking technique that combines the fractal dimension, BP NN, Arnold transform, and multiwavelet transform to improve the security, imperceptibility and robustness. Likewise, the references [23], [24] described watermarking schemes using the DWT and BP NN with good performance in terms of invisibility and imperceptibility. Alternatively, the study in [21] introduced a non-blind watermarking scheme based on a learning-based auto-encoder convolutional neural network (CNN). The experimental results in [21] indicated that their CNN based watermarking outperforms the previously existing methods in terms of peak signal-to-noise ratio (PSNR) and normalized correlation (NC).
Motivated by the above studies, in this paper, we develop a blind watermarking scheme by using the CNN in the wavelet transform domain. In our watermarking algorithm, we decompose the host images into sub-bands by using the DWT. Then, the selected low frequency sub-bands are used as the inputs for the CNN while the high frequency sub-bands are used as the target outputs. We carry out the training of the CNN by using a set of images from a standard database. The trained CNN is employed for both embedding and extracting the watermarks. Different from the method in [23], which used the DWT decomposition with 2 levels and employed the BP NN for watermark embedding and extraction, our proposed method invokes the DWT decomposition with 4 levels and applies the CNN for watermark embedding and extraction. To evaluate the performance of our scheme, we carry out extensive numerical experiments on various images and different attacks of non-geometric and geometric transforms. The numerical results demonstrate that our watermarking scheme offers superior performance in terms of robustness and invisibility as compared with the other existing methods. The main contributions of the paper can be summarized as follows:
• We introduce a novel watermarking scheme by appropriately selecting the sub-bands in the DWT domain as the inputs and target outputs for training the CNN. Then, the trained CNN is used to embed and extract the watermarks.
• We conduct extensive experiments to demonstrate the superiority of the proposed scheme over other typical existing methods in terms of invisibility and robustness under various attacks.
The rest of this paper is organized as follows. In Section II, we introduce the proposed watermarking scheme along with the details on the DWT and CNN. Then, the effectiveness of the proposed watermarking scheme, evaluated by experimental results for various attacks, is presented in Section III. Finally, the concluding remarks are given in Section IV.

II. PROPOSED WATERMARKING SCHEME
In this work, we propose a blind digital watermarking scheme for copyright protection based on the DWT and CNN. The host image is transformed into the DWT domain and, then, the image in the transform domain is used for training the CNN to embed and extract the watermark efficiently.

A. Discrete Wavelet Transform (DWT)
The DWT is a frequency domain method which is commonly used in image processing. The DWT provides an effective multi-scale representation of an image at a low computational cost. In image processing, this method decomposes images into frequency channels of constant bandwidth [25]. The DWT has been applied in various image processing applications, for example, noise reduction and image compression. In the DWT, a two-dimensional image is expressed in the DWT domain by applying low-pass and high-pass filters to the rows and columns of the image and, then, the transformed image at level 1 is partitioned into four sub-bands, namely LL, LH, HL and HH, where the first letter denotes the low (L) or high (H) pass filtering operation applied to the rows while the second letter refers to the filtering operation applied to the columns [25], [26]. The division of each sub-band can be repeatedly carried out until the required number of levels is obtained. The forward DWT and the inverse DWT are defined as in [27].
A two-dimensional image after four levels of the DWT decomposition is demonstrated in Fig. 1. Then, we select the high frequency sub-bands to embed the watermark since the human visual system is more sensitive to the LL sub-band which represents the low frequency component [28].
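For reference, one common way to write a single level of the 1-D forward and inverse DWT is the two-channel filter-bank form below. The notation is an assumption made here for illustration and may differ from that of [27]: $a_j$ and $d_j$ denote the approximation and detail coefficients at level $j$, $h$ and $g$ denote the low-pass and high-pass filters, and an orthogonal wavelet is assumed.

$$a_{j+1}[p] = \sum_{n} h[n-2p]\, a_j[n], \qquad d_{j+1}[p] = \sum_{n} g[n-2p]\, a_j[n]$$

$$a_j[n] = \sum_{p} h[n-2p]\, a_{j+1}[p] + \sum_{p} g[n-2p]\, d_{j+1}[p]$$

The two-dimensional transform applies these one-dimensional steps along the rows and then along the columns, which yields the four sub-bands LL, LH, HL and HH.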

B. Convolutional Neural Network (CNN)
CNNs consist of one or more convolutional layers with pooling/subsampling steps followed by one or more fully connected layers (a multilayer neural network) as demonstrated in Fig. 2 [29]. An image is fed to the network as an input which goes through multiple convolution and pooling layers and finally a fully connected layer to produce the outputs. CNNs have proven to be efficient in various applications such as image classification, object detection, and recognition since they are able to efficiently learn and represent the image with a limited number of parameters (kernels, biases, weights). In CNNs, the back propagation (BP) algorithm using the standard gradient descent method is commonly invoked for updating kernels, weights and biases in the layers [30], [31]. The following presents the BP algorithm in fully connected networks and, then, the BP updates for convolutional and sub-sampling layers in a 2D CNN which will be applied for image watermarking.
Feedforward pass: Let $W^t$ and $b^t$ be the weight matrix and the bias vector associated with layer $t$. Then, the output of layer $t$ can be written as
$$x^t = f(u^t), \qquad u^t = W^t x^{t-1} + b^t$$
where $f(\cdot)$ is an activation function. In our experiments, we use tanh, mean and sigmoid functions. Denoting the value of output $k$ for training sample $n$ by $y_k^n$ and the corresponding target output (desired output) by $d_k^n$, the error for the individual sample $n$ is computed by
$$E^n = \frac{1}{2} \sum_{k=1}^{c} \left(d_k^n - y_k^n\right)^2$$
where $c$ is the number of the outputs.
Back-propagation: Denoting the bias sensitivity at layer $t$ as $\delta^t$, the sensitivity of layer $t$ in the BP process is defined by
$$\delta^t = \left(W^{t+1}\right)^T \delta^{t+1} \circ f'(u^t)$$
where $\circ$ stands for element-wise multiplication, and the sensitivity for the output layer $t = L$ is
$$\delta^L = f'(u^L) \circ \left(y^n - d^n\right)$$
Accordingly, the partial derivative with respect to the weights at layer $t$, which is an outer product between the input and the sensitivity, is given by
$$\frac{\partial E}{\partial W^t} = x^{t-1} \left(\delta^t\right)^T$$
Then, the neuron weights in layer $t$ are updated by
$$\Delta W^t = -\eta \frac{\partial E}{\partial W^t}$$
where $\eta$ is a learning rate parameter [30].
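The feedforward pass and the output-layer update described above can be sketched for a single fully connected layer as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer uses tanh, the error is the squared error of one sample, and the names `forward`, `sample_error` and `bp_step`, as well as the toy weights, are hypothetical.

```python
import math

def forward(W, b, x):
    """Return pre-activations u = W x + b and outputs y = tanh(u)."""
    u = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return u, [math.tanh(uk) for uk in u]

def sample_error(y, d):
    """Squared error of one sample: E = 1/2 * sum_k (d_k - y_k)^2."""
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))

def bp_step(W, b, x, d, eta):
    """One gradient-descent update of an output layer with tanh activation."""
    _, y = forward(W, b, x)
    # Output-layer sensitivity: delta = f'(u) * (y - d), with f'(u) = 1 - tanh(u)^2.
    delta = [(1.0 - yk ** 2) * (yk - dk) for yk, dk in zip(y, d)]
    # dE/dW[k][i] = delta[k] * x[i] and dE/db[k] = delta[k]; step against the gradient.
    W_new = [[w - eta * dk * xi for w, xi in zip(row, x)]
             for row, dk in zip(W, delta)]
    b_new = [bk - eta * dk for bk, dk in zip(b, delta)]
    return W_new, b_new

# Toy data (hypothetical values chosen only for illustration).
W = [[0.1, -0.2, 0.05], [0.3, 0.1, -0.1]]
b = [0.0, 0.0]
x = [1.0, 0.5, -1.0]
d = [0.5, -0.5]

before = sample_error(forward(W, b, x)[1], d)
W, b = bp_step(W, b, x, d, eta=0.1)
after = sample_error(forward(W, b, x)[1], d)
print(before, after)  # the error should decrease after the update
```

With a small learning rate, one update moves the outputs toward the targets, so the sample error after the step is lower than before it.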
Convolutional layer: One of the major building blocks of a CNN is the convolutional layer which performs a convolution operation on the input image with filters (kernels) to produce a feature map. By using numerous filters for convolutions, one can obtain various feature maps at the output of the convolutional layer. Since the convolutional layer is followed by a sub-sampling layer, to calculate the sensitivities at layer $t$, we need to up-sample the sensitivity map of the sub-sampling layer. For each map $j$ in the convolutional layer, the sensitivity is given by
$$\delta_j^t = \beta_j^{t+1} \left( f'(u_j^t) \circ \mathrm{up}\left(\delta_j^{t+1}\right) \right)$$
where $u_j^t$ is the pre-activation of map $j$, $\mathrm{up}(\cdot)$ is an up-sampling operation and $\beta$ is a multiplicative bias parameter.
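Subsampling layer: A subsampling or pooling layer often follows a convolutional layer to down-sample the output of the convolutional layer. The major function of a subsampling layer is to reduce the number of parameters to be learned by CNNs. The outputs of the subsampling layer are the down-sampled versions of the input maps. By denoting $\mathrm{down}(\cdot)$ as a subsampling operation, we have
$$x_j^t = f\left(\beta_j^t \, \mathrm{down}\left(x_j^{t-1}\right) + b_j^t\right)$$
where $b$ is an additive bias and $\beta$ is a multiplicative bias. Assuming that the sub-sampling layer is followed by a fully connected network, the BP algorithm can be used to obtain the sensitivity $\delta_j^t$ of the sub-sampling layer. First, the gradient of the convolution kernel is calculated as in [30]. Accordingly, the gradient for the additive bias is given by
$$\frac{\partial E}{\partial b_j} = \sum_{u,v} \left(\delta_j^t\right)_{uv}$$
while the gradient for the multiplicative bias $\beta$ is computed by
$$\frac{\partial E}{\partial \beta_j} = \sum_{u,v} \left(\delta_j^t \circ \mathrm{down}\left(x_j^{t-1}\right)\right)_{uv}$$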
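The down-sampling and up-sampling operations used by the subsampling and convolutional layers can be sketched as below. This is an illustrative sketch only: it assumes mean pooling over non-overlapping windows and up-sampling by replication, and the function names `down` and `up` are chosen here to mirror the text.

```python
def down(x, n):
    """Mean-pool a 2-D map over non-overlapping n x n windows."""
    return [[sum(x[r + i][c + j] for i in range(n) for j in range(n)) / (n * n)
             for c in range(0, len(x[0]) - n + 1, n)]
            for r in range(0, len(x) - n + 1, n)]

def up(x, n):
    """Replicate every entry of x into an n x n block."""
    return [[v for v in row for _ in range(n)] for row in x for _ in range(n)]

feature_map = [[1, 2], [3, 4]]
pooled = down(feature_map, 2)   # a single averaged value
restored = up(pooled, 2)        # same size as feature_map, constant entries
print(pooled, restored)
```

Up-sampling restores the spatial size of the pooled map so that a sensitivity map from the subsampling layer can be multiplied element-wise with the derivative of the convolutional layer's activation.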

C. Watermark Embedding Algorithm
Our proposed scheme based on the DWT and CNN is described as follows. The host gray images with size 512 × 512 pixels are decomposed by using DWT2 to obtain 4 sub-bands LL, HL, LH, HH; then the sub-band LL is decomposed by using DWT2 to create 4 other sub-bands LL1, HL1, LH1, HH1; sub-band HH1 is decomposed by DWT2 to obtain LL2, HL2, LH2, HH2; and finally sub-band HH2 is decomposed again to obtain 4 sub-bands LL3, HL3, LH3, HH3. The resultant image after the DWT is demonstrated in Fig. 1. Accordingly, the low frequency sub-band LL (256 × 256) is divided into non-overlapped blocks, namely I(x, y) [...]. The embedding rule involves a system parameter which can be determined by the requirements of the users [22]. The step by step procedure for embedding a binary watermark is described in Algorithm 1.
Algorithm 1
[...]
3: The watermark is embedded by using the following loops:
4: for i = 1 to 32 do
5: for j = 1 to 32 do
[...]
10: end if
11: end for
12: end for
13: Use the inverse DWT2 to obtain the watermarked image.
14: Output: watermarked image.
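The cascade of decompositions described above (host image to LL, LL to HH1, HH1 to HH2, HH2 to HH3) can be sketched as follows. The paper does not state which wavelet is used, so this sketch assumes the orthonormal Haar wavelet purely for illustration; `haar_dwt2` and `cascade` are hypothetical helper names and the synthetic host image is a placeholder.

```python
import math

S = 1.0 / math.sqrt(2.0)

def _split(seq):
    """One level of 1-D orthonormal Haar analysis: (low-pass half, high-pass half)."""
    lo = [(seq[i] + seq[i + 1]) * S for i in range(0, len(seq), 2)]
    hi = [(seq[i] - seq[i + 1]) * S for i in range(0, len(seq), 2)]
    return lo, hi

def _transpose(m):
    return [list(col) for col in zip(*m)]

def _filter_columns(m):
    """Apply the 1-D split to every column and return (low, high) maps."""
    lo_cols, hi_cols = zip(*(_split(col) for col in _transpose(m)))
    return _transpose(lo_cols), _transpose(hi_cols)

def haar_dwt2(img):
    """One level of 2-D Haar DWT; returns (LL, LH, HL, HH), first letter = row filter."""
    row_lo, row_hi = zip(*(_split(row) for row in img))
    LL, LH = _filter_columns(row_lo)
    HL, HH = _filter_columns(row_hi)
    return LL, LH, HL, HH

def cascade(img):
    """Return the level-1 LL sub-band and the HH3 sub-band used by the scheme."""
    LL, _, _, _ = haar_dwt2(img)     # host image -> LL, HL, LH, HH
    _, _, _, HH1 = haar_dwt2(LL)     # LL  -> LL1, HL1, LH1, HH1
    _, _, _, HH2 = haar_dwt2(HH1)    # HH1 -> LL2, HL2, LH2, HH2
    _, _, _, HH3 = haar_dwt2(HH2)    # HH2 -> LL3, HL3, LH3, HH3
    return LL, HH3

# Synthetic 512 x 512 host image (placeholder data, not from the paper).
host = [[(3 * r + 5 * c) % 256 for c in range(512)] for r in range(512)]
LL, HH3 = cascade(host)
print(len(LL), len(LL[0]), len(HH3), len(HH3[0]))
```

For a 512 × 512 host image, LL is 256 × 256 and HH3 is 32 × 32, so each 8 × 8 block of LL corresponds to one coefficient of HH3 and to one bit of a 32 × 32 watermark.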

D. Watermark Extracting Algorithm
Watermark recovery is almost the inverse process of watermark embedding except that the training procedure and inverse transform are not required. The watermarked image is decomposed by using DWT2 to produce 4 sub-bands LL, HL, LH, HH; then sub-band LL is decomposed by using DWT2 to create 4 sub-bands LL1, HL1, LH1, HH1; next sub-band HH1 is decomposed to obtain sub-bands LL2, HL2, LH2, HH2; and finally sub-band HH2 is decomposed by DWT2 to obtain 4 sub-bands LL3, HL3, LH3, HH3. Sub-band HH3 is selected as O(i, j). The low frequency sub-band LL (256 × 256) is divided into blocks, namely I(x, y) [...]. Then, the extracted watermark is obtained by the rule given in [22]. The detailed procedure for extracting the watermark is given in Algorithm 2.

E. Performance Metrics for Evaluating Watermarking Schemes
To evaluate the performance of our watermarking scheme, we use the performance metrics of peak signal-to-noise ratio (PSNR), normalized correlation (NC) and structural similarity (SSIM). Let A and B be the original and modified images (e.g., host image and watermarked image, watermark image and extracted watermark image), respectively. Assume that the size of the image is M × N and (i, j) is the pixel at row i and column j of the image. To measure the quality of the watermarked image, we use the PSNR defined as
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[A(i,j) - B(i,j)\right]^2 \tag{19}$$
The higher the PSNR, the higher the quality of the watermarked image. Alternatively, we can use other distortion measures such as NC and SSIM to measure the similarity of two images. The NC is defined as in [3]. To compute the SSIM metric for two images of the same size, three components, namely luminance, contrast and structure, are used. At each step, the local windows x and y of the two images are used to find the local statistics and the SSIM index. The range of SSIM is from 0 to 1, where 1 indicates that the two images are identical. The local SSIM measure is the product of the three components (luminance, contrast and structure) and is given by
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where $\mu_x$, $\mu_y$ are the local means, $\sigma_x$, $\sigma_y$ are the local standard deviations, and $\sigma_{xy}$ is the cross-covariance between x and y; $C_1$ and $C_2$ are constants [32].
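As a rough illustration of how the PSNR and NC metrics above can be computed, consider the sketch below. It assumes 8-bit images (peak value 255) stored as nested lists. The NC normalization shown (correlation divided by the product of the two image norms) is one common form and is an assumption here; the definition in [3] may normalize differently. The function names are hypothetical.

```python
import math

def psnr(a, b, peak=255.0):
    """PSNR in dB between two equally sized images; infinite if they are identical."""
    m, n = len(a), len(a[0])
    mse = sum((a[i][j] - b[i][j]) ** 2 for i in range(m) for j in range(n)) / (m * n)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def nc(a, b):
    """Normalized correlation (assumed form): sum(A*B) / (||A|| * ||B||)."""
    num = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    den = math.sqrt(sum(x * x for r in a for x in r)) * \
          math.sqrt(sum(y * y for r in b for y in r))
    return num / den if den else 0.0

host = [[100, 120], [140, 160]]
marked = [[101, 121], [141, 161]]   # every pixel changed by 1, so MSE = 1
watermark = [[1, 0], [0, 1]]

print(psnr(host, marked))           # about 48.13 dB
print(nc(watermark, watermark))     # identical watermarks give NC = 1
```

A uniform change of one gray level gives an MSE of 1 and hence a PSNR of 10 log10(255²), which is about 48.13 dB.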

III. EXPERIMENT RESULTS
Our experiments are carried out to test the imperceptibility of the hidden watermark as well as the robustness of the proposed scheme against attacks. Additionally, the performance of our method is compared with those of the methods in [18] and [23]. Both experiments are conducted on host gray images with the size of 512 × 512 pixels, 8 bits/pixel, and a binary watermark with the size of 32 × 32 pixels. The images in the experiments are taken from the standard image database of the University of Southern California at http://sipi.usc.edu/database. The same initial CNN structure is used in the two following experiments: an input layer with 64 nodes; a convolutional layer with 15 feature maps, 3 × 3 kernels and the activation function 'tanh'; an average pooling layer with sub-sampling factor 3; a fully connected layer with 150 nodes and the activation function 'tanh'; and a fully connected layer with 1 node and the activation function 'sigmoid'. The 8 × 8 pixel blocks of the images in the transform domain are used for training the CNN. The trained CNN is used to embed and extract the watermark. All of the experiments are conducted in MATLAB R2013a on a personal computer with an Intel Core i5-2450M CPU @ 2.50 GHz and 4 GB of RAM.
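The layer sizes implied by the CNN structure above can be checked with a short calculation. This sketch rests on several assumptions that the text does not state explicitly: the 64 input nodes form one 8 × 8 block, "15" is read as the number of feature maps, the convolution is unpadded with stride 1, and the pooling layer is counted without trainable parameters. All names are hypothetical.

```python
def conv_valid(side, kernel):
    """Output side length of an unpadded, stride-1 convolution."""
    return side - kernel + 1

def pool(side, factor):
    """Output side length of non-overlapping pooling."""
    return side // factor

input_side = 8                                  # 64 input nodes as an 8 x 8 block
maps = 15                                       # assumed number of feature maps
conv_side = conv_valid(input_side, 3)           # 3 x 3 kernels
pool_side = pool(conv_side, 3)                  # average pooling, factor 3
flat = maps * pool_side * pool_side             # inputs to the first dense layer

params = {
    "conv": maps * 3 * 3 + maps,                # kernels plus one bias per map
    "fc1": flat * 150 + 150,                    # 150 tanh nodes
    "fc2": 150 * 1 + 1,                         # 1 sigmoid node
}
print(conv_side, pool_side, flat, params, sum(params.values()))
```

Under these assumptions each 8 × 8 block yields fifteen 6 × 6 feature maps, which pool down to fifteen 2 × 2 maps, so the first fully connected layer receives 60 values.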

A. Experiment 1: The Invisibility and Robustness Performance of the Proposed Scheme
The aim of this experiment is to evaluate the invisibility of the watermark and the ability to extract it exactly, by training the CNN model for 8 different grayscale images with and without attacks. The 8 grayscale images of 512 × 512 pixels are shown in Fig. 4(a)-Fig. 4(h) while the 32 × 32 binary watermark is presented in Fig. 4(k). We decompose each host image into sub-bands by using the DWT. Using the analysis scheme, for the sub-band LL and sub-band HH3 of an image, we divide sub-band LL (size 256 × 256 pixels) into non-overlapped blocks of 8 × 8 pixels and assign to each block the corresponding pixel in sub-band HH3. The pairs consisting of an 8 × 8 block in sub-band LL and a pixel in sub-band HH3 are used as the inputs and the desired outputs to train the CNN.
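The pairing of LL blocks with HH3 pixels described above can be sketched as follows. This sketch assumes that block (i, j) of LL is matched with pixel (i, j) of HH3, which is the natural reading of "corresponding pixel" but is not spelled out in the text; `training_pairs` is a hypothetical helper name and the arrays are placeholder data.

```python
def training_pairs(LL, HH3, block=8):
    """Pair each non-overlapping block of LL with the HH3 pixel at the same grid position."""
    pairs = []
    for i in range(len(HH3)):
        for j in range(len(HH3[0])):
            rows = LL[i * block:(i + 1) * block]
            blk = [row[j * block:(j + 1) * block] for row in rows]
            pairs.append((blk, HH3[i][j]))      # (CNN input, desired output)
    return pairs

# Placeholder sub-bands with the sizes used in the paper.
LL = [[r * 256 + c for c in range(256)] for r in range(256)]
HH3 = [[0.0] * 32 for _ in range(32)]

pairs = training_pairs(LL, HH3)
print(len(pairs), len(pairs[0][0]), len(pairs[0][0][0]))
```

A 256 × 256 LL sub-band gives 32 × 32 = 1024 training pairs per image, one per watermark bit position.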
The invisibility and robustness results of the proposed CNN-based scheme in the DWT domain without attacks are shown in Fig. 5 and Table I. Fig. 5 illustrates the watermarked images and their corresponding PSNRs between the original and watermarked versions. As can be seen, the PSNR values of all 8 images are greater than 42 dB. Note that a PSNR value greater than 30 dB for a processed image is acceptable to human eyes [33]. Thus, the proposed watermarking scheme offers high invisibility. Table I provides the performance metrics to measure the invisibility of the watermarked images and the robustness of the extracted watermarks. As observed from the results in Table I, our algorithm achieves good performance in terms of PSNR and SSIM. With such high PSNR values, no visual artifacts can be noticed in the watermarked images. Additionally, the watermarks can be extracted from these 8 watermarked images with NC = 1.
Next, we investigate the performance of our watermarking scheme under various kinds of attacks including both geometric and non-geometric attacks. For non-geometric attacks, we consider the following operations: mean filtering, median filtering, noise (salt & pepper, Gaussian, speckle), JPEG compression, and brightness increase/decrease. First, we investigate the impacts of noise on the invisibility of the watermarked images and the robustness of the watermark. Fig. 6 shows the watermarked images attacked by Gaussian noise with zero mean and variance 0.01, salt & pepper noise with density 0.01, and speckle noise with variance 0.01. The PSNR and SSIM of the watermarked image after being attacked by noise are given in Table II. The reduction of the PSNR and SSIM of the watermarked image of Lena as compared to the results without attack in Table I reveals the impacts of noise attacks.
However, the NC and SSIM values of the extracted watermark in Table II are acceptable, which demonstrates the robustness of the proposed watermarking scheme against noise attacks.
Next, we study the effects of JPEG compression on the watermarked images and the extracted watermark. JPEG is a widely used image compression standard. The compression ratio of JPEG is associated with a quality factor between 0 and 100. When the quality factor is decreased, the compression ratio increases, but the quality of the resulting image is significantly reduced. To evaluate the robustness of the watermark, the NC and SSIM results for the watermarks are shown in Table III for different quality factors of JPEG compression applied to the watermarked images. As can be seen from Table III, the proposed method can perfectly extract the watermarks (with NC and SSIM about 1) when the quality factors are greater than 80. The average NC of the 8 extracted watermarks is still equal to 0.6703 for the quality factor of 20. Furthermore, Fig. 7 indicates the NC of the extracted watermark on the images attacked by JPEG compression. When the quality factors are greater than 30, the NCs are greater than 0.8. The results in Fig. 7 show that the robustness of the proposed scheme is assured. For visual observation of the extracted watermarks, Fig. 8 shows the watermarks extracted after JPEG attacks with different quality factors.
Next, we investigate the impacts of the median filtering, mean filtering and brightness varying operations on the watermarked images. Note that filtering operations are often used in image processing and, thus, they can alter the watermarks. In our experiments, a 3 × 3 pixel median filter, a 3 × 3 pixel mean filter, and a 20% increase/decrease in brightness are applied to the watermarked images. The NC and SSIM performance for each attack is given in Table IV.
As can be seen from Table IV, the watermark can be extracted perfectly under brightness changes of 20%. Moreover, almost all the NC values in Table IV are greater than 0.9. This result indicates that the robustness of the proposed scheme is assured against the mean filtering, median filtering and brightness variation attacks.
Now, we consider geometric attacks such as rotation, scaling, cropping and shearing operations. In the experiments we conduct the geometric attacks as follows: for scaling attacks, the watermarked images are scaled down by 70%; for cropping attacks, one quarter of each watermarked image is cropped at the top left or the bottom right; for rotation attacks, the watermarked images are rotated by 180° or 45°. The watermarked images which are affected by these geometric attacks are illustrated in Fig. 9. We apply the proposed CNN watermarking scheme to the attacked images to extract the watermark. The extracted watermarks are shown in Fig. 10 and the NC and SSIM performance metrics are given in Table V. From Fig. 10, we can see that the watermarks can be fully recovered for all attacks in the experiments; however, the watermarks are transformed by the same geometric operations.
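Two of the geometric attacks above, rotation by 180° and cropping of one quarter of the image, can be sketched as below. This is an illustration only: the paper does not say how cropped pixels are filled, so zero-fill is assumed here, and the function names and the `corner` argument are hypothetical.

```python
def rotate180(img):
    """Rotate an image by 180 degrees (reverse the rows and each row)."""
    return [row[::-1] for row in img[::-1]]

def crop_quarter(img, corner="top_left", fill=0):
    """Blank one quarter of the image; cropped pixels are set to `fill` (assumed)."""
    h, w = len(img), len(img[0])
    if corner == "top_left":
        r0, c0 = 0, 0
    elif corner == "bottom_right":
        r0, c0 = h - h // 2, w - w // 2
    else:
        raise ValueError("corner must be 'top_left' or 'bottom_right'")
    out = [row[:] for row in img]               # leave the input untouched
    for r in range(r0, r0 + h // 2):
        for c in range(c0, c0 + w // 2):
            out[r][c] = fill
    return out

img = [[1, 2], [3, 4]]
print(rotate180(img))
print(crop_quarter(img, "top_left"))
print(crop_quarter(img, "bottom_right"))
```

Because each watermark bit is tied to a block position, a rotation of the watermarked image by 180° moves every block, which is consistent with the observation that the extracted watermark is transformed by the same geometric operation.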

B. Experiment 2: Performance Comparison between the Proposed Scheme and the Previous Methods
To evaluate the performance of our proposed watermarking scheme, we compare the PSNR and NC performance of our method with the ones in [18] and [23]. It should be noted that the study in [18] is a typical method of watermarking in the transform domain of the DWT-DCT combination while the approach in [23] is to use DWT and BP NNs.
To compare the performance of the proposed scheme with the one in [18], we use the 8 grayscale images shown in Fig. 4 as the inputs and the same watermark as in [18], shown in Fig. 11(a). First, the invisibility and the robustness of the proposed scheme without attacks are determined. Without attacks, the PSNR of our method is measured to be 43.2999 dB as compared to 42.6950 dB in [18]. The NC of the extracted watermark is equal to 1 for both schemes. To compare the robustness of our scheme with the one in [18], JPEG compression with quality = 50, salt & pepper noise with density = 0.001 and Gaussian noise (mean = 0, variance = 0.002) are used. Fig. 11 shows the original watermark and the extracted watermarks with and without attacks. All of the extracted watermarks are clearly observed. Table VI shows that the robustness of our algorithm under salt & pepper noise and Gaussian noise is superior to that of the method in [18].
In order to compare the performance of the proposed scheme with that of the algorithm in [23], we use the test image "Mandrill" (512 × 512 pixels) and a 32 × 32 binary pattern watermark (1024 bits) as shown in Fig. 12(a) and (c). Fig. 12 shows the host image "Mandrill", the watermarked image and the watermarks extracted by our scheme for different attacks. The PSNR of the watermarked images (or attacked images) and the NC of the extracted watermark for our proposed scheme and the approach in [23] are listed in Table VII. With a comparable PSNR, our scheme offers a higher NC for all considered attacks. According to this table, the NC values of the watermarks extracted by our method are always higher than those of the watermarks extracted by the method in [23] under common image processing operations. This means that our scheme outperforms the approach in [23] in terms of robustness against the attacks.

IV. CONCLUDING REMARKS
In this paper, we have proposed a new watermarking scheme based on the combination of two powerful signal processing tools: CNN and DWT. The DWT has been used to decompose the host images into different sub-bands. Then, the pixels in the selected low frequency and high frequency sub-bands have been used as the inputs and the desired outputs to train the CNN. The watermark embedding and extracting processes are performed by the trained CNN. Extensive experiments have been conducted to measure the invisibility and robustness of the proposed watermarking scheme. The numerical results show that the proposed method achieves superior performance in terms of PSNR, NC, and SSIM in comparison with the other methods.