**Post: #1**

DIGITAL WATERMARKING ON AN AUDIO FILE THROUGH SIGNAL PROCESSING

Authors:

S. K. Moinuddin, ECE Dept., SJCET, Reg No: 06G31A04A7

M. Tipu Sait, ECE Dept., SJCET, Reg No: 06G31A0467

ST. JOHNS COLLEGE OF ENGINEERING & TECHNOLOGY

YERRAKOTA, YEMMIGANUR-518360,

KURNOOL DIST., A.P

Introduction:

Audio watermarks are special signals embedded into digital audio. These signals are extracted by detection mechanisms and decoded. Audio watermarking schemes rely on the imperfections of the human auditory system. However, the human ear is much more sensitive than the other sensory organs, so good audio watermarking schemes are difficult to design (Kim et al. 2003). Even though current watermarking techniques are far from perfect, audio watermarking schemes have been applied widely during the last decade, and they have become highly sophisticated in terms of robustness and imperceptibility (Bender et al. 1996) (Cox et al. 2002) (Cox and Miller 2002). Robustness and imperceptibility are both important requirements of watermarking, but they conflict with each other.

In general there are two types of watermarking methods: blind and non-blind. Blind watermarking techniques include spread spectrum, quantization, echo hiding, phase coding, etc. Non-blind watermarking schemes are theoretically interesting but of limited practical use, since they require double the storage capacity and double the communication bandwidth for watermark detection. Of course, non-blind schemes may be useful as a copyright verification mechanism in a copyright dispute (and may even be necessary; see Craver et al. 1998 on inversion attacks). On the other hand, a blind watermarking scheme can detect and extract watermarks without the unwatermarked audio. Therefore, it requires only half the storage capacity and half the bandwidth of a non-blind scheme. Hence, only blind audio watermarking schemes are considered in this paper. Needless to say, blind watermarking methods need self-detection mechanisms for detecting watermarks without the unwatermarked audio.

This paper presents two of the five types of audio watermarking: DC watermarking and spread-spectrum watermarking. The details of these techniques are described below:

DC-Watermarking Scheme: This section details the implementation of a digital audio watermarking scheme, which can be used to hide auxiliary information within a sound file. Although this watermarking scheme is for instructional use as a tool for perceptual audio education, it provides an overview of techniques which are common to all digital audio watermarking schemes.

The DC watermarking scheme hides watermark data in lower frequency components of the audio signal, which are below the perceptual threshold of the human auditory system.

Watermark Insertion: The process of inserting a digital watermark into an audio file can be divided into four main processes: framing, spectral analysis, DC removal, and watermark signal addition (see the figure below). An original audio file in wave format is fed into the system, where it is subsequently framed, analyzed, and processed to attach the inaudible watermark to the output signal.

Fig: Watermark Insertion Process

Framing:

The audio file is partitioned into frames which are 90 milliseconds in duration. This frame size is chosen so that the embedded watermark does not introduce any audible distortion into the file.

With a 90 ms frame size, our bit rate for watermarked data is equal to 1 / 0.09 = 11.1 bits per second.
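The framing arithmetic can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab code: the function name `split_into_frames`, the 44,100 Hz default sample rate, and the choice to drop a trailing partial frame are all assumptions made for the example.

```python
def split_into_frames(samples, sample_rate=44100, frame_ms=90):
    """Split a list of samples into consecutive, non-overlapping frames."""
    frame_len = round(sample_rate * frame_ms / 1000)   # 3969 samples at 44.1 kHz
    n_frames = len(samples) // frame_len               # assumption: drop the partial last frame
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

# One second of silence stands in for real audio.
frames = split_into_frames([0.0] * 44100)
print(len(frames), len(frames[0]))   # 11 3969
print(round(1000 / 90, 1))           # 11.1 watermark bits per second
```

One second of audio yields 11 complete frames, which matches the stated rate of roughly 11.1 bits per second at one bit per frame.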

Spectral Analysis: Subsequent to the framing of the unprocessed audio signal, we perform spectral analysis on the signal, consisting of a fast Fourier transform (FFT), which allows us to calculate the low frequency components of each frame, as well as the overall frame power. The FFT processing is accomplished in Matlab, using the following equation:

(N denotes the last frame in the audio file).

With a standard 16-bit CD-quality audio file having a sampling rate Fs = 44,100 samples per second, a frame consists of 3969 samples. If we perform an FFT on a frame of this size with N = 3969, we end up with a frequency resolution of

$$\Delta f = \frac{F_s}{N} = \frac{44100}{3969} \approx 11.1\ \text{Hz}$$

From the FFT, we are now able to determine the low frequency (DC) component of the frame, F(1), as well as the frame spectral power. To calculate the frame power P, we use the sum of the squared amplitude spectrum over the N FFT bins:

$$P = \sum_{k=1}^{N} |F(k)|^2$$
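A minimal Python sketch of this analysis step is given here. It assumes a direct, unoptimized DFT in place of Matlab's `fft` and uses 0-based indexing, so `spectrum[0]` plays the role of F(1). The helper names `dft` and `analyze_frame` are hypothetical, and the tiny four-sample frame is only for illustration.

```python
import cmath

def dft(frame):
    """Direct O(N^2) discrete Fourier transform (unnormalized, like Matlab's fft)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n)]

def analyze_frame(frame):
    """Return the DC component and the spectral power of one frame."""
    spectrum = dft(frame)
    dc = spectrum[0].real                        # F(1) in Matlab's 1-based indexing
    power = sum(abs(c) ** 2 for c in spectrum)   # sum of the squared amplitude spectrum
    return dc, power

dc, power = analyze_frame([1.0, 2.0, 3.0, 4.0])
print(round(dc, 6), round(power, 6))   # 10.0 120.0
```

With this unnormalized convention the DC bin equals the sum of the samples, and by Parseval's relation the spectral power equals N times the sum of the squared samples (4 × 30 = 120 here).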

Figure 8 below shows an example of the above spectral analysis completed on the first eight frames of an audio file. The spectrum plot is restricted to frequencies from 0 to 4000 Hz for visibility.

Fig 8. Sample spectrum from the first 8 signal frames.

DC Removal: From the above spectral analysis of each frame, we have calculated the low frequency (DC) component F(1), which can now be removed by subtraction from each frame using the following formula:
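One way to sketch this step in Python is shown here, assuming the DC level is taken as the frame mean, which equals F(1)/N under Matlab's unnormalized FFT convention. That normalization is an assumption of this example, and `remove_dc` is a hypothetical name.

```python
def remove_dc(frame):
    """Subtract the frame's DC level (its mean value) from every sample."""
    dc_level = sum(frame) / len(frame)   # assumption: F(1) / N
    return [x - dc_level for x in frame]

centered = remove_dc([0.5, 1.5, 2.5, 3.5])
print(centered)   # [-1.5, -0.5, 0.5, 1.5]
```

After this step the frame averages to zero, so any DC offset measured later can be attributed to the watermark.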

Watermark Signal Addition: From the spectral analysis completed previously, we calculated the spectral power for each frame, which is now used for embedding the watermark signal data. The power in each frame determines the amplitude of the watermark which can be added to the low frequency spectrum. The magnitude of the watermark is added according to the formula:

Where Ks is the scaling factor, which ensures the watermark is embedded below the audibility threshold, and w(n) represents the watermark signal data, which is binary, having a value of 1, or -1.
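The embedding step might look like the following Python sketch. It rests on a stated assumption: the watermark is added as a DC offset of Ks times the frame's RMS amplitude (a time-domain stand-in for the frame power) times w(n). The function name `add_watermark_bit` and the default `ks` value are hypothetical.

```python
import math

def add_watermark_bit(frame, w, ks=0.01):
    """Embed one watermark value w (+1 or -1) as a small DC offset in a frame."""
    n = len(frame)
    dc_level = sum(frame) / n
    centered = [x - dc_level for x in frame]             # DC removal
    rms = math.sqrt(sum(x * x for x in centered) / n)    # frame power as an RMS amplitude
    offset = ks * rms * w                                # assumed form: Ks * power term * w(n)
    return [x + offset for x in centered]

marked = add_watermark_bit([1.0, -1.0, 1.0, -1.0], w=+1, ks=0.1)
print(sum(marked) / len(marked))   # close to 0.1: the new DC level carries the bit
```

Because the offset scales with the frame's own power, loud frames carry a larger watermark and quiet frames a smaller one, which is how Ks keeps the mark below the audibility threshold.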

The f(n) function has now been watermarked with the above process, and is ready for storage, testing, and watermark extraction.

Watermark Extraction: The process of extracting the digital watermark from the audio file is similar to the technique for inserting the watermark. The computer processing requirements for extraction are slightly lower. A marked audio file in wave format is fed into the system, where it is subsequently framed, analyzed, and processed, to remove the embedded data which exists as a digital watermark.

Figure 9. Watermark Extraction Process.

Framing: As with the insertion process, the audio file is partitioned into frames which are 90 milliseconds in duration. With a 90 ms frame size, we expect an extracted watermark data rate equal to 11.1 bits per second.

Spectral Analysis: Subsequent to the framing of the watermarked audio signal, we perform spectral analysis on the signal, consisting of a fast Fourier transform (FFT), which again allows us to calculate the low frequency components of each frame, as well as the overall frame power. The FFT processing is accomplished in Matlab, using the previous FFT equation.

As before, with 16 bit CD quality audio, our frames consist of 3969 samples.

Watermark Signal Extraction: From the spectral analysis completed previously, we calculated the spectral power for each frame, which allows us to examine the low frequency power in each frame and subsequently extract the watermark, according to the following formula:

(N denotes the last frame in the audio file).

The extracted watermark signal, w(n), should be an exact replica of the original watermark, provided the original audio file has enough power per frame to embed information below the audible threshold and above the quantization floor.
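A round trip through insertion and extraction can be sketched as follows. The sketch assumes each bit was embedded as a DC offset proportional to the frame's RMS amplitude and is read back as the sign of the frame's DC level. The function names, the `ks` value, and the synthetic sine-wave audio are illustrative assumptions.

```python
import math

def embed_bit(frame, bit, ks=0.01):
    """Remove the frame's DC level, then add a DC offset of ks * RMS * bit."""
    n = len(frame)
    mean = sum(frame) / n
    centered = [x - mean for x in frame]
    rms = math.sqrt(sum(x * x for x in centered) / n)
    return [x + ks * rms * bit for x in centered]

def extract_bit(frame):
    """Read the bit back as the sign of the frame's DC level."""
    return 1 if sum(frame) >= 0 else -1

frame_len = 200
bits = [1, -1, -1, 1]
# Synthetic audio: consecutive segments of a sine wave, one frame per bit.
frames = [[math.sin(0.05 * (i * frame_len + t)) for t in range(frame_len)]
          for i in range(len(bits))]
marked = [embed_bit(f, b) for f, b in zip(frames, bits)]
recovered = [extract_bit(f) for f in marked]
print(recovered == bits)   # True
```

A silent frame has zero RMS, so no offset is embedded and the extracted sign is arbitrary, which illustrates the requirement for enough power per frame.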

Limitations: This DC watermarking scheme has some major limitations with regard to robustness and data density. With longer audio files, the robustness of the scheme can be increased somewhat by inserting the watermark signal multiple times, which aids extraction and also allows error correction if the signal is manipulated. To attain a higher hidden data density in the watermarked signal, more advanced techniques must be used, such as spread spectrum, phase encoding, or echo hiding. The highest-rate, most versatile, and most reliable watermarking scheme would consist of a combination of all of the above, allowing the software to capitalize on the strengths of each technique when processing the unmarked audio.
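The idea of inserting the watermark several times can be illustrated with a simple majority vote over the repeated copies. This is only one possible error-correction approach, chosen here for illustration; the name `majority_vote` and the example data are hypothetical.

```python
def majority_vote(copies):
    """Combine repeated extractions of the same +1/-1 watermark, bit by bit."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*copies)]

original = [1, -1, 1, 1, -1]
copies = [
    [1, -1, 1, 1, -1],    # clean copy
    [1, 1, 1, 1, -1],     # one bit flipped by distortion
    [1, -1, 1, -1, -1],   # a different bit flipped
]
print(majority_vote(copies) == original)   # True
```

Using an odd number of copies avoids ties. The cost is a lower effective data rate, since each repetition uses frames that could otherwise carry new bits.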

Spread-Spectrum Method: The spread-spectrum watermarking scheme is an example of the correlation method, which embeds a pseudo-random sequence and detects the watermark by calculating the correlation between the pseudo-random noise sequence and the watermarked audio signal. The spread-spectrum scheme is the most popular scheme and has been studied well in the literature (Boney et al. 1996) (Cox et al. 1996) (Cvejic et al. 2001) (Kirovski and Malvar 2001) (Kim 2000) (Lee and Ho 2000) (Seok et al. 2002) (Swanson et al. 1998). This method is easy to implement but has some serious disadvantages: it requires time-consuming psychoacoustic shaping to reduce audible noise, and it is susceptible to time-scale modification attacks. (Of course, the use of psychoacoustic models is not limited to spread-spectrum techniques.) The basic idea of this scheme and its implementation techniques are described below.

Basic Idea:

This scheme spreads a pseudo-random sequence across the audio signal. The wideband noise can be spread into either the time-domain signal or a transform-domain signal, no matter what transform is used. Frequently used transforms include the DCT (Discrete Cosine Transform), DFT (Discrete Fourier Transform), and DWT (Discrete Wavelet Transform). The binary watermark message v = {0, 1}, or its equivalent bipolar variable b = {-1, +1}, is modulated by a pseudo-random sequence r(n) generated by means of a secret key. The modulated watermark w(n) = b*r(n) is then scaled according to the required energy of the audio signal s(n). The scaling factor a controls the trade-off between robustness and inaudibility of the watermark. The modulated watermark w(n) is equal to either r(n) or -r(n), depending on whether v = 1 or v = 0. The modulated signal is then added to the original audio to produce the watermarked audio x(n):

x(n) = s(n) + aw(n).
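The embedding rule can be sketched in Python as follows. The sketch assumes a time-domain embedding with a constant scaling factor a and uses Python's seeded `random.Random` as a stand-in key-driven generator. That generator is not cryptographically secure, and the function names are hypothetical.

```python
import random

def pn_sequence(key, length):
    """Regenerate the bipolar pseudo-random sequence r(n) from a secret key."""
    rng = random.Random(key)   # assumption: stand-in generator, not cryptographically secure
    return [rng.choice((-1, 1)) for _ in range(length)]

def embed(signal, v, key, a):
    """Return x(n) = s(n) + a * b * r(n), with b = +1 for v = 1 and b = -1 for v = 0."""
    b = 1 if v == 1 else -1
    r = pn_sequence(key, len(signal))
    return [s + a * b * r_n for s, r_n in zip(signal, r)]

host = [0.0] * 8   # a silent host signal makes the added watermark visible directly
x1 = embed(host, 1, key=1234, a=0.5)
x0 = embed(host, 0, key=1234, a=0.5)
print(all(p == -q for p, q in zip(x1, x0)))   # True: v = 0 embeds -r(n), v = 1 embeds +r(n)
```

The same key always regenerates the same r(n), which is what allows a detector holding the key to correlate against it later.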

The detection scheme uses linear correlation. Because the pseudo-random sequence r(n) is known and can be regenerated by means of the secret key, watermarks are detected by computing the correlation between x(n) and r(n):

$$c = \frac{1}{N}\sum_{i=1}^{N} x(i)\,r(i) \qquad (3)$$

where N denotes the length of signal. Equation (3) yields the correlation sum of two components as follows:

$$c = \frac{1}{N}\sum_{i=1}^{N} s(i)\,r(i) + \frac{1}{N}\sum_{i=1}^{N} a\,b\,r^2(i) \qquad (4)$$
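As a worked step, assume r(n) is bipolar, so that r(i) is either -1 or +1 and r²(i) = 1, and assume the scaling factor a is constant over the signal. The second term of Equation (4) then collapses to a constant:

$$\frac{1}{N}\sum_{i=1}^{N} a\,b\,r^2(i) = \frac{a\,b}{N}\sum_{i=1}^{N} 1 = a\,b$$

Under these assumptions c is approximately ab whenever the first term is negligible, so the sign of c carries the embedded bit b.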

Fig: Embedding process in Spread Spectrum technique.

The first term in Equation (4) is assumed to have a small magnitude. If the two signals s(n) and r(n) were independent, the first term would vanish. However, this is not the case in practice. Thus, the watermarked audio is preprocessed, as shown in Figure 4, in order to make this assumption valid. One possible solution is to filter s(n) out of x(n). Preprocessing methods include high-pass filtering (Hartung and Girod 1998) (Haitsma et al. 2000), linear predictive coding (Seok et al. 2002), and filtering by a whitening filter (Kim 2000).

Such preprocessing gives the second term in Equation (4) a much larger magnitude than the first term, which almost vanishes. If the first term has a similar or larger magnitude than the second term, the detection result will be erroneous. Based on a hypothesis test using the correlation value c and a predefined threshold t, the detector outputs 1 if c > t and 0 if c ≤ t.

A typical value of t is 0. The detection threshold has a direct effect on both the false positive and false negative probabilities. A false positive is an error in which the detector incorrectly determines that a watermark is present in unwatermarked audio. A false negative, on the other hand, is an error in which the detector fails to detect a watermark in watermarked audio.
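The correlation detector can be sketched in Python as below. The sketch omits the preprocessing stage and instead assumes a scaling factor large enough for the watermark term to dominate. The key-driven generator is Python's `random.Random`, used as a stand-in, and all function names and parameter values are hypothetical.

```python
import math
import random

def pn_sequence(key, length):
    """Regenerate the bipolar pseudo-random sequence r(n) from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]

def correlate(x, r):
    """Linear correlation c = (1/N) * sum of x(i) * r(i)."""
    return sum(xi * ri for xi, ri in zip(x, r)) / len(x)

def detect(x, key, t=0.0):
    """Return 1 if the correlation exceeds the threshold t, otherwise 0."""
    return 1 if correlate(x, pn_sequence(key, len(x))) > t else 0

n, key, a = 4096, 1234, 0.2
host = [math.sin(0.01 * i) for i in range(n)]   # synthetic host audio s(n)
r = pn_sequence(key, n)
results = {}
for v in (0, 1):
    b = 1 if v == 1 else -1
    x = [s + a * b * ri for s, ri in zip(host, r)]   # x(n) = s(n) + a * b * r(n)
    results[v] = detect(x, key)
    print(v, results[v], round(correlate(x, r), 3))
```

In this setup the correlation should come out near +a for v = 1 and near -a for v = 0. Raising t above 0 would trade false positives for false negatives.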

Fig: Extraction Process in Spread Spectrum technique.

Pseudo-Random Sequence:

A pseudo-random sequence has statistical properties similar to those of a truly random signal, but it can be exactly regenerated with knowledge of privileged information (see Section 2.1). A good pseudo-random sequence has good correlation properties, such that any two different sequences are almost mutually orthogonal. Thus, the cross-correlation value between them is very low, while the auto-correlation value is moderately large. The most popular pseudo-random sequence is the maximum length sequence (also known as the M-sequence). This is a binary sequence r(n) = {0, 1} of length N = 2^m - 1, where m is the size of the linear feedback shift register. It has very good auto-correlation and cross-correlation properties. If we map the binary sequence r(n) = {0, 1} onto the bipolar sequence r(n) = {-1, +1}, the auto-correlation of the M-sequence is given as follows:

$$\frac{1}{N}\sum_{i=0}^{N-1} r(i)\,r(i-k) = \begin{cases} 1 & \text{if } k = 0 \\ -\dfrac{1}{N} & \text{otherwise} \end{cases}$$
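The auto-correlation property can be checked numerically with a small linear feedback shift register. The sketch below assumes m = 4 with the primitive polynomial x^4 + x + 1 and treats the index i - k as circular (modulo N). The function names and the all-ones start state are illustrative choices.

```python
def m_sequence(m=4, taps=(0, 1)):
    """Generate one period (2^m - 1 bits) of a binary M-sequence from an m-stage LFSR.

    Recurrence: a[n + m] = XOR of a[n + t] for t in taps. With m = 4, taps (0, 1)
    correspond to the primitive polynomial x^4 + x + 1.
    """
    state = [1] * m                       # any non-zero start state works
    out = []
    for _ in range(2 ** m - 1):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]    # shift the register and append the new bit
    return out

def circular_autocorrelation(r, k):
    """(1/N) * sum of r(i) * r(i - k), with the index taken modulo N."""
    n = len(r)
    return sum(r[i] * r[(i - k) % n] for i in range(n)) / n

bits = m_sequence()
bipolar = [2 * b - 1 for b in bits]             # map {0, 1} onto {-1, +1}
print(len(bits))                                # 15
print(circular_autocorrelation(bipolar, 0))     # 1.0
print(circular_autocorrelation(bipolar, 3))     # -0.0666..., i.e. -1/15
```

Every non-zero shift gives the same value of -1/N, which is what makes the correlation peak at k = 0 easy to detect.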

M-sequences have two disadvantages. First, the length of an M-sequence, which is called the chip rate, is strictly limited to 2^m - 1. Thus, it is impossible to get, for example, a nine-chip sequence. Typical pseudo-random sequence lengths are 1,023 (Cvejic et al. 2001) or 2,047. There is always a trade-off between the length of the pseudo-random sequence and robustness. However, very short sequences, such as length 7, are also used (Liu et al. 2002). Second, the number of different M-sequences is limited once the size m is determined. It has been shown that the M-sequence is not cryptographically secure. Thus, not all pseudo-random sequences in use are M-sequences. Sometimes a non-binary, and consequently real-valued, pseudo-random sequence r(n) ∈ R with a Gaussian distribution (Cox et al. 1996) is used. Non-binary chaotic sequences (Bassia et al. 2001) are also used. As long as the sequences are non-binary, their correlation characteristics are very good. However, since integer sequences must be used in practice due to finite precision (obtained, for example, as $\lfloor a\,r(n) \rfloor$), the correlation properties become less promising.

Fig: Waveform representation of Pseudo random Sequence.

Advantages:

• It is used in wars for sending information secretly.

• It is used to combat piracy.

• It is used in universities for revealing codes during examinations.

• It is used in voice conferencing systems to indicate to others which party is currently speaking.

• It is imperceptible: a listener cannot hear or interpret the embedded data.

Applications:

Content Authentication

Broadcast monitoring

Content management

Copy control

Copyright protection

Transaction tracking

Future Scope:

1. The force currently driving development is intellectual property protection, via copy-prevention and detection systems.

2. Microsoft is currently developing new watermark technologies and is in the process of testing future operating systems equipped with DRM for all media types.

Conclusion:

1. The digital watermark has great potential to be used as part of an overall system for managing IP rights. It can be used not only to identify the author of a particular audio file, but also to trace the path a particular file takes if it is distributed in an unauthorized manner.

2. In this way, intellectual property protection can be achieved.
