A NOVEL METHOD OF COMPRESSING SPEECH WITH HIGHER BANDWIDTH EFFICIENCY
This paper presents a novel method of speech compression and transmission that considerably reduces the transmission bandwidth required for a speech signal. The scheme exploits the low-pass nature of speech. It applies equally well to any low-pass signal; speech is highlighted here because it is the signal most widely used in real-time communication.
In this method, the low-pass signal (speech) at the transmitter is divided into packets, each containing N samples. An N-point DFT is applied to each packet. Because only low-pass signals are considered, the number of significant DFT samples is small, and transmitting these significant samples alone suffices for reliable transmission. Of the N values per packet, only aN are transmitted, where the parameter a is less than unity, so compression is achieved.
The parameter a is almost independent of the source of the speech signal. Other methods of speech compression, by contrast, depend on specific characteristics of the source, such as pitch.
The reverse process at the receiver reconstructs the samples: the received values are zero-padded and an N-point IDFT is performed. Zero padding is necessary because only aN of the N samples are transmitted, whereas N samples are needed at the receiver to reconstruct the signal faithfully.
The method is efficient because only a portion of the total number of samples is transmitted, which saves bandwidth. Since frequency samples are transmitted, the phase information must be transmitted as well. Here a property of signals and their spectra is exploited: the PHASE INFORMATION CAN BE EMBEDDED WITHIN THE MAGNITUDE SPECTRUM using simple mathematics, without heavy computation and without increasing the bandwidth.
The simulation results also show that the smaller the packet, the more faithful the reproduction of the received signal. This is a further advantage because a small packet also reduces delay: the transmitter must wait until N samples are available before it can start transmitting, so a smaller N shortens the wait while also giving a better reconstruction at the receiver. The scheme thus provides an efficient method of speech compression that is easy to implement on available high-speed processors.
Today, rapid speech transmission has become critical in many applications. With end users demanding higher quality and bandwidth usage increasing, the on-demand delivery of audio and allied applications cannot be left behind.
In this paper, we wish to present a new algorithm for speech compression using the frequency domain approach.
The same method has also been used for the compression of static images.
Many schemes exist for transmitting a speech signal digitally:
• Sampling the signal in the time domain (PCM, DPCM, ADPCM, DM)
• Dividing the signal into a number of sub-bands and encoding them separately (adaptive sub-band coding)
• Encoding information about how the speech signal was produced by the human vocal system (vocoders, RELP, CELP, LPC)
We introduce another scheme that exploits the properties of speech signals to transmit at a lower bit rate and reconstruct the signal with little distortion.
PROPERTIES OF SPEECH SIGNALS:
Following are some of the basic properties of speech signals:
• They are low pass in nature.
• Their power spectrum approaches zero at zero frequency and reaches a peak in the neighborhood of a few hundred hertz.
• The hearing mechanism is highly sensitive to frequency.
• The human ear is insensitive to phase variations.
• The frequency band from 300 to 3100 Hz is considered adequate for telephonic communication.
These properties of the speech signal have enabled us to devise a new method of speech compression.
[Figure: a typical speech signal]
[Figure: its corresponding spectrum]
Transmitting the spectrum of the signal instead of the signal itself is far more efficient. Because the energy of a speech signal above 4 kHz is negligible, we can compute the spectrum and transmit only the samples that correspond to the band up to 4 kHz, irrespective of the sampling frequency.
This saves a considerable amount of transmission bandwidth. Moreover, it is not necessary to transmit all the samples within the 4 kHz band: transmitting a fraction of them is sufficient, without significant degradation in quality.
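As a rough illustration of the band selection described above, the number of spectral samples that fall inside the 4 kHz band depends only on the ratio of that band to the sampling frequency. The sketch below is in Python; the function name `bins_up_to`, the 64-point packet and the three sampling rates are assumptions chosen for illustration and are not taken from the simulation.

```python
def bins_up_to(cutoff_hz, fs_hz, n_points):
    """Count DFT bins k = 0, 1, ... whose frequency k*fs/N is <= cutoff_hz.

    Only the non-negative-frequency half of the spectrum is counted.
    """
    # Bin k sits at k * fs / N hertz, so k <= cutoff * N / fs.
    highest_bin = (cutoff_hz * n_points) // fs_hz
    # Never report more bins than the half spectrum holds (k = 0 .. N/2).
    return min(highest_bin, n_points // 2) + 1


for fs in (8000, 16000, 44100):          # example sampling rates (assumed)
    kept = bins_up_to(4000, fs, 64)      # 64-point packets (assumed)
    print(f"fs = {fs} Hz: keep {kept} of {64 // 2 + 1} half-spectrum bins")
```

The higher the sampling frequency, the smaller the share of the spectrum that has to be considered for transmission at all.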
Since the spectrum is transmitted in this method, both the magnitude and the phase information must be sent to reproduce the signal without error, which would require twice the bandwidth. This problem is solved by exploiting the property of real and even signals: their spectra are also real and even. The speech samples are real, and evenness is introduced artificially so that the spectrum becomes real and even. Thus, by simple mathematics, the complete phase information is embedded within the magnitude spectrum, and only 'aN' samples need to be sent instead of '2N' samples of the spectrum (magnitude and phase).
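The effect of introducing evenness can be checked numerically. The following Python sketch compares the DFT of a short packet with the DFT of its even extension. How the mirror image is joined to the packet is not specified in the text, so the choice made in `even_extend` (repeating the last sample as the middle value) is an assumption, as are the function names and the example packet.

```python
import cmath


def dft(x):
    """Direct DFT, adequate for the short packets considered here."""
    m = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / m) for n in range(m))
            for k in range(m)]


def even_extend(x):
    """Length-2N sequence with x_t[n] == x_t[(2N - n) % 2N].

    The middle sample x_t[N] is a free choice; repeating x[N-1] is an
    assumption made for this sketch.
    """
    return list(x) + [x[-1]] + list(reversed(x[1:]))


packet = [1.0, 3.0, 2.0, -1.0]
plain = dft(packet)
embedded = dft(even_extend(packet))

print("plain DFT, largest |imag|   :", max(abs(v.imag) for v in plain))
print("extended DFT, largest |imag|:", max(abs(v.imag) for v in embedded))
```

The extended spectrum has a vanishing imaginary part (up to rounding error), so a single real number per frequency sample carries all the information.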
With all these procedures in place and the phase information embedded in the magnitude spectrum, a MATLAB simulation was performed to determine the optimum values of 'a' and 'N'. The results of the simulation are also provided.
Divide the speech samples into a set of packets, each of size 'N'.
Compute the corresponding N-point DFT of each packet.
By signal processing, embed the phase information into the magnitude spectrum.
Select only 'aN' samples of each packet and transmit them.
Follow the reverse process at the receiver, after appropriate zero padding, to reconstruct the signal.
From the above algorithm it is seen that a proper choice of a and N is important.
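The steps above can be sketched end to end as follows. This is a minimal Python illustration, not the MATLAB program used for the simulation: all function names are hypothetical, the even extension repeats the last sample as its middle value (an assumption), the kept values are taken to be the first 'aN' spectral samples, and a trailing partial packet is simply dropped.

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT or inverse DFT of a short sequence."""
    m = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / m) for n in range(m))
           for k in range(m)]
    return [v / m for v in out] if inverse else out


def encode_packet(packet, a):
    """Even-extend the packet, transform it, keep the first a*N real values."""
    n = len(packet)
    extended = list(packet) + [packet[-1]] + list(reversed(packet[1:]))
    spectrum = dft(extended)                  # real up to rounding error
    keep = max(1, round(a * n))
    return [spectrum[k].real for k in range(keep)]


def decode_packet(coeffs, n):
    """Zero-pad, restore the even spectrum, invert, keep the first N samples."""
    half = list(coeffs) + [0.0] * (n + 1 - len(coeffs))   # bins 0 .. N
    full = half + list(reversed(half[1:n]))               # bins N+1 .. 2N-1
    return [v.real for v in dft(full, inverse=True)[:n]]


def compress(samples, n, a):
    """Split into packets of N samples and encode each one."""
    return [encode_packet(samples[i:i + n], a)
            for i in range(0, len(samples) - n + 1, n)]


def expand(packets, n):
    """Decode every packet and join the results."""
    return [s for coeffs in packets for s in decode_packet(coeffs, n)]


signal = [2.0] * 20                       # trivial test input
sent = compress(signal, n=10, a=0.2)
restored = expand(sent, n=10)
print(len(signal), "samples in,", sum(len(p) for p in sent), "values sent")
```

A constant input is used here only because its reconstruction is exact and therefore easy to verify; real speech would be reconstructed approximately.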
The signal is given in terms of its spectral components by the N-point inverse DFT,
$$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\, e^{j 2\pi nk/N} \tag{1}$$
Since the phase information has to be embedded in the magnitude spectrum at the transmitter, the processed signal is
$$x_t[n] = \frac{1}{2N}\sum_{k=0}^{2N-1} X_t[k]\, e^{j 2\pi nk/(2N)} \tag{2}$$
where x_t[n] contains both x[n] and its mirror image, and X_t[k] is its 2N-point DFT.
Since only 'aN' samples of the spectrum are transmitted, the even spectrum is formed at the receiver by padding N-aN-1 zeros at the end.
The reconstructed signal is,
$$x[n] \approx \frac{1}{2N}\left[X_t[0] + 2\sum_{k=1}^{aN-1} X_t[k]\cos\left(\frac{2\pi nk}{2N}\right)\right] \tag{3}$$
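Equation (3) needs only real arithmetic, which the following Python sketch makes explicit. It assumes that the transmitted values are the first few samples of the real spectrum of the even-extended packet and that the extension repeats the last sample as its middle value; the function names and the example packet are illustrative.

```python
import math


def even_spectrum(packet):
    """Real spectrum X_t[k], k = 0 .. N, of the even extension of a packet.

    The extension repeats x[N-1] as its middle sample (an assumption).
    """
    n = len(packet)
    xt = list(packet) + [packet[-1]] + list(reversed(packet[1:]))
    return [sum(xt[i] * math.cos(2 * math.pi * i * k / (2 * n)) for i in range(2 * n))
            for k in range(n + 1)]


def reconstruct(coeffs, n):
    """Equation (3) with the sum stopped at the last transmitted value."""
    return [(coeffs[0] + 2 * sum(coeffs[k] * math.cos(2 * math.pi * i * k / (2 * n))
                                 for k in range(1, len(coeffs)))) / (2 * n)
            for i in range(n)]


packet = [1.0, 3.0, 2.0, -1.0]
spectrum = even_spectrum(packet)
for kept in (1, 2, 4):
    print(kept, [round(v, 3) for v in reconstruct(spectrum[:kept], len(packet))])
```

When all values k = 0 .. N-1 are kept, the only term missing from the exact inverse transform is the one at k = N, so the residual error is X_t[N](-1)^n/(2N).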
What do we require
• We want the value of 'a' to be as low as possible, to achieve the maximum reduction in the number of samples transmitted.
• We want 'N' to be low as well, because it largely determines the speed of operation of the transmitter: the 'N' samples are fed to a processor that computes their FFT, which takes O(N log N) time.
Taking these requirements into account and choosing small but optimum values of 'a' and 'N', the algorithm still gives a faithful reproduction of the signal with little complexity at either the transmitter or the receiver.
How does it work
Simply put, the phase information is embedded in the magnitude of the frequency samples by transforming the frequency samples from complex to real values. This has an added advantage: for any low-pass signal, the spectrum obtained by this method is found to roll off much more rapidly than the ordinary spectrum.
Hence the number of significant frequency samples obtained with this method is much smaller than for the ordinary spectrum of the signal. This lets us select fewer samples and therefore transmit fewer samples.
Thus, even though the signal is already low pass in nature, the phase-embedding method requires fewer frequency samples than the ordinary method of computing the signal spectrum.
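The faster roll-off can be observed on a simple example. The Python sketch below counts the spectral samples that reach 5% of the largest magnitude, for a ramp-shaped packet and for its even extension. The ramp, the 5% threshold and the helper names are assumptions made for illustration. One plausible explanation, offered here as an interpretation rather than a result of the paper, is that mirroring removes the jump between the end of a packet and its periodic repetition.

```python
import cmath


def dft_magnitudes(x):
    """Magnitudes of the DFT of a short sequence."""
    m = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / m) for n in range(m)))
            for k in range(m)]


def significant(magnitudes, fraction=0.05):
    """Count values that reach a given fraction of the largest magnitude."""
    limit = fraction * max(magnitudes)
    return sum(1 for v in magnitudes if v >= limit)


n = 16
packet = [float(i) for i in range(n)]                  # a ramp, as a stand-in
extended = packet + [packet[-1]] + packet[:0:-1]       # even extension, length 2N

plain_half = dft_magnitudes(packet)[: n // 2 + 1]      # unique bins 0 .. N/2
embedded_half = dft_magnitudes(extended)[: n + 1]      # unique bins 0 .. N

print("ordinary spectrum :", significant(plain_half), "significant values")
print("embedded spectrum :", significant(embedded_half), "significant values")
```

Note that each value of the ordinary spectrum is complex, whereas each value of the embedded spectrum is a single real number.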
Taking a pulse as the low-pass signal, a MATLAB simulation was performed to illustrate the method of compression.
[Figure: the pulse]
[Figure: result for a = 0.2]
[Figure: result for a = 0.5]
Considering a speech signal and applying the compression, the parameters chosen for the simulation are:
a = 0.2, N = 10
[Figure: the compressed signal retrieved at the receiver]
[Figure: autocorrelation of the actual signal]
[Figure: cross-correlation of the received and the actual signal]
From the above simulations it is clear that the signal is reproduced faithfully even with fewer transmitted samples.
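A comparable experiment can be sketched in Python on a synthetic signal. The sketch measures the correlation coefficient between a transmitted and a received signal for N = 10 and two values of 'a'. The test signal (two slow sinusoids), the way the mirror image is joined and all function names are assumptions; the code does not reproduce the speech recording or the plots of the original simulation.

```python
import math


def encode(packet, keep):
    """First `keep` real spectral values of the even-extended packet."""
    n = len(packet)
    xt = list(packet) + [packet[-1]] + list(reversed(packet[1:]))
    return [sum(xt[i] * math.cos(math.pi * i * k / n) for i in range(2 * n))
            for k in range(keep)]


def decode(coeffs, n):
    """Cosine-series reconstruction of N samples from the kept values."""
    return [(coeffs[0] + 2 * sum(coeffs[k] * math.cos(math.pi * i * k / n)
                                 for k in range(1, len(coeffs)))) / (2 * n)
            for i in range(n)]


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((p - mx) * (q - my) for p, q in zip(x, y))
    den = math.sqrt(sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y))
    return num / den


def round_trip(signal, n, a):
    """Compress and reconstruct a signal packet by packet."""
    keep = max(1, round(a * n))
    out = []
    for start in range(0, len(signal) - n + 1, n):
        out.extend(decode(encode(signal[start:start + n], keep), n))
    return out


# Synthetic low-pass test signal (assumed): two slow sinusoids, 400 samples.
signal = [math.sin(2 * math.pi * i / 200) + 0.5 * math.sin(2 * math.pi * i / 80)
          for i in range(400)]

for a in (0.2, 0.5):
    received = round_trip(signal, n=10, a=a)
    print(f"a = {a}: correlation = {correlation(signal, received):.4f}")
```

Sweeping 'a' over a finer grid and plotting the resulting coefficients would give a curve of transmitted bandwidth against correlation for this synthetic input.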
y=wavread('yok');%The voice signal to be transmitted
xlabel('TRANS.BANDWIDTH EXPRESSED AS FRACTION OF THE SAMPLING FREQUENCY');
ylabel('CORR.COEFF. OF THE TRANS. AND RECEIVED SIGNAL--->');
title('PLOT OF TRANS.BANDWIDTH (fraction of total bandwidth) vs CORR.COEFF.');
title('CROSS CORRELATION OF TRANSMITTEDSIGNAL AND RECEIVED SIGNAL');
title('AUTO CORRELATION OF THE RECEIVED SIGNAL');
) zeros(1,e) f(1,e+1:c-1)];
The above method is advantageous chiefly because it reduces the transmission bandwidth.
• Since only 'aN' samples are transmitted, the minimum required bandwidth (Nyquist bandwidth) is reduced by a factor of 1/a.
• Since 'N' is small, the computation time of the FFT is also small. If the computing time (O(N log N)) is kept below 'N times the sampling period', successive samples need not be queued in a buffer (memory). The N-point DFT can be computed on high-speed processors with very little delay.
• The method requires no computation involving adjacent samples to make any decision; it simply collects the samples and computes the Fourier transform. It can therefore be implemented in real time without any delay between adjacent packets.
• The method is speaker independent. It requires neither a speaker model nor a psychoacoustic model of the ear to make any decision, which makes it very simple.
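The first two points can be put into numbers with a short Python calculation. The values a = 0.2 and N = 10 are the simulation parameters quoted earlier; the 8 kHz sampling rate and the function name `transmission_budget` are assumptions made for illustration.

```python
def transmission_budget(n, a, fs_hz):
    """Return (values sent per packet, values sent per second, packet delay in s)."""
    kept = max(1, round(a * n))
    values_per_second = fs_hz * kept / n
    collect_delay_s = n / fs_hz          # time needed to gather one packet
    return kept, values_per_second, collect_delay_s


kept, rate, delay = transmission_budget(n=10, a=0.2, fs_hz=8000)
print(f"{kept} of 10 values per packet, {rate:.0f} values/s, "
      f"{delay * 1e3:.2f} ms to collect a packet")
```

Under these assumptions the transmitted rate is one fifth of the sampling rate, and the transform of each packet must finish within the 1.25 ms it takes to collect the next one if no extra buffering is to be needed.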
Given its conceptual simplicity and ease of design, as well as the advantages listed above, we conclude that the new algorithm lends itself to direct use. Its wide applicability (it has also been tried successfully on static images) makes it all the more appealing.